Mutual information maximization (MIM) feature selection filter calling praznik::MIM() in package praznik.
This filter supports partial scoring (see Filter).
Details
As the scores calculated by the praznik package are not monotone due to the greedy forward fashion, the returned scores simply reflect the selection order: 1, (k-1)/k, ..., 1/k, where k is the number of selected features.
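The sequence described above can be sketched in base R. This is only an illustration of the stated formula and does not call praznik; the vector `selected` is a hypothetical selection order invented for the example. The example output further down shows the values returned by the filter itself, which need not match this idealized sequence exactly.

```r
# Illustration only: reproduces the sequence stated above, without calling praznik.
# `selected` is a hypothetical selection order, first-selected feature first.
selected <- c("Petal.Width", "Petal.Length", "Sepal.Length")
k <- length(selected)

# Scores k/k, (k-1)/k, ..., 1/k, named by feature
scores <- setNames(rev(seq_len(k)) / k, selected)
print(scores)
```

The first selected feature always receives the score 1, so sorting by score recovers the selection order.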
Threading is disabled by default (hyperparameter threads is set to 1). Set it to a number >= 2 to enable threading, or to 0 to auto-detect the number of available cores.
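A minimal sketch of changing the threads hyperparameter, assuming the mlr3filters and praznik packages are installed. It shows two common ways of setting a hyperparameter in mlr3: passing it to flt() at construction, or assigning it through the filter's parameter set afterwards. The values 2 and 0 are chosen purely for illustration.

```r
# Sketch, assuming mlr3filters and praznik are installed.
if (requireNamespace("mlr3filters", quietly = TRUE) &&
    requireNamespace("praznik", quietly = TRUE)) {
  library(mlr3filters)

  # Set the hyperparameter at construction: use 2 threads
  filter = flt("mim", threads = 2)

  # Or change it afterwards: 0 auto-detects the number of available cores
  filter$param_set$values$threads = 0

  task = mlr3::tsk("iris")
  filter$calculate(task)
  as.data.table(filter)
}
```

For small tasks such as iris the overhead of threading is unlikely to pay off, so the default of 1 is a reasonable starting point.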
References
Kursa MB (2021). “Praznik: High performance information-based feature selection.” SoftwareX, 16, 100819. doi:10.1016/j.softx.2021.100819.
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.
See also
PipeOpFilter for filter-based feature selection.
Other Filter: Filter, mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
Super class
mlr3filters::Filter -> FilterMIM
Examples
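if (requireNamespace("praznik")) {
  library(mlr3filters) # provides flt(); also attaches mlr3
  task = mlr3::tsk("iris")
  filter = flt("mim")
  # Partial scoring: only the top 2 features are scored, the rest get NA
  filter$calculate(task, nfeat = 2)
  as.data.table(filter)
}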
#>         feature score
#>          <char> <num>
#> 1:  Petal.Width     1
#> 2: Petal.Length     0
#> 3: Sepal.Length    NA
#> 4:  Sepal.Width    NA
if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart", "praznik"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")
  # Note: the value of `filter.frac` is chosen arbitrarily here and should be tuned.
  graph = po("filter", filter = flt("mim"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))
  graph$train(task)
}
#> $classif.rpart.output
#> NULL