Base class for filters. Predefined filters are stored in the dictionary mlr_filters. A Filter calculates a score for each feature of a task. Important features get a large value and unimportant features get a small value. Note that filter scores may also be negative.
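As a brief illustration, a predefined filter can be retrieved from the dictionary and applied to a task. The sketch below assumes that the mlr3 and mlr3filters packages are installed; the "variance" filter and the mtcars example task are chosen only for demonstration.

```r
library(mlr3)
library(mlr3filters)

# List the keys of all predefined filters in the dictionary
keys = mlr_filters$keys()

# Retrieve a filter by key; flt("variance") is the short form
filter = mlr_filters$get("variance")

# Score every feature of an example task shipped with mlr3
task = tsk("mtcars")
filter$calculate(task)

# Named numeric vector, most important feature first
print(filter$scores)
```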
Details
Some filters support partial scoring of the feature set: if nfeat is not NULL, only the best nfeat features are guaranteed to get a score. Additional features may be ignored for computational reasons and then get a score value of NA.
See also
Other Filter: mlr_filters, mlr_filters_anova, mlr_filters_auc, mlr_filters_boruta, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmi, mlr_filters_jmim, mlr_filters_kruskal_test, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_univariate_cox, mlr_filters_variance
Public fields
id
(character(1))
Identifier of the object. Used in tables, plots and text output.

label
(character(1))
Label for this object. Can be used in tables, plots and text output instead of the ID.

task_types
(character())
Set of supported task types, e.g. "classif" or "regr". Can be set to the scalar value NA to allow any task type. For a complete list of possible task types (depending on the loaded packages), see mlr_reflections$task_types$type.

task_properties
(character())
mlr3::Task task properties.

param_set
(paradox::ParamSet)
Set of hyperparameters.

feature_types
(character())
Feature types of the filter.

packages
(character())
Packages which this filter relies on.

man
(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. Defaults to NA, but can be set by child classes.

scores
Stores the calculated filter score values as a named numeric vector. The vector is sorted in decreasing order with possible NA values last. The more important the feature, the higher the score. Tied values (this includes NA values) appear in a random, non-deterministic order.
Active bindings
properties
(character())
Properties of the filter. Currently, only "missings" is supported. A filter has the property "missings" if and only if it can handle missing values in the features gracefully. Otherwise, an assertion error is raised if missing values are detected.

hash
(character(1))
Hash (unique identifier) for this object.

phash
(character(1))
Hash (unique identifier) for this partial object, excluding some components which are varied systematically during tuning (parameter values) or feature selection (feature names).
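The difference between hash and phash can be observed by changing a hyperparameter value. This sketch assumes mlr3filters is installed and that the "variance" filter exposes an na.rm hyperparameter. The expectation that hash changes is an assumption about which components enter the full hash; the text above only states that phash excludes parameter values.

```r
library(mlr3filters)

filter = flt("variance")

# Character vector of properties; per the text, at most "missings"
print(filter$properties)

hash_before = filter$hash
phash_before = filter$phash

# Change a hyperparameter value (na.rm is assumed to exist for this filter)
filter$param_set$values$na.rm = FALSE

# phash excludes parameter values, so it should be unchanged
phash_same = identical(filter$phash, phash_before)

# hash is expected to differ (assumption: parameter values are hashed)
hash_same = identical(filter$hash, hash_before)

print(c(phash_same = phash_same, hash_same = hash_same))
```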
Methods
Method new()
Create a Filter object.
Arguments
id
(character(1))
Identifier for the filter.

task_types
(character())
Types of the task the filter can operate on, e.g. "classif" or "regr". Can be set to scalar NA to allow any task type.

task_properties
(character())
Required task properties, see mlr3::Task. Must be a subset of mlr_reflections$task_properties.

param_set
(paradox::ParamSet)
Set of hyperparameters.

feature_types
(character())
Feature types the filter operates on. Must be a subset of mlr_reflections$task_feature_types.

packages
(character())
Set of required packages. Note that these packages will be loaded via requireNamespace(), and are not attached.

label
(character(1))
Label for the new instance.

man
(character(1))
String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help page can be opened via method $help().
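The constructor is normally called from the initialize() method of a subclass. The following is a minimal sketch of such a subclass, assuming the mlr3, mlr3filters and R6 packages are installed. FilterRange, its id "range" and its scoring rule (the value range of each numeric feature) are invented for illustration and are not part of the package.

```r
library(mlr3)
library(mlr3filters)
library(R6)

# Hypothetical filter: scores each numeric feature by max - min
FilterRange = R6Class("FilterRange",
  inherit = Filter,
  public = list(
    initialize = function() {
      super$initialize(
        id = "range",
        task_types = NA_character_, # scalar NA: any task type is allowed
        feature_types = c("integer", "numeric"),
        label = "Value range (illustrative)"
      )
    }
  ),
  private = list(
    # Must return a numeric vector named with (a subset of) feature names
    .calculate = function(task, nfeat) {
      data = task$data(cols = task$feature_names)
      vapply(data, function(x) diff(range(x)), numeric(1))
    }
  )
)

filter = FilterRange$new()
filter$calculate(tsk("mtcars"))
print(filter$scores)
```

Because this sketch does not declare the "missings" property, it is only meant for tasks without missing feature values.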
Method calculate()
Calculates the filter score values for the provided mlr3::Task and stores them in field scores. nfeat determines the minimum number of features to score (see Details), and defaults to the number of features in task. Loads required packages and then calls private$.calculate() of the respective subclass.
This private method is expected to return a numeric vector, uniquely named with (a subset of) feature names. The returned vector may have missing values.
Features whose score is missing as well as features with no calculated score are automatically ranked last, in a random order.
If the task has no rows, each feature gets the score NA.
Arguments
task
(mlr3::Task)
mlr3::Task to calculate the filter scores for.

nfeat
(integer())
The minimum number of features to calculate filter scores for.
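A call with an explicit nfeat might look as follows. This is a sketch assuming the mlr3, mlr3filters and data.table packages are installed. The "variance" filter used here is expected to score all features regardless of nfeat, so NA scores for unscored features would only show up with filters that actually implement partial scoring.

```r
library(mlr3)
library(mlr3filters)
library(data.table)

task = tsk("mtcars")
filter = flt("variance")

# Request scores for at least the 3 best features
filter$calculate(task, nfeat = 3)

# Tabular view of the stored scores (columns: feature, score)
tab = as.data.table(filter)
print(head(tab, 3))
```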