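Simple correlation filter that calls stats::cor() on each feature and the target. The filter score is the absolute value of the correlation, so strong negative and strong positive associations rank equally.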
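
The score can be reproduced by hand as a quick sanity check. The following is a minimal base-R sketch, not the package's implementation: it assumes the score is abs(cor(feature, target)) as described above, and it uses the built-in `mtcars` data frame with `mpg` as the target in place of the mlr3 task. The names `target`, `features`, `scores`, and `ranked` are illustrative only.

```r
# Assumption: the filter score for a feature is abs(cor(feature, target)).
# Uses base R only; `mtcars` ships with the datasets package.
target = "mpg"
features = setdiff(names(mtcars), target)

# Absolute Pearson correlation of every feature with the target.
scores = vapply(
  features,
  function(f) abs(stats::cor(mtcars[[f]], mtcars[[target]], method = "pearson")),
  numeric(1)
)

# Rank features from highest to lowest score.
ranked = sort(scores, decreasing = TRUE)
print(ranked)
```

Passing `method = "spearman"` to stats::cor() in the sketch gives the rank-based variant; `wt` scores about 0.868 with Pearson, in line with the Examples section.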

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

Super class

mlr3filters::Filter -> FilterCorrelation

Methods


Method new()

Create a FilterCorrelation object.

Usage

FilterCorrelation$new()

Method clone()

The objects of this class are cloneable with this method.

Usage

FilterCorrelation$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

## Pearson (default)
task = mlr3::tsk("mtcars")
filter = flt("correlation")
filter$calculate(task)
as.data.table(filter)
#>     feature     score
#>  1:      wt 0.8676594
#>  2:     cyl 0.8521620
#>  3:    disp 0.8475514
#>  4:      hp 0.7761684
#>  5:    drat 0.6811719
#>  6:      vs 0.6640389
#>  7:      am 0.5998324
#>  8:    carb 0.5509251
#>  9:    gear 0.4802848
#> 10:    qsec 0.4186840

## Spearman
filter = FilterCorrelation$new()
filter$param_set$values = list("method" = "spearman")
filter$calculate(task)
as.data.table(filter)
#>     feature     score
#>  1:     cyl 0.9108013
#>  2:    disp 0.9088824
#>  3:      hp 0.8946646
#>  4:      wt 0.8864220
#>  5:      vs 0.7065968
#>  6:    carb 0.6574976
#>  7:    drat 0.6514555
#>  8:      am 0.5620057
#>  9:    gear 0.5427816
#> 10:    qsec 0.4669358

## Filter as a step in an mlr3pipelines graph
if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("boston_housing")

  # Note: `filter.frac` is chosen arbitrarily here and should be tuned.

  graph = po("filter", filter = flt("correlation"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("regr.rpart"))

  graph$train(task)
}
#> $regr.rpart.output
#> NULL
#>