Kruskal-Wallis rank sum test filter calling stats::kruskal.test().

The filter value is -log10(p), where \(p\) is the \(p\)-value of the test, so larger values correspond to smaller \(p\)-values. This transformation is necessary to ensure numerical stability for very small \(p\)-values.
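
As a rough illustration of the transformation, the following sketch computes the value for a single feature by hand using only base R. It assumes that the filter tests each feature against the target classes via stats::kruskal.test(x, g); the variable names `test`, `p`, and `score` are illustrative and not part of the package. Under that assumption, the result should agree with the Petal.Width score shown in the Examples.

```r
# Sketch only: mimics the score for a single feature with base R,
# assuming the filter tests each feature against the target classes.
data(iris)

# Kruskal-Wallis rank sum test of Petal.Width across the three species
test = stats::kruskal.test(x = iris$Petal.Width, g = iris$Species)
p = test$p.value

# Filter value: larger values correspond to smaller p-values
score = -log10(p)
print(score)

# Back-transform to recover the p-value
print(10^(-score))
```

Working on the -log10 scale keeps very small \(p\)-values in a range that is easy to compare and sort, while the back-transformation recovers the original \(p\)-value when needed.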

References

For a benchmark of filter methods:

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi:10.1016/j.csda.2019.106839.

Super class

mlr3filters::Filter -> FilterKruskalTest

Methods

Method new()

Create a FilterKruskalTest object.

Usage

FilterKruskalTest$new()

Method clone()

The objects of this class are cloneable with this method.

Usage

FilterKruskalTest$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

library("mlr3filters")

task = mlr3::tsk("iris")
filter = flt("kruskal_test")
filter$calculate(task)
as.data.table(filter)
#>         feature    score
#> 1:  Petal.Width 28.48654
#> 2: Petal.Length 28.31840
#> 3: Sepal.Length 21.04970
#> 4:  Sepal.Width 13.80430

# transform to p-value
10^(-filter$scores)
#>  Petal.Width Petal.Length Sepal.Length  Sepal.Width 
#> 3.261796e-29 4.803974e-29 8.918734e-22 1.569282e-14 

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: the value of `filter.frac` is chosen arbitrarily here and should be tuned.

  graph = po("filter", filter = flt("kruskal_test"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}
#> $classif.rpart.output
#> NULL
#>