This function is a wrapper around the Fast Correlation Based Filter (FCBF) for k-mer data.

filter_fcbf(target, kmers, thresh = 0.25)

Arguments

target

a numeric response variable

kmers

a matrix of k-mers with named columns, or an object obtained via the generate_kmer_data function.

thresh

a threshold for the symmetrical uncertainty between a k-mer and the target variable. Defaults to 0.25.
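For reference, FCBF conventionally measures the association between a k-mer X and the target Y with the symmetrical uncertainty (SU), built from Shannon entropies H. The definition below is the standard one from the FCBF literature. It is an assumption that the underlying fcbf implementation follows it exactly.

```latex
SU(X, Y) = 2 \cdot \frac{IG(X \mid Y)}{H(X) + H(Y)},
\qquad
IG(X \mid Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y)
```

SU ranges from 0, when X and Y are independent, to 1, when either variable fully determines the other. thresh is interpreted on this 0 to 1 scale.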

Value

a character vector with the names of the selected k-mers

Details

This function calls fcbf internally to run the Fast Correlation Based Filter on the k-mer data.
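The selection FCBF performs can be illustrated with a minimal, self-contained R sketch. This is not the code behind fcbf. The helpers entropy, symmetrical_uncertainty and fcbf_sketch are hypothetical, and the sketch assumes every k-mer column and the target are already discrete (for example, presence/absence counts and a binarized response). How filter_fcbf handles a numeric target is not covered here. The sketch follows the two steps of the published algorithm. First, it keeps the k-mers whose SU with the target reaches thresh. Second, it walks those k-mers from most to least relevant and drops any k-mer whose SU with an already kept k-mer is at least its SU with the target.

```r
# Hypothetical helper: Shannon entropy (bits) of a discrete vector,
# estimated from empirical frequencies.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]  # guard against empty factor levels (0 * log2(0) is NaN)
  -sum(p * log2(p))
}

# Hypothetical helper: SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)).
symmetrical_uncertainty <- function(x, y) {
  hx <- entropy(x)
  hy <- entropy(y)
  if (hx + hy == 0) return(0)  # both variables constant: no information
  # Joint entropy estimated from the paired values (assumes no "\t" in values)
  hxy <- entropy(paste(x, y, sep = "\t"))
  2 * (hx + hy - hxy) / (hx + hy)
}

# Hypothetical sketch of FCBF selection on a matrix with named, discrete columns.
fcbf_sketch <- function(features, target, thresh = 0.25) {
  # Step 1: relevance of each column to the target
  su_target <- apply(features, 2, symmetrical_uncertainty, y = target)
  candidates <- names(sort(su_target[su_target >= thresh], decreasing = TRUE))

  # Step 2: keep a candidate only if no already kept (more relevant) feature
  # is at least as strongly associated with it as the target is
  selected <- character(0)
  for (f in candidates) {
    redundant <- FALSE
    for (s in selected) {
      if (symmetrical_uncertainty(features[, f], features[, s]) >= su_target[[f]]) {
        redundant <- TRUE
        break
      }
    }
    if (!redundant) selected <- c(selected, f)
  }
  selected
}

# Toy usage: "b" duplicates "a", and "c" is only weakly tied to the target
target <- c(0, 0, 1, 1, 0, 1, 0, 1)
toy <- cbind(a = target, b = target, c = c(0, 1, 0, 1, 0, 1, 0, 1))
fcbf_sketch(toy, target)  # keeps one of the duplicates a/b; c falls below thresh
```

Scanning candidates in decreasing relevance and comparing each one only against features already kept is equivalent to the original formulation, in which each retained "predominant" feature removes the less relevant features it makes redundant.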

Examples

n_seq <- 20
sequence_length <- 20
alph <- letters[1:4]
motifs <- generate_motifs(alph, 4, 4, 4, 6)
kmers <- generate_kmer_data(n_seq, sequence_length, alph,
                            motifs, n_injections = 4)
target <- get_target_additive(kmers)
filter_fcbf(target, kmers)
#> [1] "Number of features features =  15214"
#> [1] "Number of prospective features =  89"
#> [1] "Number of final features =  19"
#>  [1] "c.a.b_5.1"     "d.b.c_2.3"     "a.c.a_0.1"     "d.d.a_0.4"    
#>  [5] "c.a.d_3.2"     "b.d.a_4.1"     "b.c.c_1.1"     "c.b.c_1.3"    
#>  [9] "a.a.c_2.3"     "c.d.d.b_0.0.0" "a.b.d.b_2.4.0" "c.a.a.b_0.0.5"
#> [13] "c.d.d_6.0"     "d.d.d_0.1"     "d.a.a_2.1"     "d.a.a_4.1"    
#> [17] "b.b.d_2.0"     "c.d.c_4.1"     "b.d.a_1.3"