This function samples the target variable according to a logistic model in which motifs have an additive impact on the outcome.

get_target_additive(
  kmer_dat,
  weights = NULL,
  zero_weight = NULL,
  binary = TRUE
)

Arguments

kmer_dat

output of generate_kmer_data

weights

a vector of weights describing the motifs' impact on the outcome. Its length should equal the number of motifs provided during sequence generation (the motifs parameter of the generate_kmer_data function). If NULL, the weights are sampled from the uniform distribution on the [0, 1] interval. The probability of success used for target sampling is calculated with the formula given in the Details section. Defaults to NULL.

zero_weight

a single value denoting the weight of the no-motifs case. If NULL, the weight is sampled from the uniform distribution on the [-2, -1] interval. Defaults to NULL.

binary

logical, indicating whether the produced target variable should be binary or continuous. Defaults to TRUE.

Value

a vector of the target variable sampled from the additive model; the vector is binary when binary = TRUE.

Details

This function assumes the following additive binomial model:

\(g(EY) = w_0 + w_1 X_{m_1} + w_2 X_{m_2} + \ldots + w_m X_{m_m}\)

where \(w_1, \ldots, w_m\) are the weights related to motifs and \(w_0\) is the weight of the no-motifs case (zero_weight).

When weights is NULL, the probabilities are calculated with the formula

\( \exp(1 + x_i)/(1 + \exp(1 + x_i)) \), where \(x_i\) denotes the sum of the weights of the motifs occurring in the \(i\)-th sequence.
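
The additive model can be illustrated with a minimal base-R sketch. It is not the package's implementation. It assumes that \(g\) is the logit link and that each \(X_{m_j}\) is a 0/1 indicator of motif occurrence. The matrix motif_present and all other object names are hypothetical stand-ins introduced for illustration only.

```r
set.seed(1)

# Hypothetical motif indicator matrix (an assumption, not package output):
# rows are sequences, columns are motifs, 1 means the motif occurs.
n_seq <- 6
n_motifs <- 3
motif_present <- matrix(rbinom(n_seq * n_motifs, size = 1, prob = 0.5),
                        nrow = n_seq)

# Default sampling of weights as described in the Arguments section.
weights <- runif(n_motifs, min = 0, max = 1)   # motif weights w_1, ..., w_m
zero_weight <- runif(1, min = -2, max = -1)    # no-motifs weight w_0

# Linear predictor w_0 + w_1 X_1 + ... + w_m X_m for every sequence.
linear_predictor <- zero_weight + as.vector(motif_present %*% weights)

# Inverse logit: exp(z) / (1 + exp(z)), assuming g is the logit link.
probs <- plogis(linear_predictor)

# Binary target: one Bernoulli draw per sequence.
target <- rbinom(n_seq, size = 1, prob = probs)
```

In this sketch a sequence without any motif has success probability plogis(zero_weight), and every additional motif raises the linear predictor by its weight.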

Examples

n_seq <- 20
sequence_length <- 20
alph <- letters[1:4]
motifs <- generate_motifs(alph, 4, 4, 4, 6)
results <- generate_kmer_data(n_seq, sequence_length, alph,
                              motifs, n_injections = 4)
get_target_additive(results)
#>  [1] 0 1 0 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 1