This function samples the target variable according to the logistic model with interactions.

get_target_interactions(kmer_dat, zero_weight = NULL, binary = TRUE)

Arguments

kmer_dat

output of generate_kmer_data

zero_weight

a single value denoting the weight of the no-motifs case. If NULL, the weight is sampled from the uniform distribution on the interval [-2, -1]. Defaults to NULL.

binary

logical, indicating whether the produced target variable should be binary or continuous. Defaults to TRUE.

Value

a vector of the target variable sampled from the interaction model and the provided or calculated probabilities. The vector is binary when binary = TRUE.

Details

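The approach is based on logistic regression with interactions, where an interaction term indicates that the effect of one predictor depends on the value of another predictor. Let \(k = \max\lbrace k_i, i = 1, \ldots, n\rbrace\) be the maximum number of motifs per sequence. Let \(w_{1}, \ldots, w_{k}\) denote the weights of the single effects, \(w_0\) the weight of the no-motifs case, and let weights with several indices denote the weights of the corresponding interactions. Namely: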

\(g(EY) = w_0 + \sum_{i = 1}^{k} w_{i} X_{m_i} + \left(\sum_{i = 1}^{k-1}\sum_{j = i + 1}^{k} w_{ij} X_{m_i}X_{m_j}\right) + \ldots + w_{1\ldots k} X_{m_1}\ldots X_{m_k}\)
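As a rough illustration of the formula above, the following sketch builds the linear predictor for k = 2 motifs and samples a binary target from it. The presence matrix X, the weight values, and the use of the logit link with Bernoulli sampling are assumptions made for this example; it is not the package's internal implementation.

```r
set.seed(1)

# Hypothetical motif-presence matrix: 6 sequences, k = 2 motifs
X <- cbind(m1 = c(0, 1, 0, 1, 1, 0),
           m2 = c(0, 0, 1, 1, 0, 1))

w0  <- -1.5       # weight of the no-motifs case (assumed; could be runif(1, -2, -1))
w   <- c(1, 0.5)  # single-effect weights w_1, w_2 (assumed values)
w12 <- 2          # pairwise interaction weight w_12 (assumed value)

# Linear predictor: w0 + w1 * X1 + w2 * X2 + w12 * X1 * X2
eta <- w0 + drop(X %*% w) + w12 * X[, "m1"] * X[, "m2"]

# Inverse logit link gives probabilities; Bernoulli draws give a binary target
p <- plogis(eta)
y <- rbinom(length(p), size = 1, prob = p)
```

In this sketch the fourth sequence contains both motifs, so its linear predictor also includes the interaction weight w12 in addition to the two single effects.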

When probs is NULL, the probabilities are calculated as \(\exp(x_i)/(1 + \exp(x_i))\), where \(x_i\) denotes the number of motifs in the \(i\)-th sequence.
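The probability formula above is the standard logistic function, which base R exposes as plogis(). A minimal sketch, assuming a hypothetical vector x of motif counts:

```r
# Hypothetical motif counts for four sequences
x <- c(0, 1, 2, 4)

# Probabilities from the formula exp(x) / (1 + exp(x))
p <- exp(x) / (1 + exp(x))

# Equivalent call using the logistic distribution function from stats
p_alt <- plogis(x)
```

A sequence with no motifs gets probability 0.5 under this formula, and the probability increases towards 1 as the motif count grows.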

Examples

n_seq <- 20
sequence_length <- 20
alph <- letters[1:4]
motifs <- generate_motifs(alph, 4, 4, 4, 6)
results <- generate_kmer_data(n_seq, sequence_length, alph,
                              motifs, n_injections = 4)
get_target_interactions(results)
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0