This function samples a target variable according to a logic regression model, assuming that the occurrence of certain combinations of motifs affects the target. In a logic model, simulating a binary variable amounts to defining logical conditions on the motifs that determine the variable's value: for example, the variable takes the value 1 if certain logical criteria are met and 0 otherwise.

  random = TRUE,
  zero_weight = NULL,
  weights = NULL,
  n_exp = NULL,
  max_exp_depth = NULL,
  expressions = NULL,
  binary = TRUE



output of generate_kmer_data


random: a logical indicating whether the logic expressions should be generated randomly. Defaults to TRUE.


zero_weight: a single value giving the weight of the no-motif case, \(w_0\) in the model described in the Details section. If NULL, the weight is sampled from the uniform distribution on the interval [-2, -1]. Defaults to NULL.


weights: a vector of weights for the logic expressions built from the available motifs. Its length should equal the number of expressions, n_exp. If NULL, the weights are sampled from the uniform distribution on the interval [0, 1]. The probability of success used to sample the target is computed from the formula given in the Details section. Defaults to NULL.


n_exp: the number of random logic expressions to create. It is used only when random is TRUE.


max_exp_depth: the maximum number of motifs used in a single logic expression. Defaults to 3.


expressions: a matrix of binary variables corresponding to custom logic expressions, which you can construct from the motifs. If weights is provided, the number of expressions in this matrix should equal the length of weights. Defaults to NULL, in which case random logic expressions are created.


binary: a logical indicating whether the produced target variable should be binary (TRUE) or continuous (FALSE). Defaults to TRUE.
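The arguments random, n_exp, max_exp_depth, zero_weight and weights suggest how random expressions and default weights might be produced. The sketch below is a minimal base-R illustration under our own assumptions, not the package's implementation: the helper random_logic_expression, the motif-indicator matrix X and the left-to-right AND/OR combination scheme are all hypothetical. Only the sampling intervals for zero_weight and weights come from the argument descriptions.

```r
# Hypothetical sketch only: NOT the package's internal implementation.
set.seed(1)

# Assumed input: a logical matrix of motif presence, one row per sequence and
# one column per motif (X[s, j] is TRUE if motif j occurs in sequence s).
n_seq <- 20
n_motifs <- 4
X <- matrix(runif(n_seq * n_motifs) < 0.3, nrow = n_seq,
            dimnames = list(NULL, paste0("m", seq_len(n_motifs))))

# Hypothetical helper: pick between 1 and max_exp_depth distinct motifs and
# combine them left to right with randomly chosen AND / OR operators.
random_logic_expression <- function(X, max_exp_depth = 3) {
  depth <- sample.int(min(max_exp_depth, ncol(X)), 1)
  idx <- sample.int(ncol(X), depth)
  value <- X[, idx[1]]
  for (j in idx[-1]) {  # loop body is skipped when depth == 1
    value <- if (runif(1) < 0.5) value & X[, j] else value | X[, j]
  }
  as.integer(value)
}

# One column per expression L_i, evaluated on every sequence.
n_exp <- 3
L <- vapply(seq_len(n_exp),
            function(i) random_logic_expression(X, max_exp_depth = 3),
            integer(n_seq))

# Default weights as described above: w_0 ~ U[-2, -1], w_i ~ U[0, 1].
zero_weight <- runif(1, min = -2, max = -1)
weights <- runif(n_exp, min = 0, max = 1)
```

Each column of L is one logic expression evaluated on all sequences, which appears to be the kind of matrix the expressions argument describes; the exact layout the package expects is an assumption here.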


Here we consider new variables \(L_1, \ldots, L_l\), each of which is a logic expression over a subset of the motifs \(m_1, \ldots, m_m\). For example,

\(L_1(m_1, m_2, m_3) = (X_{m_1} \land X_{m_2}) \lor X_{m_3}.\)
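To make this example concrete, here is how \(L_1\) could be evaluated in base R on a small, made-up set of motif indicators. The columns m1, m2 and m3 and their values are hypothetical; 1 means the motif is present in the sequence.

```r
# Toy motif indicators for 5 sequences (hypothetical data, for illustration only).
X <- cbind(
  m1 = c(1, 1, 0, 0, 1),
  m2 = c(1, 0, 1, 0, 1),
  m3 = c(0, 0, 0, 1, 0)
)

# L_1(m_1, m_2, m_3) = (X_{m_1} AND X_{m_2}) OR X_{m_3}, coded as 0/1.
L1 <- as.integer((X[, "m1"] == 1 & X[, "m2"] == 1) | X[, "m3"] == 1)
L1
#> [1] 1 0 0 1 1
```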

Here \(X_{m_j}\) denotes the binary indicator that motif \(m_j\) occurs in a sequence. Each variable \(L_i\) receives its own weight \(w_i\), and the model is:

\(g(\mathbb{E}Y) = w_0 + \sum_{i = 1}^{l} w_i L_i,\)

where \(g\) is a link function and \(w_0\) is the weight of the no-motif case.
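A minimal sketch of sampling a binary target from this model, assuming a logit link \(g(p) = \log(p / (1 - p))\), so that the success probability is \(p = 1 / (1 + e^{-\eta})\) with \(\eta = w_0 + \sum_i w_i L_i\). The link function, the toy matrix L and the weight values are assumptions for illustration; the package may use a different link, and the continuous case (binary = FALSE) is not covered here.

```r
set.seed(42)

# Hypothetical 0/1 values of l = 3 logic expressions for 4 sequences
# (rows = sequences, columns = L_1, L_2, L_3; matrix() fills column by column).
L <- matrix(c(1, 0, 1, 0,
              0, 0, 1, 1,
              1, 1, 0, 0), ncol = 3)

w0 <- -1.5             # weight of the no-motif case (illustrative value)
w <- c(0.8, 0.5, 0.3)  # weights w_1, ..., w_l (illustrative values)

# Linear predictor: eta = w_0 + sum_i w_i * L_i, one value per sequence.
eta <- w0 + drop(L %*% w)

# Assumed logit link: invert it with the logistic function to get P(Y = 1).
p <- plogis(eta)

# Draw one Bernoulli outcome per sequence.
y <- rbinom(length(p), size = 1, prob = p)
```

With these values the linear predictor is -0.4, -1.2, -0.2 and -1.0 for the four sequences, so sequences satisfying more (and heavier-weighted) expressions get a higher chance of Y = 1.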


n_seq <- 20
sequence_length <- 20
alph <- letters[1:4]
motifs <- generate_motifs(alph, 4, 4, 4, 6)
results <- generate_kmer_data(n_seq, sequence_length, alph,
                              motifs, n_injections = 4)
#>  [1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 0