Test sq object for presence of given motifs

Tests if elements of a sq object contain given motifs.

x %has% y

Arguments

x: [sq]
An object this function is applied to.
y: [character]
Motifs to be searched for.

Value

A logical vector of the same length as the input sq, indicating which elements contain all of the given motifs.

Details

This function tests whether elements of a sq object contain the given motif or motifs. It returns a logical value for every element of the sq object: TRUE if the tested sequence contains the searched motif and FALSE otherwise. When multiple motifs are given, TRUE is returned only for sequences that contain all of them.
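The "all given motifs" rule can be illustrated with a minimal base R sketch. It works on plain character vectors with literal substring matching; has_all is a hypothetical helper written for this illustration, not part of the package, and it ignores ambiguous letters and the "^" and "$" symbols.

```r
# Hypothetical helper (not a package function): literal substring search
# on plain character vectors, combined with a logical AND across motifs.
has_all <- function(seqs, motifs) {
  # One logical vector per motif, each of the same length as `seqs`
  hits <- lapply(motifs, function(m) grepl(m, seqs, fixed = TRUE))
  # A sequence passes only if every motif was found in it
  Reduce(`&`, hits, rep(TRUE, length(seqs)))
}

seqs <- c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI")
has_all(seqs, c("IAA", "NY"))
#> [1]  TRUE FALSE FALSE
```

The second sequence contains "IAA" but not "NY", so it is rejected even though one of the motifs is present.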

This function only indicates whether a motif is present within a sequence. To find all occurrences of motifs and their positions within sequences, use find_motifs.

Motif capabilities and restrictions

A motif does not have to be a plain string representation of the searched subsequence. For example, when this function is used with any of the standard types, i.e. ami, dna or rna, the motif can contain ambiguous letters. In that case the engine tries to match any of the possible meanings of the letter. For example, take "B" from the extended DNA alphabet. It means "not A", so it matches "C", "G" and "T", but also "B", "Y" (either "C" or "T"), "K" (either "G" or "T") and "S" (either "C" or "G").
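The matching rule for "B" can be reproduced with a short base R sketch. It assumes that a motif letter matches every sequence letter whose set of meanings is contained in its own, which is how the "B" example reads; the names iupac_dna, matched_by and motif_to_regex are hypothetical and only serve this illustration. It is not a description of how the package implements the search.

```r
# IUPAC DNA codes and the bases each one stands for
iupac_dna <- list(
  A = "A", C = "C", G = "G", T = "T",
  W = c("A", "T"), S = c("C", "G"), M = c("A", "C"), K = c("G", "T"),
  R = c("A", "G"), Y = c("C", "T"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Sequence letters that a motif letter can match: those whose meaning
# is a subset of the motif letter's meaning
matched_by <- function(letter) {
  target <- iupac_dna[[letter]]
  keep <- vapply(iupac_dna, function(bases) all(bases %in% target), logical(1))
  names(iupac_dna)[keep]
}

# Translate a motif into a regular expression, one character class per letter
motif_to_regex <- function(motif) {
  chars <- strsplit(motif, "")[[1]]
  classes <- vapply(chars, function(ch) {
    paste0("[", paste(matched_by(ch), collapse = ""), "]")
  }, character(1))
  paste(classes, collapse = "")
}

matched_by("B")
#> [1] "C" "G" "T" "S" "K" "Y" "B"

grepl(motif_to_regex("AB"), c("AC", "AS", "AA", "AN"))
#> [1]  TRUE  TRUE FALSE FALSE
```

"AN" is not matched because "N" may also stand for "A", which "B" excludes.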

A full list of ambiguous letters and their meanings can be found on the IUPAC site.

There is also a restriction: the alphabets of sq objects on which search operations are conducted cannot contain the "^" and "$" symbols. These two have a special meaning in motifs. They indicate the beginning and the end of a sequence respectively, and can be used to limit the position of matched subsequences.
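The effect of the two symbols can be sketched with base R regular expressions, where "^" and "$" play the same role. The snippet runs on plain character vectors and uses grepl rather than this function, so it only illustrates the positional semantics.

```r
seqs <- c("ATGCAGGA", "GACATG", "TGACGAGCTTAG")

# Motif anywhere in the sequence
grepl("ATG", seqs)
#> [1]  TRUE  TRUE FALSE

# Motif only at the beginning of the sequence
grepl("^ATG", seqs)
#> [1]  TRUE FALSE FALSE

# Motif only at the end of the sequence
grepl("ATG$", seqs)
#> [1] FALSE  TRUE FALSE
```

Because the two symbols are reserved for this purpose, a letter "^" or "$" in the alphabet would be indistinguishable from a position marker, which is why such alphabets are not allowed.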

Examples

# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGNBAACGAN", "TGACGAGCTTAG"),
             alphabet = "dna_ext")
sq_ami <- sq(c("MIAANYTWIL","TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mAmYmY", "nbAnsAmA", ""),
             alphabet = c("mA", "mY", "nbA", "nsA"))

# Testing if DNA sequences contain motif "ATG":
sq_dna %has% "ATG"
#> [1]  TRUE FALSE FALSE

# Testing if DNA sequences begin with "ATG":
sq_dna %has% "^ATG"
#> [1]  TRUE FALSE FALSE

# Testing if DNA sequences end with "TAG" (one of the stop codons):
sq_dna %has% "TAG$"
#> [1] FALSE FALSE  TRUE

# Test if amino acid sequences contain motif of two alanines followed by
# aspartic acid or asparagine ("AAB" motif matches "AAB", "AAD" and "AAN"):
sq_ami %has% "AAB"
#> [1]  TRUE FALSE FALSE FALSE

# Test if amino acid sequences contain both motifs:
sq_ami %has% c("AAXG", "MAT")
#> [1] FALSE FALSE FALSE FALSE

# Test for sequences with multicharacter alphabet:
sq_atp %has% c("nsA", "mYmY$")
#> [1] FALSE FALSE FALSE