Tests if elements of a sq object contain given motifs.
x %has% y
x [sq] An object this function is applied to.
y [character] Motifs to be searched for.
A logical vector of the same length as the input sq, indicating which elements contain all given motifs.
This function tests whether elements of a sq object contain the given motif or motifs. It returns a logical value for every element of the sq object: TRUE if the tested sequence contains the searched motif and FALSE otherwise. When multiple motifs are searched, TRUE is returned only for sequences that contain all given motifs.
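The all-motifs rule can be illustrated on plain character vectors. This is a minimal base R sketch, not the tidysq implementation: has_all() is a hypothetical helper, it works on ordinary strings rather than sq objects, and it treats each motif as a literal substring (no ambiguous letters, no "^" or "$").

```r
# Hypothetical stand-in for the all-motifs rule, using base R only.
has_all <- function(sequences, motifs) {
  # One logical column per motif: does each sequence contain it literally?
  hits <- vapply(motifs, function(m) grepl(m, sequences, fixed = TRUE),
                 logical(length(sequences)))
  # Force a matrix so that a single sequence is handled like any other input.
  hits <- matrix(hits, nrow = length(sequences))
  # A sequence passes only if every motif was found in it.
  apply(hits, 1, all)
}

seqs <- c("ATGCAGGA", "GACCGAACGA", "TGACGAGCTTAG")
has_all(seqs, "GA")            # TRUE TRUE TRUE
has_all(seqs, c("GA", "TAG"))  # FALSE FALSE TRUE
```

Adding a motif can only turn TRUE values into FALSE, never the other way round, because every motif has to be present.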
This function only indicates whether a motif is present within a sequence; to find all motifs and their positions within sequences, use find_motifs().
A motif does not have to be a plain string representation of the searched subsequence. When this function is used with any of the standard types, i.e. ami, dna or rna, a motif may contain ambiguous letters. In this case the engine tries to match every possible meaning of the letter. For example, take "B" from the extended DNA alphabet. It means "not A", so it matches "C", "G" and "T", but also "B", "Y" (either "C" or "T"), "K" (either "G" or "T") and "S" (either "C" or "G").
The full list of ambiguous letters and their meanings can be found on the IUPAC site.
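The matching rule for ambiguous letters can be sketched in base R. The code below is an illustration under assumptions, not the engine used by this function: the helpers matching_letters() and motif_to_regex() are hypothetical, the table covers IUPAC nucleotide codes only, and the rule applied is that a motif letter matches every sequence letter whose set of possible bases is contained in its own.

```r
# IUPAC nucleotide codes and the bases each one stands for.
iupac_dna <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper: all letters whose meaning is covered by `letter`.
matching_letters <- function(letter) {
  allowed <- iupac_dna[[letter]]
  covered <- vapply(iupac_dna, function(bases) all(bases %in% allowed),
                    logical(1))
  names(iupac_dna)[covered]
}

# Hypothetical helper: turn a motif made of IUPAC DNA letters into a
# regular expression with one character class per motif letter.
motif_to_regex <- function(motif) {
  motif_letters <- strsplit(motif, "")[[1]]
  classes <- vapply(motif_letters, function(l) {
    paste0("[", paste(matching_letters(l), collapse = ""), "]")
  }, character(1))
  paste(classes, collapse = "")
}

matching_letters("B")   # the letters listed above: C, G, T, Y, S, K, B
grepl(motif_to_regex("AB"), c("ATGCAGGA", "GGAAA"))
```

In this sketch the motif "AB" is found in "ATGCAGGA" (through "AT") but not in "GGAAA", where every "A" is followed by another "A" or by nothing.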
There is also a restriction on the searched objects: the alphabet of a sq object on which a search is conducted cannot contain the "^" and "$" symbols. These two have a special meaning in motifs: they indicate the beginning and the end of a sequence, respectively, and can be used to limit the position of matched subsequences.
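The effect of "^" and "$" can be compared with anchored and unanchored searches on plain strings. This is a base R analogy only; it assumes nothing about how the function handles anchors internally, and the vector seqs is made up for illustration.

```r
seqs <- c("ATGCAGGA", "CTAGGA", "TGACGAGCTTAG")

# Analogue of the "^ATG" motif: the sequence must begin with ATG.
starts_with_atg <- startsWith(seqs, "ATG")

# Analogue of the "TAG$" motif: the sequence must end with TAG.
ends_with_tag <- endsWith(seqs, "TAG")

# Unanchored search for comparison: TAG anywhere in the sequence.
contains_tag <- grepl("TAG", seqs, fixed = TRUE)

starts_with_atg  # TRUE FALSE FALSE
ends_with_tag    # FALSE FALSE TRUE
contains_tag     # FALSE TRUE TRUE
```

The second sequence contains "TAG" but does not end with it, which is the difference the "$" symbol expresses.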
Functions interpreting sq in biological context: complement(), find_motifs(), translate()
# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGNBAACGAN", "TGACGAGCTTAG"),
             alphabet = "dna_bsc")
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mAmYmY", "nbAnsAmA", ""),
             alphabet = c("mA", "mY", "nbA", "nsA"))
# Testing if DNA sequences contain motif "ATG":
sq_dna %has% "ATG"
#> [1] TRUE FALSE FALSE
# Testing if DNA sequences begin with "ATG":
sq_dna %has% "^ATG"
#> [1] TRUE FALSE FALSE
# Testing if DNA sequences end with "TAG" (one of the stop codons):
sq_dna %has% "TAG$"
#> [1] FALSE FALSE TRUE
# Test if amino acid sequences contain motif of two alanines followed by
# aspartic acid or asparagine ("AAB" motif matches "AAB", "AAD" and "AAN"):
sq_ami %has% "AAB"
#> [1] TRUE FALSE FALSE FALSE
# Test if amino acid sequences contain both motifs:
sq_ami %has% c("AAXG", "MAT")
#> [1] FALSE FALSE FALSE FALSE
# Test for sequences with multicharacter alphabet:
sq_atp %has% c("nsA", "mYmY$")
#> [1] FALSE FALSE FALSE