This function replaces sequences that contain ambiguous elements with empty (NULL) sequences, or removes the ambiguous elements from sequences, in an sq object.
remove_ambiguous(x, by_letter = FALSE, ...)
# S3 method for sq
remove_ambiguous(
x,
by_letter = FALSE,
...,
NA_letter = getOption("tidysq_NA_letter")
)
x
[sq_dna_bsc || sq_rna_bsc || sq_dna_ext || sq_rna_ext || sq_ami_bsc || sq_ami_ext] An object this function is applied to.
by_letter
[logical(1)] If FALSE, the filter condition is applied to each sequence as a whole. If TRUE, the filter is applied to each letter separately.
...
Further arguments to be passed from or to other methods.
NA_letter
[character(1)] A string that is used to interpret and display the NA value in the context of the sq class. The default value is "!".
An sq object with the _bsc version of the input type.
Biological sequences, whether of DNA, RNA or amino acid elements, are not always exactly determined. Sometimes the only information the user has about an element is that it is one of a given set of possible elements. In this case the element is described with one of the special letters, here called ambiguous.
The inclusion of these letters is the difference between extended and basic alphabets (and, consequently, types). For the amino acid alphabet these letters are: B, J, O, U, X, Z; whereas for DNA and RNA they are: W, S, M, K, R, Y, B, D, H, V, N.
remove_ambiguous() is used to create sequences without any of the elements above. Depending on the value of the by_letter argument, the function either replaces "ambiguous" sequences with empty sequences (if by_letter is equal to FALSE, the default) or shortens the original sequences by retaining only unambiguous letters (if by_letter is equal to TRUE).
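The two modes can be illustrated on plain character vectors. The following is only a minimal base R sketch of the logic, not the tidysq implementation: `drop_ambiguous_sketch()` and `dna_ambiguous` are hypothetical names introduced here, and empty sequences are represented by empty strings rather than by NULL sequences of an sq object.

```r
# Hypothetical helper, NOT part of tidysq: mimics both modes on plain strings.
# Ambiguous DNA/RNA letters, as listed in the description above.
dna_ambiguous <- c("W", "S", "M", "K", "R", "Y", "B", "D", "H", "V", "N")

drop_ambiguous_sketch <- function(x, ambiguous, by_letter = FALSE) {
  # Build a character class such as "[WSMKRYBDHVN]"
  pattern <- paste0("[", paste(ambiguous, collapse = ""), "]")
  if (by_letter) {
    # Per-letter mode: delete only the ambiguous letters
    gsub(pattern, "", x)
  } else {
    # Whole-sequence mode: blank out any sequence containing an ambiguous letter
    ifelse(grepl(pattern, x), "", x)
  }
}

dna <- c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN")

# Whole-sequence mode (assumed analogue of by_letter = FALSE)
whole <- drop_ambiguous_sketch(dna, dna_ambiguous)

# Per-letter mode (assumed analogue of by_letter = TRUE)
per_letter <- drop_ambiguous_sketch(dna, dna_ambiguous, by_letter = TRUE)
```

In whole-sequence mode the second and fourth strings become empty, while in per-letter mode they are only shortened; the number of sequences stays the same in both cases.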
Functions that clean sequences: is_empty_sq(), remove_na()
# Creating objects to work on:
sq_ami <- sq(c("MIAANYTWIL","TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
alphabet = "ami_ext")
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
alphabet = "dna_ext")
# Removing whole sequences with ambiguous elements:
remove_ambiguous(sq_ami)
#> basic amino acid sequences list:
#> [1] MIAANYTWIL <10>
#> [2] TIAALGNIIYRAIE <14>
#> [3] NYERTGHLI <9>
#> [4] <NULL> <0>
remove_ambiguous(sq_dna)
#> basic DNA sequences list:
#> [1] ATGCAGGA <8>
#> [2] <NULL> <0>
#> [3] TGACGAGCTTA <11>
#> [4] <NULL> <0>
# Removing ambiguous elements from sequences:
remove_ambiguous(sq_ami, by_letter = TRUE)
#> basic amino acid sequences list:
#> [1] MIAANYTWIL <10>
#> [2] TIAALGNIIYRAIE <14>
#> [3] NYERTGHLI <9>
#> [4] MAYIALN <7>
remove_ambiguous(sq_dna, by_letter = TRUE)
#> basic DNA sequences list:
#> [1] ATGCAGGA <8>
#> [2] GACCGAACGA <10>
#> [3] TGACGAGCTTA <11>
#> [4] ACTAGC <6>
# Analysis of the result
sq_clean <- remove_ambiguous(sq_ami)
is_empty_sq(sq_clean)
#> [1] FALSE FALSE FALSE TRUE
sq_type(sq_clean)
#> [1] "ami_bsc"