Replaces all occurrences of a letter with another.
substitute_letters(x, encoding, ...)
# S3 method for sq
substitute_letters(x, encoding, ..., NA_letter = getOption("tidysq_NA_letter"))
x: [sq] An object this function is applied to.
encoding: [character || numeric] A dictionary (named vector), where names are letters to be replaced and elements are their respective replacements.
...: Further arguments to be passed from or to other methods.
NA_letter: [character(1)] A string that is used to interpret and display NA value in the context of sq class. Defaults to "!".
An sq object of atp type with an updated alphabet.
substitute_letters allows replacing unwanted letters in any sequence with user-defined or IUPAC symbols. Letters can also be replaced with NA values, so that they can later be removed from the sequence with the remove_na function.
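The two-step clean-up described above can be sketched as follows. This is a minimal illustration, not part of the original examples: it assumes that remove_na accepts a by_letter argument and that by_letter = TRUE drops individual NA letters rather than handling each sequence as a whole. The object names sq_na and sq_clean are chosen here for illustration only.

```r
library(tidysq)

# Two of the amino acid sequences used in the examples for this function
sq_ami <- sq(c("NYERTGHLI", "MOYXXXIOLN"), alphabet = "ami_ext")

# Step 1: turn every "X" into an NA value
sq_na <- substitute_letters(sq_ami, c(X = NA_character_))

# Step 2: remove the NA values letter by letter
# (assumption: by_letter = TRUE removes single letters; the default
# by_letter = FALSE treats each sequence as a whole instead)
sq_clean <- remove_na(sq_na, by_letter = TRUE)
sq_clean
```

Whether to remove single letters or to handle whole sequences depends on the analysis, so the by_letter setting above is only one possible choice.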
It doesn't matter whether the replaced or the replacing letter consists of one character or several. However, each substitution maps exactly one letter to exactly one letter: the user cannot replace several letters with a single one, nor a single letter with several.
Of course, multiple different letters can be encoded to the same symbol, so c(A = "rep1", H = "rep1", G = "rep1") is allowed, but c(AHG = "rep1") is not (unless there is a letter "AHG" in the alphabet). Doing so discards the information that the original letters were distinct, so the original sequence cannot be retrieved after this operation.
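A short sketch of such a many-to-one encoding is given below. The input sequences are made up for illustration, and the call to alphabet() is used only to inspect the result; the exact printed form is not shown because it depends on the package version.

```r
library(tidysq)

# Illustrative sequences containing the letters A, H and G
sq_ami <- sq(c("AHGMIL", "GGHA"), alphabet = "ami_ext")

# Three different letters are all encoded to the same symbol "rep1"
sq_merged <- substitute_letters(sq_ami, c(A = "rep1", H = "rep1", G = "rep1"))
sq_merged

# Inspect the alphabet of the result; once A, H and G share one symbol,
# the original letters can no longer be told apart
alphabet(sq_merged)
```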
All encoding names must be letters contained within the alphabet, otherwise an error will be thrown.
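When the encoding is built programmatically, the error can be caught instead of stopping the script. The sketch below assumes that "Z" is not part of the "dna_ext" alphabet and relies only on the documented fact that an error is thrown; the wording of the error message is not assumed.

```r
library(tidysq)

sq_dna <- sq(c("ATGCAGGA", "ACTNNAGCN"), alphabet = "dna_ext")

# "Z" is not a letter of the alphabet, so the call is expected to fail;
# tryCatch() turns the error into a message and a NULL result
result <- tryCatch(
  substitute_letters(sq_dna, c(Z = "x")),
  error = function(e) {
    message("Substitution failed: ", conditionMessage(e))
    NULL
  }
)
is.null(result)
```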
Functions that manipulate type of sequences: find_invalid_letters(), is.sq(), sq_type(), typify()
# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
sq_ami <- sq(c("MIOONYTWIL", "TIOOLGNIIYROIE", "NYERTGHLI", "MOYXXXIOLN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mALPVQAmAmA", "mAmAPQ"), alphabet = c("mA", LETTERS))
# Not all letters must have their encoding specified:
substitute_letters(sq_dna, c(T = "t", A = "a", C = "c", G = "g"))
#> atp (atypical alphabet) sequences list:
#> [1] atgcagga <8>
#> [2] gaccgaacgaN <11>
#> [3] tgacgagctta <11>
#> [4] actNNagcN <9>
substitute_letters(sq_ami, c(M = "X"))
#> atp (atypical alphabet) sequences list:
#> [1] XIOONYTWIL <10>
#> [2] TIOOLGNIIYROIE <14>
#> [3] NYERTGHLI <9>
#> [4] XOYXXXIOLN <10>
# Multiple character letters are supported in encodings:
substitute_letters(sq_atp, c(mA = "-"))
#> atp (atypical alphabet) sequences list:
#> [1] -LPVQA-- <8>
#> [2] --PQ <4>
substitute_letters(sq_ami, c(I = "ough", O = "eau"))
#> atp (atypical alphabet) sequences list:
#> [1] M ough eau eau N Y T W ough L <10>
#> [2] T ough eau eau L G N ough ough Y R eau ough E <14>
#> [3] N Y E R T G H L ough <9>
#> [4] M eau Y X X X ough eau L N <10>
# Numeric substitutions are allowed too, these are coerced to characters:
substitute_letters(sq_dna, c(N = 9, G = 7))
#> atp (atypical alphabet) sequences list:
#> [1] AT7CA77A <8>
#> [2] 7ACC7AAC7A9 <11>
#> [3] T7AC7A7CTTA <11>
#> [4] ACT99A7C9 <9>
# It's possible to replace a letter with NA value:
substitute_letters(sq_ami, c(X = NA_character_))
#> atp (atypical alphabet) sequences list:
#> [1] M I O O N Y T W I L <10>
#> [2] T I O O L G N I I Y R O I E <14>
#> [3] N Y E R T G H L I <9>
#> [4] M O Y NA NA NA I O L N <10>