Replaces all occurrences of a letter with another.

substitute_letters(x, encoding, ...)

# S3 method for sq
substitute_letters(x, encoding, ..., NA_letter = getOption("tidysq_NA_letter"))

Arguments

x

[sq]
An object this function is applied to.

encoding

[character || numeric]
A dictionary (named vector), where names are letters to be replaced and elements are their respective replacements.

...

Further arguments to be passed from or to other methods.

NA_letter

[character(1)]
A string used to interpret and display NA values in the context of the sq class. Defaults to "!".

Value

An sq object of atp type with an updated alphabet.

Details

substitute_letters replaces unwanted letters in any sequence with user-defined or IUPAC symbols. Letters can also be replaced with NA values, so that they can later be removed from the sequence with the remove_na function.
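A minimal sketch of the substitute-then-remove workflow described above. The sequences are illustrative, and the sketch assumes that remove_na() accepts a by_letter argument that switches between dropping single NA letters and handling whole sequences; check the remove_na documentation of your installed tidysq version.

```r
library(tidysq)

# Illustrative amino acid sequences; X marks an unknown residue.
sq_ami <- sq(c("MIOONYTWIL", "MOYXXXIOLN"), alphabet = "ami_ext")

# Step 1: turn every X into NA. The result is an sq object of atp type.
sq_na <- substitute_letters(sq_ami, c(X = NA_character_))

# Step 2: drop the NA values. by_letter = TRUE (assumed argument) is expected
# to remove individual NA letters rather than act on whole sequences.
sq_clean <- remove_na(sq_na, by_letter = TRUE)
sq_clean
```

Replacing with NA first keeps the decision about what counts as "unwanted" separate from the decision about how to remove it.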

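Both the replaced letter and its replacement may consist of one or more characters. However, each substitution maps exactly one letter of the alphabet to exactly one letter: several letters cannot be replaced with a single one as a group, and one letter cannot be replaced with more than one.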

Several different letters can be encoded to the same symbol, so c(A = "rep1", H = "rep1", G = "rep1") is allowed, but c(AHG = "rep1") is not (unless the alphabet contains a letter "AHG"). After such a substitution the original letters can no longer be told apart, so the original sequence cannot be recovered.
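As an illustration of a many-to-one encoding, the sketch below maps both purines to the IUPAC symbol "R". The sequences and the choice of "R" are assumptions made for this example only.

```r
library(tidysq)

# Illustrative DNA sequences over the extended DNA alphabet.
sq_dna <- sq(c("ATGCAGGA", "ACTNNAGCN"), alphabet = "dna_ext")

# A and G each get their own entry, both pointing at the same replacement.
# Writing c(AG = "R") instead would refer to a single letter "AG",
# which this alphabet does not contain.
sq_purines <- substitute_letters(sq_dna, c(A = "R", G = "R"))
sq_purines
```

Once A and G share one symbol, nothing in the result records which of them a given "R" came from, so keep the original object if it is needed later.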

All encoding names must be letters contained in the alphabet of x; otherwise an error is thrown.
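A hedged sketch of how a caller might guard against an encoding name that is not in the alphabet. The exact error message is not specified here, so the example only captures whatever message is raised.

```r
library(tidysq)

# Illustrative DNA sequences over the extended DNA alphabet.
sq_dna <- sq(c("ATGCAGGA", "ACTNNAGCN"), alphabet = "dna_ext")

# "Z" is not a letter of the dna_ext alphabet, so the call is expected to fail.
msg <- tryCatch(
  {
    substitute_letters(sq_dna, c(Z = "z"))
    NA_character_  # reached only if no error was raised
  },
  error = function(e) conditionMessage(e)
)
msg
```

Checking the names of the encoding against the alphabet before calling substitute_letters is an alternative to catching the error afterwards.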

See also

Functions that manipulate the type of sequences: find_invalid_letters(), is.sq(), sq_type(), typify()

Examples

# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
sq_ami <- sq(c("MIOONYTWIL", "TIOOLGNIIYROIE", "NYERTGHLI", "MOYXXXIOLN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mALPVQAmAmA", "mAmAPQ"), alphabet = c("mA", LETTERS))

# Not every letter needs an encoding; letters without one are left unchanged:
substitute_letters(sq_dna, c(T = "t", A = "a", C = "c", G = "g"))
#> atp (atypical alphabet) sequences list:
#> [1] atgcagga                                                                 <8>
#> [2] gaccgaacgaN                                                             <11>
#> [3] tgacgagctta                                                             <11>
#> [4] actNNagcN                                                                <9>
substitute_letters(sq_ami, c(M = "X"))
#> atp (atypical alphabet) sequences list:
#> [1] XIOONYTWIL                                                              <10>
#> [2] TIOOLGNIIYROIE                                                          <14>
#> [3] NYERTGHLI                                                                <9>
#> [4] XOYXXXIOLN                                                              <10>

# Multiple character letters are supported in encodings:
substitute_letters(sq_atp, c(mA = "-"))
#> atp (atypical alphabet) sequences list:
#> [1] -LPVQA--                                                                 <8>
#> [2] --PQ                                                                     <4>
substitute_letters(sq_ami, c(I = "ough", O = "eau"))
#> atp (atypical alphabet) sequences list:
#> [1] M ough eau eau N Y T W ough L                                           <10>
#> [2] T ough eau eau L G N ough ough Y R eau ough E                           <14>
#> [3] N Y E R T G H L ough                                                     <9>
#> [4] M eau Y X X X ough eau L N                                              <10>

# Numeric substitutions are allowed too; they are coerced to character:
substitute_letters(sq_dna, c(N = 9, G = 7))
#> atp (atypical alphabet) sequences list:
#> [1] AT7CA77A                                                                 <8>
#> [2] 7ACC7AAC7A9                                                             <11>
#> [3] T7AC7A7CTTA                                                             <11>
#> [4] ACT99A7C9                                                                <9>

# It's possible to replace a letter with an NA value:
substitute_letters(sq_ami, c(X = NA_character_))
#> atp (atypical alphabet) sequences list:
#> [1] M I O O N Y T W I L                                                     <10>
#> [2] T I O O L G N I I Y R O I E                                             <14>
#> [3] N Y E R T G H L I                                                        <9>
#> [4] M O Y NA NA NA I O L N                                                  <10>