This function replaces sequences with NA values by empty (NULL) sequences or removes NA values from sequences in an sq object.
remove_na(x, by_letter = FALSE, ...)
# S3 method for sq
remove_na(x, by_letter = FALSE, ..., NA_letter = getOption("tidysq_NA_letter"))
x
[sq] An object this function is applied to.
by_letter
[logical(1)] If FALSE, the filter condition is applied to each sequence as a whole. If TRUE, the filter is applied to each letter separately.
...
Further arguments to be passed from or to other methods.
NA_letter
[character(1)] A string that is used to interpret and display NA values in the context of the sq class. The default value is "!".
An sq object with the same type as the input. Sequences that do not contain any NA values are left unchanged.
NA values may be introduced by functions such as substitute_letters or bite. They can also appear when the user reads a FASTA file with read_fasta, or constructs an sq object from a character vector with the sq function without safe_mode turned on, and the file or strings contain letters other than those specified in the alphabet.
remove_na() is used to filter out sequences or elements that have NA value(s). By default, if any letter in a sequence is NA, the whole sequence is replaced by an empty (NULL) sequence. However, if the by_letter parameter is set to TRUE, sequences are only shortened by excluding the NA values.
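The difference between the two modes can be illustrated without tidysq at all. The following is a minimal base R sketch, assuming sequences are modelled as a plain list of character vectors with one letter per element; drop_na_sketch is a hypothetical helper written for this illustration, not the package's implementation.

```r
# Toy stand-in for sq sequences: a list of character vectors,
# one letter per element. Plain base R, not tidysq internals.
seqs <- list(
  c("A", "C", NA_character_, "T"),
  c("G", "G", "A"),
  c(NA_character_, NA_character_)
)

# Hypothetical helper mirroring the two modes described above.
drop_na_sketch <- function(x, by_letter = FALSE) {
  lapply(x, function(s) {
    if (by_letter) {
      s[!is.na(s)]      # keep the sequence, drop only the NA letters
    } else if (anyNA(s)) {
      character(0)      # any NA empties the whole sequence
    } else {
      s                 # sequences without NA are left unchanged
    }
  })
}

whole  <- drop_na_sketch(seqs)
letter <- drop_na_sketch(seqs, by_letter = TRUE)

lengths(whole)   # 0 3 0
lengths(letter)  # 3 3 0
```

In this sketch the number of sequences never changes: a sequence that fails the filter stays in place as an empty element, which is analogous to the empty (NULL) sequences described above.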
Functions that clean sequences: is_empty_sq(), remove_ambiguous()
# Creating objects to work on:
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
# Substituting some letters with NA
sq_ami_sub <- substitute_letters(sq_ami, c(E = NA_character_, R = NA_character_))
sq_dna_sub <- substitute_letters(sq_dna, c(N = NA_character_))
# Biting sequences out of range
sq_bitten <- bite(sq_ami, 1:15)
#> Warning: some sequences are subsetted with index bigger than length - NA introduced
# Printing the sequences
sq_ami_sub
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] T I A A L G N I I Y NA A I NA <14>
#> [3] N Y NA NA T G H L I <9>
#> [4] M A Y X X X I A L N <10>
sq_dna_sub
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] G A C C G A A C G A NA <11>
#> [3] T G A C G A G C T T A <11>
#> [4] A C T NA NA A G C NA <9>
# Removing sequences containing NA
remove_na(sq_ami_sub)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] T I A A L G N I I Y NA A I NA <14>
#> [3] N Y NA NA T G H L I <9>
#> [4] M A Y X X X I A L N <10>
remove_na(sq_dna_sub)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] G A C C G A A C G A NA <11>
#> [3] T G A C G A G C T T A <11>
#> [4] A C T NA NA A G C NA <9>
remove_na(sq_bitten)
#> extended amino acid sequences list:
#> [1] <NULL> <0>
#> [2] <NULL> <0>
#> [3] <NULL> <0>
#> [4] <NULL> <0>
# Removing only NA elements
remove_na(sq_ami_sub, by_letter = TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] T I A A L G N I I Y NA A I NA <14>
#> [3] N Y NA NA T G H L I <9>
#> [4] M A Y X X X I A L N <10>
remove_na(sq_dna_sub, TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] G A C C G A A C G A NA <11>
#> [3] T G A C G A G C T T A <11>
#> [4] A C T NA NA A G C NA <9>
remove_na(sq_bitten, TRUE)
#> extended amino acid sequences list:
#> [1] MIAANYTWIL <10>
#> [2] TIAALGNIIYRAIE <14>
#> [3] NYERTGHLI <9>
#> [4] MAYXXXIALN <10>