This function replaces sequences containing NA values with
empty (NULL) sequences, or removes NA values from sequences,
in an sq object.
remove_na(x, by_letter = FALSE, ...)
# S3 method for sq
remove_na(x, by_letter = FALSE, ..., NA_letter = getOption("tidysq_NA_letter"))
x [sq]
An object this function is applied to.
by_letter [logical(1)]
If FALSE, the filter condition is applied to each sequence as a whole. If
TRUE, the filter is applied to each letter separately.
...
Further arguments to be passed from or to other methods.
NA_letter [character(1)]
A string that is used to interpret and display NA values in the
context of the sq class. The default value is
"!".
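As a rough illustration of what NA_letter does, the base-R sketch below shows NA elements being rendered as "!" for display and "!" being mapped back to NA when a string is interpreted. It is not tidysq code: the helpers `na_to_letter()` and `letter_to_na()` are hypothetical, and the sketch assumes NA_letter is a single character.

```r
# Hypothetical helpers illustrating the role of NA_letter; not part of tidysq.
# Assumes NA_letter is a single character, such as the default "!".
na_to_letter <- function(letters_vec, NA_letter = "!") {
  # Replace NA elements with the placeholder, then collapse into one string
  paste(ifelse(is.na(letters_vec), NA_letter, letters_vec), collapse = "")
}

letter_to_na <- function(string, NA_letter = "!") {
  # Split into single characters and turn the placeholder back into NA
  chars <- strsplit(string, "")[[1]]
  chars[chars == NA_letter] <- NA_character_
  chars
}

seq_with_na <- c("N", "Y", NA, NA, "T", "G")
shown <- na_to_letter(seq_with_na)
print(shown)                            # "NY!!TG"
parsed <- letter_to_na(shown)
print(identical(parsed, seq_with_na))   # TRUE
```

Because the placeholder is just an ordinary letter in the displayed string, choosing a character that also occurs in the alphabet would make the round trip ambiguous.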
An sq object of the same type as the
input. Sequences that do not contain any NA values are left
unchanged.
NA values may be introduced by functions such as
substitute_letters or bite. They can also appear
in sequences when the user reads a FASTA file with read_fasta, or
constructs an sq object from a character vector with the
sq function without safe_mode turned on, and the file or strings
contain letters that are not in the specified alphabet.
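To make the two in-package sources of NA more concrete, here is a base-R analogy on a plain character vector. It only assumes that a sequence can be thought of as a vector of letters; it does not call substitute_letters() or bite(). Note one difference: base R indexing past the end of a vector introduces NA silently, whereas bite() also emits a warning.

```r
# Base-R analogues of two ways NA can enter a sequence (not tidysq code).
residues <- strsplit("NYERTGHLI", "")[[1]]

# 1. Substituting letters with NA, similar in spirit to substitute_letters()
substituted <- residues
substituted[substituted %in% c("E", "R")] <- NA_character_
print(which(is.na(substituted)))  # positions 3 and 4

# 2. Indexing past the end, similar in spirit to bite(x, 1:15)
bitten <- residues[1:15]
print(length(bitten))             # 15
print(sum(is.na(bitten)))         # 6 (positions 10 to 15)
```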
remove_na() filters out sequences or elements that contain
NA value(s). By default, if any letter in a sequence is NA,
the whole sequence is replaced by an empty (NULL) sequence. However, if
the by_letter parameter is set to TRUE, sequences are
only shortened by excluding the NA values.
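The filtering rule described above can be modelled in base R on a list of character vectors. This is a hypothetical stand-in, not tidysq's implementation: remove_na_sketch() is an illustrative name, and it uses character(0) where tidysq prints an empty sequence as <NULL>.

```r
# Hypothetical base-R model of remove_na() semantics on a list of
# character vectors; this is not tidysq's implementation.
remove_na_sketch <- function(seqs, by_letter = FALSE) {
  lapply(seqs, function(s) {
    if (by_letter) {
      s[!is.na(s)]        # drop only the NA letters
    } else if (anyNA(s)) {
      character(0)        # replace the whole sequence by an empty one
    } else {
      s                   # sequences without NA are left unchanged
    }
  })
}

seqs <- list(
  c("M", "I", "A"),
  c("T", NA, "A", NA),
  c("G", "H")
)
print(lengths(remove_na_sketch(seqs)))                    # 3 0 2
print(lengths(remove_na_sketch(seqs, by_letter = TRUE)))  # 3 2 2
```

In both modes the number of sequences stays the same; only their contents change, which mirrors how remove_na() keeps empty entries rather than dropping them.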
Functions that clean sequences:
is_empty_sq(),
remove_ambiguous()
library(tidysq)
# Creating objects to work on:
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
# Substituting some letters with NA
sq_ami_sub <- substitute_letters(sq_ami, c(E = NA_character_, R = NA_character_))
sq_dna_sub <- substitute_letters(sq_dna, c(N = NA_character_))
# Biting sequences out of range
sq_bitten <- bite(sq_ami, 1:15)
#> Warning: some sequences are subsetted with index bigger than length - NA introduced
# Printing the sequences
sq_ami_sub
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] T I A A L G N I I Y NA A I NA <14>
#> [3] N Y NA NA T G H L I <9>
#> [4] M A Y X X X I A L N <10>
sq_dna_sub
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] G A C C G A A C G A NA <11>
#> [3] T G A C G A G C T T A <11>
#> [4] A C T NA NA A G C NA <9>
# Removing sequences containing NA
remove_na(sq_ami_sub)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] <NULL> <0>
#> [3] <NULL> <0>
#> [4] M A Y X X X I A L N <10>
remove_na(sq_dna_sub)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] <NULL> <0>
#> [3] T G A C G A G C T T A <11>
#> [4] <NULL> <0>
remove_na(sq_bitten)
#> extended amino acid sequences list:
#> [1] <NULL> <0>
#> [2] <NULL> <0>
#> [3] <NULL> <0>
#> [4] <NULL> <0>
# Removing only NA elements
remove_na(sq_ami_sub, by_letter = TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L <10>
#> [2] T I A A L G N I I Y A I <12>
#> [3] N Y T G H L I <7>
#> [4] M A Y X X X I A L N <10>
remove_na(sq_dna_sub, TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A <8>
#> [2] G A C C G A A C G A <10>
#> [3] T G A C G A G C T T A <11>
#> [4] A C T A G C <6>
remove_na(sq_bitten, TRUE)
#> extended amino acid sequences list:
#> [1] MIAANYTWIL <10>
#> [2] TIAALGNIIYRAIE <14>
#> [3] NYERTGHLI <9>
#> [4] MAYXXXIALN <10>