Remove sequences that contain NA values

This function replaces sequences that contain NA values with empty (NULL) sequences, or removes the NA values themselves from sequences in an sq object.

remove_na(x, by_letter = FALSE, ...)

# S3 method for sq
remove_na(x, by_letter = FALSE, ..., NA_letter = getOption("tidysq_NA_letter"))

Arguments

x: [sq]
An object this function is applied to.
by_letter: [logical(1)]
If FALSE, the filter condition is applied to each sequence as a whole. If TRUE, the filter is applied to each letter separately.
...: further arguments to be passed from or to other methods.
NA_letter: [character(1)]
A string used to interpret and display NA values in the context of the sq class. Defaults to "!".

Value

An sq object of the same type as the input. Sequences that do not contain any NA values are left unchanged.

Details

NA values may be introduced by functions such as substitute_letters or bite. They can also appear in sequences when the user reads a FASTA file with read_fasta, or constructs an sq object from a character vector with the sq function, without safe_mode turned on, and the file or strings contain letters not specified in the alphabet.
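As a rough base-R analogy (not tidysq internals), both sources of NA described above can be mimicked on plain character vectors: indexing past the end of a vector pads the result with NA, much like bite with an out-of-range index, and letters outside an alphabet can be mapped to NA, much like sq without safe_mode. The helper to_letters() is hypothetical and exists only for this illustration.

```r
# Hypothetical helper, NOT part of tidysq: split a string into letters and
# replace every letter that is not in `alphabet` with NA.
to_letters <- function(s, alphabet) {
  letters_vec <- strsplit(s, "")[[1]]
  letters_vec[!letters_vec %in% alphabet] <- NA
  letters_vec
}

# Out-of-range indexing in base R pads the result with NA,
# analogous to bite(sq_ami, 1:15) on a 10-letter sequence.
seq1 <- to_letters("MIAANYTWIL", LETTERS)
bitten <- seq1[1:15]

# Letters outside the alphabet become NA, analogous to constructing
# a sequence with unexpected letters when safe_mode is off.
dna <- to_letters("ACTNNAGCN", c("A", "C", "G", "T"))
```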

remove_na() is used to filter out sequences or elements that contain NA values. By default, if any letter in a sequence is NA, the whole sequence is replaced by an empty (NULL) sequence. If the by_letter parameter is set to TRUE, sequences are instead only shortened by dropping their NA values.
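The two modes can be sketched in base R on a plain list of character vectors, where each element plays the role of one sequence. remove_na_sketch() is a hypothetical stand-in written only to illustrate the semantics described above; it is not how tidysq implements remove_na(), and an empty character vector stands in for the empty (NULL) sequence.

```r
# Hypothetical illustration of remove_na() semantics, NOT the tidysq code.
# `seqs` is a list of character vectors, one letter per element.
remove_na_sketch <- function(seqs, by_letter = FALSE) {
  lapply(seqs, function(s) {
    if (by_letter) {
      s[!is.na(s)]       # drop only the NA letters
    } else if (anyNA(s)) {
      character(0)       # any NA: replace the whole sequence by an empty one
    } else {
      s                  # no NA: leave the sequence unchanged
    }
  })
}

seqs <- list(
  c("A", "T", "G", "C"),
  c("G", "A", NA, "C"),
  c("A", NA, NA, "G")
)

whole <- remove_na_sketch(seqs)                    # lengths 4, 0, 0
by_ltr <- remove_na_sketch(seqs, by_letter = TRUE) # lengths 4, 3, 2
```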

Examples

# Creating objects to work on:
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")

# Substituting some letters with NA
sq_ami_sub <- substitute_letters(sq_ami, c(E = NA_character_, R = NA_character_))
sq_dna_sub <- substitute_letters(sq_dna, c(N = NA_character_))

# Biting sequences out of range
sq_bitten <- bite(sq_ami, 1:15)
#> Warning: some sequences are subsetted with index bigger than length - NA introduced

# Printing the sequences
sq_ami_sub
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L                                                     <10>
#> [2] T I A A L G N I I Y NA A I NA                                           <14>
#> [3] N Y NA NA T G H L I                                                      <9>
#> [4] M A Y X X X I A L N                                                     <10>
sq_dna_sub
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A                                                          <8>
#> [2] G A C C G A A C G A NA                                                  <11>
#> [3] T G A C G A G C T T A                                                   <11>
#> [4] A C T NA NA A G C NA                                                     <9>

# Removing sequences containing NA
remove_na(sq_ami_sub)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L                                                     <10>
#> [2] <NULL>                                                                   <0>
#> [3] <NULL>                                                                   <0>
#> [4] M A Y X X X I A L N                                                     <10>
remove_na(sq_dna_sub)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A                                                          <8>
#> [2] <NULL>                                                                   <0>
#> [3] T G A C G A G C T T A                                                   <11>
#> [4] <NULL>                                                                   <0>
remove_na(sq_bitten)
#> extended amino acid sequences list:
#> [1] <NULL>                                                                   <0>
#> [2] <NULL>                                                                   <0>
#> [3] <NULL>                                                                   <0>
#> [4] <NULL>                                                                   <0>

# Removing only NA elements
remove_na(sq_ami_sub, by_letter = TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] M I A A N Y T W I L                                                     <10>
#> [2] T I A A L G N I I Y A I                                                 <12>
#> [3] N Y T G H L I                                                            <7>
#> [4] M A Y X X X I A L N                                                     <10>
remove_na(sq_dna_sub, TRUE)
#> atp (atypical alphabet) sequences list:
#> [1] A T G C A G G A                                                          <8>
#> [2] G A C C G A A C G A                                                     <10>
#> [3] T G A C G A G C T T A                                                   <11>
#> [4] A C T A G C                                                              <6>
remove_na(sq_bitten, TRUE)
#> extended amino acid sequences list:
#> [1] MIAANYTWIL                                                              <10>
#> [2] TIAALGNIIYRAIE                                                          <14>
#> [3] NYERTGHLI                                                                <9>
#> [4] MAYXXXIALN                                                              <10>
