Finds letters in the given sequences that are not contained in the selected amino acid or nucleotide alphabet.

find_invalid_letters(x, dest_type, ...)

# S3 method for sq
find_invalid_letters(
  x,
  dest_type,
  ...,
  NA_letter = getOption("tidysq_NA_letter")
)

Arguments

x

[sq]
An object this function is applied to.

dest_type

[character(1)]
The name of the destination type: one of "dna_bsc", "dna_ext", "rna_bsc", "rna_ext", "ami_bsc" or "ami_ext".

...

Further arguments passed to or from other methods.

NA_letter

[character(1)]
A string used to interpret and display NA values in the context of the sq class. Defaults to "!".

Value

A list with one element per sequence of the sq object, each containing the mismatched letters found in that sequence.

Details

Amino acid, DNA and RNA standard alphabets have predefined letters. This function allows the user to check which letters from the input sequences are not contained in the selected one of these alphabets.

The returned list contains a character vector for each input sequence. Each element of a vector is a letter that appears in the corresponding sequence but not in the target alphabet.
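The per-sequence comparison described above can be sketched in base R, without tidysq. The helper `invalid_letters_sketch()` is hypothetical and not part of the package, and the `ami_bsc` letter set used here (the 20 standard amino acids plus "-" and "*") is an assumption; check `alphabet()` for the actual letters of each type. For the amino acid sequences from the examples below, this sketch yields the same letters: "B" "U" "X" "Z" and "B" "J" "X" "Z".

```r
# Hypothetical helper (not part of tidysq): for each sequence, collect the
# distinct letters that do not belong to the given alphabet.
invalid_letters_sketch <- function(seqs, alphabet) {
  lapply(strsplit(seqs, "", fixed = TRUE), function(chars) {
    # setdiff() keeps letters absent from the alphabet and drops duplicates;
    # sort() returns them in alphabetical order
    sort(setdiff(chars, alphabet))
  })
}

# Assumed ami_bsc alphabet: 20 standard amino acids plus "-" and "*"
ami_bsc_letters <- c(strsplit("ACDEFGHIKLMNPQRSTVWY", "", fixed = TRUE)[[1]], "-", "*")

res <- invalid_letters_sketch(c("QWERTYUIZXCVBNM", "LKJHGFDSAZXCVBN"), ami_bsc_letters)
print(res)
```

As with `find_invalid_letters()`, the result is a list holding one character vector per input sequence.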

You can check which letters are valid for a specified type in the alphabet documentation.

See also

alphabet()

Functions that manipulate the type of sequences: is.sq(), sq_type(), substitute_letters(), typify()

Examples

# Creating objects to work on:
sq_unt <- sq(c("ACGPOIUATTAGACG","GGATFGHA"), alphabet = "unt")
sq_ami <- sq(c("QWERTYUIZXCVBNM","LKJHGFDSAZXCVBN"), alphabet = "ami_ext")

# Mismatched letters can be checked against the basic type:
find_invalid_letters(sq_ami, "ami_bsc")
#> [[1]]
#> [1] "B" "U" "X" "Z"
#> 
#> [[2]]
#> [1] "B" "J" "X" "Z"
#> 

# But also against a type completely unrelated to the current one:
find_invalid_letters(sq_unt, "dna_ext")
#> [[1]]
#> [1] "I" "O" "P" "U"
#> 
#> [[2]]
#> [1] "F"
#>