Subset sequences from sq objects

Extracts the elements at the given indices from every sequence.

bite(x, indices, ...)

# S3 method for sq
bite(
  x,
  indices,
  ...,
  NA_letter = getOption("tidysq_NA_letter"),
  on_warning = getOption("tidysq_on_warning")
)

Arguments

x: [sq]
An object this function is applied to.
indices: [integer]
Indices to extract from each sequence. The function follows the normal R conventions for indexing vectors, including negative indices.
...: further arguments to be passed from or to other methods.
NA_letter: [character(1)]
A string used to interpret and display NA values in the context of the sq class. The default value is "!".
on_warning: ["silent" || "message" || "warning" || "error"]
Determines how warning messages are handled. The default value is "warning".

Value

An sq object of the same type as the input sq, where each element is a subsequence created by indexing the corresponding input sequence with indices.

Details

The bite function lets the user access specific elements of multiple sequences at once.

By passing positive indices, the user chooses which elements to take from each sequence. If an index exceeds the length of a sequence, an NA value is inserted into the result at that position and a warning is issued. How this warning is handled can be controlled with the on_warning parameter.
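The rule for out-of-range positive indices mirrors base R vector indexing. The sketch below is a base R analogue only: short_seq is a hypothetical stand-in that represents one sequence as a plain character vector of letters, which is not how sq objects store their data.

```r
# Hypothetical stand-in for one sequence: a plain character vector of letters.
short_seq <- c("A", "T", "G")

# Positions 4 and 5 do not exist, so base R fills them with NA.
picked <- short_seq[1:5]
picked
# picked is c("A", "T", "G", NA, NA)
```

In bite output such positions are displayed with NA_letter ("!" by default). Going by the argument description above, passing on_warning = "silent" should suppress the accompanying warning.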

Negative indices are supported as well. They are interpreted as "select all elements except those at the positions given by these negative indices". For example, the vector c(-1, -3, -5) bites all sequence elements except the first, the third and the fifth. If a negative index points past the end of a sequence, that index is ignored, since there is no element at that position to drop.
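The same exclusion rule can be seen in base R. This is a sketch of the convention rather than of bite itself: letters_seq and short_seq are hypothetical character vectors standing in for single sequences.

```r
# Hypothetical stand-ins for two sequences of different lengths.
letters_seq <- c("A", "T", "G", "C", "A", "G")
short_seq <- c("A", "T")

# Drop the first, third and fifth elements.
kept <- letters_seq[c(-1, -3, -5)]
kept
# kept is c("T", "C", "G")

# Positions 3 and 5 do not exist here, so only the first element is dropped.
kept_short <- short_seq[c(-1, -3, -5)]
kept_short
# kept_short is "T"
```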

As in base R, positive and negative indices cannot be mixed, because such a combination has no sensible interpretation.
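Base R rejects mixed-sign index vectors with an error, as the following sketch shows. The vector letters_seq is a hypothetical stand-in for a sequence, and whether bite signals the condition in exactly the same way is an assumption.

```r
# Hypothetical stand-in for one sequence.
letters_seq <- c("A", "T", "G", "C")

# Mixing a positive and a negative index is an error in base R;
# tryCatch turns that error into a marker value so the script keeps running.
result <- tryCatch(
  letters_seq[c(1, -2)],
  error = function(e) "mixed indices rejected"
)
result
# result is "mixed indices rejected"
```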

Examples

# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGNBAACGAN", "TGACGAGCTTA"),
             alphabet = "dna_bsc")
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_unt <- sq(c("ATGCAGGA?", "TGACGAGCTTA", "", "TIAALGNIIYRAIE"))

# Extracting first five letters:
bite(sq_dna, 1:5)
#> basic DNA sequences list:
#> [1] ATGCA                                                                    <5>
#> [2] GACCG                                                                    <5>
#> [3] TGACG                                                                    <5>

# If a sequence is shorter than 5, then NA is introduced:
bite(sq_unt, 1:5)
#> Warning: some sequences are subsetted with index bigger than length - NA introduced
#> unt (unspecified type) sequences list:
#> [1] ATGCA                                                                    <5>
#> [2] TGACG                                                                    <5>
#> [3] !!!!!                                                                    <5>
#> [4] TIAAL                                                                    <5>

# Selecting the fourth, the seventh and the fourth letter again:
bite(sq_ami, c(4, 7, 4))
#> extended amino acid sequences list:
#> [1] ATA                                                                      <3>
#> [2] ANA                                                                      <3>
#> [3] RHR                                                                      <3>
#> [4] XIX                                                                      <3>

# Selecting all letters except first four:
bite(sq_dna, -1:-4)
#> basic DNA sequences list:
#> [1] AGGA                                                                     <4>
#> [2] G!!AACGA!                                                                <9>
#> [3] GAGCTTA                                                                  <7>
