Reads a FASTA file that contains nucleotide or amino acid sequences and returns a tibble with the sequences and their names.

read_fasta(
  file_name,
  alphabet = NULL,
  NA_letter = getOption("tidysq_NA_letter"),
  safe_mode = getOption("tidysq_safe_mode"),
  on_warning = getOption("tidysq_on_warning"),
  ignore_case = FALSE
)

Arguments

file_name

[character(1)]
Absolute path to a file or URL to read from.

alphabet

[character]
If the provided value is a single string, it is interpreted as a type (see Details). If it has length greater than one, it is treated as an atypical alphabet for the sq object, and the sq type will be atp. If it is NULL, type guessing is performed (see Details).
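The three cases for alphabet can be summarised in a few lines of R. This is a minimal sketch: alphabet_mode() is a hypothetical helper written only to mirror the rules stated above, not a function exported by tidysq, and it does not validate that a single string names a real type.

```r
# Hypothetical helper (not part of tidysq) mirroring the documented rules
# for the `alphabet` argument of read_fasta().
alphabet_mode <- function(alphabet) {
  if (is.null(alphabet)) {
    "guess"  # NULL: the type is guessed from the sequences
  } else if (length(alphabet) == 1) {
    "type"   # single string: interpreted as a type name such as "ami_bsc"
  } else {
    "atp"    # longer vector: atypical alphabet, resulting sq type is atp
  }
}

alphabet_mode(NULL)              # "guess"
alphabet_mode("ami_bsc")         # "type"
alphabet_mode(c("X", "Y", "Z"))  # "atp"
```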

NA_letter

[character(1)]
A string used to interpret and display NA values in the context of the sq class. The default value is "!".
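NA_letter plays two roles: on input it marks a position as NA, and on output NA is printed as NA_letter. The sketch below illustrates both directions with hypothetical base-R helpers (interpret_letters() and display_letters() are not tidysq functions) and assumes single-character letters.

```r
# Hypothetical helpers (not tidysq code) showing the two roles of NA_letter.
# Assumes every letter is a single character.
interpret_letters <- function(sequence, NA_letter = "!") {
  letters_vec <- strsplit(sequence, "", fixed = TRUE)[[1]]
  letters_vec[letters_vec == NA_letter] <- NA  # NA_letter in input becomes NA
  letters_vec
}

display_letters <- function(letters_vec, NA_letter = "!") {
  letters_vec[is.na(letters_vec)] <- NA_letter  # NA is printed as NA_letter
  paste(letters_vec, collapse = "")
}

x <- interpret_letters("MK!V")
x                                    # "M" "K" NA "V"
display_letters(x)                   # "MK!V"
display_letters(x, NA_letter = "?")  # "MK?V"
```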

safe_mode

[logical(1)]
Default value is FALSE. When turned on, safe mode guarantees that NA appears within a sequence if and only if the input sequence contains the value passed as NA_letter. This means that the resulting type might differ from the one passed as an argument if a sequence contains letters that do not appear in the original alphabet.
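Because safe mode must not silently turn unknown letters into NA, the key question is whether any input letter falls outside the requested alphabet. The sketch below answers only that question; letters_outside() is a hypothetical helper, not tidysq's implementation, and it assumes single-character letters. A non-empty result is the situation in which safe mode would choose a different type.

```r
# Hypothetical sketch (not tidysq code): list letters that are neither in the
# alphabet nor equal to NA_letter. Assumes single-character letters.
letters_outside <- function(sequences, alphabet, NA_letter = "!") {
  found <- unique(unlist(strsplit(sequences, "", fixed = TRUE)))
  # NA_letter is allowed: it legitimately encodes NA in the input
  setdiff(found, c(alphabet, NA_letter))
}

dna <- c("A", "C", "G", "T")
letters_outside(c("ACGT", "AC!T"), dna)  # character(0): requested type can be kept
letters_outside(c("ACGT", "ACXT"), dna)  # "X": safe mode would change the type
```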

on_warning

["silent" || "message" || "warning" || "error"]
Determines how warning messages are handled. Default value is "warning".
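The four on_warning modes map naturally onto base R's condition functions. This is a sketch under that assumption; handle_warning() is hypothetical and tidysq's internal handling may differ in detail.

```r
# Hypothetical helper (not tidysq code) mirroring the four on_warning modes.
handle_warning <- function(msg, on_warning = "warning") {
  on_warning <- match.arg(on_warning, c("silent", "message", "warning", "error"))
  switch(on_warning,
    silent  = NULL,                         # drop the message entirely
    message = message(msg),                 # report it as a message
    warning = warning(msg, call. = FALSE),  # raise an R warning
    error   = stop(msg, call. = FALSE)      # abort with an error
  )
  invisible(NULL)
}

handle_warning("3 letters were converted to NA", on_warning = "message")
```

Using match.arg() keeps an invalid mode from being silently ignored: any value outside the four documented options raises an error.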

ignore_case

[logical(1)]
If turned on, lowercase letters are converted to their uppercase counterparts and interpreted as such. If not, either the sq object must be of type unt or all lowercase letters are interpreted as NA values. Default value is FALSE. Ignoring case does not work with atp alphabets.
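The effect of ignore_case on a non-unt alphabet can be sketched in base R. apply_case_rule() is a hypothetical helper, not tidysq code; it assumes single-character letters and models only the branch in which lowercase letters become NA (shown here as NA_letter).

```r
# Hypothetical sketch (not tidysq code) of the ignore_case rule for a
# non-unt alphabet. Assumes single-character letters.
apply_case_rule <- function(sequence, alphabet, ignore_case = FALSE,
                            NA_letter = "!") {
  if (ignore_case) sequence <- toupper(sequence)  # lowercase -> uppercase
  letters_vec <- strsplit(sequence, "", fixed = TRUE)[[1]]
  letters_vec[!(letters_vec %in% alphabet)] <- NA_letter  # unknown -> NA
  paste(letters_vec, collapse = "")
}

dna <- c("A", "C", "G", "T")
apply_case_rule("acGT", dna)                      # "!!GT"
apply_case_rule("acGT", dna, ignore_case = TRUE)  # "ACGT"
```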

Value

A tibble with one row per sequence and two columns:

  • name specifies the name of a sequence, used in functions like find_motifs

  • sq contains the extracted sequence itself
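To show how FASTA records map onto the name and sq columns, here is a base-R sketch. parse_fasta_lines() is hypothetical and is not tidysq's reader: it keeps sq as a plain character column rather than an sq object and performs no alphabet handling. Header text after ">" becomes name, and the following lines are concatenated into sq.

```r
# Hypothetical base-R sketch (not tidysq's reader): turn FASTA lines into a
# data frame with `name` and `sq` columns; sq stays plain character.
parse_fasta_lines <- function(lines) {
  lines <- lines[nzchar(lines)]           # drop empty lines
  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)          # index of the preceding header
  names_vec <- sub("^>", "", lines[is_header])
  groups <- factor(record_id[!is_header], levels = seq_along(names_vec))
  # join all sequence lines of each record; records without lines give ""
  seqs <- vapply(split(lines[!is_header], groups), paste, character(1),
                 collapse = "")
  data.frame(name = names_vec, sq = unname(seqs), stringsAsFactors = FALSE)
}

fasta_text <- c(">seq1 first", "MKV", "LLA", ">seq2", "GGT")
parse_fasta_lines(fasta_text)
```

With a real file, the lines would come from something like parse_fasta_lines(readLines(path)); lines that appear before the first header are dropped.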

Details

All rules for creating sq objects are the same as in sq().

See also

readLines

Functions from input module: import_sq(), random_sq(), sq()

Examples

fasta_file <- system.file(package = "tidysq", "examples/example_aa.fasta")

# In this case, these two calls are equivalent in result:
read_fasta(fasta_file)
read_fasta(fasta_file, alphabet = "ami_bsc")

if (FALSE) {
  # It's possible to read a FASTA file from a URL:
  read_fasta("https://www.uniprot.org/uniprot/P28307.fasta")
}