Reads a FASTA file that contains nucleotide or amino acid sequences and returns a tibble with the obtained data.
[character(1)]
Absolute path to the file or URL to read from.
[character]
If the provided value is a single string, it is interpreted as a type (see details). If the provided value has length greater than one, it is treated as an atypical alphabet for the sq object and the sq type will be atp. If the provided value is NULL, type guessing is performed (see details).
[character(1)]
A string that is used to interpret and display the NA value in the context of the sq class. The default value is "!".
[logical(1)]
Default value is FALSE. When turned on, safe mode guarantees that NA appears within a sequence if and only if the input sequence contains the value passed with NA_letter. This means that the resulting type might differ from the one passed as an argument if a sequence contains letters that do not appear in the original alphabet.
["silent" || "message" || "warning" || "error"]
Determines how warning messages are handled. Default value is "warning".
[logical(1)]
If turned on, lowercase letters are converted to their uppercase counterparts and interpreted as such. If not, either the sq object must be of type unt or all lowercase letters are interpreted as NA values. Default value is FALSE. Ignoring case does not work with atp alphabets.
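The interplay between the NA letter and case handling described above can be illustrated with a small base R sketch. It does not use tidysq and is not how the package implements these options; interpret_letters is a hypothetical helper introduced only for this illustration, and the four-letter alphabet is an assumption chosen for brevity.

```r
# Hypothetical helper (not part of tidysq): shows how letters outside an
# alphabet could be displayed with an NA letter, with optional case folding.
interpret_letters <- function(x, alphabet, NA_letter = "!", ignore_case = FALSE) {
  # Split the input string into single characters.
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # With ignore_case on, lowercase letters are read as their uppercase forms.
  if (ignore_case) chars <- toupper(chars)
  # Anything not in the alphabet becomes NA ...
  chars[!chars %in% alphabet] <- NA_character_
  # ... and NA is displayed using the NA letter.
  paste(ifelse(is.na(chars), NA_letter, chars), collapse = "")
}

dna <- c("A", "C", "G", "T")
strict <- interpret_letters("ACgt", dna)                      # "AC!!"
folded <- interpret_letters("ACgt", dna, ignore_case = TRUE)  # "ACGT"
```

With case folding off, the lowercase letters fall outside the alphabet and are shown as the NA letter; with it on, they are read as G and T.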
A tibble with the number of rows equal to the number of sequences and two columns:
name: specifies the name of a sequence, used in functions like find_motifs
sq: contains the extracted sequence itself
All rules for creating sq objects are the same as in sq.
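To make the shape of the returned value concrete, here is a minimal base R sketch that turns FASTA-formatted lines into a two-column table with one row per sequence. It is an illustration only: parse_fasta_lines is a hypothetical helper, it returns a plain data.frame with character columns rather than a tibble with an sq column, and it performs none of the type guessing or alphabet handling described above.

```r
# Hypothetical helper (not part of tidysq): one row per FASTA record,
# with columns "name" and "sq".
parse_fasta_lines <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop empty lines
  is_header <- startsWith(lines, ">")    # header lines start with ">"
  # cumsum() gives every line the index of the record it belongs to.
  record_id <- cumsum(is_header)
  n_records <- sum(is_header)
  # Group sequence lines by record; fixed factor levels keep records
  # that have a header but no sequence lines.
  groups <- split(
    lines[!is_header],
    factor(record_id[!is_header], levels = seq_len(n_records))
  )
  data.frame(
    name = sub("^>", "", lines[is_header]),
    sq = unname(vapply(groups, paste, character(1), collapse = "")),
    stringsAsFactors = FALSE
  )
}

fasta_lines <- c(">seq1 first", "ACGT", "TTGA", ">seq2", "MKV")
result <- parse_fasta_lines(fasta_lines)
```

Here result has two rows: the multi-line record is joined into a single sequence, and the header text after ">" becomes the name.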
fasta_file <- system.file(package = "tidysq", "examples/example_aa.fasta")
# In this case, these two calls are equivalent in result:
read_fasta(fasta_file)
#> Warning: Non-standard IUPAC symbols detected for DNA: 1549 characters were converted to N.
#> DNA vector of 421 sequences
#> AMY1|K19|T-Protei... NGGGKVNNVYKNV
#> AMY9|K19Gluc41|T-... NNKHNNGGGKVNNVYKNVDNSKVTSKCGSNGNNHHKNGGGNVN
#> AMY14|K19Gluc782|... NNKHNNGGGKVNNVYKNVD
#> AMY17|PHF8|T-Prot... GKVNNVYK
#> AMY18|PHF6|T-Prot... VNNVYK
#> AMY22|Whole|Amylo... DANNRHDSGYNVHHNKNVNNANDVGSNKGANNGNMVGGVV
#> AMY23|HABP1|Amylo... VNHNKNVNNANDVGS
#> AMY24|HABP2|Amylo... VHNNKNVNNANDVGS
#> AMY25|HABP3|Amylo... VHHNKNVNNANDVGS
#> AMY26|HABP4|Amylo... VHHNNNVNNANDVGS
#> AMY32|HABP10|Amyl... KKNVNNNND
#> AMY34|HABP12|Amyl... VHHNNKNVNNANNVGS
#> ... with 409 more sequences.
read_fasta(fasta_file, alphabet = "ami_bsc")
#> Error in read_fasta(fasta_file, alphabet = "ami_bsc"): unused argument (alphabet = "ami_bsc")
if (FALSE) {
# It's possible to read FASTA file from URL:
read_fasta("https://www.uniprot.org/uniprot/P28307.fasta")
}