All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Add Base.print(::IO, ::Index)
- Touch up documentation
- Bump TranscodingStreams to 0.10
- Add short-form show for records
- Migrate to Automa v1
- Drop ReTest test dep
- Allow non-PHRED quality scores, such as Solexa scores, which can be negative (#104)
- Fix doc examples for writer with do-syntax (#100)
- Implement `Base.copy!` for `FASTQRecord` and `FASTARecord`
- Fix `Base.read!(::FASTQReader, ::FASTQRecord)` (issue #95)
Version 2 is a near-complete rewrite of FASTX. It strives to provide an easier and more consistent API, while also being faster, more memory efficient, and better tested than v1.
The changes are comprehensive, but code should only need a few minor tweaks to work with v2. I recommend upgrading your packages using a static analysis tool like JET.jl.
- `description` has changed meaning: In v1, it meant the part of the header after the '>' symbol and up until first whitespace. Now it extends to the whole header line until the ending newline. This implies the identifier is a prefix of the description.
- `header` has been removed, and is now replaced by `description`.
- All `Record` objects now have an identifier, a description and a sequence, and all `FASTQRecord`s have a quality. These may be empty, but accessing them will not throw an error.
- As a consequence, all "checker" functions like `hassequence`, `isfilled`, `hasdescription` and so on have been removed, since the answer now is trivially "yes" in all cases.
- `identifier`, `description`, `sequence` and `quality` now return an `AbstractString` by default. Although it is an implementation detail, they use zero-copy string views for performance.
- You can no longer construct a record using e.g. `Record(::String)`. Instead, use `parse(Record, ::String)`.
- `seqlen` is renamed `seqsize` to emphasize that it returns the data size of the sequence, not necessarily its length.
- All readers/writers now take all arguments other than the main IO as keywords, for clarity and consistency.
- FASTQ.Writers will no longer by default modify the second header of `FASTQ.Record`s. An optional keyword forces the writer to always write/skip the second header if set to `true` or `false`, but it defaults to `nothing`, meaning it leaves it intact.
- FASTQ writers can no longer fill in ambiguous bases in Records transparently, or otherwise transform records, when writing. If the user wishes to transform records, they must do so by manually calling a function that transforms the records.
- `FASTQ.Read` has been removed. To subset a read, extract the sequence and quality, and construct a new Record object from these.
- `transcribe` has been removed, as it is now trivial to do the same thing. It may be added in a future release with new functionality.
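The record changes listed above can be sketched roughly as follows. This assumes FASTX v2 is installed; the header and sequence are made up for illustration.

```julia
using FASTX

# Records are built with `parse` rather than `Record(::String)`.
record = parse(FASTARecord, ">seq1 human sample\nACGTAC")

# The identifier runs up to the first whitespace; the description is the
# whole header line, so the identifier is a prefix of the description.
@assert identifier(record) == "seq1"
@assert description(record) == "seq1 human sample"
@assert startswith(description(record), identifier(record))

# Accessors return an AbstractString by default; ask for a String explicitly
# when an owned copy is needed.
@assert sequence(record) == "ACGTAC"
@assert sequence(String, record) isa String

# `seqsize` replaces v1's `seqlen`.
@assert seqsize(record) == 6
```

Because the default accessors return views into the record's data, converting with `sequence(String, record)` is the safer choice when the value must outlive the record.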
- Function `quality_scores` returns the qualities of a FASTQ record as a lazy, validating iterator of PHRED quality scores.
- New object: `QualityEncoding` can be used to construct custom PHRED/ASCII quality encodings. Accessing quality scores uses an existing default object.
- Readers now have a keyword `copy` that defaults to `true`. If set to `false`, iterating over a reader will overwrite the same record for performance. Use with care. This makes the old `while !eof(reader)`-idiom obsolete in favor of iterating over a reader constructed with `copy=false`.
- Users can now use the following syntax to make processing gzipped readers easier. This is a change in BioGenerics.jl, but is guaranteed to work in FASTX.jl v2.

```julia
Reader(GzipDecompressorStream(open(path)); kwargs...) do reader
    # stuff
end
```

- FAI (FASTX index) files can now be written as well as read.
- FASTA files can now be indexed with the new function `faidx`.
- Function `extract` can extract parts of a sequence from an indexed FASTA reader without loading the entire sequence into memory. You can use this to e.g. extract a small part of a large chromosome. (see #29)
- New functions `validate_fasta` and `validate_fastq` check whether an `IO` is validly formatted, faster and more memory-efficiently than loading in the file.
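A short sketch of how some of these features might be used together, assuming FASTX v2 and in-memory data. `total_bases` is a hypothetical helper written for this example, not part of the package.

```julia
using FASTX

# Quality scores are decoded lazily with the default encoding.
# Under PHRED+33, 'I' (ASCII 73) decodes to 40 and '5' (ASCII 53) to 20.
fq = parse(FASTQRecord, "@read1\nACGT\n+\nII5I")
scores = collect(quality_scores(fq))
@assert scores == [40, 40, 20, 40]

# Hypothetical helper: sum sequence sizes without allocating one record per entry.
# With `copy=false` the same record is overwritten on each iteration, so it
# must not be stored or used after the next iteration begins.
function total_bases(io::IO)
    n = 0
    FASTAReader(io; copy=false) do reader
        for record in reader
            n += seqsize(record)
        end
    end
    return n
end

@assert total_bases(IOBuffer(">a\nACGT\n>b\nGG\n")) == 6

# `validate_fasta` returns `nothing` for well-formed input and some other
# value otherwise.
@assert validate_fasta(IOBuffer(">a bc\nTAG\nTA")) === nothing
@assert validate_fasta(IOBuffer(">a bc\nT>G\nTA")) !== nothing
```

Keeping `copy=true` (the default) is the safer choice whenever records are collected into a container.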
- All practically useful functions and types are now exported directly from FASTX, so users don't need to prepend identifiers with `FASTA.` or `FASTQ.`.
- FASTA readers are more liberal in what formats they will accept (#73)
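The indexing features of this release could be exercised along these lines. This is a sketch under assumptions: that `faidx` accepts an `IOBuffer` and returns an index, and that readers accept that index through an `index` keyword. The sequence names and data are invented for the example.

```julia
using FASTX

fasta = ">seqA\nTAGA\n>seqB\nCCGA\n"

# Build an in-memory index from the FASTA data
# (assumption: `faidx` accepts an IOBuffer).
index = faidx(IOBuffer(fasta))

# Attach the index to a reader, then extract positions 2:3 of seqB
# without reading the rest of the file into memory.
reader = FASTAReader(IOBuffer(fasta); index=index)
fragment = extract(reader, "seqB", 2:3)
close(reader)
```

For files on disk, the same pattern should apply with an opened file in place of the `IOBuffer`.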
- The method `FASTA.sequence(::FASTA.Record)` has been removed, since the auto-detection of sequence type could not be made reliable enough.
- `header(::Union{FASTA.Record, FASTQ.Record})` returns the full header line.
- `sequence_iter(::Union{FASTA.Record, FASTQ.Record})` returns a no-copy iterator over the sequence. If the record is mutated, this iterator will be in an invalid state.
- `quality_iter(::FASTQ.Record)` - same as above, but for PHRED quality.
- New type `FASTQRead` stores the same data as a FASTQ record, but in a Julia native format instead of an ASCII-encoded byte vector. (PR #35)
- Allow trailing newlines after last record of FASTA and FASTQ
- Fix parser FSM ambiguity
- Fix off-by-one error in line counting of FASTQ files
- Various small fixes to the internal parsing regex
- Writers are now parametric and buffered for increased writing speed
- Fixed a bug where Windows-style newlines would break the parser
1.1.0 - 2019-08-07
- `Base.copyto!` methods for copying record data to LongSequences.
- `FASTA.seqlen` & `FASTQ.seqlen` for getting the length of a sequence in a record.
- Use BioSequences.jl v2.0 or higher.
- Use TranscodingStreams v0.9.5.
1.0.0 - 2019-06-30
- FASTA submodule.
- FASTQ submodule.
- User manual.
- API reference.