-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASP support #263
ASP support #263
Conversation
…w the primary header
@misanthropealoupe Thanks! It would be great if you could also either post or e-mail a script you're currently using to read your data, and a link to where your data is located. We might not want to create a test suite with it (or do we?), but it would help my understanding greatly. |
you'll find a sample file here: baseband/data/sample_asp.asp my workflow is basically just using an ASPStreamReader to read frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone through base.py
and made some preliminary structural comments. More to come. For now though, doing:
from baseband import asp
from baseband.data import SAMPLE_ASP
fh_raw = io.open(SAMPLE_ASP)
fh = asp.ASPStreamReader(fh_raw)
fh.read(10)
returns a TypeError because the stream reader doesn't know what the sample shape is. Moreover the header doesn't seem to have frame_nbytes
, shape
, payload_nbytes
, get_time
, set_time
, etc. required for the stream reader.
|
||
def read_frame(self): | ||
frame = ASPFrame.fromfile(self._fh_raw) | ||
# associate the original file header with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpicky, but I've been capitalizing and adding trailing periods to all new and existing comments. (I don't think Marten's mandated that, though.)
class ASPStreamReaderBase(VLBIStreamReaderBase): | ||
|
||
# unfinished | ||
def __init__(self, fh_raw, header0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formats like VDIF have a stream base class because there are routines common to the reader and writer to inherit. Since ASP has no writer, it doesn't make sense to separate this class from the ASPStreamReader
class below. I would unify the two classes, and inherit directly from VLBIStreamReaderBase
.
It would be reasonable, though, to create an ASPFileReader
class that includes read_frame
, read_header
, etc., like with VDIF. Doing so would make the code cleaner and fit better with the other formats, and make writing a tutorial revolving around adding ASP support more understandable.
|
||
fh_raw.seek(pos) | ||
|
||
super(ASPStreamReader, self).__init__(fh_raw, header0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even as a hack, it would be useful to have an open
function (not built by vlbi_base.base.make_opener
, since that allows for 'rb' and 'w' modes). The reason is because the stream reader currently doesn't open a string filename with io.open
, so the user has to do it themselves.
open
can be as simple as this:
def open(name):
# Open file using io if not already opened.
got_fh = hasattr(name, 'read')
if not got_fh:
name = io.open(name, 'rb')
# Wrap file pointer with stream reader.
try:
return ASPStreamReader(name)
except Exception as exc:
if not got_fh:
try:
name.close()
except Exception: # pragma: no cover
pass
raise exc
This simply tries to open the filename. If you want automatic file sequence opening, you can do:
from ..helpers import sequentialfile as sf
def open(name):
# If sequentialfile object, check that it's opened properly.
if isinstance(name, sf.SequentialFileBase):
assert name.mode == 'rb', (
"open only accepts sequential files opened in 'rb' "
"mode for reading.")
# If passed some kind of list, open a sequentialfile object.
if isinstance(name, (tuple, list, sf.FileNameSequencer)):
name = sf.open(name, 'rb')
# Open file using io if not already opened.
got_fh = hasattr(name, 'read')
if not got_fh:
name = io.open(name, 'rb')
# Wrap file pointer with stream reader.
try:
return ASPStreamReader(name)
except Exception as exc:
if not got_fh:
try:
name.close()
except Exception: # pragma: no cover
pass
raise exc
'open'] | ||
|
||
|
||
def fr_open(path, mode='rb'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since fr_open
and FileReader
are used only by the ASP format, they should go in the ASP folder, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further thoughts: I don't think at_start
is a property that's currently called by anything in asp
. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:
from baseband import vdif
from baseband.data import SAMPLE_VDIF
import io
fh_raw = io.open(SAMPLE_VDIF, 'rb')
fh_raw.seek(134)
fh = vdif.open(fh_raw, 'rs')
will return an EOFError. Doing the same thing with DADA returns a KeyError.
Instead of having at_start
, it may make more sense just to let ASPStreamReader
try and fail to read the header, since it's consistent with the other formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now had a look at the entire code. Most of my comments are for base.py in order to get the stream reader's read
, tell
, etc. methods reading properly, and fixes to header.py
to properly tell header and payload sizes. Some outstanding issues include sequential file support and a small quirk of the sample file; see code comments.
Once the code itself (and test suite) is finalized, we should think a bit about documentation and presentation. I don't think it necessary to write a briefing or have examples as with the DADA docs, but it might be reasonable to have asp.open
documented like dada.open
is, including parameters and what they control. I also think an ASP entry should be made under File Formats in the docs, with a quick note on the ASP page itself that the format is read-only. Lastly, the notes
file should be converted into a full-fledged tutorial, though that can be done in a separate pull request.
I'm personally a sticker for every commit included in master to be working code with a descriptive commit comment and without commented-out blocks or incomplete functions (though some of my own early PRs completely fail those requirements), so we may either squash and merge or do an interactive rebase before merging.
Also, we need to get your name on AUTHORS.rst!
'open'] | ||
|
||
|
||
def fr_open(path, mode='rb'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some further thoughts: I don't think at_start
is a property that's currently called by anything in asp
. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:
from baseband import vdif
from baseband.data import SAMPLE_VDIF
import io
fh_raw = io.open(SAMPLE_VDIF, 'rb')
fh_raw.seek(134)
fh = vdif.open(fh_raw, 'rs')
will return an EOFError. Doing the same thing with DADA returns a KeyError.
Instead of having at_start
, it may make more sense just to let ASPStreamReader
try and fail to read the header, since it's consistent with the other formats.
import numpy as np | ||
from astropy.utils import lazyproperty | ||
|
||
__all__ = ['FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter', | ||
__all__ = ['FileReader', 'FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bigger problem with sequential files is that there's a file header for every file in an ASP sequence, which we want to skip over by seeking past the file header bytes for all files other than the first. This requires some reworking of both SequentialFileBase
(notably, _open
, _file_sizes
and the way it's treated) and SequentialFileReader
(possibly file_size
, for consistency).
I think the modified versions of SequentialFileBase
and SequentialFileReader
- call them ASPSequentialFileBase
, etc., should go in asp/base.py
. Note that you can take advantage of diamond inheritance by doing:
class ASPSequentialFileBase(helpers.SequentialFileBase):
...
class ASPSequentialFileReader(helpers.SequentialFileReader, ASPSequentialFileBase):
...
which will let you override the SequentialFileBase
methods as needed without having to rewrite all of the SequentialFileReader
methods that depend on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done with help of getting the raw offset from the frame index
@@ -0,0 +1,32 @@ | |||
import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain for ASP whether it's necessary to test anything other than the stream reader itself. @mhvk, what do you think - should the payload, header and frame classes also be tested?
We should also consider testing sequential files.
@@ -0,0 +1,29 @@ | |||
notes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is quite useful!
I'm hoping for these notes to eventually be turned into a design document and tutorial within the docs. It should be left out of the code folder itself.
('epoch', '<f4'), | ||
('pad', 'V2')]) # manual padding hack | ||
|
||
_header_parser = HeaderParser(make_parser_from_dtype(_dtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me why the header parser is needed. Currently fromvalues
and __setitem__
don't work (for good reasons), and __getitem__
has been over written. For keys, it's possible to just do _dtype.names
to recover a tuple of names.
|
||
|
||
NPOL = 2 | ||
NDIM = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell these are also hardcoded into dspsr, but the sample file's file header gives nchan = 8, as does /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw
. Are the file headers not to be trusted?
@@ -111,3 +111,10 @@ def _full_path(name, dirname=_path.dirname(_path.abspath(__file__))): | |||
First ten frames extracted from b0329 DRAO corrupted raw data file | |||
0059000.dat. | |||
""" | |||
|
|||
SAMPLE_ASP = _full_path('sample_asp.asp') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file header contains imjd and fmjd values:
imjd: 53873,
fmjd: 0.32216575686339866,
while the frame header seems to always have the less accurate:
iMJD: 53873.0,
fMJD: 0.3225,
Reading /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw
, however, I see that the frame header has just as accurate a fractional MJD value as the file header. Is this the case for all ASP files, and if so, can the sample file be altered to match?
pos = pos2 | ||
|
||
fh_raw.seek(pos) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get read
, tell
, etc., working in units of time, we could add:
@property
def start_time(self):
return self.header0.file_header.time
@lazyproperty
def stop_time(self):
"""Time at the end of the file, just after the last sample.
See also `start_time` for the start time of the file, and `time` for
the time of the sample pointer's current offset.
"""
return (self.header0.file_header.time +
(self._nsample / self.sample_rate).to(u.s))
@lazyproperty
def _nframe(self):
# May misbehave for partial last frames...
curr_pos = self.fh_raw.tell()
nframe = ((self.fh_raw.seek(0, 2) - self.header0.file_header.nbytes) //
self.header0.frame_nbytes)
self.fh_raw.seek(curr_pos)
return nframe
@lazyproperty
def _nsample(self):
"""Number of complete samples in the stream data."""
return self._nframe * self.header0.samples_per_frame
def _read_frame(self, index):
self.fh_raw.seek(index * self.header0.frame_nbytes +
self.header0.file_header.nbytes)
return self.read_frame()
closing in favour of #439, which builds on this PR, addressing the comments, and providing more complete support and better integration. |
basic asp StreamReader support