Unusual multi-FCS files #24

Open
photocyte opened this issue Sep 10, 2020 · 8 comments

@photocyte

Hi there,

I've come across FCS files (from the Luminex Muse) that implement multi-FCS by simply concatenating single FCS files together. This was my solution to split them:

import glob

files = glob.glob("ADM_*.VIA.FCS")
for f in files:
    with open(f, "rb") as handle:
        data = handle.read()
    # Some FCS files are just literal concatenations of single FCS files; this splits them.
    split_data = data.split(b"FCS3.0")
    # split_data[0] is whatever precedes the first header (empty for these files), so skip it.
    for s in range(1, len(split_data)):
        with open(f + "_" + str(s) + ".FCS", "wb") as out:
            out.write(b"FCS3.0" + split_data[s])

Once these multi-FCS files are split, fcsparser works perfectly, as far as I can tell. But it might be nice for the library to be able to detect these files by default! See attached for an example FCS:
ADM_09SEP2020_181310.VIA.FCS.zip
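
For anyone wanting to detect such files before parsing, here is a minimal sketch. It is not part of fcsparser, and the names `find_fcs_headers` and `split_concatenated_fcs` are made up for illustration. It assumes the standard 58-byte FCS 3.0 header layout: the version string, four spaces, then six 8-byte ASCII offset fields. Checking the offset fields lowers the chance of matching a stray `FCS3.0` inside a DATA segment, which a plain `split` cannot rule out.

```python
HEADER_LEN = 58          # FCS 3.0 header: 6 (version) + 4 (spaces) + 6 * 8 (offsets)
MAGIC = b"FCS3.0    "    # version string followed by four spaces


def find_fcs_headers(data):
    """Return byte offsets in `data` that look like the start of an FCS 3.0 header."""
    offsets = []
    pos = data.find(MAGIC)
    while pos != -1:
        fields = data[pos + len(MAGIC):pos + HEADER_LEN]
        # The six offset fields should hold only ASCII digits padded with spaces.
        looks_numeric = all(b in b"0123456789 " for b in fields)
        if len(fields) == 48 and fields.strip() and looks_numeric:
            offsets.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return offsets


def split_concatenated_fcs(data):
    """Split `data` at each detected header; one chunk per embedded FCS file."""
    starts = find_fcs_headers(data)
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]
```

A file with more than one detected header is probably a concatenation of this kind, and each chunk returned by `split_concatenated_fcs` could be written out and parsed on its own.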

@maaikesangster

Hello,

I had the same issue, thank you for this solution! I am using the cytoflow package for my parsing and analysis of FC data and I wanted to raise the issue with them as well. Do you mind if I use your example?

@photocyte
Author

@maaikesangster Please feel free

@bpteague
Contributor

Do note that fcsparser supports choosing which dataset in the file to parse out. You can use the data_set keyword argument to the FCSParser constructor. It's 0-indexed -- so data_set = 0 is the first data set, data_set = 1 is the second, etc.
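
As a rough illustration of that keyword, the sketch below loops over the data sets of one file. `fcsparser.parse` and its `data_set` argument are real. The helper `parse_all_data_sets`, its `n_data_sets` parameter (the count has to be known up front) and the injectable `parse` argument are assumptions made for the example.

```python
def parse_all_data_sets(path, n_data_sets, parse=None):
    """Parse data sets 0 .. n_data_sets - 1 from one FCS file.

    Returns a list with one parse result per data set
    ((meta, data) tuples when fcsparser.parse is used).
    """
    if parse is None:
        # Imported lazily so the helper can be exercised with a stand-in parser.
        import fcsparser
        parse = fcsparser.parse
    return [parse(path, data_set=i) for i in range(n_data_sets)]
```

The second data set of a file would then be `parse_all_data_sets(path, 2)[1]`.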

@bpteague
Contributor

(And @maaikesangster, cytoflow exposes the same functionality in ImportOp.)

@photocyte
Author

photocyte commented Jan 12, 2022

For me, with a file made of four concatenated FCS files, this works for data_set=0 and data_set=1, but fails for data_set=2 and data_set=3:

meta , data = fcsparser.parse(f,data_set=2)
Encountered an illegal utf-8 byte in the header.
 Illegal utf-8 characters will be ignored.
'utf-8' codec can't decode byte 0x8c in position 0: invalid start byte
20220112_files/ADM_12JAN2022_112816.VIA.FCS

All 4 files can be opened successfully when first separated via this approach (#24 (comment)). Happy to share all 5 files (original + 4 split) if desired.

Edit: here is the full error message:

~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in parse(path, meta_data_only, compensate, channel_naming, reformat_meta, data_set, dtype)
    538     read_data = not meta_data_only
    539 
--> 540     fcs_parser = FCSParser(path, read_data=read_data, channel_naming=channel_naming,
    541                            data_set=data_set)
    542 

~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in __init__(self, path, read_data, channel_naming, data_set)
    105         if path:
    106             with open(path, 'rb') as f:
--> 107                 self.load_file(f, data_set=data_set, read_data=read_data)
    108 
    109     def load_file(self, file_handle, data_set=0, read_data=True):

~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in load_file(self, file_handle, data_set, read_data)
    117         while data_segments <= data_set:
    118             self.read_header(file_handle, nextdata_offset)
--> 119             self.read_text(file_handle)
    120             if '$NEXTDATA' in self.annotation:
    121                 data_segments += 1

~/miniconda3/lib/python3.9/site-packages/fcsparser/api.py in read_text(self, file_handle)
    215         #####
    216         # Parse the TEXT segment of the FCS file into a python dictionary
--> 217         delimiter = raw_text[0]
    218 
    219         if raw_text[-1] != delimiter:

IndexError: string index out of range

It seems data_set is looking to split on the string $NEXTDATA, whereas the example FCS files I've uploaded are just whole separate files concatenated together, so they are instead separated by the FCS start bytes FCS3.0.

@bpteague
Contributor

@photocyte I'd love to add it to my collection of weird FCS files (: And if I can figure out the fix, I'll submit a pull request to @eyurtsev .

@photocyte
Author

Thanks @bpteague! See the linked zip file below. It has the split-off FCS files (_1, _2, _3, _4), plus the original FCS file ADM_12JAN2022_112816.VIA.FCS.

20220112_files.zip

I also realized I previously uploaded a file here (#24 (comment)) that should show the same phenomenon, though it may not already be split out.

@bpteague
Contributor

bpteague commented Jan 13, 2022

@photocyte Thanks for the file. I found the problem, and the fix is easy. In fcsparser.api, on line 125, replace

nextdata_offset = self.annotation['$NEXTDATA']

with

nextdata_offset += self.annotation['$NEXTDATA']

@eyurtsev, I'll put together a test case and a PR.
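
To see why the one-character change matters, here is a small stand-alone sketch. It is not the fcsparser code, and `data_set_start` and the offset values are invented for illustration. It assumes, as the fix implies, that each `$NEXTDATA` value is measured from the start of its own data set rather than from the start of the file.

```python
def data_set_start(next_data, data_set, accumulate=True):
    """Return the byte offset at which data set number `data_set` begins.

    `next_data` holds the $NEXTDATA value of each data set in file order.
    accumulate=True mimics `nextdata_offset += ...` (the proposed fix);
    accumulate=False mimics `nextdata_offset = ...` (the old behaviour).
    """
    offset = 0
    for rel in next_data[:data_set]:
        offset = offset + rel if accumulate else rel
    return offset


# Hypothetical $NEXTDATA values for four concatenated files (0 marks the last one).
next_data = [1000, 2500, 800, 0]

fixed = [data_set_start(next_data, i) for i in range(4)]
old = [data_set_start(next_data, i, accumulate=False) for i in range(4)]
print(fixed)  # [0, 1000, 3500, 4300]
print(old)    # [0, 1000, 2500, 800]
```

With plain assignment the first two offsets are still correct, because the first data set starts at byte 0. That would be consistent with data_set=0 and data_set=1 working while data_set=2 and data_set=3 fail.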
