Allow the Collision Energy from spectral Data to override CE from the INI file. #316

Open
sneumann opened this issue Aug 11, 2022 · 2 comments

@sneumann
Member

sneumann commented Aug 11, 2022

Hi,
The collision energy information is used in two places in a MassBank record, the AC$MASS_SPECTROMETRY: COLLISION_ENERGY and (optionally) in the Title. Hence, the INI file defines the ce long form for the former, and a ces short form that can be used in the title generator. This requires that all CEs are the same across all the input files.

I think it would be great to use the collision energy information from the mzML or MSP input files. @achimmiri has some examples in MSP files.

A question is whether we should 1) use the CE from the spectral data to override the information in the INI file, or 2) use the CE from the spectral data by default and fall back to the INI file only if it is missing from the spectral data. IIRC, the original reason for keeping the CE (and resolution) info in the INI file was that UFZ had quite nice, fixed instrument methods cycling through a few combinations, and resolution is certainly not included in the mzML.

Thoughts? The least invasive approach would be to add CE parsing to readMSP() and an if/else to the record creation that uses the CE information when it is present.
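
The if/else for option 2 could look roughly like the sketch below. It is written in Python purely as a language-neutral illustration, not as RMassBank code; the function name `resolve_collision_energy` and both of its arguments are hypothetical.

```python
from typing import Optional


def resolve_collision_energy(spectrum_ce: Optional[str], ini_ce: Optional[str]) -> str:
    """Prefer the CE parsed from the spectral data; fall back to the INI value.

    Both arguments are hypothetical: `spectrum_ce` stands for whatever a
    parser extracted from the mzML/MSP file (None if absent), and `ini_ce`
    for the value configured in the settings file.
    """
    if spectrum_ce is not None and spectrum_ce.strip():
        return spectrum_ce.strip()
    if ini_ce is not None and ini_ce.strip():
        return ini_ce.strip()
    # Neither source provides a CE: fail loudly instead of writing an empty field.
    raise ValueError("no collision energy in the spectral data or in the settings")
```

Option 1 (override) behaves the same whenever the spectral data carries a CE; the two options differ only in which source is described as the default.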

Yours,
Steffen

@schymane
Member

It would be great ... but it is not always represented accurately iirc, hence the manual input option is still desirable to avoid errors / to maintain accuracy. Originally it was not available at all, but now some information is available in some cases in mzML I think. While I am not sure what the current status is wrt CE and mzML, last time I checked ramping was still displayed incorrectly (but also represented in a misleading manner in the raw files) for Thermo, for instance.

Perhaps someone could look into this to see how far things can be automated (for which vendor / acquisition types), and which cases should be overruled manually? Not sure if @meowcat has a suitable range of files available to do this, or someone in Halle? I don't offhand (sorry).

@meowcat

meowcat commented Aug 24, 2022

  • As @schymane said, the CE info in the mzML files is not perfect, e.g. stepped collision energies (say, 10, 30, 60) are represented by the mean (33.3)
  • Further, there is the legacy problem that CEs were mapped to accession numbers. Somewhat less of an issue with the new customizable ACCESSIONs but we need a sensible default.

Probably something like the following should work:

  • a new spectraListMode (better name?) parameter with options
    • auto: no CE list required, just automatically use value from mzML.
    • map: match the CE from mzML with an entry from spectraList, and fail if there is no match. This will be the new default; it is safer than the existing option because it makes sure nothing is accidentally mismatched
    • manual: ignore all data from mzML and assume CEs are in the listed order. This is the fallback for cases where there is no useful info in the mzML
  • additionally, spectraList entries take an optional map parameter that gives the apparent CE this spectrum will have in the raw data (specify a tolerance?). Example:
    - mode: HCD
      ces: 10%, 30%, 60% (stepped)
      ce: 10%, 30%, 60% (stepped)
      res: 7500
      map: 33.3 # this tells us that the spectra for this CE setting have the value 33.3 in the mzML
  • For the case of spectraListMode: auto, use the condition_hash in the ACCESSION by default; this is a 4-character hash derived from INSTRUMENT_TYPE, POLARITY, COLLISION_ENERGY etc.:

    RMassBank/R/buildRecord.R

    Lines 464 to 469 in 73f172e

    variables$condition <- glue_data(
      variables$metadata,
      "{INSTRUMENT_TYPE}${MS_TYPE}${ION_MODE}${IONIZATION}${FRAGMENTATION_MODE}${COLLISION_ENERGY}")
    variables$condition_hash <- substr(
      toupper(digest(variables$condition, serialize=FALSE)),
      1, .adductHashSize)
  • For the other cases, keep ACCESSION generation unchanged for backward "compatibility"? This can be discussed.
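
For readers who do not use R, the condition_hash idea quoted above can be sketched as follows. This is a Python illustration only: it assumes that `digest()` is running with its default md5 algorithm and that the hash size is 4 (as stated in the bullet above), and the metadata values are made up. It is not guaranteed to reproduce the exact hashes RMassBank writes.

```python
import hashlib

# Field order taken from the glue template in buildRecord.R quoted above.
CONDITION_FIELDS = (
    "INSTRUMENT_TYPE", "MS_TYPE", "ION_MODE",
    "IONIZATION", "FRAGMENTATION_MODE", "COLLISION_ENERGY",
)


def condition_hash(metadata: dict, size: int = 4) -> str:
    """Join the condition fields with '$', hash, and keep the first `size` characters."""
    condition = "$".join(str(metadata[field]) for field in CONDITION_FIELDS)
    # Assumption: md5, matching the default algorithm of R's digest().
    return hashlib.md5(condition.encode("utf-8")).hexdigest().upper()[:size]


# Hypothetical metadata, for illustration only.
example = {
    "INSTRUMENT_TYPE": "LC-ESI-ITFT",
    "MS_TYPE": "MS2",
    "ION_MODE": "POSITIVE",
    "IONIZATION": "ESI",
    "FRAGMENTATION_MODE": "HCD",
    "COLLISION_ENERGY": "33.3",
}
print(condition_hash(example))
```

Because COLLISION_ENERGY is part of the hashed string, records that differ only in CE would normally get different accession fragments without any CE-to-number mapping, although a 4-character hash can collide.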

Only a few cases wouldn't work with map: specifically, you can construct cases where different stepped-CE settings give the same average and so are indistinguishable from the mzML. I don't think it would be much of a problem in practice.
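
A minimal sketch of the map matching and of the ambiguous case, in Python for illustration only. The entry layout mirrors the spectraList example above; the function names, the default tolerance of 0.1, and the choice to fail on multiple matches as well as on none are assumptions, not part of the proposal.

```python
from statistics import mean


def apparent_ce(steps):
    """CE value a stepped method would show if the mzML reports the mean of its steps."""
    return mean(steps)


def match_entry(observed_ce, spectra_list, tolerance=0.1):
    """Return the single spectraList entry whose `map` value is within tolerance.

    Raises LookupError when there is no match or more than one match.
    """
    hits = [e for e in spectra_list if abs(e["map"] - observed_ce) <= tolerance]
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} spectraList entries match CE {observed_ce}")
    return hits[0]


# Hypothetical spectraList with one stepped and one fixed CE setting.
spectra_list = [
    {"mode": "HCD", "ce": "10%, 30%, 60% (stepped)", "map": 33.3},
    {"mode": "HCD", "ce": "35%", "map": 35.0},
]

# The mean of the steps (33.33...) falls within tolerance of map: 33.3.
entry = match_entry(apparent_ce([10, 30, 60]), spectra_list)
print(entry["ce"])

# Two different stepped settings with the same mean cannot be told apart.
print(apparent_ce([10, 30, 60]), apparent_ce([20, 30, 50]))
```

If both stepped settings were listed in the same spectraList with the same map value, this sketch would raise an error rather than guess, which is the situation where manual mode would be needed.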

What I consider a more annoying issue is how to represent CEs in a machine-readable way. Do the CVs have provisions for stepped, ramped etc cases? How can we include those?

@tsufz tsufz modified the milestones: UI / UX, Improve configuration option Feb 7, 2023