Marker management repo for CL and CL-KG #14

Closed
dosumis opened this issue Jul 25, 2024 · 6 comments

@dosumis
Contributor

dosumis commented Jul 25, 2024

We will generate a new Cell Marker ODK repo, cellmark, to manage markers for CL and CL-KG.

The first aim for this repo will be to generate NS-Forest markers following standard patterns developed for BDSO (see also obophenotype/cell-ontology#2439).

MVP:

  • cellmark id space (register)
  • Generates reference lung dataset NS-Forest markers for ingestion to CL
  • Specifies standard TSVs and their location in the repo for submitting new NS-Forest markers
  • Python populates 2 data files to drive templates
  • templates based on BDSO for:
    • marker sets
    • CL term comments with marker details

Challenges:

  • Namespace
    • OPTION1: register cellmark ID space with OBO Foundry
    • OPTION2: re-use CL
  • Managing Reference gene files
    • For now we will work with reference gene sets from the var tables of AnnData files. A script will pull the reference AnnData files and extract reference gene tables (name and ID) from var into TSVs. A slicing mechanism will then pull the genes that are referenced in the template data files and generate OWL files from them.
    • Longer term - we will need to develop a robust plan for management, working with Richard's group with advice from Chris and his team.
  • Managing which content goes to CL
    • Option 1: separate products - one for CL and one for KG. CL product becomes component of CL build. We will do this for now.
    • Option 2: Single product used by CL import pipeline. This is appealing, but may hit scaling issues in future for CL import build.
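
The slicing step described under "Managing Reference gene files" could look roughly like the sketch below. It assumes the reference gene table has already been extracted from the AnnData var into a TSV, and that the template data file lists marker gene names in one delimited column. The column names (gene_id, gene_name, markers), the "|" separator, and the function names are all hypothetical and not taken from the CellMark repo; the gene names and IDs in any sample input are placeholders.

```python
import csv
import io


def load_reference_genes(tsv_text):
    """Parse a reference gene TSV (hypothetical columns: gene_id, gene_name)
    into a dict mapping gene name -> gene ID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["gene_name"]: row["gene_id"] for row in reader}


def slice_reference(reference, template_tsv_text, marker_column="markers", sep="|"):
    """Return (found, missing) for the genes referenced in a template data file.

    found   -- dict of gene name -> gene ID for referenced genes present in the reference
    missing -- sorted list of referenced gene names absent from the reference
    """
    reader = csv.DictReader(io.StringIO(template_tsv_text), delimiter="\t")
    wanted = set()
    for row in reader:
        # Each row carries a delimited list of marker gene names (assumed layout).
        wanted.update(name.strip() for name in row[marker_column].split(sep) if name.strip())
    found = {name: reference[name] for name in sorted(wanted) if name in reference}
    missing = sorted(wanted - set(reference))
    return found, missing
```

Only the genes in found would go forward to OWL generation; reporting missing separately makes it easy to spot markers that do not resolve against the chosen reference gene set.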
@hkir-dev
Collaborator

New repo created: https://github.com/Cellular-Semantics/CellMark (development ongoing)

LungMAP is using Ensembl genes:
image

Lung cell atlas as well:
image

But the DOSDP template uses NCBI genes: https://github.com/Cellular-Semantics/CellMark/blob/main/src/markers/NSForestMarkersSource.tsv

Should we use Ensembl genes in the DOSDP template as well? Who can help me pick the correct genes?

@dosumis
Contributor Author

dosumis commented Jul 29, 2024

What did we get from Renne? She should be able to provide a mapping if she is providing NCBI gene IDs.
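
If such a mapping is provided as a two-column TSV, applying it to the dataset's Ensembl IDs might be sketched as follows. The column names (ensembl_id, ncbi_gene_id) and function names are assumptions for illustration, not an existing file format from this project. IDs with no mapping or with more than one candidate are set aside for curator review rather than resolved automatically.

```python
import csv
import io


def load_mapping(tsv_text):
    """Parse a mapping TSV (hypothetical columns: ensembl_id, ncbi_gene_id)
    into a dict of Ensembl ID -> list of NCBI gene IDs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        mapping.setdefault(row["ensembl_id"], []).append(row["ncbi_gene_id"])
    return mapping


def map_markers(ensembl_ids, mapping):
    """Split Ensembl IDs into those with exactly one NCBI target and the rest."""
    resolved, unresolved = {}, []
    for ensembl_id in ensembl_ids:
        targets = mapping.get(ensembl_id, [])
        if len(targets) == 1:
            resolved[ensembl_id] = targets[0]
        else:
            # No mapping or an ambiguous one: leave for manual review.
            unresolved.append(ensembl_id)
    return resolved, unresolved
```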

@hkir-dev
Collaborator

I'm uncertain about the origin of these minimal markers. I believed that one of our curators had manually extracted them.

@dosumis
Contributor Author

dosumis commented Jul 29, 2024

They come from Renne's analysis. I will dig out the emails. One thing we need for the marker repo is a standard place to put files like this.

@dosumis
Contributor Author

dosumis commented Jul 29, 2024

Adding links and file here for now.

Renne's Zenodo pub - notebooks - this gives us a DOI to ref for analysis.

Highlights of this analysis are in HLCA_CellRef_MarkerPerformance_forDOS.xlsx

An older HLCA NS-Forest analysis can be found in exactMatch2CL_definitionAdditions.xlsx and the related ticket (obophenotype/cell-ontology#2313), but I believe this is superseded by the analysis in Renne's pub above.

@ubyndr closed this as completed Dec 2, 2024