How to determine the value of the n_gems parameter #23

Open
wolfQK opened this issue Dec 4, 2024 · 2 comments

wolfQK commented Dec 4, 2024

Hi axelalmet ~
I noticed that your analysis code only uses two values for n_gems, 10 and 20. Could you please explain what criteria you used to determine the appropriate value for the n_gems parameter? Thanks! @axelalmet

fs.pp.construct_gems_using_pyliger(adata,
                                   n_gems=10,
                                   layer_key='counts',
                                   condition_key=condition_key)
@axelalmet (Owner)

Hi wolfQK,

This is a good question. For the applications considered in the paper, setting n_gems to 10 or 20 worked pretty well at capturing meaningful differences with respect to 1) cell state heterogeneity, 2) spatially meaningful modules (ones that lined up with the cell region annotation), and 3) different biological conditions, e.g., healthy vs moderate vs severe COVID-19. I evaluated this by looking at the mean cell-wise membership of each module with respect to meaningful cell labels, such as condition or cell type annotation. When I was originally analysing the datasets, I tried 5, 10, 15, etc. GEMs and found that 10 or 20 often worked best.
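
As a rough illustration of the evaluation described above (not the code used in the paper), the sketch below averages each GEM's membership over the cells that share a label. The helper name `mean_membership_by_label`, the cells-by-GEMs layout, and the toy numbers are all assumptions made for the example; in practice the memberships would come from the GEM construction step and the labels from a column such as condition or cell type. It uses only the standard library so it can be run as-is.

```python
from collections import defaultdict


def mean_membership_by_label(memberships, labels):
    """Average each GEM's membership over the cells that share a label.

    memberships: one row per cell, one column per GEM (assumed layout).
    labels: one label per cell, e.g. condition or cell type.
    """
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(memberships, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in column_sums]
        for label, column_sums in sums.items()
    }


# Toy data: 4 cells x 2 GEMs, values made up for illustration only.
memberships = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.0, 1.0],
]
labels = ["healthy", "healthy", "severe", "severe"]

means = mean_membership_by_label(memberships, labels)
for label, row in means.items():
    print(label, [round(value, 2) for value in row])
```

Repeating this for each candidate number of GEMs (5, 10, 15, ...) and checking whether the modules still separate the labels in a meaningful way is one way to compare settings by eye.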

But in general, there's no reason you have to pick 10 or 20 GEMs. I think picking the right number of GEMs is an incredibly non-trivial exercise, and, to the best of my knowledge, there's no single good method for choosing the number of factors for a matrix factorisation-based method. I know cNMF uses the silhouette score, but the literature has shown that this can have its drawbacks.

Best wishes,
Axel.

@majeex233

Hi axelalmet,
Thanks for your inspiring answer! I would like to know which aspects of the results are affected by changing the n_gems parameter.
Looking forward to your reply!

Best wishes,
Bella.
