Running CNAqc analyze_peaks by chromosome does not populate the object for fragmented samples #40

Open
kmavrommatis opened this issue Jan 14, 2025 · 1 comment

Comments

@kmavrommatis

kmavrommatis commented Jan 14, 2025

Hi, thank you for your help so far with #39.

Going over your code, and regarding the last point about checking by chromosome: in a different sample I have two versions of the CNV predictions, from the same CNV caller run with different arguments. One is very fragmented, the other much less so. I would like to use CNAqc to decide which version of the segmentation is (more) correct. In this case I know from orthogonal data that the overfragmented one is not correct, but I am trying to devise a methodology for similar situations where I have no prior knowledge of the ploidy/segmentation of the sample and have to rely on what the CNV method produces.

For the fragmented version of the sample, all karyotypes now pass QC after analyze_peaks. However, when I run the analysis by chromosome, only a couple of chromosomes have results. In the resulting objects, even for chromosomes without a peak analysis, x$cna has QC_PASS = TRUE for all segments with the tested karyotypes.
The less fragmented version produces results for all chromosomes.

Does this mean that, for the fragmented sample, the overall QC estimate is determined by a few peaks on only a couple of chromosomes?
Should this be taken as an indication of a weak estimate, and should the result be weighted by chromosome size or some other metric?
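
To make the weighting question concrete, here is a minimal base R sketch of what a size-weighted summary could look like. It does not use CNAqc; the table `per_chr`, its columns and all values are hypothetical (the lengths are only approximate hg38 sizes in Mb), and this is not something CNAqc computes.

```r
# Hypothetical per-chromosome summary, made up for illustration only:
# approximate length in Mb, whether the per-chromosome peak analysis
# returned a result, and whether that result passed QC.
per_chr <- data.frame(
  chr        = c("chr1", "chr2", "chr3", "chr4"),
  length_mb  = c(248, 242, 198, 190),
  has_result = c(TRUE, FALSE, FALSE, TRUE),
  qc_pass    = c(TRUE, NA, NA, TRUE)
)

# Unweighted view: fraction of analysed chromosomes that pass
analysed <- per_chr[per_chr$has_result, ]
pass_fraction <- mean(analysed$qc_pass)

# Size-weighted view: fraction of the total length that is backed by a
# passing per-chromosome result
supported_fraction <-
  sum(analysed$length_mb[analysed$qc_pass]) / sum(per_chr$length_mb)

print(c(pass_fraction = pass_fraction, supported_fraction = supported_fraction))
```

In this toy table every analysed chromosome passes, yet only about half of the total length is supported by a result, which is the kind of gap the question is about.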

Thanks again for your help

cnvs-notfragmented.rds.gz
mutations.rds.gz
cnvs-fragmented.rds.gz

library(dplyr)  # provides %>%
library(CNAqc)

mut = readRDS("~/Downloads/mutations.rds")
cnvs.f = readRDS("~/Downloads/cnvs-fragmented.rds")
cnvs.u = readRDS("~/Downloads/cnvs-notfragmented.rds")

x.f=CNAqc::init(
  mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
  cna=cnvs.f,
  purity=0.33,  
  sample='test',
  ref='hg38'
)

x.u=CNAqc::init(
  mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
  cna=cnvs.u,
  purity=0.33,  
  sample='test',
  ref='hg38'
)
x.u = CNAqc::analyze_peaks(x.u, n_bootstrap = 10)
plot_peaks_analysis(x.u)
x.f = CNAqc::analyze_peaks(x.f, n_bootstrap = 10)
plot_peaks_analysis(x.f)


x_chr.f = x.f %>% 
  split_by_chromosome() %>% 
  lapply(function(w) {analyze_peaks(w)})

x_chr.u = x.u %>% 
  split_by_chromosome() %>% 
  lapply(function(w) {analyze_peaks(w)})


x_chr.u$chr2 %>% plot_peaks_analysis() # works

x_chr.f$chr2 %>% plot_peaks_analysis() # does not produce anything
Warning message:
In plot_peaks_analysis(.) :
  Input does not have peaks, see ?peaks_analysis to run peaks analysis.
@caravagn
Collaborator

Hi, I see your points. Based on my experience, although fragmented chromosomes do exist, patterns of over-fragmentation are usually localised. For instance, chromothripsis is often confined to a few megabases and involves at most a handful of chromosomes.

Your fragmented solution has

x.f=CNAqc::init(
  mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
  cna=cnvs.f,
  purity=0.33,  
  sample='test',
  ref='hg38'
) %>% detect_arm_overfragmentation()

x.f %>% plot_segments()
x.f %>% plot_arm_fragmentation()

21 overfragmented chromosomes or so, which looks very suspicious to me. On the other hand, your fragmented solution suggests a low-purity sample, so the situation is complicated.

QC per chromosome is difficult here for one reason: right now the algorithm pools mutations from chromosome segments that carry at least N mutations. If I try to subset your segments, they are so tiny that nothing remains, so this is mainly a filtering issue.

x.f=CNAqc::init(
  mutations=mut,  # mutations predicted using GATK Mutect2 on tumor/matched normal 
  cna=cnvs.f,
  purity=0.33,  
  sample='test',
  ref='hg38'
) %>% detect_arm_overfragmentation() %>% 
  subset_by_segment_minmutations(50)

x.f %>% plot_segments()

I think we have to find specific parameters or a workaround for your samples; it should not be impossible. You are working in a difficult setup, though.
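
A rough way to see the filtering issue is to count mutations per segment for a coarse and a fragmented segmentation of the same region. The sketch below uses base R only and entirely made-up data; the helpers `make_segments` and `count_per_segment` are hypothetical and not part of CNAqc. Only the threshold of 50 is taken from the call above.

```r
# Toy illustration, all data simulated: how a minimum-mutation threshold
# interacts with segment size.
set.seed(1)

# 2000 hypothetical mutation positions on one 100 Mb chromosome
mutations <- data.frame(chr = "chr1", pos = sort(sample.int(1e8, 2000)))

# Build n equally sized, non-overlapping segments covering the chromosome
make_segments <- function(n, chr_length = 1e8) {
  breaks <- seq(0, chr_length, length.out = n + 1)
  data.frame(chr = "chr1", from = breaks[-(n + 1)] + 1, to = breaks[-1])
}

# Number of mutations falling inside each segment
count_per_segment <- function(segments, mutations) {
  vapply(seq_len(nrow(segments)), function(i) {
    sum(mutations$chr == segments$chr[i] &
          mutations$pos >= segments$from[i] &
          mutations$pos <= segments$to[i])
  }, integer(1))
}

min_mutations <- 50  # same threshold as in the call above

coarse     <- make_segments(4)    # few, large segments (25 Mb each)
fragmented <- make_segments(400)  # many, tiny segments (250 kb each)

n_coarse     <- count_per_segment(coarse, mutations)
n_fragmented <- count_per_segment(fragmented, mutations)

# Segments that would survive the minimum-mutation filter
kept_coarse     <- sum(n_coarse >= min_mutations)
kept_fragmented <- sum(n_fragmented >= min_mutations)

print(c(coarse = kept_coarse, fragmented = kept_fragmented))
```

The same mutations are spread over both segmentations, but the tiny segments hold about five mutations each on average, so none of them reaches the threshold while all of the large ones do.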
