Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polish code with flake8 output & add functional test 5 and 6 #63

Merged
merged 5 commits into from
Aug 29, 2023
Merged

Polish code with flake8 output & add functional test 5 and 6 #63

merged 5 commits into from
Aug 29, 2023

Conversation

XingerTang
Copy link
Contributor

Polish code according to AlphaGenes/tinyhouse#5
Add functional tests 5 and 6 #57 #58

I fix the test 6 by adding a result-checking part.
For test 5, originally test 5 did not have any data, so I copied the data of test 1 to test 5, and I set the expected data to check against the output of the command of the input data. Thus, I don't know if the code behaved as expected, maybe we need to assess the validity of the data later.

@XingerTang XingerTang changed the title Polish code with black8 output & add functional test 5 and 6 Polish code with flack8 output & add functional test 5 and 6 Aug 28, 2023
@XingerTang XingerTang changed the title Polish code with flack8 output & add functional test 5 and 6 Polish code with flake8 output & add functional test 5 and 6 Aug 28, 2023
@@ -241,64 +235,64 @@ def updateSeqError_ind(counts, errors, refReads, altReads, genoProbs):
#


def updateSeg(peelingInfo):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XingerTang these edits to getNewton() and related functions in this PeelingUpdates.py file should not be in this PR - best to keep each PR simple and doing mostly one thing - polishing flake8 and tests is fine, but you are doing something else here,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregorgorjanc I comment this part because this whole part of the code is broken (more details about it in the report I wrote), and updateSeg() is trying to call a function setupTransmission() which is not defined so the flake8 complained about this. So, it is flake8 related but I did do some extra here, I can uncomment the functions other than setupTransmission() if it is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregorgorjanc I comment this part because this whole part of the code is broken (more details about it in the report I wrote), and updateSeg() is trying to call a function setupTransmission() which is not defined so the flake8 complained about this. So, it is flake8 related but I did do some extra here, I can uncomment the functions other than setupTransmission() if it is better.

I see! Hmm, odd, BUT good find!

Please open an issue so we have a history of such changes in the repository with a clear statement of what we should do and then PRs solving the issue are linked to the issues through "hash-links".

Then let's solve this as part of that other issue. Best to work one step at a time;)

@gregorgorjanc
Copy link
Member

I fix the test 6 by adding a result-checking part.

Test 6 is for the sex-chromosome functionality (#58). Do we have sex-chromosome genotype or sequence data - I don't see these files in this PR (are they already in the git repo from before)?

For test 5, originally test 5 did not have any data, so I copied the data of test 1 to test 5, and I set the expected data to check against the output of the command of the input data. Thus, I don't know if the code behaved as expected, maybe we need to assess the validity of the data later.

Yeah, for test 5 we should look at the error-rate functionality (#57), so we need to discuss how to do a reasonable testing case! Do you have any suggestion how to do this in a simple way? Perhaps adding some errors into the input data compared to the true data and test that AlphaPeel recovers the true state?

@XingerTang
Copy link
Contributor Author

I fix the test 6 by adding a result-checking part.

Test 6 is for the sex-chromosome functionality (#58). Do we have sex-chromosome genotype or sequence data - I don't see these files in this PR (are they already in the git repo from before)?

For test 5, originally test 5 did not have any data, so I copied the data of test 1 to test 5, and I set the expected data to check against the output of the command of the input data. Thus, I don't know if the code behaved as expected, maybe we need to assess the validity of the data later.

Yeah, for test 5 we should look at the error-rate functionality (#57), so we need to discuss how to do a reasonable testing case! Do you have any suggestion how to do this in a simple way? Perhaps adding some errors into the input data compared to the true data and test that AlphaPeel recovers the true state?

@gregorgorjanc Fort test 6, the sex-related part is from the command and the input file pedigree.txt:

AlphaPeel -genotypes test6/genotypes.txt \
                          -seqfile test6/seqfile.txt \
                          -pedigree test6/pedigree.txt \
                          -runType multi \
                          -calling_threshold .1 \
                          -sexchrom \
                          -out test6/outputs/output

where -sexchrom indicates that this is a sex chromosome and

genotypes_0 0 0 M
genotypes_2 0 0 F
genotypes_M genotypes_0 genotypes_2 M 
genotypes_F genotypes_0 genotypes_2 F
sequence_0 0 0 M
sequence_2 0 0 F
sequence_M0 sequence_0 sequence_2 M 
sequence_F0 sequence_0 sequence_2 F
sequence_M9 sequence_0 sequence_2 M 
sequence_F9 sequence_0 sequence_2 F

gives the sex information.

@XingerTang
Copy link
Contributor Author

@gregorgorjanc For test 5, if we add some errors to the input data and compare the output to the true state, I'm wondering if the corrupted data can be recovered 100% to the true state, if not then how can we know which part of the loss of the accuracy is not error rate related and which part is caused by the input error rate. If we cannot separate these two cases then we won't know whether the effect of the input error rate is as expected. And for those that can return to the true state 100%, we can't tell if the accuracy is just a coincidence or a result of the correct error rate.
So, I prefer to prepare simple input data and compute the expected outcome by hand to see whether the computed and expected outcomes match.

@gregorgorjanc
Copy link
Member

Fort test 6, the sex-related part is from the command and the input file pedigree.txt:

AlphaPeel -genotypes test6/genotypes.txt \
                          -seqfile test6/seqfile.txt \
                          -pedigree test6/pedigree.txt \
                          -runType multi \
                          -calling_threshold .1 \
                          -sexchrom \
                          -out test6/outputs/output

where -sexchrom indicates that this is a sex chromosome and

genotypes_0 0 0 M
genotypes_2 0 0 F
genotypes_M genotypes_0 genotypes_2 M 
genotypes_F genotypes_0 genotypes_2 F
sequence_0 0 0 M
sequence_2 0 0 F
sequence_M0 sequence_0 sequence_2 M 
sequence_F0 sequence_0 sequence_2 F
sequence_M9 sequence_0 sequence_2 M 
sequence_F9 sequence_0 sequence_2 F

gives the sex information.

@XingerTang I understand, BUT have you looked if the input and output make sense in the context of sex chromosome - females have two copies of X chromosome so they can have allele dosage of 0, 1, or 2, while males have one copy of X, so they can have allele dosage of 0 or 1. At the moment, we ignore the PAR (pseudo-autosomal region) of the Y chromosome, where males can have allele dosage of 0, 1, or 2 (will open an issue for this for future work - not a priority now).

@gregorgorjanc
Copy link
Member

@gregorgorjanc For test 5, if we add some errors to the input data and compare the output to the true state, I'm wondering if the corrupted data can be recovered 100% to the true state, if not then how can we know which part of the loss of the accuracy is not error rate related and which part is caused by the input error rate. If we cannot separate these two cases then we won't know whether the effect of the input error rate is as expected. And for those that can return to the true state 100%, we can't tell if the accuracy is just a coincidence or a result of the correct error rate.
So, I prefer to prepare simple input data and compute the expected outcome by hand to see whether the computed and expected outcomes match.

I moved this discussion to #57

@XingerTang
Copy link
Contributor Author

Fort test 6, the sex-related part is from the command and the input file pedigree.txt:

AlphaPeel -genotypes test6/genotypes.txt \
                          -seqfile test6/seqfile.txt \
                          -pedigree test6/pedigree.txt \
                          -runType multi \
                          -calling_threshold .1 \
                          -sexchrom \
                          -out test6/outputs/output

where -sexchrom indicates that this is a sex chromosome and

genotypes_0 0 0 M
genotypes_2 0 0 F
genotypes_M genotypes_0 genotypes_2 M 
genotypes_F genotypes_0 genotypes_2 F
sequence_0 0 0 M
sequence_2 0 0 F
sequence_M0 sequence_0 sequence_2 M 
sequence_F0 sequence_0 sequence_2 F
sequence_M9 sequence_0 sequence_2 M 
sequence_F9 sequence_0 sequence_2 F

gives the sex information.

@XingerTang I understand, BUT have you looked if the input and output make sense in the context of sex chromosome - females have two copies of X chromosome so they can have allele dosage of 0, 1, or 2, while males have one copy of X, so they can have allele dosage of 0 or 1. At the moment, we ignore the PAR (pseudo-autosomal region) of the Y chromosome, where males can have allele dosage of 0, 1, or 2 (will open an issue for this for future work - not a priority now).

I checked the input genotypes and the sequence file, and almost every entry is filled with 0 or 9 except the genotypes_2, which has 2 on every locus, but the individual is a female, so the input data is fine. However, the output data (both the output generated by AlphaPeel and the expected output written a long time ago) contains male individuals with an allele dosage of 2. The AlphaPeel documentation marked the -sexchrom as an experimental option, so it might be still under development and requires further refinement.

@gregorgorjanc
Copy link
Member

I checked the input genotypes and the sequence file, and almost every entry is filled with 0 or 9 except the genotypes_2, which has 2 on every locus, but the individual is a female, so the input data is fine. However, the output data (both the output generated by AlphaPeel and the expected output written a long time ago) contains male individuals with an allele dosage of 2. The AlphaPeel documentation marked the -sexchrom as an experimental option, so it might be still under development and requires further refinement.

@XingerTang ok, I propose to get this PR in and we open a new issue to address the sexchrom output, but I think this also raises the fact that we don't have sufficient information about these tests to really trust them.

@gregorgorjanc
Copy link
Member

@XingerTang resolve the stuff mentioned in #63 (comment) and I will merge this PR.

@gregorgorjanc
Copy link
Member

Merging this in to move to the next issues, some discovered as part of this PR work.

@gregorgorjanc gregorgorjanc merged commit 466e303 into AlphaGenes:devel Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants