Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ncbi tests #86

Merged
merged 22 commits into from
Apr 26, 2021
Merged

Ncbi tests #86

merged 22 commits into from
Apr 26, 2021

Conversation

Anwesh1
Copy link
Contributor

@Anwesh1 Anwesh1 commented Apr 20, 2021

Added some tests for the ncbi.py file. Most of the functions return generator objects in the ncbi file. I was not exactly sure how to test for a few of the functions, so I checked if they were actually returning generator objects by using isinstance. Let me know how we could possibly improve this.

@Anwesh1 Anwesh1 closed this Apr 20, 2021
@Anwesh1 Anwesh1 reopened this Apr 20, 2021
@JohnGiorgi
Copy link
Collaborator

JohnGiorgi commented Apr 20, 2021

HI @Anwesh1,

Great! This is a good first start.

Testing whether the functions return an object of the expected type is good, but not sufficient. Ideally, you should also figure out what the expected output should be for the input you created (often this is stored in a variable called expected). Then you would call the function with the input (often storing the result in a variable called actual) and finally check if assert actual == expected.

Beyond this, it's good to craft tests that try to capture edge cases. E.g. for a function that expects a list as input, what happens when you pass the empty list? For any functions that raise errors, can you purposely trigger the error with a test? (you did this in #84) etc. etc. Each of these tests should go in their own function with a descriptive name, e.g. test_parse_medline_empty_list

tests/test_ncbi.py Outdated Show resolved Hide resolved
tests/test_ncbi.py Outdated Show resolved Hide resolved
tests/test_ncbi.py Outdated Show resolved Hide resolved
Anwesh1 and others added 3 commits April 20, 2021 17:13
Co-authored-by: John Giorgi <[email protected]>
Co-authored-by: John Giorgi <[email protected]>
Co-authored-by: John Giorgi <[email protected]>
@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 22, 2021

Hey @JohnGiorgi, I have a quick question. Most of these functions respond with a generator object, which looks like this:
<generator object parse at 0x7f562cfe9900>. This is our actual response, however, while comparing with our expected response, we get the same thing but the address always changes. How do we check if they are the same if the object address keeps changing?
Issue:

image

@JohnGiorgi
Copy link
Collaborator

Hi @Anwesh1,

I would get the response from the function and convert it to a list, e.g.

actual = list(my_function(...))

I would craft expected as a list as well, then assert actual == expected.

@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 22, 2021

Hey @JohnGiorgi,

That still wouldn't fix the error because the addresses won't be a match right? It will just convert the comparing to a list.

@JohnGiorgi
Copy link
Collaborator

JohnGiorgi commented Apr 22, 2021

The reason you are getting errors above with == is because you are trying to check equivalence between two generators. Casting actual and expected to lists should fix this.

More broadly, I think you may be mixing up is and ==. is checks identity (memory addresses) and == checks for equivalence. E.g.

print(1.0 == 1)  # True
print(1.0 is 1)  # False

We don't want to check that the memory addresses are the same (they won't be). We want to check that the actual output of our function is equivalent to the output we expect, hence assert actual == expected.

@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 22, 2021

I think I'm approaching the problem wrong. I thought we were supposed to provide the function with the expected response, so I put in the string we expect. This is what my function looks like:

def test_get_eutil_records():
    eutil = "efetch"
    id = "9887103"
    expected_response = list('<generator object parse at 0x7f562cfe9900>')
    actual_response = list(str(_get_eutil_records(eutil,id)))
    assert isinstance(_get_eutil_records(eutil, id), types.GeneratorType)
    assert actual_response == expected_response

@JohnGiorgi
Copy link
Collaborator

JohnGiorgi commented Apr 22, 2021

I think I'm approaching the problem wrong. I thought we were supposed to provide the function with the expected response, so I put in the string we expect. This is what my function looks like:

def test_get_eutil_records():
    eutil = "efetch"
    id = "9887103"
    expected_response = list('<generator object parse at 0x7f562cfe9900>')
    actual_response = list(str(_get_eutil_records(eutil,id)))
    assert isinstance(_get_eutil_records(eutil, id), types.GeneratorType)
    assert actual_response == expected_response

Hmm yeah, there are a couple of weird things happening here:

  • list('<generator object parse at 0x7f562cfe9900>') this is not correct, you need to actually figure out what returned value should be
  • list(str(_get_eutil_records(eutil,id))) not sure why the output is being cast to a string?

Here is a super stripped down example of a unit test:

def square(x):
    return x ** 2

def test_square():
    test_input = 2
    expected = 4
    actual = square(test_input)
    assert actual == expected

Does that help? You need to:

  1. Create an input for your function (test_input)
  2. Determine what the output should be (expected)
  3. Call the function on the input (actual = square(test_input))
  4. Assert the functions output matches the expected output (assert actual == expected)

@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 22, 2021

Hey @JohnGiorgi,

I believe I am doing it in the steps you described. Where the test_input are the eutil and the id variable.
The expected is this -> <generator object parse at 0x7f562cfe9900>
I make actual equal to this -> _get_eutil_records(eutil,id)
Then I compare them both.

The function only returns the address of the parsed generator object and not the actual object itself.

That's why when I compare them now, the only difference is the addresses and everything else works:

image

@JohnGiorgi
Copy link
Collaborator

JohnGiorgi commented Apr 22, 2021

Hmm, this isn't following the steps we worked out though? You are still comparing addresses here which is not the idea.

We want to take whatever the function returns (a generator) and cast it to a list. Then we are going to craft an expected object that is also a list. Then we are going to assert they are equal.

@JohnGiorgi
Copy link
Collaborator

JohnGiorgi commented Apr 22, 2021

Maybe if you push what you have I can comment on it directly?

In general, copy-pasting code into a comment or pushing it is better than screenshots. The screenshot above for example doesn't show how actual_response or expected_response are generated.

@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 22, 2021

I pushed what I have. Let me know if you find what I can fix.

def test_get_eutil_records():
eutil = "efetch"
id = "9887103"
expected_response = '<generator object parse at 0x7f562cfe9900>'
Copy link
Collaborator

@JohnGiorgi JohnGiorgi Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct. You need to figure out what the output should be (after casting it to a list) and set expected_response to that. This is just the string representation of the object.

Copy link
Contributor Author

@Anwesh1 Anwesh1 Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected list is something like this: (I got it from printing the response of the function)

[{'PMID': '9887103', 'OWN': 'NLM', 'STAT': 'MEDLINE', 'DCOM': '19990225', 'LR': '20190516', 'IS': '0890-9369 (Print) 0890-9369 (Linking)', 'VI': '13', 'IP': '1', 'DP': '1999 Jan 1', 'TI': 'The Drosophila activin receptor baboon signals through dSmad2 and controls cell proliferation but not patterning during larval development.', 'PG': '98-111', 'AB': 'The TGF-beta superfamily of growth and differentiation factors, including TGF-beta, Activins and bone morphogenetic proteins (BMPs) play critical roles in regulating the development of many organisms. These factors signal through a heteromeric complex of type I and II serine/threonine kinase receptors that phosphorylate members of the Smad family of transcription factors, thereby promoting their nuclear localization. Although components of TGF-beta/Activin signaling pathways are well defined in vertebrates, no such pathway has been clearly defined in invertebrates. In this study we describe the role of Baboon (Babo), a type I Activin receptor previously called Atr-I, in Drosophila development and characterize aspects of the Babo intracellular signal-transduction pathway. Genetic analysis of babo loss-of-function mutants and ectopic activation studies indicate that Babo signaling plays a role in regulating cell proliferation. In mammalian cells, activated Babo specifically stimulates Smad2-dependent pathways to induce TGF-beta/Activin-responsive promoters but not BMP-responsive elements. Furthermore, we identify a new Drosophila Smad, termed dSmad2, that is most closely related to vertebrate Smads 2 and 3. Activated Babo associates with dSmad2 but not Mad, phosphorylates the carboxy-terminal SSXS motif and induces heteromeric complex formation with Medea, the Drosophila Smad4 homolog. Our results define a novel Drosophila Activin/TGF-beta pathway that is analogous to its vertebrate counterpart and show that this pathway functions to promote cellular growth with minimal effects on patterning.', 'FAU': ['Brummel, T', 'Abdollah, S', 'Haerry, T E', 'Shimell, M J', 'Merriam, J', 'Raftery, L', 'Wrana, J L', "O'Connor, M B"], 'AU': ['Brummel T', 'Abdollah S', 'Haerry TE', 'Shimell MJ', 'Merriam J', 'Raftery L', 'Wrana JL', "O'Connor MB"], 'AD': ['Department of Molecular Biology and Biochemistry, University of California, Irvine, California 92697, USA.'], 'LA': ['eng'], 'SI': ['GENBANK/AF101386'], 'GR': ['GM47462/GM/NIGMS NIH HHS/United States'], 'PT': ['Journal Article', "Research Support, Non-U.S. Gov't", "Research Support, U.S. Gov't, P.H.S."], 'PL': 'United States', 'TA': 'Genes Dev', 'JT': 'Genes & development', 'JID': '8711660', 'RN': ['0 (Bone Morphogenetic Proteins)', '0 (DNA-Binding Proteins)', '0 (Drosophila Proteins)', '0 (RNA, Messenger)', '0 (Receptors, Growth Factor)', '0 (Smad2 Protein)', '0 (Trans-Activators)', 'EC 2.7.11.30 (Activin Receptors)', 'EC 2.7.11.30 (Activin Receptors, Type I)', 'EC 2.7.11.30 (Babo protein, Drosophila)'], 'SB': 'IM', 'MH': ['Activin Receptors', 'Activin Receptors, Type I', 'Amino Acid Sequence', 'Animals', 'Bone Morphogenetic Proteins/genetics', 'Cell Division', 'Cloning, Molecular', 'DNA-Binding Proteins/chemistry/*genetics', 'Drosophila/*embryology', 'Drosophila Proteins', 'Gene Expression Regulation, Developmental', 'In Situ Hybridization', 'Larva/genetics/*growth & development', 'Molecular Sequence Data', 'Phosphorylation', 'RNA, Messenger/genetics', 'Receptors, Growth Factor/*genetics/metabolism', 'Sequence Alignment', 'Sequence Analysis, DNA', 'Signal Transduction/*physiology', 'Smad2 Protein', 'Trans-Activators/chemistry/*genetics', 'Wings, Animal/growth & development'], 'PMC': 'PMC316373', 'EDAT': '1999/01/14 00:00', 'MHDA': '1999/01/14 00:01', 'CRDT': ['1999/01/14 00:00'], 'PHST': ['1999/01/14 00:00 [pubmed]', '1999/01/14 00:01 [medline]', '1999/01/14 00:00 [entrez]'], 'AID': ['10.1101/gad.13.1.98 [doi]'], 'PST': 'ppublish', 'SO': 'Genes Dev. 1999 Jan 1;13(1):98-111. doi: 10.1101/gad.13.1.98.'}]

Now I made that equal to the expected. Then I made actual equal to what you had said (list(_get_eutil_records(eutil,id))). However, this still triggers an error on the pytest test. The error looks like this:

>       assert actual_response == expected_response
E       assert [{'<?xm': ['v...icle>']}, ...] == [{'AB': 'The ...', ...], ...}]
E         At index 0 diff: {'<?xm': ['version="1.0" ?>'], '<!DO': ['YPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2019//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">'], '<Pub': ['dArticleSet>', 'dArticle>'], '': ['edlineCitation Status="MEDLINE" Owner="NLM">', '  <PMID Version="1">9</PMID>', '  <DateCompleted>', '      <Year>1976</Year>', '      <Month>01</Month>', '      <Day>23</Day>', '  </DateCompleted>', '  <DateRevised>', '      <Year>2019</Year>', '      <Month>06</Month>', '      <Day>23</Day>', '  </DateRevised>', '  <Article PubM...
E         
E         ...Full output truncated (3 lines hidden), use '-vv' to show

tests/test_ncbi.py:42: AssertionError

This is my updated function:

def test_get_eutil_records():
    eutil = "efetch"
    id = "9887103"
    expected_response = [{'PMID': '9887103', 'OWN': 'NLM', 'STAT': 'MEDLINE', 'DCOM': '19990225', 'LR': '20190516', 'IS': '0890-9369 (Print) 0890-9369 (Linking)', 'VI': '13', 'IP': '1', 'DP': '1999 Jan 1', 'TI': 'The Drosophila activin receptor baboon signals through dSmad2 and controls cell proliferation but not patterning during larval development.', 'PG': '98-111', 'AB': 'The TGF-beta superfamily of growth and differentiation factors, including TGF-beta, Activins and bone morphogenetic proteins (BMPs) play critical roles in regulating the development of many organisms. These factors signal through a heteromeric complex of type I and II serine/threonine kinase receptors that phosphorylate members of the Smad family of transcription factors, thereby promoting their nuclear localization. Although components of TGF-beta/Activin signaling pathways are well defined in vertebrates, no such pathway has been clearly defined in invertebrates. In this study we describe the role of Baboon (Babo), a type I Activin receptor previously called Atr-I, in Drosophila development and characterize aspects of the Babo intracellular signal-transduction pathway. Genetic analysis of babo loss-of-function mutants and ectopic activation studies indicate that Babo signaling plays a role in regulating cell proliferation. In mammalian cells, activated Babo specifically stimulates Smad2-dependent pathways to induce TGF-beta/Activin-responsive promoters but not BMP-responsive elements. Furthermore, we identify a new Drosophila Smad, termed dSmad2, that is most closely related to vertebrate Smads 2 and 3. Activated Babo associates with dSmad2 but not Mad, phosphorylates the carboxy-terminal SSXS motif and induces heteromeric complex formation with Medea, the Drosophila Smad4 homolog. Our results define a novel Drosophila Activin/TGF-beta pathway that is analogous to its vertebrate counterpart and show that this pathway functions to promote cellular growth with minimal effects on patterning.', 'FAU': ['Brummel, T', 'Abdollah, S', 'Haerry, T E', 'Shimell, M J', 'Merriam, J', 'Raftery, L', 'Wrana, J L', "O'Connor, M B"], 'AU': ['Brummel T', 'Abdollah S', 'Haerry TE', 'Shimell MJ', 'Merriam J', 'Raftery L', 'Wrana JL', "O'Connor MB"], 'AD': ['Department of Molecular Biology and Biochemistry, University of California, Irvine, California 92697, USA.'], 'LA': ['eng'], 'SI': ['GENBANK/AF101386'], 'GR': ['GM47462/GM/NIGMS NIH HHS/United States'], 'PT': ['Journal Article', "Research Support, Non-U.S. Gov't", "Research Support, U.S. Gov't, P.H.S."], 'PL': 'United States', 'TA': 'Genes Dev', 'JT': 'Genes & development', 'JID': '8711660', 'RN': ['0 (Bone Morphogenetic Proteins)', '0 (DNA-Binding Proteins)', '0 (Drosophila Proteins)', '0 (RNA, Messenger)', '0 (Receptors, Growth Factor)', '0 (Smad2 Protein)', '0 (Trans-Activators)', 'EC 2.7.11.30 (Activin Receptors)', 'EC 2.7.11.30 (Activin Receptors, Type I)', 'EC 2.7.11.30 (Babo protein, Drosophila)'], 'SB': 'IM', 'MH': ['Activin Receptors', 'Activin Receptors, Type I', 'Amino Acid Sequence', 'Animals', 'Bone Morphogenetic Proteins/genetics', 'Cell Division', 'Cloning, Molecular', 'DNA-Binding Proteins/chemistry/*genetics', 'Drosophila/*embryology', 'Drosophila Proteins', 'Gene Expression Regulation, Developmental', 'In Situ Hybridization', 'Larva/genetics/*growth & development', 'Molecular Sequence Data', 'Phosphorylation', 'RNA, Messenger/genetics', 'Receptors, Growth Factor/*genetics/metabolism', 'Sequence Alignment', 'Sequence Analysis, DNA', 'Signal Transduction/*physiology', 'Smad2 Protein', 'Trans-Activators/chemistry/*genetics', 'Wings, Animal/growth & development'], 'PMC': 'PMC316373', 'EDAT': '1999/01/14 00:00', 'MHDA': '1999/01/14 00:01', 'CRDT': ['1999/01/14 00:00'], 'PHST': ['1999/01/14 00:00 [pubmed]', '1999/01/14 00:01 [medline]', '1999/01/14 00:00 [entrez]'], 'AID': ['10.1101/gad.13.1.98 [doi]'], 'PST': 'ppublish', 'SO': 'Genes Dev. 1999 Jan 1;13(1):98-111. doi: 10.1101/gad.13.1.98.'}]
    actual_response = list(_get_eutil_records(eutil,id))
    assert isinstance(_get_eutil_records(eutil, id), types.GeneratorType)
    assert actual_response == expected_response

Copy link
Collaborator

@JohnGiorgi JohnGiorgi Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this means actual does not equal expected. Either expected is wrong or actual is wrong. It is helpful to run pytest with the -vv argument to get more information on where they differ

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the actual value and replaced it and now it seems to work. I was getting the wrong list in the beginning. 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Is it easy to update the other unit tests now?

tests/test_ncbi.py Outdated Show resolved Hide resolved
tests/test_ncbi.py Outdated Show resolved Hide resolved
@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 23, 2021

Hey @JohnGiorgi, thanks again for a great co-op experience. I learned a lot working with you and hopefully, we get a chance to do it again. All my updates should be on this PR.

@Anwesh1
Copy link
Contributor Author

Anwesh1 commented Apr 24, 2021

Added testing for expected and actual responses on all the functions of ncbi.py.

@JohnGiorgi
Copy link
Collaborator

@Anwesh1 Awesome, I am glad you learned a lot, that is the most important thing! It was a pleasure working with you. I wish you all the best in the future!

@JohnGiorgi JohnGiorgi merged commit 657716e into develop Apr 26, 2021
@JohnGiorgi JohnGiorgi deleted the ncbi_tests branch April 26, 2021 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants