CSVs with the datasets from ComparativeDeerAnalyzer #47

Open: wants to merge 2 commits into base: main


MichaelLantz commented Oct 26, 2021

Fixes mindsdb/lightwood#673

Please describe what changes you made, in as much detail as possible

  • Added the ComparativeDeerAnalyzer distribution CSV from the data sources mentioned in the original issue

benchmarks

Distribution dataset from ComparativeDeerAnalyzer (based on the supplementary information referenced in the research paper George linked)
George3d6 (Contributor) commented

Could you consolidate them into one file, or would the dataset no longer make sense then? If they can be consolidated, could you also write an info.py file (similar to the other datasets) describing the target and an appropriate accuracy function?

Generate a single-sample CSV from the DEERNet example for use in benchmarks.
MichaelLantz (Author) commented Oct 31, 2021

Hi @George3d6,

If you want a sense of what was involved in getting from the .DTA/.DSC files to .DAT and eventually to a CSV: the zipped .xlsx file has embedded Power Query transformations that split the data out of the output structure (referenced in the doc). CDA generated text files (.DATs), but they required adding headers and general cleanup. Similar to the DEERNet library used by Spinach, this data was also broken up into multiple chunks (in this case, distribution and fit). I extracted a single CSV using a fit example, which I think is what would be used as a target benchmark.

That said, the .xlsx can also output the remaining two chunks and any other sample (distribution and fit) if needed.

Feel free to dig in. Hopefully this output is useful as a benchmark and is generally what we're after.
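For anyone who would rather reproduce the .DAT-to-CSV cleanup step (adding headers and dropping non-data lines) without Power Query, here is a minimal Python sketch. It assumes the .DAT files are plain whitespace-delimited numeric columns and that lines starting with `%` or `#` are comments; the function name `dat_to_csv` and the column names `distance_nm`/`probability` are hypothetical placeholders, not the headers or tooling actually used in this PR.

```python
import csv
import io
from typing import Sequence

# Hypothetical header for a distribution chunk; not the PR's actual column names.
DISTRIBUTION_HEADER = ["distance_nm", "probability"]


def dat_to_csv(dat_text: str, header: Sequence[str]) -> str:
    """Convert whitespace-delimited numeric rows into CSV text with a header row.

    Assumption: blank lines and lines starting with '%' or '#' are comments.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for line in dat_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("%", "#")):
            continue  # skip blank and comment lines
        fields = stripped.split()
        if len(fields) != len(header):
            raise ValueError(
                f"expected {len(header)} columns, got {len(fields)}: {line!r}"
            )
        for value in fields:
            float(value)  # raises ValueError if a field is not numeric
        writer.writerow(fields)  # keep the original numeric text unchanged
    return out.getvalue()


if __name__ == "__main__":
    sample = "% r  P(r)\n1.5 0.02\n2.0 0.10\n"
    print(dat_to_csv(sample, DISTRIBUTION_HEADER), end="")
```

A fit chunk would go through the same function with its own (again hypothetical) header; validating column counts up front makes malformed rows fail loudly instead of silently shifting columns in the benchmark CSV.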

Labels: none yet
Projects: Status: to review
Linked issue (may be closed by merging this pull request): CSV with the dataset from the deernet paper
Participants: 2