Add import and export of OAISTree #5
Comments
The structure is not defined yet; it's just a rough idea I had, inspired by @petermr's CTree structure. I definitely want to talk with some of the Dataverse devs about whether this would work, both right now and in the long run. The general idea is that the filenames and the structure of the folders and files alone tell you what should or can be inside and how to treat the different files. For example, every dataverse folder must have a metadata file, and the same goes for datasets. The content of the metadata file does not have to be strictly defined, but it will most likely have mandatory attributes. This can then be used to create a local export that is independent of OS and programming language and that can also be used by humans. Here is my first draft of the structure:
Naming Conventions:
├── dv_harvard/
│   ├── metadata.json
│   └── dv_iqss/
│       ├── metadata.json
│       └── ds_microcensus-2018/
│           ├── metadata.json
│           └── datafiles/
│               ├── documentation.pdf
│               └── data.csv
│                   └── metadata.json
└── dv_aussda/
    └── ds_survey-labour-2016/
        ├── metadata.json
        └── datafiles/
            ├── docs.pdf
            └── data.tsv
Some open questions:
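The rule from the draft above (every dataverse and dataset folder carries a metadata file) can be checked mechanically. Here is a minimal sketch, assuming the `dv_`/`ds_` prefixes and the `metadata.json` filename from the draft tree; the function name `find_missing_metadata` is hypothetical and not part of pyDataverse.

```python
from pathlib import Path


def find_missing_metadata(root):
    """Return the dv_*/ds_* directories under root that lack a metadata.json."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_dir():
            continue
        # Assumption from the draft: only dataverse (dv_) and dataset (ds_)
        # folders are required to contain a metadata.json file.
        is_dv_or_ds = path.name.startswith(("dv_", "ds_"))
        if is_dv_or_ds and not (path / "metadata.json").is_file():
            missing.append(path.relative_to(root).as_posix())
    return missing
```

Called on the root of an export, it returns the relative paths of the folders that break the convention; an empty list means the tree is consistent.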
|
@skasberger thanks for this great write up! I just posted on the (ancient) "Round tripping the contents of DVN" thread at https://groups.google.com/d/msg/dataverse-community/07h0Ca-Ai1I/qyq3l-lakc0J with a link to this issue. I'm hoping to spur some good discussion. 😄 |
As mentioned several times at the Dataverse Community Conference: BagIt seems to be very similar, and could be a good inspiration. https://en.wikipedia.org/wiki/BagIt |
After developing the first proof of concept, I recommend renaming it to OAISTree (after the Open Archival Information System), because the OAIS processes guide the directory structure and its conventions. Here is my current draft.
ROOT_DIR
├── YYYYMMDD_dataverses.csv
├── YYYYMMDD_datasets.csv
├── YYYYMMDD_datafiles.csv
├── terms-of-access.html
├── terms-of-use.html
├── PICKE_FILE.pickle: Pickle Files
└── OAISTrees/
    └── DATASET_ID/
        ├── DATASET_ID_history.json
        └── SIP/
            └── RAW_DATA_FILENAME
        └── AIP/
            ├── DATASET_ID_metadata.json
            └── DATASET_ID_DATAFILE_ID_metadata.json
        └── DIP/
            ├── terms-of-use.html
            └── terms-of-access.html
    └── DATASET_ID/
    └── DATASET_ID/
    └── DATASET_ID/
|
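To make the OAISTree draft above concrete, the per-dataset skeleton could be created along these lines. This is only a sketch: the folder names (`OAISTrees`, `SIP`, `AIP`, `DIP`) and the `DATASET_ID_history.json` filename come from the draft, while the function name `create_oaistree` and the empty-list history placeholder are assumptions, since the thread does not specify the history format.

```python
import json
from pathlib import Path


def create_oaistree(root_dir, dataset_id):
    """Create the SIP/AIP/DIP skeleton for one dataset and return its folder."""
    dataset_dir = Path(root_dir) / "OAISTrees" / dataset_id
    for package in ("SIP", "AIP", "DIP"):
        # exist_ok=True makes the call safe to repeat for the same dataset.
        (dataset_dir / package).mkdir(parents=True, exist_ok=True)
    history_file = dataset_dir / f"{dataset_id}_history.json"
    if not history_file.exists():
        # Assumption: the history format is undefined in the draft,
        # so an empty JSON list serves as a placeholder.
        history_file.write_text(json.dumps([]))
    return dataset_dir
```

Raw files would then go into `SIP/`, metadata files into `AIP/`, and the terms files into `DIP/`, following the draft.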
Is this dependent on https://en.wikipedia.org/wiki/Open_Archival_Information_System ? This is a standard and a community.
On Mon, Dec 2, 2019 at 2:05 PM Stefan Kasberger wrote:
After developing the first proof of concept, I recommend to rename it to
OAISTree (Open Archival Information System), cause the related processes
are the guidance for the directory structure and it's conventions. Here my
actual draft.
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
|
Hi Peter, not really dependent. It's a first proposal for a standardized folder/data structure for OAIS related data, which then can be used to convert data from and to different systems. We use Dataverse, but others use iRODS or other software solutions. |
Fine, so Dataverse encapsulates OAIS. The important thing in using standards is to be consistent with their specs, otherwise it causes confusion with software.
On Mon, Dec 2, 2019 at 2:20 PM Stefan Kasberger wrote:
Hi Peter,
not really dependent. It's a first proposal for a standardized folder/data
structure for OAIS related data, which then can be used to convert data
from and to different systems. We use Dataverse, but others use iRODS or
other software solutions.
|
@skasberger this looks great. Have you considered how to support dataverses of arbitrary depth? For example, the dataset below about test taking is in a dataverse called "JOPD", which is inside a dataverse called "Ubiquity Press":
For our "dataverse-sample-data" repo I ended up using nested directories to support this. The idea is that there can be an arbitrary number of "dataverses" directories to trigger the next level:
I tend to have the sample data loaded up at https://dev2.dataverse.org if you'd like to take a look. Here's the "sample data" repo: https://github.com/IQSS/dataverse-sample-data In it I use pyDataverse! Thanks! 😄 |
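As a rough illustration of the arbitrary-depth question, nesting can be recovered from the directory path alone. The sketch below assumes the `dv_`/`ds_` prefix convention from the first draft in this thread, not the layout of the dataverse-sample-data repo; the function name `dataset_ancestry` and any folder names passed to it are hypothetical.

```python
from pathlib import Path


def dataset_ancestry(root):
    """Map each ds_* folder to the chain of dv_* folders that contain it."""
    root = Path(root)
    ancestry = {}
    for path in root.rglob("ds_*"):
        if not path.is_dir():
            continue
        # Every dv_* component above the dataset is one level of nesting,
        # listed from the outermost dataverse to the innermost.
        parents = path.relative_to(root).parts[:-1]
        ancestry[path.name] = [part for part in parents if part.startswith("dv_")]
    return ancestry
```

Because the chain is read from the path, the depth is unbounded: a dataset three dataverses deep simply yields a three-element list.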
Some notes on how to develop the OAISTree. I already have some code for this running locally for AUSSDA purposes, so if you want to contribute, please get in touch with me first.
Workflow
Functionalities:
Development
|
As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python |
As I mentioned at IQSS/dataverse#5235 (comment) I'm curious if the "DVTree" (Dataverse Tree) format could be used to upload sample data to a brand new Dataverse installation for use in demos and usability testing.
I would love to see some docs. Or a pointer to the code for now. Thanks! 😄