U/gblackadder/pydantic and excel #1

Merged: 55 commits, Sep 24, 2024
Changes from 51 commits:
86cfd7f
setup poetry and pre-commit
Jul 2, 2024
0f561ae
define pydantic schemas
Jul 2, 2024
0a89f1d
turn unicode apostrophe to ascii
Jul 2, 2024
d8eda9c
type is integer, not type number - format integer
Jul 2, 2024
be56cb1
rename so that - become _
Jul 2, 2024
990a9b5
change keywords from an optional keyword (which is then itself a list…
Jul 2, 2024
1cfee8a
update document keyword definition
Jul 2, 2024
3eadb01
Merge branch 'main' into u/gblackadder/pydantic_and_excel
Jul 5, 2024
68e60c8
move to definitions folder
Jul 9, 2024
dbc379c
git ignore
Jul 10, 2024
fc4da8f
Merge branch 'main' into u/gblackadder/pydantic_and_excel
Jul 10, 2024
a100924
update pydantic after merging master
Jul 10, 2024
f112767
array must be of something
Jul 12, 2024
7dee00c
updated for latest schema
Jul 12, 2024
657d200
function to make skeleton of schemas
Jul 12, 2024
190a82a
write and read simple pydantic to excel
Jul 16, 2024
850daae
docstring for write_simple_pydantic_to_sheet
Jul 16, 2024
bf6055c
work with multinesting where we use multiindexs
Jul 17, 2024
235045b
handle list and lists of lists
Jul 19, 2024
b7d32a8
seperate out top level simple fields
Jul 22, 2024
131def1
write metadata over many sheets
Jul 23, 2024
87ed4b1
read metadata written over many sheets
Jul 23, 2024
c636616
write dictionaries of str to str
Jul 23, 2024
bc6f3c3
specift the array as an array of str
Jul 23, 2024
e80c414
read dictionaries
Jul 23, 2024
6f66bdc
make status an array of strings not just an array
Jul 24, 2024
9cdced6
collapse root models
Jul 24, 2024
eac8caa
resolve bugs with making excel sheets of skeletons
Jul 24, 2024
9e474e3
empty excel sheets for metadata
Jul 24, 2024
5b5fc42
rewrite the excel to pydantic
Jul 29, 2024
e4f250c
allow users to add columns
Jul 30, 2024
e7c49d8
excel interface
Jul 30, 2024
7c3845f
text excel interface
Jul 30, 2024
6cb2957
prepare for build
Aug 1, 2024
0d1cf89
simplify interface
Aug 7, 2024
8529028
bug fix: final list element was shaded and locked
Aug 8, 2024
b83fcc4
inflate read data to schema
Aug 8, 2024
6949a28
marginal speedup
Aug 8, 2024
7857924
make unknown fields raise an error
Aug 9, 2024
a29291c
create automatic excel file creation
Aug 9, 2024
833ecd5
updated excel outlines
Aug 9, 2024
4c73e49
include versioning and automatic type detection
Aug 9, 2024
cefd80b
100X speed up excel writers
Aug 13, 2024
5db1610
add readme
Aug 22, 2024
8acb008
implement template to pydantic
Aug 26, 2024
d762c3f
generalise excel interface to schema interface, fix bugs in template …
Aug 28, 2024
05f04a4
include resource, delete Series, remove template stuff but make the m…
Sep 5, 2024
04f4e22
resetting version number
Sep 5, 2024
16785ec
Merge branch 'main' into u/gblackadder/pydantic_and_excel
Sep 5, 2024
ab6a78d
update pydantic and excel sheets after merging
Sep 5, 2024
82d57ce
include ref to excel sheet folder
Sep 5, 2024
ddce25f
fix bugs when working with templates
Sep 6, 2024
742c114
rename surveys, timeseries and timeseries_db
Sep 13, 2024
5d27568
Merge branch 'main' into u/gblackadder/pydantic_and_excel
Sep 13, 2024
a98d5c1
following merge from main - apply schema changes to pydantic and excel
Sep 13, 2024
15 changes: 15 additions & 0 deletions .gitignore
@@ -0,0 +1,15 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# Unit test
.pytest_cache/

# Environments
.venv

# Visual Studio Code
.vscode/

# Environment variables
.env
22 changes: 22 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,22 @@
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        exclude: package.lock.json
        args: ["--exclude-lines", "\\s*\"image/png\": \".+\""]

  - repo: https://github.com/pre-commit/mirrors-isort
    rev: v5.10.1 # Use the latest version
    hooks:
      - id: isort

  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: v0.0.287 # Use the latest version
    hooks:
      - id: ruff

  - repo: https://github.com/psf/black
    rev: 23.3.0 # Use the latest version
    hooks:
      - id: black
97 changes: 95 additions & 2 deletions README.md
@@ -1,4 +1,97 @@
# metadata-schemas
Metadata JSON Schemas
This repository contains both the definitions of the metadata schemas and a Python library for creating schema objects with pydantic and Excel.

View documentation - https://worldbank.github.io/metadata-schemas/
## Defining Metadata Schemas

The schemas are defined in the JSON Schema format in the folder `schemas`. For more information you can view documentation at https://worldbank.github.io/metadata-schemas/

## Excel

Excel sheets formatted for each metadata type are located in the `excel_sheets` folder of this repository.

## Python library

To install the library, run:

```pip install metadataschemas```

### Creating a pydantic metadata object

To create a timeseries metadata object, run:

```python
from metadataschemas import timeseries_schema

timeseries_metadata = timeseries_schema.TimeseriesSchema(
    idno='project_idno',
    series_description=timeseries_schema.SeriesDescription(
        idno='project_idno',
        name='project_name',
    ),
)
```

Depending on your IDE, selecting `TimeseriesSchema` could show you what fields the schema contains and their corresponding object definitions.

There are metadata objects for each of the following metadata types:

| Metadata Type | Metadata Object |
|------------------|-------------------------------------------------|
| document | `document_schema.ScriptSchemaDraft` |
| geospatial | `geospatial_schema.GeospatialSchema` |
| script | `script_schema.ResearchProjectSchemaDraft` |
| series | `series_schema.Series` |
| survey | `microdata_schema.MicrodataSchema` |
| table | `table_schema.Model` |
| timeseries | `timeseries_schema.TimeseriesSchema` |
| timeseries_db | `timeseries_db_schema.TimeseriesDatabaseSchema` |
| video | `video_schema.Model` |
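
For scripting, the table above can also be kept as a plain mapping. The sketch below is only an illustration: the dotted paths are copied from the table, while `METADATA_OBJECTS`, `split_object_path`, and `load_metadata_class` are hypothetical names that are not part of the library. It assumes each schema module can be imported as `metadataschemas.<module>`, as the `from metadataschemas import timeseries_schema` example suggests.

```python
from importlib import import_module

# Dotted paths copied from the table above, relative to the package.
METADATA_OBJECTS = {
    "document": "document_schema.ScriptSchemaDraft",
    "geospatial": "geospatial_schema.GeospatialSchema",
    "script": "script_schema.ResearchProjectSchemaDraft",
    "series": "series_schema.Series",
    "survey": "microdata_schema.MicrodataSchema",
    "table": "table_schema.Model",
    "timeseries": "timeseries_schema.TimeseriesSchema",
    "timeseries_db": "timeseries_db_schema.TimeseriesDatabaseSchema",
    "video": "video_schema.Model",
}


def split_object_path(metadata_type):
    """Return (module_name, class_name) for a metadata type in the table."""
    module_name, class_name = METADATA_OBJECTS[metadata_type].rsplit(".", 1)
    return module_name, class_name


def load_metadata_class(metadata_type, package="metadataschemas"):
    """Import the schema module and return the class (assumes the package is installed)."""
    module_name, class_name = split_object_path(metadata_type)
    module = import_module(f"{package}.{module_name}")
    return getattr(module, class_name)
```

With the library installed, `load_metadata_class("timeseries")` would be expected to return the same class as `timeseries_schema.TimeseriesSchema`.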

### Python - Metadata Manager

The `MetadataManager` provides an interface to Excel and offers light assistance in creating schema objects.

With Excel, it can:

1. Create blank Excel files formatted for a given metadata type
2. Write metadata objects to Excel
3. Read an appropriately formatted Excel file containing metadata into a pydantic metadata object

To use it run:

```python
from metadataschemas import MetadataManager

mm = MetadataManager()

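# Create a blank Excel file formatted for timeseries metadata
outline_filename = mm.write_metadata_outline_to_excel('timeseries')

# Write an existing metadata object to Excel
timeseries_metadata_filename = mm.save_metadata_to_excel('timeseries',
                                                         object=timeseries_metadata)

# Then, after you have updated the metadata in the Excel file, read it back

updated_timeseries_metadata = mm.read_metadata_from_excel(timeseries_metadata_filename)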
```

Note that the Excel write and save functions do not currently support Geospatial metadata.
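
As a rough sketch of how the write and read steps fit together, the helper below writes an object to Excel and immediately reads it back, refusing geospatial metadata up front. `excel_round_trip` is a hypothetical name, not a library function. It assumes only the two manager methods shown above and that the manager identifies the unsupported type by the name `"geospatial"`.

```python
def excel_round_trip(manager, metadata_type, metadata):
    """Write metadata to Excel via the manager, then read it straight back."""
    # Assumption: the manager uses the name "geospatial" for the unsupported type.
    if metadata_type == "geospatial":
        raise ValueError("Excel export does not currently support geospatial metadata")
    # Method names and the `object=` keyword follow the README example above.
    filename = manager.save_metadata_to_excel(metadata_type, object=metadata)
    return manager.read_metadata_from_excel(filename)
```

Called as `excel_round_trip(mm, 'timeseries', timeseries_metadata)`, this can serve as a quick smoke test that an object survives the trip through Excel before anyone edits the file by hand.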

The manager also offers a convenient way to start creating metadata in pydantic: it can create an empty pydantic object for a given metadata type, which you can then update as needed.

```python
# list the supported metadata types
mm.metadata_type_names

# get the pydantic class for a given metadata type
survey_type = mm.metadata_class_from_name("survey")

# create an instantiated pydantic object and then fill in your data
survey_metadata = mm.type_to_outline(metadata_type="survey")
survey_metadata.repositoryid = "repository id"
survey_metadata.study_desc.title_statement.idno = "project_idno"
```
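
The nested assignment above works because the outline arrives with its nested objects already instantiated. The sketch below illustrates that idea with standard-library dataclasses as stand-ins; these are not the library's pydantic classes, and only the field names `repositoryid`, `study_desc`, `title_statement`, and `idno` are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TitleStatement:
    idno: Optional[str] = None


@dataclass
class StudyDesc:
    # A default_factory gives every outline its own empty nested object.
    title_statement: TitleStatement = field(default_factory=TitleStatement)


@dataclass
class SurveyOutline:
    repositoryid: Optional[str] = None
    study_desc: StudyDesc = field(default_factory=StudyDesc)


# An "outline": every level exists, every leaf is empty.
outline = SurveyOutline()
outline.repositoryid = "repository id"
outline.study_desc.title_statement.idno = "project_idno"
```

Without pre-built nested objects, `outline.study_desc` would be `None` and the last assignment would raise an `AttributeError`.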


## Updating Pydantic definitions and Excel sheets

To update the pydantic schemas so that they match the latest JSON schemas, run:

`python pydantic_schemas/generators/generate_pydantic_schemas.py`

Then, to update the Excel sheets, run:

`python pydantic_schemas/generators/generate_excel_files.py`
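
Because the Excel sheets are generated after the pydantic definitions, the order of the two commands matters. A minimal sketch of a wrapper that runs both in sequence is shown here; `regenerate` and `GENERATORS` are hypothetical names, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys

# Generator scripts from the README, in the order they should run.
GENERATORS = [
    "pydantic_schemas/generators/generate_pydantic_schemas.py",
    "pydantic_schemas/generators/generate_excel_files.py",
]


def regenerate(run=subprocess.run):
    """Run each generator in order; check=True stops at the first failure."""
    for script in GENERATORS:
        run([sys.executable, script], check=True)


# From the repository root:
# regenerate()
```

The `run` parameter exists only so the command sequence can be checked without executing the generators.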
Binary file added excel_sheets/Document_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Resource_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Script_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Survey_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Table_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Timeseries_db_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Timeseries_metadata.xlsx
Binary file not shown.
Binary file added excel_sheets/Video_metadata.xlsx
Binary file not shown.