Skip to content

duohub-ai/create-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Generate training data for crime entity extraction

Basic Usage

python scenarios.py
python json.py
python format.py
python hugging_face.py

Details

Scenarios

Generate crime scenarios using the scenarios.py script. Modify the SYSTEM_MESSAGE variable to change the prompt.

JSON

Extract entities from the scenarios using the json.py script.

Format

Format the JSON output into a JSONL file using the format.py script.

Hugging Face

Upload the JSONL file to Hugging Face using the hugging_face.py script.

Prerequisites

  • python
  • pip
  • huggingface-cli
  • boto3
  • datasets
  • huggingface_hub

You should have an AWS account with the necessary permissions to use Amazon Bedrock. You should have authenticated with AWS SSO using the aws configure command. Alternatively, you can use another LLM provider.

About

Create a synthetic dataset for fine-tuning LLMs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages