JSON RAG Integration

A tool for efficiently loading and integrating nested JSON data structures into RAG (Retrieval-Augmented Generation) systems, with enhanced entity tracking, relationship detection, and context preservation.

Key Features

Advanced Query Understanding:
- Temporal patterns (exact dates, relative ranges, named periods)
- Metric aggregations (average, maximum, minimum, sum, count)
- Entity relationships (direct, semantic, and cross-file connections)
- State transitions and system conditions
- Hybrid search combining vector similarity, relationships, and filters

Smart Data Processing:
- Automatic entity detection and relationship mapping
- Cross-file relationship detection and validation
- Key-value pair extraction for filtered searches
- Embedded metadata tracking
- Batch processing with change detection

Archetype-Aware Processing:
- Pattern detection (entities, events, metrics, collections)
- Archetype-based scoring and ranking
- Relationship validation by archetype
- Context-aware embedding generation
- Archetype-specific traversal strategies

Hierarchical Data Management:
- Full JSON structure preservation
- Parent-child relationship tracking
- Cross-file relationship mapping
- Contextual embedding with ancestry
- Path-based chunk identification

Enhanced Retrieval:
- Vector similarity search using PGVector
- Relationship-aware context assembly
- Entity-aware result filtering
- Cross-file context expansion
- Confidence-based scoring and ranking
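
To make "path-based chunk identification" and "contextual embedding with ancestry" concrete, here is a minimal sketch of how a nested JSON document could be split into path-addressed chunks that each remember their parent. The names `iter_chunks` and `ancestry` and the `$.key[index]` path syntax are assumptions made for illustration; they are not taken from the project's code.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_chunks(node: Any, path: str = "$", parent: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield one chunk per JSON node, keyed by its path and linked to its parent path."""
    is_container = isinstance(node, (dict, list))
    # Leaf nodes carry their value; containers only carry structure.
    yield {"path": path, "parent": parent, "value": None if is_container else node}
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_chunks(child, f"{path}.{key}", path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_chunks(child, f"{path}[{index}]", path)


def ancestry(path: str, chunks_by_path: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the chain of paths from the document root down to `path`."""
    chain = []
    current: Optional[str] = path
    while current is not None:
        chain.append(current)
        current = chunks_by_path[current]["parent"]
    return list(reversed(chain))


doc = {"order": {"id": "A1", "items": [{"sku": "X", "qty": 2}]}}
chunks = {chunk["path"]: chunk for chunk in iter_chunks(doc)}
print(ancestry("$.order.items[0].sku", chunks))
# ['$', '$.order', '$.order.items', '$.order.items[0]', '$.order.items[0].sku']
```

The ancestry chain is the kind of context that could be prepended to a chunk's text before embedding, so that a leaf such as `sku` is embedded together with the order and item it belongs to.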

Quick Start

Clone and install:

git clone https://github.com/Mocksi/json-rag.git
cd json-rag
python -m venv rag_env
source rag_env/bin/activate  # Windows: rag_env\Scripts\activate
pip install -r requirements.txt

Configure database:

# Update POSTGRES_CONN_STR in app/core/config.py (see Architecture below):
POSTGRES_CONN_STR = "dbname=myragdb user=your_user host=localhost port=5432"
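
The requirements list PostgreSQL with the PGVector extension, which has to be enabled in the target database before vectors can be stored. The sketch below shows one way to do that from Python. It assumes a DB-API connection such as the one returned by `psycopg2.connect`, a role with permission to create extensions, and a hypothetical helper name `ensure_pgvector`; the project may already handle this step itself.

```python
def ensure_pgvector(conn) -> None:
    """Enable the pgvector extension on an open DB-API connection (hypothetical helper)."""
    cur = conn.cursor()
    try:
        # "vector" is the extension name registered by pgvector.
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    finally:
        cur.close()
    conn.commit()


# Usage, assuming psycopg2 is installed and the database already exists:
#   import psycopg2
#   conn = psycopg2.connect("dbname=myragdb user=your_user host=localhost port=5432")
#   ensure_pgvector(conn)
#   conn.close()
```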

Set up environment:

# Create .env file with:
OPENAI_API_KEY=your-key-here
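
A quick way to confirm the key is picked up before running the app is sketched below. It uses only the standard library; `parse_env_text` and `require_api_key` are hypothetical names, and the project itself may load `.env` differently (for example through a dotenv package).

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: Dict[str, str]) -> str:
    """Return the API key from the parsed file or the process environment, or fail fast."""
    key = env.get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your-key-here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder")
    return key


# Usage, assuming a .env file in the current directory:
#   with open(".env", encoding="utf-8") as handle:
#       require_api_key(parse_env_text(handle.read()))
```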

Initialize and run:

python -m app.main --new  # Truncates all tables and starts fresh
python -m app.main        # Normal operation

Architecture

app/
├── analysis/           # Analysis and pattern detection
│   ├── archetype.py   # Pattern and archetype detection
│   └── relationships.py # Cross-file relationship analysis
├── core/              # Core system components
│   ├── config.py      # Configuration settings
│   └── models.py      # Data models
├── processing/        # Data processing modules
│   ├── json_parser.py # JSON structure parsing
│   ├── parsing.py     # Document parsing and chunking
│   └── processor.py   # Data processing pipeline
├── retrieval/         # Query processing and retrieval
│   ├── embedding.py   # Vector embedding generation
│   └── retrieval.py   # Query pipeline and execution
├── storage/           # Data persistence
│   └── database.py    # PostgreSQL and vector storage
├── utils/             # Utility modules
│   └── logging_config.py # Logging configuration
├── __init__.py        # Package initialization
├── chat.py           # Chat interface and interactions
└── main.py           # Application entry point

The codebase is organized into logical modules:

- analysis/: Modules for analyzing data patterns, cross-file relationships, and user intent
- core/: Core system configuration and shared components
- processing/: Data processing and relationship detection modules
- retrieval/: Relationship-aware search and context assembly
- storage/: Database interaction and relationship persistence
- utils/: Shared utility functions and helpers

Each module is designed to be independent with clear responsibilities, while working together through well-defined interfaces.

Installation Requirements

- Python 3.8 or higher
- PostgreSQL 12 or higher with PGVector extension
- OpenAI API key
- Required Python packages (see requirements.txt)

Documentation

The codebase features comprehensive inline documentation:

- Detailed module-level docstrings explaining key concepts
- Function and class documentation with examples
- Type hints and parameter descriptions
- Usage examples and implementation notes

Contributing

We welcome contributions! Please see our Contributing Guide (CONTRIBUTING.md) for details on:

- Setting up your development environment
- Code style guidelines
- Pull request process
- Development workflow

Code of Conduct

This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior.

License

MIT License - see LICENSE file for details.

Roadmap

Query Pipeline

The system implements a structured reasoning pipeline:

Query Analysis:
- Determines required data types
- Identifies needed operations (filtering, aggregation)
- Detects relationships and constraints

Plan Creation:
- Builds retrieval strategy
- Plans processing operations
- Determines result formatting

Execution:
- Retrieves relevant chunks
- Processes according to plan
- Assembles coherent response

This systematic approach ensures consistent and reliable query handling while preserving context and relationships.
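
The three stages above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the keyword map, the `key=value` filter syntax, the `Plan` dataclass, and the in-memory chunk list that stands in for vector retrieval. It shows the analyze, plan, execute shape only, not the project's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical keyword map covering the aggregation types listed under Key Features.
AGGREGATION_KEYWORDS = {
    "average": "avg",
    "maximum": "max",
    "minimum": "min",
    "sum": "sum",
    "count": "count",
}


@dataclass
class Plan:
    filters: Dict[str, str]
    aggregation: Optional[str]


def analyze(query: str) -> Dict[str, Any]:
    """Query analysis: detect an aggregation keyword and key=value constraints."""
    words = set(re.findall(r"\w+", query.lower()))
    aggregation = next((op for word, op in AGGREGATION_KEYWORDS.items() if word in words), None)
    filters = dict(re.findall(r"(\w+)=(\w+)", query))
    return {"aggregation": aggregation, "filters": filters}


def make_plan(analysis: Dict[str, Any]) -> Plan:
    """Plan creation: fix which filters and which operation execution will apply."""
    return Plan(filters=analysis["filters"], aggregation=analysis["aggregation"])


def execute(plan: Plan, chunks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Execution: select matching chunks (a stand-in for retrieval), then aggregate."""
    matched = [c for c in chunks if all(str(c.get(k)) == v for k, v in plan.filters.items())]
    values = [c["value"] for c in matched]
    reducers = {
        "avg": lambda v: sum(v) / len(v),
        "max": max,
        "min": min,
        "sum": sum,
        "count": len,
    }
    result = reducers[plan.aggregation](values) if plan.aggregation and values else None
    return {"matches": matched, "result": result}


chunks = [
    {"region": "eu", "value": 10},
    {"region": "eu", "value": 30},
    {"region": "us", "value": 50},
]
answer = execute(make_plan(analyze("average value where region=eu")), chunks)
print(answer["result"])  # 20.0
```

Keeping the plan as a separate value between analysis and execution means the same plan can be logged or inspected before any retrieval runs.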
