This repository documents my learning journey working with Knowledge Graphs, following the DeepLearning.AI Knowledge Graphs and RAG course. The project focuses on constructing and expanding a knowledge graph using SEC filings (Form 10-K and Form 13 documents).
A Knowledge Graph (KG) is a database that represents data as nodes and edges, where:
- Nodes and edges can have labels
- Both can have properties
- Relationships between data points are explicit and queryable
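To make the definition concrete, here is a minimal in-memory sketch of a property graph. It is not how Neo4j stores data; the class names and the `Manager`, `Company`, and `OWNS_STOCK_IN` labels are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str      # node_id of the source node
    rel_type: str   # the relationship's label
    end: str        # node_id of the target node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)

    def add_relationship(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def neighbors(self, node_id, rel_type):
        """Follow outgoing relationships of one type from node_id."""
        return [r.end for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


# Made-up example data
graph = PropertyGraph()
graph.add_node("mgr-1", ["Manager"], name="Example Capital")
graph.add_node("com-1", ["Company"], name="Example Corp")
graph.add_relationship("mgr-1", "OWNS_STOCK_IN", "com-1", shares=1000)
```

Because the relationship is stored explicitly, `graph.neighbors("mgr-1", "OWNS_STOCK_IN")` answers "what does this manager own?" without any join logic.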
After struggling with vector support in Neo4j, I discovered that version 5.16 is the sweet spot for vector operations. The journey involved:
- Initial attempts with v5.13 where vector indexes were introduced but had limitations
- Excitement when it finally worked: "SUCCESS! 5.16 VERSION IS IDEAL FOR VECTORS"
- Learning that vector indexes are enterprise-only features
One of my key insights was developing a unique approach to RAG with knowledge graphs:
- Use vector search to find relevant chunks
- Leverage chunk IDs to backtrace relationships
- Perform complex queries on these relationships
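The three steps above can be sketched in plain Python with toy data. This is only an illustration of the idea, not the Neo4j implementation: the embeddings, chunk IDs, and the `PART_OF`, `FILED`, and `OWNS_STOCK_IN` relationship names are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: chunk id -> (embedding, text). All values are made up.
CHUNKS = {
    "c1": ([1.0, 0.0], "business overview"),
    "c2": ([0.0, 1.0], "risk factors"),
}

# Toy edges as (source, type, target); direction matters.
EDGES = [
    ("c1", "PART_OF", "form-A"),
    ("c2", "PART_OF", "form-B"),
    ("company-A", "FILED", "form-A"),
    ("manager-X", "OWNS_STOCK_IN", "company-A"),
]


def top_chunks(query_vec, k=1):
    """Step 1: rank chunks by similarity to the query embedding."""
    ranked = sorted(CHUNKS, key=lambda cid: cosine(query_vec, CHUNKS[cid][0]),
                    reverse=True)
    return ranked[:k]


def backtrace(chunk_id):
    """Steps 2 and 3: chunk -> form -> filing company -> managers holding it."""
    forms = [t for s, r, t in EDGES if s == chunk_id and r == "PART_OF"]
    companies = [s for s, r, t in EDGES if r == "FILED" and t in forms]
    managers = [s for s, r, t in EDGES if r == "OWNS_STOCK_IN" and t in companies]
    return {"forms": forms, "companies": companies, "managers": managers}


hit = top_chunks([0.9, 0.1])[0]
context = backtrace(hit)
```

The manager never appears in the chunk text, so a pure vector search could not surface it; the chunk ID is what connects the semantic hit to the rest of the graph.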
"...those relationships can be missed from vector search.... my idea."
Found a non-obvious solution for vector functionality:
- Discovered the GenAI jar in an unexpected location: `C:\Users\Black Mamba\AppData\Local\Neo4j\Relate\Cache\dbmss....\plugin-resources`
- Learned that copying it to the correct plugin folder is crucial
- Realized that `dbms.security.procedures.unrestricted=genai.*` wasn't necessary
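The copy step can be scripted. The sketch below assumes the jar's file name contains `genai`, which may not hold for every release, and takes both directories as arguments because the exact paths differ per machine. `copy_genai_jar` is a hypothetical helper, not part of Neo4j.

```python
import shutil
from pathlib import Path


def copy_genai_jar(plugin_resources: Path, plugins_dir: Path) -> list:
    """Copy every jar whose name contains 'genai' into plugins_dir.

    Both paths are supplied by the caller. The 'genai' name filter is an
    assumption about how the jar is named.
    """
    plugins_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(plugin_resources.rglob("*.jar")):
        if "genai" in jar.name.lower():
            target = plugins_dir / jar.name
            shutil.copy2(jar, target)
            copied.append(target)
    return copied
```

Neo4j loads plugins at startup, so the DBMS needs a restart after the copy.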
Developed a clear understanding of the vector workflow:
- Create index with node association
- Specify embedding location
- Use `db.create.setNodeVectorProperty` for property creation
- Generate embeddings
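A rough sketch of this workflow as Cypher statements built in Python. The index name, the `Chunk` label, the `chunkId` and `textEmbedding` properties, and the 1536 dimensions are all assumptions; the dimension count has to match whatever embedding model is used. The `CREATE VECTOR INDEX` form may not be available on the earliest 5.x releases, so check it against your version. Here the embedding is computed client-side and passed in as a parameter.

```python
INDEX_NAME = "chunk_embeddings"        # hypothetical index name
NODE_LABEL = "Chunk"                   # hypothetical node label
EMBEDDING_PROPERTY = "textEmbedding"   # hypothetical property holding the vector
DIMENSIONS = 1536                      # assumption: must match the embedding model


def create_index_statement():
    """Steps 1 and 2: index tied to a node label and an embedding property."""
    return (
        f"CREATE VECTOR INDEX `{INDEX_NAME}` IF NOT EXISTS\n"
        f"FOR (c:{NODE_LABEL}) ON (c.{EMBEDDING_PROPERTY})\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {DIMENSIONS},\n"
        "  `vector.similarity_function`: 'cosine'\n"
        "}}"
    )


def store_embedding_statement():
    """Step 3: write a vector onto one node via db.create.setNodeVectorProperty."""
    return (
        f"MATCH (c:{NODE_LABEL} {{chunkId: $chunk_id}})\n"
        f"CALL db.create.setNodeVectorProperty(c, '{EMBEDDING_PROPERTY}', $embedding)"
    )


def run_workflow(session, chunk_id, embedding):
    """session is assumed to expose run(query, **params), like a neo4j driver session."""
    session.run(create_index_statement())
    session.run(store_embedding_statement(), chunk_id=chunk_id, embedding=embedding)
```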
- L5 notebook emerged as the cornerstone for knowledge graph work
"L5 NOTEBOOK IS IMPORTANT, ALWAYS CONSULT THAT BEFORE WORKING ON KNOWLEDGE GRAPH"
- Understanding paths and relationships is fundamental
- Direction of relationships affects semantic meaning
- Converting traditional table structures to graph models requires careful thought
- The power of APOC for data transformation and management
- Importance of proper relationship modeling for semantic accuracy
APOC Plugin Setup:
- Add the APOC plugin from Neo4j Desktop
- Configure `neo4j.conf` (located under `.Neo4jdesktop`, inside the related data folder, at `<database_path>/conf/`) with these two settings, each on its own line: `dbms.security.procedures.unrestricted=apoc.*` and `dbms.security.procedures.allowlist=apoc.*`
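Editing the config by hand is easy to get wrong, for example by putting both settings on one line. A small sketch that appends whichever of the two settings is missing. `ensure_apoc_settings` is a hypothetical helper, and it deliberately leaves a key alone if it is already set to something else.

```python
REQUIRED_SETTINGS = {
    "dbms.security.procedures.unrestricted": "apoc.*",
    "dbms.security.procedures.allowlist": "apoc.*",
}


def ensure_apoc_settings(conf_text: str) -> str:
    """Return conf_text with both APOC settings present, one per line.

    Existing keys are left untouched, even if their value differs;
    merging values is out of scope for this sketch.
    """
    lines = conf_text.splitlines()
    present = set()
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and anything that is not key=value
        present.add(stripped.split("=", 1)[0].strip())
    for key, value in REQUIRED_SETTINGS.items():
        if key not in present:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Running it twice gives the same result, so it is safe to re-apply after reinstalling the plugin.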
Important Version Notes:
- Neo4j v5.16 is ideal for vector operations
- Vector support was introduced in v5.13
- Vector indexes are only supported in enterprise edition
- With Neo4j v5, APOC is split into Core and Extended editions
- Core edition: Standard installation
- Extended edition: Requires manual download from GitHub
GenAI Plugin:
- Location: `C:\Users\Black Mamba\AppData\Local\Neo4j\Relate\Cache\dbmss....\plugin-resources`
- Copy the GenAI jar to the plugin folder (location available in the Neo4j Desktop app)
- Required for vector functions to work
APOC capabilities:
- Data import/export helpers (JSON, CSV)
- Graph algorithms for advanced computations
- Metadata querying (e.g., `apoc.meta.data()`)
- Utility functions (string handling, date calculations)
- Use `CALL apoc.meta.graph()` to visualize schema
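As an example of metadata querying, the sketch below filters `apoc.meta.data()` down to node properties and groups the rows by label. It assumes the rows arrive as plain dictionaries, for instance from `record.data()` in the Python driver. `summarize_schema` is a hypothetical helper.

```python
from collections import defaultdict

# Node properties only; relationship entries are filtered out.
SCHEMA_QUERY = """
CALL apoc.meta.data()
YIELD label, property, type, elementType
WHERE elementType = 'node' AND type <> 'RELATIONSHIP'
RETURN label, property, type
"""


def summarize_schema(rows):
    """Group (label, property, type) rows into {label: {property: type}}."""
    schema = defaultdict(dict)
    for row in rows:
        schema[row["label"]][row["property"]] = row["type"]
    return dict(schema)
```

With a driver session this would be called roughly as `summarize_schema(r.data() for r in session.run(SCHEMA_QUERY))`.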
Pros:
- All data in one place
- Embeddings attached to nodes
- Can find and create relationships dynamically
- Enhanced RAG capabilities through relationship backtracing
Cons:
- Increased complexity
Vector workflow:
- Create an index associated with a node
- Specify embedding location
- Create embedding property using `db.create.setNodeVectorProperty`
- Generate embeddings
Paths:
- Paths are matched patterns of nodes and relationships
- Path length = number of relationships in the path
- Can be captured as variables
- Variable length paths allow specifying relationship ranges
- Direction of relationships matters for semantic meaning
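To illustrate path length, variable-length ranges, and direction, here is a small in-memory sketch. In Cypher the same idea would look roughly like `MATCH p = (m)-[*1..2]->(n) RETURN p, length(p)`. The node and relationship names are made up.

```python
def find_paths(edges, start, min_hops, max_hops):
    """Enumerate directed paths from start with min_hops..max_hops relationships.

    A path is returned as [node, rel_type, node, rel_type, node, ...].
    """
    results = []

    def walk(node, path, used):
        hops = len(used)  # path length = number of relationships traversed
        if min_hops <= hops <= max_hops:
            results.append(path)
        if hops == max_hops:
            return
        for index, (source, rel_type, target) in enumerate(edges):
            # Follow a relationship only in its stored direction, once per path.
            if source == node and index not in used:
                walk(target, path + [rel_type, target], used | {index})

    walk(start, [start], frozenset())
    return results


EDGES = [
    ("manager-X", "OWNS_STOCK_IN", "company-A"),
    ("company-A", "FILED", "form-A"),
    ("chunk-1", "PART_OF", "form-A"),
]

paths = find_paths(EDGES, "manager-X", 1, 2)
```

Starting from `form-A` instead yields no paths at all, because every relationship points into it. That is the practical effect of direction.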
Relationships:
- Core component for connecting nodes
- Enable complex querying and pattern matching
- Can be used to trace back connections from vector search results
Search options:
- Full text indexing available for string matching
- Vector search for semantic similarity
- Combined approaches possible for enhanced retrieval
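A sketch of the full-text side, assuming a `Company` label with a `companyName` property and an index called `companyNames` (all hypothetical). `run` stands in for any callable that executes Cypher with parameters and returns rows. Index creation is kept separate from querying because a freshly created index may not be ready to query immediately.

```python
# Run once; the label, property, and index name are assumptions.
CREATE_FULLTEXT_INDEX = (
    "CREATE FULLTEXT INDEX companyNames IF NOT EXISTS "
    "FOR (c:Company) ON EACH [c.companyName]"
)

QUERY_FULLTEXT_INDEX = (
    "CALL db.index.fulltext.queryNodes('companyNames', $search_text) "
    "YIELD node, score "
    "RETURN node.companyName AS name, score "
    "ORDER BY score DESC LIMIT $limit"
)


def search_companies(run, search_text, limit=5):
    """run(query, params) is a caller-supplied function that returns result rows."""
    return run(QUERY_FULLTEXT_INDEX, {"search_text": search_text, "limit": limit})
```

One way to combine the approaches is to use full-text matches to pin down an entity by name and vector search to find the relevant passages, then connect the two through the graph.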
RAG with knowledge graphs:
- Find relevant chunks using vector search
- Use chunk IDs to backtrace relationships
- Perform complex queries on related data
- Leverage both semantic similarity and graph structure
Benefits:
- More comprehensive information retrieval
- Captures relationships that might be missed by vector search alone
- Enables complex reasoning through graph traversal
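In Cypher, this can be a single query: the vector index supplies the chunks, and a `MATCH` walks outward from each one. The sketch below assumes a schema of `Chunk`, `Form`, `Company`, and `Manager` nodes linked by `PART_OF`, `FILED`, and `OWNS_STOCK_IN`; adjust the names to the actual graph. `build_params` is a hypothetical helper.

```python
# Schema names (labels, relationship types, properties) are assumptions.
RETRIEVAL_QUERY = """
CALL db.index.vector.queryNodes($index_name, $top_k, $question_embedding)
YIELD node AS chunk, score
MATCH (chunk)-[:PART_OF]->(form:Form)<-[:FILED]-(company:Company)
OPTIONAL MATCH (manager:Manager)-[:OWNS_STOCK_IN]->(company)
RETURN chunk.chunkId AS chunkId, chunk.text AS text, score,
       company.companyName AS company,
       collect(manager.managerName) AS managers
ORDER BY score DESC
"""


def build_params(index_name, question_embedding, top_k=3):
    """Assemble the parameters the query expects."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "index_name": index_name,
        "top_k": top_k,
        "question_embedding": list(question_embedding),
    }
```

The `OPTIONAL MATCH` keeps chunks whose company has no recorded managers, so the graph expansion adds context without dropping vector hits.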
Scaling considerations:
- Sharding or partitioning data across multiple databases
- Optimization for write-heavy or read-heavy operations
- Performance tuning for large-scale deployments
Best practices:
- L5 Notebook: Critical reference for knowledge graph work
- Always consult L5 before making changes to graph structure
- Converting tables/columns and CSV rows requires careful data modeling
- Consider relationship direction for semantic accuracy
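As a small illustration of the table-to-graph step, the sketch below turns CSV rows into deduplicated nodes and directed relationships. The column names and values are made up and do not reflect the real Form 13 layout.

```python
import csv
import io

# Made-up sample; real filings have more columns and different names.
CSV_TEXT = """managerName,companyName,shares
Example Capital,Example Corp,1000
Example Capital,Sample Inc,250
"""


def rows_to_graph(csv_text):
    nodes = set()        # (label, name) pairs; the set deduplicates, much like MERGE
    relationships = []   # (manager, type, company, properties)
    for row in csv.DictReader(io.StringIO(csv_text)):
        manager = ("Manager", row["managerName"])
        company = ("Company", row["companyName"])
        nodes.update([manager, company])
        # Direction encodes meaning: the manager owns stock in the company.
        relationships.append(
            (manager, "OWNS_STOCK_IN", company, {"shares": int(row["shares"])})
        )
    return nodes, relationships


nodes, relationships = rows_to_graph(CSV_TEXT)
```

Two rows become three nodes and two relationships: the repeated manager collapses into one node, and the per-row value (`shares`) moves onto the relationship rather than either node.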
Future work:
- Explore methodologies for vector enhancement of KG
- Develop improved pipelines for data integration
- Investigate advanced scaling solutions
- Implement comprehensive testing strategies
Resources:
- Neo4j Documentation
- APOC Documentation
Notebooks:
- `L3-prep_text_for_RAG.ipynb`: Text preparation for RAG
- `L4-construct_kg_from_text.ipynb`: Initial KG construction
- `L5-add_relationships_to_kg.ipynb`: Critical relationship management
- `L6-expand_the_kg.ipynb`: Graph expansion with Form 13 data
This journey has transformed my understanding of knowledge graphs from simple node-edge structures to powerful tools for complex data relationships and retrieval systems.