Oraclevs integration #7333
Open
skmishraoracle
wants to merge
34
commits into
langchain-ai:main
from
skmishraoracle:oraclevs_integration
e37525f
refresh to current version
skmishraoracle b722003
Add doc loader files
hackerdave d93610c
Add docs
hackerdave c648518
Update dependencies
hackerdave 00abea4
change metadata column type to JSON & add support for connection pool
skmishraoracle 2f2f6b3
documentation in cookbook - oraclevs.md
skmishraoracle afc56fe
Add entry points
hackerdave d649ca8
Update import paths in doc
hackerdave c72d99c
Combine doc loader and text splitter files
hackerdave 7fe39a0
Move oracle text splitter to langchain-textsplitters
hackerdave 9b2778c
Update docs for oracle text splitter
hackerdave 1478731
Update import paths
hackerdave b84f98e
Change imports oracleai to oracle
hackerdave 6cdd334
Handle escaped double quotes
hackerdave c2dd804
config changes changes to imports
skmishraoracle d719938
checking in files to gen pr...
skmishraoracle 74bfaad
generate pr-3
skmishraoracle 49163ce
Add doc loader files
hackerdave e3e8c19
Add docs
hackerdave 21cf483
Update dependencies
hackerdave 581893e
change metadata column type to JSON & add support for connection pool
skmishraoracle 486050c
documentation in cookbook - oraclevs.md
skmishraoracle c9ccb8f
Add entry points
hackerdave 5b1e0dd
Update import paths in doc
hackerdave a79a934
Combine doc loader and text splitter files
hackerdave 462de8a
Move oracle text splitter to langchain-textsplitters
hackerdave 9b1460c
Update docs for oracle text splitter
hackerdave d37038b
Update import paths
hackerdave 0d6c42f
Change imports oracleai to oracle
hackerdave 9cb2b44
Handle escaped double quotes
hackerdave b680622
generate pr - 4
skmishraoracle cb6bc78
generate pr - 5
skmishraoracle 49afa7d
generate pr-7
skmishraoracle 87a172d
generate pr-7
skmishraoracle
@@ -0,0 +1,273 @@
# Oracle AI Vector Search with LangchainJS Integration

## Introduction

Oracle AI Vector Search enables semantic search on unstructured data alongside relational search on business data, all within a single system. This removes the need for a separate vector database, which reduces data fragmentation.

By integrating Oracle AI Vector Search with Langchain, you can build a Retrieval Augmented Generation (RAG) pipeline that draws on Oracle Database features.

## Key Advantages of Oracle Database

Oracle AI Vector Search is built on top of Oracle Database, which provides several key features:

- Partitioning Support
- Real Application Clusters (RAC) Scalability
- Exadata Smart Scans
- Geographically Distributed Shard Processing
- Transactional Capabilities
- Parallel SQL
- Disaster Recovery
- Advanced Security
- Oracle Machine Learning
- Oracle Graph Database
- Oracle Spatial and Graph
- Oracle Blockchain
- JSON Support

## Guide Overview

This guide demonstrates how to integrate Oracle AI Vector Search with Langchain to create an end-to-end RAG pipeline. You'll learn how to:

- Load documents from different sources using OracleDocLoader.
- Summarize documents inside or outside the database using OracleSummary.
- Generate embeddings inside or outside the database using OracleEmbeddings.
- Chunk documents based on specific needs using OracleTextSplitter.
- Store, index, and query data using OracleVS.

## Getting Started

If you're new to Oracle Database, consider using the free edition of Oracle Database 23ai to get started.

### Best Practices

- **User Management:** Create dedicated users for your Oracle Database projects instead of using the system user; this improves security and control. See the end-to-end guide for more details.
- **User Privileges:** Manage user privileges carefully to maintain database security. You can find more information in the official Oracle documentation.

## Prerequisites

To get started, install the Oracle JavaScript client driver:

```bash
npm install oracledb
```

## Document Preparation

Suppose you have documents stored in a file system that you want to use with Oracle AI Vector Search and Langchain. These documents need to be instances of the `Document` class from `@langchain/core/documents`.

### Example: Ingesting JSON Documents

The following TypeScript example demonstrates how to ingest documents from a JSON file:

```typescript
import fs from "node:fs/promises";
import path from "node:path";
import { Document } from "@langchain/core/documents";

// Assumed shape of one JSON record; adjust to match your data.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// Hypothetical wrapper class; the original snippet shows only the two methods.
class JsonIngestor {
  constructor(private docsDir: string, private filename: string) {}

  // Convert one row into a Document, keeping id and link as metadata.
  private createDocument(row: DataRow): Document {
    const metadata = { id: row.id, link: row.link };
    return new Document({ pageContent: row.text, metadata });
  }

  public async ingestJson(): Promise<Document[]> {
    try {
      const filePath = path.join(this.docsDir, this.filename);
      const fileContent = await fs.readFile(filePath, { encoding: "utf8" });
      const jsonData: DataRow[] = JSON.parse(fileContent);
      return jsonData.map((row) => this.createDocument(row));
    } catch (error) {
      console.error("An error occurred while ingesting JSON:", error);
      throw error; // Rethrow for the calling function to handle
    }
  }
}
```

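Because `JSON.parse` returns untyped data, it can help to check each row before building documents. The sketch below is only an illustration: it assumes every row carries string `id`, `link`, and `text` fields, as in the example above. The names `isDataRow` and `toDocuments` are hypothetical, and `SimpleDocument` is a local stand-in for LangChain's `Document` so that the snippet runs without dependencies.

```typescript
// Assumed row shape, mirroring the fields used by createDocument above.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// Local stand-in for LangChain's Document, so the sketch has no dependencies.
interface SimpleDocument {
  pageContent: string;
  metadata: { id: string; link: string };
}

// Type guard: true only when all three fields are present and are strings.
function isDataRow(value: unknown): value is DataRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.id === "string" &&
    typeof row.link === "string" &&
    typeof row.text === "string"
  );
}

// Parse a JSON array and convert the valid rows, skipping malformed ones.
function toDocuments(json: string): SimpleDocument[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of rows");
  }
  return parsed.filter(isDataRow).map((row) => ({
    pageContent: row.text,
    metadata: { id: row.id, link: row.link },
  }));
}
```

Skipping malformed rows is one possible policy; throwing on the first bad row may be preferable when the input file is expected to be clean.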
## Langchain and Oracle Integration

The Oracle AI Vector Search Langchain library offers a rich set of APIs for document processing, including loading, chunking, summarizing, and embedding generation. Here's how to set up a connection and integrate Oracle with Langchain.

### Connecting to Oracle Database

Below is an example of how to connect to an Oracle Database using either a direct connection or a connection pool:

```typescript
import oracledb from "oracledb";

// Open a single, direct connection.
async function dbConnect(): Promise<oracledb.Connection> {
  const connection = await oracledb.getConnection({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****'
  });
  console.log('Connection created...');
  return connection;
}

// Create a connection pool that can be shared across requests.
async function dbPool(): Promise<oracledb.Pool> {
  const pool = await oracledb.createPool({
    user: '****',
    password: '****',
    connectString: '***.**.***.**:1521/****'
  });
  console.log('Connection pool started...');
  return pool;
}
```

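Rather than hard-coding credentials, the connection settings can be read from environment variables (the test driver later in this guide already calls `dotenv.config()`). The following is a minimal sketch under stated assumptions: the variable names `ORACLE_USER`, `ORACLE_PASSWORD`, and `ORACLE_CONNECT_STRING` and the helper `buildDbConfig` are hypothetical, not part of this integration.

```typescript
// Connection settings in the shape passed to getConnection / createPool above.
interface DbConfig {
  user: string;
  password: string;
  connectString: string;
}

// Build the config from an environment-like map and fail fast if a value
// is missing. The variable names used here are hypothetical.
function buildDbConfig(env: Record<string, string | undefined>): DbConfig {
  const user = env.ORACLE_USER;
  const password = env.ORACLE_PASSWORD;
  const connectString = env.ORACLE_CONNECT_STRING;
  if (!user || !password || !connectString) {
    throw new Error(
      "Missing one of ORACLE_USER, ORACLE_PASSWORD, ORACLE_CONNECT_STRING"
    );
  }
  return { user, password, connectString };
}
```

Calling `buildDbConfig(process.env)` after loading the environment would produce an object that could be handed to `oracledb.getConnection` or `oracledb.createPool` in place of the literal values.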
## Testing the Integration

Here, we demonstrate how to create a test class, `TestsOracleVS`, to explore various features of Oracle Vector Store and its integration with Langchain.

### Example Test Class

```typescript
import fs from "node:fs/promises";
import dotenv from "dotenv";
import oracledb from "oracledb";
import { Callbacks } from "@langchain/core/callbacks/manager";
import { Document, DocumentInterface } from "@langchain/core/documents";
import { Embeddings } from "@langchain/core/embeddings";
import { MaxMarginalRelevanceSearchOptions } from "@langchain/core/vectorstores";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
// Assumed import path for the vector store added by this PR; adjust if it differs.
import { OracleVS, createIndex, DistanceStrategy } from "@langchain/community/vectorstores/oracle";

// dbConnect() and dbPool() are the helpers defined in the connection example above.

class TestsOracleVS {
  client: any | null = null;
  filename: string;
  embeddingFunction: HuggingFaceTransformersEmbeddings;
  dbConfig: Record<string, any> = {};
  oraclevs!: OracleVS;

  constructor(filename: string, embeddingFunction: HuggingFaceTransformersEmbeddings) {
    this.filename = filename;
    this.embeddingFunction = embeddingFunction;
  }

  async init(): Promise<void> {
    this.client = await dbPool();
    this.dbConfig = {
      "client": this.client,
      "tableName": "some_tablenm",
      "distanceStrategy": DistanceStrategy.DOT_PRODUCT,
      "query": "What are the salient features of OracleDB?"
    };
    this.oraclevs = new OracleVS(this.embeddingFunction, this.dbConfig);
  }

  // Assumed helper: the driver below calls testIngestJson(), which the original
  // listing omits. This version mirrors the ingestJson example shown earlier.
  public async testIngestJson(): Promise<Document[]> {
    const fileContent = await fs.readFile(this.filename, { encoding: "utf8" });
    const rows: { id: string; link: string; text: string }[] = JSON.parse(fileContent);
    return rows.map(
      (row) => new Document({ pageContent: row.text, metadata: { id: row.id, link: row.link } })
    );
  }

  // Create an IVF vector index on the table behind this.oraclevs.
  public async testCreateIndex(): Promise<void> {
    const connection: oracledb.Connection = await dbConnect();
    await createIndex(connection, this.oraclevs, {
      idxName: "IVF",
      idxType: "IVF",
      neighborPart: 64,
      accuracy: 90
    });
    console.log("Index created successfully");
    await connection.close();
  }

  // Similarity search by vector: pass an embedding (a number array), a k value,
  // and an optional filter. Returns documents ordered by distance.
  public async testSimilaritySearchByVector(
    embedding: number[],
    k: number,
    filter?: OracleVS["FilterType"],
  ): Promise<[DocumentInterface, number][]> {
    return this.oraclevs.similaritySearchVectorWithScore(
      embedding,
      k,
      filter,
    );
  }

  // Same as above, except that it also returns each document's embedding.
  public async testSimilaritySearchByVectorReturningEmbeddings(
    embedding: number[],
    k: number = 4,
    filter?: OracleVS["FilterType"],
  ): Promise<[Document, number, Float32Array | number[]][]> {
    return await this.oraclevs.similaritySearchByVectorReturningEmbeddings(embedding, k, filter);
  }

  // Maximal marginal relevance (MMR) search from a text query.
  // The callbacks argument is reserved for future use.
  public async testMaxMarginalRelevanceSearch(
    query: string,
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    // @ts-ignore
    return this.oraclevs.maxMarginalRelevanceSearch(query, options, _callbacks);
  }

  // Same as above, except that it takes a vector instead of a text query.
  public async testMaxMarginalRelevanceSearchByVector(
    query: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchByVector(query, options, _callbacks);
  }

  // Same as above, except that it returns each document together with its score.
  public async testMaxMarginalRelevanceSearchWithScoreByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<Array<{ document: Document; score: number }>> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchWithScoreByVector(embedding, options, _callbacks);
  }

  // Delete documents by id, or delete all of them.
  testDelete(params: { ids?: string[], deleteAll?: boolean }): Promise<void> {
    return this.oraclevs.delete(params);
  }
}

// runTestsOracleVS is the driver that exercises each of the calls.
async function runTestsOracleVS() {
  // Initialize dotenv to load environment variables
  dotenv.config();
  const query = "What is the language used by Oracle database";

  // Set up the embedding function model: "Xenova/all-MiniLM-L6-v2"
  const embeddingFunction = new HuggingFaceTransformersEmbeddings();

  if (!(embeddingFunction instanceof Embeddings)) {
    console.error("Embedding function is not an instance of Embeddings.");
    return;
  }

  console.log("Embedding function initialized successfully");

  // Initialize the TestsOracleVS class
  const testsOracleVS = new TestsOracleVS("concepts23c_small.json",
    embeddingFunction);

  // Initialize connection and other setup
  await testsOracleVS.init();

  // Ingest JSON data to create documents
  const documents = await testsOracleVS.testIngestJson();
  await OracleVS.fromDocuments(
    documents,
    testsOracleVS.embeddingFunction,
    testsOracleVS.dbConfig
  );

  // Create an index
  await testsOracleVS.testCreateIndex();

  // Perform a similarity search by vector
  const embedding = await embeddingFunction.embedQuery(query);
  const similaritySearchByVector = await testsOracleVS.testSimilaritySearchByVector(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByVector);

  // Perform a similarity search by vector, returning embeddings as well
  const similaritySearchByEmbeddings =
    await testsOracleVS.testSimilaritySearchByVectorReturningEmbeddings(embedding, 5);
  console.log("Similarity Search Results (with embeddings):", similaritySearchByEmbeddings);

  const maxMarginalRelevanceSearch =
    await testsOracleVS.testMaxMarginalRelevanceSearch(query);
  console.log("Max Marginal Relevance Search:", maxMarginalRelevanceSearch);

  const maxMarginalRelevanceSearchByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchByVector(embedding);
  console.log("Max Marginal Relevance Search By Vector:", maxMarginalRelevanceSearchByVector);

  const maxMarginalRelevanceSearchWithScoreByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchWithScoreByVector(embedding);
  console.log("Max Marginal Relevance Search With Score By Vector:", maxMarginalRelevanceSearchWithScoreByVector);
}

// Run the driver and report any failure.
runTestsOracleVS().catch(console.error);
```
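
The maximal marginal relevance calls above take `k` and `fetchK` options: `fetchK` candidates are retrieved first, and `k` of them are then chosen to balance relevance against diversity. The sketch below illustrates that selection step in plain TypeScript. It is a generic greedy MMR over cosine similarity, not the code used by OracleVS or Langchain; `cosine`, `mmrSelect`, and the `lambda` weight are names chosen here for illustration.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: returns up to k indices into `candidates` (the fetchK hits).
// lambda = 1 ranks by relevance only; lower values reward diversity.
function mmrSelect(
  query: number[],
  candidates: number[][],
  k: number,
  lambda = 0.5
): number[] {
  const selected: number[] = [];
  const remaining = candidates.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    for (let p = 0; p < remaining.length; p++) {
      const candidate = candidates[remaining[p]];
      const relevance = cosine(query, candidate);
      // Redundancy is the highest similarity to anything already selected.
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const j of selected) {
        redundancy = Math.max(redundancy, cosine(candidate, candidates[j]));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = p;
      }
    }
    selected.push(remaining[bestPos]);
    remaining.splice(bestPos, 1);
  }
  return selected;
}
```

With two near-duplicate candidates and one distinct candidate, a `lambda` of 0.5 tends to pick one of the duplicates and then the distinct one, which is the behaviour that distinguishes MMR from a plain top-k similarity search.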
That is all for now.

79 changes: 79 additions & 0 deletions
docs/core_docs/docs/integrations/document_loaders/file_loaders/oracleai.mdx
@@ -0,0 +1,79 @@
# Oracle AI Vector Search: Document Processing

## Load Documents

You can load documents from Oracle Database, a file system, or both by configuring the loader parameters accordingly. For details on these parameters, see the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).

A significant advantage of OracleDocLoader is that it can process over 150 file formats, so you do not need a separate loader for each document type. For the complete list of supported formats, see [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).

The following snippet demonstrates how to use OracleDocLoader:

```typescript
import {OracleDocLoader} from "@langchain/community/document_loaders/fs/oracle";

// `conn` is assumed to be an open oracledb connection created beforehand.

/*
// loading a local file
loader_params = {"file": "<file>"};

// loading from a local directory
loader_params = {"dir": "<directory>"};
*/

// loading from Oracle Database table
// make sure you have the table with this specification
const loader_params = {
  "owner": "testuser",
  "tablename": "demo_tab",
  "colname": "data",
};

// load the docs
const loader = new OracleDocLoader(conn, loader_params);
const docs = await loader.load();

// verify
console.log(`Number of docs loaded: ${docs.length}`);
//console.log(`Document-0: ${docs[0].pageContent}`);
```

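The three parameter shapes shown above (a file, a directory, or a database table) can be modelled as a TypeScript union so that typos in key names are caught at compile time. This is a sketch only: the key names come from the example above, while the type names and `describeSource` are hypothetical and not part of OracleDocLoader. It also does not model every combination the loader may accept.

```typescript
// Parameter shapes taken from the examples above; the type names are hypothetical.
type FileParams = { file: string };
type DirParams = { dir: string };
type TableParams = { owner: string; tablename: string; colname: string };
type LoaderParams = FileParams | DirParams | TableParams;

// Describe which source a parameter object selects.
function describeSource(params: LoaderParams): string {
  if ("file" in params) return `file: ${params.file}`;
  if ("dir" in params) return `directory: ${params.dir}`;
  return `table: ${params.owner}.${params.tablename} (column ${params.colname})`;
}
```

A helper like this can be useful for logging which source a loader was configured with before calling `load()`.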
## Split Documents

Documents vary in size, from small to very large. Users often chunk their documents into smaller sections before generating embeddings. Many customization options are available for this splitting process. For details on these parameters, see the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).

Below is sample code illustrating how to implement this:

```typescript
import {OracleTextSplitter} from "@langchain/textsplitters/oracle";

// `conn` is assumed to be an open oracledb connection, and `docs` the
// documents returned by the loader in the previous section.

/*
// Some examples
// split by chars, max 500 chars
splitter_params = {"split": "chars", "max": 500, "normalize": "all"};

// split by words, max 100 words
splitter_params = {"split": "words", "max": 100, "normalize": "all"};

// split by sentence, max 20 sentences
splitter_params = {"split": "sentence", "max": 20, "normalize": "all"};
*/

// split by default parameters
const splitter_params = {"normalize": "all"};

// get the splitter instance
const splitter = new OracleTextSplitter(conn, splitter_params);

// collect the chunks of every document into one flat list
const list_chunks: string[] = [];
for (const doc of docs) {
  const chunks = await splitter.splitText(doc.pageContent);
  list_chunks.push(...chunks);
}

// verify
console.log(`Number of Chunks: ${list_chunks.length}`);
//console.log(`Chunk-0: ${list_chunks[0]}`); // content
```

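To make the effect of a setting such as `{"split": "words", "max": 100}` concrete, the sketch below splits text into chunks of at most a given number of words. It is an illustration in plain TypeScript, not the algorithm OracleTextSplitter runs in the database, which also handles normalization and other options. The function names `splitByWords` and `chunkAll` are hypothetical.

```typescript
// Split text into chunks of at most `maxWords` whitespace-separated words.
function splitByWords(text: string, maxWords: number): string[] {
  if (maxWords < 1) {
    throw new Error("maxWords must be at least 1");
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Chunk several texts and flatten the result into a single list.
function chunkAll(texts: string[], maxWords: number): string[] {
  const all: string[] = [];
  for (const text of texts) {
    all.push(...splitByWords(text, maxWords));
  }
  return all;
}
```

Flattening matters when counting: the total number of chunks is the length of the flat list, not the number of documents that were split.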
## End to End Demo

To build an end-to-end RAG pipeline with Oracle AI Vector Search, see the complete demo guide: [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchainjs/tree/main/cookbook/oracleai.mdx).

Can you revert this?