Add rerank document compressor (#331)
Fixes #298

Added:
- `BedrockRerank`, based on [BaseDocumentCompressor](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.compressor.BaseDocumentCompressor.html), so it can be used with [ContextualCompressionRetriever](https://python.langchain.com/api_reference/langchain/retrievers/langchain.retrievers.contextual_compression.ContextualCompressionRetriever.html)
- Import from the package root, i.e. `from langchain_aws import BedrockRerank`
- Unit tests (a sketch of one follows the examples below)

Some snippets:

- Example 1 (from documents):

```python3
from langchain_core.documents import Document

from langchain_aws import BedrockRerank

# Initialize the reranker (model_arn is the ARN of the Bedrock rerank model to use)
reranker = BedrockRerank(model_arn=model_arn)

# List of documents to rerank
documents = [
    Document(page_content="LangChain is a powerful library for LLMs."),
    Document(page_content="AWS Bedrock enables access to AI models."),
    Document(page_content="Artificial intelligence is transforming the world."),
]

# Query for reranking
query = "What is AWS Bedrock?"

# Rerank and compress the documents
results = reranker.compress_documents(documents, query)

# Display the most relevant documents
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {doc.metadata['relevance_score']}")
```

- Example 2 (with contextual compression retriever):

```python3
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

from langchain_aws import BedrockEmbeddings, BedrockRerank

# Create a vector store using FAISS with Bedrock embeddings
documents = [
    Document(page_content="LangChain integrates LLM models."),
    Document(page_content="AWS Bedrock provides cloud-based AI models."),
    Document(page_content="Machine learning can be used for predictions."),
]
embeddings = BedrockEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

# Create the document compressor using BedrockRerank
reranker = BedrockRerank(model_arn=model_arn)

# Create the retriever with contextual compression
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(),
)

# Execute a query
query = "How does AWS Bedrock work?"
retrieved_docs = retriever.invoke(query)

# Display the most relevant documents
for doc in retrieved_docs:
    print(f"Content: {doc.page_content}")
    print(f"Score: {doc.metadata.get('relevance_score', 'N/A')}")
```

- Example 3 (from list):

```python3
from langchain_aws import BedrockRerank

# Initialize BedrockRerank
reranker = BedrockRerank(model_arn=model_arn)

# Unstructured documents (plain strings)
documents = [
    "LangChain is used to integrate LLM models.",
    "AWS Bedrock provides access to cloud-based models.",
    "Machine learning is revolutionizing the world.",
]

# Query
query = "What is the role of AWS Bedrock?"

# Rerank the documents
results = reranker.rerank(query=query, documents=documents)

# Display the results
for res in results:
    print(f"Index: {res['index']}, Score: {res['relevance_score']}")
    print(f"Document: {documents[res['index']]}")
```
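For reference, a minimal sketch of the kind of unit test this adds, using a mocked Bedrock client so no AWS call is made. The `client` field, the `client.rerank()` call, and the `results`/`index`/`relevanceScore` response shape are assumptions for illustration, not necessarily the exact tests shipped in this PR; the ARN is a dummy placeholder.

```python3
from unittest.mock import MagicMock

from langchain_core.documents import Document

from langchain_aws import BedrockRerank


def test_compress_documents_orders_by_relevance():
    # Assumed: BedrockRerank accepts a pre-built client and calls client.rerank(),
    # which returns a response shaped like the Bedrock Rerank API output.
    mock_client = MagicMock()
    mock_client.rerank.return_value = {
        "results": [
            {"index": 1, "relevanceScore": 0.95},
            {"index": 0, "relevanceScore": 0.20},
        ]
    }

    reranker = BedrockRerank(
        model_arn="arn:aws:bedrock:us-east-1::dummy-rerank-model",  # dummy placeholder
        client=mock_client,
    )

    documents = [
        Document(page_content="LangChain is a powerful library for LLMs."),
        Document(page_content="AWS Bedrock enables access to AI models."),
    ]
    results = reranker.compress_documents(documents, "What is AWS Bedrock?")

    # Documents should come back ordered by relevance, with scores in metadata.
    assert [doc.page_content for doc in results] == [
        documents[1].page_content,
        documents[0].page_content,
    ]
    assert results[0].metadata["relevance_score"] == 0.95
```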