Custom Id for document creation #5503
Example Code
if (response.status === 200) {
  const $ = cheerio.load(response.data);
  const scrapedData: string[] = [];
  // Example: scraping all paragraph texts
  $("p").each((index, element) => {
    const paragraph = $(element).text().trim();
    if (paragraph) {
      scrapedData.push(paragraph);
    }
  });
  // Send scraped data as response
  const scrape = scrapedData.join("\n");
  //@ts-ignore
  allResults["Link " + i] = {
    title: title,
    url: link,
    description: desc,
    content: scrape,
  };
  const output = await splitter.createDocuments([scrape]);
  console.log(output);
  // await pineconeStore.addDocuments(output);
  await new Promise((resolve) => setTimeout(resolve, 3000));
}

Description
I am scraping data and dynamically creating documents to push to Pinecone. However, I already have other data in Pinecone, so I can't use 'await pineconeStore.delete({ deleteAll: true });' to remove the scraped data, because that would delete everything. Is there a way to set a custom ID so that I can delete only the scraped data, or is there another solution?

System Info
langchain 0.2.0
Replies: 1 comment
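You can set custom IDs for the documents you add to Pinecone, which will allow you to delete specific scraped data later. When adding documents using the addDocuments method, you can provide an array of custom IDs. Here's how you can modify your example code to include custom IDs:

Modified Example Code
if (response.status === 200) {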
  const $ = cheerio.load(response.data);
  const scrapedData: string[] = [];
  // Example: scraping all paragraph texts
  $("p").each((index, element) => {
    const paragraph = $(element).text().trim();
    if (paragraph) {
      scrapedData.push(paragraph);
    }
  });
  // Send scraped data as response
  const scrape = scrapedData.join("\n");
  //@ts-ignore
  allResults["Link " + i] = {
    title: title,
    url: link,
    description: desc,
    content: scrape,
  };
  const output = await splitter.createDocuments([scrape]);
  console.log(output);
  // Generate custom IDs for the documents
  const customIds = output.map((_, index) => `scraped-doc-${i}-${index}`);
  // Add documents with custom IDs to Pinecone
  await pineconeStore.addDocuments(output, { ids: customIds });
  await new Promise((resolve) => setTimeout(resolve, 3000));
}

Deleting Specific Scraped Data
Later, you can delete these documents by passing the same custom IDs to the delete method:

// Example: Deleting the documents with custom IDs
// (rebuilds the IDs, so `output` and `i` must still be in scope here)
const customIdsToDelete = output.map((_, index) => `scraped-doc-${i}-${index}`);
await pineconeStore.delete({ ids: customIdsToDelete });

This approach lets you delete only the scraped documents, identified by the custom IDs you set, while leaving the rest of your Pinecone data untouched.
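
One practical wrinkle: the delete snippet rebuilds the IDs from `output` and `i`, which are usually gone by the time you want to clean up. A minimal sketch of one way around that is to record the IDs as you create them. Everything named here (`SCRAPED_PREFIX`, `buildScrapedIds`, `ScrapedIdRegistry`, `DeletableStore`) is hypothetical and not part of LangChain; the `DeletableStore` interface only assumes a `delete({ ids })` method shaped like the call shown above.

```typescript
// Hypothetical prefix marking every vector that came from the scraper.
const SCRAPED_PREFIX = "scraped-doc";

// Build one ID per chunk, following the `scraped-doc-${i}-${index}` pattern.
function buildScrapedIds(linkIndex: number, chunkCount: number): string[] {
  return Array.from(
    { length: chunkCount },
    (_, chunkIndex) => `${SCRAPED_PREFIX}-${linkIndex}-${chunkIndex}`,
  );
}

// Assumed minimal shape of the store, mirroring `pineconeStore.delete({ ids })`.
interface DeletableStore {
  delete(params: { ids: string[] }): Promise<void>;
}

// Collects IDs across all scraped links so they can be deleted later,
// even after the per-link variables have gone out of scope.
class ScrapedIdRegistry {
  private readonly ids: string[] = [];

  // Record the IDs for one link and return them for use when adding documents.
  register(linkIndex: number, chunkCount: number): string[] {
    const newIds = buildScrapedIds(linkIndex, chunkCount);
    this.ids.push(...newIds);
    return newIds;
  }

  // Return a copy of every ID recorded so far.
  all(): string[] {
    return [...this.ids];
  }

  // Delete only the recorded vectors, then forget them.
  async deleteAll(store: DeletableStore): Promise<void> {
    if (this.ids.length === 0) return;
    await store.delete({ ids: this.all() });
    this.ids.length = 0;
  }
}
```

Inside the scraping loop you would call `registry.register(i, output.length)` and pass the returned array as `ids` to `addDocuments`; later, `await registry.deleteAll(pineconeStore)` removes just those vectors. If the cleanup happens in a different process, the ID list would need to be persisted somewhere (a file or database), which this sketch does not cover.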