feat(tools): add Exa and Firecrawl tools with comprehensive README documentation.
1 parent d52b3fa, commit 7c976a6
Showing 2 changed files with 174 additions and 0 deletions.
@@ -0,0 +1,94 @@
# Exa Search Tool

This tool integrates with Exa (https://exa.ai/), a search engine for AI that organizes the web using embeddings. It provides high-quality web data specifically optimized for AI applications, offering advanced search capabilities through neural and traditional keyword approaches.

## Components

The tool uses the following components:

- An Exa API client instance
- An API Key for authentication
- A custom HTTP client (ky) for making API requests
- Input validation using Zod schema
- Configurable search parameters
- Multiple search type options

## Key Features

- Neural Search: Meaning-based search using embeddings
- Keyword Search: Traditional search capabilities
- Auto Search: Dynamically chooses between neural and keyword
- Category-focused search (company, research paper, news, github, tweet, etc.)
- Domain and text filtering
- Date-based filtering
- Configurable content retrieval options
- Support for autoprompt query enhancement

## Input

The input should be a JSON object with a "query" field containing the search query to process.
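
The tool performs this check with a Zod schema that is not reproduced in this README. As a rough, dependency-free sketch of an equivalent check, the following may help illustrate the expected input shape. The helper name `validateExaInput`, the trimming, and the non-empty rule are assumptions for illustration, not part of the tool.

```javascript
// Hypothetical stand-in for the tool's Zod schema (not the actual schema):
// accepts only a plain object with a non-empty string "query" field.
function validateExaInput(input) {
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    throw new TypeError('Input must be a JSON object');
  }
  if (typeof input.query !== 'string' || input.query.trim() === '') {
    throw new TypeError('Input must contain a non-empty "query" string');
  }
  // Return a normalized copy so callers never see stray whitespace.
  return { query: input.query.trim() };
}

console.log(validateExaInput({ query: 'open-source vector databases' }));
```

Failing fast on malformed input keeps an invalid request from consuming an API call.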

## Output

The output is the response from Exa's API containing search results based on the configured parameters and search type.
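
The exact shape of the returned value depends on the tool's implementation and on the Exa API version, neither of which is specified here. As a sketch, assuming the response is an object (or a JSON string) with a `results` array whose items carry `title` and `url` fields, the results could be condensed as follows. `summarizeResults` is a hypothetical helper and the data is mock data.

```javascript
// Assumption: the response has a `results` array of { title, url } items.
// Accepts either a parsed object or a JSON string.
function summarizeResults(response) {
  const data = typeof response === 'string' ? JSON.parse(response) : response;
  const results = Array.isArray(data?.results) ? data.results : [];
  return results.map((item, index) => {
    const title = item.title ?? '(untitled)';
    return `${index + 1}. ${title} <${item.url}>`;
  });
}

// Mock data for illustration only; not real Exa output.
const mockResponse = {
  results: [
    { title: 'Example A', url: 'https://example.com/a' },
    { title: null, url: 'https://example.com/b' }
  ]
};

console.log(summarizeResults(mockResponse).join('\n'));
```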

## Configuration Options

- `type`: Search type ('neural', 'keyword', or 'auto')
- `useAutoprompt`: Enable query enhancement (for neural search)
- `numResults`: Number of results to return
- `category`: Focus on specific category
- `startPublishedDate`: ISO 8601 date for earliest publish date
- `endPublishedDate`: ISO 8601 date for latest publish date
- `includeDomains`: List of domains to include
- `excludeDomains`: List of domains to exclude
- `includeText`: Text/phrase to include in results
- `excludeText`: Text/phrase to exclude from results
- `startCrawlDate`: ISO 8601 date for earliest crawl date
- `endCrawlDate`: ISO 8601 date for latest crawl date
- `contents`: Configuration for content retrieval
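
How the tool forwards these options to Exa is not shown in this README. One plausible approach, sketched here under that assumption, is to merge the query with whichever options were set, drop the unset ones, and reject unknown search types or unparseable dates before any request is made. `buildSearchPayload` and its validation rules are hypothetical.

```javascript
const SEARCH_TYPES = ['neural', 'keyword', 'auto'];
const DATE_FIELDS = [
  'startPublishedDate',
  'endPublishedDate',
  'startCrawlDate',
  'endCrawlDate'
];

// Hypothetical helper: builds a request body from the documented options.
function buildSearchPayload(query, options = {}) {
  // Keep the API key out of the body (assumed to be sent separately).
  const { apiKey, ...searchOptions } = options;

  if (searchOptions.type !== undefined && !SEARCH_TYPES.includes(searchOptions.type)) {
    throw new RangeError(`Unknown search type: ${searchOptions.type}`);
  }
  for (const field of DATE_FIELDS) {
    const value = searchOptions[field];
    // Date.parse is lenient; this only rejects strings it cannot parse at all.
    if (value !== undefined && Number.isNaN(Date.parse(value))) {
      throw new RangeError(`${field} is not a parseable date: ${value}`);
    }
  }

  const payload = { query };
  for (const [key, value] of Object.entries(searchOptions)) {
    if (value !== undefined) payload[key] = value; // skip unset options
  }
  return payload;
}

console.log(buildSearchPayload('recent developments in quantum computing', {
  apiKey: 'your-api-key',
  type: 'neural',
  numResults: 20,
  startPublishedDate: '2023-01-01',
  category: undefined
}));
```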

## Example

```javascript
const tool = new ExaSearch({
  apiKey: 'your-api-key',
  type: 'neural',
  useAutoprompt: false,
  numResults: 10,
  category: 'company'
});

const result = await tool._call({
  query: 'AI companies focusing on natural language processing'
});
```

## Advanced Example with Filters

```javascript
const tool = new ExaSearch({
  apiKey: process.env.EXA_API_KEY,
  type: 'neural',
  numResults: 20,
  includeDomains: ['techcrunch.com', 'wired.com'],
  startPublishedDate: '2023-01-01',
  contents: {
    text: { maxCharacters: 1000, includeHtmlTags: false },
    highlights: { numSentences: 3, highlightsPerUrl: 2 }
  }
});

try {
  const result = await tool._call({
    query: 'recent developments in quantum computing'
  });
  console.log(result);
} catch (error) {
  console.error('Error performing Exa search:', error);
}
```

### Disclaimer

Ensure you have proper API credentials and respect Exa's usage terms and rate limits. Some features may require specific subscription tiers.
@@ -0,0 +1,80 @@
# Firecrawl Tool

This tool integrates with Firecrawl (https://www.firecrawl.dev/), a web scraping and crawling service designed to turn websites into LLM-ready data. It enables the extraction of clean, well-formatted content from websites, making it ideal for AI applications, particularly those using Large Language Models (LLMs).

## Components

The tool uses the following components:

- A Firecrawl API client instance
- An API Key for authentication
- A custom HTTP client (ky) for making API requests
- Input validation using Zod schema
- Configurable output format

## Key Features

- Scrapes and crawls websites, even those with dynamic content
- Converts web content into clean, LLM-ready markdown
- Handles complex web scraping challenges:
  - Rate limits
  - JavaScript rendering
  - Anti-bot mechanisms
- Multiple output format options
- Clean, structured data extraction
- Support for dynamic content
- Automatic content cleaning and formatting

## Input

The input should be a JSON object with a "url" field containing the URL to scrape and retrieve content from.
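
The tool's own Zod schema is not shown in this README. As a dependency-free sketch of a comparable check, the built-in `URL` constructor can confirm that the value is an absolute HTTP(S) URL before a request is attempted. `validateFirecrawlInput` is a hypothetical helper, and restricting the protocol to HTTP(S) is an assumption.

```javascript
// Hypothetical stand-in for the tool's Zod schema (not the actual schema).
function validateFirecrawlInput(input) {
  if (input === null || typeof input !== 'object' || typeof input.url !== 'string') {
    throw new TypeError('Input must be an object with a "url" string');
  }
  let parsed;
  try {
    parsed = new URL(input.url); // throws on relative or malformed URLs
  } catch {
    throw new TypeError(`Not a valid absolute URL: ${input.url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError(`Unsupported protocol: ${parsed.protocol}`);
  }
  // href is the normalized form of the URL.
  return { url: parsed.href };
}

console.log(validateFirecrawlInput({ url: 'https://example.com' }));
```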

## Output

The output is the scraped content from the specified URL, formatted according to the configured format (default: markdown).

## Configuration Options

- `apiKey`: Your Firecrawl API key
- `format`: Output format (defaults to 'markdown')
- `mode`: Scraping mode (currently supports 'scrape')
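
As an illustration of how these options might resolve, the sketch below fills in defaults and rejects unsupported modes. `resolveFirecrawlConfig` is a hypothetical helper, and treating 'scrape' as the default `mode` is an assumption; the README only states that 'scrape' is the mode currently supported.

```javascript
// 'markdown' is the documented default format; the 'scrape' default is assumed.
const FIRECRAWL_DEFAULTS = { format: 'markdown', mode: 'scrape' };
const SUPPORTED_MODES = ['scrape'];

// Hypothetical helper: merges caller options over the defaults.
function resolveFirecrawlConfig(options = {}) {
  const config = { ...FIRECRAWL_DEFAULTS };
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) config[key] = value; // unset options keep defaults
  }
  if (typeof config.apiKey !== 'string' || config.apiKey === '') {
    throw new Error('A Firecrawl API key is required');
  }
  if (!SUPPORTED_MODES.includes(config.mode)) {
    throw new RangeError(`Unsupported mode: ${config.mode}`);
  }
  return config;
}

console.log(resolveFirecrawlConfig({ apiKey: 'your-api-key' }));
```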

## Example

```javascript
const tool = new Firecrawl({
  apiKey: 'your-api-key',
  format: 'markdown'
});

const result = await tool._call({
  url: 'https://example.com'
});
```

## Advanced Example with Error Handling

```javascript
const tool = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY,
  format: 'markdown'
});

try {
  const result = await tool._call({
    url: 'https://example.com/blog/article'
  });

  // Process the scraped content
  console.log('Scraped content:', result);

  // Use the content with an LLM or other processing
  // ...
} catch (error) {
  console.error('Error scraping website:', error);
}
```

### Disclaimer

Ensure you have proper API credentials and respect Firecrawl's usage terms and rate limits. The service offers flexible pricing plans, including a free tier for small-scale use. When scraping websites, make sure to comply with the target website's terms of service and robots.txt directives.