feat(tools): add Exa and Firecrawl tools with comprehensive README do…

…cumentation.
kaiban-ai · Dec 19, 2024 · 7c976a6 · 7c976a6
1 parent d52b3fa
commit 7c976a6
Show file tree

Hide file tree

Showing 2 changed files with 174 additions and 0 deletions.
diff --git a/packages/tools/src/exa/README.md b/packages/tools/src/exa/README.md
@@ -0,0 +1,94 @@
+# Exa Search Tool
+
+This tool integrates with Exa (https://exa.ai/), a search engine for AI that organizes the web using embeddings. It provides high-quality web data specifically optimized for AI applications, offering advanced search capabilities through neural and traditional keyword approaches.
+
+## Components
+
+The tool uses the following components:
+
+- An Exa API client instance
+- An API Key for authentication
+- A custom HTTP client (ky) for making API requests
+- Input validation using Zod schema
+- Configurable search parameters
+- Multiple search type options
+
+## Key Features
+
+- Neural Search: Meaning-based search using embeddings
+- Keyword Search: Traditional search capabilities
+- Auto Search: Dynamically chooses between neural and keyword
+- Category-focused search (company, research paper, news, github, tweet, etc.)
+- Domain and text filtering
+- Date-based filtering
+- Configurable content retrieval options
+- Support for autoprompt query enhancement
+
+## Input
+
+The input should be a JSON object with a "query" field containing the search query to process.
+
+## Output
+
+The output is the response from Exa's API containing search results based on the configured parameters and search type.
+
+## Configuration Options
+
+- `type`: Search type ('neural', 'keyword', or 'auto')
+- `useAutoprompt`: Enable query enhancement (for neural search)
+- `numResults`: Number of results to return
+- `category`: Focus on specific category
+- `startPublishedDate`: ISO 8601 date for earliest publish date
+- `endPublishedDate`: ISO 8601 date for latest publish date
+- `includeDomains`: List of domains to include
+- `excludeDomains`: List of domains to exclude
+- `includeText`: Text/phrase to include in results
+- `excludeText`: Text/phrase to exclude from results
+- `startCrawlDate`: ISO 8601 date for earliest crawl date
+- `endCrawlDate`: ISO 8601 date for latest crawl date
+- `contents`: Configuration for content retrieval
+
+## Example
+
+```javascript
+const tool = new ExaSearch({
+  apiKey: 'your-api-key',
+  type: 'neural',
+  useAutoprompt: false,
+  numResults: 10,
+  category: 'company'
+});
+
+const result = await tool._call({ 
+  query: 'AI companies focusing on natural language processing' 
+});
+```
+
+## Advanced Example with Filters
+
+```javascript
+const tool = new ExaSearch({
+  apiKey: process.env.EXA_API_KEY,
+  type: 'neural',
+  numResults: 20,
+  includeDomains: ['techcrunch.com', 'wired.com'],
+  startPublishedDate: '2023-01-01',
+  contents: {
+    text: { maxCharacters: 1000, includeHtmlTags: false },
+    highlights: { numSentences: 3, highlightsPerUrl: 2 }
+  }
+});
+
+try {
+  const result = await tool._call({ 
+    query: 'recent developments in quantum computing' 
+  });
+  console.log(result);
+} catch (error) {
+  console.error('Error performing Exa search:', error);
+}
+```
+
+### Disclaimer
+
+Ensure you have proper API credentials and respect Exa's usage terms and rate limits. Some features may require specific subscription tiers. 
diff --git a/packages/tools/src/firecrawl/README.md b/packages/tools/src/firecrawl/README.md
@@ -0,0 +1,80 @@
+# Firecrawl Tool
+
+This tool integrates with Firecrawl (https://www.firecrawl.dev/), a web scraping and crawling service designed to turn websites into LLM-ready data. It enables the extraction of clean, well-formatted content from websites, making it ideal for AI applications, particularly those using Large Language Models (LLMs).
+
+## Components
+
+The tool uses the following components:
+
+- A Firecrawl API client instance
+- An API Key for authentication
+- A custom HTTP client (ky) for making API requests
+- Input validation using Zod schema
+- Configurable output format
+
+## Key Features
+
+- Scrapes and crawls websites, even those with dynamic content
+- Converts web content into clean, LLM-ready markdown
+- Handles complex web scraping challenges:
+  - Rate limits
+  - JavaScript rendering
+  - Anti-bot mechanisms
+- Multiple output format options
+- Clean, structured data extraction
+- Support for dynamic content
+- Automatic content cleaning and formatting
+
+## Input
+
+The input should be a JSON object with a "url" field containing the URL to scrape and retrieve content from.
+
+## Output
+
+The output is the scraped content from the specified URL, formatted according to the configured format (default: markdown).
+
+## Configuration Options
+
+- `apiKey`: Your Firecrawl API key
+- `format`: Output format (defaults to 'markdown')
+- `mode`: Scraping mode (currently supports 'scrape')
+
+## Example
+
+```javascript
+const tool = new Firecrawl({
+  apiKey: 'your-api-key',
+  format: 'markdown'
+});
+
+const result = await tool._call({ 
+  url: 'https://example.com' 
+});
+```
+
+## Advanced Example with Error Handling
+
+```javascript
+const tool = new Firecrawl({
+  apiKey: process.env.FIRECRAWL_API_KEY,
+  format: 'markdown'
+});
+
+try {
+  const result = await tool._call({ 
+    url: 'https://example.com/blog/article' 
+  });
+
+  // Process the scraped content
+  console.log('Scraped content:', result);
+
+  // Use the content with an LLM or other processing
+  // ...
+} catch (error) {
+  console.error('Error scraping website:', error);
+}
+```
+
+### Disclaimer
+
+Ensure you have proper API credentials and respect Firecrawl's usage terms and rate limits. The service offers flexible pricing plans, including a free tier for small-scale use. When scraping websites, make sure to comply with the target website's terms of service and robots.txt directives.