feat(tools): add Exa and Firecrawl tools with comprehensive README documentation.
darielnoel committed Dec 19, 2024
1 parent d52b3fa commit 7c976a6
Showing 2 changed files with 174 additions and 0 deletions.
94 changes: 94 additions & 0 deletions packages/tools/src/exa/README.md
@@ -0,0 +1,94 @@
# Exa Search Tool

This tool integrates with Exa (https://exa.ai/), a search engine built for AI applications that organizes the web using embeddings. It returns web data suited to AI workloads and supports both neural (embedding-based) and traditional keyword search.

## Components

The tool uses the following components:

- An Exa API client instance
- An API Key for authentication
- A custom HTTP client (ky) for making API requests
- Input validation using Zod schema
- Configurable search parameters
- Multiple search type options

## Key Features

- Neural Search: Meaning-based search using embeddings
- Keyword Search: Traditional search capabilities
- Auto Search: Dynamically chooses between neural and keyword
- Category-focused search (company, research paper, news, github, tweet, etc.)
- Domain and text filtering
- Date-based filtering
- Configurable content retrieval options
- Support for autoprompt query enhancement

## Input

The input should be a JSON object with a `query` field containing the search query to process.
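
To make the expected input shape concrete, here is a minimal sketch of the kind of check the tool's Zod schema is described as performing. It is a dependency-free stand-in, not the tool's actual schema: the `validateQueryInput` name and the non-empty requirement are assumptions made for illustration.

```javascript
// Hypothetical helper: a plain-JavaScript stand-in for the tool's Zod schema.
// It only checks the single field this README documents ("query").
// Assumption: an empty or whitespace-only query is treated as invalid.
function validateQueryInput(input) {
  if (input === null || typeof input !== 'object') {
    throw new TypeError('Input must be an object with a "query" field');
  }
  if (typeof input.query !== 'string' || input.query.trim() === '') {
    throw new TypeError('"query" must be a non-empty string');
  }
  // Return a normalized copy rather than mutating the caller's object.
  return { query: input.query.trim() };
}
```

Running a check like this before calling the tool can surface malformed input without spending an API request.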

## Output

The output is the response from Exa's API containing search results based on the configured parameters and search type.
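
The exact response shape is not documented here, so the following is only a sketch of how a caller might reduce the output to a short list of links. It assumes the tool returns Exa's parsed JSON with a `results` array whose items carry `title` and `url`; the `summarizeResults` helper is hypothetical, and the field names should be verified against real output.

```javascript
// Hypothetical helper. Assumption: `response` is a parsed object with a
// `results` array, and each result has `title` and `url` fields.
function summarizeResults(response) {
  // Fall back to an empty list if the shape is not what we assumed.
  const results = Array.isArray(response?.results) ? response.results : [];
  return results.map(({ title, url }) => ({
    title: title ?? '(untitled)',
    url,
  }));
}
```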

## Configuration Options

- `type`: Search type ('neural', 'keyword', or 'auto')
- `useAutoprompt`: Enable query enhancement (for neural search)
- `numResults`: Number of results to return
- `category`: Focus on specific category
- `startPublishedDate`: ISO 8601 date for earliest publish date
- `endPublishedDate`: ISO 8601 date for latest publish date
- `includeDomains`: List of domains to include
- `excludeDomains`: List of domains to exclude
- `includeText`: Text/phrase to include in results
- `excludeText`: Text/phrase to exclude from results
- `startCrawlDate`: ISO 8601 date for earliest crawl date
- `endCrawlDate`: ISO 8601 date for latest crawl date
- `contents`: Configuration for content retrieval
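
The date options above take ISO 8601 strings. As a small sketch, a rolling date window can be derived from `Date` objects with the standard `toISOString()` method. The `publishedWithinLastDays` helper is hypothetical and not part of the tool.

```javascript
// Hypothetical helper: builds the two published-date options for a window
// covering the last `days` days, ending at `now`.
function publishedWithinLastDays(days, now = new Date()) {
  const millisPerDay = 24 * 60 * 60 * 1000;
  const start = new Date(now.getTime() - days * millisPerDay);
  return {
    startPublishedDate: start.toISOString(), // ISO 8601, UTC
    endPublishedDate: now.toISOString(),
  };
}

// Possible usage (ExaSearch comes from this package and is not imported here):
// const tool = new ExaSearch({
//   apiKey: process.env.EXA_API_KEY,
//   type: 'neural',
//   ...publishedWithinLastDays(30),
// });
```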

## Example

```javascript
const tool = new ExaSearch({
  apiKey: 'your-api-key',
  type: 'neural',
  useAutoprompt: false,
  numResults: 10,
  category: 'company'
});

const result = await tool._call({
  query: 'AI companies focusing on natural language processing'
});
```

## Advanced Example with Filters

```javascript
const tool = new ExaSearch({
  apiKey: process.env.EXA_API_KEY,
  type: 'neural',
  numResults: 20,
  includeDomains: ['techcrunch.com', 'wired.com'],
  startPublishedDate: '2023-01-01',
  contents: {
    text: { maxCharacters: 1000, includeHtmlTags: false },
    highlights: { numSentences: 3, highlightsPerUrl: 2 }
  }
});

try {
  const result = await tool._call({
    query: 'recent developments in quantum computing'
  });
  console.log(result);
} catch (error) {
  console.error('Error performing Exa search:', error);
}
```

### Disclaimer

Ensure you have proper API credentials and respect Exa's usage terms and rate limits. Some features may require specific subscription tiers.
80 changes: 80 additions & 0 deletions packages/tools/src/firecrawl/README.md
@@ -0,0 +1,80 @@
# Firecrawl Tool

This tool integrates with Firecrawl (https://www.firecrawl.dev/), a web scraping and crawling service designed to turn websites into LLM-ready data. It enables the extraction of clean, well-formatted content from websites, making it ideal for AI applications, particularly those using Large Language Models (LLMs).

## Components

The tool uses the following components:

- A Firecrawl API client instance
- An API Key for authentication
- A custom HTTP client (ky) for making API requests
- Input validation using Zod schema
- Configurable output format

## Key Features

- Scrapes and crawls websites, even those with dynamic content
- Converts web content into clean, LLM-ready markdown
- Handles complex web scraping challenges:
  - Rate limits
  - JavaScript rendering
  - Anti-bot mechanisms
- Multiple output format options
- Clean, structured data extraction
- Support for dynamic content
- Automatic content cleaning and formatting

## Input

The input should be a JSON object with a `url` field containing the URL to scrape and retrieve content from.
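
As an illustration of the input contract, the sketch below checks the `url` field with the standard `URL` constructor before any request is made. It is a dependency-free stand-in for the Zod validation mentioned under Components, not the tool's actual schema; the `validateUrlInput` name and the restriction to `http`/`https` are assumptions.

```javascript
// Hypothetical helper: a plain-JavaScript stand-in for the tool's Zod schema.
// Assumption: only http and https URLs are meaningful targets for scraping.
function validateUrlInput(input) {
  if (input === null || typeof input !== 'object' || typeof input.url !== 'string') {
    throw new TypeError('Input must be an object with a string "url" field');
  }
  let parsed;
  try {
    parsed = new URL(input.url); // throws on malformed URLs
  } catch {
    throw new TypeError(`Invalid URL: ${input.url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError(`Unsupported protocol: ${parsed.protocol}`);
  }
  // `href` is the normalized form of the URL.
  return { url: parsed.href };
}
```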

## Output

The output is the scraped content from the specified URL, formatted according to the configured format (default: markdown).

## Configuration Options

- `apiKey`: Your Firecrawl API key
- `format`: Output format (defaults to 'markdown')
- `mode`: Scraping mode (currently supports 'scrape')
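
The defaults listed above can be pictured as a merge over the caller's configuration. This is a sketch only: `resolveFirecrawlConfig` is a hypothetical helper, and treating `'scrape'` as the default `mode` is an assumption based on it being the only mode listed.

```javascript
// Hypothetical helper (not part of the tool): applies the documented defaults.
// The README lists 'scrape' as the only supported mode.
const SUPPORTED_MODES = ['scrape'];

function resolveFirecrawlConfig(config = {}) {
  if (!config.apiKey) {
    throw new Error('apiKey is required');
  }
  // Caller-supplied values win over the defaults.
  const resolved = { format: 'markdown', mode: 'scrape', ...config };
  if (!SUPPORTED_MODES.includes(resolved.mode)) {
    throw new Error(`Unsupported mode: ${resolved.mode}`);
  }
  return resolved;
}
```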

## Example

```javascript
const tool = new Firecrawl({
  apiKey: 'your-api-key',
  format: 'markdown'
});

const result = await tool._call({
  url: 'https://example.com'
});
```

## Advanced Example with Error Handling

```javascript
const tool = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY,
  format: 'markdown'
});

try {
  const result = await tool._call({
    url: 'https://example.com/blog/article'
  });

  // Process the scraped content
  console.log('Scraped content:', result);

  // Use the content with an LLM or other processing
  // ...
} catch (error) {
  console.error('Error scraping website:', error);
}
```
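
For transient failures such as rate limiting, one option is to wrap the call in a retry with exponential backoff. The sketch below is generic and makes no claim about how the tool reports errors: `withRetry` is a hypothetical helper, and it retries on any thrown error because the error shape is not documented here.

```javascript
// Hypothetical wrapper: retries any async function with exponential backoff.
// Assumption: every failure is worth retrying; narrow this once the tool's
// error shape (for example, a rate-limit status) is known.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      const delayMs = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Possible usage with the tool from the example above:
// const result = await withRetry(() => tool._call({ url: 'https://example.com' }));
```

Keeping the retry logic outside the tool leaves the `_call` contract unchanged and lets callers choose their own limits.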

### Disclaimer

Ensure you have proper API credentials and respect Firecrawl's usage terms and rate limits. The service offers flexible pricing plans, including a free tier for small-scale use. When scraping websites, make sure to comply with the target website's terms of service and robots.txt directives.
