Commit 7875cf6: update docs
santteegt committed Nov 7, 2024 (1 parent: 8e4f8e4)
Showing 19 changed files with 672 additions and 222 deletions.
6 changes: 4 additions & 2 deletions docs/pages/apis.mdx
# API Examples
The repository already includes a few API pipeline manifest definitions that showcase how to use the `rag-api-pipeline` for generating knowledge bases from REST APIs.
Each example demonstrates how to define a YAML manifest for extracting data from target API endpoints using different Authentication/Pagination strategies.
For a more in-depth review of how to build a manifest for creating a RAG pipeline for your own API, visit the [Defining the API Pipeline Manifest](/manifest-definition) section.

## Boardroom Governance API
[Boardroom](https://boardroom.io/) offers its `Boardrooms Governance API` to provide comprehensive data on 350+ DAOs across chains. It offers endpoints that fetch information about proposals, delegates, discussions, and much more. You can find the complete API documentation at this [link](https://docs.boardroom.io/docs/api/cd5e0c8aa2bc1-overview).
## Agora API
The [Agora](https://www.agora.xyz/#Product) OP API provides various endpoints to fetch data related to RetroPGF projects and proposals within the OP collective.
Check the [Agora API](/apis/agora-api) section for details on how to extract data from the API and generate a knowledge base related to RetroPGF projects and proposals within the OP collective.

## Working with Other APIs
If you are interested in working with any other API, visit the [Other API Sources](/apis/other-api-sources) section to get started.
104 changes: 91 additions & 13 deletions docs/pages/apis/agora-api.mdx
# Optimism Agora API

This repository contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_openapi.yaml) and [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml) needed to create a RAG pipeline.
This pipeline generates a knowledge base from RetroPGF projects and proposals within the OP collective.

## Pre-requisites

To access this API, you'll need an API key. You can request one through [Agora's Discord server](https://www.agora.xyz/#Product). You can run the `rag-api-pipeline setup` command to set the REST API key,
or you can store the key directly in the `config/secrets/api-key` file. A less secure option is to provide it via the `--api-key` CLI argument.
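If you go the file-based route, the expected layout can be prepared by hand. A quick sketch (the key value below is a placeholder, not a real credential):

```bash
# Store a placeholder Agora API key where the pipeline looks for it
# (config/secrets/api-key, as described above). Replace the value with
# the real key obtained from Agora's Discord server.
mkdir -p config/secrets
printf '%s' 'YOUR_AGORA_API_KEY' > config/secrets/api-key
chmod 600 config/secrets/api-key   # keep the secret readable only by you
```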

The API pipeline extracts data from the `/proposals` and `/projects` [endpoints](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L79). Since no `api_parameters` are required, this section remains [empty](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L5).
## Getting the Agora API OpenAPI Spec

TODO:

## Defining the RAG API Pipeline Manifest

This pipeline will extract data related to DAO proposals (`/proposals`) and RetroPGF projects (`/projects`).
Below is an overview of the main sections in the API pipeline manifest.

### Basic Configuration

Since no `api_parameters` are required, this section remains empty.

```yaml [agora_api_pipeline.yaml]
api_name: "optimism_agora_api"

api_parameters:

api_config:
  request_method: "get"
  content_type: "application/json"
  response_entrypoint_field: "data"
```
### Connector Specification
The manifest then defines some metadata and the request parameters needed for making calls to the API. In this case, it only needs an `api_key`
parameter for authentication:

```yaml [agora_api_pipeline.yaml]
spec:
  connection_specification:
    $schema: http://json-schema.org/draft-07/schema#
    additionalProperties: true
    properties:
      api_key:
        airbyte-secret: true
        description: Agora API Key.
        type: string
    required:
      - api_key
    title: Agora API Spec
    type: object
  documentation_url: https://docs.airbyte.com/integrations/sources/agora
  type: Spec
```
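Functionally, this `connection_specification` is a JSON Schema that the connector validates the user-supplied config against. The sketch below mimics that check with a hand-rolled validator (the real connector uses a full JSON Schema implementation; `validate_config` is illustrative only):

```python
# Minimal stand-in for JSON Schema validation of the connector config.
spec = {
    "required": ["api_key"],
    "properties": {"api_key": {"type": "string", "airbyte-secret": True}},
}

def validate_config(config: dict) -> list:
    """Return a list of validation errors (an empty list means the config is valid)."""
    errors = []
    for field in spec["required"]:
        if field not in config:
            errors.append(f"missing required field: {field}")
    for field, rules in spec["properties"].items():
        if field in config and rules.get("type") == "string" and not isinstance(config[field], str):
            errors.append(f"{field} must be a string")
    return errors

print(validate_config({}))                     # → ['missing required field: api_key']
print(validate_config({"api_key": "abc123"}))  # → []
```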

### API Request Configuration

Below is the `requester_base` definition. The API implements a BearerAuthenticator schema and retrieves the `api_token` from the `config` object:

```yaml [agora_api_pipeline.yaml]
definition:
  requester_base:
    type: HttpRequester
    authenticator:
      type: BearerAuthenticator
      api_token: "{{ config['api_key'] }}"
```

### Record Selection and Pagination

The API uses an Offset-based pagination strategy. The `page_size` is set to 50, while `offset` and `limit` parameters are dynamically inserted into the URL as request parameters:

```yaml [agora_api_pipeline.yaml]
definition:
  paginator: # Details at https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/pagination
    type: DefaultPaginator
    pagination_strategy:
      type: OffsetIncrement
      page_size: 50
    page_token_option:
      type: RequestOption
      inject_into: "request_parameter"
      field_name: "offset"
    page_size_option:
      type: RequestOption
      inject_into: "request_parameter"
      field_name: "limit"
```
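To make the strategy concrete, here is a small simulation of what an offset-based paginator does under the hood (the function and variable names are illustrative, not pipeline internals):

```python
# Offset pagination: request pages of `page_size` records, advancing the
# offset until a short page signals the end of the collection.
def fetch_all(fetch_page, page_size=50):
    """fetch_page(offset, limit) -> list of records; stops on a short page."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size

# Fake endpoint with 120 records, to show the offset/limit progression.
data = list(range(120))
calls = []
def fake_endpoint(offset, limit):
    calls.append((offset, limit))
    return data[offset:offset + limit]

print(len(fetch_all(fake_endpoint)))  # → 120
print(calls)                          # → [(0, 50), (50, 50), (100, 50)]
```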

### Endpoint Configuration

Below are the target endpoints with their respective schemas:

```yaml [agora_api_pipeline.yaml]
endpoints:
  /proposals:
    id: "proposals"
    primary_key: "id"
    responseSchema: "#/schemas/Proposal"
    textSchema:
      $ref: "#/textSchemas/Proposal"
  /projects:
    id: "projects"
    primary_key: "id"
    responseSchema: "#/schemas/Project"
    textSchema:
      $ref: "#/textSchemas/Project"
```
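The `responseSchema` and `textSchema` values are local references into the same manifest. A tiny resolver sketch shows how such `#/...` pointers are looked up (the `manifest` dict here is a trimmed, hypothetical stand-in):

```python
# Resolve local JSON-pointer-style references like "#/schemas/Proposal"
# against the manifest document itself.
manifest = {
    "schemas": {"Proposal": {"type": "object"}},
    "textSchemas": {"Proposal": {"type": "object"}},
}

def resolve_ref(manifest: dict, ref: str):
    """Follow each path segment of a '#/a/b' reference down the manifest tree."""
    node = manifest
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node

print(resolve_ref(manifest, "#/schemas/Proposal"))  # → {'type': 'object'}
```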

## Using the RAG Pipeline to generate a Knowledge Base for the OP Collective

### RAG Pipeline CLI

1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
2. Execute the following command:

```bash
rag-api-pipeline run all config/agora_api_pipeline.yaml config/agora_openapi.yaml
```

After execution, you'll find the processed data and compressed knowledge base snapshot in the `output/optimism_agora_api` folder.

### Import the KB Snapshot into a Gaianet Node

1. Locate the generated snapshot in `output/optimism_agora_api/` (named `optimism_agora_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`).
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration).
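Before importing, you may want to peek inside the snapshot archive. Here is an illustrative session that uses a dummy archive in place of the real snapshot:

```bash
# Create a dummy snapshot archive mirroring the layout of the generated one,
# then list its contents. With a real snapshot you would only run the final
# `tar -tzf` (and `tar -xzf` to extract it).
mkdir -p output/optimism_agora_api
echo 'demo' > output/optimism_agora_api/demo.snapshot
tar -czf output/optimism_agora_api/demo.snapshot.tar.gz \
    -C output/optimism_agora_api demo.snapshot
tar -tzf output/optimism_agora_api/demo.snapshot.tar.gz   # → demo.snapshot
```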
90 changes: 59 additions & 31 deletions docs/pages/apis/boardroom-api.mdx
# Boardroom Governance API

The repository already contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_openapi.yaml) and the [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) needed to create a RAG API pipeline.
This pipeline generates a knowledge base from any DAO/Protocol hosted by the Boardroom Governance API.

## Pre-requisites

To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). Store the key in `config/secrets/api-key` or provide it directly using the `--api-key` CLI argument.
To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). You can run the `rag-api-pipeline setup` command to set the REST API key,
or you can store the key directly in the `config/secrets/api-key` file. A less secure option is to provide it via the `--api-key` CLI argument.

## Getting the Boardroom API OpenAPI Spec

TODO:

## Defining the RAG API Pipeline Manifest

This pipeline will extract data related to protocol metadata (`/protocols/aave`), DAO proposals (`/protocols/aave/proposals`), and discussion posts from the Discourse forum site (`discourseTopics`, `discourseCategories`, and `discourseTopicPosts`), if any.

### Basic Configuration

The manifest starts by defining the API name, parameters, and request settings. You can visit this [link](https://docs.boardroom.io/docs/api/5b445a81af241-get-all-protocols) to get the list of all DAO protocols in Boardroom. This example focuses on the [Aave Governance DAO](https://boardroom.io/aave/insights):

```yaml [boardroom_api_pipeline.yaml]
api_name: "aave_boardroom_api"

api_parameters:
  cname: "aave"

api_config:
  request_method: "get"
  content_type: "application/json"
  response_entrypoint_field: "data"
```
### Connector Specification
The manifest then defines some metadata and the request parameters needed for making calls to the API:
```yaml [boardroom_api_pipeline.yaml]
spec:
  type: Spec
  documentation_url: https://docs.airbyte.com/integrations/sources/boardroom
  connection_specification:
    $schema: http://json-schema.org/draft-07/schema#
    additionalProperties: true
    title: Boardroom API Spec
    type: object
    properties:
      api_key:
        airbyte-secret: true
        description: Boardroom API Key.
        type: string
    required:
      - api_key
```
### API Request Configuration
Then, the `requester_base` defines how the connector should make requests to the API. Here, an `ApiKeyAuthenticator` schema is required and gets the `api_token` value from the `config` object:

```yaml [boardroom_api_pipeline.yaml]
definitions:
  requester_base:
    type: HttpRequester
    authenticator:
      type: ApiKeyAuthenticator
      api_token: "{{ config['api_key'] }}"
```

### Record Selection and Pagination

Data records returned by the API are always wrapped in the `data` field, while pagination is handled using a Cursor-based approach:

```yaml [boardroom_api_pipeline.yaml]
definitions:
  selector:
    type: RecordSelector
    extractor:
      type: DpathExtractor
      field_path: ["data"]
```
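Conceptually, cursor-based pagination threads a token from each response into the next request. A toy sketch of that loop (the cursor field name and stop condition here are illustrative; the real ones are defined in the manifest):

```python
# Cursor pagination: each page returns its records plus the cursor for the
# next page; iteration stops when the API returns no cursor.
def fetch_all(fetch_page):
    """fetch_page(cursor) -> (records, next_cursor); stop when no cursor."""
    records, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        records.extend(page)
        if not cursor:
            return records

# Fake three-page API keyed by cursor value (None means "first page").
pages = {None: ([1, 2], "c1"), "c1": ([3, 4], "c2"), "c2": ([5], None)}
print(fetch_all(lambda c: pages[c]))  # → [1, 2, 3, 4, 5]
```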

### Endpoint Configuration

Now it's time to define the target endpoints with their respective schemas. Below is an example for the *proposals* endpoint:

```yaml [boardroom_api_pipeline.yaml]
endpoints:
  "/protocols/{cname}/proposals":
    id: "proposals"
    responseSchema: "#/schemas/Proposals"
    textSchema:
      $ref: "#/textSchemas/Proposal"
```

### Schema Definitions

The `responseSchema` reference from above defines the complete *unwrapped* data schema that is returned by the API endpoint:

```yaml [boardroom_api_pipeline.yaml]
schemas:
  Proposals:
    type: object
    properties:
      title:
        type: string
      content:
        type: string
      summary:
        type: string
      # ... (remaining field definitions collapsed in this excerpt)
```

On the other hand, the endpoint's `textSchema` reference specifies the list of fields for text parsing. Note that all properties are also listed in the `responseSchema`.
In this case, `title`, `content`, and `summary` will be parsed as texts, while other fields will be included as metadata properties in a JSON object:

```yaml [boardroom_api_pipeline.yaml]
textSchemas:
  Proposal:
    type: object
    properties:
      title:
        type: string
      content:
        type: string
      summary:
        type: string
```
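In other words, the pipeline routes the `textSchema` fields into text parsing and serializes everything else as metadata. A rough sketch of that split (the helper name is hypothetical, not pipeline code):

```python
# Split an API record into text fields (per the textSchema above) and a
# JSON metadata object holding the remaining properties.
import json

TEXT_FIELDS = {"title", "content", "summary"}  # taken from the textSchema above

def split_record(record: dict):
    texts = {k: v for k, v in record.items() if k in TEXT_FIELDS}
    metadata = {k: v for k, v in record.items() if k not in TEXT_FIELDS}
    return texts, json.dumps(metadata, sort_keys=True)

record = {"title": "AIP-1", "content": "...", "summary": "...",
          "refId": "abc", "state": "closed"}
texts, meta = split_record(record)
print(sorted(texts))  # → ['content', 'summary', 'title']
print(meta)           # → {"refId": "abc", "state": "closed"}
```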

### Chunking Parameters

This section sets the parameters used when chunking the extracted content into text fragments:

```yaml [boardroom_api_pipeline.yaml]
chunking_params:
  mode: "elements"
  chunking_strategy: "by_title"
  # ... (size and overlap settings collapsed in this excerpt)
  multipage_sections: true
```
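As a rough intuition for the `by_title` strategy: chunks are cut at title elements and at a maximum character budget. The toy function below illustrates the idea only; the real pipeline delegates chunking to its backend library, and the parameter name mirrors the manifest above:

```python
# Toy "by_title" chunker: start a new chunk at every title element, or
# whenever appending the next element would exceed max_characters.
def chunk_by_title(elements, max_characters=1500):
    """elements is a list of (kind, text) pairs; returns a list of chunk strings."""
    chunks, current = [], ""
    for kind, text in elements:
        if current and (kind == "title" or len(current) + len(text) + 1 > max_characters):
            chunks.append(current)
            current = ""
        current = f"{current}\n{text}".strip()
    if current:
        chunks.append(current)
    return chunks

elements = [("title", "Proposal A"), ("text", "Summary..."),
            ("title", "Proposal B"), ("text", "Details...")]
print(chunk_by_title(elements))  # → ['Proposal A\nSummary...', 'Proposal B\nDetails...']
```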

## Using the RAG Pipeline to generate a Knowledge Base for Aave

### RAG Pipeline CLI

1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
2. Execute the following command:

```bash
rag-api-pipeline run all config/boardroom_api_pipeline.yaml config/boardroom_openapi.yaml
```

The processed data and knowledge base snapshot for Aave will be available in the `output/aave_boardroom_api` folder. You can also find a public knowledge base snapshot on [Hugging Face](https://huggingface.co/datasets/uxman/aave_snapshot_boardroom/tree/main).

### Import the KB Snapshot into a Gaianet Node

1. Locate the generated snapshot in `output/aave_boardroom_api/` (named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`) or download it from the Hugging Face link above.
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration).


### Example user prompts

- Asking what information the RAG bot is able to provide

![intro_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/intro.png)

- Asking for information about the proposal [Enable Metis as Collateral on the Metis Chain](https://boardroom.io/aave/proposal/cHJvcG9zYWw6YWF2ZTpvbmNoYWluLXVwZ3JhZGU6MTUy)

![proposal1_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal1_summary.png)

- Asking for information about [Onboarding USDS and sUSDS to Aave v3](https://boardroom.io/aave/discussions/18987)

![proposal1_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal2_summary.png)

### Customizing for Other DAOs

To generate a knowledge base for a different DAO, you just need to modify the `api_name` and `api_parameters` values in the [boardroom_api_pipeline.yaml](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) manifest file.
26 changes: 19 additions & 7 deletions docs/pages/apis/other-api-sources.mdx
Want to supercharge your RAG pipeline with different APIs? We've got you covered!
* Look for existing OpenAPI/Swagger specifications

### 2. Schema Setup
* Check out this guide on [OpenAPI](https://docs.bump.sh/guides/openapi/specification/v3.1/introduction/what-is-openapi/)
* Look for an official OpenAPI spec file from the API provider.
* Use some help from LLMs to create OpenAPI schemas
* Validate your schema:
* New schemas: [Swagger Editor](https://editor.swagger.io/)
* Existing specs: [Swagger Validator](https://validator.swagger.io/)


### 3. Define the RAG API Pipeline Manifest
* Define the target endpoints and required request parameters
* Get an API Key if needed.
* Check out our [guide](/manifest-definition) or [API examples](/apis) for inspiration.

### 4. Test and Deploy
* Set up the pipeline's initial configuration by running the `rag-api-pipeline setup` command.
* Test each endpoint thoroughly:
  * Run `rag-api-pipeline run all <API_MANIFEST_FILE> <OPENAPI_SPEC_FILE>` and check for any errors.
  * Comment out other endpoints in the API manifest.
  * Use the `--normalized-only` CLI option and check results in the `output` folder.
* Adjust data chunking parameter settings:
  * Use the `--chunked-only` CLI option and analyze results (e.g. using a Jupyter notebook).
* If you want to include recent endpoint data, use the `--full-refresh` CLI option to clean up the cache.
* Use AI assistance to fix validation issues.
* Connect everything to your RAG pipeline.

Still need help? Feel free to reach out or open an issue on this repository!