Commit 7875cf6: update docs
santteegt committed Nov 7, 2024 (1 parent: 8e4f8e4)
Showing 19 changed files with 672 additions and 222 deletions.
6 changes: 4 additions & 2 deletions docs/pages/apis.mdx
# API Examples
The repository already includes a few API pipeline manifest definitions that showcase how to use the `rag-api-pipeline` for generating knowledge bases from REST APIs.
Each example demonstrates how to define a YAML manifest for extracting data from target API endpoints using different Authentication/Pagination strategies.
For a more in-depth review of how to build a manifest for creating a RAG pipeline for your own API, visit the [Defining the API Pipeline Manifest](/manifest-definition) section.

## Boardroom Governance API
[Boardroom](https://boardroom.io/) offers its `Boardrooms Governance API` to provide comprehensive data on 350+ DAOs across chains. It offers endpoints that fetch information about proposals, delegates, discussions, and much more. You can find the complete API documentation at this [link](https://docs.boardroom.io/docs/api/cd5e0c8aa2bc1-overview).
## Agora API
The [Agora](https://www.agora.xyz/#Product) OP API provides various endpoints to fetch data related to RetroPGF projects and proposals within the OP collective.
Check the [Agora API](/apis/agora-api) section for details on how to extract data from the API and generate a knowledge base related to RetroPGF projects and proposals within the OP collective.

## Working with Other APIs
If you are interested in working with any other API, visit the [Other API Sources](/apis/other-api-sources) section to get started.
104 changes: 91 additions & 13 deletions docs/pages/apis/agora-api.mdx
# Optimism Agora API

This repository contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_openapi.yaml) and [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml) needed to create a RAG pipeline.
This pipeline generates a knowledge base from RetroPGF projects and proposals within the OP collective.

## Pre-requisites

To access this API, you'll need an API key. You can request one through [Agora's Discord server](https://www.agora.xyz/#Product). You can run the `rag-api-pipeline setup` command to set the REST API key,
or you can store the key directly in the `config/secrets/api-key` file. A less secure option is to provide it via the `--api-key` CLI argument.
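If you go the file-based route, the expected layout can be prepared by hand. A quick sketch (the key value below is a placeholder, not a real credential):

```bash
# Store a placeholder Agora API key where the pipeline looks for it
# (config/secrets/api-key, as described above). Replace the value with
# the real key obtained from Agora's Discord server.
mkdir -p config/secrets
printf '%s' 'YOUR_AGORA_API_KEY' > config/secrets/api-key
chmod 600 config/secrets/api-key   # keep the secret readable only by you
```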

The API pipeline extracts data from the `/proposals` and `/projects` [endpoints](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L79). Since no `api_parameters` are required, this section remains [empty](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/agora_api_pipeline.yaml#L5).
## Getting the Agora API OpenAPI Spec

TODO:

## Defining the RAG API Pipeline Manifest

This pipeline will extract data related to DAO proposals (`/proposals`) and RetroPGF projects (`/projects`).
Below is an overview of the main sections in the API pipeline manifest.

### Basic Configuration

Since no `api_parameters` are required, this section remains empty.

```yaml [agora_api_pipeline.yaml]
api_name: "optimism_agora_api"

api_parameters:

api_config:
  request_method: "get"
  content_type: "application/json"
  response_entrypoint_field: "data"
```
### Connector Specification
The manifest then defines some metadata and the request parameters needed for making calls to the API. In this case, it only needs an `api_key`
parameter for authentication:

```yaml [agora_api_pipeline.yaml]
spec:
  connection_specification:
    $schema: http://json-schema.org/draft-07/schema#
    additionalProperties: true
    properties:
      api_key:
        airbyte-secret: true
        description: Agora API Key.
        type: string
    required:
      - api_key
    title: Agora API Spec
    type: object
  documentation_url: https://docs.airbyte.com/integrations/sources/agora
  type: Spec
```
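Functionally, this `connection_specification` is a JSON Schema that the connector validates the user-supplied config against. The sketch below mimics that check with a hand-rolled validator (the real connector uses a full JSON Schema implementation; `validate_config` is illustrative only):

```python
# Minimal stand-in for JSON Schema validation of the connector config.
spec = {
    "required": ["api_key"],
    "properties": {"api_key": {"type": "string", "airbyte-secret": True}},
}

def validate_config(config: dict) -> list:
    """Return a list of validation errors (an empty list means the config is valid)."""
    errors = []
    for field in spec["required"]:
        if field not in config:
            errors.append(f"missing required field: {field}")
    for field, rules in spec["properties"].items():
        if field in config and rules.get("type") == "string" and not isinstance(config[field], str):
            errors.append(f"{field} must be a string")
    return errors

print(validate_config({}))                     # → ['missing required field: api_key']
print(validate_config({"api_key": "abc123"}))  # → []
```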

### API Request Configuration

Below is the `requester_base` definition. The API implements a BearerAuthenticator schema and retrieves the `api_token` from the `config` object:

```yaml [agora_api_pipeline.yaml]
definition:
  requester_base:
    type: HttpRequester
    authenticator:
      type: BearerAuthenticator
      api_token: "{{ config['api_key'] }}"
```

### Record Selection and Pagination

The API uses an Offset-based pagination strategy. The `page_size` is set to 50, while `offset` and `limit` parameters are dynamically inserted into the URL as request parameters:

```yaml [agora_api_pipeline.yaml]
definition:
  paginator: # Details at https://docs.airbyte.com/connector-development/config-based/understanding-the-yaml-file/pagination
    type: DefaultPaginator
    pagination_strategy:
      type: OffsetIncrement
      page_size: 50
    page_token_option:
      type: RequestOption
      inject_into: "request_parameter"
      field_name: "offset"
    page_size_option:
      type: RequestOption
      inject_into: "request_parameter"
      field_name: "limit"
```
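To make the strategy concrete, here is a small simulation of what an offset-based paginator does under the hood (the function and variable names are illustrative, not pipeline internals):

```python
# Offset pagination: request pages of `page_size` records, advancing the
# offset until a short page signals the end of the collection.
def fetch_all(fetch_page, page_size=50):
    """fetch_page(offset, limit) -> list of records; stops on a short page."""
    records, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size

# Fake endpoint with 120 records, to show the offset/limit progression.
data = list(range(120))
calls = []
def fake_endpoint(offset, limit):
    calls.append((offset, limit))
    return data[offset:offset + limit]

print(len(fetch_all(fake_endpoint)))  # → 120
print(calls)                          # → [(0, 50), (50, 50), (100, 50)]
```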

### Endpoint Configuration

Below are the target endpoints with their respective schemas:

```yaml [agora_api_pipeline.yaml]
endpoints:
  /proposals:
    id: "proposals"
    primary_key: "id"
    responseSchema: "#/schemas/Proposal"
    textSchema:
      $ref: "#/textSchemas/Proposal"
  /projects:
    id: "projects"
    primary_key: "id"
    responseSchema: "#/schemas/Project"
    textSchema:
      $ref: "#/textSchemas/Project"
```
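The `responseSchema` and `textSchema` values are local references into the same manifest. A tiny resolver sketch shows how such `#/...` pointers are looked up (the `manifest` dict here is a trimmed, hypothetical stand-in):

```python
# Resolve local JSON-pointer-style references like "#/schemas/Proposal"
# against the manifest document itself.
manifest = {
    "schemas": {"Proposal": {"type": "object"}},
    "textSchemas": {"Proposal": {"type": "object"}},
}

def resolve_ref(manifest: dict, ref: str):
    """Follow each path segment of a '#/a/b' reference down the manifest tree."""
    node = manifest
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node

print(resolve_ref(manifest, "#/schemas/Proposal"))  # → {'type': 'object'}
```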

## Using the RAG Pipeline to generate a Knowledge Base for the OP Collective

### RAG Pipeline CLI

1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
2. Execute the following command:

```bash
rag-api-pipeline run all config/agora_api_pipeline.yaml config/agora_openapi.yaml
```

After execution, you'll find the processed data and compressed knowledge base snapshot in the `output/optimism_agora_api` folder.

### Import the KB Snapshot into a Gaianet Node

1. Locate the generated snapshot in `output/optimism_agora_api/` (named `optimism_agora_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`).
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration).
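Before importing, you may want to peek inside the snapshot archive. Here is an illustrative session that uses a dummy archive in place of the real snapshot:

```bash
# Create a dummy snapshot archive mirroring the layout of the generated one,
# then list its contents. With a real snapshot you would only run the final
# `tar -tzf` (and `tar -xzf` to extract it).
mkdir -p output/optimism_agora_api
echo 'demo' > output/optimism_agora_api/demo.snapshot
tar -czf output/optimism_agora_api/demo.snapshot.tar.gz \
    -C output/optimism_agora_api demo.snapshot
tar -tzf output/optimism_agora_api/demo.snapshot.tar.gz   # → demo.snapshot
```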
90 changes: 59 additions & 31 deletions docs/pages/apis/boardroom-api.mdx
# Boardroom Governance API

The repository already contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_openapi.yaml) and the [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) needed to create a RAG API pipeline.
This pipeline generates a knowledge base from any DAO/Protocol hosted by the Boardroom Governance API.

## Pre-requisites

To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). Store the key in `config/secrets/api-key` or provide it directly using the `--api-key` CLI argument.
To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). You can run the `rag-api-pipeline setup` command to set the REST API key,
or you can store the key directly in the `config/secrets/api-key` file. A less secure option is to provide it via the `--api-key` CLI argument.

## Getting the Boardroom API OpenAPI Spec

TODO:

## Defining the RAG API Pipeline Manifest

This pipeline will extract data related to protocol metadata (`/protocols/aave`), DAO proposals (`/protocols/aave/proposals`), and discussion posts from the Discourse forum site (`discourseTopics`, `discourseCategories`, and `discourseTopicPosts`), if any.

### Basic Configuration

The manifest starts by defining the API name, parameters, and request settings. You can visit this [link](https://docs.boardroom.io/docs/api/5b445a81af241-get-all-protocols) to get the list of all DAO protocols in Boardroom. This example focuses on the [Aave Governance DAO](https://boardroom.io/aave/insights):

```yaml [boardroom_api_pipeline.yaml]
api_name: "aave_boardroom_api"

api_parameters:
  cname: "aave"

api_config:
  request_method: "get"
  content_type: "application/json"
  response_entrypoint_field: "data"
```
### Connector Specification
The manifest then defines some metadata and the request parameters needed for making calls to the API:
```yaml [boardroom_api_pipeline.yaml]
spec:
  type: Spec
  documentation_url: https://docs.airbyte.com/integrations/sources/boardroom
  connection_specification:
    $schema: http://json-schema.org/draft-07/schema#
    additionalProperties: true
    title: Boardroom API Spec
    type: object
    properties:
      api_key:
        airbyte-secret: true
        description: Boardroom API Key.
        type: string
    required:
      - api_key
```
### API Request Configuration
Then, the `requester_base` defines how the connector should make requests to the API. Here, an `ApiKeyAuthenticator` schema is required and gets the `api_token` value from the `config` object:

```yaml [boardroom_api_pipeline.yaml]
definitions:
  requester_base:
    type: HttpRequester
    authenticator:
      type: ApiKeyAuthenticator
      api_token: "{{ config['api_key'] }}"
```

### Record Selection and Pagination

Data records returned by the API are always wrapped in the `data` field, while pagination is handled using a Cursor-based approach:

```yaml [boardroom_api_pipeline.yaml]
definitions:
  selector:
    type: RecordSelector
    extractor:
      type: DpathExtractor
      field_path: ["data"]
```
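Conceptually, cursor-based pagination threads a token from each response into the next request. A toy sketch of that loop (the cursor field name and stop condition here are illustrative; the real ones are defined in the manifest):

```python
# Cursor pagination: each page returns its records plus the cursor for the
# next page; iteration stops when the API returns no cursor.
def fetch_all(fetch_page):
    """fetch_page(cursor) -> (records, next_cursor); stop when no cursor."""
    records, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        records.extend(page)
        if not cursor:
            return records

# Fake three-page API keyed by cursor value (None means "first page").
pages = {None: ([1, 2], "c1"), "c1": ([3, 4], "c2"), "c2": ([5], None)}
print(fetch_all(lambda c: pages[c]))  # → [1, 2, 3, 4, 5]
```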

### Endpoint Configuration

Now it's time to define the target endpoints with their respective schemas. Below is an example for the *proposals* endpoint:

```yaml [boardroom_api_pipeline.yaml]
endpoints:
  "/protocols/{cname}/proposals":
    id: "proposals"
    responseSchema: "#/schemas/Proposals"
    textSchema:
      $ref: "#/textSchemas/Proposal"
```

### Schema Definitions

The `responseSchema` reference from above defines the complete *unwrapped* data schema that is returned by the API endpoint:

```yaml [boardroom_api_pipeline.yaml]
schemas:
  Proposals:
    type: object
    properties:
      title:
        type: string
      content:
        type: string
      summary:
        type: string
      # ... (remaining field definitions collapsed in this excerpt)
```

On the other hand, the endpoint's `textSchema` reference specifies the list of fields for text parsing. Note that all properties are also listed in the `responseSchema`.
In this case, `title`, `content`, and `summary` will be parsed as texts, while other fields will be included as metadata properties in a JSON object:

```yaml [boardroom_api_pipeline.yaml]
textSchemas:
  Proposal:
    type: object
    properties:
      title:
        type: string
      content:
        type: string
      summary:
        type: string
```
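In other words, the pipeline routes the `textSchema` fields into text parsing and serializes everything else as metadata. A rough sketch of that split (the helper name is hypothetical, not pipeline code):

```python
# Split an API record into text fields (per the textSchema above) and a
# JSON metadata object holding the remaining properties.
import json

TEXT_FIELDS = {"title", "content", "summary"}  # taken from the textSchema above

def split_record(record: dict):
    texts = {k: v for k, v in record.items() if k in TEXT_FIELDS}
    metadata = {k: v for k, v in record.items() if k not in TEXT_FIELDS}
    return texts, json.dumps(metadata, sort_keys=True)

record = {"title": "AIP-1", "content": "...", "summary": "...",
          "refId": "abc", "state": "closed"}
texts, meta = split_record(record)
print(sorted(texts))  # → ['content', 'summary', 'title']
print(meta)           # → {"refId": "abc", "state": "closed"}
```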

### Chunking Parameters

This section sets the parameters used when chunking the extracted content into text fragments:

```yaml [boardroom_api_pipeline.yaml]
chunking_params:
  mode: "elements"
  chunking_strategy: "by_title"
  # ... (size and overlap settings collapsed in this excerpt)
  multipage_sections: true
```
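As a rough intuition for the `by_title` strategy: chunks are cut at title elements and at a maximum character budget. The toy function below illustrates the idea only; the real pipeline delegates chunking to its backend library, and the parameter name mirrors the manifest above:

```python
# Toy "by_title" chunker: start a new chunk at every title element, or
# whenever appending the next element would exceed max_characters.
def chunk_by_title(elements, max_characters=1500):
    """elements is a list of (kind, text) pairs; returns a list of chunk strings."""
    chunks, current = [], ""
    for kind, text in elements:
        if current and (kind == "title" or len(current) + len(text) + 1 > max_characters):
            chunks.append(current)
            current = ""
        current = f"{current}\n{text}".strip()
    if current:
        chunks.append(current)
    return chunks

elements = [("title", "Proposal A"), ("text", "Summary..."),
            ("title", "Proposal B"), ("text", "Details...")]
print(chunk_by_title(elements))  # → ['Proposal A\nSummary...', 'Proposal B\nDetails...']
```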

## Using the RAG Pipeline to generate a Knowledge Base for Aave

### RAG Pipeline CLI

1. Make sure to set up the pipeline's initial settings by running the `rag-api-pipeline setup` command.
2. Execute the following command:

```bash
rag-api-pipeline run all config/boardroom_api_pipeline.yaml config/boardroom_openapi.yaml
```

The processed data and knowledge base snapshot for Aave will be available in the `output/aave_boardroom_api` folder. You can also find a public knowledge base snapshot on [Hugging Face](https://huggingface.co/datasets/uxman/aave_snapshot_boardroom/tree/main).

### Import the KB Snapshot into a Gaianet Node

1. Locate the generated snapshot in `output/aave_boardroom_api/` (named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`) or download it from the Hugging Face link above.
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration).


### Example user prompts

- Asking what information the RAG bot is able to provide

![intro_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/intro.png)

- Asking for information about the proposal [Enable Metis as Collateral on the Metis Chain](https://boardroom.io/aave/proposal/cHJvcG9zYWw6YWF2ZTpvbmNoYWluLXVwZ3JhZGU6MTUy)

![proposal1_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal1_summary.png)

- Asking for information about [Onboarding USDS and sUSDS to Aave v3](https://boardroom.io/aave/discussions/18987)

![proposal1_prompt](https://raw.githubusercontent.com/raid-guild/gaianet-rag-api-pipeline/72403cc4503ce65da4e737eb8f68c03aa5772f44/aave_samples/proposal2_summary.png)

### Customizing for Other DAOs

To generate a knowledge base for a different DAO, you just need to modify the `api_name` and `api_parameters` values in the [boardroom_api_pipeline.yaml](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) manifest file.
26 changes: 19 additions & 7 deletions docs/pages/apis/other-api-sources.mdx
Want to supercharge your RAG pipeline with different APIs? We've got you covered!
* Look for existing OpenAPI/Swagger specifications

### 2. Schema Setup
* Check out this guide on [OpenAPI](https://docs.bump.sh/guides/openapi/specification/v3.1/introduction/what-is-openapi/)
* Look for an official OpenAPI spec file from the API provider.
* Use some help from LLMs to create OpenAPI schemas
* Validate your schema:
* New schemas: [Swagger Editor](https://editor.swagger.io/)
* Existing specs: [Swagger Validator](https://validator.swagger.io/)


### 3. Define the RAG API Pipeline Manifest
* Define the target endpoints and required request parameters
* Get an API Key if needed.
* Check out our [guide](/manifest-definition) or [API examples](/apis) for inspiration.

### 4. Test and Deploy
* Set up the pipeline's initial configuration by running the `rag-api-pipeline setup` command.
* Test each endpoint thoroughly:
  * Run `rag-api-pipeline run all <API_MANIFEST_FILE> <OPENAPI_SPEC_FILE>` and check for any errors.
  * Comment out other endpoints in the API manifest.
  * Use the `--normalized-only` CLI option and check results in the `output` folder.
* Adjust data chunking parameter settings:
  * Use the `--chunked-only` CLI option and analyze results (e.g. using a Jupyter notebook).
* If you want to include recent endpoint data, use the `--full-refresh` CLI option to clean up the cache.
* Use AI assistance to fix validation issues.
* Connect everything to your RAG pipeline.

Still need help? Feel free to reach out or open an issue on this repository!