Commit: replaced gaianet with gaia
wtfsayo committed Nov 26, 2024
1 parent 5f57fb3 commit 6f5d489
Showing 10 changed files with 100 additions and 98 deletions.
22 changes: 11 additions & 11 deletions README.md
@@ -1,4 +1,4 @@
-# GaiaNet x RAG API Pipeline
+# Gaia x RAG API Pipeline

`rag-api-pipeline` is a Python-based data pipeline tool that allows you to easily generate a vector knowledge base from any REST API data source. The resulting database snapshot can then be plugged into a Gaia node's LLM model with a prompt and provide contextual responses to user queries using RAG (Retrieval Augmented Generation).

@@ -11,7 +11,7 @@ The following sections help you to quickly setup and execute the pipeline on you
- (Optional): a Python virtual environment manager of your preference (e.g. conda, venv)
- Qdrant vector database ([Docs](https://qdrant.tech/documentation/))
  - (Optional): Docker to spin up a local container (a sample command is sketched just after this list)
-- LLM model provider ([spin up your own Gaia node](docs/pages/cli/node-deployment.mdx) or pick one from the [GaiaNet public network](https://www.gaianet.ai/chat))
+- LLM model provider ([spin up your own Gaia node](docs/pages/cli/node-deployment.mdx) or pick one from the [Gaia public network](https://www.gaianet.ai/chat))
- An Embeddings model (e.g. [Nomic-embed-text-v1.5](https://huggingface.co/gaianet/Nomic-embed-text-v1.5-Embedding-GGUF/tree/main?show_file_info=nomic-embed-text-v1.5.f16.gguf))
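
If you opt for the local Qdrant container, here is a minimal sketch using the official image from the Qdrant quick start (ports and storage path are Qdrant's documented defaults; adjust them to your environment):

```bash
# Run a local Qdrant instance: REST API on 6333, gRPC on 6334;
# data is persisted to ./qdrant_storage (an arbitrary host path)
docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage" \
    qdrant/qdrant
```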

## Setup Instructions
@@ -24,9 +24,9 @@ Git clone or download this repository to your local machine.
```bash
git clone https://github.com/raid-guild/gaianet-rag-api-pipeline.git
```

### 2. Install the Pipeline CLI

It is recommended to activate your [own virtual environment](https://python-poetry.org/docs/basic-usage/#using-your-virtual-environment).
Then, navigate to the directory where this repository was cloned/downloaded and execute the following command to install the `rag-api-pipeline` CLI:

```bash
# (install command collapsed in this diff view)
```

@@ -53,24 +53,24 @@
```bash
rag-api-pipeline run all config/boardroom_api_pipeline.yaml config/boardroom_openapi.yaml
```

You are required to specify two main arguments to the pipeline:
- The path to the OpenAPI specification file (e.g. `config/boardroom_openapi.yaml`): the OpenAPI spec for the REST API data source
you're looking to extract data from.
- The path to the API pipeline manifest file (e.g. `config/boardroom_api_pipeline.yaml`): a YAML file that defines the API endpoints you're
looking to extract data from, among other parameters (more details in the next section).

Once the pipeline execution is completed, you'll find the vector database snapshot and extracted/processed datasets under the `output/molochdao_boardroom_api` folder.

## Define your own API Pipeline manifest

Now it's time to define the pipeline manifest for the REST API you're looking to extract data from. Make sure you get the OpenAPI specification
for the API you're targeting. Check the
[Defining an API Pipeline Manifest](docs/pages/manifest-definition/overview.mdx) page for details on how to get the OpenAPI spec and define an API pipeline manifest,
or take a look at the in-depth review of the sample manifests available in the [API Examples](docs/pages/apis) folder.

## Using the Pipeline CLI

Once you have both the API pipeline manifest and OpenAPI spec files, you're ready to start using the `rag-api-pipeline run` command to execute different tasks of the RAG pipeline,
from extracting data from an API source to generating vector embeddings and a database snapshot. If you need more details about the parameters available
on each task you can execute:

```bash [Terminal]
# (command collapsed in this diff view)
```
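
The exact command is collapsed in this view; as an assumption (standard behavior for most Python CLIs, not confirmed by this diff), task-level help is usually exposed via a `--help` flag:

```bash [Terminal]
# assumption: conventional --help flags; verify against the actual CLI
rag-api-pipeline --help
rag-api-pipeline run --help
```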
12 changes: 6 additions & 6 deletions docs/pages/apis/boardroom-api.mdx
@@ -1,11 +1,11 @@
# Boardroom Governance API

The repository already contains the [OpenAPI specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_openapi.yaml) and the [API pipeline manifest](https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/config/boardroom_api_pipeline.yaml) needed to create a RAG API pipeline.
This pipeline generates a knowledge base from any DAO/Protocol hosted by the Boardroom Governance API.

## Pre-requisites

To use this API, you'll need an API key. Request one from [Boardroom's developer portal](https://boardroom.io/developers/billing). You can run the `rag-api-pipeline setup` command to set the REST API Key,
or you can directly store the key in the `config/secrets/api-key` file. A less secure option is to provide it using the `--api-key` CLI argument.
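
For the file-based option, a minimal sketch (the path comes from the paragraph above; replace the placeholder with your actual key):

```bash [Terminal]
# store the Boardroom API key where the pipeline expects it
mkdir -p config/secrets
printf '%s' 'YOUR_BOARDROOM_API_KEY' > config/secrets/api-key
```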

## Getting the Boardroom API OpenAPI Spec
@@ -213,7 +213,7 @@ schemas:
```yaml
    type: integer
```

On the other hand, the endpoint's `textSchema` reference specifies the list of fields for text parsing. Note that all properties are also listed in the `responseSchema`.
In this case, `title`, `content`, and `summary` will be parsed as text, while the other fields will be included as metadata properties in a JSON object:

```yaml [boardroom_api_pipeline.yaml]
# (manifest excerpt collapsed in this diff view)
```

@@ -259,15 +259,15 @@
```bash [Terminal]
rag-api-pipeline run all config/boardroom_api_pipeline.yaml config/boardroom_openapi.yaml
```
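
Because the manifest excerpt referenced above is collapsed in this diff view, here is a rough, hypothetical sketch of how such a `responseSchema`/`textSchema` pairing can look (field names are taken from the prose above; the real `boardroom_api_pipeline.yaml` may structure this differently):

```yaml
# hypothetical sketch only, not the actual manifest
schemas:
  proposals:
    responseSchema:      # all extracted fields, kept as metadata
      title:
        type: string
      content:
        type: string
      summary:
        type: string
      totalVotes:
        type: integer
    textSchema:          # subset of responseSchema parsed as text
      title:
        type: string
      content:
        type: string
      summary:
        type: string
```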

The processed data and knowledge base snapshot for Aave will be available in the `output/aave_boardroom_api` folder. You can also find a public knowledge base snapshot on [Hugging Face](https://huggingface.co/datasets/uxman/aave_snapshot_boardroom/tree/main).
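
If you prefer the public snapshot, one way to fetch it (assuming the `huggingface_hub` CLI is installed; the dataset id comes from the link above):

```bash [Terminal]
pip install -U huggingface_hub
# download the public Aave snapshot dataset to ./snapshots
huggingface-cli download uxman/aave_snapshot_boardroom --repo-type dataset --local-dir ./snapshots
```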

-### Import the KB Snapshot into a Gaianet Node
+### Import the KB Snapshot into a Gaia Node

1. Locate the generated snapshot in `output/aave_boardroom_api/` (named `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`) or download it from the HuggingFace link above.
2. Follow the official [knowledge base selection guide](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
3. Configure your node using the recommended settings from the [node deployment guide](/cli/node-deployment#recommended-gaianet-node-configuration).
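
Alternatively, if you'd rather inspect the snapshot in a standalone Qdrant instance instead of a Gaia node, Qdrant's documented snapshot-recovery endpoint can restore it (the collection name and file path below are illustrative, and the `.snapshot` file must first be extracted from the `.tar.gz` archive to a location readable by the Qdrant server):

```bash [Terminal]
# restore an extracted snapshot into a local Qdrant collection
curl -X PUT 'http://localhost:6333/collections/aave_boardroom_api_collection/snapshots/recover' \
  -H 'Content-Type: application/json' \
  -d '{"location": "file:///qdrant/snapshots/aave_boardroom_api_collection.snapshot"}'
```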

Once the command above finishes, you'll find a compressed knowledge base snapshot in
-`{OUTPUT_FOLDER}/aave_boardroom_api/` with name `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`. Now it's time to import it
-into your gaianet node. You can find the instructions on how to select a knowledge base [here](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
+`{OUTPUT_FOLDER}/aave_boardroom_api/` with name `aave_boardroom_api_collection-xxxxxxxxxxxxxxxx-yyyy-mm-dd-hh-mm-ss.snapshot.tar.gz`. Now it's time to import it
+into your Gaia node. You can find the instructions on how to select a knowledge base [here](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
The recommended prompts and node config settings can be found [here](/cli/node-deployment#recommended-gaianet-node-configuration).

### Example user prompts
6 changes: 3 additions & 3 deletions docs/pages/architecture/tech-stack.mdx
@@ -5,7 +5,7 @@ This page outlines the technologies and tools integrated into the `rag-api-pipel
## Tools & Frameworks

### 1. RAG Pipeline over Data Stream: Pathway ([Docs](https://pathway.com/developers/user-guide/introduction/welcome/))
- **Description**: A Python-based data processing framework designed for creating AI-driven pipelines over data streams
- **Core Technology**:
- **Rust Engine** with multithreading and multiprocessing capabilities for high performance
- **Use Case**: Efficient data processing, enabling integration with third-party data-related tools and AI models to process large, real-time data streams
@@ -31,7 +31,7 @@ This page outlines the technologies and tools integrated into the `rag-api-pipel
### 5. Feature Embedding Generation:
- **Description**: connects to an LLM provider and is responsible for generating feature embeddings, which create dense vector representations of the extracted data.
- **Technologies Used**:
-- **Gaianet Node** ([Docs](https://docs.gaianet.ai/category/node-operator-guide)): Offers a *RAG API Server* that provides an *OpenAI-like API* to interact with hosted LLM models
+- **Gaia Node** ([Docs](https://docs.gaianet.ai/category/node-operator-guide)): Offers a *RAG API Server* that provides an *OpenAI-like API* to interact with hosted LLM models
- **Ollama** ([Docs](https://ollama.com/)): Easy-to-install LLM engine for running large language models on a local machine
- **Python Libraries**:
- [litellm](https://docs.litellm.ai/docs/providers/openai_compatible) Python library for connecting with OpenAI-compatible LLM providers
@@ -41,4 +41,4 @@
- **Description**: A **vector database** and **vector similarity search engine**
- **Key Features**:
- Provides efficient vector searches based on similarity, crucial for tasks like nearest-neighbor search in large datasets
  - Acts as a **knowledge base snapshot** repository, storing vectors generated from processed data and feature embeddings
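
To illustrate the similarity-search role described above, a hypothetical query against a local Qdrant instance (the collection name and vector are placeholders; a real query vector must match the collection's configured size, e.g. 768 for the Nomic embeddings mentioned earlier):

```bash [Terminal]
# nearest-neighbor search over stored embeddings (truncated placeholder vector)
curl -X POST 'http://localhost:6333/collections/example_collection/points/search' \
  -H 'Content-Type: application/json' \
  -d '{"vector": [0.05, 0.12, 0.87], "limit": 5, "with_payload": true}'
```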
10 changes: 5 additions & 5 deletions docs/pages/cli/node-deployment.mdx
@@ -2,13 +2,13 @@

## Quick start guide

-We recommend to follow the GaiaNet Official [quick start guide](https://docs.gaianet.ai/node-guide/quick-start). Your GaiaNet node will
+We recommend following the Gaia official [quick start guide](https://docs.gaianet.ai/node-guide/quick-start). Your Gaia node will
be set up in the `GAIANET_BASE_DIR` (default: `"$HOME/gaianet"`) directory.
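
For reference, the quick start guide's installer one-liner is reproduced below as an assumption (verify against the linked guide, since the exact URL can change):

```bash [Terminal]
# assumption: installer script as documented in the Gaia quick start
curl -sSfL 'https://github.com/GaiaNet-AI/gaianet-node/releases/latest/download/install.sh' | bash
```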

## Deploying your GaiaNet node in *embeddings* running mode (⚠️**Recommended**)

-The `rag-api-pipeline` requires an embeddings model to generate vector embeddings from the API data source. At this stage, we recommend to
-start your GaiaNet node in *embeddings-only* mode (thus consuming less resources than starting the full node) by running the following command:
+The `rag-api-pipeline` requires an embeddings model to generate vector embeddings from the API data source. At this stage, we recommend
+starting your Gaia node in *embeddings-only* mode (it consumes fewer resources than the full node) by running the following command:

```bash [Terminal]
cd $GAIANET_BASE_DIR
# (rest of the command collapsed in this diff view)
```

@@ -22,9 +22,9 @@ wasmedge --dir .:./dashboard --env NODE_VERSION=0.4.7 \
- `--model-name <model_name>`: specifies the embeddings model name
- `--ctx-size` and `--batch-size` should be set according to the selected embeddings model

-## Selecting a knowledge base and custom prompts for your GaiaNet node
+## Selecting a knowledge base and custom prompts for your Gaia node

In order to supplement the LLM model hosted on your Gaia node with a custom knowledge base and prompts, follow the instructions outlined in this [link](https://docs.gaianet.ai/node-guide/customize#select-a-knowledge-base).
Remember to re-initialize and re-start the node after you make any configuration changes.

```bash [Terminal]
# (commands collapsed in this diff view)
```
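
The collapsed block presumably performs that re-initialize/re-start cycle; with the documented `gaianet` CLI, the sequence is typically along these lines (an assumption, not confirmed by this diff):

```bash [Terminal]
# assumption: standard gaianet subcommands from the node operator guide
gaianet stop
gaianet init
gaianet start
```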
16 changes: 8 additions & 8 deletions docs/pages/cli/other-llm-providers.mdx
@@ -1,13 +1,13 @@
# Supported LLM providers

-The `rag-api-pipeline` currently supports two types of LLM providers: `openai` and `ollama`. A Gaianet node for example, uses a Rust-based [RAG API Server](https://github.com/LlamaEdge/rag-api-server)
+The `rag-api-pipeline` currently supports two types of LLM providers: `openai` and `ollama`. A Gaia node, for example, uses a Rust-based [RAG API Server](https://github.com/LlamaEdge/rag-api-server)
to offer OpenAI-compatible web APIs for creating RAG applications.

In the following sections, you'll find more details on the LLM providers the pipeline currently supports and how to set them up.

## OpenAI

-By default, the pipeline supports any LLM provider that offers OpenAI-compatible web APIs. If you wanna work with a provider other than Gaianet,
+By default, the pipeline supports any LLM provider that offers OpenAI-compatible web APIs. If you want to work with a provider other than Gaia,
you can setup the connection using the setup wizard via the `rag-api-pipeline setup` command:

@@ -17,7 +17,7 @@
```bash [Terminal]
Init pipeline...
(Step 1/3) Setting Pipeline LLM provider settings...
Select a custom LLM provider (openai, ollama): openai
LLM provider API URL [http://127.0.0.1:8080/v1]: https://api.openai.com/v1
LLM provider API Key:
LLM Provider API connection OK!
Embeddings model Name [Nomic-embed-text-v1.5]: text-embedding-ada-002
Embeddings Vector Size [768]: 2048
Pipeline LLM Provider settings OK!
```

@@ -26,8 +26,8 @@

## Ollama

-If you're planning to use the pipeline on consumer hardware that cannot handle a GaiaNet node running in the background, you can opt-in to use Ollama
-as LLM provider. Depending on the use case and resources available, some of the advantages of using Ollama for example are that it is more lighweight,
+If you're planning to use the pipeline on consumer hardware that cannot handle a Gaia node running in the background, you can opt in to using Ollama
+as the LLM provider. Depending on the use case and resources available, some of the advantages of Ollama are that it is more lightweight,
easier to install and ready to use with Mac GPU devices.

### Getting Ollama
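
The body of this subsection is collapsed in the diff view; for reference, Ollama's documented Linux install one-liner is shown below (macOS and Windows use the installers from ollama.com):

```bash [Terminal]
# install Ollama on Linux (script from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh
```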
@@ -46,13 +46,13 @@
```bash [Terminal]
Which LLM provider you want to use? (gaia, other) [gaia]: other
Init pipeline...
(Step 1/3) Setting Pipeline LLM provider settings...
Select a custom LLM provider (openai, ollama): ollama
LLM provider API URL [http://127.0.0.1:11434]:
ERROR: LLM Provider API (@ http://127.0.0.1:11434/v1/models) is down. HTTPConnectionPool(host='127.0.0.1', port=11434): Max retries exceeded with url: /v1/models (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1091c2490>: Failed to establish a new connection: [Errno 61] Connection refused'))
Try again...
LLM provider API URL [http://127.0.0.1:11434]:
LLM Provider API connection OK!
Embeddings model Name [Nomic-embed-text-v1.5]:
Embeddings Vector Size [768]:
Enter the Absolute Path to the Embeddings model file: /home/user/rag-api-pipeline/models/nomic-embed-text-v1.5.f16.gguf
Importing embeddings model into Ollama...
Pipeline LLM Provider settings OK!
```
6 changes: 3 additions & 3 deletions docs/pages/cli/settings.mdx
@@ -1,6 +1,6 @@
# Customizing the Pipeline Config Settings

Most of the pipeline configuration settings are set by running the setup wizard via the `rag-api-pipeline setup` command. However, there are
more advanced features that can also be set via environment variables in `config/.env`.

## Environment variables
@@ -21,7 +21,7 @@ The following environment variables can be adjusted in `config/.env` based on th
- Default value: `Nomic-embed-text-v1.5`
- `LLM_EMBEDDINGS_VECTOR_SIZE`: embeddings vector size
- Default value: `768`
-- `LLM_PROVIDER`: LLM provider backend to use. It can be either `openai` or `ollama` (Gaianet offers an OpenAI-compatible API)
+- `LLM_PROVIDER`: LLM provider backend to use. It can be either `openai` or `ollama` (Gaia offers an OpenAI-compatible API)
- Default value: `openai`
- **Qdrant DB settings**:
- `QDRANTDB_URL`: Qdrant DB base URL
@@ -31,7 +31,7 @@ The following environment variables can be adjusted in `config/.env` based on th
- `QDRANTDB_DISTANCE_FN`: score function to use during vector similarity search. Available functions: ['COSINE', 'EUCLID', 'DOT', 'MANHATTAN']
- Default value: `COSINE`
- **Pathway-related variables**:
- `AUTOCOMMIT_DURATION_MS`: the maximum time between two commits. Every autocommit_duration_ms milliseconds, the updates received by the connector are
committed automatically and pushed into Pathway's dataflow. More information can be found [here](https://pathway.com/developers/user-guide/connect/connectors/custom-python-connectors#connector-method-reference)
- Default value: `1000`
- `FixedDelayRetryStrategy` ([docs](https://pathway.com/developers/api-docs/udfs#pathway.udfs.FixedDelayRetryStrategy)) config parameters:
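
As a worked example, a `config/.env` assembled only from the defaults listed above (plain `KEY=value` syntax assumed; variables whose names are collapsed in this diff are omitted):

```bash
# config/.env sketch; values are the documented defaults
LLM_EMBEDDINGS_VECTOR_SIZE=768
LLM_PROVIDER=openai
QDRANTDB_DISTANCE_FN=COSINE
AUTOCOMMIT_DURATION_MS=1000
```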
