docs: Add opentelemetry docs #5048

Open. franciscojavierarceo wants to merge 8 commits into `feast-dev:master` from `franciscojavierarceo:devin/1739407593-add-opentelemetry-docs`.
8 commits:

- c5160b2 feat: Make entity value_type mandatory with deprecation warning (devin-ai-integration[bot])
- 5338764 style: Fix import sorting in entity files (devin-ai-integration[bot])
- 7fd8781 Merge pull request #9 from franciscojavierarceo/devin/1733888469-mand… (franciscojavierarceo)
- 018d456 chore: Skip tests for community/docs/examples paths (devin-ai-integration[bot])
- f5eb980 Merge pull request #10 from franciscojavierarceo/devin/1733890533-ski… (franciscojavierarceo)
- a6682cc Merge branch 'feast-dev:master' into master (franciscojavierarceo)
- 6155a1c Add OpenTelemetry documentation to components section (devin-ai-integration[bot])
- 82072f3 docs: Incorporate OpenTelemetry Helm chart setup instructions directl… (devin-ai-integration[bot])
# OpenTelemetry Integration

The OpenTelemetry integration in Feast adds monitoring and observability to your feature serving infrastructure. It lets you collect metrics, traces, and logs from your Feast deployment.

## Motivation

Monitoring and observability are critical for production machine learning systems. The OpenTelemetry integration addresses these needs through:

1. **Performance Monitoring:** Track CPU and memory usage of feature servers
2. **Operational Insights:** Collect metrics to understand system behavior and performance
3. **Troubleshooting:** Debug issues more effectively with distributed tracing
4. **Resource Optimization:** Monitor resource utilization to right-size deployments
5. **Production Readiness:** Provide the observability expected of production services

## Architecture

The OpenTelemetry integration in Feast consists of several components that work together:

- **OpenTelemetry Collector:** Receives, processes, and exports telemetry data
- **Prometheus Integration:** Scrapes and stores the exported metrics for monitoring
- **Instrumentation:** Automatic Python instrumentation of the feature server for collecting metrics
- **Exporters:** Components that send telemetry data to monitoring systems

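A minimal sketch of how these pieces connect: an `OpenTelemetryCollector` resource that receives OTLP data from the instrumented feature server, batches it, and exposes metrics through a Prometheus exporter for scraping. The resource name `feast`, the `batch` processor, and exporter port `8889` are assumptions rather than values from the Feast chart. The manifest also assumes an OpenTelemetry Operator version that serves the `v1beta1` API; in `v1alpha1`, `config` is a string instead of a mapping.

```yaml
# Hypothetical OpenTelemetryCollector sketch; the name and ports are assumptions.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: feast              # the operator exposes this collector as the Service "feast-collector"
spec:
  mode: deployment
  config:
    receivers:
      otlp:                # receives telemetry from the instrumented feature server
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch: {}            # batches telemetry before it is exported
    exporters:
      prometheus:          # serves metrics over HTTP for Prometheus to scrape
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
```

With the name `feast`, the collector Service is called `feast-collector`, which matches the `{{ .Values.service.name }}-collector` host name used in the endpoints below when `service.name` is `feast`. Depending on the operator version, you may need to expose port `8889` on that Service explicitly.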
## Key Features

1. **Automated Instrumentation:** Python auto-instrumentation for comprehensive metric collection
2. **Metric Collection:** Track key performance indicators, including:
   - Memory usage
   - CPU utilization
   - Request latencies
   - Feature retrieval statistics
3. **Flexible Configuration:** Customizable metric collection and export settings
4. **Kubernetes Integration:** Native support for Kubernetes deployments
5. **Prometheus Compatibility:** Integration with Prometheus for metrics visualization

## Setup and Configuration

To add monitoring to the Feast Feature Server, follow these steps:

### 1. Deploy Prometheus Operator
Follow the [Prometheus Operator documentation](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md) to install the operator.

### 2. Deploy OpenTelemetry Operator
The OpenTelemetry Operator requires `cert-manager`, so install that first:
1. Install `cert-manager`.
2. Verify that the `cert-manager` pods are running.
3. Apply the OpenTelemetry Operator manifest:
   ```bash
   kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
   ```

For additional installation steps, refer to the [OpenTelemetry Operator documentation](https://github.com/open-telemetry/opentelemetry-operator).

### 3. Configure OpenTelemetry Collector
Add the OpenTelemetry Collector configuration under the `metrics` section of your `values.yaml` file:

```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

### 4. Add Instrumentation Configuration
Add the following annotation to the pod template in your `deployment.yaml` so that the OpenTelemetry Operator injects Python auto-instrumentation into the feature server:

```yaml
template:
  metadata:
    annotations:
      instrumentation.opentelemetry.io/inject-python: "true"
```

Then add the following environment variables to the feature server container's `env` list. The endpoint template expects a `metrics.endpoint.port` value to be defined in your `values.yaml`:

```yaml
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```

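For orientation, a trimmed sketch of where the annotation and environment variables from this step could sit together in a Helm `deployment.yaml` template, gated on `metrics.enabled`. Only the metrics-related fields are shown; the container name and the `image.repository` / `image.tag` values are hypothetical, and `metrics.endpoint.port` must exist in your values for the endpoint to render correctly.

```yaml
# Trimmed, hypothetical Deployment template showing only the metrics-related parts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.service.name }}
spec:
  selector:
    matchLabels:
      app: {{ .Values.service.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.service.name }}
      {{- if .Values.metrics.enabled }}
      annotations:
        # Asks the OpenTelemetry Operator to inject Python auto-instrumentation
        instrumentation.opentelemetry.io/inject-python: "true"
      {{- end }}
    spec:
      containers:
        - name: feast-feature-server          # hypothetical container name
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"  # hypothetical values
          env:
            {{- if .Values.metrics.enabled }}
            # Where the injected SDK sends OTLP data (the collector Service)
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
            # Plain-text (non-TLS) connection inside the cluster
            - name: OTEL_EXPORTER_OTLP_INSECURE
              value: "true"
            {{- end }}
```

The `{{-` markers trim the preceding whitespace, so with metrics disabled the template renders without the annotation block and with an empty `env`.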
### 5. Add Metric Checks
Wrap each monitoring manifest, and the metrics-related parts of your deployment files, in an `{{ if .Values.metrics.enabled }}` check so that they are rendered only when metrics are enabled. For example, the `Instrumentation` resource:

```yaml
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    # OTLP/HTTP endpoint of the collector Service
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  # Propagate W3C trace context and baggage between services
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      # Print metrics to the console and send them to the collector over OTLP/HTTP
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```

### 6. Add Required Manifests
Add the following components to your chart:
- `Instrumentation`
- `OpenTelemetryCollector`
- `ServiceMonitor`s
- A `Prometheus` instance
- RBAC rules

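A minimal sketch of the Prometheus-side manifests from this list, modeled on the Prometheus Operator getting-started guide: a `ServiceMonitor` that scrapes the collector's Prometheus exporter, the RBAC objects Prometheus needs, and a `Prometheus` instance that selects the `ServiceMonitor`. The `team: feast` label, the `prometheus` port name, the `default` namespace, and the collector Service label are assumptions; check the actual labels and port names on your collector Service (for example with `kubectl get svc --show-labels`). Keep these objects in one namespace, since by default a `Prometheus` instance only selects `ServiceMonitor`s in its own namespace.

```yaml
# Hypothetical sketch: names, labels, namespace, and port name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: feast-collector
  labels:
    team: feast                 # matched by the Prometheus serviceMonitorSelector below
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector  # assumed label on operator-created Services
  endpoints:
    - port: prometheus          # assumed name of the Service port serving the Prometheus exporter
      interval: 30s
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Read access Prometheus needs for service discovery
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default          # assumed; use the namespace you deploy into
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: feast               # selects the ServiceMonitor above
```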
### 7. Deploy Feast
Deploy Feast with metrics enabled, replacing the empty `feature_store_yaml_base64` value with your base64-encoded `feature_store.yaml`:

```bash
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```

## Usage

To enable OpenTelemetry monitoring in your Feast deployment:

1. Set `metrics.enabled=true` in your Helm values.
2. Configure the OpenTelemetry Collector endpoint.
3. Deploy with the annotations and environment variables from step 4.

Example configuration:
```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317"
```

## Monitoring

Once configured, you can monitor metrics including:

- `feast_feature_server_memory_usage`: Memory utilization of the feature server
- `feast_feature_server_cpu_usage`: CPU usage of the feature server
- Additional custom metrics, depending on your configuration

These metrics can be queried in Prometheus and visualized with other compatible monitoring tools.

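Once these metrics reach Prometheus, they can also drive alerts. The following `PrometheusRule` is a hypothetical sketch: the `team: feast` label, the threshold of 80 (which assumes the gauges report percentages), and the 5-minute window are assumptions, and the metric names may carry a different prefix or suffix after passing through the collector's Prometheus exporter. For the `Prometheus` instance to load the rule, its `ruleSelector` must match the rule's labels.

```yaml
# Hypothetical alerting rules; thresholds and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: feast-feature-server-alerts
  labels:
    team: feast                      # must match the Prometheus ruleSelector (assumed)
spec:
  groups:
    - name: feast-feature-server
      rules:
        # Fires when memory usage stays above 80 (assumed percent) for 5 minutes
        - alert: FeastFeatureServerHighMemory
          expr: feast_feature_server_memory_usage > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Feast feature server memory usage is above 80%"
        # Same check for CPU usage
        - alert: FeastFeatureServerHighCPU
          expr: feast_feature_server_cpu_usage > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Feast feature server CPU usage is above 80%"
```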
Should we plan this functionality for the Go operator as well?