Add Ray as an Offline Store #4997

Open
franciscojavierarceo opened this issue Jan 31, 2025 · 0 comments
Labels
kind/feature New feature or request

Is your feature request related to a problem? Please describe.
Feast currently lacks built-in support for Ray as an offline store. Such support would help with distributed data processing and feature engineering at scale. I'm frustrated by the limited options available for large-scale feature transformations that require high parallelism and distributed computation.

Describe the solution you'd like
I would like to see Ray integrated as an official offline store in Feast. This integration should enable users to read from data sources (e.g., Parquet, CSV, databases) and perform distributed data processing using Ray's capabilities. It should support scalable batch feature retrieval and preprocessing while ensuring ease of configuration within Feast.
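To make "scalable batch feature retrieval" concrete, here is a minimal sketch of the point-in-time join an offline store performs, partitioned by entity key so that each partition can be processed independently. It uses only the Python standard library, with a `ThreadPoolExecutor` standing in for Ray tasks. The names `join_partition` and `retrieve_historical_features` are hypothetical and are not part of Feast's or Ray's API.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def join_partition(entity_timestamps, feature_rows):
    """Point-in-time join for a single entity key (hypothetical helper).

    For each requested timestamp, pick the latest feature value whose
    event timestamp is at or before that timestamp.
    """
    feature_rows = sorted(feature_rows, key=lambda row: row[0])
    event_times = [ts for ts, _ in feature_rows]
    joined = []
    for ts in entity_timestamps:
        idx = bisect_right(event_times, ts) - 1  # last row with event time <= ts
        value = feature_rows[idx][1] if idx >= 0 else None
        joined.append((ts, value))
    return joined


def retrieve_historical_features(entity_rows, feature_rows, max_workers=4):
    """Group rows by entity key and join each group in parallel.

    entity_rows:  iterable of (entity_id, timestamp)
    feature_rows: iterable of (entity_id, event_timestamp, value)
    The thread pool is only a stand-in for distributed Ray tasks.
    """
    requests = defaultdict(list)
    for entity_id, ts in entity_rows:
        requests[entity_id].append(ts)

    features = defaultdict(list)
    for entity_id, ts, value in feature_rows:
        features[entity_id].append((ts, value))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            entity_id: pool.submit(join_partition, timestamps, features.get(entity_id, []))
            for entity_id, timestamps in requests.items()
        }
        return {entity_id: future.result() for entity_id, future in futures.items()}


if __name__ == "__main__":
    feature_rows = [("driver_1", 10, 0.5), ("driver_1", 20, 0.7), ("driver_2", 15, 0.9)]
    entity_rows = [("driver_1", 12), ("driver_1", 25), ("driver_2", 5)]
    print(retrieve_historical_features(entity_rows, feature_rows))
```

Because each entity key is joined independently, the per-partition function is the piece that a Ray-backed store could plausibly dispatch to workers. How partitions would actually be read from Parquet, CSV, or a database is left open here.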

Describe alternatives you've considered

  • Using Dask as an offline store (though it may not offer the same scalability as Ray for certain workloads).
  • Writing custom scripts outside of Feast to process data with Ray and then load it manually into the offline store, which introduces extra complexity and maintenance overhead.
  • Relying on Spark, which can be heavier and harder to deploy for some users compared to Ray.

Additional context

  • Ray’s dynamic scaling and actor-based programming model could be a natural fit for distributed feature computation and preprocessing.
  • This could potentially integrate well with Ray Serve and Ray Train for end-to-end machine learning pipelines.
  • Here's the documentation on transforming data in Ray
franciscojavierarceo added the kind/feature label on Jan 31, 2025