Add PostgreSQL as a backend target (MVP) #137

Open
nj1973 opened this issue Mar 28, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@nj1973
Collaborator

nj1973 commented Mar 28, 2024

This issue covers implementing PostgreSQL as a backend target without spending too much effort tuning the PostgreSQL load from GCS. We suspect we will eventually need a GCS to PostgreSQL FDW extension, hence the MVP in the title.

For MVP transport, my initial thought was to use a single COPY FROM STDIN command, but COPY does not accept Avro or Parquet (it reads only its text, CSV and binary formats). Is there an alternative, or do we need to stage to CSV? Would CSV limit supported data types?
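To make the CSV question more concrete, here is a rough sketch of what feeding COPY FROM STDIN from CSV could look like. It is not project code: `csv_field` and `build_copy` are hypothetical names, and the type handling is an assumption. The idea is that every non-NULL value is written in quoted text form, so NULL (an unquoted empty field, PostgreSQL's CSV default) stays distinct from an empty string, and types such as bytea go through their text input format.

```python
import datetime
import io


def csv_field(value):
    """Encode one value for COPY ... WITH (FORMAT csv).

    None becomes an unquoted empty field, which PostgreSQL reads as NULL by
    default in CSV mode. Everything else is quoted, so '' stays an empty string.
    """
    if value is None:
        return ""
    if isinstance(value, bool):
        text = "true" if value else "false"
    elif isinstance(value, (bytes, bytearray)):
        # bytea hex input format: \x followed by hex digits.
        text = "\\x" + bytes(value).hex()
    elif isinstance(value, (datetime.datetime, datetime.date)):
        text = value.isoformat()
    else:
        text = str(value)
    # CSV escapes a double quote by doubling it.
    return '"' + text.replace('"', '""') + '"'


def build_copy(table, columns, rows):
    """Return (COPY statement, file-like CSV payload) for the given rows.

    Identifiers are not quoted or validated here; a real implementation
    would need to handle that.
    """
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    body = "".join(",".join(csv_field(v) for v in row) + "\n" for row in rows)
    return sql, io.StringIO(body)
```

If the client were psycopg2, the pair could be passed to `cursor.copy_expert(sql, payload)`. On the data type question, this suggests CSV is mostly limited by whether each type has a reliable text representation rather than by CSV itself, but that would need testing per type.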

Perhaps instead of MVP this should just be "phase 1", which is to implement everything except data copy?

Tasks:

  • Look at connectivity: we need to support core PostgreSQL, Cloud SQL and AlloyDB. Test TLS too
  • Add code to map canonical columns to PostgreSQL data types (+ tests)
  • Consider backend table creation; partitioned table creation is more involved than for other backends
  • Implement BackendApi methods for PostgreSQL (+ tests)
  • Implement BackendTable methods for PostgreSQL (+ tests)
  • Implement Predicate Offload methods
  • Transport, maybe?
  • Implement methods for final and staged validation

I've probably missed some tasks so don't rely solely on the list above.
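As a rough illustration of the type mapping and partitioned table tasks, a minimal sketch follows. The canonical type names in the dictionary are placeholders rather than the project's real canonical classes, and `postgresql_type` and `partitioned_table_ddl` are hypothetical helpers. It also shows why partitioned creation is more involved: with declarative partitioning, the parent table and each partition are separate CREATE TABLE statements.

```python
# Placeholder canonical type names; the real canonical classes may differ.
CANONICAL_TO_POSTGRESQL = {
    "INTEGER_8": "bigint",
    "DECIMAL": "numeric",
    "VARIABLE_STRING": "text",
    "DATE": "date",
    "TIMESTAMP": "timestamp",
    "TIMESTAMP_TZ": "timestamptz",
    "BINARY": "bytea",
}


def postgresql_type(canonical_type, precision=None, scale=None):
    """Map a canonical type name to a PostgreSQL type string."""
    pg_type = CANONICAL_TO_POSTGRESQL[canonical_type]
    if pg_type == "numeric" and precision is not None:
        return f"numeric({precision},{scale or 0})"
    return pg_type


def partitioned_table_ddl(table, columns, partition_column, bounds):
    """Build DDL for a range-partitioned table.

    columns: list of (name, postgresql_type) tuples.
    bounds: list of (partition_suffix, lower, upper) tuples; the lower bound
    is inclusive and the upper bound exclusive, as in PostgreSQL.
    Identifiers and bound literals are not escaped in this sketch.
    """
    col_sql = ", ".join(f"{name} {pg_type}" for name, pg_type in columns)
    statements = [
        f"CREATE TABLE {table} ({col_sql}) PARTITION BY RANGE ({partition_column})"
    ]
    for suffix, lower, upper in bounds:
        statements.append(
            f"CREATE TABLE {table}_{suffix} PARTITION OF {table} "
            f"FOR VALUES FROM ('{lower}') TO ('{upper}')"
        )
    return statements
```

For example, `partitioned_table_ddl("sales", [("id", "bigint"), ("sold", "date")], "sold", [("2024_01", "2024-01-01", "2024-02-01")])` returns one statement for the parent table and one for the January partition.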

nj1973 added the enhancement label Mar 28, 2024
@nj1973
Collaborator Author

nj1973 commented Apr 3, 2024

A possible alternative flow is for Spark to stage to an unlogged table in PostgreSQL instead of Cloud Storage, and then INSERT/SELECT from there.
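A sketch of the SQL that flow might involve, assuming the staging table has the same column layout as the target (in practice staged columns may need conversion in the SELECT). `staged_load_statements` is a hypothetical helper and the table names are illustrative only.

```python
def staged_load_statements(target, staging, columns):
    """Return the SQL to run before and after Spark writes to the staging table.

    Assumes staging mirrors the target's columns. Identifiers are not
    quoted or validated in this sketch.
    """
    cols = ", ".join(columns)
    return {
        # Unlogged tables skip WAL, which should make the staging write
        # cheaper, but their contents are lost after a crash.
        "before_load": [f"CREATE UNLOGGED TABLE {staging} (LIKE {target})"],
        "after_load": [
            f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {staging}",
            f"DROP TABLE {staging}",
        ],
    }
```

Spark would append into the pre-created staging table between the two groups of statements. Because the staging table is disposable, losing it on a crash only means the load step has to be rerun.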
