This is going to be substantially more challenging than the CSV format, but it might be rewarding. We want to add Apache Parquet support because Parquet is used in a very large number of real-world data science applications.
One challenge right now is that we require a byte pointer to a specific record, whereas Parquet is columnar, meaning the values of a single record are split across different locations in the file.
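To make the problem concrete, here is a toy sketch of why one byte pointer is enough for a row-oriented format but not for a columnar one. It does not use the real Parquet encoding; the sample records and the `row_layout` / `column_layout` helpers are made up purely for illustration.

```python
# Toy illustration only: this is NOT the real Parquet encoding.
records = [
    {"id": 1, "name": "ada", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.0},
    {"id": 3, "name": "cy", "score": 8.25},
]


def row_layout(records):
    """Serialize record by record (CSV-like).

    Returns the buffer and one (offset, length) span per record.
    """
    buf = bytearray()
    spans = []
    for rec in records:
        line = (",".join(str(v) for v in rec.values()) + "\n").encode()
        spans.append((len(buf), len(line)))
        buf += line
    return bytes(buf), spans


def column_layout(records):
    """Serialize column by column.

    Returns the buffer and, per column, one (offset, length) span per record.
    """
    buf = bytearray()
    spans = {}
    for col in records[0]:
        spans[col] = []
        for rec in records:
            cell = (str(rec[col]) + "\n").encode()
            spans[col].append((len(buf), len(cell)))
            buf += cell
    return bytes(buf), spans


row_buf, row_spans = row_layout(records)
col_buf, col_spans = column_layout(records)

# Row layout: record 1 is one contiguous byte range.
print(row_spans[1])
# Columnar layout: record 1 needs one byte range per column.
print({col: col_spans[col][1] for col in col_spans})
```

In the row layout a single (offset, length) pair recovers the whole record. In the columnar layout the same record needs one span per column, which is the extra information the index would somehow have to carry.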
We will likely need to write our own Parquet file parser to figure out the correct byte offsets, and then be pretty particular in the JS library about exactly how a record is fetched and parsed. This might mean returning additional metadata beyond just the byte offset, which we can do via an intermediate pointer in the index.
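As a starting point for discussion, here is a minimal sketch of the first step any such parser has to take: finding the file metadata. It relies only on the documented file layout, where a Parquet file starts and ends with the 4-byte magic `PAR1` and the 4 bytes before the trailing magic hold the little-endian length of the Thrift-encoded footer. The `IntermediatePointer` shape is a hypothetical guess at what the index might store, not an existing type in our codebase, and decoding the footer itself is left out.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

MAGIC = b"PAR1"


def footer_span(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the Thrift-encoded file metadata.

    Layout at the end of the file: <metadata> <4-byte LE length> <"PAR1">.
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    (length,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - length
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, length


@dataclass
class IntermediatePointer:
    """Hypothetical index entry: what a reader might need to fetch one record."""

    row_group: int         # which row group holds the record
    row_in_group: int      # position of the record inside that row group
    # One (byte offset, byte length) per column chunk to fetch.
    column_spans: List[Tuple[int, int]] = field(default_factory=list)


# Usage with a fake file: 10 bytes of "data", then a 5-byte stand-in footer.
fake = MAGIC + b"x" * 10 + b"META!" + struct.pack("<I", 5) + MAGIC
start, length = footer_span(fake)
print(start, length, fake[start:start + length])
```

The footer is what lists row groups and the byte ranges of their column chunks, so a reader only needs the tail of the file to plan which ranges to fetch. Whether the index should store those ranges directly, as the hypothetical `column_spans` does, or only a row group and row number is one of the design questions to settle.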
Anyway, let's talk about this one before working on it. It'll be super educational about how Parquet works, but I don't want us to get lost in the complexity.
https://github.com/apache/parquet-format