This is the script we ended up using. It's an adapted version of
-
@whilo ☝️
-
Note: I'm not going to try to be brief. There are lots of little things we've learnt along the way, and I'm not 100% sure which ones are the most useful, so I'm just putting it all here. Since I'm not a complete savage, there is a TLDR at the end.
Context
I've been using Datahike in anger for 4 years now, with history enabled. I started with datahike-firebase, which ran on Datahike 0.2.1. Obviously, as our company scaled, we moved off Firebase as a DB. In the beginning, as a bootstrapped startup, we were super cost-sensitive, so from free Firebase we moved to PlanetScale's free tier (which uses MySQL). This drove a lot of the work on datahike-jdbc. The latency started to kill us, so we moved to DigitalOcean, which was short-lived since they don't have VPCs for their apps. So again we moved, this time to Render. We stayed with Render for about 18 months, until one of our partners required our stores to be ISO27001 certified. So we had to move to GCP (0/10, would not recommend; the cost balloons). About two weeks ago Render got their ISO27001 certification, so we moved back! And we're very happy there. So what's the point? We've done 5 major migrations, each one more terrifying than the last. Our last migration was 16M datoms (a 606MB cbor backup).
In between all of this, there was our adventure with datahike-server.
Lessons
Here are some things we've learnt along the way that might help us think more rigorously about how we do backups and restoration.
Backup format
At one point the backup format changed from `edn` to `cbor`. When we moved to datahike-server we needed to restore our backup, but, unknown to us, `import-db` only supported `cbor`. So after some full-blown terror, we eventually realised we could load the datoms directly into a Datahike connection.
Invalid data
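A schema check run against the decoded backup before restoring would flag invalid datoms (for example a ratio stored under a double attribute) up front instead of mid-import. The Python sketch below is illustrative only, not Datahike's API: it assumes the backup has already been decoded into plain (entity, attribute, value) tuples, and that the schema is a hypothetical attribute-to-`:db/valueType` map. Python has a single float type, so the cbor double/float casting problem is not modelled here.

```python
from fractions import Fraction
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Datom(NamedTuple):
    """Hypothetical plain-data view of a decoded datom: entity, attribute, value."""
    e: int
    a: str
    v: object


# Hypothetical mapping from :db/valueType keywords to acceptable Python types.
VALUE_TYPES = {
    ":db.type/string": (str,),
    ":db.type/long": (int,),
    ":db.type/double": (float,),
    ":db.type/boolean": (bool,),
}


def conforms(value: object, value_type: str) -> bool:
    """True if value matches the declared type; unknown types never match."""
    allowed = VALUE_TYPES.get(value_type)
    if allowed is None:
        return False
    # bool is a subclass of int in Python, so reject it unless bool is expected.
    if isinstance(value, bool) and bool not in allowed:
        return False
    return isinstance(value, allowed)


def partition_datoms(
    datoms: Iterable[Datom], schema: Dict[str, str]
) -> Tuple[List[Datom], List[Datom]]:
    """Split datoms into (valid, invalid); attributes missing from the schema are invalid."""
    valid: List[Datom] = []
    invalid: List[Datom] = []
    for d in datoms:
        value_type = schema.get(d.a)
        if value_type is not None and conforms(d.v, value_type):
            valid.append(d)
        else:
            invalid.append(d)
    return valid, invalid


# Usage: a ratio stored under a double attribute is flagged for review.
schema = {":person/age": ":db.type/long", ":item/price": ":db.type/double"}
datoms = [
    Datom(1, ":person/age", 42),
    Datom(2, ":item/price", Fraction(1, 3)),
    Datom(3, ":item/price", 9.99),
]
valid, invalid = partition_datoms(datoms, schema)
print(invalid)  # [Datom(e=2, a=':item/price', v=Fraction(1, 3))]
```

Everything that ends up in `invalid` is what would otherwise have to be found by hand; whether to drop, coerce, or fix those datoms stays a manual decision.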
Since we were in the validation phase (and because I had no idea what I was doing), we built our db with `schema-on-read`. The first problem was that Datahike stored some fractions as ratios. Since there was no validation on write, we only encountered the issue on read. So the solution was to export the db, remove the ratios, and then restore it. This made us move to `schema-on-write`. The migration failed epically: hundreds of datoms failed the schema constraints, and I had to manually sort through all the datoms to remove the invalid ones. We also found that when restoring data from the cbor format, `doubles` were incorrectly cast as `floats` and thus violated the schema. We used the workaround shown in 663.
Importing at scale
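The fix used for the 16M-datom import (schema first, then everything else) and the wish for smaller error payloads can be combined in one rough Python sketch. It is hypothetical, not Datahike code: `transact` stands in for whatever call writes a batch to the target store, an entity is assumed to be schema if it has a `:db/valueType` datom, and the batch size and 200-character error limit are arbitrary choices.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Datom(NamedTuple):
    """Hypothetical plain-data view of a datom: entity, attribute, value."""
    e: int
    a: str
    v: object


def split_schema_first(datoms: Iterable[Datom]) -> Tuple[List[Datom], List[Datom]]:
    """Assumption: an entity is a schema entity if it has a :db/valueType datom.
    All datoms about schema entities go first, everything else second."""
    datoms = list(datoms)
    schema_eids = {d.e for d in datoms if d.a == ":db/valueType"}
    schema = [d for d in datoms if d.e in schema_eids]
    rest = [d for d in datoms if d.e not in schema_eids]
    return schema, rest


def short_repr(data: object, limit: int = 200) -> str:
    """repr() capped at `limit` characters, so an error can't flood a REPL or SSH session."""
    text = repr(data)
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... ({len(text) - limit} more chars)"


def import_schema_first(
    datoms: Iterable[Datom],
    transact: Callable[[List[Datom]], None],
    batch_size: int = 1000,
) -> None:
    """Transact schema datoms first, then the rest in batches of `batch_size`."""
    schema, rest = split_schema_first(datoms)
    batches = [schema] + [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    for batch in batches:
        if not batch:
            continue
        try:
            transact(batch)
        except Exception as exc:
            # `from None` hides the chained original, whose message may hold the full payload.
            raise RuntimeError(
                f"import failed on batch {short_repr(batch)}: {short_repr(exc)}"
            ) from None


# Usage: the schema datoms for entity 1 are transacted before the data for entity 10.
written: List[List[Datom]] = []
import_schema_first(
    [
        Datom(10, ":person/name", "Ada"),
        Datom(1, ":db/ident", ":person/name"),
        Datom(1, ":db/valueType", ":db.type/string"),
    ],
    written.append,
)
```

Capping the repr is the important half: the failing batch is still identified, but an error can no longer dump hundreds of megabytes into the terminal.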
In our most recent migration of 16M datoms, the migration kept failing because datoms were inserted before the schema (we're using `schema-on-write`), so a `not in schema` error was thrown. That error includes the transaction data, so all 606MB gets printed to the REPL, which crashes the REPL or the SSH connection. Ideally, the size of the data included in errors should be limited. (To fix this, I filtered the datoms by kind, imported the schema, then imported the rest.)
Migrating infrastructure
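A memory-light, in-place migration along the lines wished for here could copy a snapshot in fixed-size chunks, so only one chunk is held in memory at a time, and then replay the transactions that arrived during the copy. This is a hypothetical Python sketch, not an existing Datahike feature: `write_chunk` stands in for a write to the target store, and the queue stands in for something fed by a listener such as `d/listen`, which is only suggested as a possibility.

```python
import queue
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items from a (possibly lazy) iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_copy(source: Iterable, write_chunk: Callable[[List], None], chunk_size: int = 10_000) -> int:
    """Copy items chunk by chunk; memory stays bounded only if `source` is lazy."""
    total = 0
    for chunk in chunked(source, chunk_size):
        write_chunk(chunk)
        total += len(chunk)
    return total


def migrate_with_catch_up(
    source: Iterable,
    write_chunk: Callable[[List], None],
    pending: queue.Queue,
    chunk_size: int = 10_000,
) -> Tuple[int, int]:
    """Bulk-copy a snapshot, then replay tx data a listener queued meanwhile.
    Returns (items copied, transactions replayed)."""
    copied = stream_copy(source, write_chunk, chunk_size)
    replayed = 0
    while True:
        try:
            tx_data = pending.get_nowait()
        except queue.Empty:
            break
        write_chunk(tx_data)
        replayed += 1
    return copied, replayed


# Usage: 25 snapshot items in chunks of 10, then one queued transaction.
pending = queue.Queue()
pending.put(["tx-a"])  # stand-in for tx data a listener would have captured
sink: List[List] = []
print(migrate_with_catch_up(range(25), sink.append, pending, chunk_size=10))  # (25, 1)
```

For the catch-up step to be correct, the listener would have to be registered before the snapshot is read, and replaying would have to tolerate transactions that already made it into the snapshot; this sketch handles neither.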
Each megabyte in the backup now maps to about 10MB in the store (Postgres, as of 0.6.1592). This is much better than before, but it's still more efficient to `export-db` and then `import-db` than to replicate the tables (if the Postgres provider supports table replication). This means you need downtime for the backup (to prevent further writes) plus the time to import: more or less 20 minutes of downtime. The backup and import are memory intensive, so you can't actually run other workloads while they are in progress. It would be great to have a streaming or chunked backup that is light on memory. It would also be great if you could stream datoms from one instance to another to allow in-place migration. Theoretically, this could be done using `d/listen`.
TLDR