Non-replicating (local) / replication-only transformation functions #37

Closed
stuartpb opened this issue Dec 1, 2016 · 6 comments

stuartpb commented Dec 1, 2016

It would make a lot of sense - especially for applications that maintain a special in-DB representation of data that should be replicated without transformation, like, say, crypto-pouch - if there were variants like incomingLocal, outgoingLocal, incomingReplication, and outgoingReplication, where the latter two apply only to replication, and the former two apply only to non-replicating actions (get, put, etc.).

I guess this can be emulated right now by doing something like this:

const dbOriginal = new PouchDB('mydb');
const dbOnlyForLocalUse = new PouchDB(dbOriginal)
  .transform({incoming: incomingLocal, outgoing: outgoingLocal});
const dbOnlyForReplication = new PouchDB(dbOriginal)
  .transform({incoming: incomingReplication, outgoing: outgoingReplication});

but that would lead to issues with, say, a general function that takes one database as an argument and performs both replication and non-replication operations on it.

Also, does new PouchDB(dbTransformHasAlreadyBeenCalledOn) copy the transformations that are on that DB? My gut says it should, but I'm not certain (and I'm not sure if the docs make any statement on the matter one way or another).

@stuartpb
Author

stuartpb commented Dec 1, 2016

There's also the matter of how, with these as available functions, they'd interact if specified alongside incoming and outgoing.

One way to do it would be to make them mutually exclusive, with any use of plain incoming alongside incomingLocal and/or incomingReplication being an error (and likewise for outgoing). However, for models that share functionality across their document transformations, I think it'd be more convenient not to put the burden of composition on the end user, and instead to offer two more transformations, where the complete document transformation function application flow would look like this:

           Documents coming in
via put(),         ||          via
post(), etc.       \/          replicate.from()
              incomingPre()
               /        \
              V          V
   incomingLocal()    incomingReplication()
               \        /
                V      V
               incoming()
                   \/
            Stored in database
                   ||
               outgoing()
                /      \
               V        V
   outgoingLocal()    outgoingReplication()
                \      /
                 V    V
             outgoingFinal()
via get(),         ||          via
allDocs(),         \/          replicate.to()
etc.       Documents going out

In other words, documents would pass through (at most) three transformations before coming into / going out of the database, depending on the operation:

  • put(), post(), etc: incomingPre, incomingLocal, incoming
  • replicate.from(): incomingPre, incomingReplication, incoming
  • get(), allDocs(), etc: outgoing, outgoingLocal, outgoingFinal
  • replicate.to(): outgoing, outgoingReplication, outgoingFinal

And, of course, these only apply if specified. An application might need to specify only one transformation function (e.g. transforming incoming documents only from replication, for the purposes of migration). It might need all eight: performing a common transformation on the outside incoming document format, transforming that further when inserting and also when replicating from outside, finalizing both of those source-specific changes in a common way, performing a common pre-transformation out of the database, altering that further in source-specific ways, and then putting one last set of common finishing touches on. Or it might need any other combination (most would probably use no more than six, but it could realistically be any six).
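To make the proposed ordering concrete, here is a minimal sketch of how the stages listed above could be composed. Everything in it is hypothetical: transform-pouch currently exposes only incoming and outgoing, and the `STAGES` table and `applyStages` helper are names invented for illustration. It also assumes synchronous transformation functions, which is a simplification.

```javascript
// Hypothetical sketch: these stage names come from the proposal in this issue,
// not from transform-pouch's actual API.
// Stage order per direction and source, as listed above.
const STAGES = {
  incoming: {
    local: ['incomingPre', 'incomingLocal', 'incoming'],
    replication: ['incomingPre', 'incomingReplication', 'incoming'],
  },
  outgoing: {
    local: ['outgoing', 'outgoingLocal', 'outgoingFinal'],
    replication: ['outgoing', 'outgoingReplication', 'outgoingFinal'],
  },
};

// Run a document through whichever stages the config actually defines;
// unspecified stages are skipped.
function applyStages(config, direction, source, doc) {
  return STAGES[direction][source].reduce(
    (current, name) =>
      typeof config[name] === 'function' ? config[name](current) : current,
    doc
  );
}

// Example config: a common outgoing step, plus a replication-only step
// that strips a local-only field.
const config = {
  outgoing: (doc) => ({ ...doc, seen: true }),
  outgoingReplication: ({ localDraft, ...rest }) => rest,
};

const stored = { _id: 'a', title: 'Hello', localDraft: 'wip' };
const viaGet = applyStages(config, 'outgoing', 'local', stored);
const viaReplicate = applyStages(config, 'outgoing', 'replication', stored);
```

With this config, `viaGet` keeps `localDraft` while `viaReplicate` does not, and both pick up the field added by the common outgoing step.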

@stuartpb
Author

stuartpb commented Dec 1, 2016

Also, something that just occurred to me: it might make sense for transformation functions to have access to data about the transaction or replication as a second parameter. For example, this would allow an outgoing replication function to strip local-application-specific fields when replicating to remote databases.

Indeed, now that I think about it, this might be the more sensible level to factor this at: a second parameter that functions could use to filter and compose their own outgoing or incoming transformations as applicable, whether that be on a local-versus-replication basis, or whatever.
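A minimal sketch of what that second parameter could look like, assuming a hypothetical `context` object with a boolean `replication` field. transform-pouch does not pass such an argument today, and the `local_` field prefix is just an illustrative convention.

```javascript
// Hypothetical: the `context` argument and its `replication` field are
// assumptions made for this sketch, not part of transform-pouch.
function outgoing(doc, context = {}) {
  if (!context.replication) {
    // Local reads (get, allDocs, ...) see the full document.
    return doc;
  }
  // Replicating out: drop fields prefixed with "local_".
  return Object.fromEntries(
    Object.entries(doc).filter(([key]) => !key.startsWith('local_'))
  );
}

const doc = { _id: 'a', title: 'Hello', local_scroll: 120 };
const localView = outgoing(doc, { replication: false });
const remoteView = outgoing(doc, { replication: true });
```

A single function can then branch on the context itself, so no separate incomingReplication / outgoingReplication hooks are needed.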

@stuartpb
Author

stuartpb commented Dec 1, 2016

#38 makes a good point on how these are fundamentally different operations, and should really be handled completely differently: an incoming or outgoing transformation from the out-of-database world may not correspond to a revision to a document, but a transformation in the process of replication absolutely does.

In that light (assuming incoming and outgoing don't do this already, which seems to be the case), it'd probably make more sense to just make incoming and outgoing keep their current behavior, but make them only apply for non-replication functions, and make it so any transformation to be applied in the context of replication would have to be specified as incomingReplication and outgoingReplication, with these each creating a new revision with _rev to represent the transformation they've performed.

@stuartpb
Author

stuartpb commented Dec 1, 2016

Well, either that, or there could be a separate revise-pouch plugin that applies this kind of revising transformation whenever an object is inserted or updated locally (i.e. it commits the given version, then commits the transformed version), and creates ephemeral revisions for transformations when replicating (though I believe PouchDB's non-deterministic, random-based revision identifiers could cause meaningless conflicts if the same deterministic transformation is applied at two different times). This could also be introduced in the form of a new option to .transform(), like "revise": true or "revise": "replication".

In any case, I think incoming and outgoing here should really not be applied silently in replication, in any scenario. Doing so breaks one of the fundamental assumptions of the CouchDB consistency model (that all changes to the document coming in will be reflected by a differing revision history).

@nolanlawson
Member

I'm starting to think that, yes, it would make sense to make a separate plugin. This plugin is designed for very simple use cases, e.g. I have a CouchDB full of too-big documents and I merely want them to be smaller when I replicate them locally.

You are fully encouraged to write a revise-pouch plugin. :)

@gr2m
Collaborator

gr2m commented Dec 3, 2016

I second Nolan’s comment, but please keep us posted on revise-pouch :)
