Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter migrations #438

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

gregwebs
Copy link

@gregwebs gregwebs commented Apr 26, 2023

Overview

Many users want to tell dbmate to run specific migrations. Issues have been opened as well as the following merge requests

None of these actually satisfied my use case so I opened the following PR:

The nice thing about #434 is it works for all commands, including status, allowing one to preview what will be done. However, it doesn't satisfy everyone else's use case properly.

Solution

This PR builds on top of #434 but additionally adds a range syntax. So one can specify -m version1...version3. Additionally, it supports positive and negative numerals as position arguments. So to use the last 2 migrations, one can use -2....

dbmate -m version # just one version is selected
dbmate -m ...version # everything before and including version
dbmate -m version... # everything starting at version and after
dbmate -m version...version2 # everything starting at version and ending at version 2

Please see the test cases for more examples.

TODO

  • More test cases
  • Additional documentation in the README
  • The environment variable should specify multiple migrations by splitting on a comma?
  • Restrict to specific commands? Doesn't make sense for wait.

gregwebs and others added 5 commits April 20, 2023 06:49
specify a range of migrations using ... syntax
also support positive and negative offsets
@gregwebs
Copy link
Author

gregwebs commented May 1, 2023

@dossy please review

@dossy
Copy link
Collaborator

dossy commented May 1, 2023

Before reviewing the code itself, I'd like to understand the requirement driving the change.

Many users want to tell dbmate to run specific migrations. [...]

My opinion is that those users are trying to use a database migration tool as a means to execute ad-hoc SQL against their database schema, and that's not what a schema migration tool is meant to be used for.

As schema migrations are expressing transitions in schema state, they are context-dependent, and we shouldn't create the expectation that executing them in an arbitrary order should always work.

My personal workflow looks like this:

  • Start from pristine known state using dbmate drop
  • Create database and schema up to latest version with dbmate up
  • Create new migration with dbmate new
  • Test newly added migration with cycles of dbmate up and dbmate down, editing as necessary
  • Once this initial testing passes, perform a dbmate drop followed by a dbmate up and then confirm the resulting schema is correct
  • Commit the new migration

Following a different process can lead to migrations that may produce different resulting schema states, and that violates the primary use case for a database migration tool.

@gregwebs
Copy link
Author

gregwebs commented May 1, 2023

Before reviewing the code itself, I'd like to understand the requirement driving the change.

Great!

As schema migrations are expressing transitions in schema state, they are context-dependent, and we shouldn't create the expectation that executing them in an arbitrary order should always work.

We also shouldn't create an expectation that the time-based ordering implies a hard dependency. dbmate provides no way to track the dependencies of schema changes, and different schema changes with no dependency on each other are often being committed at the same time. There is prior art here: sqitch has been around a long time and tracks dependencies. dbmate chose not to, so the onus is on the user to figure out how to apply them apropriately.

You are going through the development workflow- I like your workflow and I think that those that are in a position to follow it should. The main issue I see with it is that after there are hundreds of migrations one needs to ensure they are compressed in something like a schema.sql. But the existing schema.sql is inadequate.

My use case for selecting migrations is less for development but more for production. I might have a couple different migrations that are performing a schema change, adding an index, or deleting or update some data. Maybe the delete/update is an ad-hoc SQL query according to you, but I need to ensure these are performed in a controlled and reviewed manner and potentially rolled out to a staging environment and applied locally as well, and a migration system works quite well for this. Also note that in the discussion about schema.sql being inadequate we discussed the concept of "schema data" that is necessary for the application to function properly- and this data needs alterations (DML) after the schema (DDL) is in place.

In a zero-downtime setup none of the migrations should be required to be ran immediately before a deploy and create downtime until applied. Some of the migrations may need to be ran before the new deployment. Some may be able to be ran after the new deployment (for example adding indexes). I may want to wait for a lower traffic period before adding indexes, or for applying a schema change that is blocking.

The only way I have figured out how to accomplish these workflows with the existing dbmate is to delete/move migrations that I don't want applied.

@dossy
Copy link
Collaborator

dossy commented May 2, 2023

[...] We also shouldn't create an expectation that the time-based ordering implies a hard dependency. [...]

Actually, that's exactly the promise and guarantee that dbmate makes to you: that it will apply migrations in a time-based ordering, if you use a numerically sortable string representation of time as the version numbering scheme.

This is a first-class feature of dbmate ("Timestamp-versioned migration files"). It's what sets it apart from some of the other schema migration tools available.

It's why I chose dbmate over sqitch, sql-migrate, laravel and even prisma, for schema migrations for my projects.

I would really prefer to not turn dbmate into sqitch. If you want sqitch's features, why not use sqitch, instead of using dbmate and wishing it were sqitch?

[...] The main issue I see with it is that after there are hundreds of migrations one needs to ensure they are compressed in something like a schema.sql. [...]

Why?

From a freshly created development database, applying literal hundreds of migrations takes tens of seconds.

In higher environments like staging or production that already contain data, you really should not be applying more than 1 or 2 new migrations at a time, if you Release Early, Release Often.

In a zero-downtime setup [...]

Here, I agree with you: in my opinion, dbmate really isn't the right tool for schema management for applications that require zero-downtime schema changes.

This is part of the motivation for #424, being able to specify multiple --migrations-dir directories. This made it possible to separate groups of migrations into separate directories, where developers could apply them all in one shot, but in production some migrations can be applied differently, either using something like Percona's pt-online-schema-change or some other approach to enable zero-downtime schema changes, depending on the nature of the change and what would be required to apply it in an online fashion.

I think extending dbmate to be capable of applying schema changes with the full functionality of pt-online-schema-change is unreasonable, so recognizing it as a limitation of dbmate and solving it outside of dbmate feels right, to me.

The only way I have figured out how to accomplish these workflows with the existing dbmate is to delete/move migrations that I don't want applied.

Yes, having multiple migrations directories was largely how I solved the problem, which is why #424 was so important to me.

I wouldn't consider it unreasonable to have a migrations folder structure that might look something like:

db
`-- migrations
    |-- auto
    |   |-- feature_a_before
    |   |-- feature_a_after
    |   |-- feature_b_before
    |   |-- feature_b_after
    |   |-- ...
    |   |-- feature_z_before
    |   `-- feature_z_after
    `-- manual
        |-- feature_a_before
        |-- feature_a_after
        |-- feature_b_before
        |-- feature_b_after
        |-- ...
        |-- feature_z_before
        `-- feature_z_after

And, you can selectively apply the migrations from one or more of those directories, as needed. In development, you can let dbmate just apply all of them in one go, but when deploying to production, perhaps you only use dbmate to apply the migrations in the folders inside auto, and all of the manual migrations have both the plain SQL for applying the change in development, as well as human-readable instructions in a SQL comment in the file on how to apply the schema changes by hand, using pt-online-schema-change or whatever is appropriate.

Either way, I agree, dbmate isn't well-suited for managing production database schemas when the underlying DBMS doesn't have great support for applying online DDL operations.

Trying to enhancing dbmate to compensate for the deficiency of the underlying DBMS feels like massive scope creep for dbmate to me, and might not even be possible to fully achieve, anyway.

That's why I chose to use different tooling to solve that problem, while still being able to use dbmate for the non-production environments.

@dossy
Copy link
Collaborator

dossy commented May 2, 2023

One kind of potentially wild idea I had was implementing support for migrations to contain either SQL or shell scripts, and having dbmate execute the shell script in sequence based on the version numbering.

For example, dbmate would recognize that a migration file named 20230502123456_foo.sh was a shell script and would just exec() it, passing along settings as environment variables to the subshell, including a DBMATE_MODE being either up or down so that the script could adjust its behavior to apply or roll back as necessary. This would provide a completely open-ended escape hatch to implement whatever functionality necessary to implement the migration beyond just feeding a SQL snippet to the database server as-is.

As I haven't personally needed this much flexibility -- yet -- I haven't been motivated to implement it, but it was something I seriously considered when solving some of my own needs.

Something to evaluate further, perhaps.

@gregwebs
Copy link
Author

gregwebs commented May 2, 2023

Actually, that's exactly the promise and guarantee that dbmate makes to you: that it will apply migrations in a time-based ordering, if you use a numerically sortable string representation of time as the version numbering scheme.

Applying migrations in a time-based ordering is just that. It does not mean that migration #5 has a hard dependency on migration #4. It just means that dbmate will apply #4 before #5. Where the actual dependencies exist is only known to the operator of the migrations.

This is a first-class feature of dbmate ("Timestamp-versioned migration files"). It's what sets it apart from some of the other schema migration tools available.

This is the way most schema migration tools seem to work today. Some might use an integer, but a timestamp is just an integer with less coordination (which can be a good thing). I don't think dbmate is set apart here- it is doing the simplest thing possible (for better and worse)- and what it is doing is very common now.

From a freshly created development database, applying literal hundreds of migrations takes tens of seconds.

Why would we want to wait tens of seconds? Developers may not run this workflow a lot, but CI does, and that is getting ran constantly.

Yes, having multiple migrations directories was largely how I solved the problem, which is why #424 was so important to me.

So this opened with a lot of arguments against selecting migrations, but it is actually already known that migrations need to be selected.

and all of the manual migrations have both the plain SQL for applying the change in development, as well as human-readable instructions in a SQL comment in the file on how to apply the schema changes by hand, using pt-online-schema-change or whatever is appropriate.

This is where the conversation is getting silly. The need has been acknowledged. I am a Postgres user and I can't have this feature because I am supposed to use a MySQL tool? Postgres has some migrations that can be done in a non-blocking way (e.g ADD INDEX CONCURRENTLY). And again, I know the traffic patterns of my site and can do some migrations selectively around that. I want to run one migration at a time with some time gap in between if the migrations will result in some blocking behavior- I don't want to have a longer downtime period.

@gregwebs
Copy link
Author

gregwebs commented May 2, 2023

It's also worth noting that TiDB is MySQL compatible and would work with dbmate and that almost all migrations done with TiDB would be online.

@gregwebs
Copy link
Author

gregwebs commented May 2, 2023

With respect to extensibility, exec is taking total control over some thing, which limits extensibility. outputting the migration content as JSON would be most extensible. Simpler in this case might be a switch to output just the up or down.

@gregwebs
Copy link
Author

gregwebs commented May 5, 2023

It's worth noting how orthogonal this feature is to all other workflows. For example the suggestion for working with other tools. If I were to use a different tool to apply a migration in production, I would want to be able to use the -m flag to select a single migration to give to the other tool. Along with a --up flag to only give the upward migration information.

@gregwebs gregwebs mentioned this pull request May 5, 2023
@amacneil
Copy link
Owner

amacneil commented May 12, 2023

As mentioned in the PR description, I think there is a pretty clear request from users for at least the following features:

  • Apply all changes up to a certain version (inclusive), rather than up to the latest
    • i.e. dbmate <migrate up to> 0003 would apply everything up to 0003
  • Rollback migrations to a certain version (exclusive), or back to nothing
    • i.e. dbmate <roll back to> 0003 would rollback everything higher than 0003
  • Apply or rollback a specific version

Agree with @dossy we don't want to turn dbmate into something complex by tracking dependencies etc, other tools do that. Dbmate likes to keep things simple. However, I do agree the above 3 features would be nice to support.

I originally liked the range syntax proposed, but I thought about it some more and I think it's a bit weird for the rollback case. I guess I would have to do dbmate -m 0003... rollback? I think that would also roll back 0003 though, whereas I want to specify the target schema version.

The range syntax would also get a bit weird if we added support for period character in migration versions, e.g. to support #371

Here's another idea (open to bike shedding the flag names):

dbmate migrate               # existing: migrate to latest version
dbmate rollback              # existing: rollback one version

dbmate migrate --only 0003   # apply only this version
dbmate rollback --only 0003  # rollback only this version

dbmate migrate --to 0003     # migrate up to this version (inclusive)
dbmate rollback --to 0003    # roll back to this version (exclusive, i.e. the specified version will become the current applied version)

dbmate rollback --all        # special case to rollback all migrations
dbmate migrate --all         # maybe we should support this too for consistency (it wouldn't do anything since "all" is the default)

# support these flags for `dbmate up` and `dbmate down` too

I think this solves all 3 use cases while keeping a fairly simple CLI. Anything I'm missing?

@amacneil
Copy link
Owner

I guess there is a 4th request, coming from #256, which is to migrate or rollback a certain number of migrations, without regard for migration versions (e.g. "rollback 2 migrations"). I'm not sure what the expected behavior would be if not all available migrations were applied. We could maybe support it with an additional flag:

dbmate migrate --limit 2     # apply two migrations
dbmate rollback --limit 2     # rollback two migrations

But I don't know if this is a good idea. What would happen in the following case when I run migrate or rollback with --limit parameter?

[x] 0001
[ ] 0002
[x] 0003
[ ] 0004
[x] 0005
[ ] 0006

I think maybe we should leave this use case out for now, and see if people can survive by specifying the target version to migrate/roll back to.

@gregwebs
Copy link
Author

But I don't know if this is a good idea. What would happen in the following case when I run migrate or rollback with --limit parameter?

The way I wrote it, this feature is for migration filtering. It filters regardless of whether the migration is applied or unapplied. It is important to operate that way. It allows you to use it with dbmate status before and after. With the before you see it not applied and after you see it applied.

As implemented in this PR it would use: -m -2...

This single flag I started calling "migration filtering" (I was considering changing the flag name to be --filter). I can see how by using multiple different flags users might get an impression that the different flags have different semantics here.

@gregwebs
Copy link
Author

I originally liked the range syntax proposed, but I thought about it some more and I think it's a bit weird for the rollback case. I guess I would have to do dbmate -m 0003... rollback? I think that would also roll back 0003 though, whereas I want to specify the target schema version.

There are 2 concerns here. One is whether there syntax is intuitive. At first it doesn't seem to match the thinking of down. But I went for uniformity with up and down. In this way the same filter can be used with the status command (which doesn't know if you want to do up or down). So it is always a filter that filters the same migrations regardless of the command being used.

The second concern would be inclusive versus non-inclusive range. I don't think of this is a big problem because you can always start with dbmate -m FILTER status first and then adjust it. But there could be a syntax or flag to indicate inclusive versus non-inclusive.

@gregwebs
Copy link
Author

Here's another idea (open to bike shedding the flag names):

  • The --all flag is orthogonal to the rest of this discussion- it alters the behavior of rollback and would do that without filtering as well
  • --only VERSION is my most common use case and corresponds to my original PR, the syntax I always had was -m VERSION
  • --to VERSION is a range with one side specified and one side open. If you think about using it with migrate or rollback, it makes intuitive sense. The problem is to be able to see ahead of time what will be done by using it with status: which direction is it going? It would be possible to do migrate/rollback --dry-run --to VERSION instead of status to get the same behavior as status but with the direction known.

@gregwebs
Copy link
Author

tern has a single migrate command.

This solves the issues we are talking about with the difference between up/down. I do think though that there is an issue around using this approach when there are unapplied migrations. In this scenario:

  • A
  • B
  • C

with a bi-directional migrate command, migrate --destination B would be ambiguous as to whether it is up only and just B should be applied or whether C should be rolled back first. This is where a --dry-run flag or supporting the exact same commands with the status command would certainly be useful.

One might think that up/down only commands or flags would be the obvious solution, but note that tern supports migrating down and re-running migrations in a single command with --destination -+3 so there is a potential "remigrate" workflow as well.

I do think it is reasonable to split these up into separate commands (migrate, rollback, remigrate) and use a --dry-run flag. The approach outlined in this PR is to use the status command rather than a --dry-run command. I think it would work for a remigrate command as well. Technically the status command is not telling users exactly what will happen and they have to do some inference, so there is an argument to be made that a --dry-run approach would be superior.

@gregwebs
Copy link
Author

Merging this does not preclude having --to and --dry-run.

I can change the flag to -f --filter. Keeping it as a the top-level argument rather than sub-command I think helps explain that all it does is filter the migrations, there is nothing contextual about the command (whereas --to is contextual).

@dossy
Copy link
Collaborator

dossy commented Nov 15, 2023

But I don't know if this is a good idea. What would happen in the following case when I run migrate or rollback with --limit parameter?

[x] 0001
[ ] 0002
[x] 0003
[ ] 0004
[x] 0005
[ ] 0006

I know this is only my opinion, but I feel strongly that any schema migration tool that allows this state to occur is either broken-by-design or is being improperly used.

Migrations are a linear journal of operations that mutate the system from one state to another. Schema versions must be strictly monotonically increasing. The primary use case of a schema migration tool is that it makes the hard guarantee that if migrations are applied in order, that the system will resolve to the same state each time.

A tool which is used, or otherwise allows for, the application of migrations in an arbitrary order, cannot make the guarantee that the resulting state will be identical.

I feel it's a mistake to implement or introduce features that makes it easier for users to misuse it, either intentionally or worse, accidentally.

I feel it's a mistake to implement or introduce features that encourage users to misuse it.

The tool's design should encourage proper practices, discourage misuse, and prevent abuse, as much as reasonably possible.

If others feel strongly opposed to my views, I'd like to hear from anyone who cares to share their reasoning.

Ultimately, a good tool should also meet its users needs and I'm only one user out of many, but I hope sharing my views can help explain what I'm looking to get out of dbmate when I choose it.

@gregwebs
Copy link
Author

@dossy fundamentally this tool is not designed the way you want it to be designed. You can use it the way you want, and I can use it the way I want.

I am using it in a team scenario. People are making branches and making changes. When a branch is started, a prefix is chosen for a migration. I could require that the prefix be updated. But still at some point there can be 2 pull requests with migrations and I would have to merge one and then update the prefix on the other. Even still though, it's possible at deployment time to perform a point release with the newer migration first. Now I would need to merge a new pull request to rename an existing migration. Then maybe we decide to rollback the point release due to some issue. Now I need to rollback that PR to change the prefix. In the meantime the local developers need to reset their schema. Obviously I am going from common occurrences to less common. But the point is that any of this is a bunch of make work to satisfy an ideal that you have.

Applying the migrations as is in any order works just fine with transactional migrations in Postgres. If a migration fails, it gets rolled back. Obviously, for data migrations or non-transactional migrations there would be more concerns here. However, in practice I haven't once had an issue of migration dependency. Whereas if we followed your philosophy we would be doing prefix renaming on a weekly basis.

If you are worried about migration safety, the biggest threat to safety would be if it was unclear to users whether or not migrations are going to get rolled back!

@dossy
Copy link
Collaborator

dossy commented Nov 15, 2023

Applying the migrations as is in any order works just fine with transactional migrations in Postgres.

I don't understand how that addresses the key issue I'm describing.

You're saying that your team regularly operates in an environment where newly created migrations are introduced between previously applied migrations?

You're saying that your team regularly operates in an environment where the order in which migrations are defined differs from the order that they are applied in each database across your platform?

How do transactional migrations make this work?

If migration 1 is "create table foo" and that is then applied, and then a new migration 2 is introduced that is "alter table foo add column bar" and then this is applied, but then someone introduces a new migration 1.5 that is "drop table foo" ... and you apply that out of order in production, which "works" but anyone who plays all the migrations from empty initial state in order (1, 1.5, 2) will end up with an error thrown when migration 2 is applied after 1.5, because the table being referred to no longer exists.

I feel bad for any team who regularly operates this way. I cannot see how this is tenable or doesn't result in constant frustration.

How do transactional migrations solve this problem?

@gregwebs
Copy link
Author

transactional migrations solve the problem to the extent that when there is a migration failure things won't be left in an invalid state. So it's simple at that point to fix things if there were a problem running a migration for any reason, including if it were accidentally run out of order.

With that being said, no one on my team has eever had an issue with migrations being ran out of order. The reason is that when we do new work on branches with migrations they never depend on each other. By the time there is a branch with a migration any migrations it depends on have already been applied. The only time we have unapplied dependencies are on the same branch.

I feel bad for any team who regularly operates this way. I cannot see how this is tenable or doesn't result in constant frustration.

Although this is couched in your own imagination, this statement represents the tenor of your comments here and they are really out of line with being a good maintainer. It's the job of maintainer to focus on understanding the users rather than lashing out at them with emotional statements. I appreciate being asked to explain my workflow, but I have done it multiple times now in various ways and at this point it doesn't seem that you are actually curious about my workflows but instead focused on figuring out how to declare it as wrong.

@dossy
Copy link
Collaborator

dossy commented Nov 15, 2023

I feel bad for any team who regularly operates this way. I cannot see how this is tenable or doesn't result in constant frustration.

Although this is couched in your own imagination, this statement represents the tenor of your comments here and they are really out of line with being a good maintainer. It's the job of maintainer to focus on understanding the users rather than lashing out at them with emotional statements. I appreciate being asked to explain my workflow, but I have done it multiple times now in various ways and at this point it doesn't seem that you are actually curious about my workflows but instead focused on figuring out how to declare it as wrong.

I am trying really hard to understand your workflow and, as you've described it, how you make it work.

The key value proposition of a schema migration tool is acknowledging that having developers apply ad-hoc schema changes in arbitary order leads to problems such as version drift between environments and makes testability difficult, as you don't have a consistently repeatable process.

I'm not interested in declaring what you do as being wrong. I'm still struggling to see how what you've described can be done without creating avoidable problems. And I'm geniunely puzzled and trying to figure out the right question to ask to get an answer that helps me understand it. If I had already decided what you're doing is wrong, I wouldn't bother continuing to ask questions for which I already knew the answer. But, I am clearly missing something here, so I continue to ask questions as best I can.

But, you are right: I am not a good maintainer. I know this, and I appreciate you having the candor to say so. I stepped up as a volunteer to participate as a maintainer in this project as an opportunity to learn how to become a better maintainer. A good way to learn is by doing, or at least trying.

I also know that you don't owe me anything and I appreciate the time you've already spent answering my questions, and you have no obligation to help me become a better maintainer, so I appreciate whatever you're willing to share.

@gregwebs
Copy link
Author

I appreciate what you are doing to move this project forward: you are a being a great maintainer in all other aspects. There's just some things we can improve on communication here.

What I explained should be pretty clear: whenever we create a branch, all dependencies of the migration on the branch have been applied already. It is logically impossible for there to be any missing dependency issues in this scenario when creating a single migration, regardless of the merge order of branches and application order of commits. If someone creates 2 migrations in one branch (happens more rarely), the second they create might depend on the first. However, those two will still be ordered with respect to each other.

Now it's true that logically there can be conflicts (dependencies aren't missing) where two branches attempt to migrate the same thing. But such conflicts are resolved pre-merge where depending on how your CI operates, the migration would fail or the sql for the schema would conflict, etc. It's also worth noting that I don't know if we have even seen such conflicts other than with pull requests that have been abandoned for months.

I don't know what workflows you are using, but it seems like you might not have a CI that is able to validate migrations?

@dossy
Copy link
Collaborator

dossy commented Nov 15, 2023

I don't know what workflows you are using, but it seems like you might not have a CI that is able to validate migrations?

We do, but the CI always starts with a clean, empty database, and applies all the migrations in the branch in-order. This is the "first checkpoint"--if the migrations fail to apply successfully, it's generally the result of a newly introduced migration that has a defect, and it gets fixed.

However, it is possible for two or more branches to diverge, and depending on the order in which those branches are merged and the version numbers of the migrations from those branches, they may both successfully apply cleanly in the CI environment where each is applied to an empty database, but could result in migrations being applied out-of-order in the upstream branch.

This is why the introduction of the --strict option made a lot of sense to me: dbmate shouldn't even attempt to apply a migration that is a "hole" between other already-applied migrations, to prevent exactly this applying migrations out-of-order scenario. In this situation, the merge should be prevented, and the yet-unapplied migration should be re-numbered so that it will be applied after the latest applied migration, again tested in through CI to make sure that it will apply successfully in this order, and then finally merged.

An environment where allowing different environments to have migrations applied in differing orders leaves me wondering if you have some other external process that periodically does a full schema comparison between environments to prove that they match? Otherwise, how do you avoid unintentional schema version drift between environments, if you allow migrations to be applied in different orders in different environments?

That's what I'm trying to understand. Either you are accepting the risk that your schemas may differ between environments, which means testing in one may not reveal problems that will manifest in another, or you have some additional process to enforce consistency between environments?

@gregwebs
Copy link
Author

However, it is possible for two or more branches to diverge, and depending on the order in which those branches are merged and the version numbers of the migrations from those branches, they may both successfully apply cleanly in the CI environment where each is applied to an empty database, but could result in migrations being applied out-of-order in the upstream branch.

It isn't actually possible for the migrations to be out of order if both branches are branched off of the same base branch. That is if there is an issue it isn't an ordering (dependency issue), it is a conflict.

But how are the migrations passing CI and then failing in the upstream branch? Is this because branches are being merged when they are not up to date?
Are you using git flow? I find git flow can create branching confusion and use trunk based development

@Retch
Copy link

Retch commented Feb 17, 2024

How is the state at the moment?
image

@dossy
Copy link
Collaborator

dossy commented Feb 17, 2024

How is the state at the moment?

@Retch This is still an open pull request. If you want to test it, you'll need to build dbmate from this PR branch.

@gregwebs
Copy link
Author

@dossy do you use git flow or trunk-based development?

@dossy
Copy link
Collaborator

dossy commented Feb 22, 2024

@dossy do you use git flow or trunk-based development?

I use GitHub flow or branch-based development.

@gregwebs
Copy link
Author

@dossy pretty much everyone uses branch-based development, at least that uses Github and has more than one contributor.

With trunk-based there is only one special long-lived trunk branch (main). Releases are points on main and if need be temporary branches off a point of main.
With git-flow there are multiple special long-lived branches that indicate different levels of quality (develop and main) or different deployed environments (main, qa, prod).

Which approach are you taking?

@gregwebs
Copy link
Author

@Retch I merged the latest into my branch. I use the migration filtering feature all the time, give it a try and report back.

@dossy
Copy link
Collaborator

dossy commented Feb 23, 2024

Which approach are you taking?

The latter: multiple long-lived branches, plus feature branches.

@gregwebs
Copy link
Author

@dossy I think that explains the difference in perspective. With git flow there are lots of merges going on and sometimes different entry points for new code and thus merges in both directions. Although I think it is too complicated for normal code maintenance, migrations could create some of the worst issues.

With trunk-based development none of the issues you are concerned about end up being issues.

I don't think we should design software with the primary constraint that it should work with git flow development. I believe git flow is a worse approach theoretically as this discussion shows. In practice Github doesn't properly support trunk based development but does properly support git flow, so I do understand why one would end up using git flow with Github.

Right now this PR is in the state of being held back by objections based around a personal choices of how to manage branching strategy and assuming users cannot intelligently use this feature. I am more than happy to add some warnings to the documentation.

@dossy
Copy link
Collaborator

dossy commented Feb 23, 2024

Right now this PR is in the state of being held back by objections based around a personal choices of how to manage branching strategy and assuming users cannot intelligently use this feature.

I see why you say this, but I'd like to state that I am not holding back this PR based on my objections.

As my personal working style has no need for this PR, I have no horse in this race.

I've shared my opinion for others to consider, but am recusing myself from making any decisions about this PR. Someone who is properly qualified to judge this PR's appropriateness should chime in here and carry this forward.

In hindsight, I probably should have done this from the start, but since you tagged me specifically asking for a review, I shared my opinions about the PR, but I'm not qualified to decide if it should be merged or not.

Edited to add: I'd like to personally apologize to Greg for having wasted so much of his time asking questions about this PR, when I really had no business getting involved in the first place. Because it's so easy to share an opinion, I open my mouth (or, type with my fingers) when I really should refrain from speaking. It's a character flaw I know I need to work on, and it's even harder online because of how easy it is to just throw out a few cents without thinking it through more carefully. People close to me have said I "need to learn when to just shut up and keep your thoughts to yourself." I'll remember this as another instance where I've struggled to do just that, and hope to do better in the future.

@Retch
Copy link

Retch commented Mar 6, 2024

I don't know whether i'm using it wrong, but i can't get it to work properly.
I built it on the latest commit of @gregwebs as dbmate2.

These are my migrations, just a test project:
image

I dropped the db and executed all migrations.
image
image
image

As i understand, with the -m -3 flag, when rolling back, the last 3 migrations should get rolled back, but it seems only the last migration is rolled back, but not the 3-recent.
image
image
image

I also tested to specify the fourth-recent migration directly, and rollback the 3 most-recent, but only the last migration was rolled back.
image
image
image

Maybe i use it wrong.

@gregwebs
Copy link
Author

gregwebs commented Mar 6, 2024

There may be some issues to work out with the range ... support. Thanks for reporting that- I will make test cases out of those. I almost always use this by specifying single migrations -m specific_migration. This was what the changes first supported and I think I got accustomed to it- I will try using the open-ended syntax when possible now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants