
Why? What? How? #1

Open
nelsonic opened this issue Jan 9, 2017 · 21 comments

@nelsonic
Member

nelsonic commented Jan 9, 2017

@iteles Please add the photos you took of the sketches from today into this description of the issue (or send them to me!) thanks! ⭐️

[Photos of the sketches: img_0695, img_0698, img_0696, img_0697]

Note: @alexeyklyukin has good experience with PostgreSQL and gave a presentation on this problem at PGConf 2014: https://docs.google.com/presentation/d/1TV0bExFwVy-_d6C7A8Z2JL9Z9tvtkuZv3D58fkC3GWQ
It might be worth reaching out to him to understand what the "state of the tools" is in 2017...?

@nelsonic
Member Author

nelsonic commented Jan 9, 2017

@des-des could this be useful:

@des-des
Member

des-des commented Jan 11, 2017

OK, this exists for SQL Server: it finds the difference between two DBs and builds a migration script. This is the kind of thing we are interested in, but for PostgreSQL. https://opendbdiff.codeplex.com/

https://www.npmjs.com/package/dbdiff

@des-des
Member

des-des commented Jan 11, 2017

OK, this is some JS diffing two DBs: https://github.com/gimenete/dbdiff/blob/master/dbdiff.js.

It works for PostgreSQL!
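To make the idea concrete, here is a minimal sketch of what such a diff does under the hood. This is not dbdiff's actual API — it just compares column definitions via `information_schema` using the `pg` client; the connection strings and table name are placeholders:

```js
// Minimal illustration of schema diffing between two Postgres DBs.
const { Client } = require('pg')

// Read { column_name -> data_type } for one table from information_schema.
async function columnsOf (connectionString, table) {
  const client = new Client({ connectionString })
  await client.connect()
  const { rows } = await client.query(
    `SELECT column_name, data_type
       FROM information_schema.columns
      WHERE table_name = $1`, [table])
  await client.end()
  return new Map(rows.map(r => [r.column_name, r.data_type]))
}

// Report columns added, removed, or retyped between the two DBs.
async function diffTable (connA, connB, table) {
  const [a, b] = await Promise.all([columnsOf(connA, table), columnsOf(connB, table)])
  for (const [col, type] of a) {
    if (!b.has(col)) console.log(`- ${col} ${type} (removed)`)
    else if (b.get(col) !== type) console.log(`~ ${col}: ${type} -> ${b.get(col)}`)
  }
  for (const [col, type] of b) {
    if (!a.has(col)) console.log(`+ ${col} ${type} (added)`)
  }
}

diffTable('postgres://localhost/db_old', 'postgres://localhost/db_new', 'users')
```

A real tool (like dbdiff) also has to cover indexes, constraints, sequences, etc., but the shape of the problem is the same.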

@des-des
Member

des-des commented Jan 11, 2017

What are we proposing to build?

The title here is postgres-schema-migration-checker. To me this suggests that we are building a CI tool to check whether a schema migration will work.

I.e. given a migration script, can we apply it to the database without breaking anything? This seems different from my impression of our discussion, but makes more sense.

Given this outlook, I see the possible steps as:

  1. Create a dummy database with the existing schema. Fill it with either anonymised or generated dummy data.
  2. Attempt to apply the migration; if it works, we are okay to apply it to production (see the sketch below).
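A minimal sketch of that dry-run step, assuming the `pg` client, a dummy DB already populated, and a hypothetical migration file. Postgres DDL is transactional, so we can attempt the whole migration and roll it back:

```js
// Dry-run a migration against a dummy DB inside a transaction.
// If any statement fails, everything rolls back and we report the error.
const fs = require('fs')
const { Client } = require('pg')

async function dryRunMigration (connectionString, migrationFile) {
  const client = new Client({ connectionString })
  await client.connect()
  const sql = fs.readFileSync(migrationFile, 'utf8')
  try {
    await client.query('BEGIN')
    await client.query(sql)
    await client.query('ROLLBACK') // leave the dummy DB untouched either way
    console.log('migration applied cleanly; safe to try on production')
  } catch (err) {
    await client.query('ROLLBACK')
    console.error(`migration failed: ${err.message}`)
    process.exitCode = 1
  } finally {
    await client.end()
  }
}

dryRunMigration('postgres://localhost/dummy_db', 'migrations/001-add-column.sql')
```

One caveat: a few statements (e.g. `CREATE INDEX CONCURRENTLY`) cannot run inside a transaction, so those would need special-casing.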

To outline what I think @nelsonic was suggesting (the whole migration process):

  1. Push up a feature branch with the schema change.
  2. Build a new DB from the schema.
  3. Try to transfer anonymised / mock data into the new database.
  4. Run tests on the new, populated DB.
  5. Success.
  6. Merge.
  7. Build a new DB from the schema.
  8. Transfer data from the old DB to the new DB.
  9. Switch production databases.

In this process, I am not sure we can depend on the reliability of transactions that happen during the copy in step 8.
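For step 8, one option — a sketch, assuming writes to the old DB are paused for the duration of the copy, which sidesteps the in-flight-transaction worry — is to pipe a data-only dump into the new DB:

```js
// Sketch of step 8: copy data (not structure) from the old DB into the new one.
// pg_dump --data-only emits only the data; the new DB already has the schema.
const { execSync } = require('child_process')

execSync('pg_dump --data-only old_db | psql new_db', { stdio: 'inherit' })
```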

What problem are we trying to solve?

Oxford Abstracts have many migration scripts.
Each one is run on server startup (I assume this happens whenever new code is deployed).
These migration scripts need to work even if they have been run before, as they get applied many times to a production DB.

As the number of migration scripts grows, this process is becoming hard to manage. The DB schema becomes harder to understand, since working out the current schema means taking the effect of many migration scripts into account.

I think another problem here is testing time. Building a test DB is taking a long time, as all the migration scripts need to be applied.

@Conorc1000 @roryc89 Do you feel my description of your problem is accurate? Is there anything you can add?

@Conorc1000

@des-des I think this is a good description of the problem. (Yes, we currently run our migrations each time we deploy to Heroku.)

@roryc89
Member

roryc89 commented Jan 11, 2017

@des-des I agree that this is an accurate description of our most pressing schema migration issues. Thanks for helping out! :)

@des-des
Member

des-des commented Jan 18, 2017

@nelsonic It seems to me that normally you would apply a migration script to a DB, rather than build a script that moves data between two DBs with different schemas. I think this might be confusing me.

This is also related to my previous question: during the actual deployment step, how can we safely keep the client / server live while the copy is in progress? E.g. if we send a POST to both the live DB and the DB being created, can we be sure that the data will not get added twice?

@iteles Do you have the photos?

@iteles
Member

iteles commented Jan 19, 2017

@des-des So sorry, I remember uploading them and checking the preview to make sure they were there but must have forgotten to hit 'Update comment' after that 😭

I've updated the top comment with them now. Apologies again.

@nelsonic
Member Author

nelsonic commented Jan 26, 2017

Hi @des-des yes, "normally" a migration would be applied to the DB.
What we need to know is whether the migration will "break" any existing data in the database.

More "advanced" or "mature" schema migrators like Active Record will attempt to do this for you.
We need a similar process for our hand-written SQL table definitions.
It could be that there is already a tool out there that looks at two versions of a database
with slightly different schemas, e.g. changing a date field's type from varchar to date,
and then tests whether the existing data will be converted without corruption/loss.
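As a concrete sketch of that varchar-to-date check (table and column names are made up): attempt the type change inside a transaction against a copy of the data and roll back, so any unconvertible value surfaces as an error without touching the table:

```js
// Test whether existing varchar data survives conversion to a date column.
// Postgres DDL is transactional, so we can attempt the ALTER and roll back.
const { Client } = require('pg')

async function checkDateConversion (connectionString) {
  const client = new Client({ connectionString })
  await client.connect()
  try {
    await client.query('BEGIN')
    // USING tells Postgres how to convert each existing value; any value
    // that does not parse as a date makes this statement throw.
    await client.query(
      'ALTER TABLE users ALTER COLUMN signup_date TYPE date USING signup_date::date')
    await client.query('ROLLBACK')
    console.log('all existing values convert cleanly')
  } catch (err) {
    await client.query('ROLLBACK')
    console.error(`conversion would lose/corrupt data: ${err.message}`)
  } finally {
    await client.end()
  }
}

checkDateConversion('postgres://localhost/copy_of_prod')
```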

Our idea was to investigate the option of using an existing schema migration script/tool,
or, if we don't find one, to write one that does exactly what we need.

If we need to clarify the requirements further, I'm happy to do so.
I agree that knowing exactly what we are trying to achieve before we set out on the journey
will save huge amounts of time and frustration,
so thanks for asking questions and please keep them coming! 👍

Also, to be clear: if we can make this project/work reasonably generic,
i.e. it can be used to test whether data in any existing Postgres database
will be affected by a schema migration (a change in the structure of tables),
we will be able to re-use it for any project that has a PostgreSQL DB.
That includes many of our existing projects and several future ones,
and crucially there are tens of thousands of projects where this would be useful,
because Postgres is not "going away" any time soon... dwyl/how-to-choose-a-database#4

@des-des
Member

des-des commented Jan 27, 2017

@nelsonic awesome. That makes sense! Will try to put something together soon!

@des-des
Member

des-des commented Jan 29, 2017

OK, so I see there being three parts here.

1: Managing the application of migration scripts in production.

This is the focus of most of the tools @nelsonic has linked to. The main idea here is to ensure migration scripts are only applied once: the target DB keeps a table of previously applied migrations (a minimal sketch follows below).
@Conorc1000 @roryc89 This provides a partial solution to your problem. Deploying a tool like this would help you, since your migration scripts would not need to guard against being applied twice. You could have confidence that, for any DB, the migrations would be applied only once and in order.
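Something like this, assuming a `schema_migrations` bookkeeping table and migration files on disk (all names hypothetical):

```js
// Track applied migrations so each script runs exactly once, in order.
const fs = require('fs')
const { Client } = require('pg')

async function migrate (connectionString, files) {
  const client = new Client({ connectionString })
  await client.connect()
  await client.query(
    `CREATE TABLE IF NOT EXISTS schema_migrations (
       name text PRIMARY KEY,
       applied_at timestamptz NOT NULL DEFAULT now()
     )`)
  for (const file of files.sort()) { // lexicographic order = migration order
    const { rows } = await client.query(
      'SELECT 1 FROM schema_migrations WHERE name = $1', [file])
    if (rows.length) continue // already applied on this DB: skip
    await client.query('BEGIN')
    await client.query(fs.readFileSync(file, 'utf8'))
    await client.query('INSERT INTO schema_migrations (name) VALUES ($1)', [file])
    await client.query('COMMIT')
    console.log(`applied ${file}`)
  }
  await client.end()
}

migrate('postgres://localhost/mydb', ['001-create-users.sql', '002-add-email.sql'])
```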

2: Automatic creation of migration scripts

Given a pull request with a change of schema, can we automate the creation of a migration script?
Yes, we can use a tool like dbdiff.
Q: What is the UX when this fails?
Q: This tool does not provide a rollback; is this a problem?
Q: How can we ensure that

GetSchemaFromLiveDb(ApplyMigration(dbN-1, createMigrationFromDiff(schemaN-1, schemaN)))

and

GetSchemaFromLiveDb(createDbFromSchema(schemaN))

are the same? (One way to check this is sketched below.)
Q: Where are the migration scripts kept? Does the CI tool commit them to the repo? I think we just run the creation tool locally.
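A rough sketch of that equivalence check, assuming `createdb`, `psql` and `pg_dump` on the PATH, local scratch DBs, and hypothetical schema/migration file names:

```js
// Check that "old schema + generated migration" and "new schema from scratch"
// produce the same structure.
const { execSync } = require('child_process')

const run = (cmd) => execSync(cmd, { encoding: 'utf8' })

// 1. Build one DB from the previous schema, then apply the generated migration.
run('createdb scratch_migrated')
run('psql scratch_migrated -f schema_v1.sql')
run('psql scratch_migrated -f migration_v1_to_v2.sql')

// 2. Build another DB directly from the new schema.
run('createdb scratch_fresh')
run('psql scratch_fresh -f schema_v2.sql')

// 3. Dump only the structure of both and compare.
const migrated = run('pg_dump --schema-only scratch_migrated')
const fresh = run('pg_dump --schema-only scratch_fresh')

console.log(migrated === fresh
  ? 'migration reproduces the target schema'
  : 'schemas diverge -- inspect the dumps')
```

Note that `pg_dump --schema-only` output can differ in ordering even for structurally identical DBs, so a real check would normalise or structurally diff the dumps rather than compare strings.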

3: Testing that a migration does not lose data.

To me, although it may be tangential to the problem at hand, this project needs a way of being confident that migrations are not losing data before it leaves beta.

This is the problem Nelson's drawings are describing.
We can break this problem into two parts:

  1. Create a test DB as a playground to test the migration. We can do this in three ways:
    1. Copy the DB (may not keep client data secure).
    2. Create mock data by inspecting the schema.
    3. Anonymise data from the prod DB to use as mock data.
    Q: Our tool does not want to access the data for security reasons, which means we cannot access the data to anonymise it. Would we need to build two separate tools? One has access to the data and can spit out what we want; the other (the CI tool we are discussing) does not have access. I am not 100% sure of the benefit here.
  2. Run tests on the mock DB:
    1. Build a tool to automate the creation of these tests. Q: Is this the way we want to go? It seems like a hard problem; I could focus my research here.
    2. Run tests built by the user. I.e. we have a staging DB with lots of mock data in it; we have integration tests pulling data from this DB; we can check whether the migration makes them fail.

@nelsonic
Member Author

@des-des yes, this would be a good approach. 👍

@des-des
Member

des-des commented Jan 29, 2017

@nelsonic could you be more specific?

@nelsonic
Member Author

@des-des the steps you have described don't appear logical to me.
The UX when the migration step fails is: exit with a non-zero status code
and terminal output giving the reason the script failed.

Migration scripts would be automatically created by the tool on the developer's machine,
i.e. on localhost, and then preferably included in the commit/PR so that we can version-control them.

As for having access to production data: we can simulate valid records for both schemas based on the data types of the columns.
Then it should be straightforward to attempt to insert a row that is valid for one schema into the same table(s) in the revised schema and check whether it worked. 👍
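A simplified sketch of that idea (hypothetical table name, tiny type map; a fuller version would build the row from the *old* schema's column types and insert it into the migrated table):

```js
// Simulate a valid record from column types and try inserting it,
// inside a transaction so the check has no side effects.
const { Client } = require('pg')

// Very small data_type -> sample-value map; extend for the types you use.
const sampleFor = (dataType) => ({
  'integer': 42,
  'character varying': 'sample text',
  'text': 'sample text',
  'date': '2017-01-01',
  'boolean': true
}[dataType])

async function tryInsertRow (table) {
  const client = new Client() // connection details from PG* env vars
  await client.connect()
  try {
    const { rows } = await client.query(
      `SELECT column_name, data_type
         FROM information_schema.columns
        WHERE table_name = $1`, [table])

    const cols = rows.map(r => r.column_name)
    const vals = rows.map(r => sampleFor(r.data_type))
    const params = cols.map((_, i) => `$${i + 1}`)

    await client.query('BEGIN')
    await client.query(
      `INSERT INTO ${table} (${cols.join(', ')}) VALUES (${params.join(', ')})`,
      vals)
    await client.query('ROLLBACK')
    console.log(`a type-valid row still fits ${table} after the migration`)
  } catch (err) {
    console.error(`insert failed; the migration may break data: ${err.message}`)
  } finally {
    await client.end()
  }
}

tryInsertRow('users')
```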

@des-des
Member

des-des commented Mar 6, 2017

Sorry to have left this so long. FAC1N happened and kind of took over for a couple of weeks, then updating back here got lost in my todo list.

About a month ago I spoke to the OA team; I'll quickly summarise the outcome of that conversation. OA have actually gone ahead and solved part of their problem: their DB now holds a table of run migrations. This means that, for any instance of their DB, they can make sure migrations are run once and in order.

Anyway, I think we need to step back a little and think about this problem in the context of dwyl's new stack.

@iteles
Member

iteles commented Mar 6, 2017

@roryc89 @Conorc1000 @naazy Is this still required for your project?

Postgres is still part of our stack, so this functionality may still be required.

@iteles iteles removed their assignment Mar 6, 2017
@des-des
Member

des-des commented Mar 6, 2017

@iteles

@roryc89 @Conorc1000 @naazy and if the answer is yes, can you be explicit about what is needed?

@roryc89
Member

roryc89 commented Mar 6, 2017

@iteles and @des-des: as our main DB pain point (the running of migrations multiple times) has been addressed, I don't think we require a data migration tool in our project at the moment, although we may need it in the future.

@nelsonic
Member Author

nelsonic commented Mar 7, 2017

@roryc89 thanks for replying and confirming. 👍
If that's the case, and given that we are moving most of our other projects to "PETE", which has really good schema migrations, should we put this project on the "back burner" for now? 💭

@khairnarTK

Is it possible to write a Python script to migrate production DB data to a staging DB in PostgreSQL?

@nelsonic
Member Author

@khairnarTK it's definitely possible; Django has migrations: https://docs.djangoproject.com/en/2.2/topics/migrations/
