Replace a database column with ActiveRecord with zero downtime
There are times with your web application where you want to switch a database column. Perhaps you discovered that the data sits better on a different table or perhaps the data stored in that single column needs to be split across two. This is a problem teams I’ve been on at GDS seem to face every now and again and I thought I’d take the opportunity to document the process I recommend to developers.
Often developers approach this task by trying to complete this task with their changes all wrapped into a single deployment, which is reasonable if you’re prepared to take your application down for maintenance. However if you want to keep the application available and avoid serving errors you’ll need to follow a multi-step process with a number of deploys with distinct migrations. Depending on the risks involved in switching your column you may also want to keep the door open to rolling back the application to using the old column.
The basic steps to this are: create the new column(s) in the db and set the application to fill this whenever it writes or updates the old value; create a task to loop through the old data to backfill the new column; switch the application to use the new column; then drop the old one. What follows are the considerations to make at each one of these steps and examples of how this is done using ActiveRecord with Ruby on Rails.
Step 1: Introduce new column and populate it when new records are entered
The first step is to add the new column to the database. This can be done by a standard Rails migration. At GDS we’ll regularly consult braintree’s handy guide to safe operations for high volume PostgreSQL to determine our initial migration won’t block the DB. There’s also an interesting gem, strong migrations, that can warn you for PostgreSQL when you’re going to run a dangerous migration. There are likely similar resources for other popular database flavours.
The application should then be adjusted so that whenever a record is created or updated this new column is populated. This step is required before backfilling this new column with existing data, the reason for this is that during the process of backfilling new data may be written to the application and thus the backfill will end up missing data or having out of date data.
At the point of adding this new column there is also an opportunity to make amends to the column that is being migrated away from, such as dropping a non-null constraint.
Given we’ve decided the field document_type needs moving from the Document model to the Edition:
class Document < ApplicationRecord has_many :editions # Document has a document_type value that is moving to editions end class Edition < ApplicationRecord belongs_to :document end
We can create the column with a migration:
class AddDocumentTypeToEdition < ActiveRecord::Migration[6.0] def change add_column :editions, :document_type, :string end end
We can populate records that are saved with an ActiveRecord callback (to avoid changing application business logic) if our change is simple:
class Edition < ApplicationRecord belongs_to :document before_save { self.document_type = document.document_type } end
If migrations are run automatically as part of a deploy (before the application code switches over) this whole step can be done in a single deploy, if not the migration will have to be deployed and run before the change to populate
Step 2: Backfill the new column
Now that any new writes to the table populate the column the next aim is to fill the column for all the existing records of the database. To do this some code is needed to iterate through all the records in the database and updating the record for each one.
You may choose to do this with another migration or a Rake task. Your choice here will depend on your attitude to what belongs in a Rails migration - a purist may only want to have commands that alter the database structure and nothing that alters data structure. Myself, I don’t mind them being as a migration because it keeps it’s a consistent place with the other database operations and doesn’t clutter parts of the app once they’re run.
Some things to consider with this task are: a backfill can take a long time and thus may fail in some way, it’s good if they can pick up where they failed if that happens; doing everything in a single transaction may cause performance problems once the transaction becomes huge; and using any application ActiveRecord classes carries risks if those classes have changed at the point a different user runs the migration.
class BackfillDocumentTypeOnEdition < ActiveRecord::Migration[6.0] # We use this to run the migration outside a transaction, which allows resuming and reduces DB load disable_ddl_transaction! # create a private instance of an ActiveRecord object so that we don't need to use our application code class Edition < ApplicationRecord belongs_to :document end def up # by removing nil records we can resume a failed run of this and only update what was missed to_update = Edition.includes(:document).where(document_type: nil) total = to_update.count to_update.find_each.with_index do |edition, index| # for a long running migration/task it's useful to provide progress puts "Updated #{i}/#{total} edition document types" if (i % 1000).zero? edition.update!(document_type: edition.document.document_type) end puts "Updated #{total} edition document types" end def down # don't do anything, all these changes can be removed by rolling back the previous migration that added the column end end
This backfill will need to be deployed and run before we move onto the next step of starting to use the new column.
Step 3: Swith to using the new column
The new column should now be fully populated and the application can have it’s logic switched to make use of it. This is also an opportuntity to write any migrations that apply constraints (such as non-null) or defaults to the new column.
A key decision at this point is whether or not to provide a safety net to allow rolling back if the deploy has problems. Once the application stops populating the old column then rolling back is not possible, you’ll have to fix forward. In my experiences I’ve tended to not set-up the need for rollback and instead be cautious my deploy isn’t coupled with other changes but your circumstances may differ as may the impact a problem would have on your application/business.
If you are in a situation where you run staggered deploys (for example gradually deploy to a number of machines then others) your hand will be forced to need the rollback so that the application can work with both versions of the codebase.
If a rollback is not considered necessary then the next deploy of the application is focussed on the application making use of the new column. This would be represented by changing the application logic to use the new column that was introduced and no longer make references to the old column.
Without a rollback we can combine the deploy to change application logic with step 4 of deprecating the old column to reduce the number of deploys needed in this process.
To allow rolling back a deploy that switches the application to use the new column it is necessary to have a process which keeps the old column populated while the rest of the application switches to using the new column. To do this you’ll need to make all the necessary logic changes that favour the new column and then do a reversal of the process in step 1 to keep the old column populated.
class Edition < ApplicationRecord belongs_to :document after_save do document.update!(document_type: document_type) if document.document_type != document_type end end
Step 4: Deprecate the old column
Once the new column is in use and you are satisified that the process is successful the work can begin to get rid of the old column. Any code that was added to populate the old column can be removed.
It’s common for a developer to proceed to add a migration to remove the column that is no longer needed, this however carries a risk of application errors which can be alleviated with an intermediatry step (which unfortunately requires another deployment). The type of error you may see if you remove the column are: "PG::UndefinedColumn: ERROR: documents.document_type does not exist", the reason you’d see these is because ActiveRecord caches the underlying columns when the application starts. It then uses this cache when an ActiveRecord object instance is loaded to select each column from the database.
This error can be avoided by using the ignored_columns method that was introduced to ActiveRecord in Rails 5. Once Rails has been deployed with the column ignored we are then safe to remove the column.
class Document < ApplicationRecord has_many :editions self.ignored_columns = %w[document_type] end
Step 5: Remove old column then relax and celebrate
Finally, the last step is the simplest one. All that is needed is the actual removal of the old column. This can be done by a simple migration. We can then set out a sigh of relief that this process is complete.
class RemoveDocumentTypeFromDocument < ActiveRecord::Migration[6.0] def change remove_column :documents, :document_type end end
Has this procress helped you or do you do something different? Let me know on Twitter.