Introduction
At OmbuLabs , we are big fans of Ruby on Rails and design patterns , especially convention over configuration ! The beauty of Rails is that you can inherit a legacy project and easily find the different layers of code in different directories.
When it comes to database migrations the policy of Rails is very clear. It’s all about altering the database structure with gradual migration files: “Migrations are a convenient way to alter your database schema over time in a consistent and easy way.” (source )
But, what about data migrations? What’s the best way to write, maintain, and run migrations that alter the data in your production database?
In this article I will talk about three different patterns for writing and maintaining your data migrations:
- Data migrations in db/migrate
- Data migrations with a set of Rake tasks
- Data migrations with data_migrate
Data migrations in db/migrate
When you are starting with Ruby on Rails, it seems like a good idea to add data migrations to your database migrations. You can quickly take advantage of core Rails functionality and start modifying records across environments.
Let’s say that you need to change the default state of a blog post, you can create a migration file like this:
rails generate migration ChangeDefaultState
  invoke  active_record
  create  db/migrate/20190211222539_change_default_state.rb
In that file, you can write a statement like this one:
class ChangeDefaultState < ActiveRecord::Migration[5.1]
  def up
    Post.where(state: "Draft").update_all(state: "TODO")
  end
  def down
    raise ActiveRecord::IrreversibleMigration
  end
end
Great! You’re done. You run your migration and then you forget about it. Months
later your client comes along and wants you to change the table name from posts
to articles. Sounds like a pretty simple change, right? Sure!
You change the table name, rename the model, submit your pull request, fix the
build, the build passes, the code review passes, and it gets deployed to
production. What you may have missed is that now you no longer have a Post model
in your app.
The next time you need to set up a development environment, you create your
database, you run your migrations, and you get a failure in your console: Rails
does not know what Post is:
$ rails db:migrate
== 20190211222539 AddDefaultState: migrating ==================================
rails aborted!
StandardError: An error has occurred, this and all later migrations canceled:
uninitialized constant AddDefaultState::Post
/Users/etagwerker/Projects/ombulabs/points/db/migrate/20190211222539_change_default_state.rb:3:in `change'
The solution is quite simple: You need to rename the model in your migration. There are patches that you could apply, but I won’t go down the rabbit hole. This pattern is not the best approach.
To sum things up, here are the pros and cons of this pattern:
Pros
- It’s quick because you can ship a data migration in a few minutes
- You use Rails core logic that makes sure you don’t run a migration more than once
Cons
- Migrations fall out of date very quickly and you don’t have a quick way to find out
- You’re not following Rails conventions: Files in db/migrateshould migrate database structure, not database records
At OmbuLabs we consider this practice an anti-pattern .
We strongly advise you not to include data migration code in your db/migrate
files.
Disclaimer: One way to make sure that your migrations don’t fall out of date is to run them constantly. You could run
rake db:migrateevery time you run your build, but that would slow down your build. Usually you want to use the much fasterrake db:schema:loadbefore you start running your test suite.
Data migrations with a set of rake tasks
If you have gotten bitten by the first approach before, you know there is a better way. You can write a bit more code and solve this problem using plain old rake tasks that follow a new set of conventions.
namespace :data do
  task :migrations do
    Rake.application.in_namespace(:data) do |namespace|
      namespace.tasks.each do |t|
        next if t.name == "data:migrations"
        puts "Invoking #{t.name}:"
        t.invoke
      end
    end
  end
  task change_default_state: :environment do
    puts "Changing default state for posts"
    Post.where(state: "Draft").update_all(state: "TODO")
    puts "Changed default state for posts"
  end
  ...
The first rake task (rake data:migrations) will run all tasks in the data
namespace excluding itself. This can be a little scary (and dangerous!) so you
should make sure that all rake tasks in that namespace are idempotent. You want
to be able to run rake data:migrations as much as you want without risking
data loss.
The problem with this approach is that it doesn’t actually fix the problem we mentioned in the first pattern: As your application evolves, your data rake tasks will fall out of date. How can we fix that?
The best way is to fix it with an additional class. We can call it
ChangeDefaultStateForPosts. It will be a plain old Ruby object (PORO) that will
run the data migration. This will help us add test coverage for it.
class ChangeDefaultStateForPosts
  def self.call
    Post.where(state: "Draft").update_all(state: "TODO")
  end
end
Now that we have a simple class, we can write a simple spec:
RSpec.describe ChangeDefaultStateForPosts do
  describe "ChangeDefaultStateForPosts.call" do
    let!(:post) do
      Post.create(state: state)
    end
    context "when post is in 'Draft' state" do
      let(:state) { "Draft" }
      it "changes the state of the Draft" do
        expect do
          ChangeDefaultStateForPosts.call
        end.to change(post, :state)
      end
    end
    context "when post is in another state" do
      let(:state) { "Published" }
      it "doesn't change the state" do
        expect do
          ChangeDefaultStateForPosts.call
        end.not_to change(post, :state)
      end
    end
  end
end
Now we know that the data migration is covered by a spec. When someone tries
to rename the posts table to articles, they will get a test failure. This
will force them to update the data migration file.
Pros
- You have test coverage for your data migrations
- You will find out there is a failure right away
Cons
- You have to write more code
- You have to document this new convention
Data migrations with data_migrate
I know what you are thinking: “There has to be a gem for this!”
You are right! You could use data_migrate 
for all your data migrations. Just like Rails with the table schema_migrations,
data_migrate uses a table called data_migrations to keep track of new and
old migrations.
For the problem we are trying to solve, you could create a new data migration like this:
rails generate data_migration change_default_state_for_posts
That will add a new data migration to the db/data directory. You will need to
define the up and down methods:
class ChangeDefaultStateForPosts < ActiveRecord::Migration[5.1]
  def up
    Post.where(state: "Draft").update_all(state: "TODO")
  end
  def down
    Post.where(state: "TODO").update_all(state: "Draft")
  end
end
To make sure that your data migrations don’t fall out of date, you can set up
your build to run rake db:schema:load:with_data before your test suite.
Pros
- You have a quick way to run the data migrations without writing extra code
- You will find out there is a failure right away
Cons
- You have to add a new dependency to your project
- You have to document this new convention
What’s the best strategy?
I believe that the best way to solve this problem is to use data_migrate. You
will write less code, keep all data migrations in a db/data directory, and
have a good way to keep track of data changes.