In today’s post we will see a tool that can help you bring your ActiveRecord queries to the next level in terms of performance. Quite surprisingly to me, after interviewing many Ruby on Rails developers, I have realized that many do not even know it exists. Today we will talk about ActiveRecord’s explain command.
ActiveRecord Explain
When you run explain on a query, ActiveRecord executes it and pretty prints the output of the DBMS’s explain command. The output differs depending on whether your application uses Postgres, MySQL, MariaDB or SQLite (you can check the details in the Rails guides), but in general it contains the actual SQL that the ActiveRecord query translates into, together with the execution plan that the DBMS’s query planner has created for it. Understanding this query plan is crucial for ensuring good performance for our queries.
To see how this works, I have created a simple example application.
Example application
The application I have used for this example is a Rails 7 application with a Postgres database.
For the purpose of our example, the application will just have a User model, with a name and an email, which we can create with the following migration:
class CreateUsers < ActiveRecord::Migration[7.0]
  def change
    create_table :users do |t|
      t.string :name
      t.string :email

      t.timestamps
    end
  end
end
Note that this migration does not contain any indexes. This has been done on purpose. You will see later why.
To make things more realistic, I have preloaded the application with 100,000 random users. In my case, I have used the gems FFaker and FactoryBot, but using any other generation tools to seed your database would also be perfectly valid.
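In case you are curious, here is a minimal sketch of what that seeding could look like. The factory definition, the email sequence and the file locations are assumptions made for illustration, not part of the example application:

# spec/factories/users.rb (hypothetical factory definition)
FactoryBot.define do
  factory :user do
    name { FFaker::Name.name }
    sequence(:email) { |n| "user#{n}@example.com" } # guarantees unique emails
  end
end

# db/seeds.rb (hypothetical) — creates the 100,000 random users
100_000.times { FactoryBot.create(:user) }

Creating records one by one is slow for 100,000 rows; if seeding time matters to you, building the attributes in memory and inserting them in batches with User.insert_all is a faster alternative.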
Now, let’s imagine that you want to fetch users by email, a pretty common use case in most web applications, which would typically look something like User.find_by(email: '[email protected]'). Since find_by returns an instance of User (or nil), we cannot call explain on it, but we can write its where counterpart, User.where(email: '[email protected]').limit(1). Let’s see what ActiveRecord’s explain outputs for this query.
Open a rails console and execute the following:
User.where(email: '[email protected]').limit(1).explain

  User Load (11.1ms)  SELECT "users".* FROM "users" WHERE "users"."email" = $1 LIMIT $2  [["email", "[email protected]"], ["LIMIT", 1]]
=> EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "[email protected]"], ["LIMIT", 1]]
                          QUERY PLAN
---------------------------------------------------------------
 Limit  (cost=0.00..2440.98 rows=1 width=62)
   ->  Seq Scan on users  (cost=0.00..2440.98 rows=1 width=62)
(3 rows)
As you can see, the first thing explain gives us is how our ActiveRecord query translates into SQL. The second is how the query planner intends to execute it. Let’s analyze the query plan in detail.
The query plan
The first thing the planner is telling us is how it plans to fetch the data. In our case, it plans to do a Sequential Scan (Seq Scan) on the users table. That basically means going record by record and filtering by email to see whether each one matches our condition, which does not seem very efficient.
The second thing the planner is telling us is a cost estimation, cost=0.00..2440.98, which consists of two values. The first one, 0.00, is the start-up cost: the estimated cost the query will incur to initialize everything before the output phase starts. The second one, 2440.98, is the total cost: an estimate of how much it will cost to fully execute the query.
As a side note, it is worth mentioning that these cost values are not given in milliseconds or any other fixed units. Instead, they are arbitrary units determined by the planner’s cost parameters. If we change those cost parameters, the cost measurements will also change proportionally. Check the DBMS’s documentation if you want to know more about this.
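To give you an intuition of where a number like 2440.98 comes from, here is a rough, back-of-the-envelope sketch using Postgres’s default cost parameters. The page count below is a made-up assumption and the planner’s real row estimate will differ slightly, so treat it as an illustration rather than an exact reproduction:

# Rough sketch of how Postgres estimates a sequential scan's total cost.
# The cost parameters are the Postgres defaults; the page count is hypothetical.
seq_page_cost     = 1.0    # estimated cost of reading one disk page sequentially
cpu_tuple_cost    = 0.01   # estimated cost of processing one row
cpu_operator_cost = 0.0025 # estimated cost of evaluating the WHERE condition once

pages = 1_191    # hypothetical number of disk pages in the users table
rows  = 100_000  # number of rows the planner expects to scan

total_cost = pages * seq_page_cost + rows * (cpu_tuple_cost + cpu_operator_cost)
# => 2441.0, in the same ballpark as the 2440.98 reported above

Every single row has to be read and checked against the filter, which is why the total cost of a sequential scan grows with the size of the table.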
Finally, rows is the estimated number of rows the output will contain, in this case 1, while width is the estimated average width in bytes of those rows, in this case 62.
Looking for improvements
With this data at hand, we can think of ways of improving the performance of our query.
Since the query planner is telling us that it is performing a sequential scan checking all the rows, adding an index should most likely improve our current results.
class AddIndexOnEmailToUsers < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :email
  end
end
Let’s now run the migration with rails db:migrate, and check explain again to see if we got any performance improvement.
User.where(email: '[email protected]').limit(1).explain

  User Load (7.1ms)  SELECT "users".* FROM "users" WHERE "users"."email" = $1 LIMIT $2  [["email", "[email protected]"], ["LIMIT", 1]]
=> EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."email" = $1 LIMIT $2 [["email", "[email protected]"], ["LIMIT", 1]]
                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=0.42..8.44 rows=1 width=62)
   ->  Index Scan using index_users_on_email on users  (cost=0.42..8.44 rows=1 width=62)
(3 rows)
Now we see a completely different situation.
As you can see, neither the query nor the rows and width values have changed. But we have a completely different query plan and, with it, a completely different cost estimation.
The query plan is now performing an Index Scan, checking the index condition, and we can also see that the new index we just created (index_users_on_email) is being used.
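If you ever want to double-check from the console which indexes a table has, ActiveRecord’s connection can list them. This is just a convenience check (explain has already told us the index is being used), and the output shown in the comment is what we would expect for our example:

# List the indexes defined on the users table from a rails console
User.connection.indexes(:users).map { |index| [index.name, index.columns] }
# => [["index_users_on_email", ["email"]]]  (expected output for our example)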
In terms of cost, we have a slightly higher start-up cost, although a negligible one, and we have reduced the total cost from 2440.98 to just 8.44! That is roughly a 289x cost reduction (2440.98 / 8.44 ≈ 289). Again, 289 times less!
You have just discovered a new secret tool that can help you improve your application’s performance to levels that you had not imagined before.
If you enjoyed this post do not forget to subscribe so that you do not miss any of my future updates.
See you in my next post! Adiós!