In this post I will share the 7 most common reasons for flaky tests in Ruby on Rails, but before I start, let me define what a flaky test is so that we are all on the same page.
What are flaky tests?
As a developer there is probably nothing more annoying than flaky tests, but what are they? Flaky tests are tests that fail in a nondeterministic way. In other words, tests that pass most of the time but sometimes fail for no apparent reason.
The 7 most common reasons for flaky tests in Ruby on Rails
Great, now that we all know what we are talking about, let’s get into the topic. Let’s look at each case in detail and see how you can avoid it using RSpec.
1. Your test checks for array equality when ordering does not matter
This is a very common scenario. It happens when your code returns a collection with no guaranteed ordering, but the test checks for strict equality. Let’s see this with an example. Imagine that you have a blog application with posts, and posts can have comments. Now you want to fetch the comments of a post in no particular order. An RSpec test for that could look similar to this:
context "do not check for equality when order is not relevant" do
  it "has comments (FLAKY)" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum")
    comment = Comment.create!(content: "I love this", post: post)
    other_comment = Comment.create!(content: "Interesting, thanks!", post: post)

    post.reload

    expect(post.comments).to eq([comment, other_comment]) # FLAKY!
  end
end
The problem with the test above is in the expectation. It checks for equality, so if for some reason the database returns the records in a different order, the test fails. To fix it, use match_array. The good thing about match_array is that it checks that all elements of the collection are present, but it does not care about their order.
context "do not check for equality when order is not relevant" do
  it "has comments" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum")
    comment = Comment.create!(content: "I love this", post: post)
    other_comment = Comment.create!(content: "Interesting, thanks!", post: post)

    post.reload

    expect(post.comments).to match_array([comment, other_comment])
  end
end
2. Capybara system tests that check an element that takes too long to load
Capybara tests are extremely valuable. They emulate your users clicking through your application and check that everything works as expected for them in the user interface. A source of flakiness that comes with Capybara is that you need to take UI load times into account. If your test expects to find an element in the UI but the element has not loaded yet, the test will fail. As you can imagine, a test will be flaky if the element loads in time most of the time, but sometimes does not.
Fixing this type of test is more complicated and you need to look at each case individually. Good starting points are reviewing your test configuration and checking that everything is actually OK in your UI. As a last resort you can always pass a longer wait time to your find methods and matchers, as in the sketch below. Bear in mind that waiting longer will also slow down your test suite, although sometimes it is better to slow it down than to have to retry the job that runs it.
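Here is a minimal sketch of what that looks like, assuming a Rails system spec with Capybara. The route, the button label and the ".comments" selector are made up for illustration; the relevant part is the wait: option, which tells Capybara how long to keep retrying before giving up on the element.

RSpec.describe "Post page", type: :system do
  it "shows the comments once they have loaded" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum")

    visit post_path(post)
    click_button "Load comments"

    # Capybara retries for up to 10 seconds instead of the default
    # Capybara.default_max_wait_time. find(".comments", wait: 10) behaves
    # the same way.
    expect(page).to have_css(".comments", wait: 10)
  end
end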
3. Explicitly setting the id
This is a classic. It typically happens when you set the id on test records so that you can compare them later. An example would be the following code snippet:
describe "Post.recent" do
  it "returns posts within the last month (Comparing IDs FLAKY)" do
    Post.create!(id: 1, title: "1st post", content: "Lorem ipsum", created_at: 2.months.ago)
    Post.create!(id: 2, title: "2nd post", content: "Lorem ipsum")

    recent_post_ids = Post.recent.map(&:id)

    expect(recent_post_ids).to include(2)     # FLAKY!
    expect(recent_post_ids).not_to include(1) # FLAKY!
  end
end
The problem with this is that it should be the database’s responsibility to set IDs based on its internal sequences, not the code’s. If you keep setting IDs explicitly, sooner or later there will be a collision because a record already exists with the ID hardcoded in the test.
A much cleaner, non-flaky alternative is to let the database do its job of setting the IDs and simply check whether the record is in the results, like this:
describe "Post.recent" do
  it "returns posts within the last month" do
    old_post = Post.create!(title: "1st post", content: "Lorem ipsum", created_at: 2.months.ago)
    recent_post = Post.create!(title: "2nd post", content: "Lorem ipsum")

    recent_posts = Post.recent

    expect(recent_posts).to include(recent_post)
    expect(recent_posts).not_to include(old_post)
  end
end
4. Trusting that DB sequences will create a predictable id
This is related to the previous one, but a little more subtle. This time you let the database sequence set the ID for your new record, which is good. The problem comes when you expect the database to generate a specific ID. An example would look like this:
describe "dom_id" do
  it "returns the model name and the ID (Expecting explicit ID FLAKY)" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum")

    expect(dom_id(post)).to eq("post_1") # FLAKY!
  end
end
The reality is that you should never expect the state of the DB sequence to be predictable. Most of the time you would get that result, but you never know whether the sequence is actually going to return 1. On top of that, the goal of this test is not to check that dom_id will always return post_1, but that it concatenates post with whatever ID the record has. A better, non-flaky alternative would be:
describe "dom_id" do
  it "returns the model name and the ID" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum")

    expect(dom_id(post)).to eq("post_#{post.id}")
  end
end
After seeing this and the previous situation, I hope the moral of the story is clear: do not build expectations around specific IDs in your tests.
5. Time dependent tests
This is a classic. It happens when your code checks a time-dependent value that can change while the test runs. Let’s illustrate this with an admittedly contrived example that I hope will help you understand the issue:
describe "Post#created_at" do
  it "was created two hours ago (FLAKY)" do
    post = Post.create!(title: "1st post", content: "Lorem ipsum", created_at: 2.hours.ago)

    # Do something

    expect(post.created_at).to eq(2.hours.ago) # FLAKY!
  end
end
Depending on how fast the test executes and how your timestamps are stored, the 2.hours.ago evaluated when the record is created may not be exactly the same instant as the 2.hours.ago evaluated in the expectation. Sometimes the two values end up equal, but when they do not, the test fails.
Other kinds of time-dependent flakiness are introduced by time zone changes, and there is a very odd one that always strikes back: daylight saving time changes.
To solve this, try rethinking your tests, make sure you are using the same time zone everywhere, or explore freeze_time blocks in which time does not change. Whatever works best for your case.
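As a minimal sketch of the last option, here is the contrived example above rewritten with ActiveSupport::Testing::TimeHelpers (include the module in your RSpec configuration if it is not already available). travel_to pins the clock to a fixed instant for the duration of the block, just like freeze_time does with the current time; the chosen date is arbitrary.

describe "Post#created_at" do
  it "was created two hours ago" do
    # The clock is pinned at noon for the whole block, so 2.hours.ago
    # evaluates to exactly 10:00 both when the record is created and
    # when the expectation runs.
    travel_to Time.zone.local(2024, 1, 1, 12, 0, 0) do
      post = Post.create!(title: "1st post", content: "Lorem ipsum", created_at: 2.hours.ago)

      # Do something

      expect(post.created_at).to eq(2.hours.ago)
    end
  end
end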
6. Depending on third parties
These errors happen when your code integrates with a third-party system. Imagine that your code performs a request to a third-party API: if the API is not available, your test will fail.
To fix those I recommend exploring a gem called VCR. VCR introduces the concept of “cassettes” that you record once and replay afterwards. In VCR you indicate a file in which to record the request. The first time you run the test the file is empty, so VCR lets the system perform a real request and stores both the request and the response that the third-party system sent back. The next time the same request is made, VCR sees that it already has a recorded cassette for it and automatically returns the recorded response, without hitting the third-party API again.
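Here is a minimal sketch of what that looks like, assuming the vcr gem (together with webmock) is installed and configured in your spec helper. The cassette name and the WeatherClient class are made up for illustration.

describe "WeatherClient#current_temperature" do
  it "returns the temperature reported by the external API" do
    # First run: VCR performs the real HTTP request and records it into the
    # cassette file under your configured cassette_library_dir.
    # Later runs: VCR replays the recorded response, no network needed.
    VCR.use_cassette("weather_api/current_temperature") do
      temperature = WeatherClient.new.current_temperature("Madrid")

      expect(temperature).to be_a(Numeric)
    end
  end
end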
7. Tests that share state
These are less common, but they can happen if you use globally cached values that are modified and read by different tests. Some examples are global variables, class variables, values stored in Redis, modified ENV variables, and so on.
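To make the ENV case concrete, here is a minimal sketch; the PAYMENTS_ENABLED variable is made up for illustration. Without the around hook restoring the original value, whatever the first test writes into ENV leaks into every test that runs after it, and the order in which tests run then decides whether they pass.

RSpec.describe "payments feature flag" do
  around do |example|
    original = ENV["PAYMENTS_ENABLED"]
    example.run
    ENV["PAYMENTS_ENABLED"] = original # put back whatever was there before
  end

  it "can be enabled for a single example without leaking" do
    ENV["PAYMENTS_ENABLED"] = "true"

    expect(ENV["PAYMENTS_ENABLED"]).to eq("true")
  end
end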
Team work
If you ask me what the best way to fix flaky tests is, my answer is team work!
You were probably not expecting that. Of course you should apply the techniques and solutions I have proposed in this post, and you can also investigate others, but without a team-wide commitment to fixing flaky tests they will outgrow you. This is part of the engineering culture: it should not be only one or two engineers who fix flaky tests, everyone on the team should. I have been on teams that addressed flaky tests as soon as they were detected, and the situation was under control. I have also been on teams that ignored them until it was impossible to ship code. As you can imagine, I have better memories of the first team.
And that’s it for today’s post! Remember to always fix flaky tests as soon as they appear!
If you enjoyed this post, please consider subscribing so that you do not miss any further updates! Also let me know in the comment section below which topics you would like me to cover.
And without further ado, see you in my next post. Adiós!