The Curmudgeon Coder Blog

by Mike Bishop

Thoughts, rants and ramblings on software craftsmanship from someone who’s been around the block a few times.

The Curmudgeon’s Guide to Unit Testing

What, yet another blog entry on testing? Yeah, but there’s something special about this one. I wrote it, to pass on some things I’ve learned about testing over the years.

Why test?

Well, that’s an easy question. We test to make sure code functions correctly. More specifically, we’re doing validation, which is ensuring that the specifications from which the code was written correctly capture the requirements, and verification, which is ensuring that the code performs in a way that meets the specifications. There are other reasons to test, however.

We write tests to figure out what a unit of code is doing, which can be especially useful if we didn’t write the code. We write tests to experiment with code, to answer questions like “what happens if I give it a negative number for a parameter?”
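Here’s a rough sketch of that kind of experiment. It pokes at a couple of JDK methods with negative arguments and records what comes back. The class and method names are made up for illustration, and it uses plain Java so that it runs on its own; in a real suite each experiment would be a test case run by JUnit or a similar framework.

```java
// Hypothetical experiments: what happens when negative numbers go in?
class NegativeArgumentExperiments {

    // The % operator keeps the sign of the dividend.
    static int remainder_of_a_negative_dividend() {
        return -7 % 3;
    }

    // Math.floorMod takes the sign of the divisor instead.
    static int floor_mod_of_a_negative_dividend() {
        return Math.floorMod(-7, 3);
    }

    // String.substring rejects a negative start index by throwing.
    static boolean substring_throws_on_a_negative_index() {
        try {
            "widget".substring(-1);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("-7 % 3 = " + remainder_of_a_negative_dividend());
        System.out.println("floorMod(-7, 3) = " + floor_mod_of_a_negative_dividend());
        System.out.println("substring(-1) throws: " + substring_throws_on_a_negative_index());
    }
}
```

Once you know the answers, you can turn the experiments into assertions and keep them around as documentation.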

We can also write tests to document how a unit of code is to be used. A unit test is the best developer’s guide you can build. It’s by definition an as-built document that must be kept up-to-date when code changes, or builds will fail. It’s a great way of documenting your code so that future developers will be able to learn and understand it.

What’s a unit?

What do I mean by unit? Is it a class, an integration of classes that responds to a request, a unit of behavior, or a subsystem? Yes! In the old days, a unit was considered to be a class, a subprogram or any other small set of code. I believe that a unit is anything that we feel is worthy of being tested. We can use unit testing techniques on any granularity of code.

Test suites are apps

A collection of test code is an app every bit as much as the app that it’s testing. A testing app is typically invoked by a framework such as JUnit or NUnit instead of being invoked directly. It evolves independently of the app being tested. You will refactor unit test code just like application code.

The easiest way to write unit test code is to write a test class corresponding to each class in the application that needs to be tested. It doesn’t have to be that way. Another way is to write tests for use cases. A use case usually involves multiple classes working together to solve a problem. Writing a test for a use case is quite valuable because it tests meaningful functionality, and it provides documentation at a level that most developers will be looking for when trying to understand the app. Instead of your test code being structured the same way as the app, you can structure it in a completely different and more meaningful way.

For example, let’s say that we’re writing some unit tests for an e-commerce app of some kind. Let’s focus on two use cases: one in which people buy widgets and one in which widgets are added to the inventory. We could build a package or namespace structure for our tests that looks like this:

com.acme.widgetapp
    .purchasing
        PurchaseWidgetTest
    .inventory
        .management
            AddWidgetToInventoryTest

This structure reads like the index of a user’s guide. If a developer who is new to the project wants to see an example of how to write code to add a widget to the inventory or handle the purchase of a widget, it’s easy to find the unit test that will provide examples.

Naming test cases

Naming is important for software development in general, and it’s especially important for test cases. Here are a couple of examples of what I would consider to be acceptable test case names:

invoice_is_created_and_emailed_to_the_user
error_notification_is_sent_when_invoice_email_fails

I know what you’re thinking. “Holy crapballs! Those are some long method names!” and “Underscores? People still use underscores?”

A good test case name describes the case that is being tested. If the code is testing a use case, then the test case name should uniquely identify the use case and provide a high-level description of it. If the code is testing some lower-level functionality, the test case name should clearly indicate the outcome being tested. Don’t worry about the length of the name. You’re only going to type it once. And by the way, those example names aren’t all that long. I’ve written test cases with names twice as long as those.

Look, I don’t like underscores either. I use them because it’s possible, though perhaps unlikely, that I may ask a domain expert or business analyst to look at the test code to make sure that I’m covering all of the conditions. I’m not going to ask them to look at detailed code (like support methods, which I’ll still write using camel case), just the test case and test step (see below) names. Civilians like BAs and DEs can read names in snake case (i.e., with underscores) much more easily than names in camel case.

Writing test cases

I believe that the best way to write unit test cases is by following the style of behavioral specifications (from Behavior-Driven Development) written in Gherkin. Here’s an example of a unit test case written that way (it’s in Java but easily translatable to other languages):

public void invoice_is_created_and_emailed_to_the_customer() {
  given_a_customer();
  and_an_order();
  and_an_order_processor();
  when_processing_the_order();
  then_an_invoice_is_created();
  and_the_invoice_is_emailed_to_the_customer();
}

Each test step is implemented by a method. The first three steps arrange the test, the next one (starting with when) performs the test, and the last two steps do the assertions. Isn’t that easy to read and understand? You can find the full test class for this example here.
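To give a feel for what sits behind those step names, here is a minimal sketch of how the step methods might be filled in. The domain types (Customer, Order, Invoice, OrderProcessor), the email address and the check helper are all assumptions invented for this sketch, not the real classes behind the example. It uses plain Java so that it runs standalone; with JUnit the test case method would carry the @Test annotation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test class; every type it uses is defined below for illustration only.
class InvoiceEmailTest {
    private Customer customer;
    private Order order;
    private OrderProcessor processor;
    private Invoice invoice;

    public void invoice_is_created_and_emailed_to_the_customer() {
        given_a_customer();
        and_an_order();
        and_an_order_processor();
        when_processing_the_order();
        then_an_invoice_is_created();
        and_the_invoice_is_emailed_to_the_customer();
    }

    // Arrange steps
    private void given_a_customer() { customer = new Customer("pat@example.com"); }
    private void and_an_order() { order = new Order(customer); }
    private void and_an_order_processor() { processor = new OrderProcessor(); }

    // Act step
    private void when_processing_the_order() { invoice = processor.process(order); }

    // Assert steps
    private void then_an_invoice_is_created() {
        check(invoice != null && invoice.order == order, "no invoice was created for the order");
    }

    private void and_the_invoice_is_emailed_to_the_customer() {
        check(processor.emailedTo.contains(customer.email), "invoice was not emailed to the customer");
    }

    // Support method (camel case), standing in for a framework's assertions.
    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        new InvoiceEmailTest().invoice_is_created_and_emailed_to_the_customer();
        System.out.println("invoice_is_created_and_emailed_to_the_customer passed");
    }
}

// Hypothetical domain types, just enough for the steps to drive.
class Customer { final String email; Customer(String email) { this.email = email; } }
class Order { final Customer customer; Order(Customer customer) { this.customer = customer; } }
class Invoice { final Order order; Invoice(Order order) { this.order = order; } }

class OrderProcessor {
    final List<String> emailedTo = new ArrayList<>(); // stands in for a real mail gateway

    Invoice process(Order order) {
        emailedTo.add(order.customer.email);
        return new Invoice(order);
    }
}
```

Notice that the state shared between steps lives in fields of the test class, so each step stays a one-liner and the test case itself reads like a specification.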

I’ve written a Java-based behavioral unit testing framework called gwt-test [1] that I like to use for writing unit tests. The above test written with gwt-test looks like this:

public void invoice_is_created_and_emailed_to_the_customer() {
  gwt.test()
    .given(a_customer)
    .and(an_order)
    .and(an_order_processor)
    .when(processing_the_order)
    .then(an_invoice_is_created)
    .and(the_invoice_is_emailed_to_the_customer);
}

In this example, each test step is implemented by a function, and the use of behavioral specifications is enforced by methods that take the functions as arguments. You can find the full test class for this example here.
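If you’re wondering how methods that take functions can enforce the given/when/then order, here is one way it could be done. To be clear, this is NOT the gwt-test API. It’s a hand-rolled sketch, and Spec, GivenStage, WhenStage, ThenStage and the widget inventory example are all hypothetical names. The trick is that each stage only exposes the methods that are legal next, so a test that skips the when step doesn’t compile.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test class using a hand-rolled chain (not gwt-test).
class AddWidgetToInventorySketch {
    // State shared by the step functions; declared first so the lambdas can refer to it.
    private final List<String> inventory = new ArrayList<>();
    private String widget;

    // Each test step is a function held in a field.
    private final Runnable an_empty_inventory = inventory::clear;
    private final Runnable a_new_widget = () -> widget = "sprocket";
    private final Runnable adding_the_widget = () -> inventory.add(widget);
    private final Runnable the_inventory_contains_the_widget =
            () -> check(inventory.contains(widget), "widget missing from inventory");
    private final Runnable the_inventory_holds_one_widget =
            () -> check(inventory.size() == 1, "expected exactly one widget");

    public void widget_is_added_to_the_inventory() {
        Spec.given(an_empty_inventory)
            .and(a_new_widget)
            .when(adding_the_widget)
            .then(the_inventory_contains_the_widget)
            .and(the_inventory_holds_one_widget);
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        new AddWidgetToInventorySketch().widget_is_added_to_the_inventory();
        System.out.println("widget_is_added_to_the_inventory passed");
    }
}

// Entry point: a chain can only start with given.
final class Spec {
    static GivenStage given(Runnable step) { step.run(); return new GivenStage(); }
}

// After given: more arrangement, or move on to when.
final class GivenStage {
    GivenStage and(Runnable step) { step.run(); return this; }
    WhenStage when(Runnable action) { action.run(); return new WhenStage(); }
}

// After when: the only legal next step is then.
final class WhenStage {
    ThenStage then(Runnable assertion) { assertion.run(); return new ThenStage(); }
}

// After then: only further assertions.
final class ThenStage {
    ThenStage and(Runnable assertion) { assertion.run(); return this; }
}
```

In this sketch the steps run eagerly as the chain is built. A real framework may well do things differently, for example by passing a context object to each step.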

Testing legacy code

You may spend more time during your career writing tests for legacy code than for new code. If the original developer of the code is no longer around, you may have to assume that the code works, and that failing tests are probably the fault of the tests. Even then, giving a test to code that doesn’t have one provides more value than you may realize. Tests can help you figure out what the legacy code does. And with tests in place, you’ll be notified when application code updates cause the assumed behavior of legacy code to be violated. When that happens, it may mean that the code was broken or that the test was incorrect. Either way, you’ll be prompted to figure it out.
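As a sketch of what that looks like in practice, the tests below pin down the current behavior of a small piece of legacy code, including a case that looks suspicious. LegacyShipping and its pricing rules are invented for this example. The point is that the tests record what the code does today, not what anyone thinks it ought to do.

```java
// Hypothetical tests that record what the legacy code does today.
class LegacyShippingCharacterization {

    static void cost_is_a_flat_rate_up_to_five_kilos() {
        check(LegacyShipping.cost(5) == 500, "flat rate changed");
    }

    static void cost_adds_a_per_kilo_surcharge_above_five_kilos() {
        check(LegacyShipping.cost(8) == 725, "surcharge changed");
    }

    // Surprising, but it is what the code does. Pin it now, question it later.
    static void negative_weight_is_charged_the_flat_rate() {
        check(LegacyShipping.cost(-2) == 500, "negative weight handling changed");
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        cost_is_a_flat_rate_up_to_five_kilos();
        cost_adds_a_per_kilo_surcharge_above_five_kilos();
        negative_weight_is_charged_the_flat_rate();
        System.out.println("legacy behavior is unchanged");
    }
}

// Stand-in for legacy code that nobody on the team wrote. Amounts are in cents.
class LegacyShipping {
    static int cost(int weightKg) {
        if (weightKg <= 5) return 500;
        return 500 + (weightKg - 5) * 75;
    }
}
```

If someone later changes the negative weight handling, the third test fails and prompts exactly the conversation you want to have.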

When you start writing tests for legacy code, you’ll probably follow the unit test-per-class approach at first. It’s the quickest way to add tests to the code base. Over time, you’ll learn more about how the code works and be able to identify groups of classes that cooperate to implement use cases. Then, you’ll be in a position to write use case-level tests that document the code in a more meaningful way.

Building a test base for legacy code takes a lot of time. As you work on code to add new features or fix broken functionality, you can add a unit test or two. Every little bit helps. You may not be able to build a complete test base for that legacy app while you’re there, but if you set the example of adding tests when you have some time, others will follow your example.

Code coverage metrics are overrated

Look at this code:

@AllArgsConstructor
@Getter
@Builder
public class SomeAbstraction {
    private String aProperty;
    private Integer anotherProperty;
    private List<String> aListOfStuff;
}

This is a Java abstract data type written using a library called Lombok. Given the above code, Lombok will generate a class that has a constructor (with one argument for each of the properties), three getter methods and a builder companion class. How much test code should you write for this class? The answer is not one damn line of code. All you would be doing is testing the compiler and Lombok. That’s not your job.

When determining whether you need to write tests for code, ask yourself this question: How much more confidence would tests of this code give me that the code and the application it’s a part of function properly? Code containing business logic needs to be tested, but writing a test for the above abstract data type doesn’t move the needle of confidence one bit.

What am I getting at here? If your project has a static test code coverage threshold, you can ensure that you’ll meet that threshold by writing unit tests for code like that abstract data type. What good does that do? It just adds more effort for no good reason. And what’s going to happen if you’re not able to release because your test code coverage threshold isn’t being met?

If you’re willing to stop the release process no matter how critical or politically sensitive that release is, and you’re willing and able to explain that decision to the C-suite, then keep your test code coverage thresholds.

If you would lower the threshold so that you can complete the release process, then eliminate your test code coverage thresholds. They’re useless. Continue gathering test code coverage metrics, though. But use them to identify trends, not as thresholds. If your test code coverage declines over several sprints, then you can address the need for more testing.
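Spotting a trend doesn’t need anything fancy. Here is a minimal sketch that flags a decline when coverage has dropped in each of the last few sprints. The class name, the method name and the idea of feeding it one percentage per sprint are all assumptions for illustration; your coverage tool will have its own way of reporting history.

```java
// Hypothetical helper: has coverage dropped in each of the last few sprints?
class CoverageTrend {

    // coverageBySprint holds one percentage per sprint, oldest first.
    // Returns true only if each of the last `sprints` values is lower than the one before it.
    static boolean declinedForLastSprints(double[] coverageBySprint, int sprints) {
        if (sprints < 1 || coverageBySprint.length < sprints + 1) {
            return false; // not enough history to call it a trend
        }
        for (int i = coverageBySprint.length - sprints; i < coverageBySprint.length; i++) {
            if (coverageBySprint[i] >= coverageBySprint[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] history = {82.0, 81.0, 79.5, 78.0}; // made-up numbers
        System.out.println("declining: " + declinedForLastSprints(history, 3));
    }
}
```

A result of true is a prompt for a conversation about testing, not a reason to block a release.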

Testing your tests

How do you know if your tests work? There may be important code that your tests aren’t covering. You may also have test code that isn’t really testing anything. A good way to check how good your tests are is mutation testing. Mutation testing modifies the application code, not the test code, in an attempt to force tests to fail. For example, an arithmetic operator may be changed from addition to subtraction, or a comparison may be negated. If your test code is good, such mutations will cause your tests to fail. If the tests pass despite the mutations, either some code isn’t being tested or some test code isn’t testing anything.
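Here is the idea boiled down to a hand-made sketch. The original function adds a shipping charge to a subtotal, and the mutant swaps the addition for a subtraction. All of the names are hypothetical, and the mutant is written by hand here; mutation testing tools (PIT is one for Java) generate and run mutants like this automatically.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical demonstration of a mutant that survives a weak test and is killed by a better one.
class MutationTestingSketch {

    // Application code under test.
    static final IntBinaryOperator original = (subtotal, shipping) -> subtotal + shipping;

    // The same code with the addition mutated to a subtraction.
    static final IntBinaryOperator mutant = (subtotal, shipping) -> subtotal - shipping;

    // Weak test: with zero shipping, + and - give the same answer.
    static boolean weakTestPasses(IntBinaryOperator total) {
        return total.applyAsInt(100, 0) == 100;
    }

    // Stronger test: a non-zero shipping charge tells + from -.
    static boolean strongTestPasses(IntBinaryOperator total) {
        return total.applyAsInt(100, 5) == 105;
    }

    public static void main(String[] args) {
        System.out.println("weak test, original: " + weakTestPasses(original));
        System.out.println("weak test, mutant:   " + weakTestPasses(mutant));
        System.out.println("strong test, original: " + strongTestPasses(original));
        System.out.println("strong test, mutant:   " + strongTestPasses(mutant));
    }
}
```

The weak test passes against both versions, so the mutant survives and you’ve learned that the test isn’t really checking the arithmetic. The stronger test fails against the mutant, which is what you want.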

Other random testing advice

Mock or containerize components that are outside of the test scope

Most of you already know this. However, you may encounter code that you want to test that isn’t designed to easily facilitate mocking or containerizing. In those cases, you’ll have to refactor the code before you can write a unit test for it.
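The usual refactoring is to stop the code from creating its own dependencies and pass them in instead. The sketch below shows the “after” picture, with a hand-written fake standing in for the real component. MailGateway, InvoiceNotifier and RecordingMailGateway are hypothetical names invented for this example, and a mocking library could generate the fake for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test: the notifier is exercised without touching a real mail server.
class InvoiceNotifierSketch {
    public static void main(String[] args) {
        RecordingMailGateway fake = new RecordingMailGateway();
        new InvoiceNotifier(fake).notifyCustomer("pat@example.com", 42);
        if (!fake.sent.equals(List.of("pat@example.com: Invoice #42 is ready"))) {
            throw new AssertionError("unexpected messages: " + fake.sent);
        }
        System.out.println("notification was sent through the gateway");
    }
}

// The seam: the code under test depends on this interface, not on a concrete mail client.
interface MailGateway {
    void send(String to, String body);
}

// After refactoring: the dependency is passed in rather than constructed inside the class.
class InvoiceNotifier {
    private final MailGateway mail;

    InvoiceNotifier(MailGateway mail) {
        this.mail = mail;
    }

    void notifyCustomer(String email, int invoiceNumber) {
        mail.send(email, "Invoice #" + invoiceNumber + " is ready");
    }
}

// Hand-written fake that records what would have been sent.
class RecordingMailGateway implements MailGateway {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String to, String body) {
        sent.add(to + ": " + body);
    }
}
```

Before the refactoring, InvoiceNotifier would have created its mail client internally, leaving the test no way to substitute anything.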

Test cases must be functionally and temporally decoupled

Each test case must be completely self-contained. It must not depend on another test case for setup or other functionality, and the test cases must be able to run in any order.
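One simple way to get there is for every test case to build its own fixture instead of relying on state left behind by another test. In this sketch (hypothetical names, plain Java so that it runs standalone), the two test cases are run in both orders to show that neither depends on the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test cases that each build their own fixture.
class IndependentTestsSketch {

    // A fresh, mutable inventory for every test case; nothing is shared.
    private static List<String> freshInventory() {
        return new ArrayList<>(List.of("gear", "cog"));
    }

    static void adding_a_widget_grows_the_inventory() {
        List<String> inventory = freshInventory();
        inventory.add("sprocket");
        check(inventory.size() == 3, "expected three widgets after adding one");
    }

    static void removing_a_widget_shrinks_the_inventory() {
        List<String> inventory = freshInventory();
        inventory.remove("gear");
        check(inventory.size() == 1, "expected one widget after removing one");
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // Run the test cases in one order, then the other.
        adding_a_widget_grows_the_inventory();
        removing_a_widget_shrinks_the_inventory();
        removing_a_widget_shrinks_the_inventory();
        adding_a_widget_grows_the_inventory();
        System.out.println("test cases pass in either order");
    }
}
```

If the inventory were a shared static list instead, the second test’s expected size would depend on whether the first had already run.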

Test suites must be runnable in build pipelines

Make sure that your test code runs correctly in build pipelines as well as locally on your laptop. Creating self-contained test cases that mock external dependencies will help make that happen.

Disabled test cases are technical debt

If you have to disable a test case for any reason, that constitutes technical debt. Many testing frameworks allow you to supply a text string indicating why the test has been disabled. Make use of that, but also add an item to the product backlog to get that test working. That item isn’t a user story; it’s a tech story, but that’s OK.

1 You can get gwt-test from Maven Central.