The Curmudgeon Coder Blog

by Mike Bishop

Thoughts, rants and ramblings on software craftsmanship from someone who’s been around the block a few times.

How to Safely Lose Your Abstraction

For years, I’ve heard complaints from developers about boilerplate code in programming languages. These complaints are usually not about, and shouldn’t be about, writing the boilerplate code. First of all, any decent IDE can generate the vast majority of it for you. Secondly, how much code you have to write is irrelevant since you’re only going to write it once. What’s far more important is the cognitive effort required to understand the code. In this regard, boilerplate code is less a hindrance than the residue of the ability that OO languages give you to create abstractions that make code easier to understand.

For example, take a look at this LogEventRecorder Java class. Its API (that is, its set of public methods) depicts a digital or tape recorder of some sort. That abstraction is fairly easy to understand as most people have experience with a recording device. The implementation of this class is quite different: a list of LogEvents, a reference to a current position in the list, a recorder state and something that can play LogEvents. The key to building abstractions is the capability to completely separate a type’s API from its representation. Completely separating the LogEventRecorder’s API from its representation enables me to build any abstraction of that class that I want as long as the representation can support it. Boilerplate code is required in order for me to do that.

What do I mean when I say a type’s API is completely separated from its representation? Take a look at the following code:

public class Point {
    private double x;
    private double y;

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double getX() {
        ...
    }

    public double getY() {
        ...
    }
}

What do getX and getY do? If you said “they return the x and y coordinates of a Point object,” that’s an outstanding guess! But until you see the code within those two methods, it’s only a guess. There is nothing that guarantees that getX and getY must return a Point’s x and y coordinates. They should do that, of course, or the code would be difficult to understand. But they can do whatever we want them to do, and that goes for any method. That’s the key to building abstractions. I can create any method, with any name, and it can do whatever I want it to. The boilerplate code that allows us to create classes with APIs and representations that are completely separate is essential in order for us to build abstractions in a programming language. So stop treating boilerplate code like the enemy. It isn’t.
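To make the separation concrete, here is a minimal sketch of a point whose API still talks about x and y while its representation is something else entirely. The class name PolarBackedPoint and the choice of polar coordinates are my own assumptions for illustration; nothing in the Point listing above says the getters work this way.

```java
// Hypothetical variant of Point: same kind of API, different hidden representation.
final class PolarBackedPoint {
    private final double radius; // distance from the origin
    private final double angle;  // angle in radians, measured from the positive x axis

    PolarBackedPoint(double x, double y) {
        this.radius = Math.hypot(x, y);
        this.angle = Math.atan2(y, x);
    }

    // Callers still ask for x and y; they cannot tell how the state is stored.
    double getX() {
        return radius * Math.cos(angle);
    }

    double getY() {
        return radius * Math.sin(angle);
    }
}

class PolarBackedPointDemo {
    public static void main(String[] args) {
        PolarBackedPoint p = new PolarBackedPoint(3.0, 4.0);
        // Prints values close to 3.0 and 4.0, subject to floating-point rounding.
        System.out.println(p.getX() + ", " + p.getY());
    }
}
```

The getters recompute x and y on every call, which is exactly the kind of freedom the boilerplate buys: the caller sees a point, and the class is free to store whatever it likes.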

Sometimes, as in the Point class example above, we don’t need an abstraction. Everyone knows that a point in two-dimensional space has an x coordinate and a y coordinate. Thus, there is no reason to separate the API of a point from its representation. Not only is there no advantage in creating an abstraction of it, doing so may make your code more difficult to understand. In this case, boilerplate code gets in the way of writing code with low cognitive-effort demand. We need the ability to lose our abstraction and benefit from the resulting conciseness of expression. However, it’s important that we lose our abstraction safely.

Let’s take a look at some alternatives to boilerplate code for creating classes without abstractions, whose APIs and representations are the same. Here’s how we can do this in Groovy:

import groovy.transform.Immutable

@Immutable
class Point {
    double x
    double y
}

It doesn’t get much simpler or clearer than that. We have a Point with x and y coordinates, and it’s immutable. Keep the immutability in mind, by the way, because we’re going to come back to it. With Groovy’s approach to unabstracted classes, the boilerplate code is still there; you just don’t see it. The compiled bytecode still has the boilerplate in it. The Immutable annotation makes the compiler generate a constructor with x and y arguments and getters (but no setters). Since the compiler is generating the constructor and the getters, the constructor will assign the given x and y values to the corresponding fields and the getters will return x and y, respectively. You don’t have a choice in the matter, having given that up when you decided to use the abbreviated code above. As a result, the API of Point is the same as its representation, so there is no longer an abstraction.
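For comparison, here is roughly what that hidden boilerplate looks like when written out by hand in Java. This is a sketch, not the exact code the Groovy compiler emits; the class name ImmutablePoint is hypothetical, and I’ve included equals, hashCode and toString on the assumption that you want value semantics to go with the constructor and getters.

```java
import java.util.Objects;

// Hand-written approximation of the code hidden behind the short Groovy definition.
final class ImmutablePoint {
    private final double x;
    private final double y;

    ImmutablePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Getters only: there is no way to change x or y after construction.
    double getX() { return x; }
    double getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ImmutablePoint)) return false;
        ImmutablePoint that = (ImmutablePoint) other;
        return Double.compare(x, that.x) == 0 && Double.compare(y, that.y) == 0;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); }

    @Override
    public String toString() { return "ImmutablePoint(" + x + ", " + y + ")"; }
}
```

Every method here does the one obvious thing with x and y, which is why nothing is lost when a compiler writes it for you.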

Now that we’ve removed the abstraction, what we have is a named tuple or a transparent state vector. You might ask why we don’t use unnamed tuples like pair types that exist in some languages. We use named tuples because names matter, even though they’re arbitrary.

Kotlin is a language that provides a named tuple construct called a data class. Here’s how we can use it to model a point:

data class Point(val x: Double, val y: Double)

How’s that for brevity? One line of code! Note the val keyword; it means that x and y are immutable. Kotlin doesn’t require this. I could have written var (mutable) instead.

At this point, I want to mention something regarding the expectation that I have for named tuples like Point, which is described by the following equation that in a moment of delusional grandeur, I am calling Bishop’s Named Tuple Principle (BNTP):

p′ = c(d(p)), p′ ≡ p

where p is a named tuple, d is a deconstructor that takes a named tuple and returns an unnamed tuple containing the same values, and c is a constructor that takes one or more values (an unnamed tuple) and returns a named tuple containing those values. The equation means that two named tuples with the same name and same coordinate values are by definition equivalent. In the Point example, it means that two Point objects with the same x and y coordinates are by definition equivalent. This is a reasonable proposition because I can see the entire state of a named tuple and if I see two named tuples with the same state, I should expect them to be equivalent.
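The principle can be sketched in a few lines of Java using a record (a construct covered later in this post), assuming Java 16 or later. The deconstruct and construct helpers are hypothetical names of my own, and a double[] stands in for an unnamed tuple since Java has no built-in one.

```java
// Assumes Java 16 or later, where records are a standard language feature.
record Point(double x, double y) {

    // d: deconstruct the named tuple into an unnamed tuple of its values.
    // Hypothetical helper; a double[] plays the role of the unnamed tuple.
    double[] deconstruct() {
        return new double[] { x, y };
    }

    // c: construct a named tuple from an unnamed tuple of values.
    static Point construct(double[] values) {
        return new Point(values[0], values[1]);
    }
}

class NamedTuplePrincipleDemo {
    public static void main(String[] args) {
        Point p = new Point(3.0, 4.0);
        Point pPrime = Point.construct(p.deconstruct()); // p' = c(d(p))
        System.out.println(pPrime.equals(p)); // true: p' is equivalent to p
    }
}
```

Note that pPrime and p are different objects; the principle is about equivalence of state, not identity.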

There are two major implications that derive from the BNTP: a named tuple must be immutable and its definition must reveal, in Java language architect Brian Goetz’s words, the state, the whole state and nothing but the state (i.e., no hidden representation). It’s possible for a mutable named tuple with hidden state to satisfy the BNTP at a given moment. However, as Goetz points out, such a named tuple “puts pressure on the alignment between the state and the API,” which could have unexpected and adverse consequences. For example, if a named tuple’s hash code is used to place it in a hash-based collection, a change to a mutable coordinate value or to hidden state could change the hash code, and lookups would then no longer find the tuple in that collection. Thus, both implications must hold in order for a named tuple to satisfy the BNTP reliably.
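The hash code hazard is easy to reproduce. The sketch below uses a hypothetical MutablePoint class that I wrote only to show the failure mode; it is a deliberately bad design, not a recommendation.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable named tuple, written only to show the failure mode.
final class MutablePoint {
    double x;
    double y;

    MutablePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof MutablePoint)) return false;
        MutablePoint that = (MutablePoint) other;
        return Double.compare(x, that.x) == 0 && Double.compare(y, that.y) == 0;
    }

    // The hash code depends on mutable state, so it can change over time.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class LostInSetDemo {
    public static void main(String[] args) {
        Set<MutablePoint> points = new HashSet<>();
        MutablePoint p = new MutablePoint(1.0, 2.0);
        points.add(p);
        System.out.println(points.contains(p)); // true

        p.x = 2.0; // the hash code changes while p sits in the set
        System.out.println(points.contains(p)); // false: the lookup uses the new hash code
        System.out.println(points.size());      // 1: the object is still stored, but lookups miss it
    }
}
```

The object never left the set; the set simply can’t find it anymore, which is arguably worse than losing it outright.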

The Kotlin data class does not satisfy the BNTP because its state vector components can be mutable and there’s nothing preventing you from adding hidden representation. For example, this is valid Kotlin code:

data class Point(var x: Double, var y: Double) {
    private var quadrant: Int = 0
}

It’s difficult to predict whether two instances of the above data class with the same x and y values will be equivalent because x and y could change at any time and there’s no telling what quadrant’s value will be. I suspect the creators of Kotlin were looking at a data class from the standpoint of syntax and didn’t fully consider the semantics of their new construct. But I could be wrong. I’m not saying that you shouldn’t use Kotlin data classes, but make sure you use them correctly: immutable and no hidden representation.

The Java 14 release includes a preview of records, which is Java’s take on named tuples. Here is a record version of a Point:

public record Point(double x, double y) {}

It looks fairly similar to a Kotlin data class, but there is a major difference. By definition, the components of a Java record are immutable. By immutable, I’m referring to immutability of primitive type values and object references. Thus, if a record definition includes objects, it is up to you to ensure that the classes of those objects are defined to be immutable (see footnote 1). In addition, you cannot add hidden state or representation to a Java record; any attempt to declare an additional instance field will be flagged by the compiler as an error. Thus, two Point records with the same x and y coordinate values will be equivalent to each other. So Java’s record satisfies the BNTP as long as you ensure the immutability of all objects in the record definition.
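Here is a small sketch of what “it is up to you” can look like in practice, assuming Java 16 or later. The Route record and its stops component are hypothetical names of my own. The record’s reference to the list can’t be reassigned, but the list behind it could still change unless the record takes an unmodifiable copy.

```java
import java.util.ArrayList;
import java.util.List;

// Assumes Java 16 or later. Route is a hypothetical record used only for illustration.
record Route(List<String> stops) {
    // Compact constructor: store an unmodifiable copy so that later changes
    // to the caller's list cannot alter this record's state.
    Route {
        stops = List.copyOf(stops);
    }
}

class ShallowImmutabilityDemo {
    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("a", "b"));
        Route route = new Route(source);

        source.add("c"); // mutating the original list does not affect the record
        System.out.println(route.stops()); // [a, b]
        System.out.println(route.equals(new Route(List.of("a", "b")))); // true
    }
}
```

Without the copy in the compact constructor, the call to source.add would have silently changed the record’s state, and the equality check at the end would have failed.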

To summarize, boilerplate code allows us to create customized abstractions using the constructs in an object-oriented language. It is not a bad thing. Sometimes, we don’t need to create abstractions for certain classes; boilerplate code gets in the way in those cases. Several languages have constructs that allow us to remove abstractions and boilerplate code, but we need to be careful when using those constructs so that we can take full advantage of them without causing problems for ourselves.

1 If a record has an array as one of its components, you need to add defensive copying to the constructor and the accessor, both of which can be explicitly declared, because there is no way to make a Java array immutable. Better yet, don’t include arrays as part of a record; use immutable collections instead.