The importance of observability
Modern software systems are becoming more distributed and are running on less reliable infrastructure, thanks to trends like microservices and cloud computing. Building observability into our systems has always been necessary, and these trends make it more critical than ever.
But, how do we add observability to our systems without clogging up our codebase with instrumentation logic? And if this instrumentation is important, how do we test it? In this article, I’ll show how a pattern called Semantic Instrumentation can help, by treating observability as a first-class concept within our code.
What to observe
“Observability” has a broad scope, from low-level technical metrics through to high-level business KPIs. On the technical end of the spectrum we can track things like memory and CPU utilization, network and disk IO, thread counts and GC pauses. On the other end of the spectrum, our business/domain metrics might track things like cart abandonment rate, session duration, or payment failure rate.
Because these higher-level metrics are specific to each system, they usually require hand-rolled instrumentation logic. This is in contrast to lower-level technical instrumentation, which is more generic and can often be achieved without much modification to a system’s codebase beyond perhaps injecting some sort of monitoring agent at boot time.
It’s also important to note that higher-level, product-oriented metrics are more valuable, since by definition they more closely reflect how well the system is delivering against its intended purpose.
The problem with observability
So product-oriented observability is important, but usually requires hand-rolled instrumentation logic. That custom instrumentation lives right alongside the core domain logic of our systems - where clear, maintainable code is vital. Unfortunately, instrumentation code tends to be noisy, and if we’re not careful it can lead to a distracting mess.
Let’s see an example of the kind of mess that the introduction of instrumentation code can cause. Here’s a hypothetical e-commerce system’s (somewhat naive) discount code logic, prior to any observability being added:
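Sketched in JavaScript, it might look something like this (the discountService collaborator, its lookupDiscount method, and discount.applyToCart are illustrative assumptions, not exact code from any real system):

```javascript
// Sketch of the un-instrumented domain logic. The `discountService`
// collaborator and the discount's `applyToCart` method are illustrative.
class ShoppingCart {
  constructor({ discountService }) {
    this.discountService = discountService;
  }

  applyDiscountCode(discountCode) {
    // look up the discount for this code; if there isn't one, nothing to do
    const discount = this.discountService.lookupDiscount(discountCode);
    if (!discount) {
      return 0;
    }
    // apply the discount to this cart, and return the amount discounted
    return discount.applyToCart(this);
  }
}
```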
I’d say we have some clearly-expressed domain logic here. We look up a discount based on a discount code, then apply the discount to the cart. Finally, we return the amount that was discounted.
This application of discounts to a cart is a key feature. It’s important that we have good observability here, so let’s add some instrumentation:
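A sketch of what that tends to end up looking like (the logger, metrics, and analytics collaborators stand in for whatever logging, metrics, and analytics libraries the system happens to use; the metric and event names are illustrative):

```javascript
class ShoppingCart {
  constructor({ discountService, logger, metrics, analytics }) {
    this.discountService = discountService;
    this.logger = logger;
    this.metrics = metrics;
    this.analytics = analytics;
  }

  applyDiscountCode(discountCode) {
    this.logger.log(`attempting to apply discount code: ${discountCode}`);

    const discount = this.discountService.lookupDiscount(discountCode);
    if (!discount) {
      this.logger.log(`discount lookup failed for code: ${discountCode}`);
      this.metrics.increment('discount-lookup-failure', { code: discountCode });
      return 0;
    }

    const amountDiscounted = discount.applyToCart(this);

    this.logger.log(`discount applied, of amount: ${amountDiscounted}`);
    this.metrics.increment('discount-applied', { code: discountCode });
    this.analytics.track('Discount Code Applied', {
      code: discountCode,
      amountDiscounted: amountDiscounted,
    });

    return amountDiscounted;
  }
}
```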
In addition to performing the actual business logic of looking up and applying a discount, we are now also calling out to various instrumentation systems. We’re logging some diagnostics for developers, we’re recording some metrics for the people operating this system in production, and we’re also publishing an event into our analytics platform for use by product and marketing folks.
Unfortunately, adding observability has made a mess of our nice clean domain logic. We now have only 25% of the code in our applyDiscountCode method involved in its stated purpose of looking up and applying a discount. The clean business logic that we started out with hasn’t changed, and remains clear and concise, but it’s lost amongst the low-level instrumentation code which now takes up the bulk of the method. We’ve also introduced code duplication and magic strings into the middle of our domain logic.
In short, our instrumentation code is a huge distraction to anyone trying to read this method and see what it actually does.
Cleaning up the mess
Let’s see if we can clean up this mess, by refactoring our implementation. First of all, we’ll extract that icky low-level instrumentation logic into separate methods:
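A sketch of that first refactoring step (the logger, metrics, and analytics collaborators are illustrative stand-ins):

```javascript
class ShoppingCart {
  constructor({ discountService, logger, metrics, analytics }) {
    this.discountService = discountService;
    this.logger = logger;
    this.metrics = metrics;
    this.analytics = analytics;
  }

  applyDiscountCode(discountCode) {
    this.logApplyingDiscountCode(discountCode);

    const discount = this.discountService.lookupDiscount(discountCode);
    if (!discount) {
      this.instrumentDiscountCodeLookupFailed(discountCode);
      return 0;
    }

    const amountDiscounted = discount.applyToCart(this);
    this.instrumentDiscountApplied(discountCode, amountDiscounted);
    return amountDiscounted;
  }

  // --- instrumentation details, pushed down out of the domain logic ---

  logApplyingDiscountCode(discountCode) {
    this.logger.log(`attempting to apply discount code: ${discountCode}`);
  }

  instrumentDiscountCodeLookupFailed(discountCode) {
    this.logger.log(`discount lookup failed for code: ${discountCode}`);
    this.metrics.increment('discount-lookup-failure', { code: discountCode });
  }

  instrumentDiscountApplied(discountCode, amountDiscounted) {
    this.logger.log(`discount applied, of amount: ${amountDiscounted}`);
    this.metrics.increment('discount-applied', { code: discountCode });
    this.analytics.track('Discount Code Applied', {
      code: discountCode,
      amountDiscounted,
    });
  }
}
```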
This is a good start. It’s easier to read and understand applyDiscountCode now that the distracting details of the various instrumentation systems have been pushed down into those private methods.
However, it doesn’t seem right that
ShoppingCart now has a bunch of private methods that are entirely focused on instrumentation - that’s not really
ShoppingCart’s responsibility. A cluster of functionality within a class which is unrelated to that class’s primary responsibility is often an indication that there’s a new class trying to emerge.
Let’s follow that hint by gathering up those instrumentation methods and moving them out into their own class:
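One possible shape for that new class, and for the slimmed-down ShoppingCart that uses it (still a sketch, with illustrative collaborator and method names):

```javascript
// Encapsulates all the low-level instrumentation plumbing for discounts.
class DiscountInstrumentation {
  constructor({ logger, metrics, analytics }) {
    this.logger = logger;
    this.metrics = metrics;
    this.analytics = analytics;
  }

  applyingDiscountCode(discountCode) {
    this.logger.log(`attempting to apply discount code: ${discountCode}`);
  }

  discountCodeLookupFailed(discountCode) {
    this.logger.log(`discount lookup failed for code: ${discountCode}`);
    this.metrics.increment('discount-lookup-failure', { code: discountCode });
  }

  discountApplied(discountCode, amountDiscounted) {
    this.logger.log(`discount applied, of amount: ${amountDiscounted}`);
    this.metrics.increment('discount-applied', { code: discountCode });
    this.analytics.track('Discount Code Applied', {
      code: discountCode,
      amountDiscounted,
    });
  }
}

// The cart now speaks only in domain terms.
class ShoppingCart {
  constructor({ discountService, instrumentation }) {
    this.discountService = discountService;
    this.instrumentation = instrumentation;
  }

  applyDiscountCode(discountCode) {
    this.instrumentation.applyingDiscountCode(discountCode);

    const discount = this.discountService.lookupDiscount(discountCode);
    if (!discount) {
      this.instrumentation.discountCodeLookupFailed(discountCode);
      return 0;
    }

    const amountDiscounted = discount.applyToCart(this);
    this.instrumentation.discountApplied(discountCode, amountDiscounted);
    return amountDiscounted;
  }
}
```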
We now have a nice, clear separation of responsibilities -
ShoppingCart is entirely focused on domain concepts like applying discounts, while our new
DiscountInstrumentation class encapsulates all the details of instrumenting the process of applying a discount.
DiscountInstrumentation is an example of a pattern I call Semantic Instrumentation. Semantic Instrumentation involves encapsulating low-level instrumentation plumbing within a specific class. That class exposes a higher-level instrumentation API which is oriented around domain semantics. This enables us to add observability to domain logic, while still talking in the language of the domain, avoiding the distracting details of the instrumentation technology. In our example above, our semantically instrumented
ShoppingCart implemented observability by reporting discount codes being applied and discount code lookups failing (domain language), rather than writing log entries or tracking analytics events (technical language). This may seem a subtle distinction, but keeping domain code focused on the domain is a valuable part of keeping a codebase readable, maintainable and extensible.
Testing instrumentation code is a pain
It’s rare to see good test coverage of instrumentation logic. I don’t often see automated tests that verify that an error is logged if an operation fails, or that an analytics event containing the correct fields is published when a conversion occurs. This is perhaps partially due to observability historically being regarded as less valuable, but it’s also because it’s a pain to write good tests for low-level instrumentation code.
To demonstrate, let’s look at some instrumentation for a different part of our hypothetical e-commerce system and see how we might write some tests that verify the correctness of that instrumentation code.
ShoppingCart has a well-instrumented method for adding an item to the cart:
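A sketch of what that might look like (productService, the metric names, and the analytics event fields are illustrative assumptions):

```javascript
class ShoppingCart {
  constructor({ id, productService, logger, metrics, analytics }) {
    Object.assign(this, { id, productService, logger, metrics, analytics });
    this.products = [];
    this.total = 0;
  }

  addToCart(productId) {
    this.logger.log(`adding product '${productId}' to cart '${this.id}'`);

    const product = this.productService.lookupProduct(productId);
    this.products.push(product);
    this.recalculateCartTotal();

    this.metrics.gauge('shopping-cart-total', this.total);
    this.metrics.gauge('shopping-cart-size', this.products.length);

    this.analytics.track('Product Added To Cart', { sku: product.sku });
  }

  recalculateCartTotal() {
    this.total = this.products.reduce((sum, product) => sum + product.price, 0);
  }
}
```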
Let’s look at how we might start to test this instrumentation logic:
In this test, we’re setting up a shopping cart for testing, wired up with a spy logger (a “spy” is a type of test double used to verify how our test subject interacts with other objects). In case you’re wondering,
testableShoppingCart is just a little helper function that creates an instance of
ShoppingCart with faked out dependencies by default. With our spy in place, we call
shoppingCart.addToCart(...), and then check that the shopping cart used the logger to log an appropriate message.
As written, this test does provide reasonable assurance that we are logging when products are added to a cart. However, it is very much coupled to the details of that logging. If we decided to change the format of the log message at some point in the future we’d break this test for no good reason. This test shouldn’t be concerned with the exact details of what was logged, just that something was logged with the correct contextual data.
We could try and reduce how tightly the test is coupled to the details of the log message format by matching against a regular expression instead of an exact string. However, this would make the validation a little opaque. Additionally, the effort required to craft a robust regex is usually a poor investment of time.
Moreover, this was just a simple example of testing how things are logged. More complex scenarios (e.g. logging exceptions) are even more of a pain - the APIs of logging frameworks and their ilk don’t lend themselves to easy verification when they’re being mocked out.
Let’s move on and take a look at another test, this time verifying our analytics integration:
This test is a little more involved as we need to control the product which is passed back to the shopping cart from
productService.lookupProduct(...), which means we need to inject a stub product service which is rigged to always return a specific product. We also inject a spy
analytics, just as we injected a spy
logger in our previous test. Once that’s all set up we call
shoppingCart.addToCart(...), and then finally verify that our analytics instrumentation was asked to create an event with the expected parameters.
I’m reasonably happy with this test. It’s a bit of a pain to get that product sent into the cart as indirect input, but that’s an acceptable tradeoff in order to gain confidence that we include that product’s sku in our analytics event. It’s also a bit of a shame that our test is coupled to the exact format of that event - as with our logging test above, I’d prefer that this test didn’t care about the details of how observability is achieved, just that it’s being done using the correct data.
After completing that test, I’m daunted by the fact that if I want to also test the other bits of instrumentation logic - the
shopping-cart-size metric gauges - I’ll have to create 2 or 3 extra tests which will look extremely similar to this one. Each test will need to go through the same fiddly dependency setup work, even though that’s not the focus of the test. Some developers, when confronted with this task, would grit their teeth, copy-paste the existing test, change what needs to be changed, and go on with their day. In reality, many developers would decide that the first test is good enough, and risk a bug being introduced in our instrumentation logic later on (a bug which could go unnoticed for a while, since broken instrumentation is not always immediately apparent).
Semantic Instrumentation yields cleaner, more focused tests
Let’s see how using Semantic Instrumentation can improve the testing story. Here’s our
ShoppingCart again, now refactored to use Semantic Instrumentation:
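A sketch of the refactored cart (the instrumentation method names, like addingProductToCart, are illustrative domain-oriented names):

```javascript
class ShoppingCart {
  constructor({ id, productService, instrumentation }) {
    Object.assign(this, { id, productService, instrumentation });
    this.products = [];
    this.total = 0;
  }

  addToCart(productId) {
    // report what's happening in domain language, not technical language
    this.instrumentation.addingProductToCart(productId, this);

    const product = this.productService.lookupProduct(productId);
    this.products.push(product);
    this.recalculateCartTotal();

    this.instrumentation.productAddedToCart(product, this);
  }

  recalculateCartTotal() {
    this.total = this.products.reduce((sum, product) => sum + product.price, 0);
  }
}
```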
and here are the tests for the instrumentation of ShoppingCart:
The application of Semantic Instrumentation has raised the level of abstraction a little, making the code and the tests a bit easier to read, as well as less brittle. We’re still testing that instrumentation has been implemented correctly - in fact, our tests now completely verify our observability requirements - without needing to go into the details of how the instrumentation is implemented. Our tests capture the essential complexity of adding observability without dragging in too much accidental complexity.
It would still be wise to verify that the mucky lower-level instrumentation details are correctly implemented though. We achieve that with tests of our instrumentation class:
Here again our tests can become a little more focused. We can pass
product in directly rather than doing the earlier indirect injection dance via the mocked out
productService in our earlier ShoppingCart tests.
Since our tests for
ShoppingCartInstrumentation are focused on how that class uses the 3rd party instrumentation libraries, we can make our tests a little terser by using a
before block to set up pre-wired spies for those dependencies:
Our tests are now very clear and focused. Each test verifies that one specific part of our low-level technical instrumentation is correctly triggered as part of a higher-level semantic instrumentation event. The tests capture the intent of Semantic Instrumentation - presenting a higher-level abstraction over the boring technical details of our various instrumentation systems.
When to apply Semantic Instrumentation
This is a useful pattern; where should we apply it? My recommendation is always to use Semantic Instrumentation when adding observability to domain code - the areas of your codebase that are focused on business logic, as opposed to technical plumbing. Using Semantic Instrumentation keeps that domain code decoupled from the technical details of your instrumentation infrastructure. Additionally, the types of observability you’ll be adding within your domain code are typically the more valuable product-oriented metrics. It’s worth investing in the more rigorous approach of Semantic Instrumentation here.
A simple rule to follow is that your domain classes should never have a direct reference to any instrumentation systems, only to semantic instrumentation classes that abstract over those systems.
Semantic Instrumentation isn’t something I invented or personally discovered. As with any pattern write-up, I’m just documenting a practice that I’ve seen various teams apply over the years, and which many other teams have undoubtedly used elsewhere.
I think either Andrew Kiellor or Toby Clemson first introduced these ideas to me while we were on a ThoughtWorks project - I believe under the name Semantic Logging - and I’m sure that this concept has been doing the rounds within the broader ThoughtWorks hivemind for a long time.
I haven’t seen this pattern explicitly called out before - hence this article. The closest analog I could find is the Semantic Logging Application Block from Microsoft’s Patterns & Practices group. From what I can tell, their take on Semantic Logging is a concrete library which makes it easier to do structured logging within .NET applications. That’s a much narrower focus than the Semantic Instrumentation pattern I have been describing here.