Logging - A Primer

Logging is the primary user interface for supporting and troubleshooting your system. It is critical that you get it right. I know this because I have gotten it wrong often enough. If you move to microservice architectures and containers, the criticality only increases because errors and anomalies are as distributed as your runtime. Here are some opinionated recommendations about how to not get it wrong.

Do not create your own logging framework. This is a solved problem and you have better things to work on ¹ . For most language ecosystems, there are probably multiple logging frameworks that you can choose. Pick one that is well supported or frequently used for your platform. Ideally, it will have the following capabilities:

Levels
Appenders
Layout formatting
Configurability
Runtime Modification

Levels allow you to log messages based on the importance of the information. Levels are additive, so if you have the DEBUG level turned on you get that plus everything more important. Most frameworks allow for five or more levels but in practice I’ve found two or three to be sufficient for most of my needs. Appenders define where the log information is sent. Again, most frameworks have a number of capabilities but I’ve found console, file, and syslog appenders to cover most use cases. Layout formatting allows you to specify the message structure and include details like thread information, timestamp, source location, and additional context info. Note that some of this can be expensive to calculate. Configurability is simply the ability to adjust all of these details, usually by way of a config file, environment variables or code. Runtime modification means you can update the configuration without restarting your application. Depending on your use case, this might fall in the ‘nice to have’ category.

Picking your logging framework is the easy part. Now comes time to use it. Writing log messages should not be an afterthought. How you do this could make the difference between a critical outage being solved in minutes vs. it taking hours of chasing red herrings and rebooting the world. Clarity, conciseness and consistency matter. It’s a balancing act of conveying the necessary information without overwhelming the reader with the nonessential. This complicated by the fact that what’s necessary and what’s nonessential might vary depending on the circumstances. When authoring log messages, consider the following factors.

Context.

If you are writing a log message, it means you have something to say about the state of the program. As with all conversation, those of us on the other side really need for what you say to be useful and interesting. What isn’t useful is context-free information. Especially if that requires me to have a copy of the code open on the screen as well. It’s especially fun to turn on a lower logging level and see different varieties of “here” messages provide absolutely no assistance. No, my fellow programmer, you are not here. If you are, then you need to take over at this point because a problem has arisen and you did not have the foresight to make this easier. You had something to say when you wrote this, but you didn’t take into account that a future reader might not have the necessary level of familiarity to make sense of your babbling.

Noise

Your logging framework has log levels for a reason. Even if your message passes the context test, logging it at the wrong level isn’t helpful. Now you’ve just added details that we have to wade through to find useful information. Make sure that messages you log for a certain threshold are at the same approximate level of importance. You should be able to justify why a message exists and why it needs to be at the level you’ve set it at. And when I say justify it, I mean to an irritable systems engineer at 3:00 in the morning - the kind that maybe has personal problems and knows where you live.

While we’re talking about noise, let’s talk about exceptions (I come from a Java background so this is something of a special annoyance of mine). Error handling is important but no one wants to skip past the same stack trace repeatedly because you decided to log the error and then throw it up the stack five different times. Either handle it or throw it, but do not log the exception with the stack trace and then throw the same or another exception. Logging that the error occurred along with additional context might be useful in the immediate vicinity of the problem, but that doesn’t mean logging 100 line stack traces in different catch blocks. Especially do not do this in a loop, ad infinitum where this nonsense will happen so much that your log aggregator ends up throttling your account because they are reasonable people and you clearly don’t know what you’re doing.

Searchable

Large systems might produce a lot of logging information, even if you’ve logged things at the appropriate level to minimize the noise. In that case, additional filtering is needed to track down problems. The logging infrastructure can provide some of this. Details like the thread ID can be included in every message. Log aggregation systems should include useful host information and source categorization. Common failure modes can be given explicit error codes that can be searched for. If the system operates on objects with unique identifiers then those can be included as context in every message. Requests to the system can be assigned an identifier as well. Sophisticated architectures can pass said request ID through a distributed system to allow for powerful tracing capabilities.

It’s possible to have monitoring tools do log scraping to look for error codes or other uniquely identifiable context that you’ve created. You probably want to avoid alerting on common words that might occur when you aren’t expecting it. Make it explicit if you can. Note, this is only one aspect of a comprehensive monitoring set up. It is not sufficient to just do log scanning.

Performance

Don’t incur performance penalties that you don’t have to. Logging ends up being a circumstance where that can happen because you invoke unnecessary string concatenations for messages that won’t actually be displayed (because that log level is not enabled). In most cases, this isn’t something that you will have to worry about. Even so, it’s good practice to avoid doing it at all. That way you won’t accidentally do it inside a frequently executed loop that has real-world impact. Avoid explicit concatenation by using placeholders in log messages (if supported by your framework). Alternately, wrap the concatenation and logging of the message in a conditional that checks to see if a log level is enabled before doing the work.

Security

It should go without saying that you don’t log anything that has potential security ramifications. Should being the operative word. Of course, some people still do it so let me make it explicit - don’t log anything that has potential security ramifications. Need more explicit advice? No passwords, no credit card numbers, no SSNs. Same goes for addresses, name and net worth, or any other personally identifying information that could be problematic if released to everyone. And since you have logging, use that for capturing a stack trace (so avoid calling printStackTrace() on an exception in Java). Don’t return a stack trace in an error response to the user, this can reveal details about your system that a legitimate user won’t care about and an illegitimate one can use to their advantage.

Some of these considerations can be in conflict. This is to be expected in a pursuit that is all about tradeoffs (such as programming). Calling toString() on an object and writing a formatted string of two thousand characters to the log might provide context, but it also adds noise and might be insecure. You’ll have to find the balance between these competing factors that enables you to support your system when things go sideways. As a last request, make your messages readable and use accurate language. This should not be that big of a challenge if you are already writing good code - which is to say readable, essential and accurate. Logs might be processed by other code, but when it comes time for humans to read, useful information, white space, and concise, understandable words are greatly appreciated.

Unless you are working on some brand new language / ecosystem, or you have unusual constraints, or you are in learning mode and your goal is to understand how to create logging systems. ↩︎

2018.09.12 08:00