Logging best practices; or how to ruin your day off

Logging is a corner stone of any good application. Believe it or not, your program is going to fail, and it is going to fail hard. The only thing to save you here are the logs you created, in order to figure out exactly what went wrong and how to fix it. Believe me when I say this: there is nothing worse then getting customers calling in, complaining about some functionality on a service that has no logging.

In this article, I’ll be talking about what logging is, how I log, what I log and how I think I can improve. I’ll also cover some best practices, and some things I have seen that I think I really should never have seen. I won’t tell you though, If I am the reason for them ever existing. Gotta keep an air of mystery.

What is logging?

This is probably the reason why you are here. You asked yourself, what is logging?, typed that in Google, and voila, here you are. Logging is the process of saving information about how your how your program is running, so that you can trace it’s execution path whenever something goes on. The more information you have, the easier it is to trace that path, and by extension to recreate the problem.

There are a lot of different places to save that information. This is also referred to as a Sink. Some of the most popular ones are as follows:

Console
File
Some database
A centralized log management system, such as Graylog or Seq

Currently at work, we log into a a local file and a database. Files are pulled by a Graylog agent regularly and imported into the centralized system. They then get deleted from the local disk every time we have a release. I’m not that big a fan of the database log my self, since I very rarely use it anymore after we got Graylog up and running. It just feels like a waste of space and a performance hit. I wish that at some point we can drop the database logs and skip the agent and just log directly to some Graylog sink. I think that keeping the files until the next release is not a bad idea, in case something goes wrong.

What should I log?

You should honestly log as much as possible about where you are and the context around that given location. Exceptions should of course always be logged, since they provide great stack traces as well as meaningful errors. Log as many variables as possible that are currently in whatever scope you have access to. That null reference exception you just got would be a lot easier to debug if you actually knew which one it was talking about.

Machine names are actually also quite useful if you aren’t running your server side applications purely in containers. This might lead to the discovery that some of your environments are misconfigured.

Logger Names

Loggers also always have a name, or a category, related to them. This is usually seen when you create them through a logger factory like logFactory.Create(“InvoiceController”) or getting a ILogger<InvoiceController> injected as a dependency. Usually leaving the full name of the class as a name is a great idea, it makes it much easier to find where that log entry actually got logged in the code. Be aware though, that injecting a ILogger, instead of a factory or a generic ILogger, makes it much easier to test your code. It makes mocking up your dependencies much easier and cleaner. Register the ILogger with some contextual binding, where it is resolved with the factory or with a generic ILogger, based on the calling class.

System usage

As a final note, log how your system is used! At some (probably many) point(s) during your career, you are going to make some service you want to deprecate for whatever reason. Can you really afford just shutting of any of the users to that system? See how they use it and how much it is used. This will make it much easier for you to put together a strategy of getting them off your system in a graceful way.

How should I log?

There are a lot of different things that you should do when writing your logs, for the simple reason of making your life easier when going through them. Let’s go through two of the most impactful ones in my opinion.

Structured Logging

You should start out by writing structured logs. This is where contextual data is logged in a structured and consistent way. Lets have a quick look at how it’s done in a unstructured way.

log.Information($"User {user.Username} ({user.ID}) just got deleted");

String interpolation would kick in and embed the variable value into the string itself. This would make it hard to perform queries on these kinds of log entries, since the interesting values are located in the middle of the message it self. Not only that, but it makes it hard to search for any logs that act on the Username or ID field itself. By using structured logging, we can do the following:

var user = new { Username = "ericvruder", ID = 1 };
log.Information("User {Username} just got deleted", user);

Here, the logging framework got to format the message itself. The message looks like this:

{
    "@t":"2019-08-31T13:37:27.7908834Z",
    "@mt":"User {Username} just got deleted",
    "Username":"{ Username = ericvruder, ID = 1 }"
}

This kind of logging is usually most helpful if developers agree on a given variable naming standard, so that it’s easier to find specific fields. Not only that, but we can now simply search for “User {Username} just got deleted” on our log management system. This would give us all the entries, regardless of the values of the variables.

Note that not all logging frameworks support structured logging. Serilog and NLog do, but depending on which framework you use it might come bundled with a performance hit.

Logging levels

There are 5 main levels of logging that you should use.

Debug: Used for debugging issues. Should not be on in production by default
Information: Used for observing the current state and flow of the system.
Warnings: Used for abnormal events, that should be looked at, but did not halt the execution of whatever activity the application was currently processing.
Errors: Used for noting events that halt the execution of the current activity. This could be a web request or a scheduled job.
Fatal: Used for events that caused the application to crash or that need immediate attention.

Choosing the correct log level to use is critical for maintaining your sanity. Just because you think your scheduled job is critical, doesn’t mean you should smack a fatal on it. In case it happened 04:00 in the morning, do you really need to take action right now? Or can it wait until you get to work? This might not be the case, but most of the time it isn’t.

Remember that sometimes, the only reason that error is a fatal is because you tagged it as a fatal. I’ve done that my self, where my Sunday was ruined since our server side application started casting fatals. The only reason I needed to react was because I had decided to make them fatals.

Debug level can be turned on in order to help you track bugs in production. It should not be on by default due to the sheer volume of data it generates.

How much should I log?

As much as possible actually. Just be aware of certain things when adding in those log records.

Notification fatigue: I recently saw this incredibly interesting talk by Microsoft’s head of security, Josh Brown-White, where he discusses notification fatigue in a static analyses context. I really recommend watching it. The gist of it is that you should be aware of how much you are actually logging, and how much of it is actionable. This goes back to using the correct logging levels: Don’t just use a level you think is correct. Follow the standards, for your colleagues, as well as your own, sake.
Be aware of adding error logs on iterators! If the code is not able to discard whatever element ended up causing that error, you are going end up with a log filled with the same error log, over and over again.
This might be a bit controversial, but try not to catch an exception, log it, and throw it on. This usually leads to the same error being logged multiple times, in different places. This is because at some point in the application, there will most likely be another piece of code that does exactly that. This is in order to keep the application running. Doing this usually just adds extra, unnecessary data to sift through.

Conclusion

I actually believe that the most important thing to take into consideration when logging is the usage of the correct log levels. Using the above standard will safeguard yours and your colleagues sanity. By over escalating any log level, you might be causing severe unnecessary stress, especially if you are on vacation.

I would also recommend reading my other article about exception handling. That just might cut down on the amount of logs you actually have to write!