The Syslog Hell
Syslog. You’ve probably heard of it, especially if you are into monitoring or security. Syslog is perceived to be the common, unified way for systems to send logs to other systems. Linux supports syslog, and many network and security appliances support it as a way to share their logs. On the other side, a syslog server receives all the syslog messages. It sounds great in theory – having a simple, common way to represent log messages and send them across systems.
Reality couldn’t be further from that. Syslog is not one thing – there are multiple “standards”, and each of those is implemented incorrectly more often than not. Many vendors have their own way of representing data, and it’s all a big mess.
First, the RFCs. There are two of them – RFC 3164 (“old” or “BSD” syslog) and RFC 5424 (the new variant that obsoletes 3164). RFC 3164 is an informational document rather than a standard, while RFC 5424 is one (mostly – it is a Proposed Standard).
Those RFCs concern the contents of a syslog message. Then there’s RFC 6587, which is about transmitting syslog messages over TCP. It’s also not a standard, but rather “an observation” of what implementations already do. Syslog is usually transmitted over UDP, where one datagram carries one message; TCP is a byte stream, so fitting syslog into it requires an extra convention for marking where each message ends. Now add TLS on top of that as well.
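To make the framing problem concrete, here is a minimal sketch of splitting a TCP byte stream into messages using the two methods RFC 6587 describes: octet counting (a decimal length and a space in front of each message) and non-transparent framing (a trailer after each message, assumed here to be a line feed). The function name split_frames and the sample bytes are made up for illustration, and the "starts with a digit" check is only a heuristic; a real receiver has to cope with senders that break both methods.

```python
def split_frames(buf: bytes) -> tuple[list[bytes], bytes]:
    """Split buffered TCP bytes into syslog messages.

    Returns (messages, remainder). The remainder is an incomplete frame
    that should be kept and prepended to the next chunk read from the socket.
    """
    messages = []
    while buf:
        head, sep, rest = buf.partition(b" ")
        if head.isdigit():
            # Octet counting: "<length> <message>", no trailer needed.
            if not sep:
                break  # the length prefix itself is not complete yet
            length = int(head)
            if len(rest) < length:
                break  # message body not fully received yet
            messages.append(rest[:length])
            buf = rest[length:]
        else:
            # Non-transparent framing: message ends at the trailer (LF assumed).
            line, sep, rest = buf.partition(b"\n")
            if not sep:
                break  # trailer not received yet
            messages.append(line)
            buf = rest
    return messages, buf


# Hypothetical stream: one octet-counted frame, one LF-terminated frame,
# and the beginning of a third message that has not fully arrived.
msgs, pending = split_frames(b"9 <13>a: hi<14>b: yo\n<15>c")
```

Note the obvious weakness of non-transparent framing: a message that itself contains a line feed gets cut in two, which is one reason octet counting exists.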
Then there are content formats. RFC 5424 defines a key-value structure, but RFC 3164 does not – everything after the syslog header is just an unstructured message string. So many custom formats exist. For example, firewall vendors tend to define their own message formats. At least they are often documented (e.g. check WatchGuard and SonicWall), but parsing them requires a lot of custom knowledge about each vendor’s choices. Sometimes the documentation doesn’t fully reflect reality, though.
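As a rough illustration of the difference, the sketch below extracts the key-value pairs from an RFC 5424 structured data field. The sample element is the one used in the RFC's own examples; the function name parse_structured_data is an assumption, and the regexes are a simplification that only handles well-formed input. An RFC 3164 message has nothing equivalent, so the same function simply finds nothing in it.

```python
import re

# One SD-ELEMENT looks like: [sd-id name="value" name="value" ...]
# Inside a value, the characters ", \ and ] are escaped with a backslash.
SD_ELEMENT = re.compile(r'\[([^ =\]"]+)((?: [^ =\]"]+="(?:[^"\\\]]|\\.)*")*)\]')
SD_PARAM = re.compile(r' ([^ =\]"]+)="((?:[^"\\\]]|\\.)*)"')
ESCAPED = re.compile(r'\\(["\\\]])')


def parse_structured_data(text: str) -> dict[str, dict[str, str]]:
    """Map each SD-ID to its parameters. Pass only the structured data
    field; brackets in the free-text part would confuse this sketch."""
    elements = {}
    for element in SD_ELEMENT.finditer(text):
        params = {}
        for name, value in SD_PARAM.findall(element.group(2)):
            params[name] = ESCAPED.sub(r'\1', value)  # undo \" \\ \]
        elements[element.group(1)] = params
    return elements


sd = '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]'
structured = parse_structured_data(sd)

# An RFC 3164 style message carries no such field, only free text.
bsd = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
nothing = parse_structured_data(bsd)
```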
Besides vendor-specific formats, there are also de-facto standards like CEF and the less popular LEEF. They define the structure of the message and are actually syslog-independent (you can write CEF/LEEF to a file). But when syslog is used for transmitting CEF/LEEF, the message should respect RFC 3164.
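For a sense of what CEF looks like on the wire, here is a hedged sketch of a parser for its pipe-separated header and key=value extension. The header field names, the function names and the sample line are illustrative assumptions, not taken from any particular product, and the escaping rules are simplified (for example, the \n and \r escapes in extension values are not handled).

```python
import re

# The seven pipe-separated CEF header fields (names chosen for this sketch).
HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]
UNESCAPED_PIPE = re.compile(r'(?<!\\)\|')
# An extension key is a word followed by "=" at the start or after whitespace.
EXTENSION_KEY = re.compile(r'(?:^|\s)(\w+)=')


def parse_extension(ext: str) -> dict[str, str]:
    """Split 'k1=v1 k2=v2 ...' where values may contain spaces."""
    fields = {}
    keys = list(EXTENSION_KEY.finditer(ext))
    for i, key in enumerate(keys):
        # A value runs until the next key (or the end of the string).
        end = keys[i + 1].start() if i + 1 < len(keys) else len(ext)
        raw = ext[key.end():end]
        fields[key.group(1)] = re.sub(r'\\([=\\])', r'\1', raw)
    return fields


def parse_cef(line: str):
    """Return the CEF event as a dict, or None if the line is not CEF."""
    start = line.find("CEF:")  # anything before it is the syslog header
    if start < 0:
        return None
    parts = UNESCAPED_PIPE.split(line[start + 4:], maxsplit=7)
    if len(parts) < 7:
        return None
    event = {name: re.sub(r'\\([|\\])', r'\1', value)
             for name, value in zip(HEADER_FIELDS, parts)}
    event["extension"] = parse_extension(parts[7]) if len(parts) > 7 else {}
    return event


sample = ("<14>Sep 19 08:26:10 host CEF:0|Security|threatmanager|1.0|100|"
          "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232")
event = parse_cef(sample)
```

Even this small sketch shows an ambiguity in the format: a value that happens to contain something like " x=1" is indistinguishable from the start of a new key.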
And now comes the “fun” part – incorrect implementations. Many vendors don’t really respect those documents. They come up with their own variations of even the simplest things like a syslog header. Date formats are all over the place, hosts are sometimes missing, priority is sometimes missing, non-host identifiers are used in place of hosts, colons are placed frivolously.
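A minimal sketch of what tolerating such headers tends to look like: the priority is optional, and the timestamp is tried against several shapes in turn. The three shapes below (ISO 8601, the RFC 3164 form, and the RFC 3164 form with a year squeezed in) are only examples, parse_header is a hypothetical name, and the sketch assumes the timestamp directly follows the optional priority and that month names are in English.

```python
import re
from datetime import datetime

PRI = re.compile(r'<(\d{1,3})>')

# (pattern, reader) pairs, tried in order. Each reader gets the matched text
# and a fallback year, because the RFC 3164 timestamp has no year at all.
TIMESTAMP_READERS = [
    # ISO 8601 / RFC 3339, e.g. 2021-03-01T10:00:00Z
    (re.compile(r'(\d{4}-\d{2}-\d{2}T\S+)'),
     lambda text, year: datetime.fromisoformat(text.replace("Z", "+00:00"))),
    # BSD style with a year added, e.g. Oct 11 2003 22:14:15
    (re.compile(r'([A-Z][a-z]{2}\s+\d{1,2} \d{4} \d{2}:\d{2}:\d{2})'),
     lambda text, year: datetime.strptime(text, "%b %d %Y %H:%M:%S")),
    # Plain RFC 3164, e.g. Oct 11 22:14:15
    (re.compile(r'([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})'),
     lambda text, year: datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")),
]


def parse_header(line: str, default_year: int):
    """Return (facility, severity, timestamp, remainder); missing parts are None."""
    facility = severity = timestamp = None
    match = PRI.match(line)
    if match:
        # PRI = facility * 8 + severity
        facility, severity = divmod(int(match.group(1)), 8)
        line = line[match.end():]
    for pattern, reader in TIMESTAMP_READERS:
        match = pattern.match(line)
        if not match:
            continue
        try:
            timestamp = reader(match.group(1), default_year)
        except ValueError:
            continue  # looked like this shape but did not parse; try the next
        line = line[match.end():].lstrip()
        break
    return facility, severity, timestamp, line


parsed = parse_header("<34>Oct 11 22:14:15 mymachine su: 'su root' failed", 2003)
```

Every new vendor quirk becomes another entry in a list like TIMESTAMP_READERS, which is how such parsers grow.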
Parsing all of that mess is extremely “hacky”, with tons of regexes trying to account for all vendor quirks. I’m working on a SIEM, and our collector is open source – you can check our syslog package. Some vendor-specific parsers are still missing, but we are constantly adding new ones. The date formats in the CEF parser tell a good story.
If it were just two RFCs with one de-facto message format standard for one of them and a few options for TCP/UDP transmission, that would be fine. But what makes things hell is that too many vendors decided not to care about what is in the RFCs. They decided that “hey, putting a year there is just fine” even though the RFC says “no”, that they don’t really need to set a host in the header, and that they don’t really need to implement anything new after their initial legacy stuff was created.
Too many vendors (of various security and non-security software) came up with their own way of essentially representing key-value pairs, too many vendors thought their date format was the right one, and too many vendors didn’t take the time to upgrade their logging facility in the past 12 years.
Unfortunately, that’s representative of our industry (yes, xkcd). Someone somewhere stitches something together, and then decades later we have an incomprehensible patchwork of stringly-typed, randomly formatted stuff flying around whatever socket it finds suitable. And it’s never the right time or the right priority to clean things up, to get up to date, to align with others in the field. We, as an industry (both security and IT in general), are creating a mess out of everything. Yes, the world is complex, and technology is complex as well. Our job is to make it all palatable, abstracted away, simplified and standardized. And we are doing the opposite.
Hi, you got the RFC number wrong. You wrote RFC 5254 instead of 5424, three times. Cheers.
It’s even worse than you describe.
RHEL (and likely other distros) have the most infuriating syslog, in that it logs the hostname to the *local* log files, but does *not* log the hostname in messages sent over the network (use tcpdump to see this). This breaks relaying as the syslog server (to which the relay redirects) assumes the log is from the IP address it receives the syslog message from when logging it to a file, which blinds you to the actual source.
Add to this problem infuriating configurations from megacorporations that just don’t care to fix anything, like IBM (WebSphere) and Oracle (their DB). These companies use two (or more) word fields *describing* the daemon instead of using the daemon’s name (per RFC/standard). What you end up with (with this combination of dumb misconfigurations) is a syslog message that is interpreted as coming from the host $FIRSTWORD with the daemon $SECONDWORD.
Working around this means custom configurations on the relay for each one of these applications to add in the hostname and change the daemon name to a single word so the SIEM that ultimately receives the logs can actually properly parse it.
This is on top of the custom config required to work around the first problem, itself (adding in a hostname at the relay before relaying the message to the syslog server).
Don’t get me wrong, I love syslog… it’s actually fantastic at what it does, when used properly. It’s a love/hate relationship due to vendors like IBM and Oracle, and implementations like the utterly b0rk3d one in Linux (Solaris’s doesn’t drop the hostname from network packets, BTW) that sour the milk.
I suspect that *many* years ago, someone (working on the daemon used by Linux) said “We don’t need the hostname logged locally, but we can keep it on the network packets for relaying” and got the logic inverted, and no-one ever got them to fix it. It’s hard to believe anyone ever thought this was a good idea.
I’m assuming that it’s not on the network packets because the receiver should be tagging that log line with the connection information rather than trusting what’s in the log.
@Colin Hines:
The (actual) standard RFC states the hostname field should be populated:
https://datatracker.ietf.org/doc/html/rfc5424#page-13
Relays do not add this field, they only relay. You have to use a more advanced daemon (rsyslog or syslog-ng) to add the logic to insert a value where none exists, rather than simply relaying.
Did you check PCap-NG specification? https://wiki.wireshark.org/Development/PcapNg
There is a mention of the new syslog format. I believe it should be supported by syslog-ng and journald.
I have feels for this post as I’ve authored Parse::Syslog::Line for the CPAN.
This is especially fun: https://metacpan.org/release/Parse-Syslog-Line/source/lib/Parse/Syslog/Line.pm#L116
There are some interesting things. Cisco will log the NTP status of the device as a single character between the year and month in their date format.
The Python syslog library does not append a colon to the syslog tag (often the program name). It’s true the RFC doesn’t specify it, but it’s also true that we live in a world where 99.999% of the other stuff on my Linux, FreeBSD, and OpenBSD systems appends a colon after the tag to separate the program name from the message content. Talk to a Python dev about that and they’ll regale you with how the library is technically RFC compliant, while making the default spew unparseable nonsense in the logs and generally making the world a terrible place by not s/:?$/:/ on the syslog tag. Thanks, Python. 🙂
That’s “awesome”, thanks for sharing
Well, people are content with legacy cruft – as long as it only needs to be touched once a month. Hence the lack of standardization from vendors. There’s simply more inertia than demand. Also, the protocol deviations are irrelevant; the real problem is the haphazard message/string formatting. And you can’t really migrate to more structured formats until the legacy ones are fully documented. (Gave it a try recently, https://pypi.org/project/logfmt1/, but I doubt there’s much interest. And the cloud logging services pretty much just confound the issue IMO)
nice