diff options
author | Rainer Gerhards <rgerhards@adiscon.com> | 2008-08-08 12:51:15 +0200 |
---|---|---|
committer | Rainer Gerhards <rgerhards@adiscon.com> | 2008-08-08 12:51:15 +0200 |
commit | f9a45dc801bf9ef3adb53a9cf697a3d919db048b (patch) | |
tree | 2e2d97bf1ab70e7e8a4a6d7a3699de6ed69b8733 /doc/syslog_protocol.html | |
parent | 0fc05fff8866fd6d083fd2a09ef621b0277797f1 (diff) | |
download | rsyslog-f9a45dc801bf9ef3adb53a9cf697a3d919db048b.tar.gz rsyslog-f9a45dc801bf9ef3adb53a9cf697a3d919db048b.tar.bz2 rsyslog-f9a45dc801bf9ef3adb53a9cf697a3d919db048b.zip |
fixed problematic file name
it lead to invalid link on the web due to web server's rewrite rules.
Thanks to Jan Korbel for spotting the problem.
Diffstat (limited to 'doc/syslog_protocol.html')
-rw-r--r-- | doc/syslog_protocol.html | 196 |
1 files changed, 196 insertions, 0 deletions
diff --git a/doc/syslog_protocol.html b/doc/syslog_protocol.html new file mode 100644 index 00000000..72de5c27 --- /dev/null +++ b/doc/syslog_protocol.html @@ -0,0 +1,196 @@ +<html> +<head> +<title>syslog-protocol support in rsyslog</title> +</head> +<body> +<h1>syslog-protocol support in rsyslog</h1> +<p><b><a href="http://www.rsyslog.com/">Rsyslog</a> provides a trial +implementation of the proposed +<a href="http://www.monitorware.com/Common/en/glossary/syslog-protocol.php"> +syslog-protocol</a> standard.</b> The intention of this implementation is to +find out what inside syslog-protocol is causing problems during implementation. +As syslog-protocol is a standard under development, its support in rsyslog is +highly volatile. It may change from release to release. So while it provides +some advantages in the real world, users are cautioned against using it right +now. If you do, be prepared that you will probably need to update all of your +rsyslogds with each new release. If you try it anyhow, please provide feedback +as that would be most beneficial for us.</p> +<h2>Currently supported message format</h2> +<p>Due to recent discussion on syslog-protocol, we do not follow any specific +revision of the draft but rather the candidate ideas. The format supported +currently is:</p> +<p><b><code><PRI>VERSION SP TIMESTAMP SP HOSTNAME SP APP-NAME SP PROCID SP MSGID SP [SD-ID]s +SP MSG</code></b></p> +<p>Field syntax and semantics are as defined in IETF I-D syslog-protocol-15.</p> +<h2>Capabilities Implemented</h2> +<ul> + <li>receiving message in the supported format (see above)</li> + <li>sending messages in the supported format</li> + <li>relaying messages</li> + <li>receiving messages in either legacy or -protocol format and transforming + them into the other one</li> + <li>virtual availability of TAG, PROCID, APP-NAME, MSGID, SD-ID no matter if + the message was received via legacy format, API or syslog-protocol format (non-present + fields are being emulated with great success)</li> + <li>maximum message size is set via preprocessor #define</li> + <li>syslog-protocol messages can be transmitted both over UDP and plain TCP + with some restrictions on compliance in the case of TCP</li> +</ul> +<h2>Findings</h2> +<p>This lists what has been found during implementation:</p> +<ul> + <li>The same receiver must be able to support both legacy and + syslog-protocol syslog messages. Anything else would be a big inconvenience + to users and would make deployment much harder. The detection must be done + automatically (see below on how easy that is).</li> + <li><b>NUL characters inside MSG</b> cause the message to be truncated at + that point. This is probably a major point for many C-based implementations. + No measures have yet been taken against this. Modifying the code to "cleanly" + support NUL characters is non-trivial, even though rsyslogd already has some + byte-counted string library (but this is new and not yet available + everywhere).</li> + <li><b>character encoding in MSG</b>: is is problematic to do the right + UTF-8 encoding. The reason is that we pick up the MSG from the local domain + socket (which got it from the syslog(3) API). The text obtained does not + include any encoding information, but it does include non US-ASCII + characters. It may also include any other encoding. Other than by guessing + based on the provided text, I have no way to find out what it is. In order + to make the syslogd do anything useful, I have now simply taken the message + as is and stuffed it into the MSG part. Please note that I think this will + be a route that other implementors would take, too.</li> + <li>A minimal parser is easy to implement. It took me roughly 2 hours to add + it to rsyslogd. This includes the time for restructuring the code to be able + to parse both legacy syslog as well as syslog-protocol. The parser has some + restrictions, though<ul> + <li>STRUCTURED-DATA field is extracted, but not validated. Structured data + "[test ]]" is not caught as an error. Nor are any other errors caught. For + my needs with this syslogd, that level of structured data processing is + probably sufficient. I do not want to parse/validate it in all cases. This + is also a performance issue. I think other implementors could have the same + view. As such, we should not make validation a requirement.</li> + <li>MSG is not further processed (e.g. Unicode not being validated)</li> + <li>the other header fields are also extracted, but no validation is + performed right now. At least some validation should be easy to add (not + done this because it is a proof-of-concept and scheduled to change).</li> +</ul> + </li> + <li>Universal access to all syslog fields (missing ones being emulated) was + also quite easy. It took me around another 2 hours to integrate emulation of + non-present fields into the code base.</li> + <li>The version at the start of the message makes it easy to detect if we + have legacy syslog or syslog-protocol. Do NOT move it to somewhere inside + the middle of the message, that would complicate things. It might not be + totally fail-safe to just rely on "1 " as the "cookie" for a syslog-protocol. + Eventually, it would be good to add some more uniqueness, e.g. "@#1 ".</li> + <li>I have no (easy) way to detect truncation if that happens on the UDP + stack. All I see is that I receive e.g. a 4K message. If the message was e.g. + 6K, I received two chunks. The first chunk (4K) is correctly detected as a + syslog-protocol message, the second (2K) as legacy syslog. I do not see what + we could do against this. This questions the usefulness of the TRUNCATE bit. + Eventually, I could look at the UDP headers and see that it is a fragment. I + have looked at a network sniffer log of the conversation. This looks like + two totally-independent messages were sent by the sender stack.</li> + <li>The maximum message size is currently being configured via a + preprocessor #define. It can easily be set to 2K or 4K, but more than 4K is + not possible because of UDP stack limitations. Eventually, this can be + worked around, but I have not done this yet.</li> + <li>rsyslogd can accept syslog-protocol formatted messages but is able to + relay them in legacy format. I find this a must in real-life deployments. + For this, I needed to do some field mapping so that APP-NAME/PROCID are + mapped into a TAG.</li> + <li>rsyslogd can also accept legacy syslog message and relay them in + syslog-protocol format. For this, I needed to apply some sub-parsing of the + TAG, which on most occasions provides correct results. There might be some + misinterpretations but I consider these to be mostly non-intrusive. </li> + <li>Messages received from the syslog API (the normal case under *nix) also + do not have APP-NAME and PROCID and I must parse them out of TAG as + described directly above. As such, this algorithm is absolutely vital to + make things work on *nix.</li> + <li>I have an issue with messages received via the syslog(3) API (or, to be + more precise, via the local domain socket this API writes to): These + messages contain a timestamp, but that timestamp does neither have the year + nor the high-resolution time. The year is no real issue, I just take the + year of the reception of that message. There is a very small window of + exposure for messages read from the log immediately after midnight Jan 1st. + The message in the domain socket might have been written immediately before + midnight in the old year. I think this is acceptable. However, I can not + assign a high-precision timestamp, at least it is somewhat off if I take the + timestamp from message reception on the local socket. An alternative might + be to ignore the timestamp present and instead use that one when the message + is pulled from the local socket (I am talking about IPC, not the network - + just a reminder...). This is doable, but eventually not advisable. It looks + like this needs to be resolved via a configuration option.</li> + <li>rsyslogd already advertised its origin information on application + startup (in a syslog-protocol-14 compatible format). It is fairly easy to + include that with any message if desired (not currently done).</li> + <li>A big problem I noticed are malformed messages. In -syslog-protocol, we + recommend/require to discard malformed messages. However, in practice users + would like to see everything that the syslogd receives, even if it is in + error. For the first version, I have not included any error handling at all. + However, I think I would deliberately ignore any "discard" requirement. My + current point of view is that in my code I would eventually flag a message + as being invalid and allow the user to filter on this invalidness. So these + invalid messages could be redirected into special bins.</li> + <li>The error logging recommendations (those I insisted on;)) are not really + practical. My application has its own error logging philosophy and I will + not change this to follow a draft.</li> + <li>Relevance of support for leap seconds and senders without knowledge of + time is questionable. I have not made any specific provisions in the code + nor would I know how to handle that differently. I could, however, pull the + local reception timestamp in this case, so it might be useful to have this + feature. I do not think any more about this for the initial proof-of-concept. + Note it as a potential problem area, especially when logging to databases.</li> + <li>The HOSTNAME field for internally generated messages currently contains + the hostname part only, not the FQDN. This can be changed inside the code + base, but it requires some thinking so that thinks are kept compatible with + legacy syslog. I have not done this for the proof-of-concept, but I think it + is not really bad. Maybe an hour or half a day of thinking.</li> + <li>It is possible that I did not receive a TAG with legacy syslog or via + the syslog API. In this case, I can not generate the APP-NAME. For + consistency, I have used "-" in such cases (just like in PROCID, MSGID and + STRUCTURED-DATA).</li> + <li>As an architectural side-effect, syslog-protocol formatted messages can + also be transmitted over non-standard syslog/raw tcp. This implementation + uses the industry-standard LF termination of tcp syslog records. As such, + syslog-protocol messages containing a LF will be broken invalidly. There is + nothing that can be done against this without specifying a TCP transport. + This issue might be more important than one thinks on first thought. The + reason is the wide deployment of syslog/tcp via industry standard.</li> +</ul> +<p><b>Some notes on syslog-transport-udp-06</b></p> +<ul> + <li>I did not make any low-level modifications to the UDP code and think I + am still basically covered with this I-D.</li> + <li>I deliberately violate section 3.3 insofar as that I do not necessarily + accept messages destined to port 514. This feature is user-required and a + must. The same applies to the destination port. I am not sure if the "MUST" + in section 3.3 was meant that this MUST be an option, but not necessarily be + active. The wording should be clarified.</li> + <li>section 3.6: I do not check checksums. See the issue with discarding + messages above. The same solution will probably be applied in my code.</li> +</ul> +<p> </p> +<h2>Conlusions/Suggestions</h2> +<p>These are my personal conclusions and suggestions. Obviously, they must be +discussed ;)</p> +<ul> + <li>NUL should be disallowed in MSG</li> + <li>As it is not possible to definitely know the character encoding of the + application-provided message, MSG should <b>not</b> be specified to use UTF-8 + exclusively. Instead, it is suggested that any encoding may be used but + UTF-8 is preferred. To detect UTF-8, the MSG should start with the UTF-8 + byte order mask of "EF BB BF" if it is UTF-8 encoded (see section 155.9 of + <a href="http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf"> + http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf</a>) </li> + <li>Requirements to drop messages should be reconsidered. I guess I would + not be the only implementor ignoring them.</li> + <li>Logging requirements should be reconsidered and probably be removed.</li> + <li>It would be advisable to specify "-" for APP-NAME is the name is not + known to the sender.</li> + <li>The implications of the current syslog/tcp industry standard on + syslog-protocol should be further evaluated and be fully understood</li> +</ul> +<p> </p> +</body> +</html> + |