diff options
Diffstat (limited to 'doc/lookup_tables.html')
-rw-r--r-- | doc/lookup_tables.html | 205 |
1 files changed, 205 insertions, 0 deletions
diff --git a/doc/lookup_tables.html b/doc/lookup_tables.html new file mode 100644 index 00000000..d72810f1 --- /dev/null +++ b/doc/lookup_tables.html @@ -0,0 +1,205 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html><head> +<title>Lookup Tables</title> +</head> + +<body> +<h1>Lookup Tables</h1> + +<p><b><font color="red">NOTE: this is</font> proposed functionality, which is +<font color="red">NOT YET IMPLEMENTED</font>!</b> + +<p><b>Lookup tables</a> are a powerful construct +to obtain "class" information based on message content (e.g. to build +log file names for different server types, departments or remote +offices).</b> +<p>The base idea is to use a message variable as an index into a table which then +returns another value. For example, $fromhost-ip could be used as an index, with +the table value representing the type of server or the department or remote office +it is located in. A main point with lookup tables is that the lookup is very fast. +So while lookup tables can be emulated with if-elseif constructs, they are generally +much faster. Also, it is possible to reload lookup tables during rsyslog runtime without +the need for a full restart. +<p>The lookup tables itself exists in a separate configuration file (one per table). This +file is loaded on rsyslog startup and when a reload is requested. +<p>There are different types of lookup tables: +<ul> +<li><b>string</b> - the value to be looked up is an arbitrary string. Only exact +some strings match. +<li><b>array</b> - the value to be looked up is an integer number from a consequtive set. +The set does not need to start at zero or one, but there must be no number missing. So, for example +5,6,7,8,9 would be a valid set of index values, while 1,2,4,5 would not be (due to missing +2). +A match happens if the requested number is present. +<li><b>sparseArray</b> - the value to be looked up is an integer value, but there may +be gaps inside the set of values (usually there are large gaps). A typical use case would +be the matching of IPv4 address information. A match happens on the first value that is +less than or equal to the requested value. +</ul> +<p>Note that index integer numbers are represented by unsigned 32 bits. +<p>Lookup tables can be access via the lookup() built-in function. The core idea is to +set a local variable to the lookup result and later on use that local variable in templates. +<p>More details on usage now follow. +<h2>Lookup Table File Format</h2> +<p>Lookup table files contain a single JSON object. This object contains of a header and a +table part. +<h3>Header</h3> +<p>The header is the top-level json. It has paramters "version", "nomatch", and "type". +The version parameter +must be given and must always be one for this version of rsyslog. The nomatch +parameter is optional. If specified, it contains the value to be used if lookup() +is provided an index value for which no entry exists. The default for +"nomatch" is the empty string. Type specifies the type of lookup to be done. +<h3>Table</h3> +This must be an array of elements, even if only a single value exists (for obvious +reasons, we do not expect this to occur often). Each array element must contain two +fields "index" and "value". +<h3>Example</h3> +<p>This is a sample of how an ip-to-office mapping may look like: +<pre> +{ "version":1, "nomatch":"unk", "type":"string", + "table":[ {"index":"10.0.1.1", "value":"A" }, + {"index":"10.0.1.2", "value":"A" }, + {"index":"10.0.1.3", "value":"A" }, + {"index":"10.0.2.1", "value":"B" }, + {"index":"10.0.2.2", "value":"B" }, + {"index":"10.0.2.3", "value":"B" } + ] +} +</pre> +Note: if a different IP comes in, the value "unk" +is returend thanks to the nomatch parameter in +the first line. +<p> +<h2>RainerScript Statements</h2> +<h3>lookup_table() Object</h3> +<p>This statement defines and intially loads a lookup table. Its format is +as follows: +<pre> +lookup_table(name="name" file="/path/to/file" reloadOnHUP="on|off") +</pre> +<h4>Parameters</h4> +<ul> + <li><b>name</b> (mandatory)<br> + Defines the name of lookup table for further reference + inside the configuration. Names must be unique. Note that + it is possible, though not advisible, to have different + names for the same file. + <li><b>file</b> (mandatory)<br> + Specifies the full path for the lookup table file. This file + must be readable for the user rsyslog is run under (important + when dropping privileges). It must point to a valid lookup + table file as described above. + <li><b>reloadOnHUP</b> (optional, default "on")<br> + Specifies if the table shall automatically be reloaded + as part of HUP processing. For static tables, the + default is "off" and specifying "on" triggers an + error message. Note that the default of "on" may be + somewhat suboptimal performance-wise, but probably + is what the user intuitively expects. Turn it off + if you know that you do not need the automatic + reload capability. +</ul> + +<h3>lookup() Function</h3> +<p>This function is used to actually do the table lookup. Format: +<pre> +lookup_table("name", indexvalue) +</pre> +<h4>Parameters</h4> +<ul> + <li><b>return value</b><br> + The function returns the string that is associated with the + given indexvalue. If the indexvalue is not present inside the + lookup table, the "nomatch" string is returned (or an empty string + if it is not defined). + <li><b>name</b> (constant string)<br> + The lookup table to be used. Note that this must be specificed as a + constant. In theory, variable table names could be made possible, but + their runtime behaviour is not as good as for static names, and we do + not (yet) see good use cases where dynamic table names could be useful. + <li><b>indexvalue</b> (expression)<br> + The value to be looked up. While this is an arbitrary RainerScript expression, + it's final value is always converted to a string in order to conduct + the lookup. For example, "lookup(table, 3+4)" would be exactly the same + as "lookup(table, "7")". In most cases, indexvalue will probably be + a single variable, but it could also be the result of all RainerScript-supported + expression types (like string concatenation or substring extraction). + Valid samples are "lookup(name, $fromhost-ip & $hostname)" or + "lookup(name, substr($fromhost-ip, 0, 5))" as well as of course the + usual "lookup(table, $fromhost-ip)". +</ul> + + +<h3>load_lookup_table Statement</h3> + +<p><b>Note: in the final implementation, this MAY be implemented as an action. +This is a low-level decesion that must be made during the detail development +process. Parameters and semantics will remain the same of this happens.</b> + +<p>This statement is used to reload a lookup table. It will fail if +the table is static. While this statement is executed, lookups to this table +are temporarily blocked. So for large tables, there may be a slight performance +hit during the load phase. It is assume that always a triggering condition +is used to load the table. +<pre> +load_lookup_table(name="name" errOnFail="on|off" valueOnFail="value") +</pre> +<h4>Parameters</h4> +<ul> + <li><b>name</b> (string)<br> + The lookup table to be used. + <li><b>errOnFail</b> (boolean, default "on")<br> + Specifies whether or not an error message is to be emitted if + there are any problems reloading the lookup table. + <li><b>valueOnFail</b> (optional, string)<br> + This parameter affects processing if the lookup table cannot + be loaded for some reason: If the parameter is not present, + the previous table will be kept in use. If the parameter is + given, the previous table will no longer be used, and instead + an empty table be with nomath=valueOnFail be generated. In short, + that means when the parameter is set and the reload fails, + all matches will always return what is specified in valueOnFail. +</ul> + +<h3>Usage example</h3> +<p>For clarity, we show only those parts of rsyslog.conf that affect +lookup tables. We use the remote office example that an example lookup +table file is given above for. +<pre> +lookup_table(name="ip2office" file="/path/to/ipoffice.lu" + reloadOnHUP="off") + + +template(name="depfile" type="string" + string="/var/log/%$usr.dep%/messages") + +set $usr.dep = lookup("ip2office", $fromhost-ip); +action(type="omfile" dynfile="depfile") + +# support for reload "commands" +if $fromhost-ip == "10.0.1.123" + and $msg contains "reload office lookup table" + then + load_lookup_table(name="ip2office" errOnFail="on") +</pre> + +<p>Note: for performance reasons, it makes sense to put the reload command into +a dedicated ruleset, bound to a specific listener - which than should also +be sufficiently secured, e.g. via TLS mutual auth. + +<h2>Implementation Details</h2> +<p>The lookup table functionality is implemented via highly efficient algorithms. +The string lookup is based on a parse tree and has O(1) time complexity. The array +lookup is also O(1). In case of sparseArray, we have O(log n). +<p>To preserve space and, more important, increase cache hit performance, equal +data values are only stored once, no matter how often a lookup index points to them. +<p>[<a href="rsyslog_conf.html">rsyslog.conf overview</a>] +[<a href="manual.html">manual index</a>] [<a href="http://www.rsyslog.com/">rsyslog site</a>]</p> +<p><font size="2">This documentation is part of the +<a href="http://www.rsyslog.com/">rsyslog</a> project.<br> +Copyright © 2013 by <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a> and +<a href="http://www.adiscon.com/">Adiscon</a>. +Released under the GNU GPL version 3 or higher.</font></p> +</body> +</html> |