summaryrefslogtreecommitdiffstats
path: root/doc/lookup_tables.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/lookup_tables.html')
-rw-r--r--doc/lookup_tables.html205
1 files changed, 205 insertions, 0 deletions
diff --git a/doc/lookup_tables.html b/doc/lookup_tables.html
new file mode 100644
index 00000000..d72810f1
--- /dev/null
+++ b/doc/lookup_tables.html
@@ -0,0 +1,205 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html><head>
+<title>Lookup Tables</title>
+</head>
+
+<body>
+<h1>Lookup Tables</h1>
+
+<p><b><font color="red">NOTE: this is</font> proposed functionality, which is
+<font color="red">NOT YET IMPLEMENTED</font>!</b>
+
+<p><b>Lookup tables</a> are a powerful construct
+to obtain "class" information based on message content (e.g. to build
+log file names for different server types, departments or remote
+offices).</b>
+<p>The base idea is to use a message variable as an index into a table which then
+returns another value. For example, $fromhost-ip could be used as an index, with
+the table value representing the type of server or the department or remote office
+it is located in. A main point with lookup tables is that the lookup is very fast.
+So while lookup tables can be emulated with if-elseif constructs, they are generally
+much faster. Also, it is possible to reload lookup tables during rsyslog runtime without
+the need for a full restart.
+<p>The lookup tables itself exists in a separate configuration file (one per table). This
+file is loaded on rsyslog startup and when a reload is requested.
+<p>There are different types of lookup tables:
+<ul>
+<li><b>string</b> - the value to be looked up is an arbitrary string. Only exact
+some strings match.
+<li><b>array</b> - the value to be looked up is an integer number from a consequtive set.
+The set does not need to start at zero or one, but there must be no number missing. So, for example
+5,6,7,8,9 would be a valid set of index values, while 1,2,4,5 would not be (due to missing
+2).
+A match happens if the requested number is present.
+<li><b>sparseArray</b> - the value to be looked up is an integer value, but there may
+be gaps inside the set of values (usually there are large gaps). A typical use case would
+be the matching of IPv4 address information. A match happens on the first value that is
+less than or equal to the requested value.
+</ul>
+<p>Note that index integer numbers are represented by unsigned 32 bits.
+<p>Lookup tables can be access via the lookup() built-in function. The core idea is to
+set a local variable to the lookup result and later on use that local variable in templates.
+<p>More details on usage now follow.
+<h2>Lookup Table File Format</h2>
+<p>Lookup table files contain a single JSON object. This object contains of a header and a
+table part.
+<h3>Header</h3>
+<p>The header is the top-level json. It has paramters "version", "nomatch", and "type".
+The version parameter
+must be given and must always be one for this version of rsyslog. The nomatch
+parameter is optional. If specified, it contains the value to be used if lookup()
+is provided an index value for which no entry exists. The default for
+"nomatch" is the empty string. Type specifies the type of lookup to be done.
+<h3>Table</h3>
+This must be an array of elements, even if only a single value exists (for obvious
+reasons, we do not expect this to occur often). Each array element must contain two
+fields "index" and "value".
+<h3>Example</h3>
+<p>This is a sample of how an ip-to-office mapping may look like:
+<pre>
+{ "version":1, "nomatch":"unk", "type":"string",
+ "table":[ {"index":"10.0.1.1", "value":"A" },
+ {"index":"10.0.1.2", "value":"A" },
+ {"index":"10.0.1.3", "value":"A" },
+ {"index":"10.0.2.1", "value":"B" },
+ {"index":"10.0.2.2", "value":"B" },
+ {"index":"10.0.2.3", "value":"B" }
+ ]
+}
+</pre>
+Note: if a different IP comes in, the value "unk"
+is returend thanks to the nomatch parameter in
+the first line.
+<p>
+<h2>RainerScript Statements</h2>
+<h3>lookup_table() Object</h3>
+<p>This statement defines and intially loads a lookup table. Its format is
+as follows:
+<pre>
+lookup_table(name="name" file="/path/to/file" reloadOnHUP="on|off")
+</pre>
+<h4>Parameters</h4>
+<ul>
+ <li><b>name</b> (mandatory)<br>
+ Defines the name of lookup table for further reference
+ inside the configuration. Names must be unique. Note that
+ it is possible, though not advisible, to have different
+ names for the same file.
+ <li><b>file</b> (mandatory)<br>
+ Specifies the full path for the lookup table file. This file
+ must be readable for the user rsyslog is run under (important
+ when dropping privileges). It must point to a valid lookup
+ table file as described above.
+ <li><b>reloadOnHUP</b> (optional, default "on")<br>
+ Specifies if the table shall automatically be reloaded
+ as part of HUP processing. For static tables, the
+ default is "off" and specifying "on" triggers an
+ error message. Note that the default of "on" may be
+ somewhat suboptimal performance-wise, but probably
+ is what the user intuitively expects. Turn it off
+ if you know that you do not need the automatic
+ reload capability.
+</ul>
+
+<h3>lookup() Function</h3>
+<p>This function is used to actually do the table lookup. Format:
+<pre>
+lookup_table("name", indexvalue)
+</pre>
+<h4>Parameters</h4>
+<ul>
+ <li><b>return value</b><br>
+ The function returns the string that is associated with the
+ given indexvalue. If the indexvalue is not present inside the
+ lookup table, the "nomatch" string is returned (or an empty string
+ if it is not defined).
+ <li><b>name</b> (constant string)<br>
+ The lookup table to be used. Note that this must be specificed as a
+ constant. In theory, variable table names could be made possible, but
+ their runtime behaviour is not as good as for static names, and we do
+ not (yet) see good use cases where dynamic table names could be useful.
+ <li><b>indexvalue</b> (expression)<br>
+ The value to be looked up. While this is an arbitrary RainerScript expression,
+ it's final value is always converted to a string in order to conduct
+ the lookup. For example, "lookup(table, 3+4)" would be exactly the same
+ as "lookup(table, "7")". In most cases, indexvalue will probably be
+ a single variable, but it could also be the result of all RainerScript-supported
+ expression types (like string concatenation or substring extraction).
+ Valid samples are "lookup(name, $fromhost-ip &amp; $hostname)" or
+ "lookup(name, substr($fromhost-ip, 0, 5))" as well as of course the
+ usual "lookup(table, $fromhost-ip)".
+</ul>
+
+
+<h3>load_lookup_table Statement</h3>
+
+<p><b>Note: in the final implementation, this MAY be implemented as an action.
+This is a low-level decesion that must be made during the detail development
+process. Parameters and semantics will remain the same of this happens.</b>
+
+<p>This statement is used to reload a lookup table. It will fail if
+the table is static. While this statement is executed, lookups to this table
+are temporarily blocked. So for large tables, there may be a slight performance
+hit during the load phase. It is assume that always a triggering condition
+is used to load the table.
+<pre>
+load_lookup_table(name="name" errOnFail="on|off" valueOnFail="value")
+</pre>
+<h4>Parameters</h4>
+<ul>
+ <li><b>name</b> (string)<br>
+ The lookup table to be used.
+ <li><b>errOnFail</b> (boolean, default "on")<br>
+ Specifies whether or not an error message is to be emitted if
+ there are any problems reloading the lookup table.
+ <li><b>valueOnFail</b> (optional, string)<br>
+ This parameter affects processing if the lookup table cannot
+ be loaded for some reason: If the parameter is not present,
+ the previous table will be kept in use. If the parameter is
+ given, the previous table will no longer be used, and instead
+ an empty table be with nomath=valueOnFail be generated. In short,
+ that means when the parameter is set and the reload fails,
+ all matches will always return what is specified in valueOnFail.
+</ul>
+
+<h3>Usage example</h3>
+<p>For clarity, we show only those parts of rsyslog.conf that affect
+lookup tables. We use the remote office example that an example lookup
+table file is given above for.
+<pre>
+lookup_table(name="ip2office" file="/path/to/ipoffice.lu"
+ reloadOnHUP="off")
+
+
+template(name="depfile" type="string"
+ string="/var/log/%$usr.dep%/messages")
+
+set $usr.dep = lookup("ip2office", $fromhost-ip);
+action(type="omfile" dynfile="depfile")
+
+# support for reload "commands"
+if $fromhost-ip == "10.0.1.123"
+ and $msg contains "reload office lookup table"
+ then
+ load_lookup_table(name="ip2office" errOnFail="on")
+</pre>
+
+<p>Note: for performance reasons, it makes sense to put the reload command into
+a dedicated ruleset, bound to a specific listener - which than should also
+be sufficiently secured, e.g. via TLS mutual auth.
+
+<h2>Implementation Details</h2>
+<p>The lookup table functionality is implemented via highly efficient algorithms.
+The string lookup is based on a parse tree and has O(1) time complexity. The array
+lookup is also O(1). In case of sparseArray, we have O(log n).
+<p>To preserve space and, more important, increase cache hit performance, equal
+data values are only stored once, no matter how often a lookup index points to them.
+<p>[<a href="rsyslog_conf.html">rsyslog.conf overview</a>]
+[<a href="manual.html">manual index</a>] [<a href="http://www.rsyslog.com/">rsyslog site</a>]</p>
+<p><font size="2">This documentation is part of the
+<a href="http://www.rsyslog.com/">rsyslog</a> project.<br>
+Copyright &copy; 2013 by <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a> and
+<a href="http://www.adiscon.com/">Adiscon</a>.
+Released under the GNU GPL version 3 or higher.</font></p>
+</body>
+</html>