author Rainer Gerhards <rgerhards@adiscon.com> 2013-03-04 12:59:53 +0100
committer Rainer Gerhards <rgerhards@adiscon.com> 2013-03-04 12:59:53 +0100
commit a6aa2b75ee9da97a6d2d98701af8bff01064afe4
tree 30b4d5fe360aee350a7b05ad36e640485f4cbf37
parent e17c0267dbda37641fd93e51ce28c9ab2307c0c3
mmanon: complete ipv4 functionality
-rw-r--r-- doc/mmanon.html | 58
-rw-r--r-- plugins/mmanon/mmanon.c | 135
-rw-r--r-- runtime/msg.c | 8
-rw-r--r-- runtime/msg.h | 1
-rw-r--r-- runtime/rsyslog.h | 1
5 files changed, 159 insertions, 44 deletions
diff --git a/doc/mmanon.html b/doc/mmanon.html
index af462d2e..81cf4e8e 100644
--- a/doc/mmanon.html
+++ b/doc/mmanon.html
@@ -7,15 +7,25 @@
<a href="rsyslog_conf_modules.html">back</a>
<h1>IP Address Anonymization Module (mmanon)</h1>
-<p><b>Module Name:&nbsp;&nbsp;&nbsp; omjournal</b></p>
+<p><b>Module Name:&nbsp;&nbsp;&nbsp; mmanon</b></p>
<p><b>Author: </b>Rainer Gerhards &lt;rgerhards@adiscon.com&gt;</p>
<p><b>Available since</b>: 7.3.7</p>
<p><b>Description</b>:</p>
<p>The mmanon module permits to anonymize IP addresses. It is a message
modification module that actually changes the IP address inside the message,
so after calling mmanon, the original message can no longer be obtained.
-Note that anonymization will break digital signutures on the message, if
-such exists.
+Note that anonymization will break digital signatures on the message, if
+they exist.
+<p><i>How are IP-Addresses defined?</i>
+<p>We assume that an IP address consists of four octets in dotted notation,
+where each of the octets has a value between 0 and 255, inclusive. After
+the last octet, there must be either a space or a colon. So, for example,
+"1.2.3.4 Test" and "1.2.3.4:514 Test" are detected as containing valid IP
+addresses, whereas this is not the case for "1.2.300.4 Test" or
+"1.2.3.4-Test". The message text may contain multiple addresses. If so,
+each of them is anonymized (according to the same rules).
+<b>Important:</b> We may change the set of acceptable characters after
+the last octet in the future, if there are good reasons to do so.
<p>&nbsp;</p>
<p><b>Module Configuration Parameters</b>:</p>
@@ -23,6 +33,18 @@ such exists.
<p>&nbsp;</p>
<p><b>Action Configuration Parameters</b>:</p>
<ul>
+<li><b>mode</b> - default "rewrite"<br>
+There are two modes, "simple" and "rewrite". In simple mode, only whole
+octets can be anonymized and the length of the message is never changed.
+This means that when the last three octets of the address 10.1.12.123 are
+anonymized with replacementChar "0", the result will be 10.0.00.000. The length
+of the original octets is thus still visible and may be used to draw some
+privacy-invasive conclusions. This mode is slightly faster than "rewrite" mode,
+and this may matter in high throughput environments.<br>
+The default "rewrite" mode will do full anonymization of any number of bits
+and it will also normalize the address, so that the length of the original
+octets is no longer visible. So in the above example, 10.1.12.123 would
+be anonymized to 10.0.0.0.
<li><b>ipv4.bits</b> - default 16<br>
This sets the number of bits that should be anonymized (bits are from the
right, so lower bits are anonymized first). This setting permits to save
@@ -34,23 +56,20 @@ but conservative guideline for other countries.<br>
Note: when in simple mode, only bits on a byte boundary can be specified.
As such, any value other than 8, 16, 24 or 32 is invalid. If an invalid
value is given, it is rounded to the next byte boundary (so we favor stronger
-encyrption in that case). For example, a bit value of 12 will become 16 in
+anonymization in that case). For example, a bit value of 12 will become 16 in
simple mode (an error message is also emitted).
<li><b>replacementChar</b> - default "x"<br>
In simple mode, this sets the character
that the to-be-anonymized part of the IP address is to be overwritten
-with.
+with. In rewrite mode, this parameter is <b>not permitted</b>, as in
+this case we need not necessarily rewrite full octets. As such, the anonymized
+part is always zero-filled and replacementChar is of no use. If it is
+specified, an error message is emitted and the parameter is ignored.
</ul>
<p><b>Caveats/Known Bugs:</b>
<ul>
-<li><b>This module is currently experimental.</b> This does not mean
-the code is not solid. What it means is that the functionality is limited
-and it got limited practice drill so far.
<li><b>only IPv4</b> is supported
-<li>The anonymization replaces the numerical parts of the ip address.
-However, the number of digits is not normalized. So one can probably
-draw conlusions just based on the length of the various octets.
</ul>
<p><b>Samples:</b></p>
@@ -67,16 +86,27 @@ action(type="omfile" file="/path/to/anon.log")
<p>This next snippet is almost identical to the first one, but
here we anonymize the full IPv4 address. Note that by
modifying the number of bits, you can anonymize different parts
-of the address. Keep in mind that in simple mode, the bit values
+of the address. Keep in mind that in simple mode (used here), the bit values
must match IP address bytes, so for IPv4 only the values 8, 16, 24 and
32 are valid. Also, in this example the replacement is done
-via zeros instead of lower-case "x"-letters.
+via zeros instead of lower-case "x"-letters. Also keep in mind that
+"replacementChar" can only be set in simple mode.
<p><textarea rows="5" cols="60">module(load="mmanon")
action(type="omfile" file="/path/to/non-anon.log")
-action(type="mmanon" ipv4.bits="32" replacementChar="0")
+action(type="mmanon" ipv4.bits="32" mode="simple" replacementChar="0")
action(type="omfile" file="/path/to/anon.log")
</textarea>
+<p>The next snippet is also based on the first one, but anonymizes an
+"odd" number of bits, 12. The value of 12 is used by some folks as a
+compromise between keeping privacy and still permitting some
+more in-depth insight from log files. Note that anonymizing 12 bits
+may be insufficient to fulfill legal requirements (if such exist).
+<p><textarea rows="5" cols="60">module(load="mmanon")
+action(type="omfile" file="/path/to/non-anon.log")
+action(type="mmanon" ipv4.bits="12")
+action(type="omfile" file="/path/to/anon.log")
+</textarea>
<p>[<a href="rsyslog_conf.html">rsyslog.conf overview</a>] [<a href="manual.html">manual
index</a>] [<a href="http://www.rsyslog.com/">rsyslog site</a>]</p>
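The address definition documented above (four octets of 0 to 255 in dotted notation, followed by a space or a colon) can be sketched as a small standalone check. This is only an illustration, not the module's parser: the names `parse_octet` and `starts_with_ipv4` are hypothetical, and requiring at least one digit per octet is an assumption of this sketch that the documentation does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Parse one decimal octet starting at s[*pos] and advance *pos past it.
 * Returns the value, or -1 if no digit is present or the value exceeds 255.
 * (Hypothetical helper, not part of mmanon.) */
static int parse_octet(const char *s, size_t len, size_t *pos)
{
    size_t start = *pos;
    int value = 0;

    while (*pos < len && s[*pos] >= '0' && s[*pos] <= '9') {
        value = value * 10 + (s[*pos] - '0');
        if (value > 255)
            return -1;
        ++*pos;
    }
    return (*pos == start) ? -1 : value;
}

/* Returns 1 if s[start..] begins with a dotted IPv4 address that is
 * followed by a space or a colon; *end then indexes that terminator.
 * Returns 0 otherwise and leaves *end untouched. */
int starts_with_ipv4(const char *s, size_t len, size_t start, size_t *end)
{
    size_t pos = start;
    int k;

    assert(s != NULL && end != NULL);
    for (k = 0; k < 4; ++k) {
        if (parse_octet(s, len, &pos) < 0)
            return 0;
        if (k < 3) {
            /* the first three octets must each be followed by a dot */
            if (pos >= len || s[pos] != '.')
                return 0;
            ++pos;
        }
    }
    /* the documented terminators: space or colon */
    if (pos >= len || (s[pos] != ' ' && s[pos] != ':'))
        return 0;
    *end = pos;
    return 1;
}
```

With the documentation's own examples, `starts_with_ipv4("1.2.3.4:514 Test", 16, 0, &end)` returns 1 and sets `end` to 7, while the same call on `"1.2.300.4 Test"` or `"1.2.3.4-Test"` returns 0.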
diff --git a/plugins/mmanon/mmanon.c b/plugins/mmanon/mmanon.c
index c9add1a1..fc0c8a03 100644
--- a/plugins/mmanon/mmanon.c
+++ b/plugins/mmanon/mmanon.c
@@ -46,11 +46,25 @@ DEF_OMOD_STATIC_DATA
/* config variables */
+/* precomputed table of IPv4 anonymization masks */
+static const uint32_t ipv4masks[33] = {
+ 0xffffffff, 0xfffffffe, 0xfffffffc, 0xfffffff8,
+ 0xfffffff0, 0xffffffe0, 0xffffffc0, 0xffffff80,
+ 0xffffff00, 0xfffffe00, 0xfffffc00, 0xfffff800,
+ 0xfffff000, 0xffffe000, 0xffffc000, 0xffff8000,
+ 0xffff0000, 0xfffe0000, 0xfffc0000, 0xfff80000,
+ 0xfff00000, 0xffe00000, 0xffc00000, 0xff800000,
+ 0xff000000, 0xfe000000, 0xfc000000, 0xf8000000,
+ 0xf0000000, 0xe0000000, 0xc0000000, 0x80000000,
+ 0x00000000
+ };
+/* define operation modes we have */
+#define SIMPLE_MODE 0 /* just overwrite */
+#define REWRITE_MODE 1 /* rewrite IP address, canonicalized */
typedef struct _instanceData {
char replChar;
int8_t mode;
-# define SIMPLE_MODE 0 /* just overwrite */
struct {
int8_t bits;
} ipv4;
@@ -118,7 +132,7 @@ ENDfreeInstance
static inline void
setInstParamDefaults(instanceData *pData)
{
- pData->mode = SIMPLE_MODE;
+ pData->mode = REWRITE_MODE;
pData->replChar = 'x';
pData->ipv4.bits = 16;
}
@@ -145,6 +159,9 @@ CODESTARTnewActInst
if(!es_strbufcmp(pvals[i].val.d.estr, (uchar*)"simple",
sizeof("simple")-1)) {
pData->mode = SIMPLE_MODE;
+ } else if(!es_strbufcmp(pvals[i].val.d.estr, (uchar*)"rewrite",
+ sizeof("rewrite")-1)) {
+ pData->mode = REWRITE_MODE;
} else {
char *cstr = es_str2cstr(pvals[i].val.d.estr, NULL);
errmsg.LogError(0, RS_RET_INVLD_MODE,
@@ -183,6 +200,19 @@ CODESTARTnewActInst
"mmanon: invalid number of ipv4 bits "
"in simple mode, corrected to %d",
pData->ipv4.bits);
+ } else { /* REWRITE_MODE */
+ if(pData->ipv4.bits < 1 || pData->ipv4.bits > 32) {
+ pData->ipv4.bits = 32;
+ errmsg.LogError(0, RS_RET_INVLD_ANON_BITS,
+ "mmanon: invalid number of ipv4 bits "
+ "in rewrite mode, corrected to %d",
+ pData->ipv4.bits);
+ }
+ if(pData->replChar != 'x') {
+ errmsg.LogError(0, RS_RET_REPLCHAR_IGNORED,
+ "mmanon: replacementChar parameter is ignored "
+ "in rewrite mode");
+ }
}
CODE_STD_FINALIZERnewActInst
@@ -206,28 +236,52 @@ getnum(uchar *msg, int lenMsg, int *idx)
int num = 0;
int i = *idx;
-dbgprintf("DDDD: in getnum: %s\n", msg+(*idx));
while(i < lenMsg && msg[i] >= '0' && msg[i] <= '9') {
num = num * 10 + msg[i] - '0';
++i;
}
*idx = i;
-dbgprintf("DDDD: got octet %d\n", num);
return num;
}
+/* write an IP address octet to the output position */
+static int
+writeOctet(uchar *msg, int idx, int *nxtidx, uint8_t octet)
+{
+ /* emit every digit; a three-digit octet keeps an inner zero (e.g. 208) */
+ if(octet > 99) {
+ msg[idx++] = '0' + octet / 100;
+ msg[idx++] = '0' + (octet % 100) / 10;
+ } else if(octet > 9) {
+ msg[idx++] = '0' + octet / 10;
+ }
+ octet = octet % 10;
+ msg[idx++] = '0' + octet;
+
+ if(nxtidx != NULL) {
+ if(idx + 1 != *nxtidx) {
+ /* we got shorter, fix it! */
+ msg[idx] = '.';
+ *nxtidx = idx + 1;
+ }
+ }
+ return idx;
+}
+
/* currently works for IPv4 only! */
void
-anonip(instanceData *pData, uchar *msg, int lenMsg, int *idx)
+anonip(instanceData *pData, uchar *msg, int *pLenMsg, int *idx)
{
int i = *idx;
- int octet[4];
+ int octet;
+ uint32_t ipv4addr;
int ipstart[4];
int j;
+ int endpos;
+ int lenMsg = *pLenMsg;
-dbgprintf("DDDD: in anonip: %s\n", msg+(*idx));
while(i < lenMsg && (msg[i] < '0' || msg[i] > '9')) {
++i; /* skip to first number */
}
@@ -236,35 +290,55 @@ dbgprintf("DDDD: in anonip: %s\n", msg+(*idx));
/* got digit, let's see if ip */
ipstart[0] = i;
- octet[0] = getnum(msg, lenMsg, &i);
- if(octet[0] > 255 || msg[i] != '.') goto done;
+ octet = getnum(msg, lenMsg, &i);
+ if(octet > 255 || msg[i] != '.') goto done;
+ ipv4addr = (uint32_t) octet << 24;
++i;
ipstart[1] = i;
- octet[1] = getnum(msg, lenMsg, &i);
- if(octet[1] > 255 || msg[i] != '.') goto done;
+ octet = getnum(msg, lenMsg, &i);
+ if(octet > 255 || msg[i] != '.') goto done;
+ ipv4addr |= octet << 16;
++i;
ipstart[2] = i;
- octet[2] = getnum(msg, lenMsg, &i);
- if(octet[2] > 255 || msg[i] != '.') goto done;
+ octet = getnum(msg, lenMsg, &i);
+ if(octet > 255 || msg[i] != '.') goto done;
+ ipv4addr |= octet << 8;
++i;
ipstart[3] = i;
- octet[3] = getnum(msg, lenMsg, &i);
- if(octet[3] > 255 || !(msg[i] == ' ' || msg[i] == ':')) goto done;
+ octet = getnum(msg, lenMsg, &i);
+ if(octet > 255 || !(msg[i] == ' ' || msg[i] == ':')) goto done;
+ ipv4addr |= octet;
/* OK, we now found an ip address */
- if(pData->ipv4.bits == 8)
- j = ipstart[3];
- else if(pData->ipv4.bits == 16)
- j = ipstart[2];
- else if(pData->ipv4.bits == 24)
- j = ipstart[1];
- else /* due to our checks, this *must* be 32 */
- j = ipstart[0];
-dbgprintf("DDDD: ipstart is %d: %s\n", j, msg+j);
- while(j < i) {
- if(msg[j] != '.')
- msg[j] = pData->replChar;
- ++j;
+ if(pData->mode == SIMPLE_MODE) {
+ if(pData->ipv4.bits == 8)
+ j = ipstart[3];
+ else if(pData->ipv4.bits == 16)
+ j = ipstart[2];
+ else if(pData->ipv4.bits == 24)
+ j = ipstart[1];
+ else /* due to our checks, this *must* be 32 */
+ j = ipstart[0];
+ while(j < i) {
+ if(msg[j] != '.')
+ msg[j] = pData->replChar;
+ ++j;
+ }
+ } else { /* REWRITE_MODE */
+ ipv4addr &= ipv4masks[pData->ipv4.bits];
+ if(pData->ipv4.bits > 24)
+ writeOctet(msg, ipstart[0], &(ipstart[1]), ipv4addr >> 24);
+ if(pData->ipv4.bits > 16)
+ writeOctet(msg, ipstart[1], &(ipstart[2]), (ipv4addr >> 16) & 0xff);
+ if(pData->ipv4.bits > 8)
+ writeOctet(msg, ipstart[2], &(ipstart[3]), (ipv4addr >> 8) & 0xff);
+ endpos = writeOctet(msg, ipstart[3], NULL, ipv4addr & 0xff);
+ /* if we had truncation, we need to shrink the msg */
+ if(i - endpos > 0) {
+ *pLenMsg = lenMsg - (i - endpos);
+ memmove(msg+endpos, msg+i, lenMsg - i + 1);
+ i = endpos; /* continue scanning right after the rewritten address */
+ }
}
done: *idx = i;
@@ -281,10 +355,11 @@ CODESTARTdoAction
pMsg = (msg_t*) ppString[0];
lenMsg = getMSGLen(pMsg);
msg = getMSG(pMsg);
- DBGPRINTF("DDDD: calling mmanon with msg '%s'\n", msg);
for(i = 0 ; i < lenMsg ; ++i) {
- anonip(pData, msg, lenMsg, &i);
+ anonip(pData, msg, &lenMsg, &i);
}
+ if(lenMsg != getMSGLen(pMsg))
+ setMSGLen(pMsg, lenMsg);
ENDdoAction
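The arithmetic behind rewrite mode can be looked at in isolation: clear the lowest `ipv4.bits` bits of the address, then print the result in canonical dotted form. The following is a minimal sketch under stated assumptions, not the code from this commit. The names `anon_ipv4` and `format_ipv4` are hypothetical, the mask is computed with a shift instead of the `ipv4masks` lookup table (the values are the same), and the result is formatted into a separate buffer with `snprintf` rather than rewritten in place inside the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Clear the lowest `bits` bits of a host-order IPv4 address.
 * Equivalent to addr & ipv4masks[bits] for bits in 0..32. */
uint32_t anon_ipv4(uint32_t addr, int bits)
{
    assert(bits >= 0 && bits <= 32);
    /* shifting a 32-bit value by 32 is undefined in C, so treat it separately */
    return (bits == 32) ? 0 : (addr & (UINT32_MAX << bits));
}

/* Write addr in dotted decimal notation into buf.
 * Returns what snprintf returns: the length of the formatted text. */
int format_ipv4(uint32_t addr, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%u.%u.%u.%u",
                    (unsigned)(addr >> 24),
                    (unsigned)((addr >> 16) & 0xff),
                    (unsigned)((addr >> 8) & 0xff),
                    (unsigned)(addr & 0xff));
}
```

For the documentation's example, 10.1.12.123 is `0x0A010C7B`; with 24 bits anonymized it becomes `0x0A000000`, which formats as 10.0.0.0. With 12 bits, 192.168.213.77 (`0xC0A8D54D`) becomes 192.168.208.0, because only the upper four bits of the third octet survive.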
diff --git a/runtime/msg.c b/runtime/msg.c
index 68577ad0..c302a050 100644
--- a/runtime/msg.c
+++ b/runtime/msg.c
@@ -1468,6 +1468,14 @@ getRawMsg(msg_t *pM, uchar **pBuf, int *piLen)
}
+/* note: setMSGLen() is only for friends who really know what they
+ * do. Setting an invalid length can be disastrous!
+ */
+void setMSGLen(msg_t *pM, int lenMsg)
+{
+ pM->iLenMSG = lenMsg;
+}
+
int getMSGLen(msg_t *pM)
{
return((pM == NULL) ? 0 : pM->iLenMSG);
diff --git a/runtime/msg.h b/runtime/msg.h
index 564441b6..edf5ed98 100644
--- a/runtime/msg.h
+++ b/runtime/msg.h
@@ -198,6 +198,7 @@ uchar *getMSG(msg_t *pM);
char *getHOSTNAME(msg_t *pM);
char *getPROCID(msg_t *pM, sbool bLockMutex);
char *getAPPNAME(msg_t *pM, sbool bLockMutex);
+void setMSGLen(msg_t *pM, int lenMsg);
int getMSGLen(msg_t *pM);
char *getHOSTNAME(msg_t *pM);
diff --git a/runtime/rsyslog.h b/runtime/rsyslog.h
index 8dad09a4..e7a5dffb 100644
--- a/runtime/rsyslog.h
+++ b/runtime/rsyslog.h
@@ -401,6 +401,7 @@ enum rsRetVal_ /** return value. All methods return this if not specified oth
RS_RET_NO_RULEBASE = -2310,/**< mmnormalize: rulebase can not be found or otherwise invalid */
RS_RET_INVLD_MODE = -2311,/**< invalid mode specified in configuration */
RS_RET_INVLD_ANON_BITS = -2312,/**< mmanon: invalid number of bits to anonymize specified */
+ RS_RET_REPLCHAR_IGNORED = -2313,/**< mmanon: replacementChar parameter is ignored */
/* RainerScript error messages (range 1000.. 1999) */
RS_RET_SYSVAR_NOT_FOUND = 1001, /**< system variable could not be found (maybe misspelled) */