Improved NetFlowV9 support #21
much of this is still wip
currently fixing tests
Aggregating the data flowsets does not work, because all records are based on the packet's timestamp. To simplify parsing in the codec, the aggregator now collects all templates/option templates into a protobuf and adds received and buffered data flows to be parsed. The NetFlow packets are preserved completely and can also contain templates; the codec will not use them, though, only the aggregator does.
private void queueBufferedPackets(Set<TemplateKey> templates, Set<ChannelBuffer> packetsToSend, TemplateKey templateKey) {
This actually requires checking all templates used by a packet, so this code will change soon and is not the final version.
The codec-aggregator is null in the super class, so the put call added it after the raw-message handler; thus the code never ran and broke parsing.
The previous implementation only checked that a single template id was present for each packet, which in general is wrong if not all templates arrive at the same time (which might happen for large numbers of active templates). The new implementation manually checks each packet's template requirements against the ids of received templates, for the current remote address/source id combination.
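A minimal sketch of that check, with hypothetical names (not the plugin's actual classes): packets are buffered per exporter (remote address + source id) until the set of received template ids covers everything the packet references.

```java
import java.util.*;

// Illustrative sketch only: buffer packets per exporter until every
// template id the packet requires has been received for that exporter.
public class TemplateBuffer {
    // key: "remoteAddress/sourceId", value: template ids received so far
    private final Map<String, Set<Integer>> receivedTemplates = new HashMap<>();
    // packets waiting for templates, stored whole so they can be re-parsed later
    private final Map<String, List<byte[]>> bufferedPackets = new HashMap<>();

    public void addTemplate(String exporterKey, int templateId) {
        receivedTemplates.computeIfAbsent(exporterKey, k -> new HashSet<>()).add(templateId);
    }

    // Returns true if the packet can be parsed now; otherwise buffers it.
    public boolean offer(String exporterKey, byte[] packet, Set<Integer> requiredTemplates) {
        Set<Integer> seen = receivedTemplates.getOrDefault(exporterKey, Collections.emptySet());
        if (seen.containsAll(requiredTemplates)) {
            return true;
        }
        bufferedPackets.computeIfAbsent(exporterKey, k -> new ArrayList<>()).add(packet);
        return false;
    }

    public int bufferedCount(String exporterKey) {
        return bufferedPackets.getOrDefault(exporterKey, Collections.emptyList()).size();
    }
}
```

The key point versus the old code is `containsAll`: a packet referencing two templates is released only once both have arrived, not when any single id is present.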
I am seeing a field nf_nf_field_153 when ingesting v9 via pmacctd. Is this a dynamically generated field? Also, the nf prefix is duplicated.
Ah shit. Yeah the prefix is broken now.
I'll fix it tomorrow
I also saw this. Can this still happen? I thought we are waiting until we get a template before we pass on the packet.
That's weird. Do you have a config that causes this?
The field 153 is because our default field definition list is missing that type. It is "flowEndMilliseconds" from https://www.iana.org/assignments/ipfix/ipfix.xhtml
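For reference, element 153 in the IANA registry is an 8-byte unsigned integer of milliseconds since the UNIX epoch, sent in network byte order. A small decoding sketch (class and method names are illustrative, not the plugin's actual API):

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Sketch: decode IANA IPFIX element 153 (flowEndMilliseconds).
public class FlowEndMillis {
    public static Instant decode(byte[] value) {
        // ByteBuffer reads big-endian by default, matching the wire format
        long millis = ByteBuffer.wrap(value).getLong();
        return Instant.ofEpochMilli(millis);
    }
}
```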
I would strongly advise against this, as IPFIX ("NetFlow version 10") has some incompatible fields with NetFlow version 9. For example, id 1 is "octetDeltaCount" in IPFIX (8 bytes), while it's the number of incoming bytes in NetFlow 9 (4 bytes) (see http://netflow.caligare.com/netflow_v9.htm).
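A sketch of why mixing the two definition lists is dangerous (hypothetical names): reading field id 1 with the wrong width not only yields a wrong value, it also shifts the offset of every field that follows in the record.

```java
import java.nio.ByteBuffer;

// Sketch: field id 1 is a 4-byte incoming-bytes counter in NetFlow v9,
// but the 8-byte octetDeltaCount in IPFIX.
public class CounterField {
    public static long readFieldId1(ByteBuffer buf, boolean ipfix) {
        return ipfix ? buf.getLong()                         // 8 bytes in IPFIX
                     : Integer.toUnsignedLong(buf.getInt()); // 4 bytes in v9
    }
}
```

With the same 8 wire bytes, the v9 reader consumes 4 of them and leaves the buffer positioned mid-value for the IPFIX interpretation, so every subsequent field would be misparsed.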
fixes double prefixing of fields
LGTM 👍
* move template caching and flow buffering in custom message aggregator (much of this is still wip)
* remove template cache
* wip for codec aggregation and parsing custom format of v9 (currently fixing tests)
* wip
* migrate tests
* fix v9 parsing by preserving the complete packets during buffering: aggregating the data flowsets does not work, because all records are based on the packet's timestamp; to simplify parsing in the codec, the aggregator now collects all templates/option templates into a protobuf and adds received and buffered data flows to be parsed; the netflow packets are preserved completely and can also contain templates; the codec will not use them, though, only the aggregator does
* remove duplicated license header
* update protobuf comment
* update comment
* tweak license header to avoid diff
* fix guice setup for transport
* fix handler setup: the codec-aggregator is null in the super class, so the put call added it after the raw-message handler; thus the code never ran and broke parsing
* change how the packet cache works: the previous implementation only checked that a single template id was present for each packet, which in general is wrong if not all templates arrive at the same time (which might happen for large numbers of active templates); the new implementation manually checks each packet's template requirements against the ids of received templates, for the current remote address/source id combination
* prefix netflow v9 fields with nf_
* remove unused optional template atomic ref
* don't prefix unknown field names, that is done centrally now (fixes double prefixing of fields)
* don't forget to fix test
* add flow timestamp fields from ipfix
* fix test after field definition list update

(cherry picked from commit 4e9d68f)
I can confirm that #20 looks solved now: I am running rc5 and it's been dealing with our full netflow configuration without any apparent problem for some hours.
Hi everybody, I'm having trouble with an Invalid FlowVersion exception (Invalid NetFlow version 0) when trying to log the flows of a Netgear switch. Graylog v2.3.1+9f2c6ef. And this is the server.log result: Thanks in advance!
@moadiv This plugin currently doesn't support sFlow, see #3. Please post questions about this plugin to our discussion forum or join the #graylog channel on freenode IRC. Thank you!
This PR changes the way templates are handled.
Since in V9 the template flowsets are not sent with every packet, the implementation must buffer packets until it has received the templates needed to parse them. The same is true for option templates.
This implementation moves the buffering and template aggregation into a custom codec aggregator, so that the codec itself, which runs after journalling the message, can assume that it has all the templates it needs to successfully parse a packet. This is even more important when processing a journal after a restart.
The RFC requires that templates not be written to disk (or otherwise stored independently of the data flows), because they might change at any time. We therefore colocate them with the data itself, taking the performance hit of writing more bytes to disk in exchange for a safer implementation.
Thus this implementation does not lose data when it does not yet have the templates; the exporter re-sends them regularly for each observation domain.
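The colocation described above can be sketched roughly as follows. This is a hypothetical simplification with invented names; the real plugin serializes the equivalent structure as a protobuf.

```java
import java.util.*;

// Sketch of a record as journalled: the templates known for an exporter
// travel together with the complete raw packets, so the codec can re-parse
// everything after a restart without an external template cache.
public class AggregatedRecord {
    // template id -> raw template definition bytes
    public final Map<Integer, byte[]> templates = new LinkedHashMap<>();
    // complete NetFlow v9 packets, preserved as received
    public final List<byte[]> packets = new ArrayList<>();

    // The cost of colocation: extra bytes written to disk per record.
    public int payloadBytes() {
        int total = 0;
        for (byte[] t : templates.values()) total += t.length;
        for (byte[] p : packets) total += p.length;
        return total;
    }
}
```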
To be compatible with Graylog 2.3, this change comes with a custom codec aggregator, for 3.0 we can migrate the code back into the server.
fixes #18
fixes #19
fixes #20