Netflow anonymization

From GEANT2-JRA1 Wiki

The Netflow task within JRA1 will provide an infrastructure that collects Netflow data from routers and makes that data available to users. In most cases (excluding NOCs and security-related parties such as CERTs), Netflow data will be anonymized before it is exported and presented to users. Anonymization methods and tools therefore need to be evaluated in order to provide anonymized Netflow data.

LOBSTER

The LOBSTER project implements several tools and methods for anonymization. The LOBSTER FAQ contains some interesting notes about anonymization within the LOBSTER framework:

'Anonymization is a core part of the LOBSTER infrastructure. It provides a large set of anonymization primitives that can be applied up to the application layer. Primitives include hashing (MD5, SHA, CRC32, AES and DES algorithms), mapping to sequential values, replace with constant, mapping based on distribution functions (uniform and Gaussian), prefix preserving (for IP addresses), regular expression substitution, checksum adjust (for all protocols) and removal of fields (for application level protocols), thus providing adequate functionality for every user needs. Functions can be applied to any field of most common protocols such as IP, TCP, UDP, ICMP, HTTP or FTP. Anonymization can also be transparently applied to streams rather than raw packets. The administrator is able to define practically any anonymization policy that will be forced to network packets. The anonymization function is currently part of the LOBSTER software.

Additionally, a graphical anonymization policy creation tool is provided. Since anonymization is a fundamental aspect of the LOBSTER framework, it is important to make the specification of anonymization policies as safe and convenient as possible. We do this by providing a specification language SiSaL (Scripting Sanitization Language). SiSaL allows an anonymization policy to be specified in a flexible and concise manner. This clarity of specification avoids errors, and allows a network administrator to confidently specify complex anonymization policies.

The LOBSTER software includes anonymization support for packets. This anonymization functionality has also been incorporated in this standalone tool called AnonTool. An anonymization application is provided, including some of the anonymization features available. Moreover, the user can use the anonymization API provided to create his own applications in the same way that a mapi application is written.'

There is also a deliverable: D1.1a - Anonymization Framework Definition

AnonTool

You can download AnonTool at http://www.ics.forth.gr/dcs/Activities/Projects/anontool.html.

AnonTool is an open-source implementation of the Anonymization API (AAPI), which provides an easy to use, flexible, and efficient set of functions for network traffic anonymization. AnonTool operates either on live traffic or on captured packet traces in the tcpdump format. Currently AnonTool supports selective anonymization for the fields of the following protocols: IP, TCP/UDP, HTTP, FTP, Netflow v5 and v9. Three applications have been implemented on top of this library. One provides basic anonymization functionality for the IP/TCP/UDP protocols, while two others anonymize version 5 and version 9 Netflow datagrams, respectively.

The following functions are available: prefix-preserving anonymization of IP addresses, mapping of TCP ports to integers, zeroing of TCP/IP options, replacing the TCP/UDP payload with a hash and fixing checksums, mapping IP addresses to integers, removing the TCP/UDP payload, printing anonymized packets, reading packets from an Ethernet interface or a pcap file, and dumping anonymized packets to a pcap file.

Compiling the tool is simple: run make in the source directory. The tool depends on the pcap, libnet and pcre libraries, so they must be installed as well. On Debian systems this is done with apt-get install libnet1 libpcre3 libpcap0.8.

Application options: ./anon_packets [ -f input_file | -i interface ] [-a -t -d -c -z -p -h] output

   -a ANONYMIZE IP addresses (PREFIX, MAP, ZERO)
   -t ANONYMIZE TCP ports (MAP, ZERO)
   -d ANONYMIZE TCP/UDP payload (STRIP, ZERO, HASH)
   -c Fix checksums
   -z Zero tcp and ip options
   -p Print anonymized packets
   -h Print this help message

When anonymizing Netflow datagrams, every field that might be included in a Netflow datagram can be anonymized. The application can read from a pcap-compatible trace file or a live NIC and dump the anonymized packets to a pcap-compatible file. One could just as easily feed the file to another NIC using UNIX pipes and tools such as tcpreplay.

./anonymize_netflow_v5 [-i -f -a -t -d -c -z -p -h] output

       -i Open interface as input (INTERFACE)
       -f Open file as input (FILE)
       -a ANONYMIZE IP addresses in Netflows (PREFIX, MAP,(IPv4 only) ZERO)
       -t ANONYMIZE TCP ports in Netflows (MAP, ZERO)
       -c Fix checksums
       -z Zero TCP flags
       -p Print anonymized packets
       --FIELD FUNCTION Anonymize any desired FIELD using any available FUNCTION

./anonymize_netflow_v9 takes the same options as above.


Anonymization API

Predefined Protocol Field Names for use with Netflow datagrams:

  • Common to all protocols: PAYLOAD
  • NETFLOW v5: NF5_VERSION, NF5_FLOWCOUNT, NF5_UPTIME, NF5_UNIX_SECS, NF5_UNIX_NSECS, NF5_SEQUENCE, NF5_ENGINE_TYPE, NF5_ENGINE_ID, NF5_SRCADDR, NF5_DSTADDR, NF5_NEXTHOP, NF5_INPUT, NF5_OUTPUT, NF5_DPKTS, NF5_DOCTETS, NF5_FIRST, NF5_LAST, NF5_SRCPORT, NF5_DSTPORT, NF5_TCP_FLAGS, NF5_PROT, NF5_TOS, NF5_SRC_AS, NF5_DST_AS, NF5_SRC_MASK, NF5_DST_MASK
  • NETFLOW v9: NETFLOW_VERSION, COUNT, UPTIME, UNIXSECS, PACKAGESEQ, SOURCEID, FLOWSET_ID, LENGTH, TEMPLATEID, FIELD_COUNT, IN_BYTES, IN_PKTS, FLOWS, PROTOCOL, SRC_TOS, NF9_TCP_FLAGS, L4_SRC_PORT, IPV4_SRC_ADDR, SRC_MASK, INPUT_SNMP, L4_DST_PORT, IPV4_DST_ADDR, DST_MASK, OUTPUT_SNMP, IPV4_NEXT_HOP, SRC_AS, DST_AS, BGP_IPV4_NEXT_HOP, MUL_DST_PKTS, MUL_DST_BYTES, LAST_SWITCHED, FIRST_SWITCHED, OUT_BYTES, OUT_PKTS, MIN_PKT_LENGTH, MAX_PKT_LENGTH, IPV6_SRC_ADDR, IPV6_DST_ADDR, IPV6_SRC_MASK, IPV6_DST_MASK, IPV6_FLOW_LABEL, ICMP_TYPE, MUL_IGMP_TYPE, SAMPLING_INTERVAL, SAMPLING_ALGORITHM, FLOW_ACTIVE_TIMEOUT, FLOW_INACTIVE_TIMEOUT, ENGINE_TYPE, ENGINE_ID, TOTAL_BYTES_EXP, TOTAL_PKTS_EXP, TOTAL_FLOWS_EXP, VENDOR_43, IPV4_SRC_PREFIX, IPV4_DST_PREFIX, MPLS_TOP_LABEL_TYPE, MPLS_TOP_LABEL_IP_ADDR, FLOW_SAMPLER_ID, FLOW_SAMPLER_MODE, FLOW_SAMPLER_RANDOM_INTERVAL, VENDOR_51, MIN_TTL, MAX_TTL, IPV4_IDENT, DST_TOS, IN_SRC_MAC, OUT_DST_MAC, SRC_VLAN, DST_VLAN, IP_PROTOCOL_VERSION, DIRECTION, IPV6_NEXT_HOP, BGP_IPV6_NEXT_HOP, IPV6_OPTION_HEADERS, VENDOR_65, VENDOR_66, VENDOR_67, VENDOR_68, VENDOR_69, MPLS_LABEL_1, MPLS_LABEL_2, MPLS_LABEL_3, MPLS_LABEL_4, MPLS_LABEL_5, MPLS_LABEL_6, MPLS_LABEL_7, MPLS_LABEL_8, MPLS_LABEL_9, MPLS_LABEL_10, IN_DST_MAC, OUT_SRC_MAC, IF_NAME, IF_DESC, SAMPLER_NAME, IN_PERMANENT_BYTES, IN_PERMANENT_PKTS, VENDOR_87, NF9_FRAGMENT_OFFSET, FORWARDING_STATUS, SYSTEM, INTERFACE, LINE_CARD, NETFLOW_CACHE, TEMPLATE
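To make the Netflow v5 field names above more concrete, the following Python sketch walks the standard v5 datagram layout (a 24-byte header followed by 48-byte flow records) and zeroes the source and destination addresses, which corresponds to applying ZERO to NF5_SRCADDR and NF5_DSTADDR. It is an illustration only and does not use AnonTool or its API; the names zero_addresses and build_sample are hypothetical, and the sample addresses are documentation addresses.

```python
import ipaddress
import struct

# Standard NetFlow v5 layout: 24-byte header, then 48-byte flow records.
NF5_HEADER = struct.Struct("!HHIIIIBBH")
NF5_RECORD = struct.Struct("!IIIHHIIIIHHBBBBHHBBH")


def zero_addresses(datagram: bytes) -> bytes:
    """Return a copy of a v5 datagram with srcaddr and dstaddr set to zero."""
    version, count = struct.unpack_from("!HH", datagram, 0)
    if version != 5:
        raise ValueError("not a NetFlow v5 datagram")
    if len(datagram) < NF5_HEADER.size + count * NF5_RECORD.size:
        raise ValueError("datagram is shorter than its flow count implies")
    out = bytearray(datagram)
    for i in range(count):
        offset = NF5_HEADER.size + i * NF5_RECORD.size
        # The first 8 bytes of each record are srcaddr and dstaddr.
        out[offset:offset + 8] = bytes(8)
    return bytes(out)


def build_sample() -> bytes:
    """Build a one-record datagram for demonstration (hypothetical values)."""
    header = NF5_HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
    record = NF5_RECORD.pack(
        int(ipaddress.IPv4Address("192.0.2.10")),     # srcaddr
        int(ipaddress.IPv4Address("198.51.100.20")),  # dstaddr
        0, 1, 2,            # nexthop, input, output
        10, 1500, 0, 0,     # dPkts, dOctets, first, last
        12345, 80,          # srcport, dstport
        0, 0x18, 6, 0,      # pad1, tcp_flags, prot, tos
        64500, 64501,       # src_as, dst_as
        24, 24, 0,          # src_mask, dst_mask, pad2
    )
    return header + record


if __name__ == "__main__":
    anonymized = zero_addresses(build_sample())
    print(NF5_RECORD.unpack_from(anonymized, NF5_HEADER.size))
```

Only the two address fields change; ports, counters and AS numbers are left as they were, which is the kind of selective, per-field treatment the field names are meant to enable.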

Complete List of the Protocol Field Anonymization Functions

  • UNCHANGED: leaves field unchanged. This function takes no arguments.
  • MAP: maps a field to an integer. Each field has its own mapping, except SRC_IP and DST_IP, which share a common mapping, and SRC_PORT and DST_PORT, which also share one. The remaining fields share a mapping based on their length: fields of length 4 share one mapping, fields of length 2 share another, and fields of length 1 share a third. Mapping cannot be applied to the payload or to IP/TCP options, only to header fields. This function takes no arguments.
  • MAP_DISTRIBUTION: field is replaced by a value drawn from a distribution, such as uniform or Gaussian, with user-supplied parameters. The first parameter defines the type of distribution and can be UNIFORM or GAUSSIAN. If the type is UNIFORM, the next 2 arguments specify the range from which numbers are selected uniformly. If the type is GAUSSIAN, the next 2 arguments specify the median and standard deviation. As with the MAP function, MAP_DISTRIBUTION can only be applied to IP, TCP, UDP and ICMP header fields, except IP and TCP options.
  • STRIP: removes the field from the packet. Optionally, STRIP may not remove the whole field but can keep a portion of it. The user defines the number of bytes to be kept. STRIP cannot be applied to IP, TCP, UDP and ICMP headers except IP and TCP options and can be fully applied to all HTTP and FTP fields.
  • RANDOM: replaces the field with a random number. This function takes no arguments.
  • FILENAME_RANDOM: a sub-case of RANDOM. If the field is in a filename format, e.g. “picture.bmp”, then the extension is left untouched while the filename is replaced by random characters.
  • HASHED: field is replaced by a hash value. Supported hash functions are MD5, SHA, SHA_2 and CRC32; AES and TRIPLE_DES are supported for encryption. Note that MD5, SHA, SHA_2 and CRC32 may generate values shorter or longer than the original field. When the functions are applied to IP, TCP, UDP and ICMP header fields, the last bytes of the result are used to replace the field. For all other fields, the padding behavior is supplied as a parameter. If the hashed value is shorter than the field, the user can pad the remaining bytes with zeros by specifying PAD_WITH_ZERO, or strip the remaining bytes by specifying STRIP_REST, as an argument to the function. If the hashed value is longer than the original field, the rest of the packet contents are shifted accordingly. In all cases, the packet length in protocol headers is adjusted to the new length.
  • PATTERN_FILL: field is repeatedly filled with a pattern. The pattern can be an integer or a string. This function takes as parameters the type of pattern, INTEGER for integers and STR for strings, and the pattern to be used.
  • ZERO: a sub-case of pattern fill where the field is set to zero. This function takes no arguments.
  • REPLACE: field is replaced by a single value (a string). The packet length is reduced accordingly, based on the length of the replace pattern. The final length cannot exceed the maximum packet size. This function takes the pattern to be used as an argument.
  • PREFIX_PRESERVING: can only be applied to source and destination IP addresses and performs keyed hashing that preserves the prefixes of IP addresses.
  • PREFIX_PRESERVING_MAP: can only be applied to source and destination IP addresses and performs prefix-preserving anonymization of IP addresses using a mapping table.
  • REGEXP: field is transformed according to a regular expression. As an example, by performing anonymize(p, TCP, PAYLOAD, REGEXP, "(.*) password:(.*) (.*)", {NULL, "xxxxx", NULL}) on a packet p we can substitute the value of a "password:" field with the "xxxxx" string. Each "(.*)" in the regular expression indicates a match, and the last argument is a set of replacements, one for each match (NULL leaves the match unmodified).
  • CHECKSUM_ADJUST: if the anonymized packet stream is to be used by other applications, the anonymization modifications to each packet require careful treatment of the checksum. This function can only be applied to the CHECKSUM field.
  • SUBFIELD: with this function we can apply any of the functions defined above to a subfield of the given field. Therefore the arguments of SUBFIELD are the two offsets over the identified protocol field, which are the bounds of the subfield, followed by any of the above field anonymization functions with their parameters. The identified field anonymization function which is passed as parameter to SUBFIELD will be applied to the subfield that is bounded by the given offsets.
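The behavior of some of the functions listed above can be sketched in a few lines of Python. This is not the AnonTool API: the names SequentialMap, pattern_fill, hashed and regexp_replace are hypothetical, SHA-256 stands in for the hash functions listed, and the starting value of the MAP counter is an assumption, since the description does not state it.

```python
import hashlib
import re


class SequentialMap:
    """MAP: each distinct value is replaced by the next unused integer."""

    def __init__(self):
        self._table = {}

    def map(self, value):
        # Assumption: counting starts at 1; the tool's real start value is not stated.
        return self._table.setdefault(value, len(self._table) + 1)


def pattern_fill(field: bytes, pattern: bytes) -> bytes:
    """PATTERN_FILL: repeat the pattern until the field is full."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    repeats = len(field) // len(pattern) + 1
    return (pattern * repeats)[:len(field)]


def hashed(field: bytes, padding: str = "PAD_WITH_ZERO", header_field: bool = False) -> bytes:
    """HASHED: replace the field by a digest (SHA-256 here, as a stand-in)."""
    digest = hashlib.sha256(field).digest()
    if header_field:
        # Header fields keep their length: the last bytes of the digest are used.
        # Assumes a header field of 1 to 32 bytes.
        return digest[-len(field):]
    if len(digest) < len(field):
        if padding == "PAD_WITH_ZERO":
            return digest + bytes(len(field) - len(digest))
        if padding == "STRIP_REST":
            return digest
        raise ValueError("unknown padding mode")
    # Digest is at least as long as the field: the field grows and the
    # caller would have to shift the rest of the packet.
    return digest


def regexp_replace(data: str, pattern: str, replacements) -> str:
    """REGEXP: replace each matched group; None leaves the group unmodified."""
    match = re.search(pattern, data)
    if match is None:
        return data
    out, pos = [], 0
    for index, replacement in enumerate(replacements, start=1):
        start, end = match.span(index)
        out.append(data[pos:start])
        out.append(data[start:end] if replacement is None else replacement)
        pos = end
    out.append(data[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(regexp_replace("USER bob password:hunter2 END",
                         r"(.*) password:(.*) (.*)", [None, "xxxxx", None]))
```

Sharing one SequentialMap instance between source and destination addresses would give the common mapping described for SRC_IP and DST_IP, while separate instances would give independent mappings.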

Internet2 method

Internet2 uses a capability of its Netflow collector to hide IP addresses in raw Netflow data. It uses flow-tools as the collector. The flow-nfilter utility filters flows based on user-selectable criteria. Filters are defined in a configuration file and are composed of primitives and a definition. Definitions contain match lines grouped to form logical AND and OR operations on the flow using the selected primitives. A definition may contain the invert command, which inverts the result of the evaluation.

Internet2 masks the 11 low-order bits of each IP address in the Netflow data.
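As an illustration of what masking the 11 low-order bits does to an address, here is a minimal Python sketch. It is not the flow-tools implementation; the function name mask_low_bits is hypothetical and the example addresses are arbitrary.

```python
import ipaddress


def mask_low_bits(address: str, bits: int = 11) -> str:
    """Clear the given number of low-order bits of an IPv4 address."""
    ip = ipaddress.IPv4Address(address)
    mask = (0xFFFFFFFF >> bits) << bits
    return str(ipaddress.IPv4Address(int(ip) & mask))


if __name__ == "__main__":
    print(mask_low_bits("10.1.15.200"))  # 10.1.8.0
    print(mask_low_bits("192.0.2.77"))   # 192.0.0.0
```

With 11 bits cleared, every address inside the same /21 block collapses to the same value, so individual hosts can no longer be told apart while the coarser network location remains visible.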

This method is easy to use, but unfortunately it is available only if flow-tools is used as the collector.

Nfdump method

Nfdump uses the Crypto-PAn module (CryptoPAn) to anonymize IP addresses in Netflow data. Anonymization can be performed only after the capturing process has taken place. It is enabled by supplying a key, which is either a 32-character string or a 64-digit hex string starting with 0x, to the command-line tool that extracts flow information from the raw files. IP addresses are thus anonymized just before they are printed or saved to an ASCII file, which means that any filter applies to the original IP addresses. The construction of Crypto-PAn preserves the secrecy of the key and the (pseudo)randomness of the mapping from an original IP address to its anonymized counterpart. The mapping is also one-to-one, consistent across traces and prefix-preserving.
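The prefix-preserving property can be illustrated with a short Python sketch. It follows the general idea of Crypto-PAn (each output bit is the input bit XORed with a keyed pseudorandom function of the preceding input bits), but it swaps in HMAC-SHA256 for the block cipher used by the real module, so its output will not match what nfdump produces. The names parse_key and anonymize_ipv4 are hypothetical; parse_key only mirrors the two key formats described above.

```python
import hashlib
import hmac
import ipaddress


def parse_key(key: str) -> bytes:
    """Accept a 32-character string or 0x followed by 64 hex digits."""
    if key.startswith("0x") and len(key) == 66:
        return bytes.fromhex(key[2:])
    if len(key) == 32:
        return key.encode("ascii")
    raise ValueError("key must be 32 characters or 0x followed by 64 hex digits")


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Prefix-preserving mapping: bit i depends only on the first i input bits."""
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        message = bytes([i]) + prefix.to_bytes(4, "big")
        # One pseudorandom bit derived from the key and the prefix.
        flip = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


if __name__ == "__main__":
    key = parse_key("an example key of 32 characters!")
    for addr in ("10.1.2.3", "10.1.2.200", "10.9.0.1"):
        print(addr, "->", anonymize_ipv4(addr, key))
```

Because the flipped bit at position i depends only on the bits before it, two addresses that share a k-bit prefix map to outputs that share exactly a k-bit prefix, and the same key gives the same mapping across traces.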

CANINE

You can download the tool from the CANINE download page. Some notes about the tool (more details are on the tool page):

'CANINE is a Netflow datagram converter and anonymizer. NetFlow tools often struggle with two problems:

  1. NetFlows come in many different, incompatible formats,
  2. the sensitivity of NetFlow logs can hinder the sharing of these logs and thus make it difficult for developers to get real data to use.

CANINE attempts to solve these two problems.

As a converter, CANINE augments existing flow tools as it enables tools working exclusively with one type of NetFlows to operate on data from NetFlows in other formats. This is very beneficial given the fact that different types of NetFlows can come from complementary sources as the format is often tied to the routing hardware or computer collecting the data. Currently, CANINE provides support for the following NetFlow formats:

  • Cisco NetFlow v5
  • Cisco NetFlow v7
  • NFDUMP
  • CiscoNCSA (fixed length binary format derived from Cisco flows)
  • ArgusNCSA (fixed length ASCII format derived from Argus-2.0.5 flows)

As an anonymizer, CANINE addresses problems with sharing sensitive logs. People often have concerns about information disclosure when publishing results or performing demonstrations that utilize sensitive NetFlow logs. CANINE provides multiple methods of anonymizing the following fields:

  • IP address
  • Timestamp
  • Port number
  • Protocol number
  • Byte count

CANINE is a Java program, but compiled executables are available for Windows, Linux, Mac OS and Solaris. User registration is required to acquire CANINE.'


