Flow Subscription MP/Implementation Details
From GEANT2-JRA1 Wiki
Design
- Zebedee for encrypted tunnel (on client and server)
- nfdump for flow collection, anonymization, filtering and replaying.
- Java for communication with perfSONAR framework
Inner Working
Note: this design is deprecated. The subscription manager code has been moved into the Java code to simplify the service.
Design document
- June 21, 2006: Detailed design of the flow Subscription MP version 0.3
- July 05, 2006: Detailed design of the flow Subscription MP version 0.4
Service design/implementation details
Java Service
All classes reside in org.perfsonar.service.measurementPoint.flowsubscription
Key classes are:
- FlowTypeMPServiceEngine - (heart of the service)
- FlowMpRequest & FlowMpResponse - encapsulates requests / responses
- NfReplayControll - Handles relay/replay of flowdata
- SubscriptionService - Used to manage subscriptions
- Subscription - class representing a subscription
- ZebedeeControl - Used to manage Zebedee tunnels
- ZebedeeTunnel - class representing a Zebedee tunnel
Helper classes:
- MessageType - enum of message types
- Pipe - java version of a unix pipe
- NamedthreadFactory - allows one to make threads that are easier to work with when debugging
Subscription
Connections are essentially stateless because the perfSONAR protocol is used as a control protocol. Subscriptions tie subsequent requests and responses together: they are to the service what sessions are to HTTP. Subscriptions are identified by their keys, which are simple integers, and are used to store state, for example the reference to a Zebedee process. The life cycle of a Zebedee tunnel is closely tied to a subscription: the tunnel is typically started when the subscription is made and ended when the subscription finishes.
Keepalive
Subscriptions need to be kept alive so that they do not time out and end. Timeouts are checked by SubscriptionService#TimeoutChecker, which compares the time stored in the subscription against the current time. The comparison is based on Unix time, so switching between winter and summer time should not affect it.
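The timeout check can be illustrated with a minimal sketch. This is not the actual SubscriptionService#TimeoutChecker code: the class name, method names and the use of a plain map are assumptions made for illustration only. It only shows the idea of comparing a stored timestamp against the current time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual SubscriptionService code.
class SubscriptionTimeoutSketch {
    // Subscription key -> time of the last keepalive, in ms since the Unix epoch.
    private final Map<Integer, Long> lastSeen = new HashMap<>();
    private final long timeoutMillis;

    SubscriptionTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a subscription is created or a keepalive arrives.
    synchronized void touch(int key, long nowMillis) {
        lastSeen.put(key, nowMillis);
    }

    // Removes subscriptions that were not refreshed within the timeout.
    // Returns the number of subscriptions that were removed.
    synchronized int expire(long nowMillis) {
        int before = lastSeen.size();
        lastSeen.values().removeIf(t -> nowMillis - t > timeoutMillis);
        return before - lastSeen.size();
    }

    synchronized boolean isAlive(int key) {
        return lastSeen.containsKey(key);
    }
}
```

In a running service the `nowMillis` argument would typically come from `System.currentTimeMillis()`, which counts from the Unix epoch in UTC and is therefore not affected by daylight saving changes.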
Handling flowdata
Each exporter (router) that we want to use should export flow data. All exporters should be specified in the service configuration (service.properties). Upon startup (1) of the service, a separate nfcapd (daemon) is started for every exporter that appears in the configuration to capture its flow data, so there is a 1-to-1 mapping between exporters and nfcapd processes. The data is stored in a directory whose name is derived from information about the exporter that should be unique (currently the hostname/IP, but this should probably be a URN). Each router has its own directory. The directories are located in the temp directory as specified by the Java EE specification. If a directory doesn't exist, the service will try to create it.
1): Startup is not well defined here because there is no definition of the service life cycle. In the current version of the base, startup happens when the first request is received.
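The per-exporter directory handling could look roughly like the sketch below. The class and method names are hypothetical, the `java.io.tmpdir` property is used only as a stand-in for the Java EE temp directory, and the character replacement rule is an assumption (note that it could map two different identifiers to the same name).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the actual service code.
class ExporterDirSketch {
    // Maps an exporter identifier (hostname or IP) to a name that is safe on disk.
    static String dirNameFor(String exporterId) {
        return exporterId.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    // Resolves the exporter's directory under baseDir and creates it if missing.
    static Path ensureDir(Path baseDir, String exporterId) {
        Path dir = baseDir.resolve(dirNameFor(exporterId));
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in base directory; the real service uses the Java EE temp dir.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "flowsubscription-sketch");
        System.out.println(ensureDir(base, "router1.example.net"));
    }
}
```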
Relay process
The service monitors the directory in which nfcapd writes. When a new nfcapd file is detected, processes are started to relay the data stored in that file to the clients. Because of the way the nfdump tools work, a set of processes is started for each client interested in the exporter. After processing is finished the file is deleted and polling of the directory continues. If no clients are interested in the data, the file is deleted immediately.
Typically, processing a file should take less time than it takes for a new file to be written. If it doesn't, the service is choking and the directory will slowly fill up, given that the size of the stream and the number of clients stay the same. Perhaps future versions of the service could detect this situation and disconnect clients or lower the maximum number of connected clients accordingly.
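The polling loop described above can be sketched as a single polling pass. This is an illustration under assumptions, not the service's code: the class name, the `relay` callback (standing in for the per-client nfdump processes) and the rule that skips files starting with `nfcapd.current` (assumed to be the file nfcapd is still writing) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Stream;

// Hypothetical sketch; not the actual service code.
class RelayPollSketch {
    // Lists finished capture files in name order.
    static List<Path> pendingFiles(Path dir) {
        try (Stream<Path> s = Files.list(dir)) {
            List<Path> files = new ArrayList<>();
            s.filter(Files::isRegularFile)
             // Assumption: the file still being written carries this prefix.
             .filter(p -> !p.getFileName().toString().startsWith("nfcapd.current"))
             .forEach(files::add);
            Collections.sort(files);
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One polling pass: relay each file to every interested client, then delete it.
    // With no clients the file is deleted immediately.
    // Returns the number of files that were waiting at the start of the pass.
    static int pollOnce(Path dir, List<String> clients, BiConsumer<Path, String> relay) {
        List<Path> files = pendingFiles(dir);
        for (Path file : files) {
            for (String client : clients) {
                relay.accept(file, client);
            }
            try {
                Files.delete(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files.size();
    }
}
```

If `pollOnce` repeatedly returns more than one file, processing is slower than file creation, which is the choking situation described above.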
Open issues
- The service uses non-standard messageTypes (see control protocol for why this doesn't apply anymore), but eventTypes should still be used.
- LS registration is incomplete (doesn't register topology elements)
See also checklist
Feature requests:
Facilitate better cooperation with the Flow Selection and Aggregation service. Better cooperation would be achieved by making the services work together so that nfcapd only needs to capture the data once (this would also allow one to reduce the number of nfcapd processes by 50%). There are two options:
- Configure the nfcapd processes used by flowsama to copy each file, after it is created, to the directories used by flowsubscription-mp
- Change the flowsubscription-MP so that it moves (instead of deletes) files that it has finished processing to a directory that the Flow Selection and Aggregation MA reads from.
Actions to be taken to achieve the first:
- Configure nfcapd
- Allow one to disable running nfcapd processes (by the flowsubscription-MP)
- (optional) Allow one to override the default temp dir that the flowsubscription-MP uses
Actions to be taken to achieve the second:
- Modify code so it moves the files
- Add configuration options to allow one to specify where the files should be moved to.
Facilitate other channel/tunnel setup. This would add flexibility and let the client know which type of tunnel will be used. It requires some support in the protocol. Another advantage is that both client and server would be able to specify their supported tunnel types so that one could be selected. TLS/SSL might be an option: Java has built-in support and there is stunnel.
Zebedee
Zebedee is the easiest way to set up this tunnel. Other tunnel constructions need more complicated configuration (stunnel) or need to be programmed by ourselves (OpenSSL library).
Zebedee isn't being actively developed anymore (we think), but security flaws are fixed. It currently supports all the features we need. Also, Linux distributions like Debian support Zebedee (so their packages could be used) and would fix security-related issues.
The flow receiver will be the Zebedee server; it only accepts connections from the flow MP, and only to a given local UDP port.
The client and server generate private key values. These are used to calculate public values which are exchanged and used to derive a shared secret key using the Diffie-Hellman key agreement mechanism. From this shared key a unique session key is derived to secure an individual connection between client and server.
The CPU load of Zebedee on sonar1.amsterdam.surfnet.nl, with two 3.6 GHz Xeon processors, is about 10% per 2 Mbit/s NetFlow stream (on one processor).
We use the reverse server/client setup procedure, so the flow receiver can be behind a firewall.
On the flow MP subscription service side: zebedee -z 0 -u -d -l <inPort>:<clientIP>:<outPort> -T <betweenPort> -x "udptimeout 5"
On the flow subscriber side: zebedee -z 0 -u -s -d -c <serverIP> -T <betweenPort> -x "udptimeout 5"
- inPort: Zebedee port on the flow MP side that nfreplay should send flows to (11110 + clientID)
- outPort: the port supplied by the subscriber, on which it wants to receive the flows
- betweenPort: port used by Zebedee to set up the connection (22220 + ID)
- clientIP: the flow subscriber
- serverIP: the flow subscription service
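As an illustration of how the port numbers and the two command lines fit together, the sketch below builds both argument lists. The class and method names are hypothetical, and it assumes that the "clientID" used for inPort and the "ID" used for betweenPort are the same number. The flags are copied from the command lines above and nothing else is added.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual ZebedeeControl code.
class ZebedeeCommandSketch {
    static final int IN_PORT_BASE = 11110;      // inPort = 11110 + clientID
    static final int BETWEEN_PORT_BASE = 22220; // betweenPort = 22220 + ID

    static int inPort(int clientId) {
        return IN_PORT_BASE + clientId;
    }

    static int betweenPort(int clientId) {
        return BETWEEN_PORT_BASE + clientId;
    }

    // Arguments for the flow MP subscription service side.
    static List<String> mpSideCommand(int clientId, String clientIp, int outPort) {
        return Arrays.asList("zebedee", "-z", "0", "-u", "-d",
                "-l", inPort(clientId) + ":" + clientIp + ":" + outPort,
                "-T", String.valueOf(betweenPort(clientId)),
                "-x", "udptimeout 5");
    }

    // Arguments for the flow subscriber side.
    static List<String> subscriberSideCommand(int clientId, String serverIp) {
        return Arrays.asList("zebedee", "-z", "0", "-u", "-s", "-d",
                "-c", serverIp,
                "-T", String.valueOf(betweenPort(clientId)),
                "-x", "udptimeout 5");
    }
}
```

Each list can be handed to `java.lang.ProcessBuilder`, which keeps `udptimeout 5` together as a single argument without shell quoting.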
Anonymization
We use nfdump anonymization, which is based on Crypto-PAn: http://www-static.cc.gatech.edu/computing/Networking/projects/cryptopan/
This supports:
- One-to-one: The mapping from original IP addresses to anonymized IP addresses is one-to-one.
- Prefix-preserving: In Crypto-PAn, the IP address anonymization is prefix-preserving. That is, if two original IP addresses share a k-bit prefix, their anonymized mappings will also share a k-bit prefix.
- Consistent across traces: Crypto-PAn allows multiple traces to be sanitized in a consistent way, over time and across locations. That is, the same IP address in different traces is anonymized to the same address, even though the traces might be sanitized separately at different times and/or at different locations.
- Cryptography-based: To sanitize traces, trace owners provide Crypto-PAn a secret key. Anonymization consistency across multiple traces is achieved by the use of the same key. The construction of Crypto-PAn preserves the secrecy of the key and the (pseudo)randomness of the mapping from an original IP address to its anonymized counterpart.
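The prefix-preserving and one-to-one properties can be demonstrated with a toy example. The sketch below is not Crypto-PAn and not what nfdump does: it swaps in HMAC-SHA256 as the keyed pseudorandom function (Crypto-PAn is built on a block cipher), and all class and method names are hypothetical. Each output bit is the input bit XORed with a keyed bit that depends only on the preceding input bits, which is what makes the mapping prefix-preserving.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Toy prefix-preserving anonymizer; an illustration only, not Crypto-PAn.
class PrefixPreservingSketch {
    private final SecretKeySpec key;

    PrefixPreservingSketch(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Keyed pseudorandom bit derived from the first `len` bits of `addr`.
    private int padBit(int addr, int len) {
        // Keep only the top `len` bits; len == 0 means the empty prefix.
        int prefix = (len == 0) ? 0 : (addr & (-1 << (32 - len)));
        byte[] input = { (byte) len,
                (byte) (prefix >>> 24), (byte) (prefix >>> 16),
                (byte) (prefix >>> 8), (byte) prefix };
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(input)[0] & 1;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anonymizes an IPv4 address given as a 32-bit int.
    int anonymize(int addr) {
        int out = 0;
        for (int i = 0; i < 32; i++) {
            int bit = (addr >>> (31 - i)) & 1;
            out = (out << 1) | (bit ^ padBit(addr, i));
        }
        return out;
    }

    // Number of leading bits that two addresses have in common.
    static int commonPrefix(int a, int b) {
        return Integer.numberOfLeadingZeros(a ^ b);
    }
}
```

With the same secret, the same address always maps to the same output, which corresponds to the consistency across traces described above. Two addresses that share exactly k leading bits produce outputs that also share exactly k leading bits.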
Control protocol
The flow MP makes use of the infamous 'using NMWG as control protocol' approach. Within perfSONAR there is/was a lot of debate over its performance, but (in my opinion) the problems are mostly due to the base doing inefficient things. For this approach to work, a couple of prerequisites must be met:
- The data stream being controlled should use a pre-existing, widely accepted open standard.
- Tooling needs to be available to consume the data.
