Stager

From GEANT2-JRA1 Wiki

Stager is a system for aggregating and presenting network statistics. Stager was initially created to present NetFlow statistics for the Scampi project, but its generic design allowed other statistics to be included. Today UNINETT presents statistics from multiple network measurement systems through Stager:

  • NetFlow
  • MPing (Response time measurement)
  • SNMP (Load, Temperature and more)

The reports are fully customizable, and their definitions are stored in the database.

Architecture

Stager is split into a backend and a frontend.

The backend collects data and stores reports, with their data, in a database. Aggregation scripts also aggregate hourly statistics into days, weeks, and months according to the customizable aggregate definitions.

Figure: Stager architecture (Stager_arch.png)

The Web frontend presents data in tables, matrices, or plots.

The Stager architecture can be distributed: different data-collection scripts can run on different hosts, aggregation on another host, the database on a separate server, and the web frontend on a dedicated web server. Of course, Stager can also run on a single host.

Technologies

Stager uses:

  • Perl
  • PHP
  • flow-tools (currently)
  • PostgreSQL

Stager is successfully installed on:

  • Debian Linux (we run on Debian)
  • Red Hat Linux
  • Mandrake Linux
  • FreeBSD
  • Solaris
  • Mac OS X (only frontend tested)

Collecting NetFlow data

Stager does not include a flow collector; it currently uses flow-tools for this. Flow-tools listens on the NetFlow UDP port and stores NetFlow records in flat files. The Stager backend script get-netflow.pl calls flow-stat to retrieve aggregated reports from the flat files, parses the reports and stores them in the database. Further aggregation is handled by the Stager aggregation script.
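The collection step (flow-stat report, then parse, then store) can be sketched in Python as below. This is not get-netflow.pl: the report layout, the table name `netflow_hour`, and the use of SQLite as a stand-in for PostgreSQL are all assumptions for illustration, and real flow-stat output varies by report type.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Illustrative report text in an ASSUMED flow-stat-like layout:
# '#' lines are headers, data lines are "<key> <flows> <octets> <packets>".
SAMPLE = """\
# illustrative report, not real flow-stat output
# key flows octets packets
80 120 560000 900
443 75 310000 410
"""

Row = Tuple[int, int, int]  # (flows, octets, packets)


def parse_report(lines: Iterable[str]) -> Dict[str, Row]:
    """Parse the assumed report layout into {key: (flows, octets, packets)}."""
    rows: Dict[str, Row] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        key, flows, octets, packets = line.split()[:4]
        rows[key] = (int(flows), int(octets), int(packets))
    return rows


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; Stager's real PostgreSQL schema is not shown here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS netflow_hour ("
        "period TEXT, key TEXT, flows INTEGER, octets INTEGER, packets INTEGER)"
    )


def store_report(conn: sqlite3.Connection, period: str, rows: Dict[str, Row]) -> None:
    """Insert one hourly report, tagged with the start of its hour."""
    conn.executemany(
        "INSERT INTO netflow_hour (period, key, flows, octets, packets) "
        "VALUES (?, ?, ?, ?, ?)",
        [(period, key, *values) for key, values in rows.items()],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    store_report(conn, "2006-05-10 12:00", parse_report(SAMPLE.splitlines()))
    print(conn.execute("SELECT key, octets FROM netflow_hour ORDER BY key").fetchall())
```

Keeping parsing and storage separate means the same storage code could be reused if the reporting tool changes, for example to a NERD-based reporting utility.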

Since flow-tools lacks IPFIX support, and no implementation is currently scheduled, UNINETT evaluated two options:

  • Implement IPFIX support in flow-tools, or
  • Migrate to another flow collector

We had a student study the flow-tools source code, and the conclusion was that there was no easy path toward IPFIX support in flow-tools.

The plan now is to migrate to the NERD flow collector (used in JRA2). NERD already supports NetFlow v9, and UNINETT has provided patches toward IPFIX (and extended NetFlow) support. UNINETT also has a beta implementation of SCTP support, and the NERD flow collector is getting very close to full IPFIX compliance.

The NERD flow collector has no reporting utility, so UNINETT has started implementing one (similar to flow-tools' flow-stat or flow-report). UNINETT also plans to implement a script that converts the NERD flow file format to the flow-tools file format. This will obviously not work for IPFIX flow data.

Figure: NetFlow collection options (Netflow-collection.png)

The diagram above shows the possible ways NetFlow can be collected in future versions of Stager.

Changes planned for Stager 2.0

Some changes, probably resulting in release 2.0, may be of interest to JRA1:

  • Code clean-up of the frontend toward a more object-oriented design. The design has been done keeping in mind that Stager should fit into a service-oriented architecture (SOA).
  • Import of data from DAG cards via MAPI (extended NetFlow based on IPFIX) is currently being tested.
  • Presentation of histograms is now supported in the developer tree.
  • The observation point model in the database is currently being redesigned. The new design is more or less identical to the JRA1 D1.4 Database Schema. New features will support space aggregation of statistics, for example aggregating the statistics for all interfaces of a router and attaching them to the router group observation point.
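Space aggregation of this kind can be sketched as a group-by over the observation-point hierarchy. The Python below is a minimal illustration, not the JRA1 D1.4 schema: it assumes per-interface octet counters keyed by (interface, period) and a hypothetical `parent` mapping from each interface to its router observation point.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (observation point, period)


def aggregate_by_parent(stats: Dict[Key, int], parent: Dict[str, str]) -> Dict[Key, int]:
    """Sum child observation points (e.g. interfaces) into their parent (e.g. a router).

    stats maps (interface, period) -> octets; parent maps interface -> router.
    Totals are kept per period, so the time resolution is preserved.
    """
    totals: Dict[Key, int] = defaultdict(int)
    for (point, period), octets in stats.items():
        totals[(parent[point], period)] += octets
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical observation point names.
    parent = {"r1/if0": "r1", "r1/if1": "r1", "r2/if0": "r2"}
    stats = {
        ("r1/if0", "2006-05-10 12:00"): 100,
        ("r1/if1", "2006-05-10 12:00"): 50,
        ("r2/if0", "2006-05-10 12:00"): 7,
    }
    print(aggregate_by_parent(stats, parent))
```

Because the period stays part of the key, space aggregation composes with the existing time aggregation: router totals can be aggregated from hours to days like any other series.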

Reliability Features

Stager has some features to avoid missing data when some of the elements in the Stager architecture are down.

Flow-tools is configured to store raw NetFlow files for N days, which gives Stager N days to recover from a down period. Stager automatically detects that data is missing for a time period and tries to regenerate the data for that period.

Obviously, if flow-capture fails, or the computer it runs on fails, there is no way to restore the lost data.

On the other hand, if the database server goes down or the Stager script is aborted, the missing data will be collected the next time the script runs, as long as the raw files are still available.
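Stager's actual gap-detection code is not described here. A minimal Python sketch of the idea, assuming the already-collected hours are known as hour-start timestamps and that raw files are kept for `retention_days` days (the function name `missing_hours` is hypothetical):

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def missing_hours(collected: Iterable[datetime], now: datetime, retention_days: int) -> List[datetime]:
    """List hour slots inside the raw-data retention window that have no stored data.

    Older gaps are ignored because their raw flow files have already expired
    and cannot be re-read.
    """
    done = set(collected)
    # The current hour is still in progress, so stop at the start of this hour.
    end = now.replace(minute=0, second=0, microsecond=0)
    slot = end - timedelta(days=retention_days)
    gaps: List[datetime] = []
    while slot < end:
        if slot not in done:
            gaps.append(slot)
        slot += timedelta(hours=1)
    return gaps


if __name__ == "__main__":
    now = datetime(2006, 5, 10, 12, 30)
    have = [datetime(2006, 5, 9, 12) + timedelta(hours=i) for i in range(24) if i != 8]
    for hour in missing_hours(have, now, retention_days=1):
        print("would regenerate", hour)  # a real script would re-run collection here
```

Each returned hour would then be passed back through the normal collection step; the retention window is what bounds how far back recovery can reach.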

The Stager script also has resource control. We had problems with flow-tools during DoS attacks, because the flow-tools scripts did not scale to such huge numbers of flows: they would use too much CPU or memory and never finish. Stager therefore kills a flow-tools process that shows this behaviour, so that only the data for the currently running report is lost.
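A minimal sketch of this kind of resource control, assuming a Unix host (it relies on Python's `resource` module and `subprocess` `preexec_fn`). The function name `run_limited` and all limit values are hypothetical; Stager's backend uses Perl and its actual thresholds are not given here.

```python
import resource
import subprocess
import sys
from typing import Optional, Sequence


def run_limited(cmd: Sequence[str], cpu_seconds: int, mem_bytes: int, wall_seconds: float) -> Optional[str]:
    """Run a report command under CPU, memory and wall-clock limits.

    Returns the command's stdout, or None if it failed or was killed,
    so the caller can give up on this one report and carry on.
    """
    def set_limits() -> None:
        # Runs in the child just before exec: cap CPU time and virtual memory.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    try:
        result = subprocess.run(
            list(cmd),
            preexec_fn=set_limits,
            capture_output=True,
            text=True,
            timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    if result.returncode != 0:
        # Non-zero exit, or negative if killed by a signal (e.g. SIGXCPU).
        return None
    return result.stdout


if __name__ == "__main__":
    # Demo with a stand-in command instead of a real flow-stat pipeline.
    print(run_limited([sys.executable, "-c", "print('report')"], 5, 2**30, 10))
```

The CPU limit catches a busy process, while the wall-clock timeout also catches one that is stuck waiting, which matches the "never finish" case described above.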

Time aggregation

Stager uses a set of time resolutions, i.e. hour, day, week and month. For each time resolution, Stager creates a time period for every slotted time window that contains data.

Stager collects data at the highest resolution (hour) from the raw data (or a reporting tool), and a separate aggregation script handles aggregation from:

  • hour to day
  • day to week
  • day to month
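The aggregation chain above can be sketched in Python as summing a finer series into coarser buckets. This assumes additive counters such as octets (MPing response times or SNMP temperatures would need e.g. averaging instead) and Monday-based weeks; the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict

Series = Dict[datetime, int]  # period start -> counter value (e.g. octets)


def day_start(t: datetime) -> datetime:
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def week_start(t: datetime) -> datetime:
    # Weeks are assumed to start on Monday (datetime.weekday() == 0).
    d = day_start(t)
    return d - timedelta(days=d.weekday())


def month_start(t: datetime) -> datetime:
    return day_start(t).replace(day=1)


def aggregate(series: Series, bucket: Callable[[datetime], datetime]) -> Series:
    """Sum a finer-resolution series into coarser buckets chosen by `bucket`."""
    out: Series = defaultdict(int)
    for start, value in series.items():
        out[bucket(start)] += value
    return dict(out)
```

Usage follows the list: `daily = aggregate(hourly, day_start)`, then `weekly = aggregate(daily, week_start)` and `monthly = aggregate(daily, month_start)`. Months are built from days rather than weeks because a week can straddle a month boundary.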

Time handling in Stager is fairly generic. A time resolution is defined as a PostgreSQL interval, so aggregation to years, or collection of data at 15-minute intervals, should be easy to add (though this has not been tested).
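To illustrate why a generic resolution makes 15-minute slots easy, the Python sketch below floors a timestamp to the start of its slot for any fixed-length resolution. It assumes naive UTC datetimes and slots aligned to the Unix epoch, and it only covers fixed-length resolutions: months vary in length, so a month resolution needs calendar-aware arithmetic (PostgreSQL intervals carry a separate month component for this). `slot_start` is a hypothetical name.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for all slots


def slot_start(t: datetime, resolution: timedelta) -> datetime:
    """Floor t to the start of its slot for a fixed-length resolution."""
    # timedelta // timedelta gives the number of whole slots since EPOCH.
    n = (t - EPOCH) // resolution
    return EPOCH + n * resolution
```

With this, `slot_start(t, timedelta(hours=1))` reproduces hourly slots and `slot_start(t, timedelta(minutes=15))` gives 15-minute slots, with no other code changes.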

NetFlow data takes up a lot of disk space, so purging data is important. Stager basically only needs the raw data files to be available for a few hours, until the hourly scheduled collection script has finished. However, you will want to keep N days of raw NetFlow data to improve system reliability and avoid data loss resulting from system failures. How long raw data is kept is configured with a cron job running flow-expire.

In the Stager configuration files you set up how long data is stored for each time resolution. An example setup could be:

  • Hour: store data for 6 months
  • Day: store data for 5 years
  • Week: store data forever
  • Month: store data forever
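A retention policy like the example above can be sketched as a per-resolution lookup. The Python below is illustrative only: the `RETENTION` table and `should_purge` are hypothetical, not Stager's configuration format, and months and years are approximated in days.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical policy mirroring the example setup; None means keep forever.
# Months and years are approximated in days for this sketch.
RETENTION: Dict[str, Optional[timedelta]] = {
    "hour": timedelta(days=183),     # about 6 months
    "day": timedelta(days=5 * 365),  # about 5 years
    "week": None,
    "month": None,
}


def should_purge(resolution: str, period_start: datetime, now: datetime) -> bool:
    """True if a period at this resolution is older than its retention time."""
    keep = RETENTION[resolution]
    return keep is not None and period_start < now - keep
```

Coarser resolutions can be kept longer because they hold far fewer rows; once hourly data is purged, the daily, weekly and monthly aggregates still cover that time.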

Frontend

Figure: Stager frontend (Stager frontend.png)

Supports tabular reports:

  • Normal view
  • Matrix view
  • Summary view

Supports graph plots:

  • Line plot
  • Cumulative Area Plot
  • Direct Area Plot
  • Pie chart
  • 3D Pie chart
  • Histogram (not yet available in the stable release)

More information

More information is available on the Stager homepage.

This text was written by Andreas Solberg.

Statistics from UNINETT are available.
