SA3 PM Trial Functionality Comments
From GEANT2-JRA1 Wiki
Please try all applications and add your comments about their functionality. If something was not clear or did not work for you, please comment on it here. The TCMP MP and Burst applications are not yet available.
User interface for accessing the applications: https://sa3-pm.cesnet.cz
ACAD
Abw application
Tbwtools application
Packetloss application
TCMP application
Burst application
Service applications (HW & SW status and resources consumption)
Any other comments
PSNC
Abw application
- Generating graphs is very quick
- The tool gives a detailed view of GEANT2 traffic with high granularity and makes it possible to see the content of this traffic
- The interface could be more colourful to attract users. At the moment it is very plain and drab in colour.
- I agree. I am not good at fancy graphics. If anyone has particular suggestions, I will try to incorporate them. (SU)
- It would be clearer if you separated the time and date in "Start or end time:". At the moment they are so close to each other that it is not clear which value I am choosing
- Done. (SU)
- I think it would be much better if a user could specify a Start and End time instead of Start and "Time length:". That seems more natural
- What does the table above the graphs mean? There is no description of its purpose
- The table was mostly for debugging, it was removed. (SU)
- When "Predefined:" is selected, some value in the table below should be checked automatically, or it should say that you must choose something there. When one sees "Predefined", one usually assumes everything is already defined.
- The application starts with "predefined" and "last 10 mins" selected. You can manually unclick all predefined options (in which case no graphs are generated). I will see if re-clicking on "predefined" can re-enable some of the predefined options. (SU)
- Is the list of application protocols static, or is it possible to extend it?
- Monitored application protocols are set in the abw configuration file on each monitoring station; the file can be edited and abw restarted. We initially set a conservatively small number of protocols so that we could check the CPU load with real traffic on each monitoring station. We plan to check it, and if there is a sufficient reserve of CPU power, we will add more protocols. I agree that it should be stated somewhere which protocols are currently monitored. (SU)
- Numerical values would be very helpful, at least the current values
Tbwtools application
We have not enabled this kind of stress test yet, as our monitoring station is behind a firewall for security reasons. In general, to be useful such a service would have to be installed in each PoP, so that throughput tests could be made between a user and some PoPs. For this purpose we would rather see the perfSONAR BWCTL MP installed in future.
- There are two advantages of Tbwtools over bwctl: it checks the connection all the way from or to the user's PC (not only between two servers in the network), and it provides feedback about what limited the throughput (the percentage of time when it was limited by the sender, the receiver or the network, as well as a number of characteristics of the connection). (SU)
Packetloss application
I think it is useful for specific purposes, perhaps grids or some scientific applications, where very precise measurements must be made and one has to observe network behaviour in detail on a specific link. But in general, compared to standard ping, such measurements are expensive, i.e. one needs passive monitoring stations at each end of the link. For us it is enough to rely on the active ping tool.
- I selected Host1 gn2-pm1.switch and then FlowsCount but I got error: "ERROR: no DS called 'h3d1' in 'var/lib/packetloss/data/packetloss_flows.rrd"
- It works for me, I am not sure what was wrong. Could you repeat the test now? (SU)
- Works now, I have fixed it. (AF)
- What is SmartMax? It is not explained.
- Context tooltip added. SmartMax has the same functionality as "Draw maximum line: Auto" in ABW. (AF)
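The PSNC comment notes that this kind of measurement needs a passive monitoring station at each end of the link. As a rough illustration of why, here is a minimal sketch (not the Packetloss application's actual code): if an upstream and a downstream probe count the packets of the same flows over the same, aligned interval, the loss on the link is the difference between the two counts. The ProbeCount type and the one_way_loss function are hypothetical names.

```python
"""Hypothetical sketch: one-way packet loss from two passive probes."""
from dataclasses import dataclass


@dataclass
class ProbeCount:
    """Packets matching one flow filter, counted by one probe in one interval (hypothetical)."""
    probe: str
    packets: int


def one_way_loss(upstream: ProbeCount, downstream: ProbeCount) -> tuple[int, float]:
    """Return (lost packets, loss ratio) between an upstream and a downstream probe.

    Assumes both probes counted the same packets over the same, aligned interval.
    """
    if upstream.packets == 0:
        return 0, 0.0
    # A negative difference (downstream saw more) can only come from misaligned
    # intervals or counting errors, so it is clamped to zero rather than reported.
    lost = max(upstream.packets - downstream.packets, 0)
    return lost, lost / upstream.packets


if __name__ == "__main__":
    print(one_way_loss(ProbeCount("upstream", 1000), ProbeCount("downstream", 990)))
```

The sketch also shows the cost the comment mentions: the result is only meaningful when both probes see the same traffic and their counting intervals line up, which a single active ping does not require.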
TCMP application
Not available as of April 16.
Burst application
Not available as of April 16.
Service applications (HW & SW status and resources consumption)
These are administrative pages, so they are more useful for server administrators than for users. They provide a quick, useful overview of the health of the servers and services.
- HW & SW status
- What is in fact shown in the map? How are the colours calculated?
- It would be nice to explain the terms "mapi" and "dimapi", which appear in the table.
- I think there is no need to repeat these words in the table's cells; they are already in the column names.
- Resources
- Values are not explained, e.g. I don't know how I should interpret "0/400%"(!).
- Bandwidth usage and protocols
- Where is this usage measured? On a management link? It is not clear.
- It is a one-way graphical overview of abw results. This should now be clear: see the application table at the top of the main web page, where there is an "abw-map" link. (SU)
- Most values are not available. For PIONIER nothing is available. I don't know why.
- It should be working now. (SU)
- The table linking colours to % probably needs a note explaining what it is
- The color gauge was added. (SU)
- What does "protocols" refer to in this view? I can't see any
- Good note, corrected (removed). (SU)
Any other comments
- Please do not write "PSNC" everywhere. It is the institution's name; you should use "PIONIER", the NREN's name, everywhere instead.
SWITCH
Abw application
Fascinating! The graphs are generated reasonably quickly, and let me look at link utilization at much better granularity than what traditional tools provide (MRTG, Cricket). The protocol break-down is interesting too, although in this case I have internal (Netflow-based) tools that provide a finer break-down.
It is a little confusing that IPv6 is listed as a "layer 4 protocol". I assume this is mostly native IPv6 traffic, so it should be listed at the same level ("layer 3") as IPv4 or other non-IP protocols (e.g. ARP or CDP). Over time, it should become interesting to break IPv6 down into TCP/UDP/... as well: in our case, total IPv6 traffic already exceeds "UDP" traffic (which probably means UDP over IPv4!) in volume.
- I changed the graph name to L3/L4. If the volume of IPv6 traffic increases, we may indeed generate separate graphs for IPv6. As current IPv6 traffic is only a small part of total traffic, it seemed convenient to me to keep the number of graphs down and just show what part of the total traffic is IPv6. (SU)
The scaled-down images on the main query results page provide a decent overview, but the scaling makes the legends unreadable. So it would be preferable to have these smaller images generated without the legends.
- Graphs could be generated twice, once for full-size and once for thumbnail viewing, but this would increase the time needed to generate them. I experimented with the font size, but it was always either too small in the thumbnail or too clunky at full size. For now, I have left it as it is. Should we generate graphs twice at the expense of slower response? (SU)
Tbwtools application
The interface to this tool is daunting, although it seems to generate reasonable defaults for the large and complex Web form. I tried to hit "Run test" with the default settings, but this resulted in cryptic error messages. Well, I think I prefer NDT for now.
- The user interface works with just the default values. You only need to select the measurement point to which you want to make a test connection (a step that can hardly be removed). If it did not work with the default values, something was probably broken. Could you repeat the test and let me know what it showed? We will edit the user interface to show more clearly which steps can be left at their defaults. (SU)
Packetloss application
I tried this, but was hampered by some packet loss due to CPU overload on the central server in the Czech Republic. I believe this problem is being addressed. Also, the output is a little hard to interpret, partly because there are two separate passive-monitoring probes at the site that interests me most (SWITCH/CERN), one for each direction of the GN2-SWITCH access link.
- We improved the presentation by integrating the two directions (e.g., from SWITCH and to SWITCH) into one graph (it was a surprisingly difficult task; thanks to Ales Friedl, who did it). Regarding lost packets on the CESNET station, we reduced losses significantly by removing the MAPI locking mechanism (which is not needed in our case) and increasing the DAG memory buffer, but some losses still remain. We are now testing increased physical memory, which allowed us to increase the DAG buffer further, but it is likely that an upgrade of the whole server will be needed. (SU)
TCMP application
Doesn't seem to be available yet.
Burst application
This was the application that I was most interested in, but as of 15 April 2008 it isn't operational yet.
Service applications (HW & SW status and resources consumption)
The map views provide a good overview of the status of the probes and of link loads.
The colour scale doesn't match my intuition, in particular the brick-red color that is in the middle of the lightly-used colour spectrum. Red should mean heavily loaded.
- Scale changed. (AF)
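One way to get a scale where red really means heavily loaded is a single monotonic ramp from green through yellow to red. The following is a minimal sketch under that assumption; the function name, breakpoints and colours are hypothetical and are not the scale actually used on the map pages.

```python
def utilization_colour(percent: float) -> str:
    """Map link utilisation (0-100 %) to an RGB hex colour, green -> yellow -> red.

    Hypothetical scale for illustration only, not the one used on the SA3 PM maps.
    """
    p = min(max(percent, 0.0), 100.0) / 100.0  # clamp, then normalise to 0..1
    if p <= 0.5:
        # Lower half: green (0, 255, 0) to yellow (255, 255, 0) by raising red.
        red, green = round(510 * p), 255
    else:
        # Upper half: yellow (255, 255, 0) to red (255, 0, 0) by lowering green.
        red, green = 255, round(510 * (1.0 - p))
    return f"#{red:02x}{green:02x}00"


if __name__ == "__main__":
    for load in (0, 25, 50, 75, 100):
        print(load, utilization_colour(load))
```

Because the ramp is monotonic, pure red is reached only at full utilisation, so a lightly used link can never be drawn in a red shade.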
HW & SW status
This page renders quickly. It's a bit surprising that there are some nodes where the hardware shows an ERROR but the software is still running OK. But on reflection this makes sense, for example when everything is running but the monitoring interface hasn't been connected.
Resources
This page renders very slowly. Maybe this is because the interface itself is running on the machine that has the highest load (the probe at CESNET)? Maybe the CESNET machine's load is so high because I'm using the interface?
- No, it is not because of load. There is a delay if one of the monitoring machines is down, as the interface waits a while for the connection timeout. Now all machines are running and responses should be immediate. (AF)
Bandwidth usage and protocols
This page renders very slowly. I couldn't find any per-protocol measurements on the overview page, therefore I find the title confusing.
- Delay: same as above. Normally responses are immediate, but if some machines are down, rendering is delayed while the interface waits a few seconds for the connection timeout. Title fixed by SU. (AF)
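The delay described in the two replies above comes from waiting for connection timeouts on machines that are down. The sketch below is an illustration only, not the interface's actual code. It assumes the status page could query the probes concurrently, each with its own short timeout, so that a dead machine costs at most about one timeout instead of delaying every query behind it. The fetch_status function is a hypothetical stand-in that simulates a response time with asyncio.sleep.

```python
import asyncio
import time


async def fetch_status(host: str, delay: float) -> str:
    """Hypothetical stand-in for a real status query; `delay` simulates the host's response time."""
    await asyncio.sleep(delay)
    return "OK"


async def query_with_timeout(host: str, delay: float, timeout: float) -> tuple[str, str]:
    """Query one host, reporting it as unavailable if it does not answer within `timeout` seconds."""
    try:
        return host, await asyncio.wait_for(fetch_status(host, delay), timeout)
    except asyncio.TimeoutError:
        return host, "unavailable"


async def poll_all(hosts: dict[str, float], timeout: float) -> dict[str, str]:
    """Query all hosts concurrently, so the total wait is bounded by roughly one timeout."""
    results = await asyncio.gather(
        *(query_with_timeout(h, d, timeout) for h, d in hosts.items())
    )
    return dict(results)


if __name__ == "__main__":
    # Two responsive probes and one that is down (it would take 60 s to "answer").
    simulated = {"probe-a": 0.05, "probe-b": 0.1, "probe-down": 60.0}
    start = time.monotonic()
    status = asyncio.run(poll_all(simulated, timeout=1.0))
    print(status, f"in {time.monotonic() - start:.1f} s")
```

With sequential queries, each machine that is down adds a full timeout to the page's rendering time; querying concurrently keeps the worst case close to a single timeout.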
