Now that the cluster update is done, I was able to start experiments again. It appears that my strange numbers are still here. I installed some more precise monitoring tools such as sar and tried to figure out what is going on. Here is the CPU usage graph for my application, which simply receives UDP/SNMP heartbeat traps sent at a 1 ms interval.
As you can see, the periodic signal is still there. To narrow it down, I profiled my Java code. Here is the result of the profiling.
rank | self | accum | count | trace | method |
---|---|---|---|---|---|
1 | 99.74% | 99.74% | 295735 | 300260 | java.net.PlainDatagramSocketImpl.receive0 |
2 | 0.03% | 99.77% | 95 | 300335 | org.snmp4j.smi.VariableBinding.<init> |
3 | 0.02% | 99.79% | 70 | 300330 | org.snmp4j.smi.OctetString.<init> |
4 | 0.02% | 99.81% | 63 | 300332 | org.snmp4j.smi.OID.<init> |
5 | 0.02% | 99.83% | 62 | 300329 | jp.ac.jaist.snmpd.knowledge.ProcessKnowlegeState.<init> |
So my code is statistically doing nothing but waiting for the next UDP packet to arrive. At the same time, CPU usage goes up and down, including in supervisor mode. You can also see that the system-level and user-level CPU usages are correlated, so it is clearly UDP packet reception that triggers this behavior. The only good thing is that the problem clearly does not lie in my code.
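
To illustrate why the profile looks like this, here is a minimal sketch of a blocking UDP receive loop. It is not the actual application: the class name `UdpReceiveSketch`, the method `receiveLoop`, the buffer size, and the timeout are all assumptions made for illustration, and no SNMP decoding is done. The point is only that the thread spends nearly all of its time parked inside `DatagramSocket.receive`, which on the JVM profiled above shows up as `java.net.PlainDatagramSocketImpl.receive0`.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class UdpReceiveSketch {

    // Receives up to maxPackets datagrams and returns how many arrived.
    // Stops early if no packet arrives within timeoutMs.
    public static int receiveLoop(DatagramSocket socket, int maxPackets, int timeoutMs)
            throws Exception {
        socket.setSoTimeout(timeoutMs);
        byte[] buf = new byte[1500]; // assumed upper bound for one trap
        DatagramPacket packet = new DatagramPacket(buf, buf.length);
        int received = 0;
        while (received < maxPackets) {
            try {
                // The thread blocks here until a packet arrives; a sampling
                // profiler attributes that waiting time to the native receive call.
                socket.receive(packet);
                received++;
                // Reset the length, since receive() shrinks it to the packet size.
                packet.setLength(buf.length);
            } catch (SocketTimeoutException e) {
                break; // nothing arrived in time
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            byte[] payload = "heartbeat".getBytes(StandardCharsets.US_ASCII);
            // Send a few stand-in "traps" to ourselves over loopback.
            for (int i = 0; i < 5; i++) {
                sender.send(new DatagramPacket(payload, payload.length,
                        loopback, receiver.getLocalPort()));
            }
            System.out.println("received " + receiveLoop(receiver, 5, 1000));
        }
    }
}
```

With a loop of this shape, the per-packet work in user code is tiny compared with the time spent blocked, which is consistent with the `<init>` entries each accounting for a few hundredths of a percent.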