Metrics data and monitoring scenarios
You can monitor the metrics data from your node for a variety of reasons, and in different ways. Some suggested scenarios for monitoring are:
- Risk of out of memory error - Monitor the used memory in your node’s HeapMemoryUsage attribute.
- High CPU usage - Monitor the SystemCpuLoad property of your node to check for high CPU measurements.
- High flow error rate - Check for repeated errors in the flows being used on your node. Flows are the way CorDapps perform their functions. If there is a high level of errors, there may be an issue with your node, or a bug in the CorDapp or the flow itself.
- Network parameter update proposal not accepted - Check whether network parameters that you or another party proposed to the Network Map have been accepted yet. The updates could still be awaiting approval.
- Processing messages takes too long - Measure the time taken to process peer-to-peer (P2P) messages. If latency is high, you can choose to flag this as an error.
- Committing transactions time - Measure how long it takes to commit an executed action on the network.
- Signing transactions time - Where a signature is required for a transaction, you can measure the time taken to complete it.
You can see a complete list, along with guidance on monitoring specific scenarios, in the Monitoring scenarios docs.
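The first two scenarios above can be sketched with the standard Java management API. This is a minimal illustration, not the node's own tooling: it reads HeapMemoryUsage and SystemCpuLoad from the platform MBean server of the JVM it runs in, which stands in for a JMX connection to a node. The class name NodeHealthProbe and both alert thresholds are assumptions made for the example.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.AttributeNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Sketch only: NodeHealthProbe and the thresholds below are hypothetical.
class NodeHealthProbe {
    // Assumed alert thresholds; tune them for your own deployment.
    static final double HEAP_ALERT_FRACTION = 0.90;
    static final double CPU_ALERT_FRACTION = 0.85;

    /** Used heap as a fraction of max heap (or of committed heap if max is undefined). */
    static double usedHeapFraction(MBeanServerConnection conn) {
        try {
            ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
            CompositeData data = (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
            MemoryUsage usage = MemoryUsage.from(data);
            // getMax() returns -1 when the JVM does not define a maximum.
            long limit = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
            return (double) usage.getUsed() / limit;
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Could not read HeapMemoryUsage", e);
        }
    }

    /** System CPU load between 0 and 1, or a negative value if the JVM does not report it. */
    static double systemCpuLoad(MBeanServerConnection conn) {
        try {
            ObjectName os = new ObjectName(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME);
            return ((Number) conn.getAttribute(os, "SystemCpuLoad")).doubleValue();
        } catch (AttributeNotFoundException e) {
            return -1.0; // attribute is not exposed by this JVM
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Could not read SystemCpuLoad", e);
        }
    }

    public static void main(String[] args) {
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        double heap = usedHeapFraction(conn);
        double cpu = systemCpuLoad(conn);
        System.out.printf("heap used: %.1f%%, system CPU: %.1f%%%n", heap * 100, cpu * 100);
        if (heap > HEAP_ALERT_FRACTION) System.out.println("ALERT: risk of out of memory error");
        if (cpu > CPU_ALERT_FRACTION) System.out.println("ALERT: high CPU usage");
    }
}
```

Both methods accept any MBeanServerConnection, so a connection obtained through JMXConnectorFactory could replace the platform server, assuming the node's JMX endpoint is reachable from where the probe runs.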
Metrics data
A Corda node exports a number of metrics via JMX, which you can use to monitor the health of the node.
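Because the metrics are exposed as JMX MBeans, one way to see what is available is to query the MBean names registered under a domain. The sketch below groups them by their type key. It runs against the local JVM and defaults to the java.lang domain so that it works anywhere. The class name MetricDiscovery is hypothetical, and the domain under which a node registers its metrics (net.corda is a plausible guess) is an assumption to verify against your own node.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch only: MetricDiscovery is a hypothetical helper, not part of any Corda API.
class MetricDiscovery {
    /** Distinct values of the "type" key across all MBeans registered under the given domain. */
    static Set<String> metricTypes(MBeanServerConnection conn, String domain) {
        try {
            Set<String> types = new TreeSet<>();
            // The pattern "<domain>:*" matches every MBean in that domain.
            for (ObjectName name : conn.queryNames(new ObjectName(domain + ":*"), null)) {
                String type = name.getKeyProperty("type");
                if (type != null) {
                    types.add(type);
                }
            }
            return types;
        } catch (MalformedObjectNameException | IOException e) {
            throw new IllegalStateException("Could not query domain " + domain, e);
        }
    }

    public static void main(String[] args) {
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Pass the node's metrics domain as the first argument (assumed, e.g. net.corda).
        String domain = args.length > 0 ? args[0] : "java.lang";
        System.out.println(domain + " -> " + metricTypes(conn, domain));
    }
}
```

A domain with no registered MBeans yields an empty set rather than an error, which makes the helper safe to use as a first connectivity check.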
You can get metrics for your node from these key sources:
- Caches - A Corda node maintains a number of caches. For each of the metrics below, the name of the cache must be supplied in the component field to show metrics for that cache.
- Flows - Flow metrics can be used to measure key data about the activity on your node. Metrics include the total number of flows in flight at a given time, the total number of completed flows, and the total number of flows that failed with an error.
- Actions - Actions are reified IO actions to execute as part of state machine transitions. These metrics are only exposed when the relevant action gets executed for the first time.
- Metering - Metering metrics can be used to get an overview of the performance of commands that are persisted, the number of persisted signing events, the length of a queue of events waiting to be persisted, and more.
- P2P - Messaging between parties can be measured in a number of ways, including the latency between messages being sent and received between nodes, the size of sent messages, and the interval between received P2P messages.
- Other metrics - Measure the time taken to sign a transaction, or check whether proposed network parameter updates have been accepted yet.
Take a look at the Node metrics documentation for a complete range of the metrics data available from your node.
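As an illustration of how the flow counters described above could feed the high flow error rate scenario, the sketch below derives an error rate from two readings of the cumulative counters. It is a hedged example: the FlowErrorRate and Sample types are hypothetical, and it assumes that the completed count includes flows that failed. If your node reports the two separately, divide by the sum of both instead.

```java
// Sketch only: FlowErrorRate and Sample are hypothetical types for illustration.
class FlowErrorRate {
    /** One reading of the cumulative flow counters. */
    static final class Sample {
        final long finished; // total flows completed so far (assumed to include failures)
        final long failed;   // total flows that failed with an error so far

        Sample(long finished, long failed) {
            this.finished = finished;
            this.failed = failed;
        }
    }

    /** Fraction of flows finished between the two samples that failed with an error. */
    static double errorRate(Sample earlier, Sample later) {
        long finished = later.finished - earlier.finished;
        long failed = later.failed - earlier.failed;
        // No flows finished in the window, or the counters went backwards
        // (for example after a node restart): report no errors for this window.
        if (finished <= 0 || failed < 0) {
            return 0.0;
        }
        return (double) failed / finished;
    }

    public static void main(String[] args) {
        Sample earlier = new Sample(100, 2);
        Sample later = new Sample(150, 12);
        System.out.printf("flow error rate in window: %.2f%n", errorRate(earlier, later));
    }
}
```

Working from the difference between two samples, rather than the raw totals, keeps the rate focused on recent activity, so an old burst of failures does not mask or exaggerate the current state of the node.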