Monitoring in NServiceBus is easier than in regular 3-tier systems because of its use of queuing and message-based communication.
When a system is broken down into multiple processes each having its own queue, we can quickly zoom in on which process is the bottleneck by looking at how many messages there are (on average) in each queue.
The only issue is that without knowing the rate of messages coming into each queue, and the rate at which messages are being processed from each queue, we can't know how long messages are waiting in each queue - the primary indicator of a bottleneck.
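To see why queue length alone is not enough, here is a minimal sketch based on Little's Law (average wait = average queue length / throughput), which holds for a queue in steady state. The function name and the sample figures are illustrative assumptions, not NServiceBus APIs or measurements.

```python
def average_wait_seconds(avg_queue_length: float, throughput_per_second: float) -> float:
    """Estimate the average time a message waits, using Little's Law (W = L / lambda).

    Assumes the queue is in steady state, so the arrival rate equals the
    rate at which messages are processed.
    """
    if throughput_per_second <= 0:
        raise ValueError("throughput_per_second must be positive")
    return avg_queue_length / throughput_per_second


# Two hypothetical queues with the same depth but different processing rates.
orders_wait = average_wait_seconds(avg_queue_length=500, throughput_per_second=100)
billing_wait = average_wait_seconds(avg_queue_length=500, throughput_per_second=5)

print(f"orders:  {orders_wait:.1f} s")
print(f"billing: {billing_wait:.1f} s")
```

Both queues hold 500 messages on average, yet messages in the second queue wait twenty times longer, so only the second one is a likely bottleneck.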
Unfortunately, despite the many performance counters Microsoft provides for MSMQ (including messages in queues, machine-wide incoming and outgoing messages per second, and the total messages in all queues), there is no built-in performance counter for the time it takes a message to get through each queue.
NServiceBus Performance Counters
As part of the NServiceBus installation, two new performance counters are installed under the new "NServiceBus" category. The first, "Critical time", monitors the age of the oldest message in the queue. It takes into account the whole chain, from the message being sent on the client machine until it is successfully processed by the server. You should define an SLA for each of your endpoints and use the "Critical time" counter to make sure that you're adhering to it.
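As a rough illustration of what the critical time measures, the sketch below computes the elapsed time between a message being sent and its processing completing, then compares it with an SLA. This is a conceptual model only; the function name, timestamps, and the 30-second SLA are assumptions made for the example and are not taken from NServiceBus.

```python
from datetime import datetime, timedelta, timezone


def critical_time(sent_at: datetime, processed_at: datetime) -> timedelta:
    """Time from the message being sent by the client until the server
    finished processing it (queue wait plus handling time)."""
    return processed_at - sent_at


# Hypothetical message: sent at noon UTC, processed 12 seconds later.
sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
done = sent + timedelta(seconds=12)

sla = timedelta(seconds=30)  # assumed SLA for this endpoint
ct = critical_time(sent, done)
within_sla = ct <= sla

print(f"critical time: {ct.total_seconds():.0f} s, within SLA: {within_sla}")
```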
The second counter, "Time to SLA breach", acts as an early warning system that tells you the number of seconds left until the SLA for the particular endpoint is breached. This gives you a system-wide counter that can be monitored without putting the SLA into your monitoring software. Just set the alarm to trigger when the counter goes below X, where X is the time your operations team needs to take action to prevent the SLA from being breached. To enable this feature, define the endpoint SLA by adding the [EndpointSLA] attribute to your endpoint configuration. If you're self-hosting, use the Configure.SetEndpointSLA() method on the fluent API instead. All processes running with NServiceBus collect this information, and the counters are enabled by default.
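The alarm rule described above can be sketched as follows. This is a deliberately simplified model in which the time left is the SLA minus the current critical time; the way NServiceBus actually derives the counter value may differ. All names and numbers here are hypothetical.

```python
def seconds_to_sla_breach(sla_seconds: float, critical_time_seconds: float) -> float:
    """Simplified estimate of the time left before the SLA is breached.

    Assumption for illustration only: remaining time = SLA - critical time,
    floored at zero once the SLA has already been breached.
    """
    return max(0.0, sla_seconds - critical_time_seconds)


def should_alert(seconds_left: float, reaction_time_seconds: float) -> bool:
    """Trigger when the time left drops below the time the operations
    team needs to react (the "X" in the text above)."""
    return seconds_left < reaction_time_seconds


SLA = 60.0            # assumed endpoint SLA, in seconds
REACTION_TIME = 45.0  # assumed time operations needs to intervene

healthy = seconds_to_sla_breach(SLA, critical_time_seconds=5.0)
degraded = seconds_to_sla_breach(SLA, critical_time_seconds=20.0)

print(healthy, should_alert(healthy, REACTION_TIME))
print(degraded, should_alert(degraded, REACTION_TIME))
```

Because the threshold is expressed as reaction time rather than as the SLA itself, the same alarm rule can be reused for every endpoint regardless of its individual SLA.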
Since all performance counters in Windows are exposed via Windows Management Instrumentation (WMI) it is very straightforward to pull this information into your existing monitoring infrastructure.
The following video shows the NServiceBus performance counters and demonstrates their usage.
If the system being monitored was designed according to the NServiceBus best practice of having each process (and, as a corollary, each queue) handle only a single message type, you can then know how long each type of message is waiting in the system.
This would enable you to provide the business with information on a use-case by use-case basis.
The business could, in turn, specify SLA requirements per use case which could then be monitored.
Based on this information, each process could be scaled independently using the distributor to make sure it stays within required service levels. This is Business Service Management (BSM) at its finest.
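As a back-of-the-envelope illustration of scaling one process independently, the sketch below estimates how many workers an endpoint would need to keep up with its incoming message rate. The function, the utilisation target, and the rates are assumptions for the example; this is basic capacity arithmetic, not a description of how the distributor allocates work.

```python
import math


def workers_needed(arrival_rate_per_second: float,
                   per_worker_rate_per_second: float,
                   target_utilisation: float = 0.8) -> int:
    """Smallest number of workers that keeps each worker at or below the
    target utilisation for the given arrival rate."""
    if per_worker_rate_per_second <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("rates must be positive and utilisation in (0, 1]")
    effective_rate = per_worker_rate_per_second * target_utilisation
    return max(1, math.ceil(arrival_rate_per_second / effective_rate))


# Hypothetical endpoints, each handling a single message type.
print(workers_needed(arrival_rate_per_second=50, per_worker_rate_per_second=20))
print(workers_needed(arrival_rate_per_second=10, per_worker_rate_per_second=20))
```

Running below full utilisation leaves headroom for bursts, which helps keep the critical time inside the SLA when load fluctuates.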
You might also be interested in our support for auditing.