Performance alerting
STOR2RRD has a built-in alerting feature. Alerting is based on performance thresholds being reached.
You can define alarms for any storage and its volumes, pools or hosts:
- Storage total (summary of all pools)
- Host (since v1.40, March 2017)
Metrics available for alerting:
- io: IO/sec total (read + write)
- io read: IO/sec read
- io write: IO/sec write
- data: data throughput in MB/sec (read + write)
- read: read data throughput
- write: write data throughput
- resp time: response time, latency
- resp time read: response time read
- resp time write: response time write
Emailing. You can set an email address directly on each directive, use email groups, or rely on the default email address.
Each alert email has an attached graph showing the metric's utilization over the past 25 hours.
Nagios support. You can configure Nagios to pick up alarms from STOR2RRD via the standard NRPE module.
Nagios plug-in installation
External alerting via an external shell script. Each alert can invoke a defined script with given parameters.
You can use this for your own integration needs.
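As a rough illustration of such an integration, the sketch below shows what an external handler could do with the parameters it receives. The argument order (storage, item, metric, value, limit) and the function names are assumptions made for this example only; they are not the documented STOR2RRD interface, so check which parameters your installation actually passes.

```python
# Hypothetical alert handler: the positional argument layout below is an
# assumption for illustration, not the documented STOR2RRD interface.
FIELDS = ("storage", "item", "metric", "value", "limit")


def parse_args(argv):
    """Map positional arguments onto the assumed field names."""
    if len(argv) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} arguments, got {len(argv)}")
    return dict(zip(FIELDS, argv))


def format_alert(alert):
    """Build a one-line message suitable for a log file or ticketing tool."""
    return "{storage}/{item}: {metric} = {value} (limit {limit})".format(**alert)
```

A wrapper script would typically call `format_alert(parse_args(sys.argv[1:]))` and forward the resulting line to whatever tool you integrate with.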
- SNMP trap: will be available from v1.40 (March 2017)
- Alerting plug-ins for other monitoring tools can be developed on demand, especially for customers under a support contract
Alert parameters:
- Limit: threshold level in IO per second, MB per second or milliseconds
- Peak: the period of time over which average utilization has to be above the specified limit to generate an alert
- Repeat: default time in minutes that determines how often you are alerted. You can specify a different value per storage, pool or volume in the alert repeat time column of each alert.
- Exclude hours: time range in hours during which alerts are ignored. It is useful for excluding nightly batch jobs, which usually cause a heavy load.
- Email group: you can create different email groups and direct alarms to them
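The way Limit, Peak, Repeat and Exclude hours interact can be sketched as follows. This is a minimal illustration of the semantics described above, not STOR2RRD's actual implementation; the function names, the `(start_hour, end_hour)` tuple for excluded hours and the sample format are all assumptions.

```python
from datetime import datetime, timedelta


def in_excluded_hours(now, start_hour, end_hour):
    """True if now.hour is in [start_hour, end_hour), wrapping past midnight."""
    if start_hour == end_hour:
        return False  # empty range: nothing is excluded
    if start_hour < end_hour:
        return start_hour <= now.hour < end_hour
    return now.hour >= start_hour or now.hour < end_hour


def should_alert(samples, limit, peak_minutes, now,
                 last_alert=None, repeat_minutes=60, exclude=None):
    """Decide whether to raise an alert.

    samples is a list of (timestamp, value) pairs (assumed format).
    """
    # Exclude hours: ignore alerts inside the configured time range.
    if exclude is not None and in_excluded_hours(now, *exclude):
        return False
    # Repeat: stay quiet until the repeat interval has elapsed.
    if last_alert is not None and now - last_alert < timedelta(minutes=repeat_minutes):
        return False
    # Peak: average only the samples inside the peak window.
    window_start = now - timedelta(minutes=peak_minutes)
    window = [value for ts, value in samples if window_start < ts <= now]
    if not window:
        return False
    # Limit: alert when the window average exceeds the threshold.
    return sum(window) / len(window) > limit
```

With a limit of 2.5 ms and a 15-minute peak, three samples of 2.5, 2.7 and 2.81 ms inside the window average to 2.67 ms and trigger an alert, unless the current hour is excluded or the repeat interval has not yet passed.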
Email alert example 1
Alert when the average write response time for storwize-home and pool01 is greater than 2.5 milliseconds during the last 15 minutes
STOR2RRD: POOL alert for storwize-home:pool01: response time write, actual value: 2.67 ms (limit 2.5)
STOR2RRD alert
Time: 18:24:00 18/06/2016
Storage: storwize-home
POOL: pool01
Metric: response time write
Average response time during last 15mins: 2.67 ms
Limit: 2.5
Email alert example 2
Alert when the average overall IO for storage storwize-home is greater than 50 IO/sec during the last 5 minutes
STOR2RRD: POOL-ALL: alert for storwize-home:all pools: io, actual value: 65 IO/sec (Limit 50)
STOR2RRD alert
Time: 19:48:00 18/06/2016
Storage: storwize-home
POOL-ALL: all pools
Metric: io
Average throughput during last 5mins: 65 IO/sec
Limit: 50
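If you route alert emails into another tool, the subject lines in the two examples above can be parsed with a regular expression. The sketch below is derived only from those two sample subjects; the pattern and the `parse_subject` helper are assumptions and may need adjusting for other alert types or product versions.

```python
import re

# Pattern inferred from the two sample subjects only (an assumption).
SUBJECT_RE = re.compile(
    r"^STOR2RRD: (?P<kind>[A-Z-]+):? alert for "
    r"(?P<storage>[^:]+):(?P<item>[^:]+): "
    r"(?P<metric>[^,]+), actual value: "
    r"(?P<value>[0-9.]+) (?P<unit>\S+) "
    r"\([Ll]imit (?P<limit>[0-9.]+)\)$"
)


def parse_subject(subject):
    """Return the alert fields as a dict, or None if the subject does not match."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so callers can compare them directly.
    fields["value"] = float(fields["value"])
    fields["limit"] = float(fields["limit"])
    return fields
```

Returning `None` for unrecognized subjects lets the caller pass unrelated mail through untouched instead of failing.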
HW event alerting
It will be available from v1.40 (March 2017).