Monitoring VMware ESX 3.x, ESXi, vSphere 4 and vCenter Server

This article describes how op5 Monitor or Nagios, together with the Check ESX Plugin, can be used to monitor your VMware ESX and vSphere servers. You may monitor either a single ESX(i)/vSphere server or a VMware VirtualCenter/vCenter Server and individual virtual machines. If you have a VMware cluster, you should monitor the data center (VMware VirtualCenter/vCenter Server) rather than the individual ESX/vSphere servers.

More information can be found in the article Monitor Virtual Infrastructure with op5 Monitor.

Prerequisites

Before you start, make sure you have an account on the server with the correct access rights.
The default installation of VMware ESX/vSphere includes a ‘read only’ profile that you should use when creating the new user. That profile has sufficient rights for monitoring. The user you create must be:

  • a member of the group user
  • based on the ‘read only’ profile

You must install the VMware vSphere SDK for Perl on your op5 Monitor server. Please read the how-to about Installing vSphere SDK for Perl for instructions.

This will be done

We will go through:

  • Monitoring a VMware ESX Datacenter/vCenter
  • Monitoring a VMware ESX/vSphere host
  • Monitoring a VMware virtual machine
  • Monitoring a VMware virtual machine through a Datacenter/vCenter
  • Monitoring a VMware ESX/vSphere host through a Datacenter/vCenter

Check commands

Add the required check commands if they do not already exist in your configuration (‘Configure’ -> ‘Commands’ -> ‘Check Command Import’).

You should also define a username and password in /opt/monitor/etc/resource.cfg to hide this information from the CGIs:

$USER11$=username
$USER12$=password

Note: We use the $HOSTALIAS$ macro in the command_line because the VM names must be given exactly as they are defined in your VMware server. Set this name as the Alias in the host definition. This change does not affect the history of your host.

Commands for ESX(i) Datacenter/vCenter

command_name | command_line
check_esx3_dc_vm | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -l $ARG2$ -s $ARG3$ -N $HOSTALIAS$ -w $ARG4$ -c $ARG5$
check_esx3_dc_host | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -l $ARG2$ -s $ARG3$ -H $HOSTADDRESS$ -w $ARG4$ -c $ARG5$

Commands for ESX(i)/vSphere Hosts

command_name | command_line
check_esx3_host_cpu_usage | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l cpu -s usage -w $ARG1$ -c $ARG2$
check_esx3_host_mem_usage | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l mem -s usage -w $ARG1$ -c $ARG2$
check_esx3_host_swap_usage | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l mem -s swap -w $ARG1$ -c $ARG2$
check_esx3_host_net_usage | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l net -s usage -w $ARG1$ -c $ARG2$
check_esx3_host_vmfs | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l vmfs -s $ARG1$ -w "$ARG2$:" -c "$ARG3$:"
check_esx3_host_runtime_status | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l runtime -s status
check_esx3_host_runtime_issues | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l runtime -s issues
check_esx3_host_io_read | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s read -w $ARG1$ -c $ARG2$
check_esx3_host_io_write | $USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s write -w $ARG1$ -c $ARG2$


Commands for virtual machines on ESX(i)/vSphere servers

command_name | command_line
check_esx3_vm_cpu_usage | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l cpu -s usage -w $ARG2$ -c $ARG3$
check_esx3_vm_mem_usage | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l mem -s usage -w $ARG2$ -c $ARG3$
check_esx3_vm_swap_usage | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l mem -s swap -w $ARG2$ -c $ARG3$
check_esx3_vm_net_usage | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l net -s usage -w $ARG2$ -c $ARG3$
check_esx3_vm_runtime_cpu | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l runtime -s cpu -w $ARG2$ -c $ARG3$
check_esx3_vm_runtime_mem | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l runtime -s mem -w $ARG2$ -c $ARG3$
check_esx3_vm_runtime_status | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l runtime -s status
check_esx3_vm_runtime_state | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l runtime -s state
check_esx3_vm_runtime_issues | $USER1$/check_esx3 -H $ARG1$ -u $USER11$ -p $USER12$ -N $HOSTALIAS$ -l runtime -s issues

Commands for ESX/vSphere Hosts through your Datacenter/vCenter

command_name | command_line
check_esx3_dc_host_cpu_usage | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l cpu -s usage -w $ARG2$ -c $ARG3$
check_esx3_dc_host_mem_usage | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l mem -s usage -w $ARG2$ -c $ARG3$
check_esx3_dc_host_net_usage | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l net -s usage -w $ARG2$ -c $ARG3$
check_esx3_dc_host_runtime_issues | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l runtime -s issues
check_esx3_dc_host_runtime_state | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l runtime -s state
check_esx3_dc_host_runtime_status | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l runtime -s status
check_esx3_dc_host_swap_usage | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l mem -s swap -w $ARG2$ -c $ARG3$
check_esx3_dc_host_io_read | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l io -s read -w $ARG2$ -c $ARG3$
check_esx3_dc_host_io_write | $USER1$/check_esx3 -D $ARG1$ -u $USER11$ -p $USER12$ -H $HOSTALIAS$ -l io -s write -w $ARG2$ -c $ARG3$

Generic commands for ESX(i)/vSphere

There are also generic commands for check_esx3 which can be used if you want to monitor anything not mentioned in the tables above. If you do not have them in your system, you may add them with the import functionality in op5 Monitor (‘Configure’ -> ‘Commands’ -> ‘Check Command Import’).

command_name | description
check_esx3_dc | Use this command to monitor a Datacenter/vCenter, or to monitor through one.
check_esx3_host | Use this command to monitor an ESX(i)/vSphere host.
check_esx3_vm | Use this command to monitor a single VM.
check_esx_dc_vm | Use this command to monitor a single VM through a Datacenter/vCenter.

Adding the services

Add the required services (‘Configure’ -> ‘Host: ‘ -> ‘Go’ -> ‘Services for host ‘ -> ‘Add new service’ -> ‘Go’).

Add the following services (the arguments are just examples; adjust them to suit your environment).

Services for ESX(i) Datacenter

service_description | check_command | check_command_args | Note
VMware DC VM | check_esx3_dc_vm | VCserver-ip!command!subcommand!warning!critical |
VMware DC Host | check_esx3_dc_host | VCserver-ip!command!subcommand!warning!critical |

Services for ESX(i) hosts

service_description | check_command | check_command_args | Note
VMware CPU Usage | check_esx3_host_cpu_usage | 80!90 | *
VMware Mem Usage | check_esx3_host_mem_usage | 80!90 | *
VMware Swap Usage | check_esx3_host_swap_usage | 80!90 | *
VMware Net Usage | check_esx3_host_net_usage | 102400!204800 | **
VMware VMFS main-storage | check_esx3_host_vmfs | main-storage!15%!10% |
VMware Runtime Status | check_esx3_host_runtime_status | | ***
VMware Runtime Issues | check_esx3_host_runtime_issues | | ****
VMware IO Read | check_esx3_host_io_read | 40!90 | *****
VMware IO Write | check_esx3_host_io_write | 40!90 | *****
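To make the link between check_command_args and the command_line templates concrete, the following is a minimal Python sketch of the substitution, not Nagios code. It relies on the standard behaviour that arguments separated by “!” become $ARG1$, $ARG2$ and so on. The function name expand_command, the plugin directory, the credentials and the address are assumptions chosen only for illustration.

```python
import re

def expand_command(command_line, check_command_args, macros):
    """Fill a command_line template the way the service arguments map onto it."""
    values = dict(macros)
    # Arguments in the service definition are separated by "!".
    args = check_command_args.split("!") if check_command_args else []
    for position, value in enumerate(args, start=1):
        values[f"ARG{position}"] = value
    # Replace every $NAME$ token; names missing from the mapping are
    # left untouched in this sketch.
    return re.sub(r"\$(\w+)\$",
                  lambda m: values.get(m.group(1), m.group(0)),
                  command_line)

# Template copied from check_esx3_host_cpu_usage in the table above.
template = ("$USER1$/check_esx3 -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ "
            "-l cpu -s usage -w $ARG1$ -c $ARG2$")

# Assumed values: adjust to your own resource.cfg and host definition.
macros = {
    "USER1": "/opt/plugins",      # assumed plugin directory
    "USER11": "monitor",          # hypothetical username
    "USER12": "secret",           # hypothetical password
    "HOSTADDRESS": "192.0.2.10",  # example address
}

print(expand_command(template, "80!90", macros))
```

With the arguments 80!90, the warning threshold ends up after -w and the critical threshold after -c, which is why the order of the values in check_command_args matters.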

Services for virtual machines on ESX(i)/vSphere server

service_description | check_command | check_command_args | Note
VMware VM CPU Usage | check_esx3_vm_cpu_usage | esx-host-ip!80!90 | *
VMware VM Mem Usage | check_esx3_vm_mem_usage | esx-host-ip!80!90 | *
VMware VM Swap Usage | check_esx3_vm_swap_usage | esx-host-ip!80!90 | *
VMware VM Net Usage | check_esx3_vm_net_usage | esx-host-ip!102400!204800 | **
VMware VM Runtime CPU | check_esx3_vm_runtime_cpu | esx-host-ip!80!90 | *
VMware VM Runtime Mem | check_esx3_vm_runtime_mem | esx-host-ip!80!90 | *
VMware VM Runtime Status | check_esx3_vm_runtime_status | esx-host-ip | ***
VMware VM Runtime Issues | check_esx3_vm_runtime_issues | esx-host-ip | ****

Services for virtual machines through your Datacenter/vCenter

service_descr. | check_command | check_command_args | Note
VMware Host CPU Usage | check_esx3_dc_host_cpu_usage | VCserver-ip!80!90 | *
VMware Host Mem Usage | check_esx3_dc_host_mem_usage | VCserver-ip!80!90 | *
VMware Host Swap Usage | check_esx3_dc_host_swap_usage | VCserver-ip!80!90 | *
VMware Host Net Usage | check_esx3_dc_host_net_usage | VCserver-ip!102400!204800 | **
VMware Host Runtime Status | check_esx3_dc_host_runtime_status | VCserver-ip | ***
VMware Host IO Read | check_esx3_dc_host_io_read | VCserver-ip!40!90 | *****
VMware Host IO Write | check_esx3_dc_host_io_write | VCserver-ip!40!90 | *****

Services for ESX/vSphere Hosts through your Datacenter/vCenter

service_descr. | check_command | check_command_args | Note
VMware Host CPU Usage | check_esx3_dc_host_cpu_usage | VCserver-ip!80!90 | *
VMware Host Mem Usage | check_esx3_dc_host_mem_usage | VCserver-ip!80!90 | *
VMware Host Swap Usage | check_esx3_dc_host_swap_usage | VCserver-ip!80!90 | *
VMware Host Net Usage | check_esx3_dc_host_net_usage | VCserver-ip!102400!204800 | **
VMware Host Runtime Status | check_esx3_dc_host_runtime_status | VCserver-ip | ***
VMware Host IO Read | check_esx3_dc_host_io_read | VCserver-ip!40!90 | *****
VMware Host IO Write | check_esx3_dc_host_io_write | VCserver-ip!40!90 | *****

Notes:

* Warning and critical thresholds in percent.
** Warning and critical thresholds in kb/s.
*** Any response other than “green” results in a Critical state.
**** Any issue found results in a Critical state.
***** Warning and critical thresholds in ms.

Arguments that appear split across rows are broken only for readability and must be entered on a single line.

Ranges for Warning and Critical thresholds:

Range | An alert is generated when the value is
10 | < 0 or > 10 (outside the range of {0 .. 10})
10: | < 10 (outside the range of {10 .. ∞})
~:10 | > 10 (outside the range of {-∞ .. 10})
10:20 | < 10 or > 20 (outside the range of {10 .. 20})
@10:20 | ≥ 10 and ≤ 20 (inside the range of {10 .. 20})


More info can be found here:

http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
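The range notation above can be checked with a small script before you enter a threshold in a service. The following is a minimal Python sketch of the rules listed in this section, not the plugin's own parser. The function names parse_range and alerts are hypothetical, and percentage thresholds such as 15% (used by the VMFS check) are not handled here.

```python
import math

def parse_range(spec):
    """Return (start, end, inside) for a threshold such as '10:', '~:10' or '@10:20'."""
    inside = spec.startswith("@")   # "@" inverts the test: alert inside the range
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec   # a bare number means 0 .. number
    if start_text == "~":
        start = -math.inf                  # "~" stands for negative infinity
    elif start_text == "":
        start = 0.0
    else:
        start = float(start_text)
    end = float(end_text) if end_text else math.inf   # empty end means infinity
    return start, end, inside

def alerts(spec, value):
    """True when `value` should raise an alert for the threshold `spec`."""
    start, end, inside = parse_range(spec)
    within = start <= value <= end
    return within if inside else not within

for spec, value in [("10", 11), ("10:", 9), ("~:10", 11), ("10:20", 15), ("@10:20", 15)]:
    print(spec, value, alerts(spec, value))
```

The same function can be used for both the warning and the critical threshold; Nagios plugins conventionally test the critical range first and the warning range second.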

Use the “Test this service” button for each service to see if it works. Once they are correct and working as they should, you may add the services to all of your hosts with the clone function.
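If you prefer to test a check from the command line as well, the following is a minimal Python sketch that runs a plugin command and translates its exit code using the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name run_check is hypothetical, and the demo command is a stand-in so that the sketch runs without a VMware server; replace it with your own check_esx3 command line as built from the tables above.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_check(argv, timeout=60):
    """Run a plugin command and return (state name, first line of its output)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")

# Stand-in command that prints one line and exits with code 1.
# For a real test you would pass something like (path and values assumed):
#   ["/opt/plugins/check_esx3", "-H", "192.0.2.10", "-u", "monitor",
#    "-p", "secret", "-l", "cpu", "-s", "usage", "-w", "80", "-c", "90"]
demo = [sys.executable, "-c",
        "import sys; print('WARNING - demo output'); sys.exit(1)"]

print(run_check(demo))
```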