intERLab at AIT: Network Management Workshop: Nagios Version 2 Exercises

Nagios exercise


PART I
-----------------------------------------------------------------------------

1. Install Nagios

   Do this as root.

   # apt-get install nagios2

   - You will be asked for a password for the nagios admin Web user -
     remember it!

   Now do this so that we can have pretty icons:

   # apt-get install nagios-images

2. Create the Web user password file:

   # htpasswd -c /etc/nagios2/htpasswd.users nagiosadmin
   New password:
   Re-type new password:

3. You should already have a working Nagios!

   - Open a browser, and go to http://localhost/nagios2/
   - At the login prompt, log in as:

       user: nagiosadmin
       pass:

4. Let's look at the interface together...

   # cd /etc/nagios2/
   # ls -l
   -rw-r--r-- 1 root root    1598 2007-09-01 00:03 apache2.conf
   -rw-r--r-- 1 root root    9573 2006-12-20 22:20 cgi.cfg
   -rw-r--r-- 1 root root    4653 2006-12-20 22:20 commands.cfg
   drwxr-xr-x 2 root root    4096 2007-09-01 00:03 conf.d
   -rw-r--r-- 1 root root      26 2007-09-01 00:05 htpasswd.users
   -rw-r--r-- 1 root root   30431 2006-12-20 22:20 nagios.cfg
   -rw-r----- 1 root nagios  1293 2006-12-20 22:19 resource.cfg
   drwxr-xr-x 2 root root    4096 2006-12-20 22:20 stylesheets

   # ls -l conf.d/
   -rw-r--r-- 1 root root 1687 2006-12-20 22:19 contacts_nagios2.cfg
   -rw-r--r-- 1 root root  413 2006-12-20 22:19 extinfo_nagios2.cfg
   -rw-r--r-- 1 root root 1152 2006-12-20 22:19 generic-host_nagios2.cfg
   -rw-r--r-- 1 root root 1803 2006-12-20 22:19 generic-service_nagios2.cfg
   -rw-r--r-- 1 root root  210 2007-09-01 00:03 host-gateway_nagios2.cfg
   -rw-r--r-- 1 root root  976 2006-12-20 22:19 hostgroups_nagios2.cfg
   -rw-r--r-- 1 root root 2163 2006-12-20 22:19 localhost_nagios2.cfg
   -rw-r--r-- 1 root root  806 2006-12-20 22:19 services_nagios2.cfg
   -rw-r--r-- 1 root root 1609 2006-12-20 22:19 timeperiods_nagios2.cfg


PART II
-----------------------------------------------------------------------------

1.
According to what we saw in class, let's add a new host

   - Pick any PC in the room, i.e. something other than pc10!

   # cd /etc/nagios2/conf.d/
   # vi pc10.cfg

   define host {
       use         generic-host
       host_name   pc10
       alias       PC 10 at intERLab
       address     _______________   [pc10's IP address here]
   }

   ... Save and quit

2. Let's create a new hostgroup for the occasion, and add our host to it

   - Edit the file hostgroups_nagios2.cfg and add a new group:

   # vi hostgroups_nagios2.cfg

   define hostgroup {
       hostgroup_name   interlab-pcs
       alias            intERLab PCs
       members          pc10
   }

3. Now let's associate some services with that host

   # vi services_nagios2.cfg

   - Find the section called "check that ssh services are running", and
     change the line:

       hostgroup_name   ssh-servers

     to

       hostgroup_name   ssh-servers,interlab-pcs

4. Verify that your configuration file is OK:

   # nagios2 -v /etc/nagios2/nagios.cfg

   ... You should get:

   Total Warnings: 0
   Total Errors:   0

   Things look okay - No serious problems were detected during the check.

5. Reload Nagios

   # /etc/init.d/nagios2 reload

   NOTES:
   - This is the standard way of updating the Nagios configuration.
     However, there is a bug in the Ubuntu init script
     (/etc/init.d/nagios2). You should do the following instead each
     time you make changes:

     # /etc/init.d/nagios2 stop
     # /etc/init.d/nagios2 start

     Otherwise you will end up with multiple Nagios instances running.
     To resolve this problem you can do:

     # ps auxwww | grep nagios
     # killall nagios2
     # /etc/init.d/nagios2 start

6. Go to the web interface (http://localhost/nagios2) and check the host
   you just added

7. Add ALL the PCs in the room!

   - Add all the PCs in the room to the config
   - Check HTTP for all PCs in the room
   - Remember to verify the configuration file!
   - I suggest that you create a single config file called pcs.cfg to do
     this.

   NOTE:
   - This requires a bit of planning, but you should have all the
     elements for doing this...
   - Think well about the logical structure of the files -- it should be
     possible for you to do this without doing too much work!


PART III
-----------------------------------------------------------------------------

1. Now let's create a complete Nagios configuration for our classroom
   network.

   NOTES:
   - This requires more planning. You have switches, routers, and the
     gateway server. In addition, the IP addresses that you use for your
     network's router, the classroom router, and the other network's
     router depend on your position in the network.
   - You want to use the internal IP address for your network's router
     and the gateway router, but the external IP address for the other
     network's router.
   - Note that the switches are not running Telnet; they are using ssh.
     So you should do an ssh check on them.
   - The routers, except one, are running ssh.
   - We have two unmanaged switches in our network. These will cause us
     problems later. Don't worry about them now.
   - It is important that you properly define the parent for devices.
     Some examples are given below. Devices can have more than one
     parent, but in our network none of them do.

2.) Create a file to define the configuration of your network gateway.
    This is usually a router box, but in our case this is the noc box as
    we are doing NAT in our classroom. Thus, the noc box is the parent
    box for our backbone router. Do this in the file
    "/etc/nagios2/conf.d/

    Sample entry:

    # a host definition for the gateway of the default route
    define host {
        host_name   gateway
        alias       Default Gateway (the noc)
        address     10.10.10.1
        use         generic-host
    }

    define service {
        use                   generic-service
        host_name             gateway
        service_description   Interface
        check_command         check-host-alive
    }

    NOTE:
    - We have defined a default service for this entry as well: a simple
      ping to verify that the host is alive. This is redundant as the
      gateway box in this case is the box on which Nagios is running for
      the classroom.
      For your use, however, this makes sense as you need to verify that
      your gateway out of the classroom network is up.

3.) Create a file to define the configuration for your routers. Maybe
    "/etc/nagios2/conf.d/routers.cfg". There should be three entries in
    this file.

    Sample entry:

    define host {
        use         generic-host
        host_name   rtr10.10.1
        alias       router for 10.10.1 on backbone
    #   address     10.10.1.254
        address     10.10.11.1
        parents     gw-rtr
    }

4.) Create a file to define the configuration for your switches. Maybe
    "/etc/nagios2/conf.d/switches.cfg". There should be two entries in
    this file.

    Sample entry:

    define host {
        use         generic-host
        host_name   switch1
        alias       switch 1 mgmt@intERLab
        address     10.10.1.253
        parents     rtr10.10.1
    }

5.) In the file "/etc/nagios2/conf.d/hostgroups_nagios2.cfg" create
    hostgroups for all the routers, switches and PCs in the classroom.

    Sample entry:

    # hostgroup definition for AIT intERLab Network Management Workshop
    define hostgroup {
        hostgroup_name   cisco-routers
        alias            Cisco Routers at AIT intERLab
        members          gw-rtr,rtr10.10.1,rtr10.10.2
    }

6.) In the file "/etc/nagios2/conf.d/services_nagios2.cfg" you define
    which groups (not individual devices) will have which service checks
    run on them.

    Sample entry (yours may not be as complex at first):

    # check that ssh services are running
    define service {
        hostgroup_name          ssh-servers,interlab-pcs,cisco-routers,internet-srv-ssh,switches
        service_description     SSH
        check_command           check_ssh
        use                     generic-service
        notification_interval   0 ; set > 0 if you want to be renotified
    }

7.) The file "/etc/nagios2/conf.d/extinfo_nagios2.cfg" defines extra
    details for each defined device.
    For instance, here are some sample entries you could use to build
    prettier Nagios results for your various devices:

================ extinfo_nagios2.cfg ===================

define hostextinfo {
    host_name         rtr10.10.1
    icon_image        cook/router.png
    icon_image_alt    Router
    statusmap_image   cook/router.gd2
}

define hostextinfo {
    host_name         switch1
    icon_image        cook/network_switch.png
    icon_image_alt    Network Switch
    statusmap_image   cook/network_switch.gd2
}

define hostextinfo {
    host_name         pc1
    icon_image        base/ubuntu.png
    icon_image_alt    Debian GNU/Linux
    statusmap_image   base/ubuntu.gd2
}

================ extinfo_nagios2.cfg ===================

    NOTES:
    - You don't have the "ubuntu.*" icons by default. If you get an error
      about this when restarting Nagios, then change "ubuntu.*" to
      "linux.*".
    - We have additional images available for you to use. You can
      download these from the Nagios Plugins and Add Ons Exchange site
      at:

        http://www.nagiosexchange.org/

    - To get the Ubuntu icons for Nagios you can do the following:

        # cd /tmp
        # wget http://noc/files/imagepack-ubuntu.tar.gz
        # tar xvzf imagepack-ubuntu.tar.gz
        # cd logos
        # sudo mv * /usr/share/nagios/htdocs/images/logos/base/.

      Now you will have the Ubuntu logos available to use in Nagios.

8. If you have gotten here and are still reading, you can download an
   entire set of Nagios configuration files for this network that will
   only need a few changes for your machine. These are available here:

     http://noc/conf/etc/nagios2/

   You can copy the files using wget or scp. For instance:

     # cd /etc/nagios2
     # sudo bash
     # scp -r inst@noc:/var/www/share/conf/etc/nagios2/* .

   would overwrite whatever you have in your /etc/nagios2 directory and
   sub-directories with these preconfigured files.

9.) You still need to update a few files.
    These include:

      /etc/nagios2/conf.d/routers.cfg
      /etc/nagios2/conf.d/pcs.cfg

    You should make sure that you have the correct IP addresses defined
    in routers.cfg for your network view, and you will want to comment
    out your PC's entry in the file pcs.cfg.

    Remember to restart Nagios for changes to take effect.


PART IV
-----------------------------------------------------------------------------

1.) Here we will tie in the ability of Nagios and Trac to work together
    to help document your network. The concept is quite simple. First,
    go to your local Trac project install page at:

      http://localhost/trac/ait

    Log in as the admin user so that you can edit the Trac wiki.

2.) Create an entry for your PC in the wiki. You can do this by clicking
    on the "Edit this page" button and entering a link like this
    (example for PC1, use your PC number instead):

      [wiki:PC1 PC1] : '''10.10.1.1'''

    Save the page.

3.) Click on the PC1 item that's grey with a question mark. Now create
    this page. Enter some text about your PC and save the page.

4.) In Nagios you need to edit the file:

      /etc/nagios2/conf.d/extinfo_nagios2.cfg

    and update your PC's entry in this file with a line like this:

      notes_url http://localhost/trac/ait/wiki/PC1

    You can place this on a line after the "host_name" entry. Remember
    to change "PC1" to your PC's number.

5.) Restart Nagios.

6.) If you look in your Nagios Service Detail view there should now be a
    new icon next to your machine's entry. This looks like a folder.
    Click on this and the URL you entered for the notes_url entry in the
    extinfo_nagios2.cfg file will open. You can also click on the
    machine's icon in the graph views, then click again and this page
    will open.


PART V
-----------------------------------------------------------------------------

1.) Now we will create a plug-in for Nagios. This plug-in will do the
    following:

      * Ping a set of (external) servers.
      * If one server is down a warning will be generated.
      * If two servers are down a critical state will be generated.

    This will be part of our scripting session. The instructions for
    doing this are here:

      http://noc/presos/scripting/bash.html


PART VI
-----------------------------------------------------------------------------

1.) We will update our Nagios contacts definition,
    "/etc/nagios2/conf.d/contacts_nagios2.cfg", to add a local user that
    will receive alerts for certain conditions.

2.) Next we will add another user for our Trac ticketing system so that
    a ticket is automatically generated for specific events.

3.) The first step in this is to update Trac with a new plug-in called
    email2trac. You can find this plug-in and more details on its
    installation, use and configuration here:

      https://subtrac.sara.nl/oss/email2trac

    To install the email2trac plug-in do the following:

      # sudo bash
      # cd /usr/local/src
      # wget http://noc/files/email2trac.tar.gz
      # tar xvzf email2trac.tar.gz
      # cd email2trac-0.13
      # ./configure
      # make
      # make install

    This is one of those times when you need to install software on your
    system from source.

4.) Now you need to configure the email2trac plug-in. Edit the file
    /usr/local/etc/email2trac.conf and change the lines that read:

      project: /data/trac/jouvin

    to read:

      project: /trac/ait

    To better understand what you are doing, and to see all the various
    options you can set, read the more complete configuration
    documentation available here:

      https://subtrac.sara.nl/oss/email2trac/wiki/Email2tracConfiguration

5.) Next we need to create an alias in our email system that will
    receive emails for the Trac system and then pipe them through the
    email2trac plug-in to the Trac project installed on your machine. To
    do this edit the file /etc/aliases and add a line that reads:

      trac-tickets: "|/usr/local/bin/run_email2trac --project=bas"

6.) Save your changes. Now, we are going to replace the MTA that is
    currently installed on your system with the Postfix MTA.
    While Postfix as a whole is a complex topic, the actual installation
    is trivial under Ubuntu. To install Postfix simply do:

      # sudo apt-get install postfix

    - When prompted for the mail server configuration type, select
      "Internet site".
    - When prompted for your system mail name it should be something
      like "pc1", "pc2", etc. (i.e., "pcN"). This is fine. Just select
      "Ok".
    - When prompted where to deliver mail for "postmaster", "root", etc.,
      enter:

        inst

      and select "Ok".
    - For the "Other destinations to accept mail" question, accept what
      is shown by choosing "Ok".
    - For "Force synchronous updates on mail queue" select "No".
    - For the network blocks for which your host should relay mail, just
      accept the default of "127.0.0.0/8" and press "Ok".
    - For "Mailbox size limit" select the default of "0" and press "Ok".
    - For "Local address extension character" keep the default of "+"
      and press "Ok".
    - For Internet protocols to use, select "ipv4" and press "Ok".

    At this point Postfix should finish installing and then start.

7.) Now you can test to see if Trac is accepting email and creating new
    tickets. To do this do the following:

      # mail trac-tickets@localhost
      Subject: Ticket Test
      Type in some text... Then, press ENTER and on a new line type a
      single "." and press ENTER again.
      Cc:
      #

    If everything worked you should have just created an email that went
    to trac-tickets@localhost, which is really an alias in your
    /etc/aliases file that points to the run_email2trac program in your
    machine's /usr/local/bin directory. This takes your email and
    creates a Trac ticket. To verify this open a web browser and go to:

      http://localhost/trac/ait

    Click on the "View Tickets" link, then click on "Active Tickets".
    Your ticket should appear if everything worked.

    If you like this system, then you will want to become familiar with
    the options available to you for using the email2trac plug-in.
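
    The test in step 7 can also be run without typing the message by
    hand, which is handy when repeating it. The following is only a
    minimal sketch. It assumes that the "mail" command on your machine
    accepts "-s" for the subject (the usual mailx behaviour) and reads
    the message body from standard input. The names send_test_ticket and
    MAIL_CMD are made up for this example and are not part of email2trac
    or Postfix.

```shell
#!/bin/bash
# Sketch: send a test message to the trac-tickets alias non-interactively.
# send_test_ticket and MAIL_CMD are illustrative names only.

MAIL_CMD=${MAIL_CMD:-mail}    # assumption: a mailx-style "mail" that accepts -s

send_test_ticket() {
    subject="${1:-Ticket Test}"
    recipient="${2:-trac-tickets@localhost}"
    # The body comes from standard input, so no closing "." line is needed.
    printf 'Test ticket sent from %s\n' "$(uname -n)" |
        "$MAIL_CMD" -s "$subject" "$recipient"
}

# Send a message only when the script is started with "send" as its
# first argument, e.g.:  bash ticket-test.sh send
if [ "${1:-}" = "send" ]; then
    send_test_ticket "Ticket Test"
fi
```

    After running it, check "Active Tickets" in Trac as described in
    step 7. The file name ticket-test.sh in the comment is just a
    placeholder.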
PART VII
-----------------------------------------------------------------------------

1.) Now you have all the bits and pieces necessary to have Nagios
    automatically send a notification email to the trac-tickets alias
    for a service check. This, in turn, would generate a ticket in the
    Trac project ticketing system.

2.) In the file /etc/nagios2/conf.d/contacts_nagios2.cfg you need to
    create an entry for the trac-tickets user. This entry goes in the
    "Contacts" section of the file. Here is a sample you can use:

    define contact{
        contact_name                    trac-tickets
        alias                           Trac
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c   ; c = critical. Don't create tickets for other states.
        host_notification_options       d   ; d = down. Don't create tickets for other states.
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           trac-tickets@localhost
    }

3.) Next you need to create a contact group that contains the
    trac-tickets contact. In this case our group will only have one
    user, but you can certainly have multiple users in this group. Here
    is a sample entry you can use:

    define contactgroup{
        contactgroup_name   tickets
        alias               email to ticket system for Trac
        members             trac-tickets
    }

4.) Now, before we can define the service we wish to monitor that
    generates this ticket, for this scenario, we need to create a
    hostgroup with just one entry. Edit the file:

      /etc/nagios2/conf.d/hostgroups_nagios2.cfg

    and add the entry:

    # hostgroup definition for AIT intERLab Network Management Workshop
    define hostgroup {
        hostgroup_name   gateway-router
        alias            Gateway router at AIT intERLab
        members          gw-rtr
    }

5.)
Finally, edit the file /etc/nagios2/conf.d/services_nagios2.cfg and add
    the following (long) entry:

    # check gw-rtr if live
    define service {
        hostgroup_name          gateway-router
        service_description     PING-RTR
        check_command           check-router-alive
        use                     generic-service
        notifications_enabled   1
        check_period            24x7
        normal_check_interval   1
        retry_check_interval    1
        max_check_attempts      3
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          tickets
        notification_interval   0 ; set > 0 if you want to be renotified
    }

    Restart Nagios.

    In theory, if the gateway router on interface 10.10.10.10 goes down,
    this should generate a notification email that will be delivered to
    trac-tickets@localhost, which, in turn, will create a new ticket in
    our Trac project.

    Did you notice that the notification options here are set to
    "w,u,c,r"? In the file contacts_nagios2.cfg, however, the
    trac-tickets contact entry restricts these options to just "c" for
    critical. Thus, only a single email should arrive for each
    hard-state change to critical for this particular device and
    service, which will only generate a single ticket.

    We'll try simulating this in class. You could also try simulating
    it with a different service to force ticket generation (maybe the
    http service on a neighbor's box?).
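
    For reference, the plug-in outlined in PART V (ping a set of
    servers, warn when one is down, go critical when two are down) could
    be sketched roughly as below. This is not the script from the
    scripting session, only a hedged illustration. The script name
    check_multi_ping, the function names and the PING_CMD variable are
    all made up for this example, "two servers down" is read here as
    "two or more", and the ping options assume the Linux ping command.

```shell
#!/bin/bash
# check_multi_ping: illustrative sketch of the PART V plug-in. All names
# here (check_multi_ping, count_down, check_servers, PING_CMD) are
# assumptions made for this example.
#
# Nagios reads a plug-in's exit status: 0 = OK, 1 = WARNING, 2 = CRITICAL.

# One echo request with a 2 second timeout (options as on Linux ping).
PING_CMD=${PING_CMD:-ping -c 1 -W 2}

# Print how many of the given hosts failed to answer.
count_down() {
    down=0
    for host in "$@"; do
        $PING_CMD "$host" >/dev/null 2>&1 || down=$((down + 1))
    done
    echo "$down"
}

# Ping all hosts given as arguments, print one status line for Nagios
# and return the matching exit status.
check_servers() {
    failed=$(count_down "$@")
    case "$failed" in
        0) state=OK;       status=0 ;;
        1) state=WARNING;  status=1 ;;
        *) state=CRITICAL; status=2 ;;   # two or more servers down
    esac
    echo "PING $state - $failed of $# servers down"
    return "$status"
}

# Run the check only when hosts are given on the command line.
if [ $# -gt 0 ]; then
    check_servers "$@"
    exit $?
fi
```

    Run by hand with the servers to test as arguments, then look at the
    exit status with "echo $?". To use it from Nagios it would also need
    a command definition and a service entry that refers to it.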