Nagios Exercises




PART I
-----------------------------------------------------------------------------

1. Install Nagios
   
    Do this as root.

    # apt-get install nagios2

    - You will be asked for a password for the Nagios admin Web user.
      Remember it!

    Now do this so that we can have pretty icons:

    # apt-get install nagios-images


2. Create the Web user password file:

    # htpasswd -c /etc/nagios2/htpasswd.users nagiosadmin

New password:         
Re-type new password: 


3. You should already have a working Nagios!

    - Open a browser, and go to

    http://localhost/nagios2/

    - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the password you set in step 2)

4. Let's look at the interface together, then at the configuration files:

    # cd /etc/nagios2/

    # ls -l 
    -rw-r--r-- 1 root root    1598 2007-09-01 00:03 apache2.conf
    -rw-r--r-- 1 root root    9573 2006-12-20 22:20 cgi.cfg
    -rw-r--r-- 1 root root    4653 2006-12-20 22:20 commands.cfg
    drwxr-xr-x 2 root root    4096 2007-09-01 00:03 conf.d
    -rw-r--r-- 1 root root      26 2007-09-01 00:05 htpasswd.users
    -rw-r--r-- 1 root root   30431 2006-12-20 22:20 nagios.cfg
    -rw-r----- 1 root nagios  1293 2006-12-20 22:19 resource.cfg
    drwxr-xr-x 2 root root    4096 2006-12-20 22:20 stylesheets
    
    # ls -l conf.d/
    -rw-r--r-- 1 root root 1687 2006-12-20 22:19 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  413 2006-12-20 22:19 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2006-12-20 22:19 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2006-12-20 22:19 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2007-09-01 00:03 host-gateway_nagios2.cfg
    -rw-r--r-- 1 root root  976 2006-12-20 22:19 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2163 2006-12-20 22:19 localhost_nagios2.cfg
    -rw-r--r-- 1 root root  806 2006-12-20 22:19 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2006-12-20 22:19 timeperiods_nagios2.cfg




PART II
-----------------------------------------------------------------------------

1. According to what we saw in class, let's add a new host

    - Pick any PC in the room. The example below uses pc10; pick a
      different one and substitute its name and IP address.

    # cd /etc/nagios2/conf.d/

    # vi pc10.cfg

define host {
    use         generic-host
    host_name   pc10
    alias       PC 10 at intERLab 
    address     _______________       [pc10's IP address here]
}

    ... Save and quit

2. Let's create a new hostgroup for the occasion, and add our host
   to it

    - Edit the file hostgroups_nagios2.cfg and add a new group:

    # vi hostgroups_nagios2.cfg

define hostgroup {
    hostgroup_name  interlab-pcs
    alias           intERLab PCs
    members         pc10
}

3. Now let's associate some services with that host

    # vi services_nagios2.cfg

    - Find the section called "check that ssh services are running",
      and change the line:

hostgroup_name                  ssh-servers

    to

hostgroup_name                  ssh-servers, interlab-pcs



4. Verify that your configuration file is OK:

    # nagios2 -v /etc/nagios2/nagios.cfg 

    ... You should get:

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the check.


5. Reload Nagios

    # /etc/init.d/nagios2 reload

   NOTES:

   - Reloading is the standard way of updating the Nagios configuration,
     but there is a bug in the Ubuntu init script (/etc/init.d/nagios2).
     Each time you make changes, do the following instead:

    # /etc/init.d/nagios2 stop
    # /etc/init.d/nagios2 start

     Otherwise you will end up with multiple Nagios instances running.
     If that happens, you can resolve the problem with:

    # ps auxwww | grep nagios
    # killall nagios2
    # /etc/init.d/nagios2 start
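
    One way to avoid restarting with a broken configuration is to wrap
    the check from step 4 and the stop/start sequence in a small shell
    function. This is only a sketch: the name restart_nagios and the
    NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are made up for
    illustration, and their defaults are simply the commands used in
    steps 4 and 5.

```shell
# Hypothetical helper, not part of the nagios2 package.
# The defaults are the commands used in steps 4 and 5 above.
NAGIOS_BIN="${NAGIOS_BIN:-nagios2}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios2/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios2}"

restart_nagios() {
    # Run the pre-flight check first; stop here if it reports errors.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
    # Full stop/start instead of "reload" (see the note above).
    "$NAGIOS_INIT" stop
    "$NAGIOS_INIT" start
}
```

    Define the function in a root shell, then run "restart_nagios"
    after each change.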


6. Go to the web interface (http://localhost/nagios2) and check the host
   you just added


7. Add ALL the PCs in the room!

    - Add all the PCs in the room to the config

    - Check HTTP for all PCs in the room

    - Remember to verify the configuration file!

    - I suggest that you create a single config file called pcs.cfg
      to do this.

    NOTE:

    - This requires a bit of planning, but you should have all the elements
      for doing this...

    - Think well about the logical structure of the files -- it should be
      possible for you to do this without doing too much work!
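
    If the PCs follow a regular naming scheme, the host entries can be
    generated instead of typed. The sketch below assumes hosts named
    pc1..pcN with addresses 10.10.1.N (as in the PC1 example in
    Part IV); the function name make_pcs_cfg is made up. Check both
    assumptions against your room before using the output.

```shell
# Print Nagios host definitions for pc1..pcN plus one hostgroup.
# Assumptions (adjust to your room): hosts are named pcN and use
# the address 10.10.1.N.
make_pcs_cfg() {
    count="$1"
    members=""
    n=1
    while [ "$n" -le "$count" ]; do
        cat <<EOF
define host {
    use         generic-host
    host_name   pc$n
    alias       PC $n at intERLab
    address     10.10.1.$n
}

EOF
        # Build the comma-separated member list as we go.
        members="${members:+$members,}pc$n"
        n=$((n + 1))
    done
    cat <<EOF
define hostgroup {
    hostgroup_name  interlab-pcs
    alias           intERLab PCs
    members         $members
}
EOF
}
```

    For example, "make_pcs_cfg 12 > /etc/nagios2/conf.d/pcs.cfg" (12
    is just a placeholder count). If you use this, remove the pc10.cfg
    file and the interlab-pcs hostgroup you created in steps 1 and 2,
    so that nothing is defined twice.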




PART III
-----------------------------------------------------------------------------

1.) Now let's create a complete Nagios configuration for our
    classroom network.

   NOTES:

   - This requires more planning. You have switches, routers, and
     the gateway server. In addition, the IP addresses that you use
     for your network's router, the classroom router, and the other
     network's router depend on your position in the network.

   - Use the internal IP address for your network's router and the
     gateway router, but the external IP address for the other
     network's router.
 
   - Note that the switches are not running Telnet; they are
     using ssh. So you should do an ssh check on them.

   - The routers, except one, are running ssh.

   - We have two unmanaged switches in our network. These will
     cause us problems later. Don't worry about them now.

   - It is important that you properly define the parent for
     devices. Some examples are given below. Devices can have
     more than one parent, but none do in our network.

2.) Create a file to define the configuration of your network
    gateway. This is usually a router box, but in our case
    this is the noc box as we are doing NAT in our classroom.
    Thus, the noc box is the parent box for our backbone router.
    Do this in the file "/etc/nagios2/conf.d/

    Sample entry:

# a host definition for the gateway of the default route
define host {
        host_name   gateway
        alias       Default Gateway (the noc)
        address     10.10.10.1
        use         generic-host
        }

define service {
        use             generic-service
        host_name       gateway
        service_description Interface
        check_command   check-host-alive
}


    NOTE:

    - We have defined a default service for this entry as well: a
      simple ping to verify that the host is alive. This is redundant
      in this case, as the gateway box is the box on which Nagios is
      running for the classroom. For your use, however, it makes
      sense, as you need to verify that your gateway out of the
      classroom network is up.

3.) Create a file to define the configuration for your routers.
    Maybe "/etc/nagios2/conf.d/routers.cfg". There should be
    three entries in this file.

    Sample entry:

define host {
    use         generic-host
    host_name   rtr10.10.1
    alias       router for 10.10.1  on backbone
#   address     10.10.1.254
    address     10.10.11.1
    parents     gw-rtr
}

4.) Create a file to define the configuration for your switches.
    Maybe "/etc/nagios2/conf.d/switches.cfg". There should be
    two entries in this file.

    Sample entry:

define host {
    use         generic-host
    host_name   switch1
    alias       switch 1 mgmt@intERLab
    address     10.10.1.253
    parents     rtr10.10.1
}
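
    Once you have several files, it is easy to lose track of which
    device hangs off which. The sketch below prints one
    "host -> parent" line per host definition so you can compare the
    result with the classroom diagram. The function name list_parents
    is made up, and it only understands the simple
    one-directive-per-line layout used in these exercises.

```shell
# Print "host -> parent" for every host definition in the given
# config files. Hypothetical helper; templates without a host_name
# (such as generic-host) are skipped.
list_parents() {
    awk '
        $1 == "define" {
            in_host = ($2 == "host" || $2 == "host{")
            host = ""; parent = "(none)"
        }
        in_host && $1 == "host_name" { host = $2 }
        in_host && $1 == "parents"   { parent = $2 }
        in_host && $1 == "}" {
            if (host != "") print host " -> " parent
            in_host = 0
        }
    ' "$@"
}
```

    For example: list_parents /etc/nagios2/conf.d/*.cfg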


5.) In the file "/etc/nagios2/conf.d/hostgroups_nagios2.cfg"
    create hostgroups for all the routers, switches and
    PCs in the classroom.

    Sample entry:

# hostgroup definition for AIT intERLab Network Management Workshop
define hostgroup {
        hostgroup_name cisco-routers
        alias          Cisco Routers at AIT intERLab
        members        gw-rtr,rtr10.10.1,rtr10.10.2
}
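
    Every name in a "members" line must match the host_name of a host
    you have defined, otherwise the configuration check fails. The
    sketch below lists hostgroup members that have no host definition
    in the files you give it. The function name undefined_members is
    made up, and the parsing assumes the simple layout used in these
    exercises.

```shell
# List hostgroup members that have no matching host definition.
# Hypothetical helper; pass it all the .cfg files in conf.d.
undefined_members() {
    awk '
        $1 == "define" { type = $2; sub(/[{].*/, "", type) }
        type == "host" && $1 == "host_name" { defined[$2] = 1 }
        type == "hostgroup" && $1 == "members" {
            list = $0
            sub(/^[ \t]*members[ \t]+/, "", list)
            gsub(/[ \t]/, "", list)
            n = split(list, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] != "*")      # "*" means all hosts
                    wanted[names[i]] = 1
        }
        END { for (h in wanted) if (!(h in defined)) print h }
    ' "$@" | sort
}
```

    For example: undefined_members /etc/nagios2/conf.d/*.cfg
    No output means every member has a host definition.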

6.) In the file "/etc/nagios2/conf.d/services_nagios2.cfg" you
    define which service checks run on which groups (not
    individual devices).

    Sample entry (yours may not be as complex at first):

# check that ssh services are running
define service {
        hostgroup_name                  ssh-servers,interlab-pcs,cisco-routers,internet-srv-ssh,switches
        service_description             SSH
        check_command                   check_ssh
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}

7.) The file "/etc/nagios2/conf.d/extinfo_nagios2.cfg" defines
    details for each device defined. For instance, here are
    some sample entries you could use to build prettier Nagios
    results for your various devices:

================ extinfo_nagios2.cfg ===================
define hostextinfo {
        host_name   rtr10.10.1
        icon_image       cook/router.png
        icon_image_alt   Router
        statusmap_image  cook/router.gd2
}

define hostextinfo {
        host_name   switch1
        icon_image       cook/network_switch.png
        icon_image_alt   Network Switch
        statusmap_image  cook/network_switch.gd2
}

define hostextinfo {
        host_name   pc1
        icon_image       base/ubuntu.png
        icon_image_alt   Debian GNU/Linux
        statusmap_image  base/ubuntu.gd2
}
================ extinfo_nagios2.cfg ===================

    NOTES:

    - You don't have the "ubuntu.*" icons by default. If
      you get an error about this when restarting Nagios,
      then change "ubuntu.*" to "linux.*".

    - We have additional images available for you to use.
      You can download these from the Nagios Plugins and
      Add Ons Exchange site at:

      http://www.nagiosexchange.org/

    - To get the Ubuntu icons for nagios you can do the following:

    # cd /tmp
    # wget http://noc/files/imagepack-ubuntu.tar.gz
    # tar xvzf imagepack-ubuntu.tar.gz
    # cd logos
    # sudo mv * /usr/share/nagios/htdocs/images/logos/base/.


   Now you will have the Ubuntu logos available to use in Nagios.


8.) If you have gotten here and are still reading, you can download
    an entire set of Nagios configuration files for this network
    that will only need a few changes for your machine. These are
    available here:

     http://noc/conf/etc/nagios2/

    You can copy the files using wget or scp. For instance:

     # cd /etc/nagios2
     # sudo bash
     # scp -r inst@noc:/var/www/share/conf/etc/nagios2/* .

    would overwrite whatever you have in your /etc/nagios2
    directory and sub-directories with these preconfigured files.

9.) You still need to update a few files, including:

     /etc/nagios2/conf.d/routers.cfg
     /etc/nagios2/conf.d/pcs.cfg

    Make sure that you have the correct IP addresses defined
    in routers.cfg for your network view, and comment out your
    own PC's entry in the file pcs.cfg.

    Remember to restart Nagios for changes to take effect.



PART IV
-----------------------------------------------------------------------------

1.) Here we will tie in the ability of Nagios and Trac to work
    together to help document your network. The concept is
    quite simple. First, go to your local Trac project install
    page at:

    http://localhost/trac/ait

    Log in as the admin user so that you can edit the Trac
    wiki.

2.) Create an entry for your PC in the wiki. You can do this by
    clicking on the "Edit this page" button and entering in a
    link like this (example for PC1, use your PC number instead):

    [wiki:PC1 PC1] : '''10.10.1.1'''

   Save the page.


3.) Click on the PC1 item that's grey with a question mark. Now
    create this page. Enter in some text about your PC and save
    the page.

4.) In Nagios you need to edit the file:

    /etc/nagios2/conf.d/extinfo_nagios2.cfg

   and update your PC's entry in this file with a line like this:

   notes_url       http://localhost/trac/ait/wiki/PC1

   You can place this on a line after the "host_name" entry.
   Remember to change "PC1" to your PC's number.
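
   As a sketch of what the finished entry could look like, the
   function below prints a complete hostextinfo block with the
   notes_url line in place. The name make_pc_extinfo is made up; the
   icon lines are copied from the pc1 sample in Part III and may need
   changing if you do not have the ubuntu icons.

```shell
# Print a hostextinfo entry for one PC, including the notes_url
# line that links to its Trac wiki page. Hypothetical helper; the
# icon paths are the ones from the Part III sample.
make_pc_extinfo() {
    num="$1"
    cat <<EOF
define hostextinfo {
        host_name        pc$num
        notes_url        http://localhost/trac/ait/wiki/PC$num
        icon_image       base/ubuntu.png
        icon_image_alt   Debian GNU/Linux
        statusmap_image  base/ubuntu.gd2
}
EOF
}
```

   For example, "make_pc_extinfo 7" prints the entry for pc7.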

5.) Restart Nagios. 

6.) If you look in your Nagios Service Detail view there should now be
    a new icon next to your machine's entry. This looks like a folder.
    Click on this and the URL you entered for the notes_url entry in
    the extinfo_nagios2.cfg file will open. You can also click on
    the machine's icon in the graph views, then click again, and this
    page will open.



PART V
-----------------------------------------------------------------------------

1.) Now we will create a plug-in for Nagios. This plug-in will do the
    following:

    * Ping a set of (external) servers.
    * If one server is down a warning will be generated.
    * If two servers are down a critical state will be generated.

    This will be part of our scripting session. The instructions for
    doing this are here:

      http://noc/presos/scripting/bash.html
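
    To give an idea of the shape of such a plug-in before the
    scripting session, here is a minimal sketch. It is not the
    version from the instructions linked above: the function names
    are made up, and "two servers down" is read as "two or more".
    What is standard is the way a plug-in reports its result to
    Nagios, through its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.

```shell
# Sketch of the plug-in logic. Function names are made up.
# Nagios plug-in exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.

# Print a status line and return the exit code for a given
# number of unreachable servers.
status_for_down_count() {
    down="$1"
    if [ "$down" -eq 0 ]; then
        echo "OK - all servers answered"
        return 0
    elif [ "$down" -eq 1 ]; then
        echo "WARNING - 1 server is down"
        return 1
    else
        echo "CRITICAL - $down servers are down"
        return 2
    fi
}

# Ping each server once and count the ones that do not answer.
check_servers() {
    down=0
    for server in "$@"; do
        # -c 1: send one packet; -W 2: wait at most 2 seconds (Linux ping)
        if ! ping -c 1 -W 2 "$server" >/dev/null 2>&1; then
            down=$((down + 1))
        fi
    done
    status_for_down_count "$down"
}
```

    A plug-in script built on this would end with a call such as
    "check_servers server1 server2 server3" followed by "exit $?",
    where the server names are placeholders for the external servers
    you choose.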
    


PART VI
-----------------------------------------------------------------------------

1.) We will update our Nagios contacts definition,
    "/etc/nagios2/conf.d/contacts_nagios2.cfg", to add a local user
    that will receive alerts for certain conditions.

2.) Next we will add another user for our Trac ticketing system so
    that a ticket is automatically generated for specific events.

3.) The first step in this is to update Trac with a new plug-in
    called email2trac. You can find this plug-in and more details on
    its installation, use and configuration here:

      https://subtrac.sara.nl/oss/email2trac

    To install the email2trac plug-in do the following:

      # sudo bash
      # cd /usr/local/src
      # wget http://noc/files/email2trac.tar.gz
      # tar xvzf email2trac.tar.gz
      # cd email2trac-0.13
      # ./configure
      # make
      # make install

    This is one of those times when you need to use source to
    install software on your system.

4.) Now you need to configure the email2trac plug-in. Edit the file
    /usr/local/etc/email2trac.conf and change the lines that read:

      project: /data/trac/jouvin

    to read:

      project: /trac/ait

    To better understand what you are doing, and to see all the various
    options you can set, read the more complete configuration
    documentation available here:

      https://subtrac.sara.nl/oss/email2trac/wiki/Email2tracConfiguration
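
    If you prefer not to edit the file by hand, the same change can
    be made with sed. This is a sketch only: the function name
    set_trac_project is made up, and it assumes the lines look
    exactly like the "project: /data/trac/jouvin" line shown above.
    It keeps a copy of the original file with a .bak suffix.

```shell
# Replace the sample project path in an email2trac.conf file.
# Hypothetical helper; it rewrites every "project:" line that
# still points at /data/trac/jouvin and leaves other lines alone.
set_trac_project() {
    conf="$1"
    # Keep a backup, then write the edited text back to the original.
    cp "$conf" "$conf.bak" &&
    sed 's|^\(project:[[:space:]]*\)/data/trac/jouvin[[:space:]]*$|\1/trac/ait|' \
        "$conf.bak" > "$conf"
}
```

    For example: set_trac_project /usr/local/etc/email2trac.conf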

5.) Next we need to create an alias in our email system that will
    receive emails for the trac system and then pipe them through
    the email2trac plug-in to the trac Project installed on your
    machine. To do this edit the file /etc/aliases and add a line
    that reads:

      trac-tickets: "|/usr/local/bin/run_email2trac --project=bas"

6.) Save your changes. Now, we are going to replace the MTA that is
    currently installed on your system with the Postfix MTA. While
    a complete understanding of the use of Postfix is a complex
    topic, the actual installation is trivial under Ubuntu. To
    install Postfix simply do:

      # sudo apt-get install postfix

    - When prompted for the mail server configuration type, select:

        "Internet Site"

    - When prompted for your system mail name it should be something
      like "pc1", "pc2", etc... (i.e., "pcN"). This is fine. Just
      select "Ok".

    - When prompted where to deliver mail for "postmaster", "root",
      etc., enter:

        inst

      and select "Ok".

    - For the "Other destinations to accept mail" question accept
      what is shown by choosing "Ok".

    - For "Force synchronous updates on mail queue" select "No".

    - For the network blocks for which your host should relay mail
      just accept the default of "127.0.0.0/8" and press "Ok".

    - For "Mailbox size limit" select the default of "0" and press
      "Ok".

    - For "Local address extension character" keep the default of
      "+" and press "Ok".

    - For Internet protocols to use select "ipv4" and press "Ok".

    At this point Postfix should finish installing and then start.
        
7.) Now you can test to see if Trac is accepting email and creating
    new tickets. To do this do the following:

      # mail trac-tickets@localhost

      Subject: Ticket Test
      Type in some text... Then, press ENTER and on a newline
      type a single "." and press ENTER again.
      Cc:

      #

    If everything worked you should have just created an email that
    went to trac-tickets@localhost, which is really an alias in your
    /etc/aliases file that points to the run_email2trac program in
    your machine's /usr/local/bin directory. This takes your email
    and creates a Trac ticket. To verify this open a web browser and
    go to:

      http://localhost/trac/ait

    And click on the "View Tickets" link, then click on "Active
    Tickets". Your ticket should appear if everything worked.

    If you like this system, then you would want to become
    familiar with what options are available to you for using the
    email2trac plug-in.


PART VII
-----------------------------------------------------------------------------

1.) Now you have all the bits and pieces necessary to have Nagios
    automatically generate a notification email to the trac-tickets
    alias for a service check. This, in turn, generates a ticket in
    the Trac project ticketing system.

2.) In the file /etc/nagios2/conf.d/contacts_nagios2.cfg you need to
    create an entry for the trac-tickets user. This entry goes in the
    "Contacts" section of the file. Here is a sample you can use:

define contact{
        contact_name                    trac-tickets
        alias                           Trac
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c ; c = critical. Don't create tickets for other states.
        host_notification_options       d ; d = down. Don't create tickets for other states.
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           trac-tickets@localhost
        }


3.) Next you need to create a contact group that contains the trac-
    tickets contact. In this case our group will only have one user,
    but you can certainly have multiple users in this group. Here
    is a sample entry you can use:

define contactgroup{
        contactgroup_name       tickets
        alias                   email to ticket system for Trac
        members                 trac-tickets
        }

4.) Now, before we can define the service whose alerts will
    generate this ticket, we need to create a hostgroup with
    just one entry for this scenario. Edit the file:

    /etc/nagios2/conf.d/hostgroups_nagios2.cfg

    and add the entry:

# hostgroup definition for AIT intERLab Network Management Workshop
define hostgroup {
        hostgroup_name gateway-router
        alias          Gateway router at AIT intERLab
        members        gw-rtr       
}

5.) Finally, edit the file /etc/nagios2/conf.d/services_nagios2.cfg
    and add the following (long) entry:

# check gw-rtr if live
define service {
        hostgroup_name                  gateway-router
        service_description             PING-RTR
        check_command                   check-router-alive
        use                             generic-service
        notifications_enabled           1
        check_period                    24x7
        normal_check_interval           1
        retry_check_interval            1
        max_check_attempts              3
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  tickets
        notification_interval           0 ; set > 0 if you want to be renotified
}


    Restart Nagios. In theory, if the gateway router on interface
    10.10.10.10 goes down, this should generate a notification email
    that will be delivered to trac-tickets@localhost, which, in turn
    will create a new ticket in our Trac project.

    Did you notice that the notification options here are set to
    "w,u,c,r"? In the file contacts_nagios2.cfg, however, the
    trac-tickets contact entry restricts these to just "c" for
    critical: a notification must pass the contact's options as well
    as the service's. Thus, only a single email should arrive for
    each hard-state change to critical for this particular device
    and service, which will only generate a single ticket.
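
    One way to picture this: the states that actually reach a contact
    are the ones listed both in the service's notification_options
    and in the contact's service_notification_options. The sketch
    below models that as a simple intersection of two comma-separated
    lists. It is an illustration of the idea, not Nagios code; the
    function name effective_options is made up, and the lists must
    not contain spaces.

```shell
# Print the options that appear in both comma-separated lists,
# in the order of the first list. Hypothetical helper. The body
# runs in a subshell ( ... ) so the IFS change does not leak out.
effective_options() (
    service_opts="$1"
    contact_opts="$2"
    result=""
    IFS=','
    for opt in $service_opts; do
        case ",$contact_opts," in
            *",$opt,"*) result="${result:+$result,}$opt" ;;
        esac
    done
    printf '%s\n' "$result"
)
```

    For the definitions above, effective_options "w,u,c,r" "c"
    prints "c".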

    We'll try simulating this in class. You could try simulating using
    a different service to force ticket generation (maybe http service
    on a neighbor's box?).