AutoCloseOnNagiosRecoveryMessages

From Request Tracker Wiki

Revision as of 17:15, 15 April 2011

Problem Description

We use Nagios to check that our machines are up and working. Every time something strange happens (swap use is too high, CPU load is above 10, and so on) it sends an e-mail with a subject like " * PROBLEM boxxor/CPU load is CRITICAL *". As soon as things are back to normal it sends another message, " * RECOVERY boxxor/CPU load is OK *". This creates two tickets in RT, which then have to be merged and closed by hand. To make things easier, I adapted Todd Chapman's script (see History) so that it merges ALL pending open/new PROBLEM tickets related to a given RECOVERY message and automatically resolves them.

History

  • Mar 2004 - original version from Todd Chapman extracted from an email message
  • Nov 2009 - Sunnavy uploads plugin to the CPAN
  • Mar 2010 - Kamil's simplification of Todd's variant (requires Nagios3)

Solutions

Todd's original version

Description: Merge Into Existing Ticket on match

Condition: OnCreate

Action: User Defined

Custom action preparation code:

1;

Custom action cleanup code:

# If the subject of the ticket matches a pattern suggesting
  # that this is a Nagios RECOVERY message  AND there is
  # an existing ticket (open or new) in the "General" queue with a matching
  # "problem description", (that is not this ticket)
  # merge this ticket into that ticket
  #
  # Based on http://marc.free.net.ph/message/20040319.180325.27528377.en.html
  
  my $problem_desc = undef;
  
  my $Transaction = $self->TransactionObj;
  my $subject = $Transaction->Attachments->First->GetHeader('Subject');
  if ($subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/) {
      # This looks like a nagios recovery message
      $problem_desc = $2;
  
      $RT::Logger->debug("Found a recovery msg: $problem_desc");
  } else {
      return 1;
  }
  
  # OK, now merge this ticket into its PROBLEM ticket(s).
  my $search = RT::Tickets->new($RT::SystemUser);
  $search->LimitQueue(VALUE => 'General');
  $search->LimitStatus(VALUE => 'new', OPERATOR => '=', ENTRYAGGREGATOR => 'or');
  $search->LimitStatus(VALUE => 'open', OPERATOR => '=');
  
  if ($search->Count == 0) { return 1; }
  my $id = undef;
  while (my $ticket = $search->Next) {
      # Ignore the ticket that opened this transaction (the recovery one...)
      next if $self->TicketObj->Id == $ticket->Id;
      # Look for nagios PROBLEM warning messages...
      if ( $ticket->Subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ) {
          if ($2 eq $problem_desc){
              # Aha! Found the Problem TICKET corresponding to this RECOVERY
              # ticket
              $id = $ticket->Id;
              # Nagios may send more than one PROBLEM message, right?
              $RT::Logger->debug("Merging ticket " . $self->TicketObj->Id . " into $id because the problem description matches.");
              $self->TicketObj->MergeInto($id);
              # Keep looking for more PROBLEM tickets...
          }
      }
  }
  
  $id || return 1;
  # Auto-close/resolve this whole thing
  $self->TicketObj->SetStatus( "resolved" );
  1;
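
The subject matching in the scrip above can be tried outside RT. What follows is only a minimal sketch: the helper names (`recovery_key`, `problem_key`, `matching_problem_ids`) and the sample tickets are invented for illustration, and the sample subjects are written to fit the two patterns used in the scrip (including the "alert - " part those patterns expect), so your real Nagios subjects may need different patterns.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are illustrative and not part of RT.

# Return the problem description from a RECOVERY subject, or undef.
sub recovery_key {
    my ($subject) = @_;
    return $subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/ ? $2 : undef;
}

# Return the problem description from a PROBLEM subject, or undef.
sub problem_key {
    my ($subject) = @_;
    return $subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ? $2 : undef;
}

# Given a RECOVERY subject and a list of { id, subject } hashes,
# return the ids of every PROBLEM entry with the same description.
sub matching_problem_ids {
    my ($recovery_subject, @tickets) = @_;
    my $key = recovery_key($recovery_subject);
    return () unless defined $key;
    return map  { $_->{id} }
           grep { my $k = problem_key($_->{subject}); defined $k && $k eq $key }
           @tickets;
}

# Invented sample data standing in for open/new tickets in the queue.
my @tickets = (
    { id => 101, subject => '** PROBLEM alert - boxxor/CPU load is CRITICAL **' },
    { id => 102, subject => '** PROBLEM alert - boxxor/Swap is WARNING **' },
    { id => 103, subject => '** PROBLEM alert - boxxor/CPU load is WARNING **' },
);
my @ids = matching_problem_ids('** RECOVERY alert - boxxor/CPU load is OK **', @tickets);
print "@ids\n";    # prints "101 103"
```

Both CPU load tickets match regardless of their state word (CRITICAL or WARNING), because only the second capture group, the description, is compared.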
  

Extension from Sunnavy

RT-Extension-Nagios

Kamil's version for Nagios3 and newer

by Kamil Srot (kamil.srot at nLogy dot com) 26/03/2010

First of all - sorry for my coding, I don't know Perl at all :-( Feel free to upgrade the script and let me know :-)

I use Nagios3, which comes with some nice predefined macros that make integration with RT much easier. Here is an example of the notification commands, defined in Nagios (commands.cfg):

# 'notify-host-by-rtemail' command definition
  define command{
        command_name    notify-host-by-rtemail
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nEventID: $HOSTPROBLEMID$\nLastEventID: $LASTHOSTPROBLEMID$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }
  
  # 'notify-service-by-rtemail' command definition
  define command{
        command_name    notify-service-by-rtemail
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nEventID: $SERVICEPROBLEMID$\nLastEventID: $LASTSERVICEPROBLEMID$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
  
  
  

Note the $HOSTPROBLEMID$, $LASTHOSTPROBLEMID$, $SERVICEPROBLEMID$ and $LASTSERVICEPROBLEMID$ macros.

*PROBLEMID gets a new unique ID the first time a problem appears and stays constant until the final RECOVERY. A RECOVERY always has *PROBLEMID equal to 0, and its LAST*PROBLEMID is the *PROBLEMID of all the previous notifications for that problem.
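
To make the rule concrete, here is a small sketch that picks the ID tying a notification to its problem from a plain-text mail body. The helper name `correlation_id` and the two sample bodies are made up for illustration; only the `EventID:` and `LastEventID:` line layout comes from the printf template above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RT or Nagios): pick the ID that ties
# a notification to its problem, given the plain-text mail body.
sub correlation_id {
    my ($body) = @_;
    my ($event) = $body =~ /^EventID:\s*(\d+)\s*$/m;
    my ($last)  = $body =~ /^LastEventID:\s*(\d+)\s*$/m;
    return undef unless defined $event;    # no EventID line: not a Nagios mail
    # A RECOVERY carries EventID 0, so fall back to LastEventID.
    return $event == 0 ? $last : $event;
}

# Invented bodies in the layout produced by the printf template.
my $problem  = "State: CRITICAL\nEventID: 42\nLastEventID: 0\n";
my $recovery = "State: OK\nEventID: 0\nLastEventID: 42\n";

printf "problem  -> %s\n", correlation_id($problem);     # problem  -> 42
printf "recovery -> %s\n", correlation_id($recovery);    # recovery -> 42
```

Both mails resolve to the same ID, which is what lets the PROBLEM and RECOVERY tickets be found through a single custom field value.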

I use code like this to process the incoming e-mails, merge the corresponding tickets and close the open ones:

# get the body of the mail
  my $T_Obj = $self->TicketObj;
  my $AttachObj = $self->TransactionObj->Attachments->First;
  my $content = $AttachObj->Content;
  
  # extract EventID and LastEventID
  my $val = 0;
  my $EventID = undef;
  my $LastEventID = undef;
  if( $content =~ m/^\QEventID:\E\s*(\S+)\s*$/m ) {
   $EventID = $1;
  }
  if( $content =~ m/^\QLastEventID:\E\s*(\S+)\s*$/m ) {
   $LastEventID = $1;
  }
  
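  # No EventID line means this is not a Nagios notification: leave it alone
  return 1 unless defined $EventID;

  # A RECOVERY carries EventID 0; its LastEventID names the problem it closes
  if($EventID == 0) {
   $val = $LastEventID;
  } else {
   $val = $EventID;
  }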
  
  # Search for a ticket with the same EventID
  my $TicketsObj = RT::Tickets->new($RT::SystemUser);
  $TicketsObj->LimitQueue(VALUE => 'Monitoring');
  $TicketsObj->LimitCustomField(CUSTOMFIELD => 'NagiosProblemID', OPERATOR => '=', VALUE => $val);
  
  if ($TicketsObj->Count > 0) {
   # found!
   my $id = undef;
   my $ticket;
   while ($ticket = $TicketsObj->Next) {
    next if $self->TicketObj->Id == $ticket->Id;
    $id = $ticket->Id;
    last;
   }
   if ( $id ) {
    # ...merge into
    $self->TicketObj->MergeInto($id);
    # when EventID = 0 we close the parent ticket
    if($EventID == 0) {
     $self->TicketObj->SetStatus('resolved');
    }
    # ...and exit
    return 1;
   }
  }
  
  # hmm, a new ticket.
  # let it fall through into the queue
  
  # and set NagiosProblemID
  $self->TicketObj->AddCustomFieldValue( Field => 'NagiosProblemID', Value => $val, RecordTransaction=>0 );
  # if it is a recovery, set it to resolved
  if($EventID == 0) {
   $self->TicketObj->SetStatus('resolved');
  }