AutoCloseOnNagiosRecoveryMessages

From Request Tracker Wiki
Revision as of 11:08, 26 June 2010 by AndyHarrison (talk)

This tip is based on http://marc.free.net.ph/message/20040319.180325.27528377.en.html, an e-mail from Todd Chapman to the rt-users mailing list (March 2004).

We use Nagios to check that our machines are up and working. Every time something abnormal happens (swap use is too high, CPU load is above 10, and so on) it sends an e-mail with a subject like "** PROBLEM boxxor/CPU load is CRITICAL **". As soon as things get back to normal it sends another message, "** RECOVERY boxxor/CPU load is OK **". This creates two tickets in RT, which then have to be merged and closed manually. To make things easier, I adapted the above script to merge ALL pending open/new PROBLEM tickets related to a given RECOVERY message and automatically close/resolve them.

Description: Merge Into Existing Ticket on match

Condition: OnCreate

Action: User Defined

Custom action preparation code:

1;

Custom action cleanup code:

# If the subject of the ticket matches a pattern suggesting
# that this is a Nagios RECOVERY message AND there is
# an existing ticket (open or new) in the "General" queue with a matching
# "problem description" (that is not this ticket),
# merge this ticket into that ticket.
#
# Based on http://marc.free.net.ph/message/20040319.180325.27528377.en.html

my $problem_desc = undef;

my $Transaction = $self->TransactionObj;
my $subject = $Transaction->Attachments->First->GetHeader('Subject');
if ($subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/) {
    # This looks like a Nagios recovery message
    $problem_desc = $2;
    $RT::Logger->debug("Found a recovery msg: $problem_desc");
} else {
    return 1;
}

# OK, now let's merge this ticket with its PROBLEM msg.
my $search = RT::Tickets->new($RT::SystemUser);
$search->LimitQueue(VALUE => 'General');
$search->LimitStatus(VALUE => 'new', OPERATOR => '=', ENTRYAGGREGATOR => 'or');
$search->LimitStatus(VALUE => 'open', OPERATOR => '=');

if ($search->Count == 0) { return 1; }

my $id = undef;
while (my $ticket = $search->Next) {
    # Ignore the ticket that opened this transaction (the recovery one...)
    next if $self->TicketObj->Id == $ticket->Id;

    # Look for Nagios PROBLEM warning messages...
    if ( $ticket->Subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ) {
        if ($2 eq $problem_desc) {
            # Aha! Found the PROBLEM ticket corresponding to this RECOVERY
            # ticket
            $id = $ticket->Id;
            # Nagios may send more than one PROBLEM message, right?
            $RT::Logger->debug("Merging ticket " . $self->TicketObj->Id .
                " into $id because the problem description matches.");
            $self->TicketObj->MergeInto($id);
            # Keep looking for more PROBLEM tickets...
        }
    }
}

# No matching PROBLEM ticket was found: nothing to close
$id || return 1;

# Auto-close/resolve this whole thing
$self->TicketObj->SetStatus( "resolved" );
1;
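The subject matching used by the scrip can be tried outside RT. The following is a minimal sketch, not part of the original tip: the helper names `parse_nagios_subject` and `matching_problems` and the sample subjects are made up for illustration, and it assumes subjects of the form `** TYPE host - description STATE **`. Unlike the scrip above, which compares only the description, this sketch also compares the host name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Nagios subject into its parts.
# Returns an empty list when the subject does not look like a Nagios alert.
sub parse_nagios_subject {
    my ($subject) = @_;
    if ($subject =~ /\*\* (PROBLEM|RECOVERY) (\w+) - (.*) (\w+) \*\*/) {
        return (type => $1, host => $2, desc => $3, state => $4);
    }
    return;
}

# Hypothetical helper: return the PROBLEM subjects that belong to the
# given RECOVERY subject (same host and same description).
sub matching_problems {
    my ($recovery, @subjects) = @_;
    my %rec = parse_nagios_subject($recovery);
    return () unless %rec && $rec{type} eq 'RECOVERY';
    return grep {
        my %p = parse_nagios_subject($_);
        %p && $p{type} eq 'PROBLEM'
           && $p{host} eq $rec{host}
           && $p{desc} eq $rec{desc};
    } @subjects;
}

# Made-up subjects standing in for open tickets
my @open = (
    '** PROBLEM boxxor - CPU load CRITICAL **',
    '** PROBLEM boxxor - Swap usage WARNING **',
    '** PROBLEM boxxor - CPU load WARNING **',
);

my @hits = matching_problems('** RECOVERY boxxor - CPU load OK **', @open);
print "$_\n" for @hits;
```

Running the patterns against a few real subjects from your own Nagios setup before installing the scrip is a cheap way to check that they match.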

Upgrade

by Kamil Srot (kamil.srot at nLogy dot com) 26/03/2010

First of all - sorry for my coding, I don't know Perl at all :-( Feel free to upgrade the script and let me know :-)

I use Nagios 3, which comes with handy predefined macros that make integration with RT much easier.

Here is an example of the notification commands, defined in Nagios (commands.cfg):

# 'notify-host-by-rtemail' command definition
define command{
    command_name    notify-host-by-rtemail
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nEventID: $HOSTPROBLEMID$\nLastEventID: $LASTHOSTPROBLEMID$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
    }

# 'notify-service-by-rtemail' command definition
define command{
    command_name    notify-service-by-rtemail
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nEventID: $SERVICEPROBLEMID$\nLastEventID: $LASTSERVICEPROBLEMID$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }

Note the $HOSTPROBLEMID$, $LASTHOSTPROBLEMID$, $SERVICEPROBLEMID$ and $LASTSERVICEPROBLEMID$ macros.

The PROBLEMID is a new unique ID assigned the first time a problem appears, and it stays constant until the final RECOVERY. A RECOVERY always has *PROBLEMID equal to 0, and its LASTPROBLEMID is the *PROBLEMID of all previous notifications.
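The rule above boils down to choosing one ID per notification that is the same for every message of an incident. A minimal sketch of that choice, assuming the behaviour described above (a RECOVERY carries a problem ID of 0); the helper name `correlation_id` and all ID values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: choose the ID that ties a notification to its incident.
# Assumes a RECOVERY has problem ID 0 and carries the incident's ID in
# its "last problem ID" field, as described above.
sub correlation_id {
    my ($problem_id, $last_problem_id) = @_;
    return $problem_id == 0 ? $last_problem_id : $problem_id;
}

# Made-up sequence: two PROBLEM notifications followed by the RECOVERY.
# The last_problem_id of a PROBLEM is ignored, so its value here is arbitrary.
my @notifications = (
    { type => 'PROBLEM',  problem_id => 1234, last_problem_id => 1201 },
    { type => 'PROBLEM',  problem_id => 1234, last_problem_id => 1201 },
    { type => 'RECOVERY', problem_id => 0,    last_problem_id => 1234 },
);

my @keys = map { correlation_id($_->{problem_id}, $_->{last_problem_id}) }
           @notifications;

for my $i (0 .. $#notifications) {
    printf "%-8s -> %d\n", $notifications[$i]{type}, $keys[$i];
}
```

All three notifications end up with the same key, which is what allows their tickets to be found and merged.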

I use code like this to process incoming e-mails, merge the corresponding tickets and close them on recovery:

# get the mail body
my $T_Obj = $self->TicketObj;
my $AttachObj = $self->TransactionObj->Attachments->First;
my $content = $AttachObj->Content;

# extract EventID and LastEventID
my $val = 0;
my $EventID = undef;
my $LastEventID = undef;
if( $content =~ m/^\QEventID:\E\s*(\S+)\s*$/m ) { $EventID = $1; }
if( $content =~ m/^\QLastEventID:\E\s*(\S+)\s*$/m ) { $LastEventID = $1; }

# no EventID line: not a Nagios notification, leave the ticket alone
return 1 unless defined $EventID;

if($EventID == 0) { $val = $LastEventID; } else { $val = $EventID; }

# look for a ticket with the same EventID
my $TicketsObj = RT::Tickets->new($RT::SystemUser);
$TicketsObj->LimitQueue(VALUE => 'Monitoring');
$TicketsObj->LimitCustomField(CUSTOMFIELD => 'NagiosProblemID', OPERATOR => '=', VALUE => $val);

if ($TicketsObj->Count > 0) {
    # found!
    my $id = undef;
    my $ticket;
    while ($ticket = $TicketsObj->Next) {
        next if $self->TicketObj->Id == $ticket->Id;
        $id = $ticket->Id;
        last;
    }
    if ( $id ) {
        # ...merge into
        $self->TicketObj->MergeInto($id);
        # when EventID = 0 we close the parent ticket
        if($EventID == 0) { $self->TicketObj->SetStatus('resolved'); }
        # ...and exit
        return 1;
    }
}

# hmm, a new ticket.
# let it fall through into the queue
# and set NagiosProblemID
$self->TicketObj->AddCustomFieldValue( Field => 'NagiosProblemID', Value => $val, RecordTransaction => 0 );

# if it is a recovery, set it to resolved
if($EventID == 0) { $self->TicketObj->SetStatus('resolved'); }
return 1;
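The field extraction can be checked on its own, without RT. The sketch below is an illustration only: the helper name `extract_field` and the sample body are made up, with the body following the layout of the notify-host-by-rtemail command above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of a "Name: value" line in the
# notification body, or undef when there is no such line.
sub extract_field {
    my ($content, $name) = @_;
    return $content =~ /^\Q$name\E:\s*(\S+)\s*$/m ? $1 : undef;
}

# Made-up RECOVERY body in the layout produced by the printf template
my $body = join "\n",
    '***** Nagios *****',
    '',
    'Notification Type: RECOVERY',
    'Host: boxxor',
    'State: UP',
    'EventID: 0',
    'LastEventID: 1234',
    '';

my $event_id      = extract_field($body, 'EventID');
my $last_event_id = extract_field($body, 'LastEventID');

print "EventID=$event_id LastEventID=$last_event_id\n";
```

Because the pattern is anchored at the start of a line, the `EventID` lookup does not accidentally match the `LastEventID` line.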