IntegrateSphinx: Difference between revisions
m (→Intro) |
m (13 revisions imported) |
(No difference)
|
Revision as of 15:11, 6 April 2016
Contributed by Cris
Integrating the Sphinx full-text search engine in RT
What is Sphinx?
Sphinx is a free open-source SQL full-text search engine. As they put it on their site: How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.
Basically, Sphinx scans your database and builds his own indexes for blazingly-fast full-text searches.
Why?
Well, we needed fast full-text searches, and what RT offers is not enough. It is slow and it causes a high load on the db server.
I saw a reference to Sphinx on the RT users mailing list, and - after exploring a few alternatives and verifying no one had already did it - proceeded to integrate it into our RT installation.
I am going to illustrate here all the steps we put in place to have Sphinx completely integrated in RT. The work required a couple of weeks of my spare time and that of two other colleagues.
Unfortunately I am not able to package this as a proper extension, but I hope this docs will help those who want to do the same. Maybe someone will feel inspired enough to package it for the benefit of the community, or maybe Jesse will decide to integrate it into RT out of the box. Who knows?
First: setting up Sphinx
Intro
First of all, you have to go to the Sphinx homepage, download and compile the sources. Or you can choose among the provided binary packages.
We decided to go with version 1.10-beta, the most recent at the time of our installation.
At this point you have to decide if you want to use real-time indexes or the more traditional full+delta approach to indexes.
I will describe here briefly the meaning of these two options; if you want more infos please refer to the Sphinx homepage.
Real-time indexes (on the Sphinx site they call them RT-indexes... quite confusing, uh?) are a clever way to have your indexes always updated, up to the millisecond. They are however still in beta (as of 1.10-beta) and they are more complex to set up and update. We wanted a quick solution, so we chose the traditional way.
The traditional way consists of having two indexes: one big index comprising all tickets up to a certain date (or to a certain ID), and a small delta index for the rest of the tickets. You can use only one big index, but it will take a few minutes to build it so you cannot refresh it too frequently. If you use the full+delta approach you can build the big index e.g. once a day and the delta index once every few minutes.
Presently we have about 15 thousand tickets in our RT db; we build the full index every morning, indexing everything up to the first day of the current month: it takes about 8 minutes. The delta index (from start of month to now) is built every 5 minutes and it takes about a couple seconds.
At this point I should probably say that we are running RT on two solaris servers: one for the presentation and one for the DB. Sphinx runs on the DB side.
Sphinx setup
First thing to do is to tell Sphinx where your database is, what kind of database engine you use and how to scan your data.
You tell Sphinx all these things in the file sphinx.conf (we have it in /usr/local/etc). Here is our sphinx.conf:
# # Sphinx configuration # source rt3 { type = mysql sql_host = localhost sql_user = AMENDED sql_pass = AMENDED sql_db = rt3 sql_port = 3306 # optional, default is 3306 sql_query = \ select TK.id as TicketNum, \ TK.Creator as CreatedBy, \ TK.Queue as Queue, \ unix_timestamp(convert_tz(TK.Created,'GMT','Europe/Rome')) as Created, \ TK.Subject as Subject, \ group_concat(cast(A.Content as char)) as Content \ from Tickets TK, Transactions T, Attachments A \ where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \ and A.TransactionId = T.id \ and T.ObjectType = 'RT::Ticket' \ and A.ContentType = 'text/plain' \ and A.Content is not null \ and TK.Created < concat(year(now()),'-',month(now()),'-01') \ group by TK.id \ order by TK.id, T.id sql_attr_uint = Created sql_attr_uint = Queue sql_attr_uint = CreatedBy sql_query_info = SELECT * FROM Tickets WHERE id=$id } source rt3delta : rt3 { sql_query = \ select TK.id as TicketNum, \ TK.Creator as CreatedBy, \ TK.Queue as Queue, \ unix_timestamp(convert_tz(TK.Created,'GMT','Europe/Rome')) as Created, \ TK.Subject as Subject, \ group_concat(cast(A.Content as char)) as Content \ from Tickets TK, Transactions T, Attachments A \ where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \ and A.TransactionId = T.id \ and T.ObjectType = 'RT::Ticket' \ and A.ContentType = 'text/plain' \ and A.Content is not null \ and TK.Created between concat(year(now()),'-',month(now()),'-01') and last_day(now()) \ group by TK.id \ order by TK.id, T.id } index fulltext1 { source = rt3 path = /var/data/fulltext1 docinfo = extern charset_type = sbcs min_infix_len = 3 enable_star = 1 } index fulltextdelta: fulltext1 { source = rt3delta path = /var/data/fulltextdelta } indexer { mem_limit = 32M } searchd { listen = 9312 listen = 9306:mysql41 log = /var/log/searchd.log query_log = /var/log/query.log read_timeout = 5 max_children = 30 pid_file = /var/log/searchd.pid max_matches = 1000 seamless_rotate = 1 preopen_indexes = 0 unlink_old = 1 workers = threads # for RT to work }
Let's see what we have in this configuration file.
First we define a data source, that I have named (with lots of fantasy!) rt3
. A data source tells Sphinx where to get data to feed its indexes. We tell Sphinx the type of source (mysql
), where it is (localhost
), and how to retrieve the data (the sqlquery
).
There is one VERY important thing in this section: every column that is returned by the query is indexed for full text search, unless you declare it as an attribute. Attributes cannot be searched for text, but they allow you to filter the results.
In our configuration, we return Subject and Content as searchable fields, while the three sql_attr_uint
attributes can be used to refine the search (i.e. filter).
NOTE: We had previously defined the three attributes according to their content (e.g. the Created attribute was defines as sql_attr_timestamp
), but we have discovered that this configuration would not allow us to use those attributes for filtering. It seems Sphinx can only use sql_attr_uint
for filtering... I don't know if it's a temporary limitation, a bug, or working as designed.
The sql_query_info has to return a hit for every $id ($id is an implicit variable, which contains the first column returned by the sql_query. It MUST be an integer.).
We then define another data source (rt3delta
) which we derive from rt3
. Everything defined in the rt3
data source is inherited by rt3delta
, save for what we override (sqlquery
in this case). As you can see, the queries are very similar; only the date range is different.
After data sources we define indexes.
First index is fulltext1
. Everything should be pretty intuitive, but for the two min_infix_len
and enable_star
fields. By default Sphinx only indexes full words, but it would be very nice to be able to search for partial words by using wildcard characters. So here it is: min_infix_len
tells Sphinx the minimum length of subwords to index, and enable_star
tells it that we want to be able to use star as wildcard character. This way you can search for the word minimum by using min*
or *nim*
, but not by using mi*
(because we told Sphinx the minimum length for subwords is 3 chars).
The second index is fulltextdelta
, which is derived from fulltext
. As we saw above, everything is inherited but for things we explicitely override.
At last we give some parameters for the inner working of indexer
and searchd
, which are two of the command line tools of Sphinx. The values shown here are good defaults.
Building the indexes
Now that we have a working configuration, it's time to build the indexes.
You can initialize the indexes with the command:
/usr/local/bin/indexer --all
This will build all the indexes, and it will give you an idea of the time and disk space required for each index.
For production use however, you want to put the commands in cron
so that indexes are refreshed regularly and automatically.
We refresh the fulltext1
index once a day at 7 a.m., with this command:
/usr/local/bin/indexer --rotate fulltext1
Where --rotate
allows the index to be substituted while the search daemon is running.
We refresh the fulltextdelta
index every 5 minutes with this command:
/usr/local/bin/indexer --rotate fulltextdelta
The search daemon
Once you have built the indexes you can start searching from the command line with the search
tool. When you are confident your setup is correct, it's time to start the search daemon.
We do this with the command
/usr/local/bin/searchd --config /usr/local/etc/sphinx.conf
In our experience searchd
is not very stable (probably due to the beta release): it stops responding after a couple days of uptime, so we restart it daily.
With this last step you have Sphinx all set up and working, so the first part of this tutorial ends here. Please note that Sphinx has a lot of options, that I have not described. I only describe here the options that proved useful for us. If you want to know more about Sphinx please refer to the Sphinx home page.
Second: using the API to query Sphinx
So we now have Sphinx up and running and fully populated indexes that return correct results, but now we need to be able to interrogate Sphinx' indexes from RT.
To do this you have to talk to the search daemon, which speaks his own language: the Sphinx API. There are several implementations (PHP, Java, Python and even a third-party Perl API), but since we are mostly familiar with PHP we are going to use it to implement our interface.
Before posting the actual PHP script, there are a few things I have to point out:
- Mason does not support PHP out of the box, so we did a little hack that I'll explain later
- The script you'll see here contains in-line CSS to adapt the results to the default look of RT (web2). If you're using a different style you may want to change the inline CSS accordingly.
- The script spits out all the results in one page: there's still no support for pagination a-la RT.
- The script spits out ALL the results, including deleted, rejected, merged tickets.
You see, the script is still a little rough, but I hope we'll be able to refine it with the help of the community.
sphinx.php follows:
<html> <head> <title>Ricerca Ticket Sphinx</title> <script language="JavaScript"> document.domain = "AMENDED"; </script> <style> body {font:13px/1.231 arial,helvetica,clean,sans-serif;*font-size:small;*font:x-small;} table {font-size:inherit;font:100%;} pre,code,kbd,samp,tt{font-family:monospace;*font-size:108%;line-height:100%;} a { color: #000; text-decoration: none; } div#body a:visited { color: #666; } /* background: #547CCC ; */ body { padding:0; margin:0; background: #FFFFFF ; font-family: arial, helvetica, sans-serif; color: #000000; } table.ticket-list, table.queue-summary, table.collection { margin-top: 0.75em; font-size: 0.9em; border: 1px solid #aaa; border-bottom: 2px solid #999; border-right: 2px solid #999; } table.queue-summary tr>*:first-child { padding-left: 1em; } table.queue-summary tr>*:last-child { padding-right: 1em; } table.ticket-list a, table.queue-summary a, table.collection a { font-weight: bold; } table.ticket-list th.collection-as-table, table.collection th.collection-as-table { background: #ddd; font-size: 0.9em; margin-bottom: 0.5em; text-align: left; } table.queue-summary th.collection-as-table { font-size: 0.9em; margin-bottom: 0.5em; text-align: right; } table.queue-summary th.collection-as-table:first-child { text-align: left; } tr.collection-as-table+tr.collection-as-table th { border-bottom: 2px solid grey; } table.queue-summary td { background: #efefef; border-bottom: 1px solid #ccc; } tr.evenline td { background: #eee; } tr.oddline td { background: #fff; } tr.evenline td, tr.oddline td { padding-top: 0.5em; } tr.evenline+tr.evenline td, tr.oddline+tr.oddline td{ padding-top: 0; border: none; } table.ticket-list td:first-child, table.ticket-list th:first-child { padding-left: 1em; } table.ticket-list td:last-child, table.ticket-list th:last-child { padding-right: 1em; } th.collection-as-table , td.collection-as-table { padding-right: 0.5em; } .pagenum.a:hover, .paging a.nav:hover{ text-decoration: underline; } .pagenum *, .paging a.nav{ padding: .5em; } .currentpage{ text-decoration: none; font-weight: bold; background: #eee; } div.paging{ text-align: center; padding-bottom: 1em; } #body>table.ticket-list { margin-bottom: 2em; } </style> </head> <body> <?php //Costante con il max numero di righe ritornate da Sphinx define('SPHINX_MAX_LIMIT', 1000); require_once("Classi/sphinxapi.php"); require_once("Classi/Timer.class.php"); require_once("Classi/MySql.class.php"); include_once("Include/Query.inc.php"); include_once("Include/dvopendb.config.inc.php"); $MyTimer = new Timer(); //----------------------------------------------------------------------------------------------- //Parametri di connessione al Daemon Sphinx $host = "AMENDED"; $port = 9312; $index = "*"; // put your index name here //----------------------------------------------------------------------------------------------- //Variabili in GET $query = $_GET['Text']; $RequestorFilter = trim($_GET['Requestor']); $CodaFilter = $_GET['Queue']; $DataInizioFilter = trim($_GET['From']); $DataFineFilter = trim($_GET['To']); //----------------------------------------------------------------------------------------------- //Connessione al Daemon Sphinx $cl = new SphinxClient(); $cl->SetServer($host, $port); $cl->SetMatchMode(SPH_MATCH_EXTENDED2); $cl->SetLimits(0,SPHINX_MAX_LIMIT, $max_matches=0, $cutoff=0 ); //----------------------------------------------------------------------------------------------- //MySql Database Query per la retrive dei dati del ticket $SqlQuery = new MySql($DbHostName,$DbUser,$DbPassword,$DbName); $SqlQuery->Connetti(); //----------------------------------------------------------------------------------------------- //Filtri sulla ricerca Sphinx //Check della query (parametro di ricerca) if(strlen($query)==0) { echo "E' necessario inserire un parametro di ricerca nella casella \"Text\" per proseguire."; exit(1); } //Filtro Requestor if(strlen($RequestorFilter)>0) { $TempQuery = sprintf($RequestorLookUpQuery,mysql_real_escape_string($RequestorFilter)); $SqlQuery->ExecQuery($TempQuery); $RequestorResult = $SqlQuery->GetMonoArrayResult(); if(count($RequestorResult)>0) { $cl->SetFilter ("createdby",array($RequestorResult['id']) , $exclude=false ); } else { echo "<label class=\"warninglabel\">Warning - L'indirizzo \"".$RequestorFilter."\" non e' presente nel Database; La ricerca e' stata effettuata per tutti i richiedenti</label><br/><br/>"; } } //Filtro Queue if(strlen($CodaFilter)>0) $cl->SetFilter ("queue", array($CodaFilter), $exclude=false ); //Filtro Data //__________________________________________________________________________________________________________________ //Check delle date if(strlen($DataInizioFilter)>0) { //Check Datainizio if(checkdate(substr($DataInizioFilter,5,2),substr($DataInizioFilter,8,2),substr($DataInizioFilter,0,4))==false) { echo "Data Inizio ".$DataInizioFilter." non corretta "; exit(1); } else $DataInizio = mktime(0, 0, 0, substr($DataInizioFilter,5,2), substr($DataInizioFilter,8,2), substr($DataInizioFilter,0,4)); } if(strlen($DataFineFilter)>0) { //Check DataFine if(checkdate(substr($DataFineFilter,5,2),substr($DataFineFilter,8,2),substr($DataFineFilter,0,4))==false) { echo "Data Fine ".$DataFineFilter." non corretta "; exit(1); } else $DataFine = mktime(0, 0, 0, substr($DataFineFilter,5,2), substr($DataFineFilter,8,2), substr($DataFineFilter,0,4)); } //__________________________________________________________________________________________________________________ //Filtraggio vero e proprio //Se sono settate entrambe... if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)>0) { if($DataInizio < $DataFine) { //Se sono settate entrambe le date filtro per entrambe $cl->SetFilterRange("created", $DataInizio, $DataFine, $exclude=false ); } else { echo "La data Inizio ".$DataInizioFilter." e' maggiore della data di fine ".$DataFineFilter; exit(1); } } //Se solo DataInizio.. if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)==0) { $cl->SetFilterRange("created", $DataInizio, mktime(), $exclude=false ); } //Se solo DataFine.. if(strlen($DataInizioFilter)==0 && strlen($DataFineFilter)>0) { //2008-11-14 : Data del primo Ticket di RT //1988-07-18 : Data di nascita di AMENDED :) $cl->SetFilterRange("created", mktime(0, 0, 0,07,18,1988), $DataFine, $exclude=false ); } //----------------------------------------------------------------------------------------------- //Eseguo la Query su Sphinx $res = $cl->Query($query, $index); if ($res == false) { echo "failure: ".$cl->GetLastError()."\n"; } else { if(count($res["matches"])!=0) { //Se i risultati totali sono > 1000 if($res[total_found]>SPHINX_MAX_LIMIT) { echo "<label class=\"warninglabel\">Warning - Sono stati trovati ".$res[total_found]." risultati, verrano visualizzati solo i primi ".SPHINX_MAX_LIMIT."</label><br/><br/>"; } //Se ci sono dei risultati creo la tabella e stampo echo "<table id='ResultTable' class='ticket-list' border='0' cellspacing='0' cellpadding='1' width='100%'> <thead> <tr class='collection-as-table'> <th class='collection-as-table'>#</td> <th class='collection-as-table'>Oggetto</td> <th class='collection-as-table'>Coda</td> <th class='collection-as-table'>Status</td> <th class='collection-as-table'>Creato</td> <th class='collection-as-table'>Richiedente</td> <th class='collection-as-table'>Proprietario</td> </tr> </thead><tbody>"; foreach ($res["matches"] as $key=>$value) { $i++; $TempQuery = sprintf($TicketDetailQuery,mysql_real_escape_string($key)); $SqlQuery->ExecQuery($TempQuery); $Risultato = $SqlQuery->GetMonoArrayResult(); if($i%2==0) { echo " <tr class='evenline'> <td class='collection-as-table'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td> <td class='collection-as-table' width='400'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td> <td class='collection-as-table'>".$Risultato['Queue']."</td> <td class='collection-as-table'>".$Risultato['Status']."</td> <td class='collection-as-table'>".$Risultato['Created']."</td> <td class='collection-as-table'>".$Risultato['Requestor']."</td> <td class='collection-as-table'>".$Risultato['Owner']."</td> </tr>"; } else { echo " <tr class='oddline'> <td class='collection-as-table'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td> <td class='collection-as-table' width='400'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td> <td class='collection-as-table'>".$Risultato['Queue']."</td> <td class='collection-as-table'>".$Risultato['Status']."</td> <td class='collection-as-table'>".$Risultato['Created']."</td> <td class='collection-as-table'>".$Risultato['Requestor']."</td> <td class='collection-as-table'>".$Risultato['Owner']."</td> </tr>"; } } echo " </tbody> <tfoot></tfoot>
";
} else { //...Altrimenti stampo che non ho trovato nulla echo "<br/><label class=\"notfoundlabel\">Nessun risultato trovato</label><br/>"; } echo "Trovati ".$res[total]." di ".$res[total_found]." in ".$MyTimer->elapsed()." secondi"; } ?>
dvopendb.config.inc.php follows:
<?php $DbHostName = "HOSTNAME_AMENDED:3306/rt3"; $DbUser = "AMENDED"; $DbPassword = "AMENDED"; $DbName = "rt3"; $DbQuery = "select EmailAddress, RealName from Users"; ?>
Query.inc.php follows:
<?php $TicketDetailQuery = "select Tickets.id,Subject,Status,Tickets.Created as Created,U1.Name as Requestor,U2.Name as Owner,Queues.Name as Queue from Tickets left outer join Users U1 on U1.id = Tickets.Creator left outer join Users U2 on U2.id = Tickets.Owner left outer join Queues on Queue = Queues.id where Tickets.id = %s "; $RequestorLookUpQuery = "select id from Users where EmailAddress = '%s' "; ?>
This ends the step regarding the use of the Sphinx API to query the searchd
daemon.
Third: integrating it all into RT
Now we need a way to call the php script from within RT. Well, technically we don't need it, but it would be very nice to have.
For this task I found very useful the WritingPortlets section of this wiki, written by Andy D'Arcy Jewell (thank you!)
So the first thing I did was writing a portlet that one can integrate in the "RT at a glance" page.
I put it in <rt-base-dir>/local/html/Elements. The portlet itself is in a file called SphinxSearch.
SphinxSearch follows:
<&|/Widgets/TitleBox, title => loc('Fulltext Search (Sphinx)'), bodyclass => "", titleright => loc('HELP'), titleright_href => RT->Config->Get('WebPath').'/Prefs/SphinxHelp.html'&> <form name="input" action="/Tools/sphinx.html" method="get"> <table> <tr class="input-row"> <td class="label"><&|/l&>Text</&>:</td> <td class="value"> <input type="text" size="40" name="Text" /> </td> </tr> <tr class="input-row"> <td class="label"><&|/l&>Queue</&>:</td> <td class="value"> <& /Elements/SelectQueue, Name => 'Queue' &> </td> </tr> <tr class="input-row"> <td class="label"><&|/l&>Requestor</&>:</td> <td class="value"> <& /Elements/EmailInput, Name => 'Requestor', Size => '40' &> </td> </tr> <tr class="input-row"> <td class="label"><&|/l&>From</&>:</td> <td class="value"> <& /Elements/SelectDate, Name => "From" || '' &> </td> </tr> <tr class="input-row"> <td class="label"><&|/l&>To</&>:</td> <td class="value"> <& /Elements/SelectDate, Name => "To" || '' &> </td> </tr> </table> <div class="submit"> <div class="buttons"> <input class="button" type="submit" value="Submit" /> </div> </div> </form> </&>
This is the result:
I tried to keep the look and feel of RT by copying code from the other portlets. I even used RT syntax to allow the localization of string, but not everything gets translated since not all the terms are in RT resource files.
Since Sphinx' extended search syntax can be very elaborated, I put a HELP link in the upper right corner (circled in red in the image above). This points to a page that I extracted from Sphinx' documentation. The page is in <rt-base-dir>/local/html/Prefs. The file is called SphinxHelp.html.
SphinxHelp.html follows:
<& /Elements/Header, Title => "Sphinx - Free open-source SQL full-text search engine" &> <& /Elements/Tabs, Title => "Help for Sphinx search syntax", current_tab => 'index.html', current_toptab => 'index.html', tabs => $tabs, actions => $actions, &> <div align="left" class="refresh"> <div> <div> <h2 style="clear: both"><a name="extended-syntax"></a>Extended query syntax</h2> </div> </div> <p> The following special operators and modifiers can be used when using the extended matching mode: </p> <div> <ul type="disc"> <li>operator OR: <pre class="programlisting">hello | world</pre></li> <li>operator NOT:<pre class="programlisting"> hello -world hello !world</pre> </li> <li>field search operator: <pre class="programlisting">@title hello @body world</pre></li> <li>field position limit modifier (introduced in version 0.9.9-rc1): <pre class="programlisting">@body[50] hello</pre></li> <li>multiple-field search operator: <pre class="programlisting">@(title,body) hello world</pre></li> <li>all-field search operator: <pre class="programlisting">@* hello</pre></li> <li>phrase search operator: <pre class="programlisting">"hello world"</pre></li> <li>proximity search operator: <pre class="programlisting">"hello world"~10</pre></li> <li>quorum matching operator: <pre class="programlisting">"the world is a wonderful place"/3</pre></li> <li>strict order operator (aka operator "before"): <pre class="programlisting">aaa << bbb << ccc</pre></li> <li>exact form modifier (introduced in version 0.9.9-rc1): <pre class="programlisting">raining =cats and =dogs</pre></li> <li>field-start and field-end modifier (introduced in version 0.9.9-rc2): <pre class="programlisting">^hello world$</pre></li> </ul> </div> <p> Here's an example query that uses some of these operators: </p><div class="example"><a name="ex-extended-query"></a><p class="title"><b>Example: Extended matching mode: query example</b></p><div class="example-contents"><pre class="programlisting"> "hello world" @title "example program"~5 @body python -(php|perl) @* code </pre></div></div> <p><br class="example-break"> The full meaning of this search is: </p><div class="itemizedlist"><ul type="disc"><li>Find the words 'hello' and 'world' adjacently in any field in a document;</li> <li>Additionally, the same document must also contain the words 'example' and 'program' in the title field, with up to, but not including, 10 words between the words in question; (E.g. "example PHP program" would be matched however "example script to introduce outside data into the correct context for your program" would not because two terms have 10 or more words between them)</li> <li>Additionally, the same document must contain the word 'python' in the body field, but not contain either 'php' or 'perl';</li> <li>Additionally, the same document must contain the word 'code' in any field.</li> </ul></div> <p> </p><p> There always is implicit AND operator, so "hello world" means that both "hello" and "world" must be present in matching document. </p><p> OR operator precedence is higher than AND, so "looking for cat | dog | mouse" means "looking for ( cat | dog | mouse )" and <span class="emphasis"><em>not</em></span> "(looking for cat) | dog | mouse". </p><p> Field limit operator limits subsequent searching to a given field. Normally, query will fail with an error message if given field name does not exist in the searched index. However, that can be suppressed by specifying "@@relaxed" option at the very beginning of the query: </p><pre class="programlisting"> @@relaxed @nosuchfield my query </pre><p> This can be helpful when searching through heterogeneous indexes with different schemas. </p><p> Field position limit, introduced in version 0.9.9-rc1, additionaly restricts the searching to first N position within given field (or fields). For example, "@body[50] hello" will <span class="bold"><strong>not</strong></span> match the documents where the keyword 'hello' occurs at position 51 and below in the body. </p><p> Proximity distance is specified in words, adjusted for word count, and applies to all words within quotes. For instance, "cat dog mouse"~5 query means that there must be less than 8-word span which contains all 3 words, ie. "CAT aaa bbb ccc DOG eee fff MOUSE" document will <span class="emphasis"><em>not</em></span> match this query, because this span is exactly 8 words long. </p><p> Quorum matching operator introduces a kind of fuzzy matching. It will only match those documents that pass a given threshold of given words. The example above ("the world is a wonderful place"/3) will match all documents that have at least 3 of the 6 specified words. </p><p> Strict order operator (aka operator "before"), introduced in version 0.9.9-rc2, will match the document only if its argument keywords occur in the document exactly in the query order. For instance, "black << cat" query (without quotes) will match the document "black and white cat" but <span class="emphasis"><em>not</em></span> the "that cat was black" document. Order operator has the lowest priority. It can be applied both to just keywords and more complex expressions, ie. this is a valid query: </p><pre class="programlisting"> (bag of words) << "exact phrase" << red|green|blue </pre><p> </p><p> Exact form keyword modifier, introduced in version 0.9.9-rc1, will match the document only if the keyword occurred in exactly the specified form. The default behaviour is to match the document if the stemmed keyword matches. For instance, "runs" query will match both the document that contains "runs" <span class="emphasis"><em>and</em></span> the document that contains "running", because both forms stem to just "run" - while "=runs" query will only match the first document. Exact form operator requires <a href="#conf-index-exact-words" title="11.2.39. index_exact_words">index_exact_words</a> option to be enabled. This is a modifier that affects the keyword and thus can be used within operators such as phrase, proximity, and quorum operators. </p><p> Field-start and field-end keyword modifiers, introduced in version 0.9.9-rc2, will make the keyword match only if it occurred at the very start or the very end of a fulltext field, respectively. For instance, the query "^hello world$" (with quotes and thus combining phrase operator and start/end modifiers) will only match documents that contain at least one field that has exactly these two keywords. </p><p> Starting with 0.9.9-rc1, arbitrarily nested brackets and negations are allowed. However, the query must be possible to compute without involving an implicit list of all documents: </p><pre class="programlisting"> // correct query aaa -(bbb -(ccc ddd)) // queries that are non-computable -aaa aaa | -bbb </pre><p> </p> </div> <div align="right" class="search-result-actions"> </div> <%init> my @results; my $skip_create = 1; my $actions; my $tabs = $m->comp("/Elements/DashboardTabs"); </%init>
The last part is about passing the form's (portlet) parameters to the PHP script.
As I told you before, the problem here is that Mason does not support PHP out of the box. I found a few hacks around the web to make Mason support PHP, but they were either not applicable to our environment or too difficult to apply.
So we implemented another hack: since we already have another virtual domain where PHP works, we created an html page that embeds (within an iframe) the results from the PHP page that resides in the other virtual domain.
That's why you see this little piece of code
<script language="JavaScript"> document.domain = "AMENDED"; </script>
in the PHP script above: you have to declare that both the HTML page and the PHP script belong to the same domain (although on different subdomains), otherwise you get a security error in the browser.
The html page is located in <rt-base-dir>/local/html/Tools and the file is called sphinx.html.
sphinx.html follows:
<& /Elements/Header, Title => "Search results" &> <& /Elements/Tabs, Title => "Search results", current_tab => 'index.html', current_toptab => 'index.html', tabs => $tabs, actions => $actions, &> <div align="left"> <table width=95% border=0> <tr> <td> <iframe id="sphinxresults" src="http://PHP_VIRTDOMAIN_AMENDED/sphinx.php?<% "Text=" . $Text . "&Queue=" . $Queue . "&Requestor=" . $Requestor . "&From=" . $From . "&To=" . $To %>" width="95%" height="200" frameborder="0" onload="resizeIframeToFitContent(this)"></iframe> </td> </tr> </table> </div> <div align="right" class="search-result-actions"> </div> <%init> my @results; my $skip_create = 1; my $actions; my $tabs = $m->comp("/Elements/DashboardTabs"); </%init> <%args> $Text $Queue => '' $Requestor => '' $From => '' $To => '' </%args>
There is one last piece of code that we need to put in place: it allows us to insert a little piece of custom CSS and custom Javascript in all RT page headers.
You have to create a file called InHeader in <rt-base-dir>/local/html/NoAuth/css/web2 (web2 is the name of the default RT CSS, if you use another CSS you have to change this path). The contents of this file are shown below.
InHeader follows:
<style> pre.programlisting { background-color: #f0f0f0; padding: 0.5em; margin-left: 2em; margin-right: 2em; } </style> <script language="JavaScript"> function resizeIframeToFitContent(iframe) { // This function resizes an IFrame object // to fit its content. // The IFrame tag must have a unique ID attribute. iframe.height = document.getElementById(iframe.id).contentWindow.document.body.scrollHeight + 50; } </script> <script language="JavaScript"> document.domain = "AMENDED"; </script>
Finally, to add our portlet to the installable portlets in RT, you have to modify your RT_SiteConfig.pm file and add the portlet name to the $HomePageComponents variable. If you don't see the $HomePageComponents variable in your RT_SiteConfig.pm file, copy it from the RT_Config.pm file.
Mine looks like this:
Set($HomepageComponents, [qw(QuickCreate Quicksearch MyAdminQueues MySupportQueues MyReminders RefreshHomepage Dashboards SphinxSearch)]);
Conclusion
Whew, it was long, but not difficult I hope.
Following the instruction in this article you get an installable portlet that will allow you to easily (and quickly!) search the RT ticket database.
Sphinx' syntax allows very powerful searches and its performance is unbeatable. We find it so convenient that we've found us often using it in place of every other search.
We had to implement a few hacks to embed the PHP code, but that's just because we were more familiar with PHP. If you are a Perl hacker you can use the Sphinx' Perl API to integrate Sphinx even better. Let's see what the community can come up with!
I hope this was useful. If you need help, I can be reached through RT mailing lists.
Thanks to Cris for this documentation! Here are a few things we changed for our organisation :
With the missing files included in sphinx.php, I had to change a few things so here is my spinx.php file :
<html> <head> <title>Recherche Ticket Sphinx</title> <script src="Classi/sorttable.js"></script> <style> body {font:13px/1.231 arial,helvetica,clean,sans-serif;*font-size:small;*font:x-small;} table {font-size:inherit;font:100%;} pre,code,kbd,samp,tt{font-family:monospace;*font-size:108%;line-height:100%;} a { color: #000; text-decoration: none; } div#body a:visited { color: #666; } /* background: #547CCC ; */ body { padding:0; margin:0; background: #FFFFFF ; font-family: arial, helvetica, sans-serif; color: #000000; } table.ticket-list, table.queue-summary, table.collection { margin-top: 0.75em; font-size: 0.9em; border: 1px solid #aaa; border-bottom: 2px solid #999; border-right: 2px solid #999; } table.queue-summary tr>*:first-child { padding-left: 1em; } table.queue-summary tr>*:last-child { padding-right: 1em; } table.ticket-list a, table.queue-summary a, table.collection a { font-weight: bold; } table.ticket-list th.collection-as-table, table.collection th.collection-as-table { background: #ddd; font-size: 0.9em; margin-bottom: 0.5em; text-align: left; } table.queue-summary th.collection-as-table { font-size: 0.9em; margin-bottom: 0.5em; text-align: right; } table.queue-summary th.collection-as-table:first-child { text-align: left; } tr.collection-as-table+tr.collection-as-table th { border-bottom: 2px solid grey; } table.queue-summary td { background: #efefef; border-bottom: 1px solid #ccc; } tr.evenline td { background: #eee; } tr.oddline td { background: #fff; } tr.evenline td, tr.oddline td { padding-top: 0.5em; } tr.evenline+tr.evenline td, tr.oddline+tr.oddline td{ padding-top: 0; border: none; } table.ticket-list td:first-child, table.ticket-list th:first-child { padding-left: 1em; } table.ticket-list td:last-child, table.ticket-list th:last-child { padding-right: 1em; } th.collection-as-table , td.collection-as-table { padding-right: 0.5em; } .pagenum.a:hover, .paging a.nav:hover{ text-decoration: underline; } .pagenum *, .paging a.nav{ padding: .5em; } .currentpage{ text-decoration: none; font-weight: bold; background: #eee; } div.paging{ text-align: center; padding-bottom: 1em; } #body>table.ticket-list { margin-bottom: 2em; } </style> </head> <body> <?php //Costante con il max numero di righe ritornate da Sphinx define('SPHINX_MAX_LIMIT', 1000); require_once("Classi/sphinxapi.php"); //require_once("Classi/Timer.class.php"); require_once("Classi/MySql.class.php"); require_once("Classi/ErrorManager.class.php"); include_once("Include/Query.inc.php"); include_once("Include/dvopendb.config.inc.php"); // $MyTimer = new Timer(); //----------------------------------------------------------------------------------------------- //Parametri di connessione al Daemon Sphinx $host = "localhost"; $port = 9312; $index = "*"; // put your index name here //----------------------------------------------------------------------------------------------- //Variabili in GET $query = $_GET['Text']; $RequestorFilter = trim($_GET['Requestor']); $CodaFilter = $_GET['Queue']; $DataInizioFilter = trim($_GET['From']); $DataFineFilter = trim($_GET['To']); //----------------------------------------------------------------------------------------------- //Connessione al Daemon Sphinx $cl = new SphinxClient(); $cl->SetServer($host, $port); $cl->SetMatchMode(SPH_MATCH_EXTENDED2); $cl->SetLimits(0,SPHINX_MAX_LIMIT, $max_matches=0, $cutoff=0 ); //----------------------------------------------------------------------------------------------- //MySql Database Query per la retrive dei dati del ticket $SqlQuery = new MySql(); $SqlQuery->connect($DbHostName,$DbName,$DbUser,$DbPassword); //----------------------------------------------------------------------------------------------- //Filtri sulla ricerca Sphinx //Check della query (parametro di ricerca) if(strlen($query)==0) { echo "Un paramètre doit être inscrit dans la zone \"Texte\" pour continuer."; exit(1); } //Filtro Requestor if(strlen($RequestorFilter)>0) { $TempQuery = sprintf($RequestorLookUpQuery,mysql_real_escape_string($RequestorFilter)); // $result22 = $SqlQuery->query($TempQuery); // $RequestorResult = $SqlQuery->fetchAll($result22); $result22 = $SqlQuery->query($TempQuery); $RequestorResult = mysql_fetch_array($result22); if(count($RequestorResult)>0) { $cl->SetFilter ("createdby",array($RequestorResult['id']) , $exclude=false ); } else { echo "<label class=\"warninglabel\">Attention - L'adresse \"".$RequestorFilter."\" n'est pas dans la base de donnée et la recherche a été effectuée </label><br/><br/>"; } } //Filtro Queue if(strlen($CodaFilter)>0) $cl->SetFilter ("queue", array($CodaFilter), $exclude=false ); //Filtro Data //__________________________________________________________________________________________________________________ //Check delle date if(strlen($DataInizioFilter)>0) { //Check Datainizio if(checkdate(substr($DataInizioFilter,5,2),substr($DataInizioFilter,8,2),substr($DataInizioFilter,0,4))==false) { echo "Date de début ".$DataInizioFilter." incorrecte "; exit(1); } else $DataInizio = mktime(0, 0, 0, substr($DataInizioFilter,5,2), substr($DataInizioFilter,8,2), substr($DataInizioFilter,0,4)); } if(strlen($DataFineFilter)>0) { //Check DataFine if(checkdate(substr($DataFineFilter,5,2),substr($DataFineFilter,8,2),substr($DataFineFilter,0,4))==false) { echo "Date de fin ".$DataFineFilter." incorrecte "; exit(1); } else $DataFine = mktime(0, 0, 0, substr($DataFineFilter,5,2), substr($DataFineFilter,8,2), substr($DataFineFilter,0,4)); } //__________________________________________________________________________________________________________________ //Filtraggio vero e proprio //Se sono settate entrambe... if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)>0) { if($DataInizio < $DataFine) { //Se sono settate entrambe le date filtro per entrambe $cl->SetFilterRange("created", $DataInizio, $DataFine, $exclude=false ); } else { echo "La date de début ".$DataInizioFilter." est supérieure à la date de fin ".$DataFineFilter; exit(1); } } //Se solo DataInizio.. if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)==0) { $cl->SetFilterRange("created", $DataInizio, mktime(), $exclude=false ); } //Se solo DataFine.. if(strlen($DataInizioFilter)==0 && strlen($DataFineFilter)>0) { //2008-11-14 : Data del primo Ticket di RT //1988-07-18 : Data di nascita di AMENDED :) $cl->SetFilterRange("created", mktime(0, 0, 0,07,18,1988), $DataFine, $exclude=false ); } //----------------------------------------------------------------------------------------------- //Eseguo la Query su Sphinx $res = $cl->Query($query, $index); if ($res == false) { echo "failure: ".$cl->GetLastError()."\n"; } else { if(count($res['matches'])!=0) { //Se i risultati totali sono > 1000 if($res['total_found']>SPHINX_MAX_LIMIT) { echo "<label class=\"warninglabel\"> Attention, ".$res[total_found]." résultats ont été trouvés, mais vous ne verrez que les premiers ".SPHINX_MAX_LIMIT." résultats.</label><br/><br/>"; } //Se ci sono dei risultati creo la tabella e stampo //echo "<table id='ResultTable' class='ticket-list' border='0' cellspacing='0' cellpadding='1' width='100%'> echo "<table id='ResultTable' class='sortable' border='0' cellspacing='0' cellpadding='1' width='100%'> <thead> <tr class='collection-as-table'> <th class='collection-as-table'>#</td> <th class='collection-as-table'>Sujet</td> <th class='collection-as-table'>École</td> <th class='collection-as-table'>Status</td> <th class='collection-as-table'>Date_de_création</td> <th class='collection-as-table'>Demandeur</td> <th class='collection-as-table'>Propriétaire</td> </tr> </thead><tbody>"; foreach ($res["matches"] as $key=>$value) { $i++; $TempQuery = sprintf($TicketDetailQuery,mysql_real_escape_string($key)); $result34 = $SqlQuery->query($TempQuery); $Risultato = mysql_fetch_array($result34); //$Risultato = $SqlQuery->queryFetchAll($TempQuery); //$Risultato = $SqlQuery->fetchAll($result23); if($i%2==0) { echo " <tr class='evenline'> <td class='collection-as-table'><a href='http://AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td> <td class='collection-as-table' width='400'><a href='http://AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td> <td class='collection-as-table'>".$Risultato['Queue']."</td> <td class='collection-as-table'>".$Risultato['Status']."</td> <td class='collection-as-table'>".$Risultato['Created']."</td> <td class='collection-as-table'>".$Risultato['Requestor']."</td> <td class='collection-as-table'>".$Risultato['Owner']."</td> </tr>"; } else { echo " <tr class='oddline'> <td class='collection-as-table'><a href='http://AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td> <td class='collection-as-table' width='400'><a href='http://AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td> <td class='collection-as-table'>".$Risultato['Queue']."</td> <td class='collection-as-table'>".$Risultato['Status']."</td> <td class='collection-as-table'>".$Risultato['Created']."</td> <td class='collection-as-table'>".$Risultato['Requestor']."</td> <td class='collection-as-table'>".$Risultato['Owner']."</td> </tr>"; } } echo " </tbody> <tfoot></tfoot> </table>"; } else { //...Altrimenti stampo che non ho trovato nulla echo "<br/><label class=\"notfoundlabel\">Aucun résultat trouvé</label><br/>"; } echo "Résultats ".$res['total']." de ".$res['total_found']; // in ".$MyTimer->elapsed()." secondi"; } ?>
Here is my Classi/MySql.class.php file
<?php class MySQLException extends Exception { } class MySQL { private $host = null, $baseName = null, $user = null, $id = null; private $errorManager = null; private $queries = array(); public function __construct() { $this->errorManager = ErrorManager::getInstance(); } // public function __construct() public function getHost() { return $this->host; } public function getBaseName() { return $this->baseName; } public function getUser() { return $this->user; } public function connect($host, $base, $user, $password, $enconding = "utf8") { if (!($this->id = mysql_connect($host, $user, $password))) { $this->host = null; $this->base = null; $this->user = null; $this->id = null; $this->errorManager->reportFatalError($this, new MySQLException("database connection failed!")); } else { $this->host = $host; $this->user = $user; if (mysql_select_db($base, $this->id)) { $this->baseName = $base; } mysql_query("set names $enconding", $this->id); //mysql_set_charset($enconding, $this->id); return true; } } // public function connect($host, $base, $user, $password) public function getLastId() { if (is_resource($this->id)) { $lastId = mysql_insert_id($this->id); return $lastId; } else { $this->errorManager->reportFatalError($this, new MySQLException("database is not connected, can't get last id")); } } // public function getLastId() public function fetchAll($result) { if (mysql_num_rows($result)) { $rows = array(); while ($row = mysql_fetch_assoc($result)) { $rows[] = $row; } return $rows; } else { $this->errorManager->reportFatalError($this, new MySQLException("invalid mysql query result supplied to fetchAll()")); } } // public function fetchAll($result) public function query($string) { if (is_resource($this->id)) { if ($result = mysql_query($string, $this->id)) { $this->queries[] = $string; return $result; } else { $this->errorManager->reportFatalError($this, new MySQLException("error in query: '$string':".mysql_error($this->id))); } } else { $this->errorManager->reportFatalError($this, new MySQLException("database is not connected, can't make query")); } } // public function query($string) public function queryFetchAll($query) { $result = $this->query($query); $entries = $this->fetchAll($result); return $entries; } // public function queryFetchAll($query) public function getQueries() { return $this->queries; } // public function getQueries() public function __destruct() { if (is_resource($this->id)) { mysql_close($this->id); } } // public function __destruct() } // class Mysql ?>
I added ErrorManager.class.php also :
<?php class ErrorManager { private static $instance = null; private $errors = array(); private function __construct() { } public static function getInstance() { if (self::$instance == null) { self::$instance = new ErrorManager(); } return self::$instance; } // public function getInstance() public function getErrors() { return $this->errors; } // public function getErrors() public function reportError(&$sourceObject, $msg = "") { $class = get_class($sourceObject); $this->errors[] = array("source" => $sourceObject, "message" => "$class: \"$msg\""); } // public function reportError($sourceObject, $msg = "") public function reportFatalError(&$sourceObject, $exception = null) { if (!($exception instanceof Exception)) { $exception = new Exception(); } $msg = $exception->getMessage(); $class = get_class($sourceObject); $this->reportError($sourceObject, "fatal error: $msg"); throw $exception; } // public function reportFatalError($sourceObject, $exception) } // class ErrorManager ?>
In the Classi I added http://www.kryogenix.org/code/browser/sorttable/sorttable.js for the table to sort.
When you download the last beta version of sphinx, you get a api so I copied it to the Classi also : sphinxbeta/sphinx-2.0.1-beta/api/sphinxapi.php
The most important thing is to change the limit in the php for the group_concat to make sure you will get all your tickets :
mysql -p
SET GLOBAL group_concat_max_len = 8192000;
My sphinx.conf is the following :
source rt3 { type = mysql sql_host = localhost sql_user = sql_pass = sql_db = rt3 sql_port = 3306 # optional, default is 3306 sql_query = \ select TK.id as TicketNum, \ TK.Creator as CreatedBy, \ TK.Queue as Queue, \ unix_timestamp(TK.Created) as Created, \ TK.Subject as Subject, \ group_concat(A.Content) as Content \ from Tickets TK, Transactions T, Attachments A \ where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \ and A.TransactionId = T.id \ and T.ObjectType = 'RT::Ticket' \ and (A.ContentType = 'text/plain' or A.ContentType = 'text/html') \ and A.Content is not null \ and TK.Created < concat(year(now()),'-',month(now()),'-01') \ group by TK.id \ order by TK.id, T.id sql_attr_uint = Created sql_attr_uint = Queue sql_attr_uint = CreatedBy sql_query_info = SELECT * FROM Tickets WHERE id=$id } source rt3delta : rt3 { sql_query = \ select TK.id as TicketNum, \ TK.Creator as CreatedBy, \ TK.Queue as Queue, \ unix_timestamp(TK.Created) as Created, \ TK.Subject as Subject, \ group_concat(A.Content) as Content \ from Tickets TK, Transactions T, Attachments A \ where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \ and A.TransactionId = T.id \ and T.ObjectType = 'RT::Ticket' \ and A.Content is not null \ and TK.Created between concat(year(now()),'-',month(now()),'-01') and last_day(now()) \ group by TK.id \ order by TK.id, T.id } index fulltext1 { source = rt3 path = /var/data/fulltext1 docinfo = extern charset_type = utf-8 html_strip = 1 min_word_len = 1 enable_star = 1 min_infix_len = 3 } index fulltextdelta: fulltext1 { source = rt3delta charset_type = utf-8 path = /var/data/fulltextdelta } indexer { mem_limit = 128M } searchd { listen = 9312 listen = 9306:mysql41 log = /var/log/sphinx/searchd.log query_log = /var/log/sphinx/query.log read_timeout = 5 max_children = 30 pid_file = /var/run/sphinx/searchd.pid max_matches = 1000 seamless_rotate = 1 preopen_indexes = 1 unlink_old = 1 workers = threads # for RT to work }