Difference between revisions of "IntegrateSphinx"

From Request Tracker Wiki
Jump to navigation Jump to search
Line 619: Line 619:
I put it in <rt-base-dir>/local/html/Elements. The portlet itself is in a file called SphinxSearch.
I put it in <rt-base-dir>/local/html/Elements. The portlet itself is in a file called SphinxSearch.


'''<u>sphinxsearch follows:</u>'''
'''<u>SphinxSearch follows:</u>'''


  <nowiki><&|/Widgets/TitleBox, title => loc('Fulltext Search (Sphinx)'), bodyclass => "",
  <nowiki><&|/Widgets/TitleBox, title => loc('Fulltext Search (Sphinx)'), bodyclass => "",
Line 677: Line 677:
Since Sphinx' extended search syntax can be very elaborated, I put a HELP link in the upper right corner (circled in red in the image above). This points to a page that I extracted from Sphinx' documentation. The page is in &lt;rt-base-dir&gt;/local/html/Prefs. The file is called SphinxHelp.html.
Since Sphinx' extended search syntax can be very elaborated, I put a HELP link in the upper right corner (circled in red in the image above). This points to a page that I extracted from Sphinx' documentation. The page is in &lt;rt-base-dir&gt;/local/html/Prefs. The file is called SphinxHelp.html.


'''<u>sphinxhelp.html follows:</u>'''
'''<u>SphinxHelp.html follows:</u>'''


  &lt;&amp; /Elements/Header, Title =&gt; "Sphinx - Free open-source SQL full-text search engine" &amp;&gt;
  &lt;&amp; /Elements/Header, Title =&gt; "Sphinx - Free open-source SQL full-text search engine" &amp;&gt;
Line 901: Line 901:
You have to create a file called InHeader in &lt;rt-base-dir&gt;/local/html/NoAuth/css/web2 (web2 is the name of the default RT CSS, if you use another CSS you have to change this path). The contents of this file are shown below.
You have to create a file called InHeader in &lt;rt-base-dir&gt;/local/html/NoAuth/css/web2 (web2 is the name of the default RT CSS, if you use another CSS you have to change this path). The contents of this file are shown below.


'''<u>inheader follows:</u>'''
'''<u>InHeader follows:</u>'''


  &lt;style&gt;
  &lt;style&gt;

Revision as of 06:23, 11 January 2011

Contributed by Cris

Integrating the Sphinx full-text search engine in RT

homepage Sphinx


What is Sphinx?

Sphinx is a free open-source SQL full-text search engine. As they put it on their site: How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Basically, Sphinx scans your database and builds his own indexes for blazingly-fast full-text searches.


Why?

Well, we needed fast full-text searches, and what RT offers is not enough. It is slow and it causes a high load on the db server.

I saw a reference to Sphinx on the RT users mailing list, and - after exploring a few alternatives and verifying no one had already did it - proceeded to integrate it into our RT installation.

I am going to illustrate here all the steps we put in place to have Sphinx completely integrated in RT. The work required a couple of weeks of my spare time and that of two other colleagues.

Unfortunately I am not able to package this as a proper extension, but I hope this docs will help those who want to do the same. Maybe someone will feel inspired enough to package it for the benefit of the community, or maybe Jesse will decide to integrate it into RT out of the box. Who knows?


First: setting up Sphinx

Intro

First of all, you have to go to the Sphinx homepage, download and compile the sources. Or you can choose among the provided binary packages.

We decided to go with version 1.10-beta, the most recent at the time of our installation.

At this point you have to decide if you want to use real-time indexes or the more traditional full+delta approach to indexes.

I will describe here briefly the meaning of these two options; if you want more infos please refer to the Sphinx homepage.

Real-time indexes (on the Sphinx site they call them RT-indexes... quite confusing, uh?) are a clever way to have your indexes always updated, up to the millisecond. They are however still in beta (as of 1.10-beta) and they are more complex to set up and update. We wanted a quick solution, so we choose the traditional way.

The traditional way consists of having to indexes: one big index comprising all tickets up to a certain date (or to a certain ID), and a small delta index for the rest of the tickets. You can use only one big index, but it will take a few minutes to build it so you cannot refresh it too frequently. If you use the full+delta approach you can build the big index e.g. once a day and the delta index once every few minutes.

Presently we have about 15 thousand tickets in our RT db; we build the full index every morning, indexing everything up to the first day of the current month: it takes about 8 minutes. The delta index (from start of month to now) is built every 5 minutes and it takes about a couple seconds.

At this point I should probably say that we are running RT on two solaris servers: one for the presentation and one for the DB. Sphinx runs on the DB side.

Sphinx setup

First thing to do is to tell Sphinx where your database is, what kind of database engine you use and how to scan your data.

You tell Sphinx all these things in the file sphinx.conf (we have it in /usr/local/etc). Here is our sphinx.conf:

#
 # Sphinx configuration
 #
 
 source rt3
 {
 	type			= mysql
 
 	sql_host		= localhost
 	sql_user		= AMENDED
 	sql_pass		= AMENDED
 	sql_db			= rt3
 	sql_port		= 3306	# optional, default is 3306
 
 	sql_query		= \
 		select TK.id as TicketNum, \
 			TK.Creator as CreatedBy, \
 			TK.Queue as Queue, \
 			unix_timestamp(convert_tz(TK.Created,'GMT','Europe/Rome')) as Created, \
 			TK.Subject as Subject, \
 			group_concat(cast(A.Content as char)) as Content \
 		from Tickets TK, Transactions T, Attachments A \
 		where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \
 		and A.TransactionId = T.id \
 		and T.ObjectType = 'RT::Ticket' \
 		and A.ContentType = 'text/plain' \
 		and A.Content is not null \
 		and TK.Created < concat(year(now()),'-',month(now()),'-01') \
 		group by TK.id \
 		order by TK.id, T.id
 
 	sql_attr_uint		= Created
 	sql_attr_uint		= Queue
 	sql_attr_uint		= CreatedBy
 
 	sql_query_info		= SELECT * FROM Tickets WHERE id=$id
 }
 
 
 source rt3delta : rt3
 {
 	sql_query		= \
 		select TK.id as TicketNum, \
 			TK.Creator as CreatedBy, \
 			TK.Queue as Queue, \
 			unix_timestamp(convert_tz(TK.Created,'GMT','Europe/Rome')) as Created, \
 			TK.Subject as Subject, \
 			group_concat(cast(A.Content as char)) as Content \
 		from Tickets TK, Transactions T, Attachments A \
 		where ((T.ObjectId = TK.id) or (T.ObjectId is null)) \
 		and A.TransactionId = T.id \
 		and T.ObjectType = 'RT::Ticket' \
 		and A.ContentType = 'text/plain' \
 		and A.Content is not null \
 		and TK.Created between concat(year(now()),'-',month(now()),'-01') and last_day(now()) \
 		group by TK.id \
 		order by TK.id, T.id
 }
 
 
 index fulltext1
 {
 	source			= rt3
 	path			= /var/data/fulltext1
 	docinfo			= extern
 	charset_type	= sbcs
 	min_infix_len 	= 3
 	enable_star     = 1
 }
 
 
 index fulltextdelta: fulltext1
 {
 	source			= rt3delta
 	path			= /var/data/fulltextdelta
 }
 
 
 indexer
 {
 	mem_limit		= 32M
 }
 
 
 searchd
 {
 	listen			= 9312
 	listen			= 9306:mysql41
 	log				= /var/log/searchd.log
 	query_log		= /var/log/query.log
 	read_timeout	= 5
 	max_children	= 30
 	pid_file		= /var/log/searchd.pid
 	max_matches		= 1000
 	seamless_rotate		= 1
 	preopen_indexes		= 0
 	unlink_old		= 1
 	workers			= threads # for RT to work
 }
 
 
 

Let's see what we have in this configuration file.

First we define a data source, that I have named (with lots of fantasy!) rt3. A data source tells Sphinx where to get data to feed its indexes. We tell Sphinx the type of source (mysql), where it is (localhost), and how to retrieve the data (the sqlquery).

There is one VERY important thing in this section: every column that is returned by the query is indexed for full text search, unless you declare it as an attribute. Attributes cannot be searched for text, but they allow you to filter the results.

In our configuration, we return Subject and Content as searchable fields, while the three sql_attr_uint attributes can be used to refine the search (i.e. filter).

NOTE: We had previously defined the three attributes according to their content (e.g. the Created attribute was defines as sql_attr_timestamp), but we have discovered that this configuration would not allow us to use those attributes for filtering. It seems Sphinx can only use sql_attr_uint for filtering... I don't know if it's a temporary limitation, a bug, or working as designed.

The sql_query_info has to return a hit for every $id ($id is an implicit variable, which contains the first column returned by the sql_query. It MUST be an integer.).

We then define another data source (rt3delta) which we derive from rt3. Everything defined in the rt3 data source is inherited by rt3delta, save for what we override (sqlquery in this case). As you can see, the queries are very similar; only the date range is different.

After data sources we define indexes.

First index is fulltext1. Everything should be pretty intuitive, but for the two min_infix_len and enable_star fields. By default Sphinx only indexes full words, but it would be very nice to be able to search for partial words by using wildcard characters. So here it is: min_infix_len tells Sphinx the minimum length of subwords to index, and enable_star tells it that we want to be able to use star as wildcard character. This way you can search for the word minimum by using min* or *nim*, but not by using mi* (because we told Sphinx the minimum length for subwords is 3 chars).

The second index is fulltextdelta, which is derived from fulltext. As we saw above, everything is inherited but for things we explicitely override.

At last we give some parameters for the inner working of indexer and searchd, which are two of the command line tools of Sphinx. The values shown here are good defaults.

Building the indexes

Now that we have a working configuration, it's time to build the indexes.

You can initialize the indexes with the command:

/usr/local/bin/indexer --all

This will build all the indexes, and it will give you an idea of the time and disk space required for each index.

For production use however, you want to put the commands in cron so that indexes are refreshed regularly and automatically.

We refresh the fulltext1 index once a day at 7 a.m., with this command:

/usr/local/bin/indexer --rotate fulltext1

Where --rotate allows the index to be substituted while the search daemon is running.

We refresh the fulltextdelta index every 5 minutes with this command:

/usr/local/bin/indexer --rotate fulltextdelta


The search daemon

Once you have built the indexes you can start searching from the command line with the search tool. When you are confident your setup is correct, it's time to start the search daemon.

We do this with the command

/usr/local/bin/searchd --config /usr/local/etc/sphinx.conf

In our experience searchd is not very stable (probably due to the beta release): it stops responding after a couple days of uptime, so we restart it daily.

With this last step you have Sphinx all set up and working, so the first part of this tutorial ends here. Please note that Sphinx has a lot of options, that I have not described. I only describe here the options that proved useful for us. If you want to know more about Sphinx please refer to the Sphinx home page.



Second: using the API to query Sphinx

So we now have Sphinx up and running and fully populated indexes that return correct results, but now we need to be able to interrogate Sphinx' indexes from RT.

To do this you have to talk to the search daemon, which speaks his own language: the Sphinx API. There are several implementations (PHP, Java, Python and even a third-party Perl API), but since we are mostly familiar with PHP we are going to use it to implement our interface.

Before posting the actual PHP script, there are a few things I have to point out:

  • Mason does not support PHP out of the box, so we did a little hack that I'll explain later
  • The script you'll see here contains in-line CSS to adapt the results to the default look of RT (web2). If you're using a different style you may want to change the inline CSS accordingly.
  • The script spits out all the results in one page: there's still no support for pagination a-la RT.
  • The script spits out ALL the results, including deleted, rejected, merged tickets.

You see, the script is still a little rough, but I hope we'll be able to refine it with the help of the community.

sphinx.php follows:

<html>
	<head>
		<title>Ricerca Ticket Sphinx</title>

		<script language="JavaScript">
			document.domain = "AMENDED";
		</script>

		<style>
			body {font:13px/1.231 arial,helvetica,clean,sans-serif;*font-size:small;*font:x-small;}
			table {font-size:inherit;font:100%;}
			pre,code,kbd,samp,tt{font-family:monospace;*font-size:108%;line-height:100%;}

			a {
				color: #000;
				text-decoration: none;
			}

			div#body a:visited {
				color: #666;
			}

			/* background: #547CCC ; */
			body {
				padding:0;
				margin:0;

				background: #FFFFFF ;
				font-family: arial, helvetica, sans-serif;

				color: #000000;
			}

			table.ticket-list, table.queue-summary, table.collection {
				margin-top: 0.75em;
				font-size: 0.9em;
				border: 1px solid #aaa;
				border-bottom: 2px solid #999;
				border-right: 2px solid #999;
			}

			table.queue-summary tr>*:first-child {
				padding-left: 1em;
			}

			table.queue-summary tr>*:last-child {
				padding-right: 1em;
			}

			table.ticket-list a, table.queue-summary a, table.collection a  {
				font-weight: bold;
			}

			table.ticket-list th.collection-as-table, table.collection th.collection-as-table {
				background: #ddd;
				font-size: 0.9em;
				margin-bottom: 0.5em;
				text-align: left;
			}

			table.queue-summary th.collection-as-table {
				font-size: 0.9em;
				margin-bottom: 0.5em;
				text-align: right;
			}

			table.queue-summary th.collection-as-table:first-child {
				text-align: left;
			}

			tr.collection-as-table+tr.collection-as-table th {
				border-bottom: 2px solid grey;
			}

			table.queue-summary td {
				background: #efefef;
				border-bottom: 1px solid #ccc;
			}

			tr.evenline td {
				background: #eee;
			}

			tr.oddline td {
				background: #fff;
			}

			tr.evenline td, tr.oddline td {
				padding-top: 0.5em;
			}

			tr.evenline+tr.evenline td, tr.oddline+tr.oddline td{
				padding-top: 0;
				border: none;
			}

			table.ticket-list td:first-child, table.ticket-list th:first-child {
				padding-left: 1em;
			}

			table.ticket-list td:last-child, table.ticket-list th:last-child {
				padding-right: 1em;
			}

			th.collection-as-table , td.collection-as-table {
				padding-right: 0.5em;
			}

			.pagenum.a:hover, .paging a.nav:hover{
				text-decoration: underline;
			}

			.pagenum *, .paging a.nav{
				padding: .5em;
			}

			.currentpage{
				text-decoration: none;
				font-weight: bold;
				background: #eee;
			}

			div.paging{
				text-align: center;
				padding-bottom: 1em;
			}

			#body>table.ticket-list {
				margin-bottom: 2em;
			}
		</style>
		</head>

	<body>
		<?php
		//Costante con il max numero di righe ritornate da Sphinx
		define('SPHINX_MAX_LIMIT', 1000);

		require_once("Classi/sphinxapi.php");
		require_once("Classi/Timer.class.php");
		require_once("Classi/MySql.class.php");
		include_once("Include/Query.inc.php");
		include_once("Include/dvopendb.config.inc.php");

		$MyTimer =  new Timer();
		//-----------------------------------------------------------------------------------------------
		//Parametri di connessione al Daemon Sphinx
		$host = "AMENDED";
		$port = 9312;
		$index = "*"; // put your index name here
		//-----------------------------------------------------------------------------------------------
		//Variabili in GET
		$query = $_GET['Text'];
		$RequestorFilter = trim($_GET['Requestor']);
		$CodaFilter = $_GET['Queue'];
		$DataInizioFilter = trim($_GET['From']);
		$DataFineFilter = trim($_GET['To']);
		//-----------------------------------------------------------------------------------------------
		//Connessione al Daemon Sphinx
		$cl = new SphinxClient();
		$cl->SetServer($host, $port);
		$cl->SetMatchMode(SPH_MATCH_EXTENDED2);
		$cl->SetLimits(0,SPHINX_MAX_LIMIT, $max_matches=0, $cutoff=0 );

		//-----------------------------------------------------------------------------------------------
		//MySql Database Query per la retrive dei dati del ticket

		$SqlQuery = new MySql($DbHostName,$DbUser,$DbPassword,$DbName);
		$SqlQuery->Connetti();

		//-----------------------------------------------------------------------------------------------
		//Filtri sulla ricerca Sphinx

		//Check della query (parametro di ricerca)
		if(strlen($query)==0)
		{
			echo "E' necessario inserire un parametro di ricerca nella casella \"Text\" per proseguire.";
			exit(1);
		}

		//Filtro Requestor
		if(strlen($RequestorFilter)>0)
		{
			$TempQuery = sprintf($RequestorLookUpQuery,mysql_real_escape_string($RequestorFilter));
			$SqlQuery->ExecQuery($TempQuery);
			$RequestorResult = $SqlQuery->GetMonoArrayResult();
			if(count($RequestorResult)>0)
			{
				$cl->SetFilter ("createdby",array($RequestorResult['id']) , $exclude=false );
			}
			else
			{
				echo "<label class=\"warninglabel\">Warning - L'indirizzo \"".$RequestorFilter."\" non e' presente nel Database; La ricerca e' stata effettuata per tutti i richiedenti</label><br/><br/>";
			}
		}

		//Filtro Queue
		if(strlen($CodaFilter)>0)
			$cl->SetFilter ("queue", array($CodaFilter), $exclude=false );

		//Filtro Data

		//__________________________________________________________________________________________________________________
		//Check delle date
		if(strlen($DataInizioFilter)>0)
		{
			//Check Datainizio
			if(checkdate(substr($DataInizioFilter,5,2),substr($DataInizioFilter,8,2),substr($DataInizioFilter,0,4))==false)
			{
				echo "Data Inizio ".$DataInizioFilter." non corretta ";
				exit(1);
			}
			else
				 $DataInizio = mktime(0, 0, 0, substr($DataInizioFilter,5,2), substr($DataInizioFilter,8,2), substr($DataInizioFilter,0,4));
		}

		if(strlen($DataFineFilter)>0)
		{	//Check DataFine

			if(checkdate(substr($DataFineFilter,5,2),substr($DataFineFilter,8,2),substr($DataFineFilter,0,4))==false)
			{
				echo "Data Fine ".$DataFineFilter." non corretta ";
				exit(1);
			}
			else
				$DataFine = mktime(0, 0, 0, substr($DataFineFilter,5,2), substr($DataFineFilter,8,2), substr($DataFineFilter,0,4));
		}
		//__________________________________________________________________________________________________________________

		//Filtraggio vero e proprio

		//Se sono settate entrambe...
		if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)>0)
		{
			if($DataInizio < $DataFine)
			{
				//Se sono settate entrambe le date filtro per entrambe
				$cl->SetFilterRange("created", $DataInizio, $DataFine, $exclude=false );
			}
			else
			{
				echo "La data Inizio ".$DataInizioFilter." e' maggiore della data di fine ".$DataFineFilter;
				exit(1);
			}
		}

		//Se solo DataInizio..
		if(strlen($DataInizioFilter)>0 && strlen($DataFineFilter)==0)
		{
			$cl->SetFilterRange("created", $DataInizio, mktime(), $exclude=false );
		}

		//Se solo DataFine..
		if(strlen($DataInizioFilter)==0 && strlen($DataFineFilter)>0)
		{
			//2008-11-14 : Data del primo Ticket di RT
			//1988-07-18 : Data di nascita di AMENDED :)
			$cl->SetFilterRange("created", mktime(0, 0, 0,07,18,1988), $DataFine, $exclude=false );
		}


		//-----------------------------------------------------------------------------------------------
		//Eseguo la Query su Sphinx

		$res = $cl->Query($query, $index);
		if ($res == false)
		{
			echo "failure: ".$cl->GetLastError()."\n";
		}
		else
		{
			if(count($res["matches"])!=0)
			{
				//Se i risultati totali sono > 1000
				if($res[total_found]>SPHINX_MAX_LIMIT)
				{
					echo "<label class=\"warninglabel\">Warning - Sono stati trovati ".$res[total_found]." risultati, verrano visualizzati solo i primi ".SPHINX_MAX_LIMIT."</label><br/><br/>";
				}

				//Se ci sono dei risultati creo la tabella e stampo
				echo "<table id='ResultTable' class='ticket-list' border='0' cellspacing='0' cellpadding='1' width='100%'>
						<tr class='collection-as-table'>
							<th class='collection-as-table'>#</td>
							<th class='collection-as-table'>Oggetto</td>
							<th class='collection-as-table'>Coda</td>
							<th class='collection-as-table'>Status</td>
							<th class='collection-as-table'>Creato</td>
							<th class='collection-as-table'>Richiedente</td>
							<th class='collection-as-table'>Proprietario</td>
						</tr>";
				foreach ($res["matches"] as $key=>$value)
				{
					$i++;
					$TempQuery = sprintf($TicketDetailQuery,mysql_real_escape_string($key));
					$SqlQuery->ExecQuery($TempQuery);
					$Risultato = $SqlQuery->GetMonoArrayResult();
					if($i%2==0)
					{
						echo "	<tr class='evenline'>
									<td class='collection-as-table'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td>
									<td class='collection-as-table' width='400'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td>
									<td class='collection-as-table'>".$Risultato['Queue']."</td>
									<td class='collection-as-table'>".$Risultato['Status']."</td>
									<td class='collection-as-table'>".$Risultato['Created']."</td>
									<td class='collection-as-table'>".$Risultato['Requestor']."</td>
									<td class='collection-as-table'>".$Risultato['Owner']."</td>
								</tr>";
					}
					else
					{
						echo "	<tr class='oddline'>
									<td class='collection-as-table'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['id']."</a></td>
									<td class='collection-as-table' width='400'><a href='http://rtdba.AMENDED/Ticket/Display.html?id=".$Risultato['id']."' target='_top'>".$Risultato['Subject']."</a></td>
									<td class='collection-as-table'>".$Risultato['Queue']."</td>
									<td class='collection-as-table'>".$Risultato['Status']."</td>
									<td class='collection-as-table'>".$Risultato['Created']."</td>
									<td class='collection-as-table'>".$Risultato['Requestor']."</td>
									<td class='collection-as-table'>".$Risultato['Owner']."</td>
								</tr>";
					}
				}
				echo "</table>";
			}
			else
			{
				//...Altrimenti stampo che non ho trovato nulla
				echo "<br/><label class=\"notfoundlabel\">Nessun risultato trovato</label><br/>";
			}

			echo "Trovati ".$res[total]." di ".$res[total_found]." in ".$MyTimer->elapsed()." secondi";
		}
		?>


dvopendb.config.inc.php follows:

<?php
$DbHostName = "HOSTNAME_AMENDED:3306/rt3";
$DbUser = "AMENDED";
$DbPassword = "AMENDED";
$DbName = "rt3";
$DbQuery = "select EmailAddress, RealName from Users";
?>


Query.inc.php follows:

<?php
$TicketDetailQuery =
"select Tickets.id,Subject,Status,Tickets.Created as Created,U1.Name as Requestor,U2.Name as Owner,Queues.Name as Queue
from Tickets left outer join Users U1
on U1.id = Tickets.Creator
left outer join  Users U2
on U2.id = Tickets.Owner
left outer join Queues
on  Queue = Queues.id
where Tickets.id = %s
";

$RequestorLookUpQuery =
"select id
from Users
where EmailAddress = '%s'
";
?>

This ends the step regarding the use of the Sphinx API to query the searchd daemon.


Third: integrating it all into RT

Now we need a way to call the php script from within RT. Well, technically we don't need it, but it would be very nice to have.

For this task I found very useful the WritingPortlets section of this wiki, written by Andy D'Arcy Jewell (thank you!)

So the first thing I did was writing a portlet that one can integrate in the "RT at a glance" page.

I put it in <rt-base-dir>/local/html/Elements. The portlet itself is in a file called SphinxSearch.

SphinxSearch follows:

<&|/Widgets/TitleBox, title => loc('Fulltext Search (Sphinx)'), bodyclass => "",
     titleright => loc('HELP'), titleright_href => RT->Config->Get('WebPath').'/Prefs/SphinxHelp.html'&>
  	<form name="input" action="/Tools/sphinx.html" method="get">
  		<table>
  			<tr class="input-row">
  				<td class="label"><&|/l&>Text</&>:</td>
  				<td class="value">
  					<input type="text" size="40" name="Text" />
  				</td>
  			</tr>
  			<tr class="input-row">
  				<td class="label"><&|/l&>Queue</&>:</td>
  				<td class="value">
  				<& /Elements/SelectQueue, Name => 'Queue' &>
  				</td>
  			</tr>
  			<tr class="input-row">
  				<td class="label"><&|/l&>Requestor</&>:</td>
  				<td class="value">
  					<& /Elements/EmailInput, Name => 'Requestor', Size => '40' &>
  				</td>
  			</tr>
  			<tr class="input-row">
  				<td class="label"><&|/l&>From</&>:</td>
  				<td class="value">
  					<& /Elements/SelectDate, Name => "From" || '' &>
  				</td>
  			</tr>
  			<tr class="input-row">
  				<td class="label"><&|/l&>To</&>:</td>
  				<td class="value">
  					<& /Elements/SelectDate, Name => "To" || '' &>
  				</td>
  			</tr>
  		</table>
  		<div class="submit">
  			<div class="buttons">
  				<input class="button" type="submit" value="Submit" />
  			</div>
  		</div>
  	</form>
  </&>
  
  

This is the result:

Rt3p.png



I tried to keep the look and feel of RT by copying code from the other portlets. I even used RT syntax to allow the localization of string, but not everything gets translated since not all the terms are in RT resource files.

Since Sphinx' extended search syntax can be very elaborated, I put a HELP link in the upper right corner (circled in red in the image above). This points to a page that I extracted from Sphinx' documentation. The page is in <rt-base-dir>/local/html/Prefs. The file is called SphinxHelp.html.

SphinxHelp.html follows:

<& /Elements/Header, Title => "Sphinx - Free open-source SQL full-text search engine" &>
<& /Elements/Tabs,
	Title => "Help for Sphinx search syntax",
	current_tab    => 'index.html',
	current_toptab => 'index.html',
	tabs           => $tabs,
	actions        => $actions,
&>

<div align="left" class="refresh">

	<div>
		<div>
			<h2 style="clear: both"><a name="extended-syntax"></a>Extended query syntax</h2>
		</div>
	</div>

	<p>
	The following special operators and modifiers can be used when using the extended matching mode:
	</p>
		<div>
			<ul type="disc">
				<li>operator OR: <pre class="programlisting">hello | world</pre></li>
				<li>operator NOT:<pre class="programlisting">
hello -world
hello !world</pre>
				</li>
				<li>field search operator: <pre class="programlisting">@title hello @body world</pre></li>
				<li>field position limit modifier (introduced in version 0.9.9-rc1): <pre class="programlisting">@body[50] hello</pre></li>
				<li>multiple-field search operator: <pre class="programlisting">@(title,body) hello world</pre></li>

				<li>all-field search operator: <pre class="programlisting">@* hello</pre></li>
				<li>phrase search operator: <pre class="programlisting">"hello world"</pre></li>
				<li>proximity search operator: <pre class="programlisting">"hello world"~10</pre></li>
				<li>quorum matching operator: <pre class="programlisting">"the world is a wonderful place"/3</pre></li>
				<li>strict order operator (aka operator "before"): <pre class="programlisting">aaa &lt;&lt; bbb &lt;&lt; ccc</pre></li>

				<li>exact form modifier (introduced in version 0.9.9-rc1): <pre class="programlisting">raining =cats and =dogs</pre></li>
				<li>field-start and field-end modifier (introduced in version 0.9.9-rc2): <pre class="programlisting">^hello world$</pre></li>
			</ul>
		</div>
	<p>

	Here's an example query that uses some of these operators:
	</p><div class="example"><a name="ex-extended-query"></a><p class="title"><b>Example:&nbsp;Extended matching mode: query example</b></p><div class="example-contents"><pre class="programlisting">
	"hello world" @title "example program"~5 @body python -(php|perl) @* code
	</pre></div></div>
	<p><br class="example-break">

	The full meaning of this search is:

	</p><div class="itemizedlist"><ul type="disc"><li>Find the words 'hello' and 'world' adjacently in any field in a document;</li>
	<li>Additionally, the same document must also contain the words 'example' and 'program'
		in the title field, with up to, but not including, 10 words between the words in question;
		(E.g. "example PHP program" would be matched however "example script to introduce outside data
		into the correct context for your program" would not because two terms have 10 or more words between them)</li>
	<li>Additionally, the same document must contain the word 'python' in the body field, but not contain either 'php' or 'perl';</li>
	<li>Additionally, the same document must contain the word 'code' in any field.</li>
	</ul></div>
	<p>
	</p><p>
	There always is implicit AND operator, so "hello world" means that
	both "hello" and "world" must be present in matching document.
	</p><p>
	OR operator precedence is higher than AND, so "looking for cat | dog | mouse"
	means "looking for ( cat | dog | mouse )" and <span class="emphasis"><em>not</em></span>

	"(looking for cat) | dog | mouse".
	</p><p>
	Field limit operator limits subsequent searching to a given field.
	Normally, query will fail with an error message if given field name does not exist
	in the searched index. However, that can be suppressed by specifying "@@relaxed"
	option at the very beginning of the query:
	</p><pre class="programlisting">
	@@relaxed @nosuchfield my query
	</pre><p>
	This can be helpful when searching through heterogeneous indexes with
	different schemas.
	</p><p>
	Field position limit, introduced in version 0.9.9-rc1, additionaly restricts the searching
	to first N position within given field (or fields). For example, "@body[50] hello" will
	<span class="bold"><strong>not</strong></span> match the documents where the keyword 'hello' occurs at position 51 and below
	in the body.
	</p><p>
	Proximity distance is specified in words, adjusted for word count, and
	applies to all words within quotes. For instance, "cat dog mouse"~5 query
	means that there must be less than 8-word span which contains all 3 words,
	ie. "CAT aaa bbb ccc DOG eee fff MOUSE" document will <span class="emphasis"><em>not</em></span>
	match this query, because this span is exactly 8 words long.

	</p><p>
	Quorum matching operator introduces a kind of fuzzy matching.
	It will only match those documents that pass a given threshold of given words.
	The example above ("the world is a wonderful place"/3) will match all documents
	that have at least 3 of the 6 specified words.
	</p><p>
	Strict order operator (aka operator "before"), introduced in version 0.9.9-rc2,
	will match the document only if its argument keywords occur in the document
	exactly in the query order. For instance, "black &lt;&lt; cat" query (without
	quotes) will match the document "black and white cat" but <span class="emphasis"><em>not</em></span>
	the "that cat was black" document. Order operator has the lowest priority.
	It can be applied both to just keywords and more complex expressions,
	ie. this is a valid query:
	</p><pre class="programlisting">
	(bag of words) &lt;&lt; "exact phrase" &lt;&lt; red|green|blue
	</pre><p>
	</p><p>

	Exact form keyword modifier, introduced in version 0.9.9-rc1, will match the document only if the keyword occurred
	in exactly the specified form. The default behaviour is to match the document
	if the stemmed keyword matches. For instance, "runs" query will match both
	the document that contains "runs" <span class="emphasis"><em>and</em></span> the document that
	contains "running", because both forms stem to just "run" - while "=runs"
	query will only match the first document. Exact form operator requires
	<a href="#conf-index-exact-words" title="11.2.39.&nbsp;index_exact_words">index_exact_words</a> option to be enabled.
	This is a modifier that affects the keyword and thus can be used within
	operators such as phrase, proximity, and quorum operators.
	</p><p>
	Field-start and field-end keyword modifiers, introduced in version 0.9.9-rc2,
	will make the keyword match only if it occurred at the very start or the very end
	of a fulltext field, respectively. For instance, the query "^hello world$"
	(with quotes and thus combining phrase operator and start/end modifiers)
	will only match documents that contain at least one field that has exactly
	these two keywords.
	</p><p>
	Starting with 0.9.9-rc1, arbitrarily nested brackets and negations are allowed.
	However, the query must be possible to compute without involving an implicit
	list of all documents:
	</p><pre class="programlisting">
	// correct query
	aaa -(bbb -(ccc ddd))

	// queries that are non-computable
	-aaa
	aaa | -bbb
	</pre><p>
	</p>

</div>

<div align="right" class="search-result-actions">
&nbsp;
</div>

<%init>

my @results;
my $skip_create = 1;
my $actions;
my $tabs = $m->comp("/Elements/DashboardTabs");

</%init>

The last part is about passing the form's (portlet) parameters to the PHP script.

As I told you before, the problem here is that Mason does not support PHP out of the box. I found a few hacks around the web to make Mason support PHP, but they were either not applicable to our environment or too difficult to apply.

So we implemented another hack: since we already have another virtual domain where PHP works, we created an html page that embeds (within an iframe) the results from the PHP page that resides in the other virtual domain.

That's why you see this little piece of code

        		<script language="JavaScript">
        			document.domain = "AMENDED";
        		</script>

in the PHP script above: you have to declare that both the HTML page and the PHP script belong to the same domain (although on different subdomains), otherwise you get a security error in the browser.

The html page is located in <rt-base-dir>/local/html/Tools and the file is called sphinx.html.

sphinx.html follows:

<& /Elements/Header, Title => "Search results" &>
  <& /Elements/Tabs,
  	Title => "Search results",
  	current_tab    => 'index.html',
  	current_toptab => 'index.html',
  	tabs           => $tabs,
  	actions        => $actions,
  &>
  
  <div align="left">
  
  <table width=95% border=0>
  	<tr>
  		<td>
  			<iframe id="sphinxresults" src="http://PHP_VIRTDOMAIN_AMENDED/sphinx.php?<% "Text=" . $Text . "&Queue=" . $Queue . "&Requestor=" . $Requestor . "&From=" . $From . "&To=" . $To %>" width="95%" height="200" frameborder="0" onload="resizeIframeToFitContent(this)"></iframe>
  		</td>
  	</tr>
  </table>
  
  </div>
  
  <div align="right" class="search-result-actions">
   
  </div>
  
  <%init>
  
  my @results;
  my $skip_create = 1;
  my $actions;
  my $tabs = $m->comp("/Elements/DashboardTabs");
  
  </%init>
  
  
  <%args>
  
  $Text
  $Queue => ''
  $Requestor => ''
  $From => ''
  $To => ''
  
  </%args>
  
  

There is one last piece of code that we need to put in place: it allows us to insert a little piece of custom CSS and custom Javascript in all RT page headers.

You have to create a file called InHeader in <rt-base-dir>/local/html/NoAuth/css/web2 (web2 is the name of the default RT CSS, if you use another CSS you have to change this path). The contents of this file are shown below.

InHeader follows:

<style>
	pre.programlisting
	{
		background-color:	#f0f0f0;
		padding:			0.5em;
		margin-left:		2em;
		margin-right:		2em;
	}
</style>

<script language="JavaScript">
function resizeIframeToFitContent(iframe) {
	// This function resizes an IFrame object
	// to fit its content.
	// The IFrame tag must have a unique ID attribute.
	iframe.height = document.getElementById(iframe.id).contentWindow.document.body.scrollHeight + 50;
}
</script>

<script language="JavaScript">
	document.domain = "AMENDED";
</script>

Finally, to add our portlet to the installable portlets in RT, you have to modify your RT_SiteConfig.pm file and add the portlet name to the $HomePageComponents variable. If you don't see the $HomePageComponents variable in your RT_SiteConfig.pm file, copy it from the RT_Config.pm file.

Mine looks like this:

Set($HomepageComponents, [qw(QuickCreate Quicksearch MyAdminQueues MySupportQueues MyReminders RefreshHomepage Dashboards SphinxSearch)]);


Conclusion

Whew, it was long, but not difficult I hope.

Following the instruction in this article you get an installable portlet that will allow you to easily (and quickly!) search the RT ticket database.

Sphinx' syntax allows very powerful searches and its performance is unbeatable. We find it so convenient that we've found us often using it in place of every other search.

We had to implement a few hacks to embed the PHP code, but that's just because we were more familiar with PHP. If you are a Perl hacker you can use the Sphinx' Perl API to integrate Sphinx even better. Let's see what the community can come up with!

I hope this was useful. If you need help, I can be reached through RT mailing lists.

Cris