Quantcast
Channel: ATLChris.com
Viewing all articles
Browse latest Browse all 13

Working with Sphinx (Search Engine) on a LAMP (Linux, Apache, MySQL, and PHP) Stack Server

$
0
0

Sphinx LogoRecently, I started exploring the differences between the MyISAM and InnoDB MySQL engines. It turns out, the one I was using, MyISAM, is not the best option when it comes to large databases and FULLTEXT searching. After researching the options (MyISAM vs InnoDB), I decided InnoDB was the best engine for an upcoming project. In an effort to steer clear of MySQL FULLTEXT searching with the InnoDB engine, I started to explore server/cache based search engines services. Sphinx is a Linux and Windows based search engine service which allows fulltext searching of extremely large databases in a very efficient and quick manner.

This tutorial is going to walk you through how to install the latest version of Sphinx, configure it for a basic usage, and use the PHP API to run search queries in PHP on a LAMP (Linux, Apache, MySQL, PHP) stack server. This tutorial is based on Ubuntu 12.04 64bit and assumes you have a LAMP stack server setup and SSH access to your server. If you get stuck at any point, browse on over to the Sphinx documentation for additional explination.

In recent months, I have switched all of my website hosting to DigitalOcean. While this is not an ad or a post about DigitalOcean, I will say that their cloud based servers are perfect for playing around with new technologies, like Sphinx. If you set up a new droplet (cloud server) you can destroy it 1 hour later and only pay pennies for the time you used it.

Example MySQL Database

In this example, we are going to search through a database of movies based on title and synopsis. Below is the example MySQL structure to go along with this example.

CREATE TABLE `movies` (
  `movie_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(255) DEFAULT NULL,
  `synopsis` text,
  `timestamp` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`movie_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Installing Sphinx

We are going to start by downloading the Sphinx (latest release version at the time of writing this is 2.0.9) DEB file to our server.

sudo wget http://sphinxsearch.com/files/sphinxsearch_2.0.9-release-0ubuntu11~precise_amd64.deb

Now that we have the DEB file downloaded we need to install it. Before we can do that, we need to install the libpq5 package.

sudo apt-get install libpq5

Extract/Install Sphinx

sudo dpkg -i sphinxsearch_2.0.9-release-0ubuntu11~precise_amd64.deb

Remove the original download.

sudo rm -r sphinxsearch_2.0.9-release-0ubuntu11~precise_amd64.deb

Congratulations, Sphinx is now installed. Wasn’t that easy?

Setup & Configure Sphinx

Now that Sphinx is installed, we are going to configure it to work with an imaginary site, yoursite.com, which is located at /var/www/yoursitecom/public_html/ in this example. All of the commands below will need to be adjusted to reflect your sites paths and name.

Lets start by creating a folder structure for Sphinx inside our sites existing folder structure.

sudo mkdir -p /var/www/yoursitecom/sphinx/data /var/www/yoursitecom/sphinx/etc /var/www/yoursitecom/sphinx/logs

We just setup 3 folders inside the yoursitecom directory. Notice that we put everything in the parent folder, not the public facing folder. This is important for security reasons. The data folder is going to house the generated indexes for this site. The etc folder is going to contain our site specific config file, and the logs folder is going to house some Sphinx specific logs.

Now, lets create a couple of empty files which will recieve data in the future.

sudo touch /var/www/yoursitecom/sphinx/etc/sphinx.conf /var/www/yoursitecom/sphinx/logs/query.log /var/www/yoursitecom/sphinx/logs/searchd.log /var/www/yoursitecom/sphinx/logs/searchd.pid

Now that we have our file and folder structure setup, lets start configuring Sphinx.

sudo vim /var/www/yoursitecom/sphinx/etc/sphinx.conf
/* Define the source that will be used to index your database. */
source yoursitemovies {

	/* Configure your MySQL details. */
	type = mysql
	sql_host = localhost
	sql_user = mysql_user
	sql_pass = mysql_pass
	sql_db = mysql_db

	/* Ranged queries are useful to avoid notorious MyISAM table lockup when indexing lots of data. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-sql-query-range */
	sql_query_range = SELECT MIN(movie_id), MAX(movie_id) FROM movies
	
	/* Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-sql-range-step */
	sql_range_step = 1000
	
	/* This is the query that will be indexed by Sphinx. In this example title and synopsis will be indexed and movie_id will identify the record. */
	sql_query = SELECT movie_id, title, synopsis FROM movies WHERE movie_id>=$start AND movie_id<=$end
	
}

/* Configure an index based of the source above. */
index yoursitemovies {

	/* Define your source from above. )
	source = yoursitemovies
	
	/* This is the path to store the index data/cache. Provide a filename without an extension. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-path */
	path = /var/www/yoursitecom/sphinx/data/movies
	
	/* This is the minimum word length to index. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-min-word-len */
	min_word_len = 3
	
	/* This allows searching of partial words. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-min-infix-len */
	min_infix_len = 3
	
}

/* Configure your searchd service. */
searchd {

	/* Ignore compatibility mode. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-compat-sphinxql-magics */
	compat_sphinxql_magics = 0
	
	/* Define your log file paths. */
	log = /var/www/yoursitecom/sphinx/logs/searchd.log
	query_log = /var/www/yoursitecom/sphinx/logs/query.log
	pid_file = /var/www/yoursitecom/sphinx/logs/searchd.pid
	
 	/* Configure the searchd listening port. Port 9312 is the recommended/default port. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#conf-listen */
	listen = localhost:9312
	
}

Now that we have our Sphinx settings all setup, we are going to build our first index.

sudo /usr/bin/indexer --config /var/www/yoursitecom/sphinx/etc/sphinx.conf --all

Now that our index has been built, we can manually start our searchd service. This is the service that actually searches the index we just built.

sudo /usr/bin/searchd --config /var/www/yoursitecom/sphinx/etc/sphinx.conf

Now that everything is setup, configured and indexed, we can search our newly created index.

sudo /usr/bin/search -c /var/www/yoursitecom/sphinx/etc/sphinx.conf searchterm

You should see out that resembles the output below.

Sphinx 2.0.9-id64-release (rel20-r4115)
Copyright (c) 2001-2013, Andrew Aksyonoff
Copyright (c) 2008-2013, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/var/www/filmfedcom/sphinx/etc/sphinx.conf'...
index 'filmfedmovies': query 'first ': returned 54 matches of 54 total in 0.000 sec

displaying matches:
1. document=40, weight=2621
2. document=257, weight=2621
3. document=554, weight=2621
4. document=43, weight=1588
5. document=45, weight=1588
6. document=50, weight=1588
7. document=51, weight=1588
8. document=52, weight=1588
9. document=61, weight=1588
10. document=95, weight=1588
11. document=130, weight=1588
12. document=139, weight=1588
13. document=143, weight=1588
14. document=157, weight=1588
15. document=164, weight=1588
16. document=168, weight=1588
17. document=181, weight=1588
18. document=198, weight=1588
19. document=215, weight=1588
20. document=218, weight=1588

words:
1. 'first': 54 documents, 57 hits

index 'filmfedpeople': query 'first ': returned 0 matches of 0 total in 0.000 sec

words:
1. 'first': 0 documents, 0 hits

In the matched section you will see the document, this is the movie_id from the DB. You can use this number to pull the record from your DB to display search results. The weight number listed is the relevance of the search term. The higher the number, the stronger the relevance that record is to your search term.

Now that everything is working, lets configure some basic server settings to make the searchd service start on server startup as well as create a crontab to update our index every few hours.

By default, your would have to start the searchd service manually after server reboot. That is no fun. Let configure it to start automatically. Start by adding a the searchd start command to the servers rc.local file.

sudo vim /etc/rc.local

Paste the following code just above the exit 0 line.

/usr/bin/searchd --config /var/www/yoursitecom/sphinx/etc/sphinx.conf

Now that we have everything setup and running, we need to make sure our index stays up-to-date with our database. We are going to accomplish this by running a crontab to rotate (rebuild) our index every 6 hours.

sudo crontab -e

Add the following line to the bottom of the document.

0 */6 * * * /usr/bin/indexer --rotate --config /var/www/yoursitecom/sphinx/etc/sphinx.conf --all

That is it. We now have everything configured, running, and automated. Next, we are going to learn how to run search queries in PHP agains our Sphinx index.

Accessing Sphinx inside PHP

Sphinx has an amazing API that allows PHP to run search queries against the Sphinx index. The following code is an example PHP script which will explain a basic level PHP integration.

<?PHP

/* Include the Sphinx PHP API */
require_once('/usr/share/sphinxsearch/api/sphinxapi.php');

/* Start the SphinxClient class. */
$cl = new SphinxClient();

/* Match all words or any word? Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#api-func-setmatchmode */
$cl->SetMatchMode(SPH_MATCH_ALL);

/* Set your field weights manually. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#api-func-setfieldweights */
$cl->SetFieldWeights(array('title' => 50, 'synopsis' => 10));


/* Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#api-func-setsortmode */
$cl->SetSortMode(SPH_SORT_RELEVANCE);

/* Works like MySQL LIMIT. This is how you can paginate search results. */
$cl->setLimits(0,10);

/* The Query method will search the search term against the selected index. Learn More: http://sphinxsearch.com/docs/manual-2.0.9.html#api-func-query */
$res = $cl->Query('searchterm', 'yoursitemovies');


/* Now that we have the matches and weight, we are going to make a comma separated list of movie_id's to pull attentional details out of our DB. */
$c = 1;
foreach ($res['matches'] AS $key => $value) {
	if ($c == 1) {
		$movie_ids = $key;
	}
	else {
		$movie_ids .= ',' . $key;
	}

	$c++;
}

/* Here, we are using a MySQL IN clause to pull the movie_id, title, and synopsis from the DB for display. */
$sql = $pdo->prepare('SELECT movie_id, title, synopsis FROM movies WHERE movie_id IN (' . $movie_ids . ')');
$sql->execute();

/* Next, we are going to combine the pulled data with the Sphinx data so we can display everything based on the Sphinx weight integer. */
while ($qresult = $sql->fetchObject()) {
	$results['matches'][$qresult->movie_id]['data'] = $qresult;
}

foreach ($results['matches'] AS $result) {
	echo $result['data']->title;
	echo $result['data']->synopsis;
}

/* Now, lets dump everything on the screen. */
echo '<pre>';
var_dump($result);
echo '</pre>';

Well, that is it. You have learned how to install, configure, and control Sphinx via PHP and command line. Please share your experience in the comments below. Were you successful with your Sphinx endeavor? Do you have a questions? Did you encounter an issue? I am here to help.


Viewing all articles
Browse latest Browse all 13

Trending Articles