
What Is Perlfect Search


Perlfect Search is an integrated, general-purpose site indexer and search engine. It comes as a pair of distinct scripts: the indexer and the search engine. The indexer automatically scans and indexes a Web site, and the search engine is a CGI script that answers keyword queries against the index and displays results pages in HTML. Results use a standard format that includes the title, description, and relevance ranking of each matching document. Advanced features include stopwords, a powerful exclude mechanism, and an automatic installation and configuration utility.

For installation instructions, please click here.

Indexing your site

After the script has been installed, you will need to index your site to be able to perform searches.

Most people want to index the files as they are on the server's disk, and this is what happens by default. If your pages are generated dynamically (e.g. via PHP), you will want to index them via HTTP instead. This also matters for security, since the source of dynamic files might contain passwords that should not be indexed. To index dynamic pages, load conf.pl into an editor and set $HTTP_START_URL.

Indexing Using ssh/telnet

  1. Log in to your account using ssh/telnet if it is on a remote machine.
  2. Go to the directory where the script was installed. The setup utility will have installed the script in a directory perlfect/search/ inside your cgi-bin directory.
  3. Run the indexer program with the command: perl indexer.pl and wait until it's finished.

Indexing Using a Web Browser

  1. If you cannot log in to your server via telnet/ssh, you can start the indexing process with your browser. This is less secure than logging in via ssh to start the indexer, so it should only be used if necessary.
  2. Set a password at $INDEXER_CGI_PASSWORD in conf.pl.
  3. Load the index_form.html HTML file into an editor and change the action attribute value of the <form> tag so that it points to your server.
  4. Load index_form.html with a browser, enter your password, and submit the form. For this to work, indexer.pl must be executable by your server and the data and temp directories must be writeable, so set the appropriate permissions with your FTP program.

Depending on how large your site is, you will need to wait for some time while the indexer digests all of your site's content. If you stop the indexing (e.g. with Ctrl-C if you are in a shell), your index will not be updated, and Perlfect Search will continue to use the old index.

Putting a Search Box on Your Pages

The setup utility will have installed the search script inside your cgi-bin directory in a subdirectory called /search. If your cgi-bin is at the URL http://yourdomain.com/cgi-bin/, the location of the search script will be http://yourdomain.com/cgi-bin/search/search.pl. Point your browser to this URL to see if it works. If the script has been installed correctly and an index has been successfully created using indexer.pl, this URL should return a results page for an empty query (i.e. a page that tells you there are no results). You can then use the following HTML code to insert the search box in any of your pages (or use search_form.html, which contains this code):

<form method="get" action="cgi-bin/search/search.pl">
<input type="hidden" name="p" value="1">
<input type="hidden" name="lang" value="en">
<input type="hidden" name="include" value="">
<input type="hidden" name="exclude" value="">
<input type="hidden" name="penalty" value="0">
<select name="mode">
<option value="all">Match ALL words</option>
<option value="any">Match ANY word</option>
</select>
<input type="text" name="q">
<input type="submit" value="Search">
</form>
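
Because the form uses method="get", submitting it amounts to requesting search.pl with the form fields as query parameters. The following Python sketch shows what such a URL could look like. It is only an illustration: the domain, the script path, and the helper name build_search_url are assumptions, so adjust them to your own setup.

```python
from urllib.parse import urlencode

# Assumed location of the search script; replace with your own domain and path.
SEARCH_URL = "http://yourdomain.com/cgi-bin/search/search.pl"


def build_search_url(query, mode="all", lang="en", include="", exclude="", penalty="0"):
    """Build the GET URL that the search form above would submit (hypothetical helper)."""
    params = {
        "p": "1",            # internal page counter, left at its default
        "lang": lang,
        "include": include,
        "exclude": exclude,
        "penalty": penalty,
        "mode": mode,
        "q": query,          # the text the user typed into the search field
    }
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("perl search"))
# → http://yourdomain.com/cgi-bin/search/search.pl?p=1&lang=en&include=&exclude=&penalty=0&mode=all&q=perl+search
```

A link built this way can be handy for testing the script from a browser's address bar before you place the form on your pages.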

You might have to change the form's action attribute to fit your local setup. Here's a list of the possible fields (the defaults are okay for most people, so you probably don't need to change anything):

  • p - This is an internal variable (for the current page). You should not change it.
  • lang - Use this attribute to set the language of the result page. The text strings for new languages and the paths to the templates must be added to conf.pl.
  • include - If you only want to search a part of all indexed files, you can limit the search to certain paths with this option. Example: /archive/ will exclude all files except those whose pathnames match "/archive/". You can also set a regular expression. Setting this to "" will search all files.

    Do not use this to protect private files (see below).

  • exclude - If you want to exclude the files in certain paths, use this option. Example: /old_stuff/. This is evaluated after include, so you can restrict the set of files with include and then further restrict it with this option. You can also set a regular expression. Setting this to "" will not exclude any files.

    Do not use this to protect private files, as anybody can change this option. To protect private/secret files, use conf/no_index.txt instead and re-index your files.

  • penalty - You can decrease the ranking of old documents with this option, so that they appear closer to the end of the matches. This may be useful for mailing list archives, where new articles are often more interesting. The value is a floating-point number that sets the decrease in percent per day of age. Example: with 0.5, a 100-day-old document's ranking will be decreased by 50% (the calculation does not use the percentages you see in the result pages). Even if a document's ranking is decreased to 0%, it will still appear as a match. If you index your pages via HTTP and want to use this option, your server should send a Last-modified header. Pages without this header are regarded as very new (their date is "now"). Dynamically generated pages often lack the Last-modified header, and whether it makes sense to regard such a page as up to date depends on its contents.
  • mode - Set the default operator to all (logical AND) or any (logical OR). This sets how terms are connected if the user enters more than one term. It is only a default: users can still put +/- in front of their terms. If you don't want your users to see the selection, just make it a hidden field.
  • q - This is the name of the search field.
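
The include, exclude, and penalty options described above can be sketched in a few lines of Python. This is not Perlfect Search's actual code (the scripts are written in Perl); it is a minimal illustration that assumes include and exclude are applied as regular expressions to each path, that the penalty scales a raw relevance score, and that the result is clamped at zero. The function names and the score variable are hypothetical.

```python
import re


def filter_paths(paths, include="", exclude=""):
    """Keep paths that match include (if set) and do not match exclude (if set)."""
    selected = []
    for path in paths:
        # An empty include means "search all files".
        if include and not re.search(include, path):
            continue
        # exclude is evaluated after include and narrows the set further.
        if exclude and re.search(exclude, path):
            continue
        selected.append(path)
    return selected


def apply_penalty(score, age_days, penalty=0.0):
    """Reduce a raw score by `penalty` percent per day of age, never below zero."""
    decrease_percent = penalty * age_days
    factor = max(0.0, 1.0 - decrease_percent / 100.0)
    return score * factor


paths = ["/archive/2001/a.html", "/archive/old_stuff/b.html", "/news/c.html"]
print(filter_paths(paths, include="/archive/", exclude="/old_stuff/"))
# → ['/archive/2001/a.html']

print(apply_penalty(80.0, age_days=100, penalty=0.5))
# → 40.0
```

With a penalty of 0.5, the 100-day-old document loses 0.5 × 100 = 50% of its score, which matches the example given for the penalty field. A document whose score drops to zero would still be listed, just at the end.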

Customizing the Results Page

Inside the directory where Perlfect Search was installed, you will find a directory called templates. Inside it are the files search.html and no_match.html. You can open these files with your favorite text editor and edit them to customize the look of the results page. Each one is like a regular HTML file, but it contains some comments that tell Perlfect Search where to insert the dynamic results.

The result pages are valid XHTML. Please support web standards and test the pages for correctness at validator.w3.org if you make changes to them.

Template files themselves are not valid XHTML, but the generated pages that show the result of a search are. To test a template, search for something, save the result page, and upload that file to the validator.

Highlighting Matched Terms

Perlfect Search allows you to display the documents with all search terms highlighted. Each search result has a "highlight matches" link for that. This feature is limited to HTML pages that follow some simple restrictions:

  • Attribute values may not contain < or >, e.g. <img src="..." alt="<b>Picture</b>"> is forbidden
  • <script> and <style> sections need to be commented out. Example of how to comment out <script>:

<script>
<!--
Here comes the javascript
// -->
</script>

If your documents don't follow these restrictions, the pages may be displayed garbled. You should then disable this feature by setting $HIGHLIGHT_MATCHES = 0; in conf.pl. You can use @HIGHLIGHT_EXT to set which files get a "highlight matches" link. Usually, these are just HTML files, including HTML files generated by PHP, etc. (only if $HTTP_START_URL is set), but not PDF files, etc.

The "highlight matches" feature takes a URL as a parameter, but it will refuse to work on any URL that was not indexed. This is a security measure so people cannot just load any file from your server or view any URL on the web via your server.

Excluding Directories or Files from the Index

Local filesystem

Inside the directory where Perlfect Search was installed, you'll find a directory called conf. Inside it, there's a file called no_index.txt. Open it with your favorite text editor and add the paths of any files you want to exclude from indexing, one on each line. The wildcard character * is supported, so for example a line containing /dir1/dir2/file.* will match any file in /dir1/dir2/ that starts with file. If you want to exclude a whole directory, use /dir1/dir_to_exclude/*

You need to run indexer.pl again after making changes to this file.
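
To get a feel for which paths a no_index.txt line would cover, the wildcard rule can be approximated in Python. This is a sketch under assumptions, not the indexer's real matching code: it assumes that * stands for any run of characters (including slashes) and that a pattern must match the whole path. The function name is hypothetical.

```python
import re


def is_excluded(path, patterns):
    """Return True if path matches any no_index.txt-style pattern (assumed semantics)."""
    for pattern in patterns:
        # Escape everything literally, then let each * match any run of characters.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, path):
            return True
    return False


patterns = ["/dir1/dir2/file.*", "/dir1/dir_to_exclude/*"]

print(is_excluded("/dir1/dir2/file.html", patterns))            # → True
print(is_excluded("/dir1/dir2/other.html", patterns))           # → False
print(is_excluded("/dir1/dir_to_exclude/sub/a.html", patterns)) # → True
```

If the real indexer treats a pattern differently, for example by not letting * cross directory boundaries, the results for nested paths would differ, so re-index and check the results when in doubt.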

Files fetched via http

If you are using the $HTTP_START_URL option to fetch your files via http, you can also exclude certain files from the index by adding a robots noindex meta tag, such as <meta name="robots" content="noindex">, to their head. The robots.txt file in the document root of your web server is also taken into consideration.

Searching

  1. Type in one or more words into the search field and click "Search" (or press Return).
  2. If Match ALL words is selected, only documents that contain all of your search terms are returned. With Match ANY word, all documents that contain at least one of your search terms are returned. Alternatively, you can put a plus sign (+) directly in front of one or more words to get only those files that include all of those words. Putting a minus sign (-) directly in front of words restricts the result to documents that don't contain any of those words.

    Phrase searches are not supported, so putting quotes around your query "like this" does not work.

  3. The results are ordered by relevance with the most relevant documents listed first. Relevance depends on the number and position of matched words in the documents.
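
The matching rules in step 2 can be illustrated with a short Python sketch. It is an approximation for explanation only, not the search script's implementation. It assumes that words are compared case-insensitively and that, when +/- terms and plain terms are mixed, the plain terms still follow the selected mode. The function name is hypothetical.

```python
def document_matches(query, doc_words, mode="all"):
    """Decide whether a document matches a query under the assumed all/any and +/- rules."""
    required, forbidden, plain = set(), set(), set()
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.add(term[1:])    # +word: must be present
        elif term.startswith("-") and len(term) > 1:
            forbidden.add(term[1:])   # -word: must be absent
        else:
            plain.add(term)           # plain word: governed by the mode

    words = {w.lower() for w in doc_words}
    if not required <= words:
        return False
    if forbidden & words:
        return False
    if not plain:
        return True
    if mode == "all":
        return plain <= words         # Match ALL words
    return bool(plain & words)        # Match ANY word


doc = ["Perl", "search", "engine"]
print(document_matches("perl python", doc, mode="all"))   # → False
print(document_matches("perl python", doc, mode="any"))   # → True
print(document_matches("+perl -python", doc))             # → True
```

Note that quotes are treated as ordinary characters here, which mirrors the fact that phrase searches are not supported.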

