* Purpose: Basic instructions to use this class.
* @(#) $Header: /home/mlemos/cvsroot/htdiginterface/README,v 1.1 2005/02/08 06:14:30 mlemos Exp $
PHP interface for Ht:/Dig versions 3.1.x or 3.2.x:
This class provides an interface to the Ht:/Dig package of programs to
simplify the process of configuring, indexing and searching a site.
Although Ht:/Dig can work with existing configuration files, this class
can only work properly if you use a configuration file generated by the
class.
The class sets certain configuration directives to work with special
result page template files that are necessary to let the class parse the
search results and extract the information returned by the htsearch program.
The special template files are supplied within this class package. There
are also example scripts to perform each of the steps to configure, index
and search a site with Ht:/Dig.
To make this class work properly, please follow these steps:
1. The htdig_setup_configuration.php example script demonstrates how to
set up the class so it can create a suitable configuration file for
Ht:/Dig.
You can tell it to supersede the default Ht:/Dig configuration file or
generate a new file in a different path.
You may generate as many different configuration files as you want,
possibly one configuration file for each site that you host on the same
server. In this case, you may want to specify different directories for
the database files that will contain each site index.
The script should call the GenerateConfiguration function to tell the
class to create the configuration file.
This function takes an array of values for any Ht:/Dig options that you
may want to set to customize the indexing and searching processes of your
site.
The GenerateConfiguration function merges your custom options with some
options that the class needs to set to make the search results page
parsing work properly. Those options set the file names of the output
results templates to: htdig_header.html, htdig_nomatch.html,
htdig_syntaxerror.html and htdig_template.html .
The GenerateConfiguration function also takes a special option named
template_path to specify an alternative directory for the template files,
for instance if you want to put them in the current directory of your site
index and search page scripts.
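
The merging of custom options with the options required by the class can be
pictured with a small self-contained sketch. This is not the class code and
it does not call GenerateConfiguration: the helper name merge_htdig_options,
the Ht:/Dig attribute names used as array keys, the template_map value
layout and the rule that required options win on conflicts are all
assumptions made for illustration. Only the four template file names and
the template_path option come from the text above.

```php
<?php
// Hypothetical helper, NOT part of the class: shows one way custom options
// could be merged with the template options that result parsing depends on.
function merge_htdig_options(array $custom_options)
{
    // template_path is handled as a special option (assumed behavior): it
    // only prefixes the template file names and is not kept as an option.
    $template_path = '';
    if (isset($custom_options['template_path'])) {
        $template_path = rtrim($custom_options['template_path'], '/') . '/';
        unset($custom_options['template_path']);
    }

    // File names are the ones listed in this README. The attribute names
    // and the template_map value layout are assumptions.
    $required_options = array(
        'search_results_header' => $template_path . 'htdig_header.html',
        'nothing_found_file'    => $template_path . 'htdig_nomatch.html',
        'syntax_error_file'     => $template_path . 'htdig_syntaxerror.html',
        'template_map'          => 'Default default ' . $template_path . 'htdig_template.html',
    );

    // Later arrays override earlier ones for the same string key, so the
    // required options cannot be replaced by a custom value.
    return array_merge($custom_options, $required_options);
}

// Example values only; the URL and the directory are placeholders.
$options = merge_htdig_options(array(
    'start_url'     => 'http://www.example.com/',
    'database_dir'  => '/var/lib/htdig/example',
    'template_path' => './',
));
```

In this sketch, using a different database_dir value for each site gives
every generated configuration its own set of index files.
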
2. The next step after creating a suitable configuration file is to start
the process of crawling a site to build the index database files.
The htdig_build_databases.php example script demonstrates how to start a
crawling session. It calls the class function named Dig that wraps around
the htdig, htmerge and htfuzzy commands.
This function can be called as often as you want, possibly using
different configuration files to index different sites. This is something
that you will probably schedule to run once a day, during low traffic
hours, for each of your sites.
Scheduled crawling can be done using tools like cron, or the equivalent in
your operating system, using the PHP CGI or CLI versions to run the crawler
script outside the Web server.
The Dig function calls the Ht:/Dig programs in a way that makes them create
temporary index database files during the indexing process. Only when the
process has finished are the final index database files replaced with the
contents of the temporary files.
This way you can run a crawling process at the same time the site is being
searched by your users using database files from the previous crawling
session.
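
The idea of building into temporary files and replacing the final files
only at the end can be sketched in a few lines of plain PHP. This is a
general illustration of the technique, not what the Dig function or the
Ht:/Dig programs actually execute. The helper name replace_index_file and
the file name example_index.db are hypothetical.

```php
<?php
// Hypothetical helper: write new data to a temporary file and only then
// move it over the final file, so readers never see a half-written file.
function replace_index_file($final_path, $new_contents)
{
    // Keep the temporary file in the same directory as the final file so
    // that both are on the same filesystem and rename() is a single step.
    $temporary_path = tempnam(dirname($final_path), 'tmp');
    if ($temporary_path === false) {
        return false;
    }
    if (file_put_contents($temporary_path, $new_contents) === false) {
        unlink($temporary_path);
        return false;
    }
    // Until this call succeeds, searches keep reading the previous file.
    if (!rename($temporary_path, $final_path)) {
        unlink($temporary_path);
        return false;
    }
    return true;
}

// Stand-in for a real index database file.
$index = sys_get_temp_dir() . '/example_index.db';
file_put_contents($index, 'previous crawl');
replace_index_file($index, 'new crawl');
echo file_get_contents($index), "\n"; // prints "new crawl"
```

If the write fails part of the way through, the previous file is left
untouched, which is the property that lets searches continue during a crawl.
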
3. Once your site is indexed at least once, you can start using the class
to provide an interface to search your site pages. Take a look at the
htdig_search.php script for an example site search page. You can use this
example script as a base for your customized site search page.
The example script presents a simple search form. When the form is
submitted, it calls the Search function and outputs the results split into
pages, with links to navigate between the pages of search results. The
number of results per page is configurable.
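
The arithmetic behind splitting results into pages can be sketched as
follows. This is not the Search function and says nothing about its
arguments or return values; the class may well leave paging to htsearch
itself. The helper name paginate_results, the page query parameter and the
sample file names are hypothetical.

```php
<?php
// Hypothetical helper, not the class API: pick the results for one page
// and report how many pages exist.
function paginate_results(array $results, $page, $results_per_page)
{
    $total_pages = max(1, (int) ceil(count($results) / $results_per_page));
    // Clamp the requested page into the valid range 1..$total_pages.
    $page = min(max(1, (int) $page), $total_pages);
    $offset = ($page - 1) * $results_per_page;
    return array(
        'page'        => $page,
        'total_pages' => $total_pages,
        'results'     => array_slice($results, $offset, $results_per_page),
    );
}

$matches = array('a.html', 'b.html', 'c.html', 'd.html', 'e.html');
$view = paginate_results($matches, 2, 2);
// $view['results'] now holds c.html and d.html; $view['total_pages'] is 3.

// Emit one navigation entry per page, marking the current page.
for ($p = 1; $p <= $view['total_pages']; $p++) {
    echo ($p == $view['page']) ? "[$p] " : "<a href=\"?page=$p\">$p</a> ";
}
```
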