Developing new providers

David F. Barrero <david DOT barrero AT rediris DOT es>

Introduction

In some cases you will need to use Searchy with an information source that it cannot handle out of the box, for example a particular database, or you will want to implement a specific information retrieval algorithm. In those cases you will have to develop a custom provider.

Searchy has been designed to be quite flexible, and new providers are easy to develop. You will not need to deal with SOAP details and other nasty stuff; you will deal only with your algorithm, your business logic and all the things you like to deal with (at least I hope you like them ;-)).

A provider may perform any task you want. There is only one limitation: it receives a query encapsulated in a Dublin Core element (see the Quick User's Guide for a complete list of DC elements) and it must return its results in Dublin Core as well. Do not worry about it, this is done in two or three lines of code. Well, actually there is another limitation: the provider has to be implemented in Java, but I do not think that is a great problem. In the future, providers may be implemented in Python and other languages through a Java wrapper.

A provider may access any information source, using any retrieval algorithm and any technology. The range of potential applications is huge, just use your imagination!

Let's go on...

Developing a new provider is as easy as implementing a Java interface, and a simple one.

A provider must perform at least two tasks: initialise itself with the config info given by the Searchy core, and answer queries. Those tasks are done through the two methods of the Provider interface.

Providers are created only once, when Searchy starts, and Searchy creates as many instances of the provider as are defined in the config file. Each time a request is received, Searchy launches the query in a thread, but you do not have to worry about it.

You can turn your class into a Searchy provider in only four simple steps:
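
The life cycle just described can be sketched as follows. This is only an illustration under stated assumptions: the types below are minimal stand-ins written for this example (the real ProviderConfig, DC and DCResource classes live in the Searchy packages and are richer), and EchoProvider, setParameter(), get() and the thread pool are hypothetical. The sketch calls setConfig once and then runs several queries in parallel threads against the same instance, which is why keeping per-query state in local variables is a safe habit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-ins for the Searchy types (assumptions, not the real API).
class ProviderConfig {
    private final Map<String, String> parameters = new HashMap<>();
    void setParameter(String name, String value) { parameters.put(name, value); } // hypothetical
    String getParameter(String name) { return parameters.get(name); }
}

class DCResource {
    final String value;
    DCResource(String value) { this.value = value; }
}

class DC {
    private final List<DCResource> resources = new ArrayList<>();
    void add(DCResource resource) { resources.add(resource); }
    DCResource get(int index) { return resources.get(index); }   // hypothetical
}

interface Provider {
    void setConfig(ProviderConfig config);
    DC doQuery(DCResource query);
}

// Hypothetical provider that answers each query with a prefixed copy of it.
class EchoProvider implements Provider {
    private String prefix;   // written once in setConfig, only read afterwards

    public void setConfig(ProviderConfig config) {
        prefix = config.getParameter("prefix");
        if (prefix == null) {
            throw new IllegalArgumentException("Parameter 'prefix' required");
        }
    }

    public DC doQuery(DCResource query) {
        DC result = new DC();   // per-query state stays in local variables
        result.add(new DCResource(prefix + query.value));
        return result;
    }
}

class LifeCycleSketch {
    // Initialises one provider, then runs the given number of queries in threads.
    static List<String> run(int queries) {
        ProviderConfig config = new ProviderConfig();
        config.setParameter("prefix", "echo:");
        final Provider provider = new EchoProvider();
        provider.setConfig(config);                 // once, at start-up

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<DC>> pending = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                final DCResource query = new DCResource("q" + i);
                Callable<DC> task = () -> provider.doQuery(query);
                pending.add(pool.submit(task));     // one thread per request
            }
            List<String> answers = new ArrayList<>();
            for (Future<DC> future : pending) {
                answers.add(future.get().get(0).value);
            }
            return answers;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```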

  1. Import the required classes and interface.

    import es.rediris.searchy.engine.map.*; 
    import es.rediris.searchy.dc.*;  
    import es.rediris.searchy.engine.provider.*;
  2. Implement the interface "es.rediris.searchy.engine.provider.Provider" in your class declaration.
  3. Implement method

    public void setConfig(ProviderConfig)
    This method is called when Searchy initialises the providers and is used to set the configuration of the provider. All the provider initialisation code should be placed here. If you prefer, it may also go in the class constructor, but there you will not have the config info from agent.xml, so we strongly suggest using setConfig instead of the constructor for initialisation.

    ProviderConfig is basically a hash table that keeps the config info given in agent.xml. You can retrieve a parameter with the method getParameter(String). All parameter checks should be done here; if a required parameter is missing, you should throw an exception. Searchy will do the rest of the job.

    ProviderConfig also keeps the map, which is needed to perform the mapping to Dublin Core.

    An example may clarify all this. Below is the setConfig method used in the Google provider.

    1.- public void setConfig(ProviderConfig config){  
    2.-   this.config = config;  
    3.-   this.map = config.getMap(); 
    4.-  
    5.-   this.key = config.getParameter("key");
    6.-   if (this.key == null) {  
    7.-     throw new IllegalArgumentException( 
    8.-     "Google client key required");  
    9.-   }  
    10.-}
    In line 2 we keep the given ProviderConfig so that other methods of the class can use it; this may not be necessary. Line 3 stores the mapping; this does not have to be done, since you may call getMap() each time you need it. It depends on your coding style.
    Lines 5 to 9 store the needed parameters for better performance and verify that they exist. In this case only one parameter is needed, the Google key; if it is not found, an exception is thrown.

  4. Implement method

    public DC doQuery(DCResource query)
    This is the main method, where you put your business logic. There are some subtasks that must be done.

    1. Get the query string. Just copy and paste the following code:

      Enumeration f = query.elementsDCElement();
      DCElement element = (DCElement) f.nextElement();
      String queryString = this.map.mapQuery(element.getName(), element.getValue());
      The string queryString contains the user query, mapped as defined in the map section of the config file. If there is an error in the query map, null is returned. (The variable is called queryString because the name query is already taken by the method parameter.)
      How the query string is interpreted is up to you.

    2. Do anything you want.
    3. Return the result. This may be the most difficult task.
      The result must be returned in a DC object. This object encapsulates a set of resources described using Dublin Core (each resource is a DCResource object, but you do not have to worry about it). The key here is to construct the DC object, and for this task you have some help. You will not need to deal directly with Dublin Core or worry about how to map the info you have about a resource to Dublin Core; this is done by Searchy.
      For each resource you have, just create a HashMap object and store its key-value pairs in it. The keys are the fields of the form %foo% used in the mapping section of the config file (see the Quick User's Guide), for example the title or the creator of a resource. Pass the hash table to the method mapResponse(); it returns a DCResource that has to be added to the DC object with the add() method.
      Let's see it with an example based on the SQL provider code.

      1.- ResultSet rs = this.stmt.executeQuery(SQLquery);
      2.- ResultSetMetaData meta = rs.getMetaData();
      3.-
      4.- while (rs.next()) {
      5.-   HashMap resultHash = new HashMap(15);
      6.-
      7.-   for (int i = 1; i <= meta.getColumnCount(); i++) {
      8.-     String value = rs.getString(i);
      9.-     String key = meta.getColumnName(i);
      10.-    resultHash.put(key, value);
      11.-  }
      12.-
      13.-  result.add(this.map.mapResponse(resultHash));
      14.- }
      15.-
      16.- rs.close();
      17.- return result;

      Lines 1 to 4 are specific to JDBC, which is outside the scope of this guide. You just have to know that line 4 starts a loop that processes each entry returned by the database and finishes when there are no more entries. Line 5 creates a HashMap object with initial capacity 15 (the number of elements in Dublin Core; use whatever size best fits your provider). The inner loop, in lines 7 to 11, puts the key-value pairs in the hash table. Here we use the column name as key and its content as value.
      The mapping from this domain to Dublin Core is defined in the config file, and is performed in line 13, once the hash table is complete (result is the DC object that the method returns). Note that mapResponse() returns a DCResource object and result.add() takes a DCResource as its parameter, so the two calls can be composed. In some applications it may be advisable to store the return value of mapResponse() in a DCResource variable first. Line 16 is a JDBC matter.
      To finish, the provider returns the DC object. That's all.
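
To see the four steps working together, here is a minimal sketch of a complete provider. It is not Searchy code: so that it compiles on its own, it defines small stand-ins for the Searchy classes, and their behaviour is an assumption (the stand-in map passes the query through unchanged and copies every key-value pair into the resource). QueryMap, TitleListProvider, the "titles" parameter, setParameter(), addElement() and size() are hypothetical names invented for this example. In a real provider you would drop the stand-ins and import the Searchy packages listed in step 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// ---- Stand-ins for the Searchy classes (assumptions, not the real API) ----
class DCElement {
    private final String name, value;
    DCElement(String name, String value) { this.name = name; this.value = value; }
    String getName() { return name; }
    String getValue() { return value; }
}
class DCResource {
    private final List<DCElement> elements = new ArrayList<>();
    void addElement(DCElement e) { elements.add(e); }               // hypothetical
    Enumeration<DCElement> elementsDCElement() { return Collections.enumeration(elements); }
}
class DC {
    private final List<DCResource> resources = new ArrayList<>();
    void add(DCResource resource) { resources.add(resource); }
    int size() { return resources.size(); }                          // hypothetical
}
class QueryMap {   // hypothetical name for the object returned by getMap()
    String mapQuery(String name, String value) { return value; }     // pass-through
    DCResource mapResponse(HashMap<String, String> fields) {
        DCResource resource = new DCResource();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            resource.addElement(new DCElement(e.getKey(), e.getValue()));
        }
        return resource;
    }
}
class ProviderConfig {
    private final Map<String, String> parameters = new HashMap<>();
    private final QueryMap map = new QueryMap();
    void setParameter(String name, String value) { parameters.put(name, value); } // hypothetical
    String getParameter(String name) { return parameters.get(name); }
    QueryMap getMap() { return map; }
}
interface Provider {
    void setConfig(ProviderConfig config);
    DC doQuery(DCResource query);
}

// ---- The provider: searches a comma-separated list of titles ----
class TitleListProvider implements Provider {
    private QueryMap map;
    private String[] titles;

    public void setConfig(ProviderConfig config) {
        this.map = config.getMap();
        String list = config.getParameter("titles");   // hypothetical parameter
        if (list == null) {
            throw new IllegalArgumentException("Parameter 'titles' required");
        }
        this.titles = list.split(",");
    }

    public DC doQuery(DCResource query) {
        DC result = new DC();
        // 1. Get the query string.
        Enumeration<DCElement> f = query.elementsDCElement();
        DCElement element = f.nextElement();
        String queryString = this.map.mapQuery(element.getName(), element.getValue());
        if (queryString == null) {
            return result;   // this sketch answers an unmappable query with no results
        }
        // 2. Business logic: keep the titles that contain the query string.
        for (String title : titles) {
            if (title.contains(queryString)) {
                // 3. One HashMap per resource, mapped and added to the DC object.
                HashMap<String, String> fields = new HashMap<>();
                fields.put("title", title);
                result.add(this.map.mapResponse(fields));
            }
        }
        return result;
    }
}

class TitleListDemo {
    // Configures the provider, sends one query and counts the returned resources.
    static int countMatches(String titles, String word) {
        ProviderConfig config = new ProviderConfig();
        config.setParameter("titles", titles);
        Provider provider = new TitleListProvider();
        provider.setConfig(config);
        DCResource query = new DCResource();
        query.addElement(new DCElement("title", word));
        return provider.doQuery(query).size();
    }
    public static void main(String[] args) {
        System.out.println(countMatches("Java basics,Dublin Core,Core Java", "Java"));
    }
}
```

The provider class itself never touches Dublin Core directly: it only fills a HashMap and lets the map object do the translation, which is the division of work this guide describes.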

We encourage you to read the JavaDoc documentation (run "ant javadoc" in the Searchy home directory to generate it) for more details. Reading the code of any working provider is also a good exercise; the Google provider, for example, is quite simple and may be a good starting point. See it in src/es/rediris/searchy/engine/providers/ProviderGoogle.java.

Starting up the provider

Once you have programmed your provider, it is time to run it. Edit the config file (usually conf/agent.xml) and, in the provider section, insert a new provider of type custom. Set a class label with the name of the class that implements your provider, plus any other labels your provider needs as parameters. Those parameters are the ones delivered in the ProviderConfig object.

Finally, if you use the init scripts, they must be modified to include your class and its dependencies in the CLASSPATH. The scripts indicate where extra .jar files can be added, so they are quite easy to modify.
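
This guide does not show how Searchy loads a custom provider, but a class that is named in a config file is commonly instantiated by reflection, which would explain why the class must be on the CLASSPATH. The sketch below illustrates that general Java technique only; it is not Searchy's actual code, and Greeter, HelloGreeter and loadByName are hypothetical names. With this technique the class also needs a no-argument constructor, so it may be wise to give your provider one.

```java
// Hypothetical interface standing in for "the type the framework expects".
interface Greeter {
    String greet();
}

// Hypothetical implementation that a config file would refer to by name.
class HelloGreeter implements Greeter {
    public HelloGreeter() { }
    public String greet() { return "hello"; }
}

class LoadByNameSketch {
    // Creates an instance of the class with the given fully qualified name.
    static Greeter loadByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (Greeter) cls.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // This is the failure you get when the class is missing from the CLASSPATH.
            throw new IllegalArgumentException("Class not on the CLASSPATH: " + className, e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Greeter greeter = loadByName(HelloGreeter.class.getName());
        System.out.println(greeter.greet());
    }
}
```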

Logging

If you are going to use your provider in production, using a logging system is highly recommended. Searchy uses log4j, a logging system developed by the Apache team.

Using log4j in your provider is extremely simple: just import the class org.apache.log4j.Logger and insert this line in your class, usually as the first line after the class declaration.

static Logger logger = Logger.getLogger(YourClassNameHere.class);
That's all that must be done to enable log4j. Now, instead of using System.out.print(), use one of the following statements:

logger.debug(message);

logger.info(message);

logger.warn(message);

logger.error(message);

logger.fatal(message);

Choose the method according to the severity of the message you want to send. Using log4j gives you full and precise control over the format of the messages your provider produces, through the log4j config file, usually conf/log4j.properties.

Small warning

In the future we intend to add full RDF support to Searchy, with any vocabulary. If this feature is added, the provider interface may change. The general idea should stay the same, but backwards compatibility may be broken, so providers may need small changes in order to work with future versions of Searchy.

Support

If you have any questions, or there is something you want to comment on, please do not hesitate to contact us through the mailing lists, the forum or directly by email.

Contributions

Any contribution to this project is welcome. If you have created a provider and you think it may be useful to others, let us know so we can make it part of Searchy.


2004-06-16