Tuesday, September 23, 2014

Lucene + Hibernate criteria : A hybrid approach to create effective filtering of results

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Apache Lucene is an open source project available for free download.

Hibernate Search
Built on top of Apache Lucene, Hibernate Search offers full-text search support for objects stored by Hibernate ORM, Infinispan and other sources. Think of it as Google for your entities:


  • search words with text
  • order results by relevance
  • find by approximation (fuzzy search)


One day I received a requirement to implement fuzzy search in my application. My immediate thought was to use Hibernate Search, as the project was already using Hibernate as its ORM. But then I ran into difficulties that looked like hurdles to the implementation.

Many entities in my application had a soft-delete mechanism ("status = deleted"), a subscription date range, approval statuses ("status = approved") and so on. Such criteria became a big pain, as I was a newbie to Lucene and Hibernate Search. I started hunting the internet for an approach that would let me search by words, phrases or sentences and also apply these default filters, so as to fetch only those entities that contain the given text and also fall within a given date range, status and so on.

Finally, after a lot of hunting on the internet and reading the very cool Hibernate Search documentation, I found a hybrid approach: create a Lucene query and combine it with Hibernate criteria to filter the results. Here I try to explain it.
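
Before turning to Hibernate, the idea itself can be sketched in plain Java. This is only an in-memory illustration, not the Hibernate Search API: the class HybridFilterSketch, the Org stand-in and the 1/0 encoding of the status fields are all assumptions made for the example, and the "full-text" step is reduced to a case-insensitive substring test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical in-memory illustration of the hybrid idea; not the Hibernate Search API.
final class HybridFilterSketch {

    // Simplified stand-in for an entity: only the fields needed for the demo.
    static final class Org {
        final String name;
        final int isApproved; // 1 = approved (assumed encoding)
        final int enabled;    // 1 = enabled, 0 = soft-deleted (assumed encoding)

        Org(String name, int isApproved, int enabled) {
            this.name = name;
            this.isApproved = isApproved;
            this.enabled = enabled;
        }
    }

    // Step 1: "full-text" match, reduced here to a case-insensitive substring test.
    static Predicate<Org> textMatch(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        return o -> o.name.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Step 2: the default filters that every search must honour.
    static Predicate<Org> defaultFilters() {
        return o -> o.isApproved == 1 && o.enabled == 1;
    }

    // Keep only the records that satisfy both the text match and the default filters.
    static List<String> search(List<Org> all, String query) {
        Predicate<Org> both = textMatch(query).and(defaultFilters());
        List<String> names = new ArrayList<>();
        for (Org o : all) {
            if (both.test(o)) {
                names.add(o.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Org> all = Arrays.asList(
                new Org("Acme Health", 1, 1),
                new Org("Acme Labs", 1, 0),   // soft-deleted
                new Org("Acme Foods", 0, 1),  // not approved
                new Org("Globex", 1, 1));
        System.out.println(search(all, "acme"));
    }
}
```

In the real implementation the text match is done by Lucene and the default filters by Hibernate criteria, but the shape of the result is the same: only records that pass both steps come back.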

Assume the following entity structure:

@Entity
@Table(name="organization")//Db table
@Indexed(index="Organization")//Index name; the physical directory on the file system can be set in the Hibernate properties file
//Custom analyzer that provides custom tokenization for phonetic search
@AnalyzerDef(name="customanalyzer",
  tokenizer=@TokenizerDef(factory = StandardTokenizerFactory.class),
  filters={
   @TokenFilterDef(factory = LowerCaseFilterFactory.class),
   @TokenFilterDef(factory = PhoneticFilterFactory.class,
    params=@Parameter(name = "encoder", value = "DoubleMetaphone")
   )
}
)
public class ShodOrganization implements Serializable {
 private static final long serialVersionUID = 1L;

 @Id
 @Column(unique=true, nullable=false)
 @GeneratedValue
 private Integer organizationId;

 @Column
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)
 @Analyzer(definition="customanalyzer")
 private String address;

 @Column
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)
 @Analyzer(definition="customanalyzer")
 private String city;

 @Column
 private String country;

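 @Column
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)//Indexed because "description" is one of the fields searched below
 @Analyzer(definition="customanalyzer")
 private String description;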

 @Column(nullable=false)
 private Integer isApproved;

 @Column(nullable=false)
 private Integer enabled;

 @Column(nullable=false)
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)
 @Analyzer(definition="customanalyzer")
 private String organizationName;

 @Column
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)
 @Analyzer(definition="customanalyzer")
 private String pin;

 @Column
 @Field(index=Index.YES,store=Store.YES,analyze=Analyze.YES)
 @Analyzer(definition="customanalyzer")
 private String state;

 @Column
 private String email;
 
 @Column
 private String website;
 


 public ShodOrganization() {
 }

 public Integer getOrganizationId() {
  return this.organizationId;
 }

 public void setOrganizationId(Integer organizationId) {
  this.organizationId = organizationId;
 }

 public String getAddress() {
  return this.address;
 }

 public void setAddress(String address) {
  this.address = address;
 }

 public String getCity() {
  return this.city;
 }

 public void setCity(String city) {
  this.city = city;
 }

 public String getCountry() {
  return this.country;
 }

 public void setCountry(String country) {
  this.country = country;
 }

 public String getDescription() {
  return this.description;
 }

 public void setDescription(String description) {
  this.description = description;
 }

 public Integer getIsApproved() {
  return this.isApproved;
 }

 public void setIsApproved(Integer isApproved) {
  this.isApproved = isApproved;
 }

 public String getOrganizationName() {
  return this.organizationName;
 }

 public void setOrganizationName(String organizationName) {
  this.organizationName = organizationName;
 }

 public String getPin() {
  return this.pin;
 }

 public void setPin(String pin) {
  this.pin = pin;
 }

 public String getState() {
  return this.state;
 }

 public void setState(String state) {
  this.state = state;
 }

 public Integer getEnabled() {
  return enabled;
 }

 public void setEnabled(Integer enabled) {
  this.enabled = enabled;
 }

 public String getEmail() {
  return email;
 }

 public void setEmail(String email) {
  this.email = email;
 }
}
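
The comment on @Indexed mentions that the index directory can be set through Hibernate properties. As a minimal sketch, the two settings involved can be built as plain java.util.Properties. The property keys are the ones Hibernate Search uses for a file-system index; the class name SearchIndexSettings and the example path are assumptions for illustration.

```java
import java.util.Properties;

// Sketch: Hibernate Search settings for a file-system index, built as plain Properties.
final class SearchIndexSettings {

    // indexBase is the parent directory; each index (for example "Organization")
    // is stored in a sub-directory named after it.
    static Properties fileSystemIndex(String indexBase) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.directory_provider", "filesystem");
        props.setProperty("hibernate.search.default.indexBase", indexBase);
        return props;
    }

    public static void main(String[] args) {
        // "/var/lucene/indexes" is an assumed example path.
        Properties props = fileSystemIndex("/var/lucene/indexes");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The same two keys can equally be written as key=value lines in hibernate.properties or as property elements in the Hibernate configuration file.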

//The following method can go in the DAO implementation class; in my case it was OrganizationDAOImpl
private Criteria getCriteria(String[] selectedCategories){
  //Default filters: only approved and enabled organizations
  //selectedCategories is not used in this simplified example
  Criteria criteria = currentSession().createCriteria(ShodOrganization.class);
  criteria.add(Restrictions.eq("isApproved", Constants.APPROVESTATUS.APPROVE.getValue()));
  criteria.add(Restrictions.eq("enabled", Constants.ENABLESTATUS.ENABLED.getValue()));
  return criteria;
 }

//The following method can go in the same class, or in a separate generic class for a more generic implementation

/**
* t : entity class on which the search needs to be performed
* queryString : word or phrase to search for with full-text search
* pageNo, pageSize : pagination parameters (page numbers start at 1)
* criteria : Hibernate criteria obtained from the method above
*/
public List getSearchResults(Class t,String queryString,
   Integer pageNo,Integer pageSize, Criteria criteria){
  LOGGER.debug("Getting results for "+t.getName());
  if(null == queryString || queryString.isEmpty()){
   //No search text: fall back to a plain criteria query (overload not shown here)
   return getSearchResults(criteria, pageNo, pageSize);
  }
  Session session = getSession();//Get the Hibernate session from the session factory
  String[] searchFields = new String[] {"organizationName","address","city","state","pin","description"};
  FullTextSession fullTextSession = Search.getFullTextSession(session);
  org.apache.lucene.search.Query query = buildQuery(queryString, searchFields, fullTextSession, t);
  FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query,t);
  if(null != criteria){
   fullTextQuery.setCriteriaQuery(criteria);
  }
  //Paginate
  if(pageNo == null || pageNo == 0){
   pageNo = 1;
  }
  if(pageSize == null || pageSize == 0){
   pageSize = 5;
  }
  Integer rec = pageNo*pageSize - (pageSize -1);//1-based number of the first record on the requested page
  fullTextQuery.setFirstResult(rec-1);//setFirstResult is zero-based
  fullTextQuery.setMaxResults(pageSize);//Return at most this many records
  LOGGER.debug("search query is "+query.toString());
  if(null != criteria){
   LOGGER.debug("criteria is "+ criteria.toString());
  }
  LOGGER.debug("records number "+rec+" pagesize is "+pageSize);
  return fullTextQuery.list();
  
 }
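
The page arithmetic above is easy to get wrong, so here is a small stand-alone sketch of it. The class PageWindow and its method names are illustrative and not part of the original DAO. Treating negative values like zero is an added assumption; the method above only checks for null and 0.

```java
// Sketch of the page arithmetic; class and method names are illustrative.
final class PageWindow {
    static final int DEFAULT_PAGE_NO = 1;   // same default as the method above
    static final int DEFAULT_PAGE_SIZE = 5; // same default as the method above

    // Falls back to the default for null, zero and (an added assumption) negative values.
    static int normalizePageNo(Integer pageNo) {
        return (pageNo == null || pageNo <= 0) ? DEFAULT_PAGE_NO : pageNo;
    }

    static int normalizePageSize(Integer pageSize) {
        return (pageSize == null || pageSize <= 0) ? DEFAULT_PAGE_SIZE : pageSize;
    }

    // Zero-based offset of the first record on a 1-based page.
    // Equivalent to (pageNo*pageSize - (pageSize - 1)) - 1 in the method above.
    static int firstResult(Integer pageNo, Integer pageSize) {
        return (normalizePageNo(pageNo) - 1) * normalizePageSize(pageSize);
    }

    public static void main(String[] args) {
        System.out.println(firstResult(1, 5));
        System.out.println(firstResult(3, 10));
    }
}
```

One caveat worth checking against the Hibernate Search documentation for your version: as far as I understand, the offset and limit are applied to the Lucene hits, and the criteria restrictions are applied when the entities are loaded, so a page can come back with fewer than pageSize records.
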
//Builds a phrase query when the input contains a space, otherwise a keyword query
private org.apache.lucene.search.Query buildQuery(String queryString, String[] searchFields, FullTextSession fullTextSession, Class t){
  if(null != queryString && StringUtils.contains(queryString, ' ')){
   LOGGER.info("creating phrase query");
   //One phrase query per field, combined with OR (SHOULD)
   BooleanQuery bq = new BooleanQuery();
   for(String searchField : searchFields){
    PhraseQuery query = new PhraseQuery();
    StringTokenizer st = new StringTokenizer(queryString, " ");
    while(st.hasMoreTokens()){
     //Terms added directly to a PhraseQuery are not passed through the analyzer
     query.add(new Term(searchField, st.nextToken()));
    }
    bq.add(query, Occur.SHOULD);
   }
   return bq;
  }

  //Single word: the Hibernate Search query DSL analyzes it and matches it against all fields
  QueryBuilder qb = fullTextSession.getSearchFactory().buildQueryBuilder().forEntity(t).get();
  return qb.keyword().onFields(searchFields).matching(queryString).createQuery();
}
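
One thing to keep in mind with the phrase branch: terms handed directly to a PhraseQuery are not analyzed, while the indexed tokens have gone through the lower-case filter of the custom analyzer. The sketch below shows, in plain Java, how the input could be classified and its tokens lower-cased before building the phrase. The class QueryInputSketch is hypothetical, it ignores leading and trailing spaces (unlike the check above), and it does not reproduce the phonetic encoding step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper: classifies the search text and prepares phrase terms.
final class QueryInputSketch {

    // A space inside the trimmed text means "treat it as a phrase".
    static boolean isPhrase(String queryString) {
        return queryString != null && queryString.trim().indexOf(' ') >= 0;
    }

    // Splits on spaces and lower-cases each token, approximating the lower-case
    // filter step of the analyzer (phonetic encoding is not reproduced here).
    static List<String> phraseTerms(String queryString) {
        List<String> terms = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(queryString, " ");
        while (st.hasMoreTokens()) {
            terms.add(st.nextToken().toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        String input = "New  Delhi";
        if (isPhrase(input)) {
            System.out.println(phraseTerms(input));
        }
    }
}
```

Each returned term would then be wrapped in a Term for the field being searched, exactly as in the loop above.
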
Using the approach above, you should be able to combine Hibernate criteria with Lucene's full-text search to implement a state-of-the-art search in your application.