My schedule still sucks, so I am jumping straight into the deep end of the Lucene pool with this article. At some point Real Soon Now, I’ll do a proper introduction to Lucene and Lucene.NET. Look for additional Lucene content this summer as I (hopefully) continue to work with this technology.
Faceted search/navigation is a hip, trendy feature that’s becoming more and more common in information retrieval systems (which is fancy talk for search engines). It lets users drill down through information along different dimensions. If you’ve used Amazon in the last decade, odds are you’re already familiar with faceted navigation: you use it every time you drill down into a product category or narrow your results by price.
Faceted search is not supported out of the box in Lucene.NET, but there are numerous posts and emails floating around the web about how to implement it for different scenarios. However, none of the examples I saw worked for what I needed: faceting across attributes that are hierarchical in nature. A date is one such attribute; it can be partitioned by decade, year, quarter, month, day and more. For my system, I didn’t want to hard-code the faceting to a single partition, because that prevents users from drilling down into the data. What I wanted was a system that would automatically determine how to partition the data based on the dates attached to the documents in the search results: if the documents spanned multiple years, they should be partitioned by year; if they all fell in the same year but spanned several months, they should be partitioned by month.
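The partitioning rule above can be sketched roughly as follows. This is a minimal Java sketch (Java rather than C#, but the logic carries over directly to Lucene.NET), not the class from this post; `Granularity` and `GranularityChooser` are hypothetical names, and it assumes the dates have already been read from the matching documents. It only compares the earliest and latest dates: if those two fall in the same year (or month), every date between them does too.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical partition levels, coarsest first.
enum Granularity { YEAR, MONTH, DAY }

final class GranularityChooser {
    private GranularityChooser() {}

    /** Picks the coarsest level at which the dates differ (DAY if they share a month). */
    static Granularity choose(List<LocalDate> dates) {
        if (dates.isEmpty()) {
            throw new IllegalArgumentException("need at least one date");
        }
        // Only the earliest and latest dates matter for this decision.
        LocalDate min = dates.get(0);
        LocalDate max = dates.get(0);
        for (LocalDate d : dates) {
            if (d.isBefore(min)) min = d;
            if (d.isAfter(max)) max = d;
        }
        if (min.getYear() != max.getYear()) {
            return Granularity.YEAR;   // results span multiple years
        }
        if (min.getMonth() != max.getMonth()) {
            return Granularity.MONTH;  // same year, multiple months
        }
        return Granularity.DAY;        // same month: fall through to days
    }
}
```

Adding a quarter or decade level would just mean another comparison between the year and month checks.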
The solution I came up with seems fast and works well. In today’s article, we’ll look at the class that performs the faceting. In a future post, I’ll show you how to use it to provide faceted navigation to users in an ASP.NET MVC application.
Enough talk, time for some code:
This somewhat simple class implements an accumulator that divides date information into buckets, first by year and then by month. The counts are maintained using this simple structure:
The structure is generic so that it can (potentially) support faceting over non-numeric values.
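A generic counting tree along these lines might look like the minimal Java sketch below. `CountNode` and `DateAccumulator` are hypothetical names, not the ones used in the original class. Each node holds a count plus children keyed by `T`, so the same node type works for integer year/month keys or for non-numeric keys such as category names.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical generic counting node: a count plus sorted children keyed by T.
final class CountNode<T extends Comparable<T>> {
    private int count;
    private final Map<T, CountNode<T>> children = new TreeMap<>();

    int count() { return count; }

    Map<T, CountNode<T>> children() { return children; }

    /** Counts one item at this node and at every node along the key path. */
    void add(List<T> path) {
        CountNode<T> node = this;
        node.count++;
        for (T key : path) {
            node = node.children.computeIfAbsent(key, k -> new CountNode<T>());
            node.count++;
        }
    }
}

// Hypothetical accumulator: buckets each date by year, then by month (1-12).
final class DateAccumulator {
    private final CountNode<Integer> root = new CountNode<>();

    void accumulate(LocalDate date) {
        root.add(List.of(date.getYear(), date.getMonthValue()));
    }

    CountNode<Integer> root() { return root; }
}
```

Keeping children in a `TreeMap` means buckets come out already in chronological order. The same node type also accepts string paths, for example `new CountNode<String>()` fed with `List.of("Books", "Fiction")`.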
When it comes time to build and return the facet, the class evaluates the tree structure of dates and determines the correct way to partition the data. The resulting facet contains query clauses that can be appended to the current query to provide drill-down functionality.
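That final step can be sketched in Java as below, again with hypothetical names (`FacetValue`, `DateFacetBuilder`). The sketch assumes the year-to-month counts are available as nested sorted maps and that the date field is indexed as fixed-width `yyyyMMdd` strings (the format Lucene’s `DateTools` uses at day resolution), so each bucket maps to an inclusive range clause in Lucene query syntax. The field name is a parameter; `date` in the examples is only a placeholder.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical facet entry: display label, document count, drill-down clause.
final class FacetValue {
    final String label;
    final int count;
    final String clause;

    FacetValue(String label, int count, String clause) {
        this.label = label;
        this.count = count;
        this.clause = clause;
    }
}

final class DateFacetBuilder {
    // Assumes the field is indexed as fixed-width yyyyMMdd strings.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    private DateFacetBuilder() {}

    /** counts maps year -> (month -> document count); returns year facets
     *  if more than one year is present, otherwise month facets. */
    static List<FacetValue> build(String field, SortedMap<Integer, SortedMap<Integer, Integer>> counts) {
        List<FacetValue> facets = new ArrayList<>();
        if (counts.size() > 1) {
            for (Map.Entry<Integer, SortedMap<Integer, Integer>> year : counts.entrySet()) {
                int y = year.getKey();
                int total = 0;
                for (int n : year.getValue().values()) {
                    total += n;
                }
                facets.add(new FacetValue(String.valueOf(y), total,
                        range(field, LocalDate.of(y, 1, 1), LocalDate.of(y, 12, 31))));
            }
        } else if (counts.size() == 1) {
            int y = counts.firstKey();
            for (Map.Entry<Integer, Integer> month : counts.get(y).entrySet()) {
                YearMonth ym = YearMonth.of(y, month.getKey());
                facets.add(new FacetValue(ym.toString(), month.getValue(),
                        range(field, ym.atDay(1), ym.atEndOfMonth())));
            }
        }
        return facets; // empty when there are no results
    }

    // Inclusive range clause in Lucene query syntax, e.g. date:[20090101 TO 20091231].
    private static String range(String field, LocalDate start, LocalDate end) {
        return field + ":[" + start.format(DAY) + " TO " + end.format(DAY) + "]";
    }
}
```

Appending a returned clause as a required term, for example `+date:[20090101 TO 20091231]`, restricts the current results to that bucket. Re-running the builder on the narrowed results then yields month facets, which gives the drill-down behavior. Fixed-width strings matter here because range queries compare terms lexicographically.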
That’s it for this post. Look for more Real Soon Now!