My schedule still sucks, so I am immediately jumping in to the deep end of the Lucene pool with this article. At some point Real Soon Now, I’ll do a proper introduction to Lucene and Lucene.NET.  Look for additional Lucene content this summer as I (hopefully) continue to work with this technology.

Faceted search/navigation is a hip, trendy feature that’s becoming more and more common in information retrieval systems (which is fancy talk for search engines).  It enables users to drill down through information along different dimensions.  If you’ve used Amazon in the last decade, odds are that you’re actually already familiar with faceted navigation.  You use it every time you drill down into a product category or narrow your results by price.

Faceted search is not supported out of the box with Lucene.NET, but there are numerous posts and E-mails floating around the web about how to implement it for different scenarios.  However, none of the examples I saw worked for what I needed: I needed faceting across attributes that were hierarchical in nature.  An example of such an attribute is a date.  A date can be partitioned by decade, year, quarter, month, day and more.  For my system, I didn’t want to hard-code the faceting by a single partition, because that prevents users from drill down into the data.  What I wanted was a system that would automatically determine how to partition the data based on the dates attached to the documents in the search results: if the documents spanned multiple years, then they should be partitioned by year; if they occur in the same year but span months, then partition by month. 

The solution I came up with appears fast and works well.  In today’s article, we’ll look at the class that actually performs the faceting.  In a future post, I’ll show you how it is used to provide faceted navigation to users in an ASP.NET MVC application.

Enough talk, time for some code:

   1: /// <summary>
   2: /// Simple faceter that accumulates DateTime objects.
   3: /// </summary>
   4: public class DateFaceter
   5: {
   6:     #region Private Fields
   7:  
   8:     /// <summary>
   9:     /// Used to extract dates from documents.  Exposing this as an action
  10:     /// allows a single type of DateFaceter to work for multiple types of 
  11:     /// dates.
  12:     /// </summary>
  13:     private Func<InspireDocument, DateTime?> mDateSelector;
  14:  
  15:     /// <summary>
  16:     /// The name of the facet.
  17:     /// </summary>
  18:     private string mName;
  19:  
  20:     /// <summary>
  21:     /// This is used to partition dates first by year and then
  22:     /// by month.
  23:     /// </summary>
  24:     private HierarchicalPartition<int> mDatePartition;
  25:  
  26:     /// <summary>
  27:     /// The name of the Lucene field being faceted. 
  28:     /// </summary>
  29:     private string mFieldName;
  30:  
  31:     #endregion
  32:  
  33:     #region Constructors
  34:  
  35:  
  36:     /// <summary>
  37:     /// Creates a faceter that will use the specified selector to 
  38:     /// get date information from documents.
  39:     /// </summary>
  40:     /// <param name="name">The user-friendly label for the facet.</param>
  41:     /// <param name="fieldName">The name of the Lucene field.</param>
  42:     /// <param name="dateSelector">A function that returns the date from the document.</param>
  43:     public DateFaceter(string name, string fieldName, Func<InspireDocument, DateTime?> dateSelector)
  44:     {
  45:         mName = name;
  46:         mDateSelector = dateSelector;
  47:         mFieldName = fieldName;
  48:  
  49:         mDatePartition = new HierarchicalPartition<int>();
  50:     }
  51:  
  52:     #endregion
  53:  
  54:     #region Private Methods
  55:  
  56:     /// <summary>
  57:     /// Creates a new facet based on the year counts.
  58:     /// </summary>
  59:     /// <returns></returns>
  60:     private Facet GetFacetByYear()
  61:     {
  62:         //Create the facets
  63:         FacetDimension[] dimensions = new FacetDimension[mDatePartition.Keys.Length];
  64:         //This is used to sort the facets since sorting by the name won't give
  65:         //the intended order for months.
  66:         int[] sortKeys = new int[mDatePartition.Keys.Length];
  67:  
  68:         for (int i = 0; i < mDatePartition.Keys.Length; i++)
  69:         {
  70:             int year = mDatePartition.Keys[i];
  71:             
  72:             sortKeys[i] = year;
  73:  
  74:             //Create the criteria filter.
  75:             string criteria = string.Format("{0}:[{1} TO {2}]", mFieldName,
  76:                                 new DateTime(year, 1, 1).ToString(LuceneHelper.DateFormat),
  77:                                 new DateTime(year, 12, 31).ToString(LuceneHelper.DateFormat));
  78:  
  79:             dimensions[i] = new FacetDimension(sortKeys[i].ToString(), 
  80:                                 mDatePartition.Get(mDatePartition.Keys[i]).Count,
  81:                                 criteria);
  82:         }
  83:  
  84:         Array.Sort(sortKeys, dimensions);
  85:  
  86:         return new Facet(mName, dimensions);
  87:     }
  88:  
  89:     /// <summary>
  90:     /// Creates a new facet based on the months counts for the only year.
  91:     /// </summary>
  92:     /// <returns></returns>
  93:     private Facet GetFacetByMonth()
  94:     {
  95:         int year = mDatePartition.Keys[0];
  96:  
  97:         //Get the year's partition.
  98:         HierarchicalPartition<int> yearPartition = mDatePartition.Get(year);
  99:  
 100:         //If it doesn't have at least two entries, then we won't even try to create a facet.
 101:         if (yearPartition.Keys.Length < 2)
 102:         {
 103:             return null;
 104:         }
 105:  
 106:         //Create the facets
 107:         FacetDimension[] dimensions = new FacetDimension[yearPartition.Keys.Length];
 108:         //This is used to sort the facets since sorting by the name won't give
 109:         //the intended order for months.
 110:         int[] sortKeys = new int[yearPartition.Keys.Length];
 111:  
 112:         for (int i = 0; i < yearPartition.Keys.Length; i++)
 113:         {
 114:             int month = yearPartition.Keys[i];
 115:  
 116:             sortKeys[i] = month;
 117:  
 118:             string monthName = CultureInfo.CurrentCulture.DateTimeFormat.GetMonthName(month);
 119:  
 120:             //Build the filter criteria
 121:             string criteria = string.Format("{0}:[{1} TO {2}]", mFieldName,
 122:                                 new DateTime(year, month, 1).ToString(LuceneHelper.DateFormat),
 123:                                 //This is a handy trick to get the last day of the target month:
 124:                                 //from the first day of the target month, add one month, then
 125:                                 //subtract one day, which will then be the last day of the
 126:                                 //target month.
 127:                                 new DateTime(year, month, 1).AddMonths(1).AddDays(-1).ToString(LuceneHelper.DateFormat));
 128:  
 129:             //This creates a partition that has the name of the month instead of the month's numeric value.
 130:             dimensions[i] = new FacetDimension(monthName,
 131:                 yearPartition.Get(month).Count, criteria);
 132:         }
 133:  
 134:         Array.Sort(sortKeys, dimensions);
 135:  
 136:         //This creates a facet named like "Start Date - YYYY"
 137:         return new Facet(mName + " - " + year, dimensions);
 138:     }
 139:  
 140:     #endregion 
 141:  
 142:     /// <summary>
 143:     /// Accumulates facet counts from the specified document's data.
 144:     /// </summary>
 145:     /// <param name="document"></param>
 146:     public void Accumulate(InspireDocument document)
 147:     {
 148:         //if the document has a date, process it.
 149:         DateTime? date = mDateSelector(document);
 150:  
 151:         if (date != null)
 152:         {
 153:             //Partition the date based on the year.
 154:             HierarchicalPartition<int> yearPartition = mDatePartition.Get(date.Value.Year);
 155:             yearPartition.Count++;
 156:  
 157:             //Further partition by month.
 158:             HierarchicalPartition<int> monthPartition = yearPartition.Get(date.Value.Month);
 159:             monthPartition.Count++;
 160:         }
 161:     }
 162:  
 163:     /// <summary>
 164:     /// Creates a new facet based on the data observed by Accumulate.
 165:     /// </summary>
 166:     /// <returns></returns>
 167:     public Facet GetFacet()
 168:     {
 169:         int[] keys = mDatePartition.Keys;
 170:  
 171:         //If there is nothing to facet, return null
 172:         if (keys.Length == 0)
 173:         {
 174:             return null;
 175:         }
 176:         //If there is only one year, attempt to facet by month..
 177:         else if (keys.Length == 1)
 178:         {
 179:             return GetFacetByMonth();
 180:         }
 181:         //Otherwise, there are multiple years, so facet them.
 182:         else
 183:         {
 184:             return GetFacetByYear();
 185:         }
 186:     }
 187: }

This somewhat-simple class implements an accumulator that divides date information into buckets first by year, then by month.  The counts are maintained using this simple structure:

   1: /// <summary>
   2: /// A hierarchy of partitions.
   3: /// </summary>
   4: /// <typeparam name="TKey">The type of the key to the partition.</typeparam>
   5: /// <remarks>
   6: /// This class is not thread-safe. 
   7: /// </remarks>
   8: public class HierarchicalPartition<TKey>
   9: {
  10:     #region Private Fields
  11:  
  12:     /// <summary>
  13:     /// Stores the sub-partitions.
  14:     /// </summary>
  15:     private IDictionary<TKey,HierarchicalPartition<TKey>> mPartitions;
  16:  
  17:     #endregion
  18:  
  19:     #region Public Properties
  20:  
  21:     /// <summary>
  22:     /// The count for the partition.
  23:     /// </summary>
  24:     public int Count { get; set; }
  25:  
  26:     /// <summary>
  27:     /// The keys that are defined in this partition.
  28:     /// </summary>
  29:     public TKey[] Keys
  30:     {
  31:         get
  32:         {
  33:             return mPartitions.Keys.ToArray();
  34:         }
  35:     }
  36:  
  37:     #endregion
  38:  
  39:     #region Public Constructors
  40:  
  41:     /// <summary>
  42:     /// Creates a new empty partition.
  43:     /// </summary>
  44:     public HierarchicalPartition()
  45:     {
  46:         mPartitions = new Dictionary<TKey, HierarchicalPartition<TKey>>();
  47:     }
  48:  
  49:     #endregion
  50:  
  51:     #region Public Methods
  52:  
  53:     /// <summary>
  54:     /// Gets the sub-partition with the specified key.  If it doesn't exist,
  55:     /// it is created.
  56:     /// </summary>
  57:     /// <param name="value">The sub-partition's key.</param>
  58:     /// <returns></returns>
  59:     public HierarchicalPartition<TKey> Get(TKey value)
  60:     {
  61:         if (!mPartitions.ContainsKey(value))
  62:         {
  63:             HierarchicalPartition<TKey> partition = new HierarchicalPartition<TKey>();
  64:             mPartitions.Add(value, partition);
  65:         }
  66:  
  67:         return mPartitions[value];
  68:     }
  69:  
  70:     #endregion
  71: }

The structure is generic so that it can (potentially) support faceting over non-numeric values.

When it comes time to build and return the facet, it evaluates the tree structure of dates and determines the correct way to partition the data.  The resulting facet contains query clauses that can be appended to the current query to  provide drill-down like functionality.

That’s it for this post, look for me Real Soon Now!