Smart Date Searching with Solr

June 26, 2017  |  6 minutes to read


tl;dr: I created a custom Solr filter that allows for natural date searching. Here’s the source.


Out of the box, Solr comes with some pretty powerful date-searching capabilities. For example, say you wanted to find records from the beginning of time until August 19, 1976. Easy:

my_date_field:[* TO 1976-08-20T00:00:00Z}

Or maybe you only want records from last week?

my_date_field:[NOW-7DAY/DAY TO NOW]

[Image: Two people on a date by the beach at sunset. Caption: A date. And some solar rays.]

But what if you wanted to find all records with a date in the month of March? Or on a Tuesday? Or every July 4th?

Solr’s default date-searching abilities can’t handle specific queries like this. Fortunately, there are a couple of ways around this.

Option 1: Break dates into more digestible chunks

There’s no reason you can’t create more fields in your Solr core that present duplicate data in a friendlier format. After all, denormalization is the whole point of Solr!

Say you have a field like this:

<!-- stores a person's date of birth (DOB) -->
<field name="dob" type="pdate" indexed="true" stored="true" />

By adding a new field for each “chunk” of data we want to search:

<!-- stores a person's date of birth (DOB) -->
<field name="dob" type="pdate" indexed="true" stored="true" />

<field name="dob_day" type="pint" indexed="true" stored="true" />
<field name="dob_month" type="pint" indexed="true" stored="true" />
<field name="dob_year" type="pint" indexed="true" stored="true" />
<field name="dob_day_of_week" type="string" indexed="true" stored="true" />

…you’ll end up with a Solr core that can answer some scarily specific questions:

Fetch all people born on Christmas, when Christmas fell on a Sunday:

dob_month:12 AND dob_day:25 AND dob_day_of_week:sunday

Find everyone born on a Tuesday in July during the 1970s:

dob:[1970-01-01T00:00:00Z TO 1980-01-01T00:00:00Z} AND
dob_day_of_week:tuesday AND
dob_month:7

With raw querying power like this, it’s important to remember: it’s not whether or not you should, it’s whether or not you can.

[Image: The Jurassic Park logo. Caption: I mean, it all worked out in the end, right?]

But it’s not perfect…

This solution has a couple of drawbacks:

  1. Users of your new Solr core will now have a lot of homework to do before they submit data to it. They’ll need to preprocess their dates, adding the additional chunks that you require. Essentially, you’ve offloaded some of the indexing work to your clients - and this logic will need to be rewritten for each client that submits data to your Solr core.
  2. Consumers of your Solr instance will need to be aware of which field to query. An intuitive query like dob:tuesday won’t work.
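
To make the first drawback concrete, here is a rough sketch of the preprocessing each client would have to repeat before submitting a document. It assumes the client starts with a yyyy-MM-dd string and that day names are indexed in lowercase English; the class and method names (DateChunks, chunk) are made up for illustration, and only the field names come from the schema above.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical client-side helper; not part of Solr or of any client library.
class DateChunks {

    // Splits a yyyy-MM-dd date into the extra fields defined in the schema above.
    static Map<String, Object> chunk(String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        Map<String, Object> fields = new LinkedHashMap<>();
        // Solr date fields expect a full UTC timestamp, so midnight UTC is assumed here.
        fields.put("dob", isoDate + "T00:00:00Z");
        fields.put("dob_day", date.getDayOfMonth());
        fields.put("dob_month", date.getMonthValue());
        fields.put("dob_year", date.getYear());
        // Lowercase day name ("sunday"), matching queries like dob_day_of_week:sunday.
        fields.put("dob_day_of_week", date.getDayOfWeek().name().toLowerCase(Locale.ROOT));
        return fields;
    }

    public static void main(String[] args) {
        // Christmas 2022 fell on a Sunday.
        System.out.println(chunk("2022-12-25"));
    }
}
```

Every client, in every language, would need its own version of this logic, and they would all have to agree on details like the casing of the day names.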

This brings us to option #2…

Option 2: Create a custom Solr filter

If you’re new to Solr, this suggestion may seem a bit extreme, but Solr actually has very robust customization support. I won’t say that it’s easy - there are a lot of moving pieces, and familiarity with Java development is required - but the process is sane once you’ve climbed the learning curve.

The upside of this solution is that it provides near-limitless flexibility. Custom filters allow you to intercept the indexing (or query analysis) process, giving you fine-grained control over how Solr breaks down your input into tokens.

Practically speaking, this means we can create a Solr filter that does all the preprocessing required in option #1 (breaking down dates into day, month, year, and day-of-the-week components) and includes this logic in Solr’s own indexing process!

Several months ago, I took the dive and created a custom Solr filter that does just that. Dates indexed using this filter can be searched using queries like dob:june or dob:06. Without further ado, here’s the custom filter’s source code.

Here’s the general idea:

  • When indexing a field that uses my custom filter (NfDateFilter), Solr passes the string representation of the date (like “2018-06-26”) to my filter.
  • The custom filter parses the date into a regular Java Date object.
  • Based on the date’s month, I add a number of commonly-used abbreviations (like “january”, “jan”, or “01”) to the list of tokens that users can use to pull up this record.
  • A few other nice-to-have tokens are conditionally added. For example, dates with years like “1998” are augmented with a “98” token. Single digit days - for example, the 3rd of any month - are expanded to include both “3” and “03”.
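
The steps above can be sketched in plain Java, leaving out the Lucene plumbing that a real token filter needs. This is not the NfDateFilter source, just an approximation of the expansion logic: the class name DateTokens, the token order, and the choice to give every year a two-digit form are assumptions of mine.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: approximates the token expansion described above.
class DateTokens {

    // Expands a yyyy-MM-dd string into searchable tokens.
    // LocalDate.parse throws DateTimeParseException for any other format.
    static List<String> expand(String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        // A set avoids duplicates, e.g. "03" as both month and day.
        Set<String> tokens = new LinkedHashSet<>();

        // Year: "1998" plus its two-digit form "98".
        String year = String.format(Locale.ROOT, "%04d", date.getYear());
        tokens.add(year);
        tokens.add(year.substring(2));

        // Month: full name, three-letter abbreviation, zero-padded number.
        String month = date.getMonth().name().toLowerCase(Locale.ROOT);
        tokens.add(month);
        tokens.add(month.substring(0, 3));
        tokens.add(String.format(Locale.ROOT, "%02d", date.getMonthValue()));

        // Day: zero-padded, plus the bare digit for single-digit days.
        int day = date.getDayOfMonth();
        tokens.add(String.format(Locale.ROOT, "%02d", day));
        if (day < 10) {
            tokens.add(Integer.toString(day));
        }
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        System.out.println(expand("2018-06-26"));
    }
}
```

In this sketch, expand("2018-06-26") yields the tokens 2018, 18, june, jun, 06 and 26. Because all tokens land in the same field, a query like dob:06 matches both dates in June and dates on the 6th of any month.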

To use this filter, I add a reference to my custom filter’s .jar file in my core’s solrconfig.xml:

<config>
  <lib path="${solr.install.dir:../../../..}/server/solr/cores/NfDateFilter.jar" />
</config>

… and define a Solr field in my core’s managed-schema that uses this filter at index time:

<fieldType name="text_date" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="io.nathanfriend.solr.NfDateFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Now any field that uses the text_date type can be searched in a natural, human way:

my_date_field:jan

[Image: The Jurassic Park logo. Caption: You can purchase solar filters like these in the NfDateFilter merchandise store!]

Some caveats

  • This filter requires that clients send their dates to Solr in a very specific string format: yyyy-MM-dd. If the filter encounters a date string that deviates from this format, it throws an exception.
  • In its current form, this filter only allows for English month abbreviations.
  • This filter doesn’t allow for searching against the day of the week (e.g. “Tuesday” or “Friday”), although this would be trivial to add.
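
As a rough idea of what the day-of-the-week addition might involve, the sketch below derives weekday tokens from the same yyyy-MM-dd input. The class name WeekdayTokens and the choice of tokens (full name plus a three-letter abbreviation) are hypothetical; the existing filter does not do this.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;

// Hypothetical extension; not part of the current NfDateFilter.
class WeekdayTokens {

    // Returns the lowercase English day name and its three-letter abbreviation.
    static List<String> forDate(String isoDate) {
        DayOfWeek dayOfWeek = LocalDate.parse(isoDate).getDayOfWeek();
        String full = dayOfWeek.name().toLowerCase(Locale.ROOT);
        return List.of(full, full.substring(0, 3));
    }

    public static void main(String[] args) {
        System.out.println(forDate("2017-06-27"));
    }
}
```

These tokens would be added alongside the month and day tokens at index time, so that a query like dob:tuesday could match.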

Some related links:

  • This very helpful pair of articles that describe the process of creating a custom Solr filter from scratch: Part 1 and Part 2
  • This Stack Overflow question & answer, which helped me understand the difference between the DateRangeField type and the non-range date types (TrieDateField and DatePointField), as well as how to use curly brackets ({ and } ) in date range queries
  • Working with Dates, Solr’s own guide on date indexing and searching
