Friday, November 25, 2011

Sqlite LIKE Query Slow problem and solution

http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

http://joshua.perina.com/africa/gambia/fajara/post/converting-to-sqlite-and-like-query-optimization

Summary
Avoid using LIKE query in Sqlite since it can't make use of an index!

5.5.1 Replace GLOB and LIKE when possible

The GLOB and LIKE operators are expensive in SQLite because they can't make use of an index. One reason is that these are implemented by user functions which can be overridden, so the parser has no way of knowing how they might behave in that case. This forces a full scan of the table for the column being matched against, even if that column has an index.

Example: The expression (x GLOB 'abc*') can be replaced by (x >= 'abc' AND x < 'abd').
Note: The LIKE operator is case sensitive. This means that the above won't work for LIKE unless the strings involved are all either upper- or lower-case.
If a string of arbitrary characters of fixed length is to be matched, this can be replaced by the SQL length() function.
Example: The expression (x GLOB '???') can be replaced by (length(x) == 3).
If a * or % wildcard occurs in the middle of a string, or a complex pattern appears at the end, we can still use the above techniques to create a filter test which is done before the more expensive pattern test.
Example: The expression (x GLOB 'abc*jkl*') can be replaced by (x > 'abc' AND x < 'abd' AND x GLOB 'abc*jkl*').
Example: The expression (x LIKE 'abc_') can be replaced by (x > 'abc' AND x < 'abd' AND length(x) == 4).



Now to query this table for all records in Egypt we use the query:
    SELECT * FROM Table WHERE Path LIKE 'geo-Africa-Egypt-%'
This has always been fine in our SqlExpress days as even when we have had to do complex queries against 10s of thousands of records the response time was always under 100ms.  So it was with shock and concern that after converting to SQLite I suddently found the same queries taking as much as 16 seconds to run on the same machine! 
Fortunately for me a man name Jim Lyon, way back in september of 2003, wrote a document which both explained and solved my dilemma.  His document can be found here  as is an invaluable document about optimizing your SQLite databases and queries.  The crux of the problem is this
 The GLOB and LIKE operators are expensive in SQLite because they can't make use of an index.
There is some reason for this but fortunately for me our LIKE query just has a wildcard at the end.  Jim explains all the details, but in the end it is possible to optimize this query to look like the following:
    SELECT * FROM Table WHERE Path >= 'geo-Africa-Egypt-' AND Path < 'geo-Africa-Egypt-zzz'
There MUST be an index on the Path column for this to work.  By adding the 'zzz' at the end the query will naturally select everything in the alphabetic sort order between the first and second query parameters. 
The end result was spectacular, my newly optimized queries were now running super fast, on both SQLite (~50ms) and even improved on SqlExpress (~60ms).  I have to give credit to Microsoft, as they must have some optimizations already built in for this type of query and its been making my life easy up until this point.    

No comments:

Post a Comment