James Robinson | fa68617 | 2015-02-23 18:32:32 -0800 | [diff] [blame^] | 1 | |
| 2 | 1. OVERVIEW |
| 3 | |
| 4 | This README file describes the syntax of the arguments that may be passed to |
| 5 | the FTS3 MATCH operator used for full-text queries. For example, if table |
| 6 | "t1" is an Fts3 virtual table, the following SQL query: |
| 7 | |
| 8 | SELECT * FROM t1 WHERE <col> MATCH <full-text query> |
| 9 | |
| 10 | may be used to retrieve all rows that match a specified for full-text query. |
| 11 | The text "<col>" should be replaced by either the name of the fts3 table |
| 12 | (in this case "t1"), or by the name of one of the columns of the fts3 |
| 13 | table. <full-text-query> should be replaced by an SQL expression that |
| 14 | computes to a string containing an Fts3 query. |
| 15 | |
| 16 | If the left-hand-side of the MATCH operator is set to the name of the |
| 17 | fts3 table, then by default the query may be matched against any column |
| 18 | of the table. If it is set to a column name, then by default the query |
| 19 | may only match the specified column. In both cases this may be overriden |
| 20 | as part of the query text (see sections 2 and 3 below). |
| 21 | |
| 22 | As of SQLite version 3.6.8, Fts3 supports two slightly different query |
| 23 | formats; the standard syntax, which is used by default, and the enhanced |
| 24 | query syntax which can be selected by compiling with the pre-processor |
| 25 | symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined. |
| 26 | |
| 27 | -DSQLITE_ENABLE_FTS3_PARENTHESIS |
| 28 | |
| 29 | 2. STANDARD QUERY SYNTAX |
| 30 | |
| 31 | When using the standard Fts3 query syntax, a query usually consists of a |
| 32 | list of terms (words) separated by white-space characters. To match a |
| 33 | query, a row (or column) of an Fts3 table must contain each of the specified |
| 34 | terms. For example, the following query: |
| 35 | |
| 36 | <col> MATCH 'hello world' |
| 37 | |
| 38 | matches rows (or columns, if <col> is the name of a column name) that |
| 39 | contain at least one instance of the token "hello", and at least one |
| 40 | instance of the token "world". Tokens may be grouped into phrases using |
| 41 | quotation marks. In this case, a matching row or column must contain each |
| 42 | of the tokens in the phrase in the order specified, with no intervening |
| 43 | tokens. For example, the query: |
| 44 | |
| 45 | <col> MATCH '"hello world" joe" |
| 46 | |
| 47 | matches the first of the following two documents, but not the second or |
| 48 | third: |
| 49 | |
| 50 | "'Hello world', said Joe." |
| 51 | "One should always greet the world with a cheery hello, thought Joe." |
| 52 | "How many hello world programs could their be?" |
| 53 | |
| 54 | As well as grouping tokens together by phrase, the binary NEAR operator |
| 55 | may be used to search for rows that contain two or more specified tokens |
| 56 | or phrases within a specified proximity of each other. The NEAR operator |
| 57 | must always be specified in upper case. The word "near" in lower or mixed |
| 58 | case is treated as an ordinary token. For example, the following query: |
| 59 | |
| 60 | <col> MATCH 'engineering NEAR consultancy' |
| 61 | |
| 62 | matches rows that contain both the "engineering" and "consultancy" tokens |
| 63 | in the same column with not more than 10 other words between them. It does |
| 64 | not matter which of the two terms occurs first in the document, only that |
| 65 | they be seperated by only 10 tokens or less. The user may also specify |
| 66 | a different required proximity by adding "/N" immediately after the NEAR |
| 67 | operator, where N is an integer. For example: |
| 68 | |
| 69 | <col> MATCH 'engineering NEAR/5 consultancy' |
| 70 | |
| 71 | searches for a row containing an instance of each specified token seperated |
| 72 | by not more than 5 other tokens. More than one NEAR operator can be used |
| 73 | in as sequence. For example this query: |
| 74 | |
| 75 | <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy' |
| 76 | |
| 77 | searches for a row that contains an instance of the token "reliable" |
| 78 | seperated by not more than two tokens from an instance of "engineering", |
| 79 | which is in turn separated by not more than 5 other tokens from an |
| 80 | instance of the term "consultancy". Phrases enclosed in quotes may |
| 81 | also be used as arguments to the NEAR operator. |
| 82 | |
| 83 | Similar to the NEAR operator, one or more tokens or phrases may be |
| 84 | separated by OR operators. In this case, only one of the specified tokens |
| 85 | or phrases must appear in the document. For example, the query: |
| 86 | |
| 87 | <col> MATCH 'hello OR world' |
| 88 | |
| 89 | matches rows that contain either the term "hello", or the term "world", |
| 90 | or both. Note that unlike in many programming languages, the OR operator |
| 91 | has a higher precedence than the AND operators implied between white-space |
| 92 | separated tokens. The following query matches documents that contain the |
| 93 | term 'sqlite' and at least one of the terms 'fantastic' or 'impressive', |
| 94 | not those that contain both 'sqlite' and 'fantastic' or 'impressive': |
| 95 | |
| 96 | <col> MATCH 'sqlite fantastic OR impressive' |
| 97 | |
| 98 | Any token that is part of an Fts3 query expression, whether or not it is |
| 99 | part of a phrase enclosed in quotes, may have a '*' character appended to |
| 100 | it. In this case, the token matches all terms that begin with the characters |
| 101 | of the token, not just those that exactly match it. For example, the |
| 102 | following query: |
| 103 | |
| 104 | <col> MATCH 'sql*' |
| 105 | |
| 106 | matches all rows that contain the term "SQLite", as well as those that |
| 107 | contain "SQL". |
| 108 | |
| 109 | A token that is not part of a quoted phrase may be preceded by a '-' |
| 110 | character, which indicates that matching rows must not contain the |
| 111 | specified term. For example, the following: |
| 112 | |
| 113 | <col> MATCH '"database engine" -sqlite' |
| 114 | |
| 115 | matches rows that contain the phrase "database engine" but do not contain |
| 116 | the term "sqlite". If the '-' character occurs inside a quoted phrase, |
| 117 | it is ignored. It is possible to use both the '-' prefix and the '*' postfix |
| 118 | on a single term. At this time, all Fts3 queries must contain at least |
| 119 | one term or phrase that is not preceded by the '-' prefix. |
| 120 | |
| 121 | Regardless of whether or not a table name or column name is used on the |
| 122 | left hand side of the MATCH operator, a specific column of the fts3 table |
| 123 | may be associated with each token in a query by preceding a token with |
| 124 | a column name followed by a ':' character. For example, regardless of what |
| 125 | is specified for <col>, the following query requires that column "col1" |
| 126 | of the table contains the term "hello", and that column "col2" of the |
| 127 | table contains the term "world". If the table does not contain columns |
| 128 | named "col1" and "col2", then an error is returned and the query is |
| 129 | not run. |
| 130 | |
| 131 | <col> MATCH 'col1:hello col2:world' |
| 132 | |
| 133 | It is not possible to associate a specific table column with a quoted |
| 134 | phrase or a term preceded by a '-' operator. A '*' character may be |
| 135 | appended to a term associated with a specific column for prefix matching. |
| 136 | |
| 137 | 3. ENHANCED QUERY SYNTAX |
| 138 | |
| 139 | The enhanced query syntax is quite similar to the standard query syntax, |
| 140 | with the following four differences: |
| 141 | |
| 142 | 1) Parenthesis are supported. When using the enhanced query syntax, |
| 143 | parenthesis may be used to overcome the built-in precedence of the |
| 144 | supplied binary operators. For example, the following query: |
| 145 | |
| 146 | <col> MATCH '(hello world) OR (simple example)' |
| 147 | |
| 148 | matches documents that contain both "hello" and "world", and documents |
| 149 | that contain both "simple" and "example". It is not possible to forumlate |
| 150 | such a query using the standard syntax. |
| 151 | |
| 152 | 2) Instead of separating tokens and phrases by whitespace, an AND operator |
| 153 | may be explicitly specified. This does not change query processing at |
| 154 | all, but may be used to improve readability. For example, the following |
| 155 | query is handled identically to the one above: |
| 156 | |
| 157 | <col> MATCH '(hello AND world) OR (simple AND example)' |
| 158 | |
| 159 | As with the OR and NEAR operators, the AND operator must be specified |
| 160 | in upper case. The word "and" specified in lower or mixed case is |
| 161 | handled as a regular token. |
| 162 | |
| 163 | 3) The '-' token prefix is not supported. Instead, a new binary operator, |
| 164 | NOT, is included. The NOT operator requires that the query specified |
| 165 | as its left-hand operator matches, but that the query specified as the |
| 166 | right-hand operator does not. For example, to query for all rows that |
| 167 | contain the term "example" but not the term "simple", the following |
| 168 | query could be used: |
| 169 | |
| 170 | <col> MATCH 'example NOT simple' |
| 171 | |
| 172 | As for all other operators, the NOT operator must be specified in |
| 173 | upper case. Otherwise it will be treated as a regular token. |
| 174 | |
| 175 | 4) Unlike in the standard syntax, where the OR operator has a higher |
| 176 | precedence than the implicit AND operator, when using the enhanced |
| 177 | syntax implicit and explict AND operators have a higher precedence |
| 178 | than OR operators. Using the enhanced syntax, the following two |
| 179 | queries are equivalent: |
| 180 | |
| 181 | <col> MATCH 'sqlite fantastic OR impressive' |
| 182 | <col> MATCH '(sqlite AND fantastic) OR impressive' |
| 183 | |
| 184 | however, when using the standard syntax, the query: |
| 185 | |
| 186 | <col> MATCH 'sqlite fantastic OR impressive' |
| 187 | |
| 188 | is equivalent to the enhanced syntax query: |
| 189 | |
| 190 | <col> MATCH 'sqlite AND (fantastic OR impressive)' |
| 191 | |
| 192 | The precedence of all enhanced syntax operators, in order from highest |
| 193 | to lowest, is: |
| 194 | |
| 195 | NEAR (highest precedence, tightest grouping) |
| 196 | NOT |
| 197 | AND |
| 198 | OR (lowest precedence, loosest grouping) |
| 199 | |
| 200 | Using the advanced syntax, it is possible to specify expressions enclosed |
| 201 | in parenthesis as operands to the NOT, AND and OR operators. However both |
| 202 | the left and right hand side operands of NEAR operators must be either |
| 203 | tokens or phrases. Attempting the following query will return an error: |
| 204 | |
| 205 | <col> MATCH 'sqlite NEAR (fantastic OR impressive)' |
| 206 | |
| 207 | Queries of this form must be re-written as: |
| 208 | |
| 209 | <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive' |