Using Indexes

Indexes can speedup both selection queries and join queries if they are applied properly. The following two sections describe scenarios that indexes are explored inside the system for query processing purpose.

Selection Queries

The query optimizer chooses to use a secondary index for query execution if both of the following conditions are met:

  • The query contains a conjunctive equality or range predicate over one or more fields, or a join predicate (see the next section). The conjunctive predicate has a form:

    QualifiedName Operator Literal ( AND field Operator Literal )+

    where Operator is =, >, >=, <, <=, or BETWEEN;

  • There is an index with a key, such that the corresponding fields in the predicate form a prefix of that key. For example, suppose that there is an index on dataset foo with key fields c_s and c_d.

    CREATE INDEX idx_s_d ON foo(c_s:STRING, c_d:DOUBLE);

The following query uses the index because it has an equality predicate (=) on a field (c_s) that is a prefix of the indexed key (c_s, c_d):

SELECT f.c_x as res
FROM foo f
WHERE f.c_s = 'world';

If you would like an available index to not be used for a particular query predicate (e.g., because there will be many, many matching objects), a skip-index hint can be used:

SELECT f.c_x as res
FROM foo f
WHERE f.c_s /*+ skip-index */ = 'world';

If multiple indexes are eligible access paths, there can be two cases:

  • Two or more indexes are sharing the same prefix and the predicate is on that prefix: For example: indexA is on (c1, c2), indexB is on (c1, c3) and the predicate is on c1 (c1 = 100). In this case, the query optimizer picks the first index in the order of their names, e.g., indexA;

  • No indexes share the same prefix and the predicate refers to fields from each individual index. For example: indexA in on (c1) and indexB is on (c2), the predicate is on c1 and c2 (c1 = 100 and c2 = 200). In this case, both indexes will be used to retrieve matched primary keys and then the key sets will be intersected to further filter retrieved primary keys.

Join Queries

SQL++ support joins from standard SQL in the following forms:

  • Inner join:

    SELECT * FROM ds_outer, ds_inner WHERE <predicate>;
    SELECT * FROM ds_outer JOIN ds_inner ON <predicate>;
  • Left outer join:

    SELECT * FROM ds_outer LEFT OUTER JOIN ds_inner ON <predicate>;

ds_outer is the outer branch and ds_inner is the inner branch (in the order in which they appear in the FROM clause).

The query optimizer picks an index for join evaluation if the following conditions are met:

  • The join predicate is an equality or range predicate that refers to fields from both branches of the join, in the form of:

    fn(expr_outer) /*+ indexnl */ op field_inner

    where op is <, <=, =, >=, >, or BETWEEN;

  • field_inner is a field from the inner dataset on which the index is defined;

  • The index join hint, /*+ indexnl */, is provided for the join predicate;

  • fn() is a function that returns the same data type as the type specified in the index for the field_inner. Usually, fn() is one of the following: to_string(), to_double(), or to_bigint().

Note that if the type of expr_outer is not known at compile time therefore the index join cannot be selected if the join predicate is just expr_outer /*+ indexnl */ op field_inner.

For example, suppose there are two shadow datasets: foo1 and foo2, with an index on foo2:

CREATE index idx_f2 ON foo2(c_s2:string);

Then the following query would use this index for join evaluation:

SELECT f1.c_x1 as c1, f2.c_x2 as c2
FROM foo1 AS f1, foo2 AS f2
WHERE to_string(f1.c_s1)  /*+ indexnl */ = f2.c_s2