Subqueries

    +
    In N1QL, a subquery is a SELECT query that is a constituent part of another N1QL query or subquery.

    Using subqueries, you can create multiple levels of nesting of queries. The outer levels of queries are called outer or parent queries, and inner level subqueries are called inner or child queries, or simply subqueries.

    Syntax

    subquery expr
    • A subquery in N1QL is limited to a SELECT query, whereas the outermost parent query can be any DML statement such as SELECT, INSERT, UPDATE, DELETE or MERGE.

    • Subquery must be enclosed in parenthesis.

    • When subquery is used in FROM clause, it must be aliased.

    • Subqueries are N1QL expressions and N1QL evaluates them similar to any other N1QL language expressions. When an expression is contained in a N1QL statement, the expression is evaluated once for every input document to the statement, and same applies to a subquery-expression - the inner SELECT or subquery is executed once for every input document considered in the outer query.

    • A subquery expression returns an array every time it is evaluated, and the array contains the results of the inner SELECT subquery.

    Subqueries in DML Statement Clauses

    A subquery can occur as part of various clauses of the parent query where generic N1QL expressions can be used. For instance, a SELECT statement can have a subquery in projection-list, WHERE clause, FROM clause, LET clause, and GROUP BY clause. Similarly, an UPDATE, INSERT, DELETE can have a subquery in the WHERE clause. For example, consider the following query which finds all the cities with landmarks that have airports.

    Q1
      SELECT t1.city FROM `travel-sample`.inventory.landmark t1 WHERE t1.city IN (SELECT RAW city FROM `travel-sample`.inventory.airport );  
    Results
    [
      {
        "city": "Abbeville"
      },
      {
        "city": "Avignon"
      }..
    ]

    In the above example, the subquery is used as a part of WHERE condition of the parent query and it returns an array of cities that have airports.

    Note that the IN or WITHIN predicate needs an array on right-side expression as the subquery evaluates to. Further, the usage of keyword RAW in the subquery projection ensures just the city attribute values in the result without JSON wrapping. The outer query browses through each of the landmark cities and returns the city name if it is in the array of cities returned by the inner subquery.

    When subquery is used in the FROM clause, that must be aliased, to be able to access and refer to the results documents of the subquery. In the following example Q2, the subquery is named as t1, and it returns the aggregated value of airport names by city and country. The outer query further processes the result of the subquery to find total number of airports by country where each city has more than 5 airports.

    Q2
    SELECT t1.country, array_agg(t1.city), sum(t1.city_cnt) as apnum
    FROM (SELECT city, city_cnt, array_agg(airportname) as apnames, country
          FROM `travel-sample`.inventory.airport
          GROUP BY city, country LETTING city_cnt = count(city) ) AS t1
    WHERE t1.city_cnt > 5
    GROUP BY t1.country;
    Results
    [
      {
        "$1": [
          "Paris"
        ],
        "apnum": 9,
        "country": "France"
      },
      {
        "$1": [
          "London"
        ],
        "apnum": 13,
        "country": "United Kingdom"
      },
      {
        "$1": [
          "Houston",
          "New York",
          "San Diego"
        ],
        "apnum": 22,
        "country": "United States"
      }
    ]

    For more examples, see the Examples section.

    Subqueries in N1QL Expressions

    Subqueries can appear as independent SELECT queries or can be combined as a constituent part of other N1QL language expressions.

    For example, in the above queries Q1 and Q2, the subquery is a full independent SELECT query. In the following example Q3, the subquery is part of a N1QL expression.

    Q3
    SELECT array_max((SELECT  count(city) as cnt
                      FROM `travel-sample`.inventory.airport
                      GROUP BY city)[*].cnt);
    Results
    [
      {
        "$1": 14
      }
    ]

    In this example, the subquery is wrapped in a parenthesis ( ) and it occurs as part of the expression [*].cnt which treats the result of the subquery as an array of JSON documents (with one attribute cnt) and extracts the cnt values from all the subquery result array elements. Finally, the array_max() function is applied to this to get the maximum number of airports in any of the cities.

    Variables in Scope of a Subquery

    The scope of subquery is defined as the set of names, identifiers, and variables that are accessible to the subquery. Variables include all keyspace names, aliases, LET or LETTING variables that can be referenced by the subquery. For a multilevel nested subquery, the scope includes scope of the subquery and all the scopes of its parent queries.

    A subquery cannot access variables from its sibling subqueries or its parents’ sibling subqueries. For example, in the following nested query structure, the subquery SQ13 can only access the variables from SQ13, SQ1, and Q0, but not from SQ12, or SQ2.

    Q0: SELECT ...
          (SELECT ..
    	     (SELECT ...) AS SQ12
    	     (SELECT ...) AS SQ13   ) AS SQ1
          (SELECT ...
    	     (SELECT ...) AS SQ21
    	     (SELECT ...) AS SQ22   ) AS SQ2

    Note that the keyspace name identifiers themselves are NOT variables, whereas any aliases of the keyspace name identifiers are variables in the scope. For example, in the above query Q1, the keyspace identifier `travel-sample`.inventory.landmark in the outer query is not considered a variable for the subquery. In fact, that is the reason the subquery can also use the same keyspace name independently and still be a non-correlated subquery. However, the alias t1 defined in the outer query is considered a variable in scope for the subquery.

    The following example Q4 is a correlated subquery that uses the variable t1 from the outer query:

    Q4: Find the landmarks that are named with corresponding city name
    SELECT t1.city, t1.name
    FROM `travel-sample`.inventory.landmark t1
    WHERE t1.city IN SPLIT((SELECT RAW t2.name
                            FROM t1 AS t2)[0]);
    Results
    [
      {
        "city": "Aberdour",
        "name": "Aberdour Castle"
      },
      {
        "city": "Aberdour",
        "name": "Aberdour Railway Station"
      },
      {
        "city": "Aberdulais",
        "name": "Aberdulais Falls and Tin Works"
      }...
    ]

    This above example Q4 uses the alias t1 from outer query in the FROM clause of subquery. The subquery aliases t1 to t2 to avoid conflict in same variable name t1 being present in both outer and inner queries scope. The reference to t2 in the subquery is actually referring to the same document from `travel-sample`.inventory.landmark that is being processed as t1 in the outer query. In other words, the same query can be simply rewritten (without using subqueries) as follows:

    Q4A
    SELECT t1.city, t1.name
    FROM `travel-sample`.inventory.landmark t1
    WHERE t1.city IN SPLIT(t1.name);
    Results
    [
      {
        "city": "Aberdour",
        "name": "Aberdour Castle"
      },
      {
        "city": "Aberdour",
        "name": "Aberdour Railway Station"
      },
      {
        "city": "Aberdulais",
        "name": "Aberdulais Falls and Tin Works"
      }...
    ]

    Typically, subqueries may refer to any variables and aliases available in the scope to build correlated subqueries and to perform subqueries specific to some context of outer query. See Correlated Subqueries for more details and Examples.

    FROM clause in Subqueries

    Keyspace Identifier versus Expression

    As described in the FROM clause, the from-term can be a keyspace name or identifier or a N1QL expression:

    • Keyspace identifiers are independent sources of data for a query.

    • N1QL expressions can be constructed using various N1QL language constructs including subqueries.

      • Constant expressions are independent sources of data for a query.

      • Variable expressions depend on variables in scope and are evaluated to resolve as input data for the query. These are applicable to subqueries.

      • This applies irrespective of whether it is a simple identifier such as alias, var or a nested path identifier such as keyspace.subdoc1.subdoc2.field, alias.subdoc.field, or var.subdoc.field.

    An expression can be a simple identifier or variable such as alias, var or more complex with various N1QL language constructs. Either way, N1QL evaluates from-term to resolve to keyspace identifiers or expressions as follows:

    • A from-term is considered as an expression if it is not a keyspace name identifier.

    • If simple identifier can be considered as identifier or expression depending on various factors. An identifier var is considered as an expression if it is variable in scope defined through LET or LETTING, or explicit keyspace alias in parent queries.

    In the following equivalent queries, explicit alias t of `travel-sample`.inventory.airport or LET variable x is treated as an expression and hence a nested path like t.geo.lat is allowed in the subquery FROM clause.

    Q5A
    SELECT count(*) FROM `travel-sample`.inventory.airport t
    WHERE (SELECT RAW t.geo.alt FROM t t1)[0] > 6000 ;
    Q5B
    SELECT count(*) FROM `travel-sample`.inventory.airport t
    WHERE (SELECT RAW alt FROM t.geo.alt)[0] > 6000;
    Q5C
    SELECT count(*) FROM `travel-sample` .inventory.airport t
    LET x = t.geo
    WHERE (SELECT RAW y.alt FROM x y)[0] > 6000;
    Q5D
    SELECT count(*) FROM `travel-sample`.inventory.airport t
    WHERE (SELECT RAW geo.alt FROM t.geo)[0] > 6000;

    Implicit Alias

    When explicit alias is not defined, every identifier will have an implicit alias predefined with the same name as the identifier. Implicit alias of a nested path is defined as the last component in the path. In above example Q5D, the implicit alias of the nested path t.geo in subquery is geo, and in example Q5B the implicit alias of t.geo.alt is alt.

    For example, the following example Q6 has no explicit alias for `travel-sample`. So the `travel-sample`.inventory.airport used in the subquery FROM clause is considered a keyspace identifier, but not an expression. That makes the subquery non-correlated by the FROM clause, and the subquery returns all documents from keyspace `travel-sample`.inventory.airport.

    Q6
    SELECT array_length((SELECT RAW t1.geo.alt
                         FROM `travel-sample`.inventory.airport t1))
    FROM `travel-sample`.inventory.airport LIMIT 4;
    Results
    [
      {
        "$1": 1968
      },
      ...
    ]

    Contrast Q6 with Q6A below, the subquery is correlated by using the keyspace alias in the FROM clause. The result is only 1 because the subquery is applicable to only the current document t being processed in the parent query.

    Q6A
    SELECT array_length((SELECT RAW t1.geo.alt FROM t t1))
    FROM `travel-sample`.inventory.airport t;
    Results
    [
      {
        "$1": 1
      },
      ...
    ]

    Further, the subquery is required to alias its from-term to avoid conflict with the same identifier in both outer and inner queries. For example, the following example Q6B shows an error:

    Q6B
    SELECT array_length((SELECT RAW t1.geo.alt
                         FROM `travel-sample`.inventory.airport))
    FROM `travel-sample`.inventory.airport;
    Results
    [
      {
        "code": 4020,
        "msg": "Duplicate subquery alias airport",
        "query": "SELECT array_length((SELECT RAW t1.geo.alt\nFROM `travel-sample`.inventory.airport))\nFROM `travel-sample`.inventory.airport;"
      }
    ]

    An implicit keyspace alias is not considered as an expression, as in the above example Q6. However, the nested paths are expressions. In the following example Q6C, the FROM term has the nested path airport.geo as an expression, where airport refers to the implicit alias of the keyspace in the parent query. Hence this is a correlated subquery, and the result is 1 corresponds to the current document in the parent query.

    Q6C
    SELECT array_length((SELECT RAW t1.alt
                         FROM airport.geo t1))
    FROM `travel-sample`.inventory.airport
    LIMIT 1;
    Results
    [
      {
        "$1": 1
      }
    ]

    If the expression does not resolve to any of the variables in scope for the subquery, then that is treated as keyspace identifier and subsequently if the keyspace is not found, an error is raised. For example, in the following query `travel-sample`.inventory.hotel is not defined in the scope and is treated as a new keyspace identifier.

    Q7
    SELECT * FROM `travel-sample`.inventory.landmark t1
    WHERE t1.city IN (SELECT RAW city
                      FROM `travel-sample`.inventory.hotel
                      LIMIT 3
                      );
    Results
    [
      {
        "t1": {
          "activity": "see",
          "address": "Prince Arthur Road, ME4 4UG",
          "alt": null,
          "city": "Gillingham",
          "content": "Adult - £6.99 for an Adult ticket that allows you to come back for further visits within a year (children's and concessionary tickets also available). Museum on military engineering and the history of the British Empire. A quite extensive collection that takes about half a day to see. Of most interest to fans of British and military history or civil engineering. The outside collection of tank mounted bridges etc can be seen for free. There is also an extensive series of themed special event weekends, admission to which is included in the cost of the annual ticket.",
          "country": "United Kingdom",
          "directions": null,
          "email": null,
          "geo": {
            "accuracy": "RANGE_INTERPOLATED",
            "lat": 51.39184,
            "lon": 0.53616
          },
          "hours": "Tues - Fri 9.00am to 5.00pm, Sat - Sun 11.30am - 5.00pm",
          "id": 10019,
          "image": null,
          "name": "Royal Engineers Museum",
          "phone": "+44 1634 822839",
          "price": null,
          "state": null,
          "title": "Gillingham (Kent)",
          "tollfree": null,
          "type": "landmark",
          "url": "http://www.remuseum.org.uk"
        }
      }...
    ]

    Nested Path Expressions in Subqueries

    As mentioned in the FROM clause, the from-term of both parent and subqueries allow nested path expressions over constants and subqueries. However, only subqueries allow variable expressions (including paths), that are referenced through any variables defined in scope of the subquery. This is very powerful for language expressibility, simplicity and flexibility to N1QL queries. Especially, when combined with subqueries, nested path expressions over variables extend full power of N1QL syntax to array attributes/sub-documents without losing the structure of the array elements in results, or requiring tricky processing (with UNNEST, GROUP BY, ORDER BY and so). See the examples below.

    The usage of nested paths over keyspace identifiers is NOT allowed in the from-terms, and it results in a syntax error. Nested paths are always considered expressions in N1QL.

    Examples

    Example 1.

    The following query is valid because the nested path in the subquery is based on the explicit alias variable k1 in scope.

    SELECT * FROM keyspace1 AS  k1
    WHERE (SELECT … FROM k1.subdoc1.subdoc2.field3 …);
    Example 2.

    The following query is invalid and raises an error because the nested path is in the outermost query:

    SELECT * FROM keyspace1.subdoc1.subdoc2.field3 … ;

    The subquery is based on the keyspace identifiers. Note that, the outer query has explicit alias defined, and hence the keyspace1/keyspace2 in subquery from-term are treated as identifiers.

    SELECT * FROM keyspace1 AS  k1
    WHERE (SELECT … FROM keyspace1.subdoc1.field3 …);
    
    SELECT * FROM keyspace1 AS  k1
    WHERE (SELECT … FROM keyspace2.subdoc2 …);
    Example 3.

    The following example shows usage of nested path over subquery expression.

    SELECT x.alt
    FROM (SELECT geo from `travel-sample`.inventory.airport
          )[*].geo AS x
    LIMIT 2;
    Results
    [
      {
        "alt": 12
      },
      {
        "alt": 295
      }
    ]
    Example 4.

    The following example shows usage of nested path over constant expression.

    SELECT x
    FROM [{"a" : 1, "b" : {"c" : 2}},
          {"a" : 3, "b" : {"d" : 4}}][*].b AS x
    LIMIT 2;
    Results
    [
      {
        "x": {
          "c": 2
        }
      },
      {
        "x": {
          "d": 4
        }
      }
    ]
    Example 5.

    The following two queries show valid and invalid examples with the airport collection. Note the nested paths used in the FROM clause of the subquery.

    Q8: Find airports that are at altitudes more than 4000ft
    SELECT t1.city, t1.geo.alt
    FROM `travel-sample`.inventory.airport t1
    WHERE (SELECT RAW t2.alt
          FROM airport.geo t2)[0] > 4000;
    Results
    [
      {
        "code": 3000,
        "msg": "Ambiguous reference to field travel-sample.",
        "query_from_user": "SELECT t1.city, t1.geo.alt\nFROM `travel-sample` t1\nWHERE t1.type = \"airport\" AND \n(SELECT RAW t2.alt \n FROM `travel-sample`.geo t2)[0] > 4000;"
      }
    ]
    Q8A: Q8 rewritten to use nested path using outer query variable t1
    SELECT t1.city, t1.geo.alt
    FROM `travel-sample`.inventory.airport t1
    WHERE (SELECT RAW t2.alt FROM t1.geo t2)[0] > 4000;
    Results
    [
      {
        "alt": 6537,
        "city": "Grants"
      },
      {
        "alt": 5045,
        "city": "Prescott"
      }, ...
    ]
    Example 6.

    The following query demonstrates the power of using nested path expressions in correlated subqueries over the array subdocument hotel.reviews. The query Q9 finds the top 10 hotels and number of reviewers which have Overall rating at least 4 and rated by minimum 6 people.

    Q9
    SELECT name, cnt_reviewers
    FROM `travel-sample`.inventory.hotel AS t
    LET cnt_reviewers = (SELECT raw count(*)
                         FROM t.reviews AS s
                         WHERE s.ratings.Overall >= 4)[0]
    WHERE cnt_reviewers >= 6
    ORDER BY cnt_reviewers DESC
    LIMIT 10;
    Results
    [
      {
        "cnt_reviewers": 9,
        "name": "Holiday Inn London Kensington Forum"
      },
      {
        "cnt_reviewers": 9,
        "name": "Campanile"
      },
      {
        "cnt_reviewers": 9,
        "name": "Drop in Chalets"
      },
      {
        "cnt_reviewers": 9,
        "name": "Cadogan Hotel"
      },
      {
        "cnt_reviewers": 9,
        "name": "Negresco"
      },
      {
        "cnt_reviewers": 9,
        "name": "Suites at Fisherman's Wharf"
      },
      {
        "cnt_reviewers": 9,
        "name": "Wyndham Parc 55 Hotel"
      },
      {
        "cnt_reviewers": 8,
        "name": "Lochmaddy Hotel"
      },
      {
        "cnt_reviewers": 8,
        "name": "Kensington West"
      },
      {
        "cnt_reviewers": 8,
        "name": "Ibis Hotel Stratford"
      }
    ]
    Q9A
    SELECT name, cnt_reviewers
    FROM  `travel-sample`.inventory.hotel AS t
    LET cnt_reviewers = (SELECT raw count(*)
                         FROM `travel-sample`.inventory.hotel tmp
                         USE KEYS meta(t).id
                         UNNEST tmp.reviews s
                         WHERE s.ratings.Overall >= 4)[0]
    WHERE cnt_reviewers >= 6
    ORDER BY cnt_reviewers DESC
    LIMIT 10;

    The above query Q9A is an equivalent of query Q9 that does not use nested paths in the subquery FROM clause. Therefore, it requires a USE KEYS clause on the same document as in the outer query, that is meta(t).id where t refers to the outer query document. Nested paths cannot be used in the FROM clause and hence UNNEST is used on nested structures.