A newer version of this documentation is available.

View Latest

Querying with N1QL

You can use the N1QL query language to search and query documents from your applications.

Prerequisites for querying

To use N1QL queries, your cluster must have at least one node with the query and index services enabled.

Executing queries

Couchbase queries may be issued by one of the following mechanisms:

  • Couchbase SDK.

  • cbq shell.

  • cbc command-line (via cbc n1ql).

  • Query REST API.

  • Query workbench [Developer Preview]

As a N1QL query is essentially a string parsed by the query server, and as the result for each query is essentially JSON, queries will function the same regardless of whether they are executed using the cbq shell, an SDK, or using the REST API directly. Nevertheless, the result format received using an SDK may be a bit different than that received using cbq or the REST API.

To demonstrate the query language and functionality, we will be querying the travel-sample bucket which can be installed during installation. If you didn’t install it during install-time, you can install it using the Couchbase Web Console. See Install Sample Buckets for details.

When issuing N1QL queries using an SDK, ensure the SDK is connected to the bucket you are issuing queries against.

Running cbq

cbq is an interactive shell for executing N1QL queries. It is shipped with Couchbase Server.

It is not in the system-wide path by default. To run it, invoke the full path to cbq. Copy/paste the following into your terminal:

  • Linux

    /opt/couchbase/bin/
  • Mac OS X

    "/Applications/Couchbase Server.app/Contents/Resources/couchbase-core/bin/cbq"
  • Microsoft Windows

    C:\Program Files\Couchbase\Server\bin\cbq.exe

By default, cbq tries to connect to the query service running locally on the current node. If no query service is running on the current node, a connection refused error will be returned whenever you attempt to execute a query.

If you are running cbq remotely from a node, you can specify the --engine parameter and point the cbq shell to connect to a node that has the query service. For example:

cbq "--engine=http://URL_to_node_with_query_service:8093

Querying

An index is required to satisfy the query. Everything falls back to the primary index if no other indexes satisfy the query.
The query service communicates with the indexing service and requires that the indexing service be active for a given bucket. In order to eliminate unnecessary indexing activity on buckets that are not queried via the query service, buckets are not indexed by default.

Queries are issued in the SDK using the query or n1qlQuery API (in some SDKs this is just called query). This API accepts a string which is the statement to execute, and returns the rows (or an iterable object which returns rows) representing the query’s result. Here is the above query, executed in Python:

>>> for row in cb.n1ql_query('SELECT type FROM `travel-sample` LIMIT 4'):
...   print row
...
{u'type': u'airline'}
{u'type': u'airline'}
{u'type': u'airline'}
{u'type': u'airport'}

The above query using cbq:

cbq> SELECT type FROM `travel-sample` LIMIT 4;
{
    "requestID": "3693c61a-5126-4e28-a7a4-9b3a7f4ddbf3",
    "signature": {
        "type": "json"
    },
    "results": [
        {
            "type": "airport"
        },
        {
            "type": "airport"
        },
        {
            "type": "airport"
        },
        {
            "type": "airport"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "436.801261ms",
        "executionTime": "436.607057ms",
        "resultCount": 4,
        "resultSize": 164
    }
}

The result is a JSON object. The important field within the object is the results property, which contains an array of results (a result set). Each item within the result set may be thought of as a "row". In this case, since the entire document is selected, each result contains the entire content of the document.

Couchbase SDKs provide a row-based API which returns an object which can be iterated over. Internally however, the response looks like the cbq output above.

To understand the execution plan for a query, you can use the EXPLAIN statement before the query.

Creating the primary index

The travel-sample bucket already comes with predefined indexes. As such, creating the primary index is not needed for the travel-sample bucket.

To create the primary index, issue the following query:

CREATE PRIMARY INDEX ON `travel-sample`;

Note the backticks (`) around the bucket name. Also note that the statement ends with a semicolon: all N1QL statements end with a semicolon, as is standard in SQL.

You may also create the primary index using an SDK. Here is an example in Python:

cb.n1ql_query('CREATE PRIMARY INDEX ON `travel-sample`').execute()
Creating the primary index may fail if it already exists. This is OK for now.

For more information about improving the performance of a query using indexes (non-primary), see Query optimization using covering indexes.

Specifying query criteria

Specifying criteria is done much in the same manner as in SQL: Using a WHERE clause. Multiple conditions can be specified using the AND or OR keywords:

cbq> SELECT airportname, city, country FROM `travel-sample` WHERE type="airport" AND city="Reno";
{
    "requestID": "ef72fcdd-a4b8-4cb8-9a7b-a44aefb4f24d",
    "signature": {
        "airportname": "json",
        "city": "json",
        "country": "json"
    },
    "results": [
        {
            "airportname": "Reno Tahoe Intl",
            "city": "Reno",
            "country": "United States"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "9.903854596s",
        "executionTime": "9.903509441s",
        "resultCount": 1,
        "resultSize": 124
    }
}

In the above query, we select all records that are of type airport and have a city of Reno.

Here is an example of the above query in other SDKs:

C | Python | Java | .NET | Go | node.js

Querying nested documents

The above queries showed how to query top-level fields within a document. One of the main differences between JSON and flat rows is that JSON allows a nested structure where sub-documents can reside. N1QL contains extensions to deal with nested documents.

cbq> SELECT airportname, city, country, geo.alt FROM `travel-sample` WHERE type="airport" AND geo.alt > 4000 AND country="United States" LIMIT 3;
{
    "requestID": "0fe0bb33-987f-4095-acf3-220378344a85",
    "signature": {
        "airportname": "json",
        "alt": "json",
        "city": "json",
        "country": "json"
    },
    "results": [
        {
            "airportname": "Cheyenne Rgnl Jerry Olson Fld",
            "alt": 6156,
            "city": "Cheyenne",
            "country": "United States"
        },
        {
            "airportname": "Pueblo Memorial",
            "alt": 4726,
            "city": "Pueblo",
            "country": "United States"
        },
        {
            "airportname": "Cedar City Rgnl",
            "alt": 5622,
            "city": "Cedar City",
            "country": "United States"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.284177667s",
        "executionTime": "2.283990592s",
        "resultCount": 3,
        "resultSize": 473
    }
}

The above query selects airports in the United States which have an altitude of 4,000 feet above sea level or greater.

Note that the syntax geo.alt is used. In N1QL, this indicates that geo is a subdocument, and that its alt field (within the sub-document) is referenced. For comparison, the actual document looks like this (with the relevant field in bold):

{
                "airportname": "Cedar City Rgnl",
                "city": "Cedar City",
                "country": "United States",
                "faa": "CDC",
                "geo": {
                    "alt": 5622,
                    "lat": 37.700967,
                    "lon": -113.098847
                },
                "icao": "KCDC",
                "id": 3824,
                "type": "airport",
                "tz": "America/Denver"
}
In SQL, the dot syntax is used to indicate a specific column from a specific table. This syntax may also be used in N1QL if the top level item (the first element of the field) refers to a bucket.

Advanced subdocument access

While N1QL allows you to select documents based on whether a given field has a given value, it can also perform powerful subdocument searching, allowing you to select documents based on more powerful criteria typically only available within full-featured programming languages.

For example:

  • A document contains any subdocument of which any (or every) child matches a given criteria.

  • All of a document’s immediate children satisfy a given criteria.

  • Any child within a document (and any of its sub-documents) contains a given value.

These searches can be performed using the ANY, EVERY, and WITHIN operators.

Aggregate functions

You can also run analysis on records within the bucket. The following query returns the number of types in the travel-sample bucket:

>>> for row in cb.n1ql_query("SELECT DISTINCT(type) FROM `travel-sample`"):
...   print row
...
{u'type': u'route'}
{u'type': u'airport'}
{u'type': u'airline'}
{u'type': u'landmark'}

Query placeholders (Dynamic parameters)

This feature is only available in the SDK and REST API. cbq currently does not support placeholders

Query placeholders allow your application to securely use dynamic query parameters for your application.

Consider a function which returns all airports located within a given city (passed as input). The query may be divided into a fixed or static part (select all airports in a city) and a dynamic part (the actual city to search, specified via user input).

A naive implementation of this function might look something like this:

def airports_in_city(city):
  query_string = "SELECT airportname FROM `travel-sample` WHERE city="
  query_string += '"' + city + '"'
  return cb.n1ql_query(query_string)

The approach above is both unwieldy and insecure (subject to SQL injection attacks).

N1QL allows the use of placeholders to declare dynamic query parameters. Here’s a version of the above using placeholders:

def airports_in_city(city):
  query_string = "SELECT airportname FROM `travel-sample` WHERE city=$1"
  query = N1QLQuery(query_string, city)
  return cb.n1ql_query(query)

The $1 is a positional placeholder. When the query is constructed, it may receive arguments, with each argument being used as the placeholder value in the query. Thus, $1 refers to the first argument,$2 to the second, and so on.

Placeholders may also be named. This is particularly useful when there are many query parameters and ensuring that they are all in the correct order may be cumbersome. Name query placeholders take the form of $name.

query_string = (
"SELECT airportname FROM `travel-sample`"
"WHERE country=$country "
"AND geo.alt > $altitude "
"AND (geo.lat BETWEEN $min_lat AND $max_lat) "
"AND (geo.lon BETWEEN $min_lon AND $max_lon "
)

Then to issue the actual query:

query = N1QLQuery(
    query_string,
    country="United States",
    altitude=500, min_lat=-50, max_lat=50, min_lon=-180, max_lon=0)

A similar approach can be used in other SDKs. The links below show using dynamic queries in our SDKs:

C | Python | Java | .NET | Go | node.js

Query optimization using prepared (optimized) statements

When a N1QL query string is sent to the server, the server will inspect the string and parse it, planning which indexes to query. Once this is done, it generates a query plan. The computation for the plan adds some additional processing time and overhead for the query.

Often-used queries can be prepared so that its plan is generated only once. Subsequent queries using the same query string will use the pre-generated plan instead, saving on the overhead and processing of the plan each time.

You can indicate to the SDK that a given query should be optimized in the above fashion. When an SDK sees that a query should be optimized, it will internally prepare the statement and store the plan in an internal cache. When issuing the query again, the SDK will check to see if a plan exists in its cache, and will send the plan to the server.

To indicate that an SDK should optimize a query, the adhoc parameter should be set to false. When a query is not ad-hoc, the SDK will fetch the plan (if it does not already have it). Do not turn off the adhoc flag for each query since only a finite number of query plans (currently 5000) can be stored in the SDK.

query = N1QLQuery("SELECT airportname FROM `travel-sample` WHERE country=$1", "USA")
q.adhoc = False
Parameterized queries are considered the same query for caching and planning purposes, even if the supplied parameters are different.

For more information on how to optimize queries using prepared statements, see PREPARE statement in N1QL language reference.

Query optimization using covering indexes

Indexes help improve the performance of a query. When an index includes the actual values of all the fields specified in the query, the index covers the query and eliminates the need to fetch the actual values from the Data Service. An index, in this case, is called a covering index and the query is called a covered query. For more information, see Covering indexes.