Querying with SQL++

    You can query for documents in Couchbase using the SQL++ query language, a language based on SQL, but designed for structured and flexible JSON documents.

    On this page we dive straight into using the Query Service API from the Python Columnar SDK. For a deeper look at the concepts, to help you better understand the Query Service, and the SQL++ language, see the links in the Further Information section at the end of this page.

    Here we show queries against the Travel Sample collection, at cluster and scope level, and give links to information on adding other collections to your data.

    Before You Start

    This page assumes that you have installed the Python Columnar SDK, added your IP address to the allowlist, and created a Columnar cluster.

    Create a collection to work upon by importing the travel-sample dataset into your cluster.

    Querying Your Dataset

    Most queries return more than one result, and you want to iterate over the results:

    Scope Level Queries

    • Sync API

    • Async API

    scope = cluster.database('travel-sample').scope('inventory')
    
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = scope.execute_query(query)
    
    print('Rows:')
    for row in res.rows():
        print(row)
    
    print(f'\nMetadata: {res.metadata()}')
    scope = cluster.database('travel-sample').scope('inventory')
    
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = await scope.execute_query(query)
    
    print('Rows:')
    async for row in res.rows():
        print(row)
    
    print(f'\nMetadata: {res.metadata()}')

    Cluster Level Queries

    • Sync API

    • Async API

    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM `travel-sample`.inventory.route
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = cluster.execute_query(query)
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM `travel-sample`.inventory.route
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = await cluster.execute_query(query)

    Positional and Named Parameters

    Supplying parameters as individual arguments to the query allows the query engine to optimize the parsing and planning of the query. You can either supply these parameters by name or by position.

    Positional Parameters

    Execute a query with positional arguments:

    • Sync API

    • Async API

    from couchbase_columnar.options import QueryOptions
    
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            WHERE sourceairport=$1 AND distance>=$2
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))
    from acouchbase_columnar.options import QueryOptions
    
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            WHERE sourceairport=$1 AND distance>=$2
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = await scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))

    Named Parameters

    Execute a query with named arguments:

    • Sync API

    • Async API

    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            WHERE sourceairport=$source_airport AND distance>=$min_distance
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))
    query = """
            SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
            FROM route
            WHERE sourceairport=$source_airport AND distance>=$min_distance
            GROUP BY airline
            ORDER BY route_count DESC
            """
    
    res = await scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))

    Using the Query Result

    Results from the Couchbase Columnar SDK can easily be used with several common Data Analytics Python libraries, including Pandas and PyArrow.

    Importing the result to a pandas DataFrame.
    import pandas as pd
    
    res = scope.execute_query(query)
    df = pd.DataFrame.from_records(res.rows(), index='airline')
    
    print(df.head())
    #          route_count  avg_route_distance
    # airline
    # AA              2354         2314.884359
    # UA              2180         2350.365407
    # DL              1981         2350.494112
    # US              1960         2101.417609
    # WN              1146         1397.736500
    Importing the query result to a PyArrow table.
    import pyarrow as pa
    
    res = scope.execute_query(query)
    table = pa.Table.from_pylist(res.get_all_rows())
    
    print(table.to_string())
    # pyarrow.Table
    # route_count: int64
    # avg_route_distance: double
    # airline: string

    Further Information

    The SQL++ for Analytics Reference offers a complete guide to the SQL++ language for both of our analytics services, including all of the latest additions.