CURL Function

  • reference
    +

    The CURL() function implements a subset of cURL functionality and enables SQL++ queries to interact and integrate with external JSON data sources available over HTTP/REST. This allows federated queries against external data sources, such as the Google geocoding API, Yahoo API, or other Couchbase clusters. The federated queries can leverage full querying capabilities of SQL++, including functions, expressions, sub-queries, JOINs, NESTs, UNNESTs etc.

    The CURL() function can be used in various SQL++ expressions and in various clauses of the DML statements such as projection, WHERE predicates, FROM data source, and so on. For example, you can use the CURL function with the INSERT-SELECT statement to import external data into Couchbase Server.

    Syntax

    CURL(url, [options])

    Arguments

    url

    A string representing the URL of the data source. The URL needs to point to a JSON endpoint and must be either http:// or https:// only. No other protocol is supported. The redirection of URL is not allowed.

    options

    An optional JSON object representing various supported options. This includes options and parameters to be sent to the URL source endpoint.

    Options

    Table 1. Security Options
    Option Description Value

    user

    Server user name and password.

    USERNAME[:PASSWORD]

    basic

    Use HTTP Basic Authentication.

    Boolean

    insecure

    Allow connections to SSL sites without certificates (H).

    Boolean

    anyauth

    CURL to figure out authentication method by itself, and use the most secure one.

    This supports only basic authentication.

    Boolean

    cacert

    Specify CA signed certificate file name. Certificates should be stored on the local machine, on each query node within a cluster.

    • To store certificates, access /Couchbase/var/lib/couchbase/n1qlcerts. This is not visible to the user.

    • The file name cannot contain a path.

    • If the certificate is not a match to the existing contents of n1qlcerts directory, the function returns an error.

    • All expired and invalid certificates return an error.

    FILENAME

    For example, this is the certificate .pem file for AWS.

    Table 2. Transfer-related Options
    Option Description Value

    get

    The Get request for CURL.

    BOOLEAN

    Example: {"get": true}

    request

    Sets the request method. This only accepts GET or POST requests. This is case sensitive.

    String

    Example: {"request":"POST"}

    connect-timeout

    The maximum time allowed for connection in seconds.

    Integer

    Example: {"connect-timeout":30}

    max-time

    The maximum time allowed for the transfer in seconds.

    Integer

    Example: {"max-time":30}

    data

    POST data to be sent to the HTTP/REST service. The string data should be formatted exactly same as HTTP POST data.

    String or [...string, string….]

    Example: {"data":"address=Half+Moon+Bay"}

    {"data":"address=Half+Moon+Bay,california"}

    header

    Passes custom header line to the server.

    String or [...string, string….]

    Example: {"header":"Content-Type: application/json"}

    {"header":["Content-Type: application/json", "Content-Length: 115"]}

    show-error

    Displays error message.

    Boolean

    Example: {"show-error": true}

    silent

    The silent mode.

    Boolean

    Example: {"silent": false}

    keepalive-time

    The number of seconds to wait between the keepalive probes.

    Integer

    Example: {"keepalive-time":20}

    user-agent

    The value for the user-agent to send it to the server.

    String

    Example: {"header":[ "user-agent: bsmith","Content-Type: application/json"]}

    data-urlencode

    Encode the data and send it to the server.

    This is a test => this%20is%20a%20test

    String or [...string, string….]

    Return Value

    The CURL() function returns either a single JSON object, or multiple objects in an array. These objects are returned by the HTTP/REST service at the URL. Note that the result is expected to be JSON, and the function itself doesn’t do any additional processing. However, if the query parameter pretty=true, then the data received is in pretty format. If the returned data from the URL is not a well defined JSON, it may result in errors or undefined behavior. The errors returned by CURL (such as moved permanently) can be in different formats, like HTML, XML, plain strings, and can be a large blob. The SQL++ function returns a "Not a JSON endpoint" error when it returns any other format other than JSON.

    Security

    It is important to understand the potential security implications in using the CURL() function. Note that CURL() function can connect to any REST end point accessible to the Query node (where the function is executed) inside or outside the firewall. To avoid security vulnerabilities, multiple security measures have been implemented. These can be used to control and minimize the risks associated with using the CURL() function. See Table 1 for the list of security options that can be used with the CURL() function.

    In addition to the security options, a Full Administrator can also list URLs and REST endpoints that can be accessed by the CURL() function. The CURL() function can access URLs that satisfy a prefix match, which means only URLs specified on the list or the prefixes.

    Consider the following use case where a deployment of Couchbase Server and Mobile Sync Gateway exist on the same machine, and the user has access to the query service but not the Sync Gateway admin endpoint. Let’s assume that the user has been granted QUERY_EXTERNAL_ACCESS role by the Full Administrator. This means that the user can write queries using the CURL() function and potentially use CURL to access the Mobile Sync Gateway admin endpoint. To avoid this, the Full Administrator can create an access list for CURL() access and add the Admin endpoint of Sync Gateway to the disallowed_urls list.

    For more details on creating the access list and the structure of the access list file, see Creating an Access List for CURL().

    The following security measures help control risks when using the CURL() function:

    • Enable the CURL() function only for the Full Administrator role.

    • For all other users, the RBAC role QUERY_EXTERNAL_ACCESS is required to run the CURL() function. Only the Full Administrator can assign the QUERY_EXTERNAL_ACCESS role to other users.

    • An access list of URLs and REST points must be configured on the query nodes before being able to use the CURL() function. See Creating an Access List for CURL() for details.

    • Each query node in the cluster must define its access list file.

    • The CURL() function internally supports a specific set of SSL ciphers (MEDIUM or HIGH). This is dependent on the COUCHBASE_SSL_CIPHER_LIST.

    • The CURL() function runs on the query node within a cluster. In order to identify a request from SQL++'s CURL() function, a custom header is always set for all requests. The custom header format is: "X-N1QL-User-Agent: couchbase/n1ql/<n1ql-version>". External clients that wish to disallow SQL++ CURL() from accessing their REST API endpoints can explicitly check for the above header and block it. The general external endpoints will ignore this header value.

    • A user-agent is always set by default. This can be reset using the -user-agent option. The value set by default is couchbase/n1ql/<n1ql-version>.

      The current <n1ql-version> is "1.7.0-N1QL".

    Creating an Access List for CURL()

    An access list allows a Full Administrator to list out the permitted REST endpoints and URLs for the CURL() function. To enable access based on the access list, a Full Administrator must create the file containing the access list, which can be created two ways:

    1. From the Query Workbench UI in the Settings  Advanced Query Settings section.

    2. From CBQ via a cURL command.

    From the Query Workbench

    In the Query Workbench, navigate to the Settings  Advanced Query Settings section as shown below:

    The Settings Screen showing the Advanced Query Settings

    After expanding the Advanced Query Settings section, you can choose the Function Access:

    Restricted

    Access applies only the sites explicitly listed.

    Unrestricted

    Access applies to all sites within the explicitly listed sites.

    Under the Allowed CURL URLs and Disallowed CURL URLs headings, enter your allowed or disallowed URL in the appropriate textbox, and press the Enter key or click anywhere else on this screen to enter your URL.

    Click + to add another URL to the list.

    Click - to remove a URL from the list.

    From CBQ

    From a CBQ prompt, you can send a CURL() command to allow or disallow specific URLs, for example:

    #!/bin/sh
    
    # tag::request[]
    curl -v -X POST -u Administrator:password \
    http://localhost:8091/settings/querySettings/curlWhitelist \
    -d '{"all_access": false,
         "allowed_urls": ["https://company1.com"],
         "disallowed_urls": ["https://company2.com"]}'
    # end::request[]

    The access list file command structure is described in the following table.

    Table 3. Structure of Access List for CURL()
    Name Description Schema

    all_access
    required

    Defines whether the user has access to all URLs, or only URLs specified by the access list.

    This field set must be set to false to enable the allowed_urls and disallowed_urls fields.

    Setting this field to true enables access to all endpoints.
    Default : false

    boolean

    allowed_urls
    optional

    An array of strings, each of which is a URL to which you wish to grant access. Each URL is a prefix match. The CURL() function will allow any URL that starts with this value.

    For example, if you wish to allow access to all Google APIs, add the URL https://maps.googleapis.com to the array. To allow complete access to localhost, use http://localhost.

    Note that each URL must include the port, protocol, and all other components of the URL.

    < string > array

    disallowed_urls
    optional

    An array of strings, each of which is a URL that will be restricted for all roles. Each URL is a prefix match. The CURL() function will disallow any URL that starts with this value.

    If both allowed_urls and disallowed_urls fields are populated, the disallowed_urls field takes precedence over allowed_urls.

    Note that each URL must include the port, protocol, and all other components of the URL.

    < string > array

    Design Considerations

    Here are some of the design considerations to keep in mind when using CURL:

    • The URL needs to point to a JSON endpoint. The redirection of URL is not allowed.

    • Only HTTP and HTTPS protocols are supported. This means that files on the local file system cannot be accessed.

    • The amount of memory used for the CURL result is controlled using the result-cap option. The default is 20MB.

    • Any values passed to the arguments of CURL() must be static values. That means, they cannot include any references to names, aliases of documents, attributes in the documents, or any SQL++ functions or expressions that need to be evaluated.

      Consider the following example:

      SELECT CURL(b.url, { "data" : "address="||b.data })
      FROM keyspace b;

      The above example is invalid, because the first argument b.url refers to the alias b and the attribute url in the document. In the second argument, the string concatenation operator (||) cannot be evaluated.

    Examples

    The following examples are using CURL in the query projection list.

    Example 1. Use Google Maps API to convert static address into coordinates

    The following SQL++ query fetches details about the address "Half Moon Bay" using the Google maps API. The Geocoding API from Google Maps allows you to convert static addresses into coordinates. (For more information refer to the Geocoding API Developer Guide.) The corresponding curl command is also provided below.

    Request
    curl https://maps.googleapis.com/maps/api/geocode/json?address=Half+Moon+Bay
    Query
    SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json",
               {"data":"address=Half+Moon+Bay" , "request":"GET"} );
    Results
    [
      {
        "$1": {
          "results": [
            {
              "address_components": [
                {
                  "long_name": "Half Moon Bay",
                  "short_name": "Half Moon Bay",
                  "types": [
                    "locality",
                    "political"
                  ]
                },
                {
                  "long_name": "San Mateo County",
                  "short_name": "San Mateo County",
                  "types": [
                    "administrative_area_level_2",
                    "political"
                  ]
                },
                {
                  "long_name": "California",
                  "short_name": "CA",
                  "types": [
                    "administrative_area_level_1",
                    "political"
                  ]
                },
                {
                  "long_name": "United States",
                  "short_name": "US",
                  "types": [
                    "country",
                    "political"
                  ]
                }
              ],
              "formatted_address": "Half Moon Bay, CA, USA",
              "geometry": {
                "bounds": {
                  "northeast": {
                    "lat": 37.5226389,
                    "lng": -122.4165183
                  },
                  "southwest": {
                    "lat": 37.4249286,
                    "lng": -122.4778879
                  }
                },
                "location": {
                  "lat": 37.4635519,
                  "lng": -122.4285862
                },
                "location_type": "APPROXIMATE",
                "viewport": {
                  "northeast": {
                    "lat": 37.5226389,
                    "lng": -122.4165183
                  },
                  "southwest": {
                    "lat": 37.4249286,
                    "lng": -122.4774494
                  }
                }
              },
              "place_id": "ChIJC8sZCqULj4ARVJvnNcic_V4",
              "types": [
                "locality",
                "political"
              ]
            }
          ],
          "status": "OK"
        }
      }
    ]
    Example 2. Use Google Maps API to extract geometry (address and geographic location bounds) of a given street address

    This is similar to Example 1, but following SQL++ query fetches details about Santa Cruz in Spain using the Google geocoding API and extracts the geometry field from the result. This query retrieves the address and geographic location bounds of the address, Santa Cruz, ES. We use the address and components parameters from the Geocoding API. The data option represents the HTTP POST data.

    Request
    curl https://maps.googleapis.com/maps/api/geocode/json?address=santa+cruz&components=country:ES
    Query
    SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json",
               {"data":["address=santa+cruz","components=country:ES"],"get":true});
    Results
    [
      {
        "$1": {
          "results": [
            {
              "address_components": [
                {
                  "long_name": "Santa Cruz de Tenerife",
                  "short_name": "Santa Cruz de Tenerife",
                  "types": [
                    "locality",
                    "political"
                  ]
                },
                {
                  "long_name": "Santa Cruz de Tenerife",
                  "short_name": "TF",
                  "types": [
                    "administrative_area_level_2",
                    "political"
                  ]
                },
                {
                  "long_name": "Canary Islands",
                  "short_name": "CN",
                  "types": [
                    "administrative_area_level_1",
                    "political"
                  ]
                },
                {
                  "long_name": "Spain",
                  "short_name": "ES",
                  "types": [
                    "country",
                    "political"
                  ]
                }
              ],
              "formatted_address": "Santa Cruz de Tenerife, Spain",
              "geometry": {
                "bounds": {
                  "northeast": {
                    "lat": 28.487616,
                    "lng": -16.2356646
                  },
                  "southwest": {
                    "lat": 28.4280248,
                    "lng": -16.3370045
                  }
                },
                "location": {
                  "lat": 28.4636296,
                  "lng": -16.2518467
                },
                "location_type": "APPROXIMATE",
                "viewport": {
                  "northeast": {
                    "lat": 28.487616,
                    "lng": -16.2356646
                  },
                  "southwest": {
                    "lat": 28.4280248,
                    "lng": -16.3370045
                  }
                }
              },
              "place_id": "ChIJcUElzOzMQQwRLuV30nMUEUM",
              "types": [
                "locality",
                "political"
              ]
            }
          ],
          "status": "OK"
        }
      }
    ]
    Example 3. Join two keyspaces on different Couchbase clusters

    This SQL++ query shows how to JOIN two keyspaces on different Couchbase clusters. The JOIN is the same as is explained in the section on the JOIN Clause, but the left and right side keyspaces are in two different Couchbase clusters.

    • The left side keyspace route is from the cluster running on hostname. If you don’t have a second cluster running, you should substitute the hostname with 127.0.0.1 or the IP-address of the local cluster.

    • The right side keyspace airline is from the local cluster.

    For this example, set the query context to the inventory scope in the travel sample dataset. For more information, see Query Context.

    Query
    SELECT DISTINCT airline.name, airline.callsign,
       route.destinationairport, route.stops, route.airline
    FROM CURL("http://localhost:8093/query/service",
       {"data": "statement=SELECT * FROM route t
                           WHERE t.sourceairport = 'SFO'
               & query_context=travel-sample.inventory",
        "user": "Administrator:password"}).results[*].t route
    JOIN airline
    ON KEYS route.airlineid
    LIMIT 4;

    Note that the results from the CURL() output are embedded in the results[] array under the keyspace alias t used in the remote query. So, we extract the result documents appropriately with the expression CURL(...).results[*].t and alias it to route as the left side keyspace for the JOIN.

    RBAC credentials are required when CURL() is accessing Couchbase Server version 5.0 or later.

    Results
    [
      {
        "airline": "B6",
        "callsign": "JETBLUE",
        "destinationairport": "AUS",
        "name": "JetBlue Airways",
        "stops": 0
      }, // …
    ]
    Example 4. Full text search (FTS) in a SQL++ query

    The following example shows how to use the CURL() function to include a full text search from the Search service in a SQL++ query. Assuming the FTS index fts_travel is created on the default collection in the default scope of the travel sample dataset, running the following SQL++ query finds all documents that have "sanfrancisco" anywhere in the document.

    Note that in Couchbase Server 6.5 and later, you can use search functions to use a full text search in a SQL++ query, as long as the Search service is available on the cluster. You can still use the CURL() function if you need to access the Search service on another cluster.

    Query
    SELECT result.hits[*].id
    FROM CURL("http://Administrator:password@127.0.0.1:8094/api/index/fts_index/query",
         {"header":"Content-Type: application/json",
          "request" : "POST",
          "data":'{"explain":false,"fields": ["*"],"highlight": {},
                  "query": {"query": "san fran isco"}}' }) result;
    Results
    [
      {
        "id": [
          "hotel_25509",
          "hotel_25508",
          "hotel_26139",
          "hotel_25587",
          "hotel_25503",
          "hotel_25667",
          "hotel_25502",
          "hotel_25597",
          "hotel_26493",
          "hotel_25670"
        ]
      }
    ]
    Example 5. Use Yahoo Finance API in a WHERE clause to find a stock’s lowest value for the day

    The following example uses the CURL() function with a WHERE clause. It uses the Yahoo finance API to find the day’s low value (i.e DaysLow) of HDP stock and finds all the documents in the users keyspace that have min_threshold attribute value greater than the DaysLow stock value.

    For this example, unset the query context using the query context drop-down menu in the Query Workbench, or the query_context request-level parameter.

    Insert the following documents, representing customers and their minimum thresholds, into the default collection in the default scope of the travel sample data, and then run the SELECT query:

    Data
    INSERT INTO `travel-sample` (KEY, VALUE)
      VALUES
         ("k1",
             {"custID" : 12345, "min_threshold" : 4}),
         ("k2",
             {"custID" : 44444, "min_threshold" : 12});
    Query
    SELECT min_threshold,
           meta().id,
           to_number(hdp_low) hdp_low
    FROM `travel-sample`
    USE KEYS ["k1", "k2"]
    LET hdp_low = curl("https://query.yahooapis.com/v1/public/yql", {"data":"q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22HDP%22)&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback="}).query.results.quote.DaysLow
    WHERE to_number(hdp_low) < min_threshold;
    Results
    [
      {
        "hdp_low": 9.48,
        "id": "k2",
        "min_threshold": 12
      }
    ]
    Example 6. Use CURL() to allow two URLs and disallow one URL
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": true,
    "allowed_urls" : ["company1.com", "couchbase.com"],
    "disallowed_urls" : ["company2.com"]
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 7. Use CURL() to allow all access to all endpoints
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": true
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 8. Use CURL() to turn off access to all endpoints and clear the Allowed and Disallowed lists
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false,
    "allowed_urls" : [],
    "disallowed_urls" : []
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 9. Use CURL() to turn off access to all endpoints but make no changes to the Allowed and Disallowed lists
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 10. Use CURL() to turn off access to all endpoints, allow one URL, and clear the Disallowed list
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false,
    "allowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
    "disallowed_urls" : []
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 11. Use CURL() to turn off access to all endpoints, disallow one URL, and clear the Allowed list
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false,
    "disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
    "allowed_urls" : []
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 12. Use CURL() to allow an IP address and port instead of a website name
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false,
    "disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
    "allowed_urls" : ["http://127.0.0.1:9499/query/service"]
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 13. Use CURL() to allow and disallow the same URL — and get an error
    Request
    curl -X POST -u Administrator:password \
    -d '{
    "all_access": false,
    "disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
    "allowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"]
    }' http://localhost:8091/settings/querySettings/curlWhitelist
    Example 14. Use CURL() with dynamic named parameters
    Query
    SELECT CURL(b.url, $params) FROM keyspace b WHERE b.username = "joe";

    If we wanted to use Node.JS, we would use:

    keyspace.query(SELECT CURL(b.url, $params) FROM keyspace b WHERE b.username = "joe", { params: { data: "..." } },
     (error, result) => {} );

    $params is a named parameter, so we name it in the parameters object when executing the query. Then we populate the properties with the data that’s in the documents since those properties can be variables.