CURL Function
- reference
The CURL() function implements a subset of cURL functionality and enables SQL++ queries to interact and integrate with external JSON data sources available over HTTP/REST.
This allows federated queries against external data sources, such as the Google geocoding API, Yahoo API, or other Couchbase clusters.
The federated queries can use full querying capabilities of SQL++, including functions, expressions, subqueries, JOINs, NESTs, UNNESTs, and so on.
You can use the CURL() function in various SQL++ expressions and clauses of DML statements such as projection, WHERE predicates, FROM data source, and more.
For example, you can use it with an INSERT-SELECT statement to import external data into Couchbase Server.
Syntax
curl-function ::= 'CURL' '(' url (',' options)? ')'
Arguments
- url
-
A string representing the URL of the data source. The URL needs to point to a JSON endpoint and must be either
http://orhttps://only. No other protocol is supported. The redirection of URLs is not allowed. - options
-
An optional JSON object representing various supported options. This includes options and parameters to be sent to the URL source endpoint.
Options
| Option | Description | Value | ||
|---|---|---|---|---|
user |
Server user name and password, in the form |
String |
||
basic |
Use HTTP Basic Authentication. |
Boolean |
||
insecure |
Allow connections to SSL sites without certificates (H). |
Boolean |
||
anyauth |
CURL to figure out authentication method by itself, and use the most secure one.
|
Boolean |
||
cacert |
The CA signed certificate filename. Certificates should be stored on the local machine, on each query node within a cluster.
For example, this is the certificate |
String |
||
cert |
The chain certificate filename. Certificates should be stored on the local machine, on each query node within a cluster.
Example: |
String |
||
key |
The client key filename. Key files should be stored on the local machine, on each query node within a cluster. If the key file is encrypted, it should use PKS8 v2, and you must supply the passphrase to decrypt it.
Example: |
String |
||
passphrase |
The passphrase to decrypt the key file. Owing to its sensitive nature, you’re recommended to use a named parameter for this option. The parameter name should start and end with an underscore Example: |
String |
| Option | Description | Value |
|---|---|---|
get |
If true, perform a GET request. Example: |
Boolean |
request |
Sets the request method. This only accepts GET or POST requests. This is case sensitive. Example: |
String |
connect-timeout |
The maximum time allowed for connection in seconds. Example: |
Integer |
max-time |
The maximum time allowed for the transfer in seconds. Example: |
Integer |
data |
POST data to be sent to the HTTP/REST service. The string data should be formatted exactly same as HTTP POST data. Examples:
|
String or array of strings |
header |
Passes custom header line to the server. To send the user-agent string to the server, add the value to this parameter. Examples:
|
String or array of strings |
show-error |
If true, display error message. Example: |
Boolean |
silent |
If true, use silent mode. Example: |
Boolean |
keepalive-time |
The number of seconds to wait between the Example: |
Integer |
data-urlencode |
Like the For example, |
String or array of strings |
Return Value
The CURL() function returns the response from the HTTP/REST service at the specified URL as one of the following:
-
A single JSON object
-
An array of JSON objects
The function returns the data as is, without any additional processing.
If you include the query parameter pretty=true, the function returns the data in a formatted, readable JSON structure.
Responses that are not in a well-defined JSON format can result in errors or undefined behavior.
For example, if the URL returns any other format other than JSON, the function returns Not a JSON endpoint error.
The function can return errors in different formats such as HTML, XML, plain strings, or a large blob.
Security
The CURL() function can connect to any REST end point accessible to the Query node, inside or outside the firewall.
To avoid security vulnerabilities, the CURL() function includes multiple security measures and options.
You can use them to control and minimize the risks associated with using the function.
For a complete list of security options, see Table 1.
Additionally, a Full Administrator can define a list of URLs and REST endpoints that the CURL() function can access.
The CURL() function can access only URLs that match a specified prefix, meaning only URLs specified on the list or their prefixes are accessible.
Consider a use case where a deployment of Couchbase Server and Mobile Sync Gateway exist on the same machine, and the user has access to the Query Service but not the Sync Gateway Admin endpoint.
If the Full Administrator grants the user the QUERY_EXTERNAL_ACCESS role, the user can write queries using the CURL() function and potentially access the Sync Gateway Admin endpoint.
To avoid this, the Full Administrator can create an access list for CURL() and add the Sync Gateway Admin endpoint to the disallowed_urls list.
For more information on creating the access list and the structure of the access list file, see Creating an Access List for CURL().
The following security measures help control risks when using the CURL() function:
-
Enable the
CURL()function only for the Full Administrator role. -
For all other users, the RBAC role
QUERY_EXTERNAL_ACCESSis required to run theCURL()function. Only the Full Administrator can assign theQUERY_EXTERNAL_ACCESSrole to other users. -
An access list of URLs and REST points must be configured on the query nodes before being able to use the
CURL()function. See Creating an Access List for CURL() for details. -
Each query node in the cluster must define its access list file.
-
The
CURL()function internally supports a specific set of SSL ciphers (MEDIUM or HIGH). This is dependent on theCOUCHBASE_SSL_CIPHER_LIST. -
The
CURL()function runs on the query node within a cluster. In order to identify a request from SQL++'sCURL()function, a custom header is always set for all requests. The custom header format is:"X-N1QL-User-Agent: couchbase/n1ql/<n1ql-version>". External clients that wish to disallow SQL++CURL()from accessing their REST API endpoints can explicitly check for the above header and block it. The general external endpoints will ignore this header value. -
A
user-agentis always set by default. This can be reset using the-user-agentoption. The value set by default iscouchbase/n1ql/<n1ql-version>.The current <n1ql-version> is "1.7.0-N1QL".
Creating an Access List for CURL()
An access list allows a Full Administrator to list out the permitted REST endpoints and URLs for the CURL() function.
To enable access based on the access list, a Full Administrator must create the file containing the access list, which can be created two ways:
-
From the Query Workbench UI in the section.
-
From CBQ via a cURL command.
From the Query Workbench
To create an access list for the CURL() function from the Query Workbench:
-
Go to the tab.
-
Under CURL() Function Access, select one of the following options:
-
Restricted:: Access applies only the sites explicitly listed.
-
Unrestricted:: Access applies to all sites within the explicitly listed sites.
-
-
Under the Allowed CURL URLs and Disallowed CURL URLs headings, enter your allowed or disallowed URL in the appropriate text field.
-
Press Enter or click anywhere outside the text field to save the URL.
-
To add another URL, click +.
-
To remove a URL, click -.
From the Query Settings REST API
You can send a call to the Query Settings REST API to allow or disallow specific URLs, for example:
curl -v -X POST -u Administrator:password \
http://localhost:8091/settings/querySettings/curlWhitelist \
-d '{"all_access": false,
"allowed_urls": ["https://company1.com"],
"disallowed_urls": ["https://company2.com"]}'
The access list file command structure is described in the following table.
| Property | Schema | |
|---|---|---|
all_access |
Defines whether the user has access to all URLs, or only URLs specified by the access list. This field set must be set to Setting this field to |
Boolean |
allowed_urls |
An array of strings, each of which is a URL to which you wish to grant access. Each URL is a prefix match. The CURL() function will allow any URL that starts with this value. For example, if you wish to allow access to all Google APIs, add the URL Each URL must include the port, protocol, and all other components of the URL. |
String array |
disallowed_urls |
An array of strings, each of which is a URL that will be restricted for all roles. Each URL is a prefix match. The CURL() function will disallow any URL that starts with this value. If both Each URL must include the port, protocol, and all other components of the URL. |
String array |
Design Considerations
When using CURL, consider the following:
-
The URL must point to a JSON endpoint. The redirection of URL is not allowed.
-
Only HTTP and HTTPS protocols are supported. You cannot access files on the local filesystem.
-
The amount of memory used for the CURL result is controlled using the
result-capoption. The default is 20MB.
Examples
-
Use Google Maps API to convert static address into coordinates
-
Use Yahoo Finance API in a WHERE clause to find a stock’s lowest value for the day
-
Turn off access to all endpoints and clear the Allowed and Disallowed lists
-
Turn off access to all endpoints but make no changes to the Allowed and Disallowed lists
-
Turn off access to all endpoints, allow one URL, and clear the Disallowed list
-
Turn off access to all endpoints, disallow one URL, and clear the Allowed list
The following examples are using CURL in the query projection list.
The following SQL++ query and curl command fetch details about the address "Half Moon Bay" using the Google maps API.
The Geocoding API from Google Maps allows you to convert static addresses into coordinates.
For more information, see the Geocoding API Developer Guide.
curl https://maps.googleapis.com/maps/api/geocode/json?address=Half+Moon+Bay
SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json",
{"data":"address=Half+Moon+Bay" , "request":"GET"} );
[
{
"$1": {
"results": [
{
"address_components": [
{
"long_name": "Half Moon Bay",
"short_name": "Half Moon Bay",
"types": [
"locality",
"political"
]
},
{
"long_name": "San Mateo County",
"short_name": "San Mateo County",
"types": [
"administrative_area_level_2",
"political"
]
},
{
"long_name": "California",
"short_name": "CA",
"types": [
"administrative_area_level_1",
"political"
]
},
{
"long_name": "United States",
"short_name": "US",
"types": [
"country",
"political"
]
}
],
"formatted_address": "Half Moon Bay, CA, USA",
"geometry": {
"bounds": {
"northeast": {
"lat": 37.5226389,
"lng": -122.4165183
},
"southwest": {
"lat": 37.4249286,
"lng": -122.4778879
}
},
"location": {
"lat": 37.4635519,
"lng": -122.4285862
},
"location_type": "APPROXIMATE",
"viewport": {
"northeast": {
"lat": 37.5226389,
"lng": -122.4165183
},
"southwest": {
"lat": 37.4249286,
"lng": -122.4774494
}
}
},
"place_id": "ChIJC8sZCqULj4ARVJvnNcic_V4",
"types": [
"locality",
"political"
]
}
],
"status": "OK"
}
}
]
This is similar to Example 1, but following SQL++ query fetches details about Santa Cruz in Spain using the Google geocoding API and extracts the geometry field from the result.
This query retrieves the address and geographic location bounds of the address, Santa Cruz, ES, by using the address and components parameters from the Geocoding API.
The data option specifies the HTTP POST data.
curl https://maps.googleapis.com/maps/api/geocode/json?address=santa+cruz&components=country:ES
SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json",
{"data":["address=santa+cruz","components=country:ES"],"get":true});
[
{
"$1": {
"results": [
{
"address_components": [
{
"long_name": "Santa Cruz de Tenerife",
"short_name": "Santa Cruz de Tenerife",
"types": [
"locality",
"political"
]
},
{
"long_name": "Santa Cruz de Tenerife",
"short_name": "TF",
"types": [
"administrative_area_level_2",
"political"
]
},
{
"long_name": "Canary Islands",
"short_name": "CN",
"types": [
"administrative_area_level_1",
"political"
]
},
{
"long_name": "Spain",
"short_name": "ES",
"types": [
"country",
"political"
]
}
],
"formatted_address": "Santa Cruz de Tenerife, Spain",
"geometry": {
"bounds": {
"northeast": {
"lat": 28.487616,
"lng": -16.2356646
},
"southwest": {
"lat": 28.4280248,
"lng": -16.3370045
}
},
"location": {
"lat": 28.4636296,
"lng": -16.2518467
},
"location_type": "APPROXIMATE",
"viewport": {
"northeast": {
"lat": 28.487616,
"lng": -16.2356646
},
"southwest": {
"lat": 28.4280248,
"lng": -16.3370045
}
}
},
"place_id": "ChIJcUElzOzMQQwRLuV30nMUEUM",
"types": [
"locality",
"political"
]
}
],
"status": "OK"
}
}
]
This SQL++ query shows how to JOIN two keyspaces on different Couchbase clusters. The JOIN is the same as is explained in JOIN Clause, but the left and right side keyspaces are in two different Couchbase clusters.
-
The left side keyspace
routeis from the cluster running onhostname. If you do not have a second cluster running, you should substitute thehostnamewith 127.0.0.1 or the IP-address of the local cluster. -
The right side keyspace
airlineis from the local cluster.
For this example, set the query context to the inventory scope in the travel sample dataset.
For more information, see Query Context.
SELECT DISTINCT airline.name, airline.callsign,
route.destinationairport, route.stops, route.airline
FROM CURL("http://localhost:8093/query/service",
{"data": "statement=SELECT * FROM route t
WHERE t.sourceairport = 'SFO'
& query_context=travel-sample.inventory",
"user": "Administrator:password"}).results[*].t route
JOIN airline
ON KEYS route.airlineid
LIMIT 4;
The results from the CURL() output are embedded in the results[] array under the keyspace alias t used in the remote query.
To extract the result documents for the left side keyspace of the JOIN, use the expression
CURL(...).results[*].t and alias it as route.
RBAC credentials are required when CURL() accesses Couchbase Server version 5.0 or later.
[
{
"airline": "B6",
"callsign": "JETBLUE",
"destinationairport": "AUS",
"name": "JetBlue Airways",
"stops": 0
}, // …
]
The following example shows how to use the CURL() function to include a full-text search from the Search Service in a SQL++ query.
If the FTS index fts_travel is created on the default collection in the default scope of the travel-sample dataset, running the following query finds all documents that contain "sanfrancisco".
In Couchbase Server 6.5 and later, you can use Search functions to use a full-text search in a query, as long as the Search Service is available on the cluster.
However, to access the Search Service on another cluster, you must use the CURL() function.
SELECT result.hits[*].id
FROM CURL("http://Administrator:password@127.0.0.1:8094/api/index/fts_index/query",
{"header":"Content-Type: application/json",
"request" : "POST",
"data":'{"explain":false,"fields": ["*"],"highlight": {},
"query": {"query": "san fran isco"}}' }) result;
[
{
"id": [
"hotel_25509",
"hotel_25508",
"hotel_26139",
"hotel_25587",
"hotel_25503",
"hotel_25667",
"hotel_25502",
"hotel_25597",
"hotel_26493",
"hotel_25670"
]
}
]
The following example uses the CURL() function with a WHERE clause.
It uses the Yahoo finance API to find the day’s low value (i.e DaysLow) of HDP stock and finds all the documents in the users keyspace that have min_threshold attribute value greater than the DaysLow stock value.
For this example, unset the query context. For more information, see Query Context.
Insert the following documents, representing customers and their minimum thresholds, into the default collection in the default scope of the travel sample data, and then run the SELECT query:
INSERT INTO `travel-sample` (KEY, VALUE)
VALUES
("k1",
{"custID" : 12345, "min_threshold" : 4}),
("k2",
{"custID" : 44444, "min_threshold" : 12});
SELECT min_threshold,
meta().id,
to_number(hdp_low) hdp_low
FROM `travel-sample`
USE KEYS ["k1", "k2"]
LET hdp_low = curl("https://query.yahooapis.com/v1/public/yql", {"data":"q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22HDP%22)&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback="}).query.results.quote.DaysLow
WHERE to_number(hdp_low) < min_threshold;
[
{
"hdp_low": 9.48,
"id": "k2",
"min_threshold": 12
}
]
curl -X POST -u Administrator:password \
-d '{
"all_access": true,
"allowed_urls" : ["company1.com", "couchbase.com"],
"disallowed_urls" : ["company2.com"]
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": true
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false,
"allowed_urls" : [],
"disallowed_urls" : []
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false,
"allowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
"disallowed_urls" : []
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false,
"disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
"allowed_urls" : []
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false,
"disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
"allowed_urls" : ["http://127.0.0.1:9499/query/service"]
}' http://localhost:8091/settings/querySettings/curlWhitelist
curl -X POST -u Administrator:password \
-d '{
"all_access": false,
"disallowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"],
"allowed_urls" : ["https://maps.googleapis.com/maps/api/geocode/json"]
}' http://localhost:8091/settings/querySettings/curlWhitelist
SELECT CURL(b.url, $params) FROM keyspace b WHERE b.username = "joe";
To execute the query in Node.js, use:
keyspace.query(SELECT CURL(b.url, $params) FROM keyspace b WHERE b.username = "joe", { params: { data: "..." } },
(error, result) => {} );
Because $params is a named parameter, you must define it within the parameters object when executing the query.
You can populate these properties dynamically using data from your documents.