Python Sample App Backend Tutorial

The Python SDK tutorial bridges the gap between simple and advanced concepts by walking through a complete web application.

The full source code for the tutorial is available on GitHub: http://github.com/couchbaselabs/try-cb-python. The primary focus of the tutorial is to explain the function and theory behind the Couchbase Python client and how it works together with Couchbase Server, and especially new features in versions 4.0/4.5 like N1QL, FTS and sub-document. It makes use of the travel-sample data set. The code that generates the web application is provided with the source code, but is not discussed in this tutorial.

Specific Python prerequisites and set up

In addition to the prerequisites mentioned in Sample Application, you’ll need:

To get set up for the tutorial proper, follow these steps:

  • git clone https://github.com/couchbaselabs/try-cb-python.git or download the source

  • Install necessary libraries: pip install -r requirements.txt

  • Start the application: python travel.py

  • Run the app: http://localhost:8080

  • If you don’t want to connect to Couchbase on localhost, change the configuration in travel.py (eg. CONNSTR).

  • Press CONTROL+C to stop the application when finished

This tutorial focuses on querying through N1QL and FTS rather than views. If you want information about using views, see Views.

Walking Through the API

The following sections lead you through the primary functions of the sample application. All of the REST API application code is in the travel.py file. The following shows you how to program with the various features and services of Couchbase including: connecting to a cluster and bucket, key/value iteraction, document query through N1QL and full text searches.

Configure and Bootstrap the SDK

Where: connect_db()

Goals: Connecting to the Cluster and getting a reference to a Bucket, learn to reuse it.

The first step is to let the application connect to your cluster and obtain a reference to a Bucket (the Bucket is your entry point for the whole storage API). The CONNSTR variable defines the connection string including hostname and bucket. The PASSWORD variable is used separately and gets passed to the Bucket() method as an option.

Connecting to the Cluster and Bucket

def connect_db():
    return Bucket(CONNSTR, password=PASSWORD)

db = connect_db()

The Bucket() method both connects to the Cluster and returns the connection to the requested Bucket, using the global CONNSTR and PASSWORD variables.

Both bucket and cluster can be managed through the SDK as well (eg. add views or create new buckets), see Managing Clusters using the Python SDK with Couchbase Server for more information.

Managing Users using Key/Value API

Where: class UserView()

Goals: Use Bucket operations and discover the Document API.

Couchbase Server is a document oriented database which provides access to your data both through its document ID (for high performance access), as well as through views and N1QL (as powerful query languages).

This is noticeable in the API, where the methods reflect Key/Value operations (retrieve_in, upsert, etc...) and work with a Sub Document interface that has an id() and requests specific content from the document. The default Document implementation accepts a simple representation of JSON as its content.

Creating New Users

A new user first goes through the signup() method which does a POST of username and password encoded as a JSON object.

userrec = {'username': user, 'password': password}

It uses a JWT encoded version of the password to keep things secure. The JWT token is also used through further interaction to validate the user session and is created using a custom function make_user_key().

The "user::" prefix is arbitrary to this application, this is just a convention that the app uses to obtain unique keys and have additional information in it, but the key could have been anything else (even sequence numbers or UUIDs) really.

Here comes the part where the Couchbase API is used to store the document, it’s rather simple, the application uses an upsert method which does an insert or an update, depending on whether the records exists already or not:

db.upsert(make_user_key(user), userrec)

The resulting JSON data and confirmation is sent back to the client with a success or an exception if it failed.

When it comes to storing a document, you have a choice of three methods:

  • insert() will only work if no document currently exists for the given ID, otherwise a DocumentAlreadyExistsException will be thrown.

  • replace() on the contrary will only work if the document does already exist (otherwise a DocumentDoesNotExistException is thrown).

  • upsert() will always work, replacing or creating the document as needed.

So the result in fact just contains a JWT (Json WebToken) to identify the new user. If there is a problem, an AuthenticationServiceException will be thrown and correctly translated by the controller layer. But what is this narration object in the Result?

The frontend understands this second part of the Result, the narration, as something that it can display in a console, so that users of the application can directly get an idea of what is going on on the server side while browsing the app. It is similar to a log, but sent to the frontend.

Checking login by getting the User’s document

In the login() method, the application checks a User’s credential and for that it needs to retrieve the corresponding document. Since user documents are identified by prefixing their username with user::, this is pretty simple. Using the sub-document retrieve_in() it can request only the precise values it wants from the document, in this case the password:

doc_pass = db.retrieve_in(userdockey, 'password')[0]

If that particular key doesn’t exist, the retrieve_in() method returns NotFoundError. That’s useful to check if the user exists at all.

Otherwise it’s just a matter of checking the hashed password with the one provided by the user, and responding accordingly.

A First N1QL Query: Finding Airports

Where: class Airport()

Goals: Use N1QL and the DSL to perform your first SELECT on Couchbase.

In the SDK, there is a query method that accepts all variants of querying with Couchbase (views, spatial/geo views, N1QL and FTS). For N1QL, the n1ql_query() function is used.

N1QL is a super-set of SQL, so if you’re familiar with SQL you’ll feel at ease.

Statements can be provide either in String form or using the DSL.

The findAll() method is expected to return a List (several matching rows) of Maps representing the JSON value.

The application just selects the airport name from relevant documents in the bucket. Since it filters relevant document on a criteria that depends on the input length, it just does the SELECT and FROM clauses first:queryprep = "SELECT airportname FROM `travel-sample` WHERE "

Then it can chose the correct fields to look into depending on the length of the input. The user can enter either a ICAO or FAA code or a full name of an airport to search for, so it accommodates each scenario as it builds the N1QL statement. The application also uses wildcards in the statement to give a free form expression:

if len(querystr) == 3:
    queryprep += "LOWER(faa) = $1"
    queryargs = [querystr]
elif len(querystr) == 4:
    queryprep += "LOWER(icao) = $1"
    queryargs = [querystr]
else:
    queryprep += "LOWER(airportname) LIKE $1"
    queryargs = ['%' + querystr + '%']

The statement is ready! You can execute this statement by wrapping it in a N1QLQuery() and invoking db.n1ql_query(). Here it is very simple, no placeholders and no particular tuning of the query is necessary, so the application uses the simple method: res = db.n1ql_query(N1QLQuery(queryprep, *queryargs))

The results of the query are return in a list of records, which then iterate over to build a JSON document of the results:

airportslist = [x for x in res]
context = [queryprep]

response = make_response(jsonify({"data": airportslist, "context": context}))

More Complex Queries: Finding Routes

Where: class FlightPathsView()

Goals: Let the DSL guide you into making more complex N1QL queries.

In this class, there are two more complex queries. The first aims at transforming the human-readable airport name for the departure and arrival airports to FAA codes:

SELECT faa AS fromAirport FROM `travel-sample` WHERE airportname = "Los Angeles Intl"
  UNION SELECT faa AS toAirport FROM `travel-sample` WHERE airportname = "San Francisco Intl"

The second aims at constructing the result set of available flight paths that connect the two airports:

SELECT a.name, s.flight, s.utc, r.sourceairport, r.destinationairport, r.equipment
  FROM `travel-sample` AS r
  UNNEST r.schedule AS s
  JOIN `travel-sample` AS a ON KEYS r.airlineid
  WHERE r.sourceairport = "LAX" AND r.destinationairport = "SFO" AND s.day = 6
  ORDER BY a.name ASC
Yes, you read that right, N1QL can do joins (on a single bucket or on several). It works as long as the "foreign key" described by ON KEYS clause can be mapped to a document’s key in the joined bucket.

A specificity of N1QL that can be seen in the second statement is UNNEST. It extracts a sub-JSON and puts it at the same root level as the bucket (so its possible to do joins on each element in this sub-JSON as if they were entries in a left-hand side bucket).

For this final step, try to obtain the equivalent of these statements via the DSL and see how it guides you through the possibilities of the query language.

Indexing the Data: N1QL & GSI

Goals: Use the Index DSL to make sure data is indexed for N1QL to query it.

Index management is a bit more advanced (and is already done when loading the sample), so now that you’ve learned about N1QL, you can have a look at it. There is no code example in this application, but some is presented below for your reference.

For N1QL to work, you must first ensure that at least a Primary Index has been created. For that you can use the DSL from the Index class:

db.n1ql_query('CREATE PRIMARY INDEX ON `travel-sample`).execute()

You can also create secondary indexes on specific fields of the JSON, for better performance:

db.n1ql_query('CREATE INDEX `def_username` ON `travel-sample`(username)').execute()

In this case, give a name to your index, specify the target bucket AND the field(s) in the JSON to index.

Full Text Search: Finding Hotels

Where: class HotelView()

Goals: Use FTS to search for matching Hotels. Use sub-document API to fetch the relevant data for each hit.

In this service, the hotels can be looked up using more fuzzy criteria like the content of the address or the description of a hotel, using FTS. Once there are some results, fetch only the relevant data for each result to be displayed in the UI using the sub-document API.

The findall() method accepts two parameters, location and description, which are the two possible refining criteria for a hotel search.

def findall(self, description, location):
    qp = FT.ConjunctionQuery(FT.TermQuery(term='hotel', field='type'))

A ConjunctionQuery allows you to combine multiple FTS queries into one, as an AND operation. That search always includes an exact match criteria that restricts it to the hotel data type (as reflected in the type field of the JSON document).

If the user provided a location keyword, a second component is added to the FTS query that will look for that keyword in several address-related fields of the document. That is done in an OR fashion, using a Disjunction this time:

if location != '*':
    qp.conjuncts.append(
        FT.DisjunctionQuery(
            FT.MatchPhraseQuery(location, field='country'),
            FT.MatchPhraseQuery(location, field='city'),
            FT.MatchPhraseQuery(location, field='state'),
            FT.MatchPhraseQuery(location, field='address')
        ))

Similarly, if a description keyword was provided by the user, the application looks at the freeform text description field and name field of the document:

if description != '*':
    qp.conjuncts.append(
        FT.DisjunctionQuery(
            FT.MatchPhraseQuery(description, field='description'),
            FT.MatchPhraseQuery(description, field='name')
        ))

The MatchPhraseQuery can contain several words and will search for variations of the words (eg. including plural forms or words with the same root...).

The compound FTS query is now ready to be executed. The application builds a SearchQuery object out of it, which also determines which FTS index to use ("hotels") and allows it to set various parameters (like a limit of maximum 100 hits to return). The query is logged (and kept for narration) then executed, returning a SearchQueryResult object:

q = db.search('hotels', qp, limit=100)

The FTS results are then iterated over, and the document corresponding to each result is fetched. In actuality, only the parts of the document that will be displayed in the UI are required. This is where the sub-document API comes in.

The sub-document API allows you to fetch or mutate only a set of paths inside a JSON document, without having to send the whole document back and forth. This can save network bandwidth if the document is large and the parts that the application is interested in are small. So here the results of the FTS search are iterated over and appropriate subdoc calls are triggered:

results = []
for row in q:
    subdoc = db.retrieve_in(row['id'], 'country', 'city', 'state',
                            'address', 'name', 'description')
    if subdoc['state'] == None:
        addrstate = "none"
    else:
        addrstate = subdoc['state']

    if subdoc['address'] == None:
        addr = ""
    else:
        addr = subdoc['address']

    subresults = {'name': subdoc['name'],
                    'description': subdoc['description'],
                    'address': ', '.join((
                        addr, subdoc['city'],
                        addrstate, subdoc['country']))
                    }
    results.append(subresults)

response = {'data': results}
return jsonify(response)

Each FTS hit is represented as a row dictionary with each document’s id. The application can use the sub-document API dedicated to fetching data (db.retrieve_in(documentId, fields...)) and specify what fields it wants: country, city, state, address, name and description. In the rest of the code, the address-related fields are checked for some empty values and then aggregated together and the data obtained is returned as a JSON document for the browser.