Hands-on with Remodel: a new Python ODM for RethinkDB

This week, Andrei Horak released Remodel, a new Python object-document mapper (ODM) for RethinkDB. Remodel simplifies RethinkDB application development by automating much of the underlying logic involved in working with relations.

Remodel users create high-level model objects and rely on a set of simple class attributes to define relationships. The framework then uses the model objects to generate tables and indexes. It abstracts away the need to do manual work like performing join queries or populating relation attributes when inserting new items. Remodel also has built-in support for connection pooling, which obviates the need to create and manage connections. In this brief tutorial, I'll give you a hands-on look at Remodel and show you how to use it in a web application.

Define your models

To start using Remodel, first install the library. You can use the setup.py included in the source code, or install it with pip by running pip install remodel at the command line.

For the purposes of this tutorial, let's assume that we want to build a Starfleet crew roster that correlates crew members with their starships. The first step is to define the models and create the tables:

import remodel.utils
import remodel.connection
from remodel.models import Model

remodel.connection.pool.configure(db="fleet")

class Starship(Model):
    has_many = ("Crewmember",)

class Crewmember(Model):
    belongs_to = ("Starship",)

remodel.utils.create_tables()
remodel.utils.create_indexes()

In an application built with Remodel, all of the model classes must inherit remodel.models.Model. In this application, there are two models: Starship and Crewmember. The has_many and belongs_to class attributes are used to define the relationships between objects. In this case, each Starship can have many Crewmember instances and each Crewmember instance belongs to only one Starship.

The create_tables and create_indexes functions will, as their names suggest, automatically generate tables and indexes based on your defined models. Remodel pluralizes table names, which means that the Starship model will get a starships table.

The framework instantiates a connection pool, accessible at remodel.connection.pool. You can use the pool's configure method to adjust its behavior and specify connection options, such as the desired database name, host, and port.
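For example, to point Remodel at a RethinkDB server other than the local default, you might configure the pool like this (a sketch; the host and port shown are illustrative assumptions):

remodel.connection.pool.configure(db="fleet", host="localhost", port=28015)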

Populate the database

Now that the models are defined, you can populate the database with content. To create a new database record, call the create method on one of the model classes:

voyager = Starship.create(name="Voyager", category="Intrepid", registry="NCC-74656")

Remodel doesn't enforce any schemas, so you can use whatever properties you want when you create a record. The create method used above will automatically add the Voyager record to the starships table. Because the Starship model defines a has_many relationship with the Crewmember model, the voyager record comes with a crewmembers property that you can use to access the collection of crew members that are associated with the ship. Use the following code to add new crew members:

voyager["crewmembers"].add(
    Crewmember(name="Janeway", rank="Captain", species="Human"),
    Crewmember(name="Neelix", rank="Morale Officer", species="Talaxian"),
    Crewmember(name="Tuvok", rank="Lt Commander", species="Vulcan"))

The records provided to the add method are instantiated directly from the Crewmember class. You don't want to use the create method in this case because the add method called on the Voyager instance handles the actual database insertion. It will also automatically populate the relation data, adding a starship_id property to each Crewmember record.
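As a quick sanity check, you can fetch one of the new records and compare its starship_id to Voyager's primary key (a sketch, assuming the generated key is exposed through the id field):

janeway = Crewmember.get(name="Janeway")
print janeway["starship_id"] == voyager["id"]  # prints True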

To make the example more interesting, add a few more Starship records to the database:

enterprise = Starship.create(name="Enterprise", category="Galaxy", registry="NCC-1701-D")
enterprise["crewmembers"].add(
    Crewmember(name="Picard", rank="Captain", species="Human"),
    Crewmember(name="Data", rank="Lt Commander", species="Android"),
    Crewmember(name="Troi", rank="Counselor", species="Betazed"))

defiant = Starship.create(name="Defiant", category="Defiant", registry="NX-74205")
defiant["crewmembers"].add(
    Crewmember(name="Sisko", rank="Captain", species="Human"),
    Crewmember(name="Dax", rank="Lt Commander", species="Trill"),
    Crewmember(name="Kira", rank="Major", species="Bajoran"))

Query the database

When you want to retrieve a record, you can invoke the get method on a model class. When you call the get method, you can either provide the ID of the specific record that you want or you can provide keyword arguments that perform a query against record attributes. If you want to get a specific starship by name, for example, you can do the following:

voyager = Starship.get(name="Voyager")

You can take advantage of the relations that you defined in your models. If you want to find all of the human members of Voyager's crew, you can simply use the filter method on the crewmembers property:

voyager = Starship.get(name="Voyager")
for human in voyager["crewmembers"].filter(species="Human"):
  print human["name"]

Perform filtering on an entire table by calling the filter method on a model class. The following code shows how to display the captain of each ship:

for person in Crewmember.filter(rank="Captain"):
  print person["name"], "captain of", person["starship"]["name"]

As you might have noticed, the starship property of the Crewmember instance points to the actual starship record. Remodel populates the property automatically to handle the Crewmember model's belongs_to relationship.
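The same property works outside of a loop. For example, you can look up one of the crew members created earlier and read back the ship it belongs to (a minimal sketch):

tuvok = Crewmember.get(name="Tuvok")
print tuvok["starship"]["name"]  # prints Voyager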

When you want to perform more sophisticated queries, you can use ReQL in conjunction with Remodel. Let's say that you want to evaluate Starfleet's diversity by determining how many crew members are of each species. You can use ReQL's group command:

Crewmember.table.group("species").ungroup() \
          .map(lambda item: [item["group"], item["reduction"].count()]) \
          .coerce_to("object").run()

The table property of a model class provides the equivalent of a ReQL r.table expression. You can chain additional ReQL commands to the table property just as you would when creating any ReQL query.
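For example, you could count every human in Starfleet by chaining filter and count onto the table property (a sketch; like the query above, it runs on Remodel's pooled connection):

print Crewmember.table.filter({"species": "Human"}).count().run()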

Put it all together

Just for fun, I'm going to show you how to build a web application for browsing the Starfleet crew roster. The app is built with Flask, a lightweight framework for web application development. The example also uses Jinja, a popular server-side templating system that is commonly used with Flask.

In a Flask application, the developer defines URL routes that are responsible for displaying specific kinds of information. The application uses templates to render the data in HTML format. Create a route at the application root:

import flask

app = flask.Flask(__name__)

@app.route("/")
def ships():
    return flask.render_template("ships.html", ships=Starship.all())

if __name__ == "__main__":
    app.run(host="localhost", port=8090, debug=True)

When the user visits the site root, the application will fetch all of the starships from the database and display them by rendering the ships.html template. The following is from the template file:

<ul>
  {% for ship in ships %}
    <li><a href="/ship/{{ ship.id }}">{{ ship.name }}</a></li>
  {% endfor %}
</ul>

In the example above, the template iterates over every ship and displays a list item for each one. The list item includes an anchor tag that points to a URL with the ship's ID.

To make the application display the crew members of the ship when the user clicks one of the links, create a new /ship/x route that takes an arbitrary ship ID as a parameter:

@app.route("/ship/<ship_id>")
def ship(ship_id):
    ship = Starship.get(ship_id)
    crew = ship["crewmembers"].all()
    return flask.render_template("ship.html", ship=ship, crew=crew)

Fetch the desired ship from the database using the provided ID. In a real-world application, you might want to check that the record exists and return an error if it doesn't; a sketch of that check appears after the template below. Once you have the ship, fetch the crew via the crewmembers property and pass both the ship and the crew to the template:

<h1>{{ ship.name }}</h1>
<ul>
  {% for member in crew %}
    <li><a href="/member/{{ member.id }}">{{ member.name }}</a></li>
  {% endfor %}
</ul>
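As noted above, a real application should handle the case where no ship matches the given ID. A minimal sketch of that guard, assuming Starship.get returns None when nothing matches:

@app.route("/ship/<ship_id>")
def ship(ship_id):
    ship = Starship.get(ship_id)
    if ship is None:
        # assumed behavior: get returns None when the ID has no match
        flask.abort(404)
    crew = ship["crewmembers"].all()
    return flask.render_template("ship.html", ship=ship, crew=crew)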

Now create a /member/x route so that the user can see additional information about a crewman when they click one in the list:

@app.route("/member/<member_id>")
def member(member_id):
    member = Crewmember.get(member_id)
    return flask.render_template("crew.html", member=member)

Finally, define the template for that route:

<h1>{{ member.name }}</h1>
<ul>
  <li><strong>Rank:</strong> {{ member.rank }}</li>
  <li><strong>Species:</strong> {{ member.species }}</li>
</ul>

The template HTML files should go in a templates folder alongside your Python script. When you run the script, it will start a Flask server on the specified port (8090 in the example above). Visit the URL in your browser to see the application in action.

Check out Remodel and install RethinkDB to try it for yourself.

Make beautiful charts with RethinkDB queries and Charted.co

While building applications with RethinkDB, I often find cases where I want to be able to produce simple visualizations to help me better understand my data. Ideally, I'd like to take the output of a simple query and see what it looks like in a graph with as little work as possible. A new project recently introduced by the developers at Medium offers a compelling solution.

Medium's product science team built a lightweight web application called Charted that makes it easy for users to generate and share graphs. As input, the user provides a URL that points to CSV data. Charted processes the data and produces simple graphs with a clean and elegant design. No configuration is needed, though it allows the user to choose between bar and line formats and customize certain aspects of the output.

Charted is built on D3, a popular frontend JavaScript library that is widely used for data visualization. Simplicity is the chief advantage that Charted offers over rolling your own D3-based data visualizations by hand. Medium runs a hosted instance at Charted.co that anyone can use to publish and share graphs. You can also download the Charted source code from GitHub and run your own installation.

In order to use Charted with RethinkDB, you will need to convert the output of the desired query into CSV format and publish it at a URL. Fortunately, there are a number of libraries that make it very easy to perform the necessary conversion. In this tutorial, I will show you how I used the Python-based CSVKit framework with Flask to expose the output of a RethinkDB query in a form that I could pass to Charted.

Prepare your data with CSVKit

CSVKit is an open source toolkit for manipulating CSV content. It's primarily intended for use at the command line, but you can also consume it as a library in a Python script. It has a wide range of features, but we are primarily interested in using its built-in support for converting JSON to CSV.

You can import the json2csv function from the csvkit.convert.js module. The function expects to receive a file-like object, which means that you will need to wrap the content in StringIO if you would like to use a string instead of a file:

import StringIO

from csvkit.convert.js import json2csv

data = """[
  {"name": "Scott Summers", "codename": "Cyclops"},
  {"name": "Hank McCoy", "codename": "Best"},
  {"name": "Warren Worthington", "codename": "Angel"}
]"""

print json2csv(StringIO.StringIO(data))

If you run the code above, it will correlate the matching keys and display a comma-separated table of the values:

name,codename
Scott Summers,Cyclops
Hank McCoy,Best
Warren Worthington,Angel

Not bad so far, right? The conversion process is relatively straightforward. If you have nested objects, it will simply ignore them—it only operates on the top-level keys.

Transform data from RethinkDB

Now that you know how to convert JSON to CSV, the next step is applying the function to the output of your desired query. For the purposes of this tutorial, I'm going to use a feed of earthquake data from the USGS. As you might recall, I used that same data a few months ago in a tutorial that introduced geospatial queries.

In this case, I want to get the total number of earthquakes for each given day so that I will be able to plot it on a graph. Start by creating the table and loading the earthquake feed into the database:

import rethinkdb as r

c = r.connect()
r.db_create("quake").run(c)
r.db("quake").table_create("quakes").run(c)

url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.geojson"
r.db("quake").table("quakes").insert(r.http(url)["features"]).run(c)

To retrieve the relevant data, start by using the group command to organize the earthquakes by date. Next, append the ungroup command to chain additional operations to the grouped output. Finally, use the merge command to add a property that contains a total count of the records for each individual group:

output = r.db("quake").table("quakes") \
    .group(r.epoch_time(r.row["properties"]["time"] / 1000).date()) \
    .ungroup().merge({"count": r.row["reduction"].count()}).run(c)

The group command creates a property called reduction that contains all of the values for each group. To get the total number of items in a group, you can simply call count on the array stored in reduction. The USGS feed uses millisecond timestamps, so you have to divide the value of the time property by 1,000 to convert it to seconds before applying the epoch_time command.

There are a few minor wrinkles that you have to sort out before you convert the output to CSV. The group keys are date objects, which you can't really use for graphing. You must convert those timestamps to simple date strings that are suitable for use in the graph. The order of the keys is also important, because Charted will automatically use the first column as the x-axis in its graphs.

In order to specify the key order and format the timestamps, you will need to iterate over each item in the result set and create an OrderedDict that contains all of the values:

import json
from collections import OrderedDict

data = json.dumps([OrderedDict([
    ["date", item["group"].strftime("%D")],
    ["count", item["count"]]]) for item in output])

print json2csv(StringIO.StringIO(data))

Serve the output

In order to get the data into Charted, you will need to serve the generated CSV content through a public URL. For the purposes of this tutorial, I chose to accomplish that with Flask, a simple Python library for building server-side web applications.

In a Flask application, you use a Python decorator to associate a function with a URL route. I chose to create two routes, one that exposes the content in JSON format and one that exposes it in CSV format. The latter simply wraps the output of the former:

@app.route("/quakes")
def quakesJSON():
    conn = r.connect()
    output = r.db("quake").table("quakes") \
        .group(r.epoch_time(r.row["properties"]["time"] / 1000).date()) \
        .ungroup().merge({"count": r.row["reduction"].count()}).run(conn)

    conn.close()
    return json.dumps([OrderedDict([
        ["date", item["group"].strftime("%D")],
        ["count", item["count"]]]) for item in output])

@app.route("/quakes/csv")
def quakesCSV():
    return json2csv(StringIO.StringIO(quakesJSON()))
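Both route handlers assume the usual Flask scaffolding around them. Here is a minimal sketch of that setup, with the imports the fragments rely on; the host and port are arbitrary choices:

import json
import StringIO
from collections import OrderedDict

import flask
import rethinkdb as r
from csvkit.convert.js import json2csv

app = flask.Flask(__name__)

# the /quakes and /quakes/csv handlers shown above go here

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8090, debug=True)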

Now that you have a running server that outputs your data set in CSV format, you can take the URL and provide it to Charted. If you intend to use the public instance of Charted that is hosted at Charted.co, you will need to make sure that your Flask application server is publicly accessible. You might want to consider using a tool like ngrok to make a Flask application running on your local system accessible to the rest of the Internet. If you don't want to publicly expose your data, you could also optionally run your own local instance of Charted.

You can find a complete 50-line example by visiting this gist on GitHub. Install RethinkDB to try it for yourself.

A tasty RethinkDB video roundup for Thanksgiving

Thanksgiving is almost here: it is time to configure your dinner tables for family clusters and prepare some turkey for batch insertion of stuffing. To show how thankful we are for our amazing community, we put together this tasty video playlist with our best leftovers from October and November. It will help keep you entertained while you try not to succumb to the inevitable post-turkey tryptophan coma. Enjoy!


RethinkDB on FLOSS Weekly

RethinkDB co-founder Slava Akhmechet participated in a recent episode of TWiT's FLOSS Weekly video podcast. The hour-long interview includes a lengthy discussion about RethinkDB's origins, open source values, and suitability for real-time application development. Slava also shared lessons learned during RethinkDB development and talked about some future plans for the project.


Scale up RethinkDB apps on AWS with Docker

Climb Amazon's Elastic Beanstalk with RethinkDB co-founder Michael Glukhovsky in a presentation filmed at this month's Docker meetup. During the 20-minute talk, Michael demonstrated how to deploy a RethinkDB application on AWS with Docker. Learn how Docker containers and RethinkDB changefeeds make it easy to scale real-time apps in the cloud.


Build realtime location-aware apps with RethinkDB

RethinkDB Developer Evangelist Ryan Paul shook up a crowd last month with a short presentation about earthquake mapping. Ryan demonstrated how to use geospatial queries in RethinkDB to plot earthquake data on a map. Ryan also demonstrated how location-aware applications can take advantage of RethinkDB changefeeds to deliver real-time updates.


Pub/Sub made easy with RethinkDB

RethinkDB engineer Josh Kuhn made Gotham's streets a little safer during a five-minute presentation at last month's RethinkDB meetup. He demonstrated how to track comic book superhero match-ups in real-time using repubsub, a lightweight pub/sub library that uses RethinkDB as a message exchange.


RethinkDB hosting webinar with Compose

Our friends at Compose now offer a managed RethinkDB hosting service in the cloud. In a webinar last month, RethinkDB co-founder Slava Akhmechet and Compose CEO Kurt Mackey demonstrated how to use the service and discussed how it works.

CatThink: see the cats of Instagram in realtime with RethinkDB and Socket.io

Modern frameworks and standards make it easy for developers to build web applications that support realtime updates. You can push the latest data to your users, offering a seamless experience that results in higher engagement and better usability. With the right architecture on the backend, you can put polling out to pasture and liberate your users from the tyranny of the refresh button.

In this tutorial, I'll show you how I built a realtime Instagram client for the web. The application, which is called CatThink, displays a live feed of new Instagram pictures that have the #catsofinstagram tag. Why cats of Instagram? Because it's one of the photo service's most popular and beloved tags. People on the internet really, really like cats. Or maybe we just think we do because our feline companions have reprogrammed us with brain parasites.

The cat pictures appear in real time, as they are posted by their respective users. CatThink shows the pictures in a grid, accompanied by captions and other relevant metadata. In a secondary view, the application uses geolocation info to plot the cat pictures on a map.

CatThink's architecture

The CatThink backend is built with Node.js and Express on top of RethinkDB. The HTML frontend uses jQuery and Handlebars to display the latest cat pictures. The frontend map view is built with Leaflet, a popular map library that uses tiles from OpenStreetMap. The application uses Socket.io to facilitate communication between the frontend and backend.

CatThink takes advantage of Instagram's realtime APIs to determine when new images are available. Instagram offers a webhook-based system that allows a backend application to subscribe to updates on a given tag. When there are new posts with the #catsofinstagram tag, Instagram's servers send an HTTP POST request to a callback URL on your server. The POST request doesn't actually include the new content; it only includes a timestamp and the name of the updated tag, so your application has to fetch the new records using Instagram's conventional REST API endpoints.

When the CatThink backend receives a POST request from Instagram, it performs a RethinkDB query that uses the r.http command to fetch the latest records from the Instagram REST API and add them directly to the database. The database itself performs the HTTP GET request and parses the returned data.

Because the operation is performed entirely with ReQL, the backend application isn't responsible for fetching or processing any of the new Instagram pictures. Of course, the backend application will still need to know about new cat pictures so that it can send them to the frontend with Socket.io. CatThink accomplishes that with changefeeds, a RethinkDB feature that lets applications subscribe to changes on a table. Whenever the database adds, removes, or changes a document in the table, it will notify subscribed applications.

CatThink subscribes to a changefeed on the table where the cat records are stored. Whenever the database inserts a new cat record, CatThink receives the data through the changefeed and then broadcasts it to all of the Socket.io connections.

Connect to the Instagram realtime API

To use the Instagram API, you will have to register an application key on the Instagram developer site. You will need to use the client ID and client secret provided by Instagram in order to hit the API endpoints. You don't need to configure the key with a redirect URI, however, as you won't be using authentication.

To subscribe to a tag with Instagram's realtime API, make an HTTP POST request to https://api.instagram.com/v1/subscriptions/. In the form data attached to the request, you will need to provide the application key data, the name of the tag, a verification token, and the callback URL where you want Instagram to send new data. The verification token is an arbitrary string that Instagram will pass back to your application when it hits the callback URL.

Note: the callback URL that you provide to Instagram must be publicly-accessible to outside networks. For development purposes, it can be helpful to use a tool like ngrok that exposes a local port to the public internet.

In CatThink, I use the request library to perform the initial request to the Instagram server:

var request = require("request");  // the request library mentioned above
var api = "https://api.instagram.com/v1/";

var params = {
  client_id: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  client_secret: "XXXXXXXXXXXXXXXXXXXXXXXXX",
  verify_token: "somestring",
  object: "tag", aspect: "media",
  object_id: "catsofinstagram",
  callback_url: "http://mycatapp.ngrok.com/publish/photo"
};

request.post({url: api + "subscriptions", form: params},
  function(err, response, body) {
    if (err) console.log("Failed to subscribe:", err);
    else console.log("Successfully subscribed.");
});

If the subscription API call is properly formed, Instagram will immediately attempt to make an HTTP GET request at the callback URL. It will send several query parameters, including the verification token and a challenge key. In order to complete the subscription, you have to make the GET request return the provided challenge key. With Express, create a GET handler for the callback URL:

app.get("/publish/photo", function(req, res) {
  if (req.param("hub.verify_token") == "somestring")
    res.send(req.param("hub.challenge"));
  else res.status(500).json({err: "Verify token incorrect"});
});

Fetch the latest cats

The next step is to implement the POST handler for the callback URL. When Instagram sends the application a POST request to inform it of new content on the subscribed tag, it includes several bits of information in the request body:

[{
        "changed_aspect": "media",
        "object": "tag",
        "object_id": "catsofinstagram",
        "time": 1414995025,
        "subscription_id": 14185203,
        "data": {}
}]

The object_id property is obviously the name of the subscribed tag. The time property is a UNIX timestamp that reflects when the event occurred. The subscription_id property is a value that uniquely identifies the individual subscription.

Whenever the application receives a POST request at the callback URL, it will tell the database to fetch the latest cat records from Instagram's REST API. The application also provides a response so that Instagram knows that the POST request didn't fail. If the POST requests that Instagram sends to the application start to fail, Instagram will automatically taper off requests and eventually cancel the tag subscription.

app.post("/publish/photo", function(req, res) {
  var update = req.body[0];
  res.json({success: true, kind: update.object});

  if (update.time - lastUpdate < 1) return;
  lastUpdate = update.time;

  var path = "https://api.instagram.com/v1/tags/" +
             "catsofinstagram/media/recent?client_id=" +
             instagramClientId;


  r.connect(config.database).then(function(conn) {
    this.conn = conn;
    return r.table("instacat").insert(
      r.http(path)("data").merge(function(item) {
        return {
          time: r.now(),
          place: r.point(
            item("location")("longitude"),
            item("location")("latitude")).default(null)
        }
      })).run(conn)
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (this.conn)
      this.conn.close();
  });
});

In the code above, the ReQL query uses the r.point command in a merge operation to turn the geographical coordinates for each cat photo into a native geolocation point object. That's not used in the application, but it might be useful later if you wanted to create a geospatial index and query for cat pictures based on location.

In order to avoid hitting the Instagram API rate limit, the application checks the timestamp provided with each POST request and does some basic throttling so that new cat records aren't fetched more than once per second.

The path variable in the handler code is the URL of the Instagram REST API endpoint that the application uses to fetch the latest cats. In this example, the "catsofinstagram" tag is hard-coded into the URL path. It's worth noting, however, that you could use the name of the subscribed tag from the object_id property if you wanted to use the same POST handler to deal with multiple tag subscriptions.

Verify the request origin

In cases where you rely on the object_id property, you'd probably also want to validate the source of the POST request to make sure that it actually came from Instagram. If you don't verify the origin, somebody might figure out your URL endpoint and send you malicious POST requests that include an object_id for a rogue tag that you don't want to appear in your application. You wouldn't want some nefarious anti-cat vigilante to trick your application into showing dogs, for example.

Every POST request from Instagram will have an X-Hub-Signature header with a hash that you can validate using your secret key and the request body. The bodyParser middleware provides a verify option that is specifically intended for such purposes:

app.use("/publish/photo", bodyParser.json({
  verify: function(req, res, buf) {
    var hmac = crypto.createHmac("sha1", "XXXXXXXXXXXXXXX");
    var hash = hmac.update(buf).digest("hex");

    if (req.header("X-Hub-Signature") == hash)
      req.validOrigin = true;
  }
}));

At the beginning of your POST handler, you would simply check the value of req.validOrigin and make sure that it's true before continuing.

Use changefeeds to handle new cats

The CatThink backend uses RethinkDB changefeeds to detect when the database adds new records to the cat table. In a ReQL query, the changes command returns a cursor that exposes every modification that is made to the specified table. The following code shows how to consume the data emitted by the changefeed and broadcast each new item with Socket.io:

r.table("instacat").changes().run(this.conn).then(function(cursor) {
  cursor.each(function(err, item) {
    if (item && item.new_val)
      io.sockets.emit("cat", item.new_val);
  });
})
.error(function(err) {
  console.log("Error:", err);
});

CatThink broadcasts every cat to every user, so you don't need to worry about tracking individual Socket.io connections or routing messages to the right users.

In addition to broadcasting new cats, it's also a good idea to pass the user a modest backlog of cats when they first establish their connection with the server so that their initial view of the application is populated with some data. In a Socket.io connection event handler, CatThink performs a ReQL query that fetches the 60 most recent cats and then sends the result set back to the user:

io.sockets.on("connection", function(socket) {
  r.connect(config.database).then(function(conn) {
    this.conn = conn;
    return r.table("instacat").orderBy({index: r.desc("time")})
            .limit(60).run(conn)
  })
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(result) {
    socket.emit("recent", result);
  })
  .error(function(err) { console.log("Failure:", err); })
  .finally(function() {
    if (this.conn)
      this.conn.close();
  });
});

Implement the frontend

The CatThink frontend has a very simple user interface: It displays the grid of cats and the accompanying map view. A full-blown JavaScript MVC framework would likely be overkill, so it uses a pretty light dependency stack. It uses Leaflet for the map, jQuery for the UI logic, and Handlebars templating to generate the markup for each new cat picture.

After some initial setup for the tab switching and map view, the bulk of the frontend code is housed in a single addCat function that applies the template to the cat data, inserts the new markup into the grid, and then creates the location marker for cats with geolocation data:

var map = L.map("map").setView([0, 0], 2);
map.addLayer(L.tileLayer(mapTiles, {attribution: mapAttrib}));

var template = Handlebars.compile($("#cat-template").html());
var markers = [];

function addCat(cat) {
  cat.date = moment.unix(cat.created_time).format("MMM DD, h:mm a");
  $("#cats").prepend(template(cat));

  if (cat.place) {
    var count = markers.unshift(L.marker(L.latLng(
        cat.place.coordinates[1],
        cat.place.coordinates[0])));

    map.addLayer(markers[0]);
    markers[0].bindPopup(
        "<img src=\"" + cat.images.thumbnail.url + "\">",
        {minWidth: 150, minHeight: 150});

    markers[0].openPopup();

    if (count > 100)
      map.removeLayer(markers.pop());
  }
}

The map markers are stored in an array so that the application can easily remove old markers as it adds new ones. The marker cap is set to 100 in the code above, but you could likely raise it considerably if desired. It's important to have some kind of cap, however, because Leaflet can sometimes exhibit odd behavior if you have too many markers.

The Handlebars template that the application applies to the cat data is embedded in the HTML page itself, using a script tag with a custom type:

<script id="cat-template" type="text/x-handlebars-template">
  <div class="cat">
    <div class="user"></div>
    <div class="meta">
      <div class="time">Posted at </div>
      <div class="caption"></div>
    </div>
    <img class="thumb" src="">
  </div>
</script>

The last piece of the puzzle is implementing Socket.io on the client side. The application needs to establish a Socket.io connection with the server and then provide event handlers for the backlog and new cats. Both handlers will simply use the addCat function shown above.

var socket = io.connect();

socket.on("cat", addCat); 
socket.on("recent", function(data) {
  data.reverse().forEach(addCat);
});

The handler for the "cat" event receives a single cat object, which is immediately passed into the addCat function. The handler for the "recent" event receives an array of cat objects from the server. It reverses the array before adding the cats so that the images will display in reverse-chronological order, consistent with how they are added in real time.

Next steps

Although CatThink is not particularly complex, changefeeds helped to simplify the application and reduce the total amount of necessary code. Without changefeeds, the CatThink backend would have to fetch, parse, and process all of the cat records on its own instead of offloading that work to the database with a simple ReQL query.

In larger realtime applications, changefeeds can potentially offer more profound architectural advantages. You can increase the modularity of your application by decoupling the parts that handle and process data from the parts that convey updates to the frontend. There are also cases where you can use changefeeds to eliminate the need for dedicated message queue systems.

In the current version of RethinkDB, changefeeds offer a useful way to monitor changes on individual tables. In future versions, changefeeds will support a richer set of capabilities. Users will be able to monitor filtered data sets and detect change events on complex aggregations, like a player leader board or realtime moving averages. You can look forward to seeing the first round of new changefeed features in an upcoming release.

Install RethinkDB and try the ten-minute guide to experience the database in action.

Deploying RethinkDB applications with Docker using Dokku

Dokku is a simple application deployment system built on Docker. It gives you a Heroku-like PaaS environment on your own Linux system, enabling you to deploy your applications with git. Dokku automatically configures the proper application runtime environment, installs all of the necessary dependencies, and runs each application in its own isolated container. You can easily run Dokku on your own server or an inexpensive Linux VPS.

The RethinkDB Dokku plugin, created by Stuart Bentley, lets developers create containerized RethinkDB instances for their Dokku-deployed apps. I've found that Dokku is a really convenient way to share my RethinkDB demos while I'm prototyping without having to manually deploy and configure each one. In this short tutorial, I'll show you how you can set up Dokku and install the plugin on a Digital Ocean droplet.

Set up a Digital Ocean droplet

If you want to set up Dokku somewhere other than Digital Ocean, you can use the Dokku project's official install script to get it running on any conventional Ubuntu 14.04 system.

Digital Ocean provides a selection of base images that make it easy to create new droplets that come with specific applications or development stacks. Dokku is among the applications that Digital Ocean supports out of the box. When you create a new droplet, simply select the Dokku image from the Applications tab.

You can configure the droplet with the size, region, and hostname of your choice. Be sure to add an SSH key; it will be used later to identify you when you deploy to the system.

After Digital Ocean finishes creating the new droplet, navigate to the droplet's IP address in your browser. The server will display a Dokku configuration panel. The page will prompt you for a public key and a hostname. The key that you selected during droplet creation will automatically appear in the public key field. In the hostname box, you can either put in a domain or the IP address of the droplet.

If you use an IP address, Dokku will simply assign a unique port to each of your deployed applications. If you configure Dokku with a domain, it will automatically create a virtual host configuration with a subdomain for each application that you deploy. For example, if you set apps.mydomain.com as the hostname, an app called demo1 will be available at demo1.apps.mydomain.com. After you fill in the form, click the Finish Setup button to complete the Dokku configuration.

If you chose to use a domain, you also have to set up corresponding DNS records. In your DNS configuration system, add two A records: one for the domain itself and a wildcard record for the subdomains. Both records should use the IP address of your droplet.

A   apps.mydomain.com     xxx.xxx.xxx.xxx
A   *.apps.mydomain.com   xxx.xxx.xxx.xxx

Install the RethinkDB Dokku plugin

The next step is installing the plugin. Use ssh to log into the droplet as root. After logging into the system, navigate to the Dokku plugin folder:

$ cd /var/lib/dokku/plugins

Inside of the Dokku plugin folder, use the git clone command to obtain the plugin repository and put it in a subdirectory called rethinkdb. When the repository finishes downloading, use the dokku plugins-install command to install the plugin.

$ git clone https://github.com/stuartpb/dokku-rethinkdb-plugin rethinkdb
$ dokku plugins-install

Configure your application for deployment

Before you deploy an application, you will need to use Dokku to set up a linked RethinkDB container. While you are logged into the droplet as root, use the following command to set up a new RethinkDB instance:

$ dokku rethinkdb:create myapp

You can replace myapp with the name that you want to use for your application. When you deploy an application, Dokku will automatically link it with the RethinkDB container that has the same name. Now that you have created a RethinkDB container, it is time to deploy your first application.

Dokku supports a number of different programming languages and development stacks. It uses certain files in the project root directory to determine what dependencies to install and how to run the application. For a Ruby demo that I built with Sinatra, all I needed was a Gemfile and a config.ru. For a Node.js application built with Express, I used a package.json that included the dependencies and a start script.

You can also optionally use a Heroku-style Procfile to specify how to start the app. Dokku is largely compatible with Heroku, so you can refer to the Heroku docs to see what you need to do for other programming language stacks.

In the source code for your application, you will need to specify the host and port of the RethinkDB instance in the linked container. The RethinkDB Dokku plugin exposes those through environment variables called RDB_HOST and RDB_PORT. In my Ruby application, for example, I used the following code to connect to the database:

DBHOST = ENV["RDB_HOST"] || "localhost"
DBPORT = ENV["RDB_PORT"] || 28015

conn = r.connect :host => DBHOST, :port => DBPORT
...

After you finish configuring your application so that it will run in Dokku, be sure to commit your changes to your local git repository. To deploy the application, you will need to create a new remote:

$ git remote add dokku dokku@apps.mydomain.com:myapp

In the example above, use the domain or IP address of the droplet. Replace the word myapp with the name of your application. The name should match the one that you used when you created the RethinkDB container earlier.

Deploy your application

When you are ready to deploy the application, simply push to dokku:

$ git push dokku master

When you push the application, Dokku will automatically create a new container for it on the droplet, install the necessary dependencies, and start running the application. After the deployment process is complete, you will see the address in your output. If you used an IP address, it will just be the IP and port. If you used a domain, it will be a subdomain like myapp.apps.mydomain.com. Visit the site in a web browser to see if it worked correctly.

If your application didn't start correctly, you can log into the droplet to troubleshoot. Use the following command to see the logs emitted by the deploy process:

$ dokku logs myapp

Replace myapp with the name that you used for your application. That command will show you the log output, which should help you determine if there were any errors. If you want to delete the deployed application, perform the following command:

$ dokku delete myapp

You can type dokku help to see the full list of available commands. I also recommend looking at the advanced usage examples for the RethinkDB Dokku plugin to learn about other capabilities that it provides. You can, for example, expose the web console for a specific containerized RethinkDB instance through a public port on the host.

Although the initial setup process is a little bit involved, Dokku makes it extremely easy to deploy and run your RethinkDB applications. Be sure to check out our example projects if you are looking for a sample RethinkDB application to try deploying with Dokku.
