Upcoming RethinkDB events for October and November

Join the RethinkDB team at the following upcoming events:

RethinkDB at HTML5DevConf

October 21-22, Moscone Center

RethinkDB will be in San Francisco next week for HTML5DevConf, a popular event for frontend web developers. Conference attendees can find us at table 29 in the expo hall, where you can see our latest demo apps and meet RethinkDB co-founder Slava Akhmechet. We will also have some fun RethinkDB goodies on hand to give away, including shirts and stickers.

Webinar with Compose

Wednesday, October 22 at 1:30 PM Pacific

Our friends at Compose recently introduced a new service that provides managed RethinkDB hosting in the cloud. They have published several guides to help new users get started with the service. If you would like to learn more, be sure to catch our joint webinar with Compose next week. The live video event will feature Slava Akhmechet and Compose co-founder Kurt Mackey.

RSVP Here »

RethinkDB at DevCon5

November 18-19, San Jose Convention Center

RethinkDB will be at the HTML5 Communications Summit next month in San Jose. Slava will present a talk about real-time web application development with RethinkDB. We will also have a booth where you can see RethinkDB demos, meet members of the team, and get some nice RethinkDB goodies to bring home.

Move Fast and Break Things meetup

Wednesday, November 19 at 6:30 PM, Heavybit Industries, 325 Ninth Street (map)

RethinkDB will give a presentation for the Move Fast and Break Things meetup group in San Francisco. Learn how the RethinkDB team works, including details about the tools and collaborative processes that we use to deliver new RethinkDB releases. More details about the event will be available as it approaches.

RSVP Here »

Hosted RethinkDB deployments in the cloud now available from Compose

We are pleased to announce that our friends at Compose now offer RethinkDB hosting in the cloud. Their new service lets you get a managed RethinkDB deployment in a matter of seconds, providing a fast and easy way to start working on your RethinkDB project without the overhead of managing your own infrastructure or provisioning your own cluster.

Compose, formerly known as MongoHQ, is a dedicated Database as a Service (DBaaS) company. RethinkDB is the third database in their product lineup, launching alongside their existing support for MongoDB and Elasticsearch. Available today as a public beta, their hosted RethinkDB deployments come with automatic scaling and backups.

Each deployment provided by Compose is configured as a high-availability cluster with full redundancy. Their elastic provisioning service manages the entire environment, scaling deployments as needed to accommodate user workloads. Pricing starts at $45 per month for a three-node cluster with 2GB of storage capacity.

Migrate data from a MongoDB deployment

In addition to elastic scaling, Compose offers a data migration system called Transporter. If you have data in an existing MongoDB deployment managed by Compose, you can seamlessly import it into a RethinkDB deployment.

The import can be a one-time event or maintained on an ongoing basis with continuous updates—regularly pulling the latest changes into RethinkDB from your MongoDB deployment. If you have an existing MongoDB application that you would like to consider migrating to RethinkDB, Compose makes it really easy to get started.

Get started with Compose

To create a hosted RethinkDB instance, click the Add Deployment button in the Compose admin panel and select RethinkDB. Simply enter a name for the deployment—Compose handles the rest. You will need to input billing information for your Compose account if you have not done so previously.

Each RethinkDB deployment hosted by Compose has its own private network. Compose uses SSH tunneling to provide secure access to a hosted cluster. When you create a RethinkDB deployment in the Compose admin console, it will give you the host and port information that you need to connect.

Once you set up the SSH tunnel on your client system, you can work with the hosted RethinkDB instance in much the same way you would work with a local installation of the database. Even the RethinkDB admin console and Data Explorer operate as expected.
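
For example, once the tunnel is running, connecting with the JavaScript driver looks just like connecting to a local server. Here is a minimal sketch; the local port below is a placeholder for whatever port your tunnel forwards:

var r = require("rethinkdb");

// The SSH tunnel forwards a local port to the hosted cluster, so the
// driver connects to localhost (the port is a placeholder)
r.connect({host: "localhost", port: 10103}, function(err, conn) {
  if (err) throw err;
  r.dbList().run(conn, function(err, result) {
    if (err) throw err;
    console.log("databases:", result);
    conn.close();
  });
});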

Building your next application with RethinkDB couldn't be easier. Register an account at Compose.io and get started right away. For more details, check out the guides that Compose has published for the service.

BeerThink: infinite scrolling in a mobile app with Ionic, Node.js, and RethinkDB

Developers often use pagination to display large collections of data. An application can fetch content in batches as needed, presenting a fixed number of records at a time. On the frontend, paginated user interfaces typically provide something like "next" and "previous" navigation buttons so that users can move through the data set. In modern mobile apps, it is increasingly common to implement an infinite scrolling user interface on top of paginated data. As the user scrolls through a list, the application fetches and appends new records.

To demonstrate the use of pagination in RethinkDB applications, I made a simple mobile app called BeerThink. It displays a list of beers and breweries, providing a detailed summary when the user taps an item. The app uses a data dump from the Open Beer Database, which contains information about roughly 4,400 beers and 1,200 breweries. I converted the data to JSON so that it is easy to import into RethinkDB. There are two tables, one for beers and one for breweries. The application uses RethinkDB's support for table joins to correlate the beers with their respective breweries.

BeerThink's backend is built with Node.js and Express. It exposes beer and brewery data retrieved from a RethinkDB database, providing a paginated API that returns 50 records at a time.

The BeerThink frontend is built with Ionic, a popular AngularJS-based JavaScript framework designed for mobile web apps. BeerThink uses an infinite scrolling list to present the beers in alphabetical order.

BeerThink's architecture aligns with the API-first approach used by many modern mobile web applications. The backend is solely an API layer, completely decoupled from the frontend. The frontend is a single-page web application designed to consume the backend API. This approach makes it easy to build multiple frontend experiences on top of the same backend. You could, for example, easily make native desktop and mobile applications that consume the same backend API.

This tutorial demonstrates how BeerThink's pagination works at each layer of the stack: the RethinkDB database, the Node backend, and the Ionic client application.

Efficient pagination in RethinkDB

If you'd like to follow along and try the pagination queries yourself, create a table and then use the r.http command to add the beer list to a database:

r.table("beers").insert(r.http("https://raw.githubusercontent.com/rethinkdb/beerthink/master/data/beers.json", {result_format: "json"}))

To efficiently alphabetize and paginate the beer list, you first need to create an index on the name property:

r.table("beers").indexCreate("name")

After creating the index, you can use it in the orderBy command to fetch an alphabetized list of names:

r.table("beers").orderBy({index: "name"})

When paginating records from a database, you want to be able to obtain a subset of ordered table records. In a conventional SQL environment, you might accomplish that by using OFFSET and LIMIT. RethinkDB's skip and limit commands are serviceable equivalents, but the skip command doesn't offer optimal performance.
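
For comparison, a skip-based version of the fourth page of 50 records looks like this; it works, but the server still has to walk over all of the skipped rows:

r.table("beers").orderBy({index: "name"}).skip(150).limit(50)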

The between command, which is commonly used to fetch all documents that are between two keys in a table, is a much more efficient way to get the start position of a table subset. You can optionally specify a secondary index when using the between command, which means that it can operate on the indexed name property of the beers table.

The following example shows how to use the between command on the name index to get all of the beers between "Petrus Speciale" and "Plank Road Pale Ale" in alphabetical order:

r.table("beers")
  .between("Petrus Speciale", "Plank Road Pale Ale", {index: "name"})
  .orderBy({index: "name"})

When the BeerThink application starts, it uses orderBy and limit to fetch the first page of data. To get subsequent pages, it uses the between and limit commands, supplying as the between command's start position the indexed name of the last item fetched on the previous page.
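
The first-page query is simply the ordered table with a limit:

r.table("beers").orderBy({index: "name"}).limit(50)

A subsequent page, starting after "Petrus Speciale", looks like this: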

r.table("beers")
  .between("Petrus Speciale", null, {leftBound: "open", index: "name"})
  .orderBy({index: "name"}).limit(50)

The example above shows how to fetch 50 records, starting from a particular beer. Because the program doesn't actually know what beer will be at the end of the new page of data, the between command is given null as its closing index value. That will cause the between command to return everything from the start index to the end of the table. The query uses the limit command to get only the desired number of records.

Setting the value of the leftBound option to open tells the between command to omit the first record, the one that we use to define the start index. That's useful because the item is one that you already have at the end of your list---you don't want to add it again.

The slice command

The between command is a good way to implement pagination in many cases, but it isn't universally applicable. There are cases where you won't have the last item of the previous page to use as a starting point.

Consider a situation where you want the user to be able to visit an arbitrary page without first iterating through the entire set. You might, for example, want to build a web application that accepts an arbitrary page number as a URL path segment and returns the relevant results. In such cases, the best approach is to use the slice command.

The slice command takes a start index and an end index. To get 50 records that are 3000 records down from the top of the table, simply pass 3000 and 3050 as the parameters:

r.table("beers").orderBy({index: "name"}).slice(3000, 3050)

When the user requests an arbitrary page, you simply multiply by the number of items per page to determine the slice command's start and end positions:

query.slice((pageNumber - 1) * perPage, pageNumber * perPage)

In the example above, pageNumber is the requested page number and perPage is the number of items per page. Although the slice command isn't as fast as using between and limit, it is still much more efficient than using the skip command.

Pagination in BeerThink's API backend

The BeerThink backend is built with Node and Express. It provides simple API endpoints that are consumed by the frontend client application. The /beers endpoint provides the list of beers, 50 records at a time. The application also has a /breweries endpoint that similarly returns a list of breweries.

For pagination, the user can optionally pass a last URL query parameter with the name of the most recently-fetched item. Both API endpoints support the same pagination mechanism. Taking advantage of the ReQL query language's composability, I generalized the operation that I use for pagination into a function that I can apply to any table index:

function paginate(table, index, limit, last) {
  return (!last ? table : table
    .between(last, null, {leftBound: "open", index: index}))
  .orderBy({index: index}).limit(limit)
}

The table parameter takes a RethinkDB expression that references a table. The index parameter is the name of the table index on which to operate. The limit parameter is the total number of desired items. The last parameter is the item to use to find the start of the page. If the last parameter is null or undefined, the application will fetch the first page of data instead of applying the between command.

In the /breweries endpoint, apply the paginate function to the breweries table. Use the req.param method provided by Express to get the URL query parameter that has the value of the last list item. If the user didn't provide the URL query parameter, the value will be undefined. All you have to do is run the query and give the user the JSON results:

app.get("/breweries", function(req, res) {
  var last = req.param("last");

  paginate(r.table("breweries"), "name", 50, last).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

The /beers endpoint is implemented in exactly the same way as the /breweries endpoint, using the same paginate function defined above. The query is a little more complex, however, because it has to use an eqJoin operation to get the brewery for each beer:

app.get("/beers", function(req, res) {
  var last = req.param("last");

  paginate(r.table("beers"), "name", 50, last)
    .eqJoin("brewery_id", r.table("breweries"))
    .map(function(item) {
      return item("left").merge({"brewery": item("right")})
    }).without("brewery_id").run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

Even though the two endpoints use different queries, the same pagination function works well for both. Abstracting common ReQL patterns into reusable functions can greatly simplify your code. If you wanted to make it possible for the client to specify how many records are returned for each page, you could easily achieve that by taking another request variable and passing it to the paginate function as the value of the limit parameter.
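
For example, a hypothetical count query parameter could drive the page size, capped so that a client can't request unbounded results. This is a sketch, not part of the actual BeerThink code:

app.get("/breweries", function(req, res) {
  var last = req.param("last");
  // "count" is a hypothetical query parameter; default to 50, cap at 100
  var limit = Math.min(parseInt(req.param("count"), 10) || 50, 100);

  paginate(r.table("breweries"), "name", limit, last).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});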

Slice-style pagination on the backend

Although the between command is the best approach to use for pagination in the BeerThink application, the slice command is also easy to implement on the backend. I've included a brief explanation here for those who would like to see an example.

When you define a URL handler in Express, you can use a colon to signify that a particular URL segment is a variable. If you define the breweries endpoint as /breweries/:page?, the page number passed in the URL segment will be assigned to the request's page parameter (the trailing ? makes the segment optional).

In the handler for the endpoint, use parseInt or a plus sign to coerce the page number into an integer that can be passed into the ReQL query. Next, use the orderBy command to alphabetize the breweries. Finally, use the slice command with the page number and item count to fetch the desired subset of items.

app.get("/breweries/:page", function(req, res) {
  var pageNum = parseInt(req.params.page) || 1;

  r.table("breweries").orderBy({index: "name"})
    .slice((pageNum - 1) * 50, pageNum * 50).run(req.db)
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(output) { res.json(output); })
  .error(function(err) {
    res.status(500).json({error: err});
  });
});

If the user browses to /breweries/3, the application will give them the third page of brewery data formatted in JSON. In the example above, you might notice that the code assigns a default value of 1 to the pageNum variable if a page number wasn't provided with the request. That makes it so visiting /breweries by itself, without a page URL segment, will return the first page of data.

Consuming the paginated API in Ionic

Now that the endpoint is defined, the client can simply iterate through the pages as the user scrolls, adding each page of data to a continuous list. It's especially easy to accomplish with Ionic, because the framework includes an AngularJS directive called ion-infinite-scroll that you can use alongside any list view to easily implement infinite scrolling:

<ion-content>
  <ion-list>
    <ion-item collection-repeat="beer in items" ...>
      ...
    </ion-item>
  </ion-list>

  <ion-infinite-scroll on-infinite="fetchMore()" distance="25%">
  </ion-infinite-scroll>
</ion-content>

In the markup above, the framework will execute the code in the on-infinite attribute whenever the user scrolls to the position described in the distance attribute. In this case, the application will call the fetchMore method on the active scope whenever the user scrolls within 25% of the list's bottom.

In the associated AngularJS controller, the fetchMore method uses the $http service to retrieve the next page of data. It passes the name property of the most recently-fetched list item as the last URL query parameter, telling the backend which page to return.

app.controller("ListController", function($scope, $http) {
  $scope.items = [];
  var end = false;

  $scope.fetchMore = function() {
    if (end) return;

    var count = $scope.items.length;
    var params = count ? {"last": $scope.items[count-1].name} : {}

    $http.get("/beers", {params: params}).success(function(items) {
      if (items.length)
        Array.prototype.push.apply($scope.items, items);
      else end = true;
    }).error(function(err) {
      console.log("Failed to download list items:", err);
      end = true;
    }).finally(function() {
      $scope.$broadcast("scroll.infiniteScrollComplete");
    });
  };
});

Each time that the fetchMore function retrieves data, it appends the new records to the items scope variable. If the backend returns no data, the application assumes that it has reached the end of the list and will stop fetching additional pages. Similarly, it will stop fetching if it encounters an error. In a real-world application, you might want to handle errors more gracefully and make it so that the user can force a retry.
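
A minimal retry hook for the controller above might simply clear the end flag and fetch again. This is a hypothetical addition, not part of the code shown earlier:

// Hypothetical addition to ListController: let the user force a retry
$scope.retry = function() {
  end = false;
  $scope.fetchMore();
};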

The ion-item element in the HTML markup is bound to the items array, which means that new records will automatically display in the list. When I first built the application, I originally implemented the repeating list item with Angular's ng-repeat directive. I soon discovered that ng-repeat doesn't scale very well to lists with thousands of items---scrolling performance wasn't very good and switching back from the beer detail view was positively glacial.

I eventually switched to Ionic's relatively new collection-repeat directive, which is modeled after the cell reuse techniques found in native mobile frameworks. Adopting collection-repeat substantially improved scrolling performance and eliminated the detail view lag. If you are building mobile web apps with infinite scrolling lists that will house thousands of items, I highly recommend collection-repeat.

Going further

The application has a number of other features that are beyond the scope of this article, but you can get the source code from GitHub and have a look if you would like to learn more.

Install RethinkDB and check out the 10-minute intro guide to start building your first project.

Building an earthquake map with RethinkDB and GeoJSON

RethinkDB 1.15 introduced new geospatial features that can help you plot a course for smarter location-based applications. The database has new geographical types, including points, lines, and polygons. Geospatial queries make it easy to compute the distance between points, detect intersecting regions, and more. RethinkDB stores geographical types in a format that conforms to the GeoJSON standard.

Developers can take advantage of the new geospatial support to simplify the development of a wide range of potential applications, from location-aware mobile experiences to specialized GIS research platforms. This tutorial demonstrates how to build an earthquake map using RethinkDB's new geospatial support and an open data feed hosted by the USGS.

Fetch and process the earthquake data

The USGS publishes a global feed that includes data about every earthquake detected over the past 30 days. The feed is updated with the latest earthquakes every 15 minutes. This tutorial uses a version of the feed that only includes earthquakes that have a magnitude of 2.5 or higher.

In the RethinkDB administrative console, use the r.http command to fetch the data:

r.http("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson")

The feed includes an array of geographical points that represent earthquake epicenters. Each point comes with additional metadata, such as the magnitude and time of the associated seismic event. You can see a sample earthquake record below:

{
  id: "ak11383733",
  type: "Feature",
  properties: {
    mag: 3.3,
    place: "152km NNE of Cape Yakataga, Alaska",
    time: 1410213468000,
    updated: 1410215418958,
    ...
  },
  geometry: {
    type: "Point",
    coordinates: [-141.1103, 61.2728, 6.7]
  }
}

The next step is transforming the data and inserting it into a table. In cases where you have raw GeoJSON data, you can typically just wrap it with the r.geojson command to convert it into native geographical types. The USGS earthquake data, however, uses a non-standard triple value for coordinates, which isn't supported by RethinkDB. In such cases, or in situations where you have coordinates that are not in standard GeoJSON notation, you will typically use commands like r.point and r.polygon to create geographical types.
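
For data that does use standard two-value coordinates, the wrapping step is a one-liner; a sketch using the longitude and latitude from the sample record above:

r.geojson({type: "Point", coordinates: [-141.1103, 61.2728]})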

Using the merge command, you can iterate over earthquake records from the USGS feed and replace the value of the geometry property with an actual point object. The output of the merge command can be passed directly to the insert command on the table where you want to store the data:

r.table("quakes").insert(
  r.http("earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson")("features")
    .merge(function(quake) {
      return {
        geometry: r.point(
          quake("geometry")("coordinates")(0),
          quake("geometry")("coordinates")(1))
      }
    })
  )

The r.point command takes longitude as the first parameter and latitude as the second parameter, just like GeoJSON coordinate arrays. In the example above, the r.point command is passed the coordinate values from the earthquake object's geometry property.

As you can see, it's easy to load content from remote data sources into RethinkDB. You can even use the query language to perform relatively sophisticated data transformations on the fetched data before inserting it into a table.

Perform geospatial queries

The next step is to create an index on the geometry property. Use the indexCreate command with the geo option to create an index that supports geospatial queries:

r.table("quakes").indexCreate("geometry", {geo: true})

Now that there is an index, try querying the data. For the first query, try fetching a list of all the earthquakes that took place within 200 miles of Tokyo:

r.table('quakes').getIntersecting(
  r.circle([139.69, 35.68], 200,
    {unit: "mi"}), {index: "geometry"})

In the example above, the getIntersecting command will find all of the records in the quakes table that have a geographic object stored in the geometry property that intersects with the specified circle. The r.circle command creates a polygon that approximates a circle with the desired radius and center point. The unit option tells the r.circle command to use a particular unit of measurement (miles, in this case) to compute the radius. The coordinates used in the above example correspond to the longitude and latitude of Tokyo.

Let's say that you wanted to get the largest earthquake for each individual day. To organize the earthquakes by day, use the group command on the date. To get the largest from each day, you can chain the max command and have it operate on the magnitude property.

r.table("quakes").group(r.epochTime(
    r.row("properties")("time").div(1000)).date())
  .max(r.row("properties")("mag"))

The USGS data uses timestamps that are counted in milliseconds since the UNIX epoch. In the query above, div(1000) is used to normalize the value so that it can be interpreted by the r.epochTime command. It's also worth noting that commands chained after a group operation will automatically be performed on the contents of each individual group.
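
For example, the timestamp from the sample record above converts like this (dividing client-side for clarity):

r.epochTime(1410213468000 / 1000).date()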

Build a simple API backend

The earthquake map application has a simple backend built with Node.js and Express. It implements several API endpoints that client applications can access to fetch data. Create a /quakes endpoint, which returns a list of earthquakes ordered by magnitude:

var r = require("rethinkdb");
var express = require("express");

var app = express();
app.use(express.static(__dirname + "/public"));

var configDatabase = {
  db: "quake",
  host: "localhost",
  port: 28015
}

app.get("/quakes", function(req, res) {
  r.connect(configDatabase).then(function(conn) {
    this.conn = conn;

    return r.table("quakes").orderBy(
      r.desc(r.row("properties")("mag"))).run(conn);
  })
  .then(function(cursor) { return cursor.toArray(); })
  .then(function(result) { res.json(result); })
  .finally(function() {
    if (this.conn)
      this.conn.close();
  });
});

app.listen(8081);

Add an endpoint called /nearest, which will take latitude and longitude values passed as URL query parameters and return the earthquake that is closest to the provided coordinates:

app.get("/nearest", function(req, res) {
  var latitude = req.param("latitude");
  var longitude = req.param("longitude");

  if (!latitude || !longitude)
    return res.json({err: "Invalid Point"});

  r.connect(configDatabase).then(function(conn) {
    this.conn = conn;

    return r.table("quakes").getNearest(
      r.point(parseFloat(longitude), parseFloat(latitude)),
      { index: "geometry", unit: "mi" }).run(conn);
  })
  .then(function(result) { res.json(result); })
  .finally(function(result) {
    if (this.conn)
      this.conn.close();
  });
});

The r.point command in the code above is given the latitude and longitude values that the user included in the URL query. Because URL query parameters are strings, you need to use the parseFloat function (or a plus sign prefix) to coerce them into numbers. The query is performed against the geometry index.

In addition to returning the closest item, the getNearest command also returns the distance. When using the unit option in the getNearest command, the distance is converted into the desired unit of measurement.
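
Put together, the /nearest response looks something like this (the shape follows the getNearest command's output; the values are illustrative):

[
  {
    "dist": 53.12,
    "doc": {
      "id": "ak11383733",
      "geometry": { ... },
      "properties": { ... }
    }
  }
]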

Build a frontend with AngularJS and Leaflet

The earthquake application's frontend is built with AngularJS, a popular JavaScript MVC framework. The map is implemented with the Leaflet library and uses tiles provided by the OpenStreetMap project.

Using the AngularJS $http service, retrieve the JSON quake list from the Node.js backend, create a map marker for each earthquake, and assign the array of earthquake objects to a variable in the current scope:

$scope.fetchQuakes = function() {
  $http.get("/quakes").success(function(quakes) {
    for (var i in quakes)
      quakes[i].marker = L.circleMarker(L.latLng(
        quakes[i].geometry.coordinates[1],
        quakes[i].geometry.coordinates[0]), {
        radius: quakes[i].properties.mag * 2,
        fillColor: "#616161", color: "#616161"
      });

    $scope.quakes = quakes;
  });
};

To display the points on the map, use Angular's $watchCollection to apply or remove markers as needed when a change is observed in the contents of the quakes array.

$scope.map = L.map("map").setView([0, 0], 2);
$scope.map.addLayer(L.tileLayer(mapTiles, {attribution: mapAttrib}));

$scope.$watchCollection("quakes",
  function(addItems, removeItems) {
    if (removeItems && removeItems.length)
      for (var i in removeItems)
        $scope.map.removeLayer(removeItems[i].marker);

    if (addItems && addItems.length)
      for (var i in addItems)
        $scope.map.addLayer(addItems[i].marker);
  }
);

You could just call $scope.map.addLayer in the fetchQuakes method to add markers directly as they are created, but using $watchCollection is more idiomatically appropriate for AngularJS---if the application adds or removes items from the array later, it will dynamically add or remove the corresponding place markers on the map.

The application also displays a sidebar with a list of earthquakes. Clicking on an item in the list will focus the associated point on the map. That part of the application was relatively straightforward, built with a simple ng-repeat that binds to the quakes array.

To complete the application, the last feature to add is support for plotting the user's own location on the map and indicating which earthquake in the list is the closest to their position.

The HTML5 Geolocation standard introduced a browser method called geolocation.getCurrentPosition that provides coordinates of the user's current location. In the callback for that method, assign the received coordinates to the userLocation variable in the current scope. Next, use the $http service to send the coordinates to the /nearest endpoint.

$scope.updateUserLocation = function() {
  navigator.geolocation.getCurrentPosition(function(position) {
    $scope.userLocation = position.coords;

    $http.get("/nearest", {params: position.coords})
      .success(function(output) {
        if (output.length)
          $scope.nearest = output[0].doc;
      });
  });
};

To display the user's position on the map, use $watch to observe for changes to the value of userLocation. When it changes, create a new place marker at the user's coordinates.

$scope.$watch("userLocation", function(newVal, oldVal) {
  if (!newVal) return;

  if ($scope.userMarker)
    $scope.map.removeLayer($scope.userMarker);

  var point = L.latLng(newVal.latitude, newVal.longitude);
  $scope.userMarker = L.marker(point, {
    icon: L.icon({iconUrl: "mark.png"})
  });

  $scope.map.addLayer($scope.userMarker);
});

Put a pin in it

To view the complete source code, you can check out the repository on GitHub. To try the example, run npm install in the root directory and then execute the application by running node app.js.

To learn more about using geospatial queries in RethinkDB, check out the documentation. Geospatial support is only one of the great new features introduced in RethinkDB 1.15. Be sure to read the release announcement to get the whole story.

RethinkDB 1.15: Geospatial queries

Today, we're happy to announce RethinkDB 1.15. Download it now!

The 1.15 release includes over 50 enhancements and introduces geospatial queries to RethinkDB. This has been by far the most requested feature by RethinkDB users. In addition, we've sped up many queries dramatically by lazily deserializing data from disk. This release also brings a new r.uuid command that allows server-side generation of UUIDs.

Thanks primarily to Daniel Mewes, RethinkDB now has rich geospatial features including:

  • r.geojson and r.to_geojson for importing and exporting GeoJSON
  • Commands to create points, lines, polygons and circles
  • Geospatial queries:
    • get_intersecting: finds all documents that intersect with a given geometric object
    • get_nearest: finds the closest documents to a point
  • Geospatial indexes to make get_intersecting and get_nearest blindingly fast
  • Functions that operate on geometry:
    • r.distance: gets the distance between a point and another geometric object
    • r.intersects: determines whether two geometric objects intersect
    • r.includes: tests whether one geometric object is completely contained in another
    • r.fill: converts a line into a polygon
    • r.polygon_sub: subtracts one polygon from another

If you're upgrading from version 1.12 or earlier, you will need to migrate your data one last time.

If you're coming from 1.13, you don't need to migrate your data but you may need to recreate your indexes.

Upgrading on Ubuntu? If you're upgrading from 1.12 or earlier, first set up the new RethinkDB PPA.

Using geospatial queries

Let's insert a couple of locations into RethinkDB:

> r.table('geo').insert([
  {
    'id': 1,
    'name': 'San Francisco',
    'location': r.point(-122.423246, 37.779388)
  },
  {
    'id': 2,
    'name': 'San Diego',
    'location': r.point(-117.220406, 32.719464)
  }
]).run(conn)

Throughout RethinkDB, all coordinates are entered as longitude/latitude to be consistent with GeoJSON.

In order for geospatial queries to return these points as results, we need to create a geospatial index:

> r.table('geo').index_create('location', geo=True).run(conn)

Now, let's find which of these cities is nearest to a given point — for example, Santa Maria, CA:

> r.table('geo').get_nearest(
    r.point(-120.4333, 34.9514),  # Santa Maria's long/lat
    index='location',
    max_dist=300,
    unit='mi',
    max_results=1).run(conn)

[{"doc": {
    "id": 1,
    "name": "San Francisco",
    "location": {
      "$reql_type$": "GEOMETRY",
      "type": "Point",
      "coordinates": [-122.423246, 37.779388] }},
  "dist": 224.34241555826364 }]

We see that Santa Maria is about 224 miles from San Francisco. Note that RethinkDB returns the matched document, as well as the distance to the original point.

We can also find all geometric shapes that intersect with a polygon. This is useful when you're given a viewing window, and need to return all geometry that's inside the window:

def query_view_window(top, bottom, left, right):
    # top and bottom are latitudes, left and right are longitudes
    bounding_box = r.polygon(
        r.point(left, top),
        r.point(right, top),
        r.point(right, bottom),
        r.point(left, bottom))
    return r.table('geo').get_intersecting(bounding_box, index='location').run(conn)

Going further

For the full details, read the in-depth article on geospatial support by Watts Martin.

In addition, check out an example web application that uses RethinkDB to dynamically load street maps and points of interest.

Faster queries

Prior to the 1.15 release, every time a query touched a document RethinkDB would pull the entire document from disk and deserialize it into a full ReQL data structure in memory.

In RethinkDB 1.15, the database intelligently deserializes only portions of the document when they become necessary. If a field isn't required by the query, RethinkDB no longer spends time looking at it. This speeds up queries that only need part of a document, most notoriously count.
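
For example, a count only needs to tally rows, so it no longer deserializes any document fields at all:

r.table("quakes").count().run(conn)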

You should see performance increases for:

  • analytic queries that only need summary information
  • queries that don't touch every part of a document

In our (unscientific) tests, we saw performance improvements of around 15% for simple read queries, 2x for analytic queries, and 50x for count queries on tables. We'll be publishing scientific benchmarks soon, but in the meantime, enjoy the better performance!

Server-side UUIDs

The new r.uuid command lets you generate server-side UUIDs wherever you like.

Let's say that when you create a new player they get a default item in their inventory. Additionally, each item needs a unique identifier:

> r.table("player").insert({
      "name": player_name,
      "inventory": [{
          "item_type": "potion",
          "item_id": r.uuid(),
      }]
  }, non_atomic=True, return_changes=True).run(conn)

This will return:

{ "inserted": 1,
  "changes": [
    {
      "new_val": {
        "id": "063ab596-543e-45a7-904f-c3fafa96bf42",
        "name": "for_my_friends",
        "inventory": [{
          "item_type": "potion",
          "item_id": "e985d732-c2ac-40a4-bf19-9b4946632859",
        }]
      },
      "old_val": null
    }
  ]
}

RethinkDB has always created a UUID automatically for the primary key if it isn't specified in the inserted document, but now we can generate UUIDs for embedded documents as well. You can get the generated keys by using return_changes=True.

Since UUID generation is random (and therefore can't be done atomically), you'll need to add the non_atomic=True flag to any update or insert that uses r.uuid.

Next steps

See the full list of enhancements, and take the new release for a spin!

The team is already hard at work on the upcoming 1.16 release that will focus on more flexible changefeeds. As always, if there is something you'd like us to prioritize or if you have any feedback on the release, please let us know!

Help work on the 1.16 release: RethinkDB is hiring.