RethinkDB 1.12: simplified map/reduce, ARM port, new caching infrastructure

Today, we’re happy to announce RethinkDB 1.12 (The Wizard of Oz). Download it now!

With over 200 enhancements, the 1.12 release is one of our biggest to date. It includes:

  • Dramatically simplified map/reduce and aggregation commands.
  • Big improvements to caching that do away with long-standing stability and performance limitations.
  • A port to the ARM architecture.
  • Four new ReQL commands for object and string manipulation.
  • Dozens of bug fixes, stability enhancements, and performance improvements.

Upgrading to RethinkDB 1.12? Make sure to migrate your data before upgrading.

Please note a breaking change: the 1.12 release replaces the commands group_by and grouped_map_reduce with a single new command group. You will have to adapt your applications to this change when you upgrade. See the 1.12 migration guide for details.
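
If your application uses the old group_by command, the change will look roughly like this (the pre-1.12 syntax below is an approximate sketch from the old Python driver and may differ slightly for your driver version):

> r.table('plays').group_by('player', r.count).run(conn)  # before 1.12 (approximate old API)

> r.table('plays').group('player').count().run(conn)      # with 1.12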

Simplified map/reduce and aggregation

Let’s say you have a table plays where you keep track of gameplay outcomes for users of your game:

[{ play_id: 1, player: 'coffeemug', score: 100 },
 { play_id: 2, player: 'mlucy', score: 1000 },
 { play_id: 3, player: 'mlucy', score: 1200 },
 { play_id: 4, player: 'coffeemug', score: 200 }]
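
If you’d like to follow along with the Python driver, here is a minimal setup sketch (the connection parameters are placeholders for your own server, and we assume play_id is the primary key):

import rethinkdb as r

# Connect to a local RethinkDB server and create the sample table
conn = r.connect('localhost', 28015)
r.table_create('plays', primary_key='play_id').run(conn)
r.table('plays').insert([
    {'play_id': 1, 'player': 'coffeemug', 'score': 100},
    {'play_id': 2, 'player': 'mlucy', 'score': 1000},
    {'play_id': 3, 'player': 'mlucy', 'score': 1200},
    {'play_id': 4, 'player': 'coffeemug', 'score': 200}
]).run(conn)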

In RethinkDB, you could always count the number of games in the table by running a count command:

> r.table('plays').count().run(conn)
4

The built-in count command is a shortcut for a map/reduce query:

> r.table('plays').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
4

The new release removes the old group_by and grouped_map_reduce commands and replaces them with a single, much more powerful command called group. This command breaks a sequence of documents up into groups. Any commands chained after group are called on each group individually, rather than on all the documents in the sequence.

Let’s say we want to count the number of games for each player:

> r.table('plays').group('player').count().run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }

Of course, instead of using the shortcut, you could write out the full map/reduce query with the group command:

> r.table('plays').group('player').map(lambda x: 1).reduce(lambda x, y: x + y).run(conn)
{ 'mlucy': 2, 'coffeemug': 2 }

In addition to the already available aggregators like count, sum, and avg, the 1.12 release adds new aggregators min and max. You can now run all five aggregators on any sequence of documents or on groups, resulting in a unified, powerful API for data aggregation.
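
For example, here are a couple of sketches against the plays table above (exact output formatting may vary by driver):

> r.table('plays').group('player').avg('score').run(conn)
{ 'mlucy': 1100, 'coffeemug': 150 }

> r.table('plays').group('player').max('score').run(conn)
{ 'mlucy': { play_id: 3, player: 'mlucy', score: 1200 },
  'coffeemug': { play_id: 4, player: 'coffeemug', score: 200 } }

Note that max('score') returns the highest-scoring document in each group, not just the score itself.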

Chaining after group isn’t limited to built-in aggregators. We can chain any command, or series of commands, after the group command. For example, let’s get a random sample of two games from each player:

> r.table('plays').group('player').sample(2).run(conn)
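
Grouped results can also be turned back into an ordinary sequence with the new ungroup command, so you can keep operating on the groups themselves. Here is a sketch that sorts players by their number of plays (ungroup emits documents with group and reduction fields):

> r.table('plays').group('player').count().ungroup().order_by(r.desc('reduction')).run(conn)
[{ 'group': 'coffeemug', 'reduction': 2 }, { 'group': 'mlucy', 'reduction': 2 }]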

These examples only scratch the surface of what’s possible with group. Read more about the group command and the new map/reduce infrastructure.

Big improvements to caching

1.12 includes a lot of improvements to the caching infrastructure. The biggest user-facing change is that you no longer have to manually specify cache sizes for tables to prevent running over memory and into swap. Instead RethinkDB will adjust cache sizes for you on the fly, based on usage statistics for different tables and the amount of memory available on your system.

We’ve also made a lot of changes under the hood to help with various stability problems users have been reporting. Strenuous workloads and exotic cluster configurations are much less likely to cause trouble in 1.12.

A port to ARM

Four months ago David Thomas (@davidthomas426 on GitHub) contributed a pull request with the changes necessary to compile and run RethinkDB on ARM. After months of testing and various additional fixes, the ARM port has been merged into RethinkDB mainline.

You shouldn’t have to do anything special. Just run ./configure and make as you normally would:

$ ./configure --allow-fetch
$ make

Note that ARM support is experimental, and there are still some issues (such as #239) to work out.

Special thanks to David for the port, and to the many folks who did the testing that made the merge possible!

Object and string manipulation commands

The 1.12 release includes new commands for string manipulation and object creation.

Firstly, ReQL now includes commands for changing the case of strings:

> r.expr('Hello World').downcase().run(conn)
'hello world'

> r.expr('Hello World').upcase().run(conn)
'HELLO WORLD'

We also added a split command for breaking up strings, which behaves similarly to the native Python split:

> r.expr('Hello World').split().run(conn)
['Hello', 'World']

> r.expr('Hello, World').split(',').run(conn)
['Hello', ' World']
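
The command also appears to accept a maximum number of splits, mirroring Python’s optional maxsplit argument:

> r.expr('Hello, World, Again').split(',', 1).run(conn)
['Hello', ' World, Again']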

Finally, the 1.12 release includes an object command that allows programmatically creating JSON objects from key-value pairs:

> r.object('a', 1, 'b', 2).run(conn)
{ 'a': 1, 'b': 2 }
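
Because the keys can be computed inside a query, object is handy for building documents with dynamic field names. For instance, here is a small sketch against the plays table from earlier (document order may vary):

> r.table('plays').map(lambda play: r.object(play['player'], play['score'])).run(conn)
[{ 'coffeemug': 100 }, { 'mlucy': 1000 }, { 'mlucy': 1200 }, { 'coffeemug': 200 }]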

You can learn more about the commands in the API documentation.

Performance and stability improvements

In addition to stability work by almost everyone on the RethinkDB team, @danielmewes has spent the past four months working almost entirely on stability and performance improvements. He uncovered and fixed dozens of latency and memory problems, stability issues with long-running clusters, and slowdowns under highly concurrent workloads.

The issues mentioned above are only a very small sample of the stability fixes that ship with the 1.12 release.

See the full list of enhancements, and take the new release for a spin!

Help work on the 1.13 release: RethinkDB is hiring.