RethinkDB 1.11: query profiler, new streaming algorithm, devops enhancements

Today, we’re happy to announce RethinkDB 1.11 (Breakfast at Tiffany’s), which improves the experience of operating live RethinkDB deployments. Download it now!

The 1.11 release features more than 70 enhancements, including:

  • A new query profiler to analyze the performance of ReQL queries.
  • An improved streaming algorithm that reduces query latency.
  • DevOps enhancements, including new ReQL commands designed for operations

Upgrading to RethinkDB 1.11? Make sure to migrate your data before upgrading to RethinkDB 1.11.

Query profiler

Prior to RethinkDB 1.11, there was no way to analyze the performance of queries. Optimizing queries was a process of trial and error. As of the 1.11 release, RethinkDB includes a developer preview of a query profiler that will make this process a lot easier.

You can enable the query profiler on a given query by passing the profile=True option to run, or by using the Profile tab in the Data Explorer.

r.table('foo').sample(1).run(profile=True)

When you run a query with profiling, we return the query’s result, along with a trace of its execution.

[
  {
    "description": "Evaluating sample.",
    "duration(ms)": 1.320703,
    "sub_tasks": [
      {
        "description": "Evaluating datum.",
        "duration(ms)": 0.001529,
        "sub_tasks": []
      },
      {
        "description": "Evaluating table.",
        "duration(ms)": 0.097089,
        "sub_tasks": [
            ...
        ]
      },
      {
        "description": "Sampling elements.",
        "mean_duration(ms)": 0.160003,
        "n_samples": 7
      }
    ]
  }
]

The trace includes a breakdown of operations performed on the cluster, the time for each operation, and information about which parts of the query were performed in parallel.

The query profiler is included in this release as a developer preview, so it is limited in scope. Coming releases will add important metrics like memory and disk usage, and will improve readability for complex ReQL commands.

Latency improvements

One of the goals of the 1.11 release was to reduce query latency. Much work has gone into identifying, understanding, and removing the sources of slowdowns. To better understand the behavior of the system under load we ran dozens of benchmarks, implemented a new coroutine profiler, and expanded backtraces to span over coroutine boundaries.

We were able to greatly improve the responsiveness of the server in many situations, such as while creating new indexes or during periods of high cluster traffic. Below are some of the important changes we made:

New streaming algorithm

Prior to version 1.11, RethinkDB used a fixed batch size for all types of documents. The batch size was fixed at 1MB for communication between the nodes in the cluster, and 1000 documents between the the server and the client. This implementation skewed the system towards high throughput at a significant cost to realtime latency.

In the 1.11 release, the batching is significantly improved. The new batching algorithm adjusts the size of each batch dependening on the document size and the query latency. In practice, this results in significantly speedups. Latency for queries returning very large documents is often reduced by more than 100x.

Improved write operations

In addition to the new streaming algorithm, RethinkDB 1.11 includes many other changes that improve performance for various workloads.

  • In previous versions, every write transaction caused at least three separate disk writes. For most transactions, we reduced the number of writes to two.
  • We added a new algorithm that merges certain separate writes (such as index writes) into a single operation. The algorithm is designed to improve the performance of RethinkDB on rotational drives, but it also improves latency and throughput on SSDs.
  • At the disk level, we introduced more parallelism that allows RethinkDB to read more data at once from the disk.
  • Clients can now request that rows be returned as JSON to bypass slow protobuf implementations (the official clients for Python, JavaScript, and Ruby now do this).

DevOps enhancements

One of the immediate goals for the development team is to improve the experience of running RethinkDB on live deployments. Our work toward this goal is guided by two simple principles:

  • The administrator should always be able to answer any question about the state of the cluster.
  • No single query or set of queries should be able to monopolize cluster resources.

We’ve added the following enhancements to the 1.11 release in pursuit of this goal.

Determining secondary index status

As of the 1.11 release, you no longer have to guess whether a newly created secondary index is ready for use. We added two new commands for observing the status of index creation: indexStatus and indexWait. As the names suggest, indexStatus allows determining the status of a newly created secondary index, and indexWait allows the client to wait until the secondary index is successfully created. See the API reference for more details.

Better control over soft durability

Prior to the 1.11 release users who chose to use soft durability (off by default, of course), had no way to ensure that the data they’d inserted in soft durability mode had been committed to disk. We’ve now added a new sync command for flushing soft durability writes to disk. Calling the sync command ensures that any data written in soft durability mode on a given table has been flushed to disk before the command returns. See the API reference for more details.

Getting to an LTS release

We’re still hard at work on our first LTS release, due in early 2014. In pursuit of that, our next few releases will continue to focus on performance and stability.

As previously mentioned, here is how the LTS release will be different from the beta releases we’ve been shipping until now:

  • It will go through a longer QA process that will include rigorous automated and manual testing
  • All known high impact bugs and performance issues will be solved
  • We’ll publish results of our tests on high demand, large scale workloads in a clustered environment
  • The LTS release will have an additional margin of safety since it will first be battle tested by our pilot customers
  • We will be offering commercial support, training, and consulting options

If you have feedback or questions about this process, we’d love to hear from you!

Help work on the 1.13 release: RethinkDB is hiring.