RethinkDB performance data.

It’s been a busy and exciting week since we announced RethinkDB. Of all the feedback we received, the most common request was for performance numbers. Before the launch, our top priority was correctness: we spent most of our time testing RethinkDB with WordPress and adding the missing features. As a result, performance suffered. In the past week we tuned the engine back up to high performance. We’re still far from finished with the improvements we want to make, but we feel that we’ve reached a level of performance we can be proud to display.

We wrote our original benchmarking tool in Python, but during our latest benchmarks we noticed that it was taking about as much time as the engine itself, hiding our real performance numbers. We now have a very small Objective-C program (<900 lines) that uses prepared statements in a tight loop and times only the mysql_stmt_execute() call. For inserts, the benchmark creates a table with three INT columns, two of them indexed, and performs N random (non-duplicate) INSERTs of (k,k,k) in a loop. For selects, it performs N random indexed point queries. An optional number of SELECT threads can run as well, each doing repeated indexed point queries for as long as the main timed thread executes.
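The measurement approach described above can be sketched roughly as follows. This is not our Objective-C tool: it is a minimal Python illustration that swaps in the standard-library sqlite3 module and an in-memory database for MySQL, so the table name `bench`, the column names, the index names, and both function names are assumptions made for the example. What it is meant to show is the shape of the loop: keys are generated up front, and the timer wraps only the execute call, so client-side work stays out of the measurement. The optional reader threads are omitted.

```python
import random
import sqlite3
import time


def run_insert_benchmark(n_rows, seed=0):
    """Insert n_rows distinct random keys, timing only the execute call."""
    # Stand-in for the MySQL table: three integer columns, two of them indexed.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bench (a INTEGER, b INTEGER, c INTEGER)")
    conn.execute("CREATE INDEX idx_a ON bench (a)")
    conn.execute("CREATE INDEX idx_b ON bench (b)")

    # Generate the non-duplicate random keys before the timed loop starts.
    rng = random.Random(seed)
    keys = rng.sample(range(10 * n_rows), n_rows)

    insert_sql = "INSERT INTO bench (a, b, c) VALUES (?, ?, ?)"
    elapsed = 0.0
    for k in keys:
        params = (k, k, k)  # built outside the timed region
        start = time.perf_counter()
        conn.execute(insert_sql, params)
        elapsed += time.perf_counter() - start
    conn.commit()
    return conn, keys, elapsed


def run_select_benchmark(conn, keys, n_queries, seed=1):
    """Run n_queries random indexed point queries, timing only execute."""
    rng = random.Random(seed)
    select_sql = "SELECT a, b, c FROM bench WHERE a = ?"
    elapsed = 0.0
    hits = 0
    for _ in range(n_queries):
        k = rng.choice(keys)  # key selection is not timed
        start = time.perf_counter()
        cursor = conn.execute(select_sql, (k,))
        elapsed += time.perf_counter() - start
        if cursor.fetchone() is not None:
            hits += 1
    return hits, elapsed


if __name__ == "__main__":
    n = 10_000
    conn, keys, insert_time = run_insert_benchmark(n)
    hits, select_time = run_select_benchmark(conn, keys, n)
    print(f"inserts: {n / insert_time:.1f} rows/sec")
    print(f"selects: {hits / select_time:.1f} queries/sec")
```

The absolute numbers this prints say nothing about MySQL or RethinkDB; the sketch only illustrates why moving key generation and bookkeeping out of the timed region matters when the client is slow relative to the engine.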

The benchmarks were run on a 2.5 GHz Pentium Core 2 Duo machine with 2 GB RAM, on a 16 GB SUPER TALENT MasterDrive, an MLC solid state drive, connected via a 3 Gb/s SATA II bus. RethinkDB and MyISAM were run with the stock config options. We ran the InnoDB test by starting the server with --innodb_flush_log_at_trx_commit=0 --innodb_support_xa=0 --innodb_buffer_pool_size=1536M.

Here are the results:

Insert benchmark with no readers.

For insert performance, RethinkDB maintains a roughly 10x improvement in throughput over MyISAM, with an average of 24534.597 rows/sec up to 2,000,000 rows, while InnoDB handles 8527.424 rows/sec and MyISAM manages only 2483.277 rows/sec. With more frequent measurements, we can see that InnoDB and MyISAM maintain generally high throughput but pause periodically for long stretches of time. We believe that this is due to their B-tree structure, which needs to expand once in a while, a time-consuming operation that greatly undermines their overall performance.
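For readers who want to check the ratios, the arithmetic behind the averages above works out as in this small sketch. The throughput figures are the ones quoted in this post; the dictionary and function names are ours, made up for the example, and the load-time estimate simply divides 2,000,000 rows by each average, so it reflects time inside the timed calls rather than wall-clock time.

```python
# Average insert throughput in rows/sec, as quoted in the post.
THROUGHPUT = {
    "RethinkDB": 24534.597,
    "InnoDB": 8527.424,
    "MyISAM": 2483.277,
}
TOTAL_ROWS = 2_000_000


def speedup(engine, baseline):
    """Ratio of one engine's average throughput to another's."""
    return THROUGHPUT[engine] / THROUGHPUT[baseline]


def seconds_to_load(engine, rows=TOTAL_ROWS):
    """Time spent in timed insert calls if the average rate holds throughout."""
    return rows / THROUGHPUT[engine]


if __name__ == "__main__":
    print(f"RethinkDB vs MyISAM: {speedup('RethinkDB', 'MyISAM'):.2f}x")
    print(f"RethinkDB vs InnoDB: {speedup('RethinkDB', 'InnoDB'):.2f}x")
    for name in THROUGHPUT:
        print(f"{name}: {seconds_to_load(name):.1f} s for {TOTAL_ROWS} rows")
```

The ratio against MyISAM comes out just under 10, and the ratio against InnoDB just under 3. Because these are averages, they do not show the periodic pauses, which only become visible when throughput is sampled more often.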

The threaded benchmark is a bit different:

Multi-threaded insert benchmark.

We’ve also benchmarked selects with no writers:

Single-threaded select graph.

RethinkDB’s select performance is on par with MyISAM and InnoDB for threaded and non-threaded benchmarks. The performance bottleneck for short selects is in the network stack, and while we have plans to tackle this problem, we won’t get to it for a while. However, our algorithms significantly improve RethinkDB performance on long selects and joins. We will write a blog post soon with more detailed results.

As always, comments and concerns are welcome, on our blog, twitter feed, or at info@rethinkdb.com.

9 Responses to “RethinkDB performance data.”

  1. A couple of comments:

    -rather than writing your own benchmark program, it’d be more informative to use one of the many standard DB benchmark tools available such as dbSTRESS or tpcc-mysql

    -benchmarking with 1 insert thread and 2 select threads isn’t very useful if you’re targeting the web app market (which you seem to be since you mentioned you spent most of your time testing RethinkDB with Wordpress). A web app with only 3 concurrent users has minimal performance demand anyway. A far more realistic use case would be benchmarking with a high number of concurrent users (32, 64, 256…) each doing select or insert or update.

    Basically something like this: http://dimitrik.free.fr/blog/archives/08-01-2009_08-31-2009.html

  2. Very interesting. What language did you use to implement RethinkDB? C? C++? Erlang?

  3. Andy: We’ve been using our own benchmarking tool for a while, because we engineered it to record system information as well (%cpu, page cache info, block transfers, etc.). We noticed that it was taking too long to compute queries before executing them, so we decided to write a new low-overhead version. This is just for internal metrics, so we can target our optimizations.

    These graphs are essentially just snapshots of what we’re doing internally, not metrics we’d use to convince anybody to switch, and they’re just meant to give a general idea of how we’re doing.

    We’re eager to run “real” benchmarks, like TPC, as you suggested, but that isn’t necessary until we’re ready to offer a production package. For now, these benchmarks tell us where to focus, and that’s enough.

    Sam: The MySQL API is C++, so we’re using that, although most of our engine code is closer to straight C.

  4. Good to see numbers! Was hoping to see better results for select. What are you guys planning to do?

  5. Very impressive numbers guys! I think you have a real winner on your hands!

    Looking forward to your next development!

  6. Do you have select throughput results, with writers, by any chance?

  7. Juan: We need to reduce network latency in the select timing. We’ve got ideas for how to do this, and will be writing about that soon.

    Michael: Watch for our next post.

  8. I would seriously look at using an Intel SSD …. all these other SSD drives are just really toys.

  9. Impressive numbers that definitely look like you are on the right track. Obviously the truth will reveal itself as you progress. I will not be surprised to see them scale very, very well. (Much to big iron’s chagrin.)

    Thank goodness Linux and Unix give us the monitoring tools to honestly look at every link in the chain and take appropriate action where action is needed.

    I like that you are using the “right” tools to overcome the bottlenecks (as they arise) and not limiting your success by attempting to force the use of this one toolkit or that one framework, or one language, etc. Very smart. If C is the solution, use it. If C++ is the solution, use it. If PHP is the solution, use it, and so on. If Python presents a bottleneck, ask what might remove it. KISS. The right tool is whatever solves the problem: brilliant and comically simple.

    I was remembering when a power user development machine was an 8086 without a hard drive and so little memory it was a joke, but a state of the art PC for the time. Heck, I remember when you could only transfer data in 2MB chunks via a 3289 adapter from an IBM PC (4 MHz) to the mainframe, where you had to dutifully and carefully re-combine the data before you could use a relational database to manipulate it. And later, when a power user development machine was a 286 processor (20 MHz or slower) with a 120MB or smaller hard drive and 16MB or less of RAM, it was still a power user’s development machine. I am sure many of us remember those days.

    Well thanks to those days, I have often wondered why more people did not divide and conquer with multiple boxes (simplifying the problem) rather than just jumping to “big iron”. I believe many companies jump prematurely to big iron, artificially increasing their Total Cost of Ownership (TCO) when a simpler (KISS) solution is available to them.

    Some might say that adding boxes to hold different pieces of the database (divided in an intelligent, meaningful way given what the data is used for) adds complexity. I would suggest that it increases the robustness of the solution, just as a RAID system is better than a single hard drive. Just makes sense.

    Heck, in the mid 80s I had students working for me who were using 100 networked PCs to battle a Cray computer (or whatever students had at their university and could get access to) playing Othello. Later it was used to do rapid image recognition. They used a Master / Slave setup with min/max algorithms that allowed the processing to finish within an allotted time limit. An IBM 50Z could do 7X the work of an IBM PC (8088), so 7X the work was sent to the 286 IBM 50Z. (While the Cray beat them in one round, they later found a problem in their algorithm that they believed might have allowed them to beat the Cray; sadly, we will never know.)

    Why not take a quad processor and dedicate each of those processors to a specific purpose? Couldn’t two of those processors (actually probably 1 of the 4) process graphics fast enough to make GPUs on an adapter or even on the motherboard antiquated? After all, the bus between the processors and the adapter cards is a potential source of bottlenecks. Add in four 1000 (or faster) Network Interface Cards (NICs) and you increase the number of access points to the portion of the database sitting on that machine in its memory.

    Put multiple proxies in front of those machines, and it seems like you could design a system that would be all but impervious to DDoS attacks. A solution with more boxes, not fewer, seems like it would be more robust against that type of attack.

    What you guys are doing is exciting and seems logical. I look forward to seeing, reading and eventually implementing your success. I predict you will be very successful.
