Technical comparison: RethinkDB and MongoDB
Interested in a more personal perspective? Read our take on what makes RethinkDB different.
This document is our attempt at an unbiased comparison between RethinkDB and MongoDB (for a more partisan view, take a look at the biased comparison written by @coffeemug). We tried to be spartan with our commentary to allow the reader to form their own opinion. Whenever possible, we provide links to the original documentation for further details.
The document is organized by four main categories:
| ||RethinkDB||MongoDB|
|Platforms||Linux, OS X||Linux, Windows, OS X, Solaris|
|Data model||JSON documents||BSON documents|
|Data access||Unified chainable dynamic query language||Dynamic rich query language|
|Indexing||Multiple types of indexes (primary key, compound, secondary, arbitrarily computed)||Multiple types of indexes (unique, compound, secondary, sparse, geospatial)|
|Cloud deployment||AWS, dotCloud||MongoDB is available on many cloud platforms|
MongoDB has binary distributions for:
- Linux 32-bit and 64-bit
- Windows 32-bit and 64-bit
- OS X 64-bit
- Solaris 64-bit
RethinkDB has binary packages available for:
- Ubuntu 10.04 and higher 32-bit/64-bit
- OS X 64-bit (>= 10.7)
- CentOS 6, 32-bit/64-bit
MongoDB uses BSON for storing data. BSON, a binary serialization format that extends JSON, supports additional data types (e.g. ObjectId, timestamp, datetime) that are not part of the JSON specification.
RethinkDB stores JSON documents with a binary on-disk serialization. The data types supported by JSON, and by extension RethinkDB, are: number (double-precision floating point), string, boolean, array, object, and null.
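To make the difference in type systems concrete, here is a minimal sketch using only Python's standard `json` module. It shows the six JSON types listed above and what happens when a value outside that set (a datetime) is encoded as plain JSON. The document fields and the epoch-seconds workaround are illustrative assumptions, not part of either database's API.

```python
import json
from datetime import datetime, timezone

# The six JSON value types, shown with their Python equivalents.
doc = {
    "name": "Ada",                  # string
    "age": 36.0,                    # number (double-precision floating point)
    "admin": True,                  # boolean
    "tags": ["a", "b"],             # array
    "address": {"city": "London"},  # object
    "nickname": None,               # null
}
encoded = json.dumps(doc, sort_keys=True)
decoded = json.loads(encoded)  # round-trips without loss

# A datetime is not a JSON type, so a plain JSON encoder rejects it.
created = datetime(2013, 1, 1, tzinfo=timezone.utc)
try:
    json.dumps({"created": created})
    datetime_is_json = True
except TypeError:
    datetime_is_json = False

# One possible workaround (an assumption for illustration): store the
# timestamp as a number of seconds since the Unix epoch.
portable = json.dumps({"created": created.timestamp()})
```

A format such as BSON avoids this kind of manual conversion by defining a native datetime type, at the cost of no longer being plain JSON.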
Accessing data in MongoDB can be done using:
- CRUD operations using BSON objects for inserting, bulk inserting, filtering, and updating documents
- aggregations using map/reduce or the aggregation framework (starting with version 2.2)
RethinkDB provides a unified chainable query language supporting:
- CRUD operations
- aggregations (including map/reduce and the more advanced group/map/reduce)
- full sub-queries
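The idea of a chainable query language can be sketched with a small in-memory class. This is a toy model, not the RethinkDB driver: the `Query` class, its method names, and the sample data are all hypothetical and exist only to show how filtering, mapping, reducing, and sub-queries compose in a single chain.

```python
from functools import reduce


class Query:
    """Hypothetical chainable query over an in-memory list of rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def filter(self, predicate):
        # Each step returns a new Query, which is what makes chaining work.
        return Query(r for r in self._rows if predicate(r))

    def map(self, fn):
        return Query(fn(r) for r in self._rows)

    def reduce(self, fn, initial):
        # Terminal step: folds all rows into a single value.
        return reduce(fn, self._rows, initial)

    def run(self):
        # Terminal step: returns the rows as a plain list.
        return list(self._rows)


users = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 17},
    {"name": "carol", "age": 45},
]

# filter -> map -> reduce expressed as one chain.
total_adult_age = (
    Query(users)
    .filter(lambda u: u["age"] >= 18)
    .map(lambda u: u["age"])
    .reduce(lambda a, b: a + b, 0)
)

# A sub-query: the result of one query feeds the predicate of another.
average_age = Query(users).map(lambda u: u["age"]).reduce(lambda a, b: a + b, 0) / len(users)
older_than_average = (
    Query(users)
    .filter(lambda u: u["age"] > average_age)
    .map(lambda u: u["name"])
    .run()
)
```

In a real database the chain would be sent to the server and evaluated there; this sketch evaluates everything locally.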
MongoDB has 13 official and many community-supported libraries. MongoDB's wire protocol is TCP-based and uses BSON.
MongoDB supports unique, compound, secondary, sparse, and geospatial indexes. All MongoDB indexes use a B-tree data structure. Every MongoDB query, including update operations, uses one and only one index.
RethinkDB supports primary key, compound, secondary, and arbitrarily computed indexes stored as B-trees. Every RethinkDB query, including update operations, uses one and only one index.
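An "arbitrarily computed" index is one whose key is the result of a function applied to each document rather than a stored field. The sketch below illustrates the concept with a sorted list and binary search standing in for a B-tree. The `ComputedIndex` class and the order data are hypothetical, and the real storage layout of either database is not modeled here.

```python
import bisect


class ComputedIndex:
    """Toy secondary index keyed on a function of the document."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._entries = []  # sorted list of (index_key, primary_key)

    def add(self, primary_key, doc):
        # Keep entries sorted so range lookups can use binary search.
        bisect.insort(self._entries, (self._key_fn(doc), primary_key))

    def between(self, low, high):
        """Return primary keys whose index key k satisfies low <= k < high."""
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        return [pk for _, pk in self._entries[start:end]]


orders = {
    1: {"price": 2, "quantity": 3},   # total 6
    2: {"price": 10, "quantity": 1},  # total 10
    3: {"price": 5, "quantity": 5},   # total 25
}

# The index key is computed: it is not a field stored in any document.
by_total = ComputedIndex(lambda d: d["price"] * d["quantity"])
for pk, doc in orders.items():
    by_total.add(pk, doc)

mid_range = by_total.between(5, 20)
```

Because the query is answered from this one index, the sketch also reflects the "one index per query" behavior described above.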
MongoDB can be manually deployed on the majority of cloud platforms (AWS, Joyent, Rackspace, etc.). MongoDB hosting is also available from a wide range of providers either as a dedicated service (MongoHQ, MongoLab, etc.) or as an add-on on Platform-as-a-Service solutions (dotCloud, Heroku, etc.).
RethinkDB can be manually deployed on cloud platforms like AWS or as a custom service on dotCloud using rethinkdb-dotcloud.
| ||RethinkDB||MongoDB|
|UI tools||Web-based admin UI||Simple HTTP interface|
|Failover||1-click replication with customizable per-table acknowledgements||Replica-sets with auto primary re-election|
RethinkDB has an administration CLI that can be attached to any node in the cluster and provides fine-grained administrative control of the cluster resources. The command-line client offers integrated help and auto-completion.
RethinkDB has a web-based admin UI, accessible on every node of a cluster, that provides high-level, guided support for operating the cluster. The admin UI also includes the Data Explorer for experimenting, tuning, and manipulating data.
The 3 main components of a MongoDB cluster (mongos, mongod, and the 3 config servers) are highly available.
For servers storing data, MongoDB allows setting up replica sets with automatic primary re-election.
For failover, RethinkDB supports setting up 1-click replication with custom per-table acknowledgements. RethinkDB doesn't yet support primary auto re-elections.
MongoDB provides different mechanisms for backing up data:
- the mongodump utility can perform a live backup of data.
- disk/block level snapshots can be used to backup a MongoDB instance when journaling is enabled. When journaling is disabled, snapshots are possible after flushing all writes to disk and locking the database.
RethinkDB supports hot backup on a live cluster via
| ||RethinkDB||MongoDB|
|Sharding||Guided range-based sharding||Automatic range-based sharding|
|Replication||Sync and async replication||Replica-sets with log-shipping|
|Multi datacenter||Multiple DC support with per-datacenter replication and write acknowledgements||Supports different options for multi DC|
|MapReduce||Multiple MapReduce functions
|Performance||No published results||No official results|
|Concurrency||Event-based and coroutines; asynchronous block-level MVCC||Threads; database-level locks|
MongoDB supports automatic range-based sharding using a shard key. A sharded MongoDB cluster requires 3 config servers and 1 or more mongos processes.
RethinkDB supports 1-click sharding from the admin UI. Sharding can also be configured from the CLI, which additionally supports manual assignment of shards to specific machines. Rebalancing the shards can be done through the admin UI.
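Range-based sharding, which both databases use, assigns each key to a shard according to where it falls between a sorted list of split points. The following is a minimal sketch of that routing rule plus a naive rebalancing step. The function names and the even-count balancing strategy are assumptions for illustration and do not describe either product's actual algorithm.

```python
import bisect


def shard_for(key, split_points):
    """Return the shard number for key, given sorted split points.

    With split points [s0, s1], shard 0 holds keys < s0, shard 1 holds
    s0 <= key < s1, and shard 2 holds keys >= s1.
    """
    return bisect.bisect_right(split_points, key)


def balanced_split_points(keys, n_shards):
    """Pick split points so each shard gets roughly the same number of keys."""
    ordered = sorted(keys)
    return [ordered[i * len(ordered) // n_shards] for i in range(1, n_shards)]


split_points = ["g", "p"]
routing = {k: shard_for(k, split_points) for k in ["alice", "g", "mallory", "zed"]}

# Rebalancing: recompute split points from the keys actually stored.
keys = ["a", "b", "c", "d", "e", "f"]
new_points = balanced_split_points(keys, 3)
placement = [shard_for(k, new_points) for k in keys]
```

Range-based schemes keep adjacent keys on the same shard, which helps range scans but means skewed key distributions need rebalancing of the kind sketched in `balanced_split_points`.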
MongoDB replication is based on replica sets which use a master-slave log-shipping asynchronous approach. MongoDB replica sets are configured using the interactive shell.
RethinkDB allows setting up replication using the 1-click admin web UI or from the CLI. RethinkDB supports both sync and async replication by specifying the per-table number of write acknowledgements. RethinkDB replication is based on B-Tree diff algorithms and doesn't require log-shipping.
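The relationship between write acknowledgements and sync/async replication can be sketched with a toy in-memory model. A write returns once the required number of replicas has applied it, and the remaining replicas catch up later. The `ReplicatedTable` class and its methods are hypothetical, and the model ignores networking, failures, and the B-tree diffing mentioned above.

```python
class ReplicatedTable:
    """Toy model of a table with a configurable number of write acks."""

    def __init__(self, n_replicas, required_acks):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.required_acks = required_acks
        self.pending = []  # (replica_index, key, value) not yet applied

    def write(self, key, value):
        acks = 0
        for i, replica in enumerate(self.replicas):
            if acks < self.required_acks:
                replica[key] = value  # applied before write() returns (sync)
                acks += 1
            else:
                self.pending.append((i, key, value))  # applied later (async)
        return acks

    def flush(self):
        """Apply all outstanding asynchronous writes."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


table = ReplicatedTable(n_replicas=3, required_acks=2)
acks = table.write("user:1", "alice")
lagging_before = "user:1" in table.replicas[2]
table.flush()
lagging_after = "user:1" in table.replicas[2]
```

Setting `required_acks` equal to the number of replicas makes every write fully synchronous in this model, while a value of 1 makes all secondary copies asynchronous.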
Multi Datacenter Support
MongoDB can be configured to run in multiple datacenters via different mechanisms:
- assigning priorities to members of replica-sets
- support for nearby replication
- tagging (version 2.0+)
RethinkDB supports grouping machines into datacenters, with per-datacenter replication and write acknowledgement settings, through either the admin web UI or the CLI. Because RethinkDB reads and writes are immediately consistent, no special protocol is required for multi-datacenter replication.
MongoDB supports running MapReduce tasks through the mapReduce command or from the interactive shell. MongoDB MapReduce allows pre-filtering and ordering the data for the map phase. It also allows storing the results in a new collection. The various phases of the MongoDB MapReduce executed in a single
RethinkDB supports MapReduce with the map and reduce commands, as well as grouped MapReduce with the group command. MapReduce queries are transparently and fully distributed. None of these operations require any locks. RethinkDB MapReduce functions can be part of chained queries, by preceding, following, or being sub-queries of other operations.
Neither MongoDB nor RethinkDB supports incremental MapReduce by default.
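For readers unfamiliar with the grouped variant of MapReduce, the following single-process sketch shows the three steps: group rows by a key, map each row to a value, and reduce the values within each group. The function name and sample data are hypothetical. Neither database's command syntax is reproduced here, and nothing in the sketch is distributed.

```python
from collections import defaultdict
from functools import reduce


def grouped_map_reduce(rows, group_fn, map_fn, reduce_fn):
    """Group rows, map each row to a value, then reduce within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_fn(row)].append(map_fn(row))
    # Each group has at least one value, so reduce needs no initial value.
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}


orders = [
    {"city": "Oslo", "amount": 10},
    {"city": "Lima", "amount": 5},
    {"city": "Oslo", "amount": 7},
    {"city": "Lima", "amount": 1},
]

total_per_city = grouped_map_reduce(
    orders,
    group_fn=lambda o: o["city"],
    map_fn=lambda o: o["amount"],
    reduce_fn=lambda a, b: a + b,
)
```

Because the reduce function here is associative, partial results computed on different machines could be combined in any grouping, which is the property a distributed implementation relies on.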
MongoDB doesn't publish any official performance numbers.
RethinkDB's performance has degraded significantly after the addition of the clustering layer, but we hope we'll be able to restore it over the next several releases.
MongoDB uses locks at various levels for ensuring data consistency. In MongoDB v2.2, writes and MapReduce require write locks at the database level. MongoDB uses threads for handling client connections.
RethinkDB implements block-level multiversion concurrency control (MVCC). When multiple writes are performed on documents that are close together in the B-Tree, RethinkDB does take exclusive block-level locks, but reads can still proceed.
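The core idea of multiversion concurrency control is that a writer adds a new version instead of overwriting the old one, so a reader holding an earlier snapshot is never blocked. The sketch below illustrates this at the level of individual keys. It is a simplified assumption for illustration: the `MVCCStore` class is hypothetical and does not model RethinkDB's block-level implementation or its locking.

```python
class MVCCStore:
    """Toy key-level multiversion store with snapshot reads."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_version, value)
        self._current = 0    # version number of the latest commit

    def write(self, key, value):
        # Writers append a new version; older versions stay readable.
        self._current += 1
        self._versions.setdefault(key, []).append((self._current, value))

    def snapshot(self):
        """Return a version number that identifies the current state."""
        return self._current

    def read(self, key, snapshot):
        # Return the newest value committed at or before the snapshot.
        for version, value in reversed(self._versions.get(key, [])):
            if version <= snapshot:
                return value
        return None


store = MVCCStore()
store.write("x", 1)
reader_snapshot = store.snapshot()
store.write("x", 2)  # a later write does not disturb the earlier snapshot

old_view = store.read("x", reader_snapshot)
new_view = store.read("x", store.snapshot())
```

A production system would also need to discard versions that no snapshot can see any more, which this sketch omits.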
| ||RethinkDB||MongoDB|
|Consistency model||Immediate/strong consistency with support for out of date reads||Immediate/strong consistency with support for reading from replicas|
|Atomicity||Document level||Document level|
|Durability||Durable||Durable only with journaling enabled|
|Storage engine||Log-structured B-tree serialization with incremental, fully concurrent garbage compactor||Memory mapped files|
|Query distribution engine||Transparent routing, distributed and parallelized||Transparent routing requires additional mongos processes|
|Caching engine||Custom per-table configurable B-tree aware caching||OS-level memory mapped files LRU|
MongoDB has a strong consistency model where each document has a master server at a given point in time. Until recently, MongoDB client libraries defaulted to fire-and-forget behavior.
In RethinkDB data always remains immediately consistent and conflict-free, and a read that follows a write is always guaranteed to see the write. This is accomplished by always assigning every shard to a single authoritative master. All reads and writes to any key in a given shard always get routed to its respective master where they're ordered and evaluated.
Both MongoDB and RethinkDB allow out-of-date reads.
Note: While the MongoDB and RethinkDB docs refer to their consistency models as strong and immediate, respectively, we think the behavior of the two databases is equivalent.
MongoDB supports atomic document-level updates that add, remove, or set attributes to constant values.
RethinkDB comes with strict write durability out of the box, inspired by the BTRFS inline journal, and is identical to traditional database systems in this respect. No write is ever acknowledged until it's safely committed to disk.
MongoDB uses memory mapped files where the OS is in control of flushing writes and paging data in and out.
In RethinkDB, data is organized into B-Trees and stored on disk using a log-structured storage engine built specifically for RethinkDB and inspired by the architecture of BTRFS. RethinkDB's engine includes an incremental, fully concurrent garbage compactor and offers full data consistency in case of failures.
Query distribution engine
MongoDB clients connect to a cluster through separate mongos processes which are responsible for transparently routing the queries within the cluster.
RethinkDB clients can connect to any node in the cluster and queries will be automatically routed internally. Both simple (such as filters, joins) and composed queries (chained operations) will be broken down, routed to the appropriate machines, and executed in parallel. The results will be recombined and streamed back to the client.
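The break-down, route, execute-in-parallel, and recombine flow described above is often called scatter-gather. Below is a minimal single-machine sketch using a thread pool, where each inner list stands in for the rows held by one shard. The function name and data are hypothetical, and a real cluster would send the sub-queries over the network instead of to threads.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(shards, predicate):
    """Run the same filter on every shard in parallel and merge the results."""

    def run_on_shard(rows):
        # The per-shard sub-query: each shard filters only its own rows.
        return [row for row in rows if predicate(row)]

    # Scatter: one worker per shard (at least one worker is required).
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(run_on_shard, shards))

    # Gather: recombine the partial results in shard order.
    return [row for part in partials for row in part]


shards = [[1, 2, 3], [4, 5], [6]]
evens = scatter_gather(shards, lambda n: n % 2 == 0)
```

`pool.map` returns results in the order the shards were submitted, so the merged output is deterministic even though the sub-queries run concurrently.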
MongoDB's storage engine uses memory mapped files which also function as an OS-level LRU caching system. MongoDB can use all free memory on the server for cache space automatically without any configuration of a cache size.
RethinkDB implements a custom B-tree aware caching mechanism. The cache size can be configured on a per-machine basis.