Intro FAQ
Want to learn more about the internals? Read the architecture FAQ for programmers familiar with distributed systems. Or jump into the pragmatic FAQ if you need to do something specific with RethinkDB.
RethinkDB overview
What languages can I use to work with RethinkDB?
You can use Ruby, Python, and JavaScript (in the browser or via Node.js) to write RethinkDB queries. In addition, RethinkDB queries can be freely intermixed with JavaScript code, because the server executes JavaScript natively using the V8 engine.
What are the system requirements?
The RethinkDB server is written in C++ and currently runs on 32-bit and 64-bit Linux systems, as well as OS X 10.7 and above. The Ruby, Python, and JavaScript client drivers can run on any platform where those languages are supported.
RethinkDB has no other strict requirements. It has a custom caching engine and can run on low-memory nodes with large amounts of on-disk data, on Amazon EC2 instances, and so on. It also has specialized support for high-end hardware and performs well on high-memory nodes with many cores, solid-state storage, and high-throughput networking.
Does RethinkDB support SQL?
No, but RethinkDB supports a powerful, expressive, and easy-to-learn query language that can do almost anything SQL can do (and many things SQL can't, such as mixing queries with JavaScript expressions and running Hadoop-style map/reduce).
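To make "Hadoop-style map/reduce" concrete, here is a plain-Python model of the pattern. It is not ReQL and does not use the RethinkDB driver; the documents and field names are hypothetical. The map step turns each document into a small partial result, and the reduce step combines partial results. Because the reduce function is associative, partial results computed on different shards can be merged afterwards and give the same answer as a single pass.

```python
from functools import reduce

# Hypothetical documents standing in for the rows of a table.
users = [
    {"name": "alice", "age": 31, "posts": 12},
    {"name": "bob", "age": 27, "posts": 3},
    {"name": "carol", "age": 45, "posts": 20},
]

def map_phase(doc):
    # Emit one partial result per document: (document count, post count).
    return (1, doc["posts"])

def reduce_phase(a, b):
    # Combine two partial results. The operation is associative, so the
    # grouping of the combinations does not change the final answer.
    return (a[0] + b[0], a[1] + b[1])

# A single pass over all documents.
count, total_posts = reduce(reduce_phase, map(map_phase, users), (0, 0))
average_posts = total_posts / count

# The same computation split across two hypothetical shards, then merged.
shard_a, shard_b = users[:2], users[2:]
partial_a = reduce(reduce_phase, map(map_phase, shard_a), (0, 0))
partial_b = reduce(reduce_phase, map(map_phase, shard_b), (0, 0))
merged = reduce_phase(partial_a, partial_b)
```

The associativity requirement is what lets a database run the map and a first round of reduction next to the data on each node and ship only the small partial results across the network.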
How is RethinkDB licensed?
The RethinkDB server is licensed under the GNU AGPL v3.0. The client drivers are licensed under the Apache License v2.0.
We wanted to pick a license that balances the interests of three parties: our end users, our company, and the software development community at large. When picking a license, we decided that these interests can be expressed as three simple goals:
- Allow anyone to download RethinkDB, examine the source code, and use it for free (as in speech and beer) for any purpose.
- Require users who modify RethinkDB to fit their needs to release their patches to the software development community.
- Require users who are unwilling to release their patches to the software development community to purchase a commercial license.
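Because so much software is now offered as a service over the network rather than distributed in binary form, the GNU AGPL is the most effective license for meeting all three goals. Unlike the GPL, whose source-release obligations are triggered by distributing the software, the AGPL also applies when a modified version is made available to users over a network.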
We chose to release the client drivers under the Apache License v2.0 to remove any ambiguity about the reach of the server license: you do not have to license software that uses RethinkDB under the AGPL, and you are free to use any licensing model you choose.
Advantages and disadvantages
What are some use cases where RethinkDB shines?
RethinkDB is a great choice if you need flexible schemas, value ease of use and data consistency, and are planning to run anywhere from a single node to a sixteen-node cluster.
If you find yourself writing a lot of client-side code to do data manipulation, you'll probably find RethinkDB a joy to use because its query language can often cut the amount of code you write by an order of magnitude.
If you periodically copy your data into a separate system to do analytics (such as Hadoop for map/reduce jobs), but your analytics aren't incredibly computationally intensive, you can significantly simplify things by running your analytical queries in RethinkDB directly.
Finally, if you're already running a database cluster and are inundated by cluster administration and the complexities of sharding, replication, and failover, you'll likely find that RethinkDB makes your life a lot easier.
When is RethinkDB not a good choice?
RethinkDB isn't a good choice if you need full ACID support or strong schema enforcement; in that case you're better off using a relational database such as MySQL. If you're doing deep, computation-intensive analytics, you're better off using a system like Hadoop or a column-oriented store like Vertica.
In some cases RethinkDB trades off write availability in favor of data consistency and ease of use, so if you absolutely need high write availability and don't mind dealing with conflicts, you're better off with a Dynamo-style system like Riak.
Finally, while RethinkDB has been in development for three years, it is a very new piece of software by database standards, so if stability is your primary concern, you're better off going with a more mature system.
How is RethinkDB different from other NoSQL databases?
Check out the point-by-point technical overview of how RethinkDB compares to MongoDB, and see our personal take on what makes RethinkDB different from other NoSQL systems.
Sharding and replication
How do I add a node to a RethinkDB cluster?
Adding a node to a RethinkDB cluster is as easy as starting a new RethinkDB process and pointing it to an existing node in the cluster. Everything else is handled by the system without any additional effort required from the user.
How do I shard and replicate data in RethinkDB?
RethinkDB supports per-table sharding and replication settings. Sharding a table is as easy as typing the number of shards you'd like into the web admin UI and clicking 'Rebalance'. Similarly, to change the number of replicas, you set the number in the web UI and click 'Save'. RethinkDB rebalances the table, creates copies across the cluster, and moves the necessary data around behind the scenes without any additional work from the user. Sharding and replication settings can also be controlled from a command-line interface.
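The sketch below models what "Rebalance" has to decide when you change the shard count, assuming range-based partitioning on the primary key. The split-point rule used here (evenly spaced quantiles of the sorted keys) is our own simplification for illustration, not RethinkDB's actual algorithm, and the functions `compute_split_points`, `shard_for`, and `rebalance` are hypothetical names.

```python
import bisect

def compute_split_points(keys, num_shards):
    """Pick num_shards - 1 split points so each range holds roughly the same number of keys."""
    ordered = sorted(keys)
    step = len(ordered) / num_shards
    return [ordered[int(i * step)] for i in range(1, num_shards)]

def shard_for(key, split_points):
    """Return the index of the shard whose contiguous key range contains key."""
    # Keys equal to a split point belong to the range that starts at it.
    return bisect.bisect_right(split_points, key)

def rebalance(keys, num_shards):
    """Recompute split points for the requested shard count and reassign every key."""
    splits = compute_split_points(keys, num_shards)
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, splits)].append(key)
    return splits, shards

# Example: 100 hypothetical primary keys split into four shards.
keys = [f"user{i:03d}" for i in range(100)]
splits, shards = rebalance(keys, 4)
```

Because each shard owns a contiguous key range, a lookup by primary key touches exactly one shard, while a range scan touches only the shards whose ranges overlap it. Replication is a separate dimension: each shard's range would then be copied to as many nodes as the replica count requires.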
Running queries
How do queries get routed in a RethinkDB cluster?
You can connect your clients to any node in the cluster, and queries will automatically be routed to their destination. Advanced queries (such as joins and filters) are broken up and routed to the appropriate machines, executed in parallel, and the results are recombined and streamed back to the client. You never have to send queries to specific nodes: everything happens automatically behind the scenes.
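The routing described above is a scatter/gather pattern: the node the client connects to fans a query out to the shards that hold the data, runs the pieces in parallel, and merges the partial results. The following plain-Python sketch models that flow with threads standing in for cluster nodes. The node names, the `SHARDS` layout, and `route_filter` are hypothetical, and a real cluster streams results back incrementally rather than collecting them into one list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster layout: each node holds one shard of the table.
SHARDS = {
    "node-a": [{"id": 1, "status": "active"}, {"id": 2, "status": "idle"}],
    "node-b": [{"id": 3, "status": "active"}, {"id": 4, "status": "active"}],
    "node-c": [{"id": 5, "status": "idle"}],
}

def run_on_shard(node, predicate):
    # Stand-in for executing the filter remotely on `node`.
    return [doc for doc in SHARDS[node] if predicate(doc)]

def route_filter(predicate):
    """Scatter the filter to every shard in parallel, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        # pool.map preserves input order, so results come back node by node.
        partials = list(pool.map(lambda node: run_on_shard(node, predicate), SHARDS))
    merged = []
    for part in partials:
        merged.extend(part)
    return merged

active = route_filter(lambda doc: doc["status"] == "active")
```

A filter can be evaluated independently on every shard, which is why it parallelizes cleanly; operations such as joins need an extra step to bring matching data together before the results are merged.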
How can I understand the performance of slow queries?
Understanding query performance currently requires a pretty deep understanding of the system. For the moment, the easiest way to get an idea of why your query isn't performing well is to ask us.
How does RethinkDB handle write durability?
RethinkDB comes with strict write durability out of the box and is identical to traditional database systems in this respect. No write is ever acknowledged until it's safely committed to disk.
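The rule "no write is acknowledged until it's safely committed to disk" can be illustrated with a small, single-file model: the acknowledgement is returned only after the data has been flushed and `os.fsync` has returned. This is a sketch of the principle, not RethinkDB's storage engine; `durable_write` and the log file name are hypothetical.

```python
import json
import os
import tempfile

def durable_write(path, record):
    """Append a record and return an acknowledgement only after it has been synced to disk."""
    line = json.dumps(record) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # move Python's buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to push the file's data to the storage device
    # Only now is it safe to tell the client the write succeeded.
    return {"acknowledged": True, "record": record}

# Example usage with a throwaway directory.
log_path = os.path.join(tempfile.mkdtemp(), "writes.log")
ack = durable_write(log_path, {"id": 1, "name": "alice"})
```

A production system also has to sync the containing directory when a file is first created and guard against torn writes after a crash; this sketch ignores both and only shows the ordering of "sync first, acknowledge second".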
What's next?
Want to learn more about the internals? Read the architecture FAQ for programmers familiar with distributed systems.