<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: RethinkDB performance data.</title>
	<atom:link href="http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 05 Sep 2010 13:51:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: cbemerine</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-26</link>
		<dc:creator>cbemerine</dc:creator>
		<pubDate>Mon, 24 Aug 2009 21:25:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-26</guid>
		<description>Impressive numbers; it definitely looks like you are on the right track.  Obviously the truth will reveal itself as you progress.  I will not be surprised to see them scale very, very well. (Much to big iron&#039;s chagrin.)

Thank goodness Linux and Unix give us the monitoring tools to honestly look at every link in the chain and take appropriate action where action is needed.  

I like that you are using the &quot;right&quot; tools to overcome the bottlenecks (as they arise) and not limiting your success by attempting to force the use of this one toolkit or that one framework, or one language, etc...  Very smart.  If C is the solution, use it.  If C++ is the solution, use it.  If PHP is the solution, use it, etc, etc, etc...  If Python presents a bottleneck, ask what might remove it.  KISS.  The right tool is whatever solves the problem: brilliant and comically simple.

I was remembering when a power user development machine was an 8086 without a hard drive and so little memory it was a joke, but a state of the art PC for the time.  Heck, I remember when you could only transfer data in 2MB chunks via a 3289 adapter from an IBM PC (4 MHz) to the mainframe, where you had to dutifully and carefully re-combine the data before you could use a relational database to manipulate it.  And later a power user development machine was a 286 processor (20 MHz or slower) with a 120MB or smaller hard drive and 16MB or less of RAM, yet it was a power user&#039;s development machine.  I am sure many of us remember those days.

Well, thanks to those days, I have often wondered why more people did not divide and conquer with multiple boxes (simplifying the problem) rather than just jumping to &quot;big iron&quot;.  I believe many companies jump prematurely to big iron, artificially increasing their Total Cost of Ownership (TCO) when a simpler (KISS) solution is available to them.

Some might say adding boxes to hold different pieces of the databases (divided in an intelligent, meaningful way given what the data is used for) adds complexity; I would suggest that it increases the robustness of the solution, just as a RAID system is better than a single hard drive.  Just makes sense.

Heck, in the mid 80s I had students working for me that were using 100 networked PCs to battle a Cray computer (or whatever students had at their university and could get access to) playing Othello.  Later it was used to do rapid image recognition.  They used a Master / Slave setup with min/max algorithms that allowed the processing to finish within an allotted time limit.  An IBM 50Z could do 7X the work of an IBM PC (8088), so 7X the work was sent to the 286 IBM 50Z.  (While the Cray beat them in one round, they later found a problem in their algorithm that they believed might have allowed them to beat the Cray; sadly, we will never know.)

Why not take a quad processor and dedicate each of those processors to a specific purpose?  Couldn&#039;t two of those processors (actually probably 1 of the 4) process graphics fast enough to make GPUs on an adapter or even on the motherboard antiquated? After all, the bus between the processors and the adapter cards is a potential source of bottlenecks.  Add in four 1000 (or faster) Network Interface Cards (NICs) and you increase the number of access points to the portion of the database sitting in that machine&#039;s memory.

Put multiple proxies in front of those machines, and it seems like you could design a system that would be all but impervious to DDoS attacks.  A solution with more boxes, not fewer, seems like it would be more robust against that type of attack.

What you guys are doing is exciting and seems logical.  I look forward to seeing, reading and eventually implementing your success. I predict you will be very successful.</description>
		<content:encoded><![CDATA[<p>Impressive numbers; it definitely looks like you are on the right track.  Obviously the truth will reveal itself as you progress.  I will not be surprised to see them scale very, very well. (Much to big iron&#8217;s chagrin.)</p>
<p>Thank goodness Linux and Unix give us the monitoring tools to honestly look at every link in the chain and take appropriate action where action is needed.  </p>
<p>I like that you are using the &#8220;right&#8221; tools to overcome the bottlenecks (as they arise) and not limiting your success by attempting to force the use of this one toolkit or that one framework, or one language, etc&#8230;  Very smart.  If C is the solution, use it.  If C++ is the solution, use it.  If PHP is the solution, use it, etc, etc, etc&#8230;  If Python presents a bottleneck, ask what might remove it.  KISS.  The right tool is whatever solves the problem: brilliant and comically simple.</p>
<p>I was remembering when a power user development machine was an 8086 without a hard drive and so little memory it was a joke, but a state of the art PC for the time.  Heck, I remember when you could only transfer data in 2MB chunks via a 3289 adapter from an IBM PC (4 MHz) to the mainframe, where you had to dutifully and carefully re-combine the data before you could use a relational database to manipulate it.  And later a power user development machine was a 286 processor (20 MHz or slower) with a 120MB or smaller hard drive and 16MB or less of RAM, yet it was a power user&#8217;s development machine.  I am sure many of us remember those days.</p>
<p>Well, thanks to those days, I have often wondered why more people did not divide and conquer with multiple boxes (simplifying the problem) rather than just jumping to &#8220;big iron&#8221;.  I believe many companies jump prematurely to big iron, artificially increasing their Total Cost of Ownership (TCO) when a simpler (KISS) solution is available to them.</p>
<p>Some might say adding boxes to hold different pieces of the databases (divided in an intelligent, meaningful way given what the data is used for) adds complexity; I would suggest that it increases the robustness of the solution, just as a RAID system is better than a single hard drive.  Just makes sense.</p>
<p>Heck, in the mid 80s I had students working for me that were using 100 networked PCs to battle a Cray computer (or whatever students had at their university and could get access to) playing Othello.  Later it was used to do rapid image recognition.  They used a Master / Slave setup with min/max algorithms that allowed the processing to finish within an allotted time limit.  An IBM 50Z could do 7X the work of an IBM PC (8088), so 7X the work was sent to the 286 IBM 50Z.  (While the Cray beat them in one round, they later found a problem in their algorithm that they believed might have allowed them to beat the Cray; sadly, we will never know.)</p>
<p>Why not take a quad processor and dedicate each of those processors to a specific purpose?  Couldn&#8217;t two of those processors (actually probably 1 of the 4) process graphics fast enough to make GPUs on an adapter or even on the motherboard antiquated? After all, the bus between the processors and the adapter cards is a potential source of bottlenecks.  Add in four 1000 (or faster) Network Interface Cards (NICs) and you increase the number of access points to the portion of the database sitting in that machine&#8217;s memory.</p>
<p>Put multiple proxies in front of those machines, and it seems like you could design a system that would be all but impervious to DDoS attacks.  A solution with more boxes, not fewer, seems like it would be more robust against that type of attack.</p>
<p>What you guys are doing is exciting and seems logical.  I look forward to seeing, reading and eventually implementing your success. I predict you will be very successful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Burton</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-22</link>
		<dc:creator>Kevin Burton</dc:creator>
		<pubDate>Thu, 13 Aug 2009 08:14:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-22</guid>
		<description>I would seriously look at using an Intel SSD .... all these other SSD drives are just really toys.</description>
		<content:encoded><![CDATA[<p>I would seriously look at using an Intel SSD &#8230;. all these other SSD drives are just really toys.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leif</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-21</link>
		<dc:creator>Leif</dc:creator>
		<pubDate>Wed, 12 Aug 2009 22:01:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-21</guid>
		<description>&lt;strong&gt;Juan:&lt;/strong&gt; We need to reduce network latency in the select timing.  We&#039;ve got ideas for how to do this, and will be writing about that soon.

&lt;strong&gt;Michael:&lt;/strong&gt; Watch for our next post.</description>
		<content:encoded><![CDATA[<p><strong>Juan:</strong> We need to reduce network latency in the select timing.  We&#8217;ve got ideas for how to do this, and will be writing about that soon.</p>
<p><strong>Michael:</strong> Watch for our next post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-20</link>
		<dc:creator>Michael</dc:creator>
		<pubDate>Wed, 12 Aug 2009 21:19:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-20</guid>
		<description>Do you have select throughput results, with writers, by any chance?</description>
		<content:encoded><![CDATA[<p>Do you have select throughput results, with writers, by any chance?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam Kerr</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-19</link>
		<dc:creator>Sam Kerr</dc:creator>
		<pubDate>Wed, 12 Aug 2009 20:57:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-19</guid>
		<description>Very impressive numbers, guys! I think you have a real winner on your hands!

Looking forward to your next development!</description>
		<content:encoded><![CDATA[<p>Very impressive numbers, guys! I think you have a real winner on your hands!</p>
<p>Looking forward to your next development!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Juan Sequeda</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-18</link>
		<dc:creator>Juan Sequeda</dc:creator>
		<pubDate>Wed, 12 Aug 2009 20:18:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-18</guid>
		<description>Good to see numbers! Was hoping to see better results for select. What are you guys planning to do?</description>
		<content:encoded><![CDATA[<p>Good to see numbers! Was hoping to see better results for select. What are you guys planning to do?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leif</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-17</link>
		<dc:creator>Leif</dc:creator>
		<pubDate>Wed, 12 Aug 2009 18:50:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-17</guid>
		<description>&lt;strong&gt;Andy:&lt;/strong&gt; We&#039;ve been using our own benchmarking tool for a while, because we engineered it to record system information as well (%cpu, page cache info, block transfers, etc.).  We noticed that it was taking too long to compute queries before executing them, so we decided to write a new low-overhead version.  This is just for internal metrics, so we can target our optimizations.

These graphs are essentially just snapshots of what we&#039;re doing internally, not metrics we&#039;d use to convince anybody to switch, and they&#039;re just meant to give a general idea of how we&#039;re doing.

We&#039;re eager to run &quot;real&quot; benchmarks, like TPC, as you suggested, but that isn&#039;t necessary until we&#039;re ready to offer a production package.  For now, these benchmarks tell us where to focus, and that&#039;s enough.

&lt;strong&gt;Sam:&lt;/strong&gt; The MySQL API is C++, so we&#039;re using that, although most of our engine code is closer to straight C.</description>
		<content:encoded><![CDATA[<p><strong>Andy:</strong> We&#8217;ve been using our own benchmarking tool for a while, because we engineered it to record system information as well (%cpu, page cache info, block transfers, etc.).  We noticed that it was taking too long to compute queries before executing them, so we decided to write a new low-overhead version.  This is just for internal metrics, so we can target our optimizations.</p>
<p>These graphs are essentially just snapshots of what we&#8217;re doing internally, not metrics we&#8217;d use to convince anybody to switch, and they&#8217;re just meant to give a general idea of how we&#8217;re doing.</p>
<p>We&#8217;re eager to run &#8220;real&#8221; benchmarks, like TPC, as you suggested, but that isn&#8217;t necessary until we&#8217;re ready to offer a production package.  For now, these benchmarks tell us where to focus, and that&#8217;s enough.</p>
<p><strong>Sam:</strong> The MySQL API is C++, so we&#8217;re using that, although most of our engine code is closer to straight C.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-16</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 12 Aug 2009 18:48:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-16</guid>
		<description>Very interesting. What language did you use to implement RethinkDB? C? C++? Erlang?</description>
		<content:encoded><![CDATA[<p>Very interesting. What language did you use to implement RethinkDB? C? C++? Erlang?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andy</title>
		<link>http://www.rethinkdb.com/blog/2009/08/rethinkdb-performance-data/comment-page-1/#comment-15</link>
		<dc:creator>Andy</dc:creator>
		<pubDate>Wed, 12 Aug 2009 18:14:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=43#comment-15</guid>
		<description>A couple of comments:

-rather than writing your own benchmark program, it&#039;d be more informative to use one of the many standard DB benchmark tools available such as dbSTRESS or tpcc-mysql

-benchmarking with 1 insert thread and 2 select threads isn&#039;t very useful if you&#039;re targeting the web app market (which you seem to be, since you mentioned you spent most of your time testing RethinkDB with WordPress). A web app with only 3 concurrent users has minimal performance demand anyway. A far more realistic use case would be benchmarking with a high number of concurrent users (32, 64, 256...), each doing select or insert or update.

Basically something like this: http://dimitrik.free.fr/blog/archives/08-01-2009_08-31-2009.html</description>
		<content:encoded><![CDATA[<p>A couple of comments:</p>
<p>-rather than writing your own benchmark program, it&#8217;d be more informative to use one of the many standard DB benchmark tools available such as dbSTRESS or tpcc-mysql</p>
<p>-benchmarking with 1 insert thread and 2 select threads isn&#8217;t very useful if you&#8217;re targeting the web app market (which you seem to be, since you mentioned you spent most of your time testing RethinkDB with WordPress). A web app with only 3 concurrent users has minimal performance demand anyway. A far more realistic use case would be benchmarking with a high number of concurrent users (32, 64, 256&#8230;), each doing select or insert or update.</p>
<p>Basically something like this: <a href="http://dimitrik.free.fr/blog/archives/08-01-2009_08-31-2009.html" rel="nofollow">http://dimitrik.free.fr/blog/archives/08-01-2009_08-31-2009.html</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
