<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Rethinking B-tree block sizes on SSDs</title>
	<atom:link href="http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sat, 06 Mar 2010 12:56:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Iosif</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-5342</link>
		<dc:creator>Iosif</dc:creator>
		<pubDate>Sun, 11 Oct 2009 16:12:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-5342</guid>
		<description>Hi. I&#039;m a great follower of http://www.defmacro.org/ and just found this blog.

Maybe you are interested in this:
http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/

Bye,
Iosif.</description>
		<content:encoded><![CDATA[<p>Hi. I&#8217;m a great follower of <a href="http://www.defmacro.org/" rel="nofollow">http://www.defmacro.org/</a> and just found this blog.</p>
<p>Maybe you are interested in this:<br />
<a href="http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/" rel="nofollow">http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/</a></p>
<p>Bye,<br />
Iosif.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daily Links #109 &#124; CloudKnow</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4832</link>
		<dc:creator>Daily Links #109 &#124; CloudKnow</dc:creator>
		<pubDate>Thu, 08 Oct 2009 16:40:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4832</guid>
		<description>[...] RethinkDB: Rethinking B-tree block sizes on SSDs [...]</description>
		<content:encoded><![CDATA[<p>[...] RethinkDB: Rethinking B-tree block sizes on SSDs [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: RethinkDB - The database for solid state drives.</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4784</link>
		<dc:creator>RethinkDB - The database for solid state drives.</dc:creator>
		<pubDate>Thu, 08 Oct 2009 11:44:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4784</guid>
		<description>[...] Rethinking B-tree block sizes on SSDs [...]</description>
		<content:encoded><![CDATA[<p>[...] Rethinking B-tree block sizes on SSDs [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Slava</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4744</link>
		<dc:creator>Slava</dc:creator>
		<pubDate>Thu, 08 Oct 2009 09:03:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4744</guid>
		<description>We did much more alignment-related testing. I&#039;ll post the results soon, but in general they&#039;re consistent with these numbers. It might be that the SUPER TALENT MasterDrive OCX we&#039;re testing on behaves differently from other drives. We&#039;ll retest on Intel drives soon.</description>
		<content:encoded><![CDATA[<p>We did much more alignment-related testing. I&#8217;ll post the results soon, but in general they&#8217;re consistent with these numbers. It might be that the SUPER TALENT MasterDrive OCX we&#8217;re testing on behaves differently from other drives. We&#8217;ll retest on Intel drives soon.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Colin Percival</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4572</link>
		<dc:creator>Colin Percival</dc:creator>
		<pubDate>Wed, 07 Oct 2009 05:43:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4572</guid>
		<description>Looking closely at those performance numbers, I think glugglug may be right -- either you&#039;re making unaligned requests, or the disk is optimized for unaligned requests.  I&#039;m getting a very strong fit to a model with 16 kiB internal blocks and a shuffling cost for misaligned reads.

Could you measure E(time to read 16 kiB from offset (16 kiB) * A + B for random positive integer A) for B = 0, 0.5 kiB, 1 kiB, ... 15.5 kiB?  Mean and standard deviation would both be useful.</description>
		<content:encoded><![CDATA[<p>Looking closely at those performance numbers, I think glugglug may be right &#8212; either you&#8217;re making unaligned requests, or the disk is optimized for unaligned requests.  I&#8217;m getting a very strong fit to a model with 16 kiB internal blocks and a shuffling cost for misaligned reads.</p>
<p>Could you measure E(time to read 16 kiB from offset (16 kiB) * A + B for random positive integer A) for B = 0, 0.5 kiB, 1 kiB, &#8230; 15.5 kiB?  Mean and standard deviation would both be useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Slava</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4522</link>
		<dc:creator>Slava</dc:creator>
		<pubDate>Tue, 06 Oct 2009 22:15:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4522</guid>
		<description>glugglug: all tests were performed on the raw block device, so I don&#039;t think they&#039;re affected by partitioning artifacts. In this context we only tested reads, so the 128KB erase cycles aren&#039;t relevant either. It&#039;s very difficult to obtain information on what happens within the drives themselves, so unfortunately we have to resort to building up models of how they operate based on extensive testing. We&#039;re talking with a couple of manufacturers, so hopefully we&#039;ll have more definitive information soon.</description>
		<content:encoded><![CDATA[<p>glugglug: all tests were performed on the raw block device, so I don&#8217;t think they&#8217;re affected by partitioning artifacts. In this context we only tested reads, so the 128KB erase cycles aren&#8217;t relevant either. It&#8217;s very difficult to obtain information on what happens within the drives themselves, so unfortunately we have to resort to building up models of how they operate based on extensive testing. We&#8217;re talking with a couple of manufacturers, so hopefully we&#8217;ll have more definitive information soon.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: originalgeek</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4511</link>
		<dc:creator>originalgeek</dc:creator>
		<pubDate>Tue, 06 Oct 2009 21:01:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4511</guid>
		<description>Hey glugglug, this test and analysis has nothing to do with disk formatting.  It has to do with the choice of page size for storing btree indices.  This is a variable chosen by the database server, and is independent of the cluster size on the disk.
&lt;em&gt;Admin: Edited to tone down a little.&lt;/em&gt;
</description>
		<content:encoded><![CDATA[<p>Hey glugglug, this test and analysis has nothing to do with disk formatting.  It has to do with the choice of page size for storing btree indices.  This is a variable chosen by the database server, and is independent of the cluster size on the disk.<br />
<em>Admin: Edited to tone down a little.</em></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: glugglug</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4508</link>
		<dc:creator>glugglug</dc:creator>
		<pubDate>Tue, 06 Oct 2009 20:52:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4508</guid>
		<description>Your graph of the SSD performance reflects non-optimal partitioning.

Every modern OS uses a cluster size that is a multiple of 4KB, so in theory there shouldn&#039;t be a difference in IOPS between a 512B write and a 4KB write.

SSDs also have internal clusters that are multiples of 4KB, and the blocks that get erased for read/erase/write cycles are at least 128KB.  But for the sake of backwards compatibility, 512B sectors are still emulated.

Unfortunately, most partitioning tools still start the 1st partition on sector 63 (counting 512B sectors), which is not a multiple of 8, so the partition is not 4KB aligned.  If you partition using Windows 7 or another smarter tool that keeps your partitions 4KB aligned, you will not see a difference in IOPS between 512B operations and 4KB operations (and since the 4KB figure will be higher, the dropoff afterwards will be steeper).</description>
		<content:encoded><![CDATA[<p>Your graph of the SSD performance reflects non-optimal partitioning.</p>
<p>Every modern OS uses a cluster size that is a multiple of 4KB, so in theory there shouldn&#8217;t be a difference in IOPS between a 512B write and a 4KB write.</p>
<p>SSDs also have internal clusters that are multiples of 4KB, and the blocks that get erased for read/erase/write cycles are at least 128KB.  But for the sake of backwards compatibility, 512B sectors are still emulated.</p>
<p>Unfortunately, most partitioning tools still start the 1st partition on sector 63 (counting 512B sectors), which is not a multiple of 8, so the partition is not 4KB aligned.  If you partition using Windows 7 or another smarter tool that keeps your partitions 4KB aligned, you will not see a difference in IOPS between 512B operations and 4KB operations (and since the 4KB figure will be higher, the dropoff afterwards will be steeper).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gregory Gross</title>
		<link>http://www.rethinkdb.com/blog/2009/10/rethinking-b-tree-block-sizes-on-ssds/comment-page-1/#comment-4476</link>
		<dc:creator>Gregory Gross</dc:creator>
		<pubDate>Tue, 06 Oct 2009 18:53:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=153#comment-4476</guid>
		<description>Thanks for the write-up. Very informative.</description>
		<content:encoded><![CDATA[<p>Thanks for the write-up. Very informative.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
