<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Page alignment on SSDs</title>
	<atom:link href="http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sat, 06 Mar 2010 12:56:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: RethinkDB - The database for solid state drives.</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-6944</link>
		<dc:creator>RethinkDB - The database for solid state drives.</dc:creator>
		<pubDate>Tue, 20 Oct 2009 10:20:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-6944</guid>
		<description>[...] Page alignment on SSDs [...]</description>
		<content:encoded><![CDATA[<p>[...] Page alignment on SSDs [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Slava</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-6663</link>
		<dc:creator>Slava</dc:creator>
		<pubDate>Mon, 19 Oct 2009 10:30:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-6663</guid>
		<description>In this context, stride is simply the alignment of the read offset. So, if the stride is 4K, every read starts at a 4K boundary (i.e., at an offset into the drive that is divisible by 4096).</description>
		<content:encoded><![CDATA[<p>In this context, stride is simply the alignment of the read offset. So, if the stride is 4K, every read starts at a 4K boundary (i.e., at an offset into the drive that is divisible by 4096).</p>
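<p>To make the definition concrete, here is a minimal Python sketch of how a benchmark might pick random read offsets for a given stride. The helper names are hypothetical and this is not the code behind the graphs in the post; it only shows that a stride-aligned offset is a multiple of the stride, independent of the block size being read.</p>

```python
import random


def random_aligned_offset(device_size: int, block_size: int, stride: int,
                          rng: random.Random) -> int:
    """Pick a random read offset that is a multiple of `stride`.

    Hypothetical helper: the read [offset, offset + block_size) must fit
    entirely inside a device of `device_size` bytes.
    """
    if stride <= 0 or block_size <= 0:
        raise ValueError("stride and block_size must be positive")
    # Number of stride-aligned starting points where a full block still fits.
    slots = (device_size - block_size) // stride + 1
    if slots <= 0:
        raise ValueError("device is too small for this block size")
    return rng.randrange(slots) * stride


def is_aligned(offset: int, stride: int) -> bool:
    # "Aligned to a 4K boundary" simply means the offset is divisible by 4096.
    return offset % stride == 0


rng = random.Random(42)
offsets = [random_aligned_offset(2**30, 512, 4096, rng) for _ in range(5)]
print(offsets)
print(all(is_aligned(o, 4096) for o in offsets))  # True
```

<p>For example, with a 512-byte block and a 4K stride, each read covers the first 512 bytes of some 4K-aligned region of the drive.</p>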
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabor</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-5878</link>
		<dc:creator>Gabor</dc:creator>
		<pubDate>Fri, 16 Oct 2009 02:30:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-5878</guid>
		<description>Great blog, guys! I&#039;m not familiar with the term &quot;stride&quot; in the context of SSDs.

I thought stride was the size of an individual chunk striped across multiple RAID disks.

What does stride mean in the context of DBs on SSDs?

Thanks, Gabor</description>
		<content:encoded><![CDATA[<p>Great blog, guys! I&#8217;m not familiar with the term &#8220;stride&#8221; in the context of SSDs.</p>
<p>I thought stride was the size of an individual chunk striped across multiple RAID disks.</p>
<p>What does stride mean in the context of DBs on SSDs?</p>
<p>Thanks, Gabor</p>
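<p>For contrast with the alignment meaning of stride, the RAID sense of the word can be sketched as a plain RAID-0 address mapping, assuming round-robin chunk placement and no parity. The function name and the 4-disk, 64K-chunk array are hypothetical.</p>

```python
def raid0_locate(offset: int, chunk_size: int, n_disks: int) -> tuple[int, int]:
    """Map a byte offset on a RAID-0 array to (disk index, byte offset on that disk)."""
    chunk_index, within_chunk = divmod(offset, chunk_size)
    disk = chunk_index % n_disks      # chunks are dealt out to disks round-robin
    stripe = chunk_index // n_disks   # which row of chunks across all the disks
    return disk, stripe * chunk_size + within_chunk


# Hypothetical array: 4 disks, 64K chunks.
print(raid0_locate(0, 64 * 1024, 4))                   # (0, 0)
print(raid0_locate(64 * 1024, 64 * 1024, 4))           # (1, 0)
print(raid0_locate(5 * 64 * 1024 + 10, 64 * 1024, 4))  # (1, 65546)
```

<p>In this sense the chunk size decides which disk serves a request, whereas in the post the stride only constrains where each read may start on a single drive.</p>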
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Smith</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-5126</link>
		<dc:creator>Brian Smith</dc:creator>
		<pubDate>Sat, 10 Oct 2009 03:09:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-5126</guid>
		<description>I read that minimum_io_size and related attributes were added in 2.6.31 [1]. I&#039;m not sure how you retrieve those values on earlier kernels.

[1] http://www.redhat.com/archives/dm-devel/2009-June/msg00297.html</description>
		<content:encoded><![CDATA[<p>I read that minimum_io_size and related attributes were added in 2.6.31 [1]. I&#8217;m not sure how you retrieve those values on earlier kernels.</p>
<p>[1] <a href="http://www.redhat.com/archives/dm-devel/2009-June/msg00297.html" rel="nofollow">http://www.redhat.com/archives/dm-devel/2009-June/msg00297.html</a></p>
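<p>A small Python sketch for checking whether a running kernel reports minimum_io_size, using the /sys/block/sdb/queue/minimum_io_size path quoted elsewhere in this thread. The function name is hypothetical. On kernels without the attribute it simply returns None, so it does not answer how to obtain the value there.</p>

```python
from pathlib import Path
from typing import Optional


def read_queue_attr(device: str, attr: str,
                    sysfs_block: str = "/sys/block") -> Optional[int]:
    """Read an integer from /sys/block/DEVICE/queue/ATTR, or return None.

    minimum_io_size and the related I/O topology attributes only exist on
    2.6.31 and later kernels, so a missing file is treated as "not
    reported" instead of raising an error.
    """
    path = Path(sysfs_block) / device / "queue" / attr
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # "sdb" is only an example device name; substitute your SSD.
    for name in ("minimum_io_size", "physical_block_size", "logical_block_size"):
        print(name, read_queue_attr("sdb", name))
```

<p>The sysfs root is a parameter only so the function can be pointed at a copy of the directory tree for testing.</p>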
]]></content:encoded>
	</item>
	<item>
		<title>By: Slava</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-5094</link>
		<dc:creator>Slava</dc:creator>
		<pubDate>Fri, 09 Oct 2009 23:39:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-5094</guid>
		<description>Brian, which kernel version are you using? minimum_io_size isn&#039;t exposed by sysfs on our 2.6.28-14-server #47-Ubuntu SMP UTC 2009 x86_64 (at least it&#039;s not in /sys/block/sdb/queue/minimum_io_size).

You&#039;re right about the non-zero origin; it can be a bit confusing. You get slightly better resolution this way, so we chose a narrower range to present the data.</description>
		<content:encoded><![CDATA[<p>Brian, which kernel version are you using? minimum_io_size isn&#8217;t exposed by sysfs on our 2.6.28-14-server #47-Ubuntu SMP UTC 2009 x86_64 (at least it&#8217;s not in /sys/block/sdb/queue/minimum_io_size).</p>
<p>You&#8217;re right about the non-zero origin; it can be a bit confusing. You get slightly better resolution this way, so we chose a narrower range to present the data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Smith</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-5004</link>
		<dc:creator>Brian Smith</dc:creator>
		<pubDate>Fri, 09 Oct 2009 14:34:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-5004</guid>
		<description>I woke up this morning and realized that what I wrote above didn&#039;t really make sense. First of all, SSDs can read individual NAND pages (now usually 4K, in the near future 8K) even though they cannot rewrite individual pages in place. So, the block size shouldn&#039;t really matter much.

The first thing I would check is what Linux is using as minimum_io_size for the drive. If minimum_io_size is 512, that would explain why 512-byte reads are faster than 4KB reads [1]. That shouldn&#039;t be the case, though, because there&#039;s no reason (AFAICT) for an SSD to return a minimum_io_size less than 4096, even in 512/4096 emulation mode [1].

The second graph has a non-zero origin, so graphically it looks like 4K reads are twice as slow as 512-byte reads, but really they are only ~20% slower. And all SSD makers (AFAIK) recommend doing all I/O in blocks the size of a NAND page (4K or 8K). I&#039;d really like to see why there seems to be this 20% performance difference. I look forward to seeing your future posts with results for server-class SSDs.

[1] http://mkp.net/pubs/storage-topology.pdf</description>
		<content:encoded><![CDATA[<p>I woke up this morning and realized that what I wrote above didn&#8217;t really make sense. First of all, SSDs can read individual NAND pages (now usually 4K, in the near future 8K) even though they cannot rewrite individual pages in place. So, the block size shouldn&#8217;t really matter much.</p>
<p>The first thing I would check is what Linux is using as minimum_io_size for the drive. If minimum_io_size is 512, that would explain why 512-byte reads are faster than 4KB reads [1]. That shouldn&#8217;t be the case, though, because there&#8217;s no reason (AFAICT) for an SSD to return a minimum_io_size less than 4096, even in 512/4096 emulation mode [1].</p>
<p>The second graph has a non-zero origin, so graphically it looks like 4K reads are twice as slow as 512-byte reads, but really they are only ~20% slower. And all SSD makers (AFAIK) recommend doing all I/O in blocks the size of a NAND page (4K or 8K). I&#8217;d really like to see why there seems to be this 20% performance difference. I look forward to seeing your future posts with results for server-class SSDs.</p>
<p>[1] <a href="http://mkp.net/pubs/storage-topology.pdf" rel="nofollow">http://mkp.net/pubs/storage-topology.pdf</a></p>
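<p>The effect of the non-zero origin can be checked with simple arithmetic. The sketch below uses made-up IOPS values, not the measurements from the post, chosen so that one result is 20% lower than the other. Drawn from zero, the bars differ by a factor of 1.25; with the axis starting at 6,000 IOPS, the taller bar is drawn twice as high as the shorter one.</p>

```python
def relative_slowdown(fast: float, slow: float) -> float:
    """Fraction by which `slow` falls short of `fast` (0.2 means 20% fewer IOPS)."""
    return (fast - slow) / fast


def apparent_ratio(fast: float, slow: float, axis_origin: float) -> float:
    """Ratio of the drawn bar heights when the y-axis starts at `axis_origin`."""
    return (fast - axis_origin) / (slow - axis_origin)


# Hypothetical IOPS values, chosen only to illustrate the effect.
fast_iops, slow_iops = 10_000.0, 8_000.0
print(relative_slowdown(fast_iops, slow_iops))        # 0.2
print(apparent_ratio(fast_iops, slow_iops, 0.0))      # 1.25
print(apparent_ratio(fast_iops, slow_iops, 6_000.0))  # 2.0
```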
]]></content:encoded>
	</item>
	<item>
		<title>By: Slava</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-4983</link>
		<dc:creator>Slava</dc:creator>
		<pubDate>Fri, 09 Oct 2009 08:15:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-4983</guid>
		<description>Sure, here&#039;s a graph for block sizes and alignments up to 250KB:

http://www.rethinkdb.com/blog/wp-content/uploads/2009/10/graph-blk-alg-25k.png

The highest performance still comes from 512B block sizes aligned to 4K boundaries (it gets just a little higher for larger boundaries, but the delta is so small it&#039;s uninteresting). After 4K, the optimal alignment is the block size itself, but the number of IOPS never comes close to 512B block sizes.

The B-tree block size post did use unaligned reads (well, aligned to 512B). The idea was to show the process we go through, not to provide actual numbers for people to use. The numbers are too dependent on the drive, the workload, and the database to give a single useful result.</description>
		<content:encoded><![CDATA[<p>Sure, here&#8217;s a graph for block sizes and alignments up to 250KB:</p>
<p><a href="http://www.rethinkdb.com/blog/wp-content/uploads/2009/10/graph-blk-alg-25k.png" rel="nofollow">http://www.rethinkdb.com/blog/wp-content/uploads/2009/10/graph-blk-alg-25k.png</a></p>
<p>The highest performance still comes from 512B block sizes aligned to 4K boundaries (it gets just a little higher for larger boundaries, but the delta is so small it&#8217;s uninteresting). After 4K, the optimal alignment is the block size itself, but the number of IOPS never comes close to 512B block sizes.</p>
<p>The B-tree block size post did use unaligned reads (well, aligned to 512B). The idea was to show the process we go through, not to provide actual numbers for people to use. The numbers are too dependent on the drive, the workload, and the database to give a single useful result.</p>
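<p>One way to reason about why alignment matters, under the simplifying assumption (not a measured property of any drive discussed here) that the SSD services reads in whole 4K NAND pages: a read that crosses a page boundary has to touch two pages instead of one. The hypothetical helper below counts the pages a read overlaps; it is an illustration, not an explanation of the measured results.</p>

```python
def pages_touched(offset: int, size: int, page_size: int = 4096) -> int:
    """Number of page_size-byte pages overlapped by the read [offset, offset + size)."""
    if size <= 0:
        return 0
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return last - first + 1


# A 512-byte read aligned to a 4K boundary stays inside one page...
print(pages_touched(8192, 512))        # 1
# ...but a 512-byte read that straddles a 4K boundary spans two.
print(pages_touched(8192 - 256, 512))  # 2
# A 4K read aligned only to 512 bytes spans two 4K pages.
print(pages_touched(512, 4096))        # 2
print(pages_touched(4096, 4096))       # 1
```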
]]></content:encoded>
	</item>
	<item>
		<title>By: פורומים</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-4893</link>
		<dc:creator>פורומים</dc:creator>
		<pubDate>Thu, 08 Oct 2009 21:41:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-4893</guid>
		<description>I third (?) Brian’s request for larger block sizes.</description>
		<content:encoded><![CDATA[<p>I third (?) Brian’s request for larger block sizes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Colin Percival</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-4843</link>
		<dc:creator>Colin Percival</dc:creator>
		<pubDate>Thu, 08 Oct 2009 17:20:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-4843</guid>
		<description>You noticed that the IOPS numbers from your last post match up exactly against the numbers for *unaligned* reads from this post, right?

BTW, I second Brian&#039;s request for larger block sizes.</description>
		<content:encoded><![CDATA[<p>You noticed that the IOPS numbers from your last post match up exactly against the numbers for <em>unaligned</em> reads from this post, right?</p>
<p>BTW, I second Brian&#8217;s request for larger block sizes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Smith</title>
		<link>http://www.rethinkdb.com/blog/2009/10/page-alignment-on-ssds/comment-page-1/#comment-4838</link>
		<dc:creator>Brian Smith</dc:creator>
		<pubDate>Thu, 08 Oct 2009 17:03:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.rethinkdb.com/blog/?p=223#comment-4838</guid>
		<description>Why do you only test up to a 4KB block size and stride? It is pretty common for databases, even on hard disks, to use 16KB blocks. Since SSDs normally have to read 128KB at a time for read/erase/write, I&#039;d guess they&#039;d be very good at 128KB-aligned reads.

Also, is your test utilizing NCQ? I would guess that if you quickly sent 32 4K read requests (on the same aligned 128KB block), you would get a nice boost over sending a 4K read, waiting for a response, sending the next one, etc.</description>
		<content:encoded><![CDATA[<p>Why do you only test up to a 4KB block size and stride? It is pretty common for databases, even on hard disks, to use 16KB blocks. Since SSDs normally have to read 128KB at a time for read/erase/write, I&#8217;d guess they&#8217;d be very good at 128KB-aligned reads.</p>
<p>Also, is your test utilizing NCQ? I would guess that if you quickly sent 32 4K read requests (on the same aligned 128KB block), you would get a nice boost over sending a 4K read, waiting for a response, sending the next one, etc.</p>
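<p>A rough Python sketch of the two access patterns described above, assuming a Unix system (os.pread) and the standard-library thread pool. The function names are hypothetical and this is not the benchmark used in the post: one function waits for each read before issuing the next (queue depth 1), the other keeps up to 32 reads in flight.</p>

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def read_blocks_serial(fd: int, offsets: Iterable[int], block_size: int) -> List[bytes]:
    """Queue depth 1: issue one read, wait for it to finish, then issue the next."""
    return [os.pread(fd, block_size, off) for off in offsets]


def read_blocks_concurrent(fd: int, offsets: Iterable[int], block_size: int,
                           queue_depth: int = 32) -> List[bytes]:
    """Keep up to `queue_depth` reads outstanding at the same time.

    os.pread releases the GIL while blocked in the system call, so several
    worker threads can have requests in flight at once. That gives the block
    layer (and, on SATA drives with NCQ enabled, the drive itself) a queue of
    requests to reorder or overlap. Results come back in the order of `offsets`.
    """
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        return list(pool.map(lambda off: os.pread(fd, block_size, off), offsets))
```

<p>Timing both functions over the same list of 4K-aligned offsets on the raw device would show whether a deeper queue helps on a particular drive. Without O_DIRECT, repeated reads may be served from the page cache; O_DIRECT in turn requires aligned buffers (for example os.preadv into an mmap-allocated buffer), which this sketch does not handle.</p>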
]]></content:encoded>
	</item>
</channel>
</rss>
