m***@yahoo.com
2006-10-16 08:57:15 UTC
I've been getting some rather odd performance data from single disks
and XLV arrays using 15K rpm drives, and wondered if anyone has any
idea why.
The setup is a dual-600 Octane2 with a PCI cage containing a dual-
channel QLA12160 card, connected to an external JBOD RAID. The RAID
unit has two separate 4-bay arrays. hinv output is enclosed below, after the sig.
For testing, I've been comparing various 10K and 15K rpm drives, 73GB
and 146GB. I tested both individual disks and arrays consisting of 2
or 4 disks, on one controller and spread across two controllers.
Note that the 15K 146GB disks I'm using were new, freshly removed
from original Maxtor antistatic bags. I also tested using different
systems (dual-600 vs. single-400), different PCI cages and different
QLA12160 cards, just to be sure.
The mystery: why does the write speed for 15K drives completely suck
for smaller block sizes? It only improves once the block size becomes
large, and even then does not seem to be as good as the 10K drives. In
the diskperf results tables, this shows up as the numbers in the 1st
column (fwd_wt) being nowhere near as high as those in the 2nd column
(fwd_rd) for the first few rows.
By contrast, the 10K drives show almost identical high bandwidths in
both columns, with write numbers easily 10X better than the 15K
drives' for the first half-dozen rows; there is no huge gradual climb
from small block sizes to large.
Here's a simple example, a single 146GB 15K Maxtor. In all these
diskperf outputs I'll omit the diskperf header lines, to show the data
of interest more clearly. Also, in all cases the file system is mkfs'd
with a block size of 16384, and the arrays are always constructed in
the same way.
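(For completeness, there's nothing exotic about the filesystem setup
itself. A rough sketch of the sort of commands involved, using the
single-disk device from the first example below; the only non-default
option is the 16384 block size:)

# rough sketch only; exact invocation may differ slightly
mkfs_xfs -b size=16384 /dev/dsk/dks5d4s7    # 16KB filesystem block size
mount -t xfs /dev/dsk/dks5d4s7 /0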
octane# scsicontrol -i /dev/scsi/sc5d4l0 | grep scsi
/dev/scsi/sc5d4l0: Disk MAXTOR ATLAS15K2_146SCAJT00
octane# df | grep 4s7
/dev/dsk/dks5d4s7 xfs 143355192 680512 142674680 1 /0
octane# pwd
/0
octane# ls -l
total 1024000
-rw------- 1 root sys 524288000 Oct 3 20:17 testfile
octane# diskperf -W -D testfile
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
      16384     1.69    56.28     2.35    12.71     2.03     2.16
      32768     3.26    80.38     5.19    19.25     3.80     4.03
      65536     6.44    94.42    10.79    20.42     6.37     6.71
     131072    11.70    94.46    18.05    18.44    10.63    10.64
     262144    20.12    95.82    23.09    23.08    18.37    17.03
     524288    35.05    44.17    30.07    40.36    27.74    33.33
    1048576    52.55    53.23    52.45    55.41    52.52    53.75
    2097152    53.40    70.81    54.07    70.84    53.91    69.98
    4194304    69.26    81.11    68.61    78.68    69.95    78.11
Notice how much lower the forward write speed is than the forward
read speed. Even at the large block sizes it's still not as good,
though it briefly comes close at 2^20.
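(To put those small-block write numbers in perspective, converting
MB/s into requests per second shows just how low they are; a quick
back-of-envelope sketch using the 16384-byte row above:)

# requests/sec = (MB/s * 1048576) / request size in bytes
echo "(1.69 * 1048576) / 16384" | bc     # fwd_wt at 16KB -> ~108 writes/sec
echo "(56.28 * 1048576) / 16384" | bc    # fwd_rd at 16KB -> ~3600 reads/sec
# ~108 writes/sec is tiny for a 15K rpm drive (250 revs/sec), i.e. each
# 16KB write appears to cost more than two full revolutions.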
Now compare the above to a 73GB 10K Maxtor:
octane# scsicontrol -i /dev/scsi/sc5d2l0 | grep scsi
/dev/scsi/sc5d2l0: Disk MAXTOR ATLAS10K5_73WLS JNZH
octane# df | grep 2s7
/dev/dsk/dks5d2s7 xfs 71822280 844080 70978200 2 /0
octane# pwd
/1
octane# ls -l
-rw------- 1 root sys 524288000 Oct 4 01:20 testfile
octane# diskperf -W -D testfile
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
      16384    53.16    59.43    46.29    27.14     8.17     3.47
      32768    62.90    85.08    66.44    33.94    15.00     6.63
      65536    73.15    87.79    68.33    33.55    23.82    11.99
     131072    81.08    87.62    68.00    34.33    34.83    20.22
     262144    85.65    87.42    65.17    34.68    45.09    32.05
     524288    86.17    42.30    71.61    48.46    55.72    44.92
    1048576    86.41    55.76    68.71    63.80    62.17    56.82
    2097152    85.76    67.88    68.51    68.09    67.94    66.42
    4194304    85.68    76.47    78.63    78.29    77.21    76.66
What a difference! Not only do the numbers in the forward write speed
column start off much higher, they stay that way, though strangely
the forward read column varies a bit as the block size increases.
Plus, the final fwd_wt numbers are better than the 15K results.
So what is happening here? Why are the fwd_wt numbers for the
15K drive so bad?
I tried various 15K and 10K drives (Maxtor, Seagate, Hitachi, etc.),
and the results were always the same.
And note that even a simple test of creating the initial test file
using mkfile showed huge differences in speed, for both single drives
and arrays, i.e. the command:
timex mkfile 500m testfile
run on a single disk always took around 14 seconds for 15K drives
(34MB/sec), compared to only 7 or 8 seconds for 10K drives (60MB/sec
or better; in fact, Maxtor Atlas 10K V drives gave 89MB/sec for this
test with a single disk).
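(Those MB/s figures are just the 500MB file size divided by the
elapsed real time reported by timex; the elapsed times below are the
approximate ones quoted above:)

# MB/s = 500 / elapsed seconds (mkfile 500m writes 500*1024*1024 bytes)
echo "scale=1; 500 / 14.5" | bc    # typical 15K single disk -> 34.4
echo "scale=1; 500 / 8" | bc       # typical 10K single disk -> 62.5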
Moving on to XLV arrays, the same differences occur (hey, it's not
often one gets to use that expression :D). A rough sketch of how the
stripes are built follows the first array table below.
Here are the results for an array of 4 x 15K 146GB drives, 2 on each
controller, running diskperf on an 1800MB file:
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
    1327104   101.02   145.11    98.56   110.62   102.64   107.82
    2654208   122.56   137.31   122.94   122.08   122.41   123.12
    5308416   137.77   141.45   139.73   135.50   138.44   134.42
   10616832   145.68   144.42   148.00   141.09   145.88   141.63
   21233664   150.97   147.08   151.80   145.77   151.99   144.89
   42467328   153.42   146.20   153.86   147.43   154.71   146.19
   84934656   154.45   147.69   156.49   148.26   156.66   146.83
Note that, once again, the smaller block sizes show much lower
numbers for fwd_wt than for fwd_rd, only matching them at the larger
block sizes.
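(As for how the stripes are built: they're plain XLV stripes across
the two QL12160 channels, made with xlv_make. A rough sketch of the
sort of script involved; the stripe_unit value is illustrative rather
than the exact one used, and the disks are units 1 and 2 on controller
4 plus units 3 and 4 on controller 5, as per the hinv:)

# rough sketch only: 4-disk XLV stripe, 2 disks per controller
# (stripe_unit is in 512-byte blocks; the value here is illustrative)
cat > stripe4.xlv << 'EOF'
vol stripe4
data
plex
ve -stripe -stripe_unit 128 dks4d1s7 dks4d2s7 dks5d3s7 dks5d4s7
end
exit
EOF
xlv_make stripe4.xlv
mkfs_xfs -b size=16384 /dev/xlv/stripe4    # same block size as the single disks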
Now compare to an array of 4 x 10K 146GB drives, in this case using
Seagate Cheetah ST3146807LCs:
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
    1327104   166.45   170.10   123.81   112.26   116.14   104.30
    2654208   171.77   174.90   151.04   131.40   151.82   133.54
    5308416   174.30   177.74   157.09   156.61   162.11   156.49
   10616832   175.27   178.94   175.30   166.41   175.30   166.53
   21233664   175.61   179.61   176.04   174.29   176.06   173.66
   42467328   175.07   179.92   176.30   177.11   176.30   177.28
   84934656   173.89   179.94   176.33   178.63   176.35   178.54
Again, the smaller block size numbers are already high, but even more
bizarrely, the overall results are more than 10% faster than those of
the 15K drives! Why?
And btw, the timex mkfile result for the 10K array was also
impressive:
octane 19# timex mkfile 1800m testfile
real 15.01
user 0.01
sys 1.16
That's 120MB/sec. By contrast, the array of four 15Ks was less
than half as fast for this initial test.
I'm confused, to say the least...
Any ideas, anyone? I'm sure it's something simple, but I can't figure
out what.
Naturally, I can run any further tests that may be suggested.
Ian.
SGI Depot: http://www.futuretech.blinkenlights.nl/sgidepot/
Email: ***@yahoo.com (eBay ID: mapesdhs)
Backup email (send copy to this too): ***@blueyonder.co.uk
Home: +44 (0)131 476 0796
**********************************************************************
Typical system hinv (this unit has a MENET XIO option).
2 600 MHZ IP30 Processors
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 0.0
Main memory size: 3584 Mbytes
Xbow ASIC: Revision 1.4
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 2 Mbytes
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
Integral SCSI controller 4: Version QL12160, low voltage differential
Disk drive: unit 1 on SCSI controller 4
Disk drive: unit 2 on SCSI controller 4
Integral SCSI controller 5: Version QL12160, low voltage differential
Disk drive: unit 3 on SCSI controller 5
Disk drive: unit 4 on SCSI controller 5
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3/IOC4 serial port: tty3
IOC3/IOC4 serial port: tty4
IOC3/IOC4 serial port: tty5
IOC3/IOC4 serial port: tty6
IOC3/IOC4 serial port: tty7
IOC3/IOC4 serial port: tty8
IOC3 parallel port: plp1
Graphics board: V12
Integral Fast Ethernet: ef0, version 1, pci 2
Fast Ethernet: ef1, version 1, pci 0
Fast Ethernet: ef2, version 1, pci 1
Fast Ethernet: ef3, version 1, pci 2
Fast Ethernet: ef4, version 1, pci 3
Iris Audio Processor: version RAD revision 12.0, number 1