m***@yahoo.com
2006-10-16 08:57:15 UTC
I've been getting some rather odd performance data from single disks
and XLV arrays using 15K rpm drives, and wondered if anyone has any
idea why.
The setup is a dual-600 Octane2 with a PCI cage containing a dual-
channel QLA12160 card, connected to an external JBOD RAID. The RAID
unit has two separate 4-bay arrays. hinv output is enclosed below, after the sig.
For testing, I've been comparing various 10K and 15K rpm drives, 73GB
and 146GB. I tested both individual disks and arrays consisting of 2
or 4 disks, on one controller and spread across two controllers.
Note that the 15K 146GB disks I'm using were new, freshly removed
from original Maxtor antistatic bags. I also tested using different
systems (dual-600 vs. single-400), different PCI cages and different
QLA12160 cards, just to be sure.
The mystery: why does the write speed for 15K drives completely suck
for smaller block sizes? It only improves once the block size becomes
large, and even then does not seem to be as good as the 10K drives. In
the diskperf results tables, this shows up as the numbers in the 1st
column (fwd_wt) being nowhere near as high as those in the 2nd column
(fwd_rd) for the first few rows.
By contrast, the 10K drives show almost identical high bandwidths in
both columns, with write numbers easily 10X better than the 15K
drives' for the first half-dozen rows; there is no huge gradual climb
from small block sizes to large.
Here's a simple example, a single 146GB 15K Maxtor. In all these
diskperf outputs I'll omit the diskperf header lines, to show the data
of interest more clearly. Also, in all cases the file system is mkfs'd
with a block size of 16384, and the arrays are always constructed in
the same way.
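(For completeness, there's nothing exotic about the filesystem setup
itself. A rough sketch of the sort of commands involved, using the
single-disk device from the first example below; the only non-default
option is the 16384 block size:)

# rough sketch only; exact invocation may differ slightly
mkfs_xfs -b size=16384 /dev/dsk/dks5d4s7    # 16KB filesystem block size
mount -t xfs /dev/dsk/dks5d4s7 /0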
octane# scsicontrol -i /dev/scsi/sc5d4l0 | grep scsi
/dev/scsi/sc5d4l0: Disk MAXTOR ATLAS15K2_146SCAJT00
octane# df | grep 4s7
/dev/dsk/dks5d4s7 xfs 143355192 680512 142674680 1 /0
octane# pwd
/0
octane# ls -l
total 1024000
-rw------- 1 root sys 524288000 Oct 3 20:17 testfile
octane# diskperf -W -D testfile
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
      16384     1.69    56.28     2.35    12.71     2.03     2.16
      32768     3.26    80.38     5.19    19.25     3.80     4.03
      65536     6.44    94.42    10.79    20.42     6.37     6.71
     131072    11.70    94.46    18.05    18.44    10.63    10.64
     262144    20.12    95.82    23.09    23.08    18.37    17.03
     524288    35.05    44.17    30.07    40.36    27.74    33.33
    1048576    52.55    53.23    52.45    55.41    52.52    53.75
    2097152    53.40    70.81    54.07    70.84    53.91    69.98
    4194304    69.26    81.11    68.61    78.68    69.95    78.11
Notice how much lower the forward write speed is than the forward
read speed. Even at the large block sizes it's still not as good,
though it briefly comes close at 2^20.
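(To put those small-block write numbers in perspective, converting
MB/s into requests per second shows just how low they are; a quick
back-of-envelope sketch using the 16384-byte row above:)

# requests/sec = (MB/s * 1048576) / request size in bytes
echo "(1.69 * 1048576) / 16384" | bc     # fwd_wt at 16KB -> ~108 writes/sec
echo "(56.28 * 1048576) / 16384" | bc    # fwd_rd at 16KB -> ~3600 reads/sec
# ~108 writes/sec is tiny for a 15K rpm drive (250 revs/sec), i.e. each
# 16KB write appears to cost more than two full revolutions.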
Now compare the above to a 73GB 10K Maxtor:
octane# scsicontrol -i /dev/scsi/sc5d2l0 | grep scsi
/dev/scsi/sc5d2l0: Disk MAXTOR ATLAS10K5_73WLS JNZH
octane# df | grep 2s7
/dev/dsk/dks5d2s7 xfs 71822280 844080 70978200 2 /0
octane# pwd
/1
octane# ls -l
-rw------- 1 root sys 524288000 Oct 4 01:20 testfile
octane# diskperf -W -D testfile
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
      16384    53.16    59.43    46.29    27.14     8.17     3.47
      32768    62.90    85.08    66.44    33.94    15.00     6.63
      65536    73.15    87.79    68.33    33.55    23.82    11.99
     131072    81.08    87.62    68.00    34.33    34.83    20.22
     262144    85.65    87.42    65.17    34.68    45.09    32.05
     524288    86.17    42.30    71.61    48.46    55.72    44.92
    1048576    86.41    55.76    68.71    63.80    62.17    56.82
    2097152    85.76    67.88    68.51    68.09    67.94    66.42
    4194304    85.68    76.47    78.63    78.29    77.21    76.66
What a difference! Not only do the numbers in the forward write speed
column start off much higher, they stay that way, though strangely
the forward read column varies a bit as the block size increases.
Plus, the final fwd_wt numbers are better than the 15K results.
So what is happening here? Why are the fwd_wt numbers for the
15K drive so bad?
I tried various 15K and 10K drives (Maxtor, Seagate, Hitachi, etc.),
and the results were always the same.
And note that even a simple test of creating the initial test file
using mkfile showed huge differences in speed, for both single drives
and arrays, i.e. the command:
timex mkfile 500m testfile
run on a single disk always took around 14 seconds for 15K drives
(34MB/sec), compared to only 7 or 8 seconds for 10K drives (60MB/sec
or better; in fact, Maxtor Atlas 10K V drives gave 89MB/sec for this
test with a single disk).
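(Those MB/s figures are just the 500MB file size divided by the
elapsed real time reported by timex; the elapsed times below are the
approximate ones quoted above:)

# MB/s = 500 / elapsed seconds (mkfile 500m writes 500*1024*1024 bytes)
echo "scale=1; 500 / 14.5" | bc    # typical 15K single disk -> 34.4
echo "scale=1; 500 / 8" | bc       # typical 10K single disk -> 62.5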
Moving on to XLV arrays, the same differences occur (hey, it's not
often one gets to use that expression :D). A rough sketch of how the
stripes are built follows the first array table below.
Here are the results for an array of 4 x 15K 146GB drives, 2 on each
controller, running diskperf on an 1800MB file:
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
    1327104   101.02   145.11    98.56   110.62   102.64   107.82
    2654208   122.56   137.31   122.94   122.08   122.41   123.12
    5308416   137.77   141.45   139.73   135.50   138.44   134.42
   10616832   145.68   144.42   148.00   141.09   145.88   141.63
   21233664   150.97   147.08   151.80   145.77   151.99   144.89
   42467328   153.42   146.20   153.86   147.43   154.71   146.19
   84934656   154.45   147.69   156.49   148.26   156.66   146.83
Note that, once again, the smaller block sizes show much lower
numbers for fwd_wt than for fwd_rd, only matching them at the larger
block sizes.
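(As for how the stripes are built: they're plain XLV stripes across
the two QL12160 channels, made with xlv_make. A rough sketch of the
sort of script involved; the stripe_unit value is illustrative rather
than the exact one used, and the disks are units 1 and 2 on controller
4 plus units 3 and 4 on controller 5, as per the hinv:)

# rough sketch only: 4-disk XLV stripe, 2 disks per controller
# (stripe_unit is in 512-byte blocks; the value here is illustrative)
cat > stripe4.xlv << 'EOF'
vol stripe4
data
plex
ve -stripe -stripe_unit 128 dks4d1s7 dks4d2s7 dks5d3s7 dks5d4s7
end
exit
EOF
xlv_make stripe4.xlv
mkfs_xfs -b size=16384 /dev/xlv/stripe4    # same block size as the single disks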
Now compare to an array of 4 x 10K 146GB drives, in this case using
Seagate Cheetah ST3146807LCs:
#  req_size   fwd_wt   fwd_rd   bwd_wt   bwd_rd   rnd_wt   rnd_rd
#   (bytes)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)   (MB/s)
#----------------------------------------------------------------
    1327104   166.45   170.10   123.81   112.26   116.14   104.30
    2654208   171.77   174.90   151.04   131.40   151.82   133.54
    5308416   174.30   177.74   157.09   156.61   162.11   156.49
   10616832   175.27   178.94   175.30   166.41   175.30   166.53
   21233664   175.61   179.61   176.04   174.29   176.06   173.66
   42467328   175.07   179.92   176.30   177.11   176.30   177.28
   84934656   173.89   179.94   176.33   178.63   176.35   178.54
Again, the smaller block size numbers are already high, but even more
bizarrely, the overall results are more than 10% faster than those of
the 15K drives! Why?
And btw, the timex mkfile result for the 10K array was also
impressive:
octane 19# timex mkfile 1800m testfile
real 15.01
user 0.01
sys 1.16
That's 120MB/sec. By contrast, the array of four 15Ks was less
than half as fast for this initial test.
I'm confused, to say the least...
Any ideas, anyone? I'm sure it's something simple, but I can't figure
out what.
Naturally, I can run any further tests that may be suggested.
Ian.
SGI Depot: http://www.futuretech.blinkenlights.nl/sgidepot/
Email: ***@yahoo.com (eBay ID: mapesdhs)
Backup email (send copy to this too): ***@blueyonder.co.uk
Home: +44 (0)131 476 0796
**********************************************************************
Typical system hinv (this unit has a MENET XIO option).
2 600 MHZ IP30 Processors
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 0.0
Main memory size: 3584 Mbytes
Xbow ASIC: Revision 1.4
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 2 Mbytes
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
Integral SCSI controller 4: Version QL12160, low voltage differential
Disk drive: unit 1 on SCSI controller 4
Disk drive: unit 2 on SCSI controller 4
Integral SCSI controller 5: Version QL12160, low voltage differential
Disk drive: unit 3 on SCSI controller 5
Disk drive: unit 4 on SCSI controller 5
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3/IOC4 serial port: tty3
IOC3/IOC4 serial port: tty4
IOC3/IOC4 serial port: tty5
IOC3/IOC4 serial port: tty6
IOC3/IOC4 serial port: tty7
IOC3/IOC4 serial port: tty8
IOC3 parallel port: plp1
Graphics board: V12
Integral Fast Ethernet: ef0, version 1, pci 2
Fast Ethernet: ef1, version 1, pci 0
Fast Ethernet: ef2, version 1, pci 1
Fast Ethernet: ef3, version 1, pci 2
Fast Ethernet: ef4, version 1, pci 3
Iris Audio Processor: version RAD revision 12.0, number 1