I know, the title is a bit of a teaser but it makes sense. I am at HP Discover Las Vegas this week and HP revealed a few new storage enhancements. A very important one is the next step with HP StoreServ (fka 3PAR).
When HP bought 3PAR in 2010 ($2.3b) I was really happy. I really liked how the architecture had been designed from the ground up, and it was an acquisition HP needed. There was only one big problem: the 3PAR “green zone” that HP created (not 3PAR!), where all designs needed to be approved by 3PAR engineers, was not easy to sell, and they lost a lot of the granularity, which doesn’t really make for a competitive product. THAT changed in December last year when HP announced the new HP StoreServ 7000 series. In the hallways we call it “the array of sunlight”. I wrote a small post on it as soon as it came out.
And then this thing happened at HP Discover Frankfurt that week:
bold statement of Tom Joyce at #HPdiscover: “HP will NOT buy a flash company”. breaking the dreams of some startups 🙂— Hans De Leenheer (@hansdeleenheer) December 5, 2012
554000 IOPS benchmarked:
Everyone knows I rant on IOPs benchmarks because 1) most of the time they do not tell you the IO pattern that has been used and 2) a benchmark is still an artificial result that has no connection to real environments. You will never get those max IOPs out of a real environment because of the write patterns and the IO-blender (watch this VMUG video from Stephen Foskett on the IO-blender).
So here are a few details on the StoreServ 7450 benchmark of 554.000 IOPs:
- They topped the benchmark at 0.7 ms latency. They could have gone even a little further but that was the point they wanted to prove.
- The IO pattern was 4k random IO with disabled caching
- The configuration was a 4-controller system (full mesh) with only 48 disks. Note: I am hoping to get a configuration paper from HP with street price.
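To put those figures in perspective, here is a small back-of-the-envelope calculation. It only uses the numbers quoted above; I am assuming "4k" means 4 KiB, and the queue depth estimate comes from Little's law (average IOs in flight = IO rate x latency), not from anything HP published.

```python
# Back-of-the-envelope numbers derived from the benchmark figures above.
IOPS = 554_000          # benchmark result
LATENCY_S = 0.0007      # 0.7 ms
BLOCK_BYTES = 4 * 1024  # assumption: "4k" means 4 KiB
DISKS = 48


def throughput_mib_s(iops, block_bytes):
    """Data rate implied by an IO rate at a fixed block size."""
    return iops * block_bytes / (1024 * 1024)


def iops_per_disk(iops, disks):
    """Average IO rate each disk has to sustain."""
    return iops / disks


def outstanding_ios(iops, latency_s):
    """Little's law: average IOs in flight = arrival rate x time in system."""
    return iops * latency_s


print(round(throughput_mib_s(IOPS, BLOCK_BYTES)))  # MiB/s
print(round(iops_per_disk(IOPS, DISKS)))           # IOPS per disk
print(round(outstanding_ios(IOPS, LATENCY_S)))     # IOs in flight
```

So roughly 2.1 GiB/s of 4k traffic, about 11,500 IOPS per disk, and a few hundred IOs in flight at any time.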
HP redesigns 3PAR OS for flash:
This title does not entirely cover what happened but I’ll elaborate. First of all there are some simple hardware differences between the 7400 they released in December and the 7450 of today: the 7450 has 8-core CPUs instead of 6-core and double the amount of cache per controller.
But the changes go way beyond that. The HP 3PAR team went back into the trenches and analysed their own IO patterns and what they needed to change in the code to make it work natively for flash. First of all, a big difference between flash components and rotational disks is that you don’t have to wait for seek time. So they don’t have to read bigger blocks into the cache than necessary. Previously, if you needed a 4k block you were reading a 16k block from the array, just because chances were that you would need the rest of the block. Now it only reads what is necessary. Priyadarshi called this adaptive read.
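A minimal sketch of the difference, as I understand it. The function names are mine and purely illustrative; this is not 3PAR code, just the arithmetic of "round up to 16k" versus "read what was asked".

```python
LEGACY_READ_BYTES = 16 * 1024  # fixed read unit described above for spinning disks


def legacy_read_size(request_bytes, unit=LEGACY_READ_BYTES):
    """Round the request up to whole 16k units (read-ahead amortises the seek)."""
    units = -(-request_bytes // unit)  # ceiling division
    return units * unit


def adaptive_read_size(request_bytes):
    """On flash there is no seek to amortise, so read only what was requested."""
    return request_bytes


request = 4 * 1024
print(legacy_read_size(request), adaptive_read_size(request))
```

For a 4k request that is 16k moved from the backend in the old behaviour versus 4k in the new one, so a quarter of the backend traffic for small random reads.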
A second thing is that the cache has 16k block pages, but if a 4k block comes in, only that 4k block gets written down to the disks. The SSDs then internally fill those 4k blocks into their own “write page size” before offloading them to the actual flash component. This can be called adaptive write. To make these two features work well together they keep a bitmap in the cache of all those variable blocks so that they know which block had what size. A final point I want to address here is that although I said it’s writing just pieces of the pages down, there will still be RAID coalescence in the cache. If some 4k blocks come in together, they will be cached and flushed to the backend together.
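Here is how I picture the bitmap and the coalescing, as a rough sketch. Everything in it is an assumption for illustration: a 16k page split into four 4k sub-blocks, one bit per sub-block, and adjacent dirty sub-blocks flushed as one extent. The class and method names are hypothetical, not HP's.

```python
PAGE_BYTES = 16 * 1024
SUB_BYTES = 4 * 1024
SUBS_PER_PAGE = PAGE_BYTES // SUB_BYTES  # 4 sub-blocks per cache page


class CachePage:
    """One cache page with a bitmap of which 4k sub-blocks hold new data."""

    def __init__(self):
        self.dirty = 0  # bit i set => sub-block i must be written down

    def write(self, offset, length):
        if offset < 0 or length <= 0 or offset + length > PAGE_BYTES:
            raise ValueError("write outside the cache page")
        first = offset // SUB_BYTES
        last = (offset + length - 1) // SUB_BYTES
        for i in range(first, last + 1):
            self.dirty |= 1 << i

    def flush_extents(self):
        """Return (offset, length) runs of dirty sub-blocks and clear the bitmap."""
        extents = []
        start = None
        for i in range(SUBS_PER_PAGE + 1):
            is_dirty = i < SUBS_PER_PAGE and (self.dirty >> i) & 1
            if is_dirty and start is None:
                start = i
            elif not is_dirty and start is not None:
                # Adjacent dirty sub-blocks are coalesced into one backend write.
                extents.append((start * SUB_BYTES, (i - start) * SUB_BYTES))
                start = None
        self.dirty = 0
        return extents
```

A single 4k write flushes as one 4k extent, while two neighbouring 4k writes that arrive together flush as one 8k extent instead of two separate IOs.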
The last point is how the cache actually works: there are 3 queues in the cache:
- 1Q: the data has been touched once
- nQ: the data has been touched multiple times
- offload: the data has only been touched once, so let’s offload it. If it is here and it gets touched again it will move to nQ.
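The three queues above can be sketched like this. Note the assumptions: the queue sizes are made up, and HP did not tell me when data ages out of 1Q or how nQ and offload are evicted, so I simply use oldest-first everywhere. The class is hypothetical and only shows the promotion rules.

```python
from collections import OrderedDict


class ThreeQueueCache:
    """Toy model of the 1Q / nQ / offload queues (capacities are assumptions)."""

    def __init__(self, cap_1q=2, cap_nq=2, cap_offload=2):
        self.q1 = OrderedDict()       # touched once
        self.qn = OrderedDict()       # touched multiple times
        self.offload = OrderedDict()  # touched once, on its way out
        self.caps = (cap_1q, cap_nq, cap_offload)

    def touch(self, key):
        if key in self.qn:
            self.qn.move_to_end(key)          # still hot, refresh its position
        elif key in self.q1 or key in self.offload:
            self.q1.pop(key, None)
            self.offload.pop(key, None)
            self.qn[key] = True               # second touch promotes to nQ
        else:
            self.q1[key] = True               # first touch lands in 1Q
        self._rebalance()

    def _rebalance(self):
        cap_1q, cap_nq, cap_offload = self.caps
        while len(self.q1) > cap_1q:          # oldest single-touch data is offloaded
            key, _ = self.q1.popitem(last=False)
            self.offload[key] = True
        while len(self.offload) > cap_offload:
            self.offload.popitem(last=False)  # leaves the cache entirely
        while len(self.qn) > cap_nq:
            self.qn.popitem(last=False)       # assumption: oldest-first here too

    def where(self, key):
        for name, queue in (("1Q", self.q1), ("nQ", self.qn), ("offload", self.offload)):
            if key in queue:
                return name
        return None
```

With the default sizes, touching "a" twice puts it in nQ, and touching "b", "c" and "d" once each pushes "b" into offload; one more touch of "b" rescues it into nQ.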
Now comes a really deep tech part: the 3PAR technology has both an ASIC and an Intel CPU for handling the workloads. One of the really nifty tricks they have added is in the high availability of the full mesh (multi-controller system). When an IO comes into the ASIC, it has to be mirrored to one of the other controllers before the ack can be given up the stack to the client. Instead of having ASIC 1 wait for the ack of ASIC 2 to come back, they just write the IO mirror over the PCIe bus into ASIC 2’s memory. Because PCIe doesn’t know acks, they just write it and read it back. By doing this you do not interrupt the ASIC of controller 2 but you are still able to secure high availability. The result the 3PAR labs found was a 15% reduction in latency.
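To make the write-and-read-back idea concrete, here is a very simplified model. The peer controller's memory is just a byte buffer, and the names are hypothetical; the real thing happens in the ASIC and over PCIe, which this obviously does not reproduce.

```python
class PeerMemory:
    """Stand-in for the region of controller 2's memory reachable over PCIe."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.interrupts = 0  # stays 0: the peer never has to do anything


def mirror_write(peer, offset, data):
    """Write the IO into peer memory, then read it back to confirm it landed."""
    end = offset + len(data)
    if offset < 0 or end > len(peer.buf):
        raise ValueError("write outside peer memory")
    peer.buf[offset:end] = data                 # plain write, no ack from the peer
    return bytes(peer.buf[offset:end]) == data  # the read-back is the confirmation


peer = PeerMemory(64)
print(mirror_write(peer, 8, b"io-block"), peer.interrupts)
```

The point of the design is in what is missing: controller 2 never runs any code to acknowledge the mirror, so the sender confirms the copy on its own.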
Multi-tenancy for IO workloads:
3PAR has been designed from the ground up for multi-tenancy. They already had multi-tenancy from a management perspective but now they are adding multi-tenancy from a QoS (Quality of Service) perspective. Remember that I didn’t like the fact that benchmarks on IOPs do not necessarily take the IO pattern into account? Here the team does QoS based on both IOPs and bandwidth. Good thing!
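Why both limits matter is easy to show with a toy limiter. This is my own sketch with made-up limits and a simple one-second window; I do not know how HP implements the QoS internally.

```python
class TenantQoS:
    """Toy per-tenant limiter capping both IOs and bytes per one-second window."""

    def __init__(self, max_iops, max_bytes_per_s):
        self.max_iops = max_iops
        self.max_bytes = max_bytes_per_s
        self.window = None  # the second we are currently counting in
        self.ios = 0
        self.bytes = 0

    def admit(self, now_s, io_bytes):
        """Return True if the IO fits in this second's IOPS and bandwidth budget."""
        window = int(now_s)
        if window != self.window:
            self.window, self.ios, self.bytes = window, 0, 0
        if self.ios + 1 > self.max_iops or self.bytes + io_bytes > self.max_bytes:
            return False
        self.ios += 1
        self.bytes += io_bytes
        return True


backup = TenantQoS(max_iops=1000, max_bytes_per_s=256 * 1024)
print([backup.admit(t, 128 * 1024) for t in (0.0, 0.1, 0.2, 1.0)])
```

The streaming tenant is stopped by the bandwidth cap after only two IOs, far below its IOPS limit, which is exactly the case an IOPS-only limit would miss.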
If you think again about the IO-blender, you know that you can have one application writing 4k random blocks (database) and, at the same time, another application reading 128k streaming IO (backup). If you don’t do something about it, the 4k blocks will be waiting because the 128k blocks take all the bandwidth away. What HP did here is split the 128k block in the cache and write down 32k blocks. This allows a more granular interaction for the 4k blocks. Now why not make everything 4k blocks, you’d say? They tested all sizes in the labs and when they went smaller than 32k the actual 128k block would suffer from that. When you do 32k you only need 4 acks; when you do 4k blocks, you’ll need 32 acks before the actual 128k is committed.
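The ack arithmetic is easy to check with a few lines. The helper below is just an illustration of splitting one large IO into fixed-size chunks, one ack per chunk; the function name is mine.

```python
def split_io(total_bytes, chunk_bytes):
    """Split one IO into (offset, length) chunks of at most chunk_bytes each."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


KIB = 1024
for chunk_kib in (128, 32, 4):
    acks = len(split_io(128 * KIB, chunk_kib * KIB))
    print(f"{chunk_kib}k chunks -> {acks} acks before the 128k IO is committed")
```

Every chunk boundary is a chance for a waiting 4k IO to get in, but every chunk also costs an ack, which is the trade-off behind settling on 32k.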
I will probably update this page if more information comes in.
HP StoreServ 7450: product landing page
Robin Harris: HP’s big transition
Enrico Signoretti: HP Announces a flashy 3PAR
Ernesto Pellegrino: HP Storage 7450 Flash anouncement
Bart Heungens: HP 3PAR goes flash all the way
Chris Evans: 3PAR Continues to be HP Storage Cornerstone
Justin Vashisht: VIDEO – HP/3PAR StoreServ 7450 SSD Array architecture