This is a very lengthy post; for that I apologise in advance! But this is something I feel very passionately about...
It's been a very rowdy few days in the world of Enterprise Storage and technology announcements!
Typically July and August are very quiet times in IT mostly due to Summer Holidays/Vacations inside IT departments, resellers, ISVs and vendors alike (certain countries in mainland Europe even go so far as to shut down completely for 4-6 weeks during this time!).
However this year something strange has happened; a wave of mystical product launches, PR statements and marketing/blogpost overdrive has been visible from lots of the start-up or hyper-growth storage companies eyeing your business and storage projects.
The theme seems to be consistent across the board:
- We are far cheaper than everyone else
- We bring MILLIONZ more IOPS/performance than everyone else
- Why would you buy anything else, when we're around?
From reading these PR statements and blog posts from multiple vendors it suddenly dawned on me that if I was slightly bemused by the statements, how on earth would potential customers and partners be able to differentiate and sift through the marketing statements to relate the product to their environment or requirements?
So, without further ado - here are some of my top tips to help navigate through the realms of marketing claims for end-user environments:
1. Ensure your requirements & needs are documented and ranked in importance from 1-10.
When I meet customers they typically have an immediate need for a prospective storage solution (e.g. storage for VMware vSphere and Microsoft SQL). However, there may also be nice-to-have requirements, such as application-consistent snapshots for backups or replication for disaster recovery, while a feature such as NAS connectivity is perhaps not on the agenda right now.
Recently a few products have come to the table that literally have EVERY feature you could ever want or think of, at no additional cost. Whilst this sounds great on the datasheet, in the PR statement or in blog posts, these solutions often implement those features poorly, as the development schedule may not allow for deep engineering and full Quality Assurance testing. It truly is a case of offering too much (and thus rarely excelling at anything).
Ranking your requirements and the features you need in a prospective solution by importance will allow you to move on to number 2:
2. Always trial, test (and score!) your prospective vendors before you make a purchase decision.
Once you've been able to rank your requirements and feature sets, this then allows you to form your testing criteria for any Proof of Concept evaluation you decide to run.
Time and time again I'm asked to run vendor bake-offs / POCs against competitors where the end-user has not drawn up any set pass/fail criteria for the testing, opting instead to trust the vendor's generic (and more often than not biased) test plan and criteria.
This is a very dangerous situation, as it allows the hardware/software provider to shape and manipulate your tests to suit their best features whilst hiding any problematic or half-baked feature sets which could be show-stoppers if they were exposed.
Ranking and understanding your requirements for a potential storage environment upfront allows you to create an unbiased POC test-plan document when the time comes. This ultimately ensures that the test is fair, and stops any vendor from influencing the testing in such a way that they are almost guaranteed to win hands-down.
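To make the ranking and scoring idea concrete, here is a minimal sketch of a weighted scorecard. Everything in it is an assumption for illustration: the requirement names, the 1-10 weights, the 0-5 scoring scale and the two vendors are all made up, and you would substitute your own.

```python
# Hypothetical requirements, ranked 1-10 by importance (10 = must-have).
weights = {
    "vSphere datastore performance": 10,
    "SQL Server latency": 9,
    "application-consistent snapshots": 6,
    "replication for DR": 5,
    "NAS connectivity": 1,
}

MAX_SCORE = 5  # each POC test is scored 0-5 by your own team, not the vendor

# Hypothetical POC results for two made-up vendors.
poc_scores = {
    "Vendor A": {
        "vSphere datastore performance": 5,
        "SQL Server latency": 4,
        "application-consistent snapshots": 2,
        "replication for DR": 2,
        "NAS connectivity": 5,
    },
    "Vendor B": {
        "vSphere datastore performance": 4,
        "SQL Server latency": 4,
        "application-consistent snapshots": 4,
        "replication for DR": 4,
        "NAS connectivity": 0,
    },
}


def weighted_score(scores, weights, max_score=MAX_SCORE):
    """Return the importance-weighted score as a percentage of the maximum."""
    total = sum(weight * scores.get(req, 0) for req, weight in weights.items())
    best_possible = max_score * sum(weights.values())
    return 100.0 * total / best_possible


# Highest weighted score first.
ranking = sorted(
    ((weighted_score(scores, weights), vendor) for vendor, scores in poc_scores.items()),
    reverse=True,
)
for score, vendor in ranking:
    print(f"{vendor}: {score:.1f}% of the maximum weighted score")
```

With these made-up numbers, the vendor that ticks every box (including NAS, which barely matters here) still loses to the one that does the important things well.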
This also allows you to move on to number 3:
3. Never, EVER 100% trust performance tests run by vendors - run your own!
When running performance bake-offs every vendor will be cunning with how the performance test is derived.
Most typically a tool such as IOMeter or SQLIO will be used to demonstrate the performance of the product as the first thing after the storage array is installed in the environment. Whilst these tools are great (and will almost always report great results), they are often run when the array has NO data on it whatsoever, so the array is 100% optimised to deliver the best possible result (for example, ZFS-based arrays can degrade noticeably in performance as capacity fills up, but will show mind-blowing IO results when empty).
Another thing to consider is that a production environment will require a mixture of IO block sizes for various applications, whilst IO profiles (reads or writes, random or sequential) are often a variable mix every single day. It's a true cliche, but there's something called the IO Blender (especially with virtualised workloads) where one cannot guarantee whether the environment may drive random or sequential reads or writes, and the profile of the IO may change without warning.
99% of all storage array vendors will NOT take the above into account as part of a POC, opting to show purely random reads or writes using small (4k) blocks as a single request to the array to give the best result.
So it's very important to mandate that you run performance tests with YOUR data and yours alone. This ensures that you understand how your workload performs today (on the current solution) and can quantify the increase/decrease in performance once it's on a new solution. This is a far better test than being shown an IOMeter screen pushing 100,000 IOPS, 20,000 more than the competitor's box.
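As a rough sketch of what "quantify the increase/decrease" could look like, the snippet below compares the run time of your own jobs on the current array and on a candidate array. The job names and timings are hypothetical placeholders, not measurements from any real system.

```python
def percent_change(baseline, candidate):
    """Percentage change going from the current array to the candidate array."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical run times (in seconds) of YOUR jobs, measured with YOUR data.
current_array = {"nightly SQL batch": 7200.0, "VM boot storm": 300.0}
candidate_array = {"nightly SQL batch": 5400.0, "VM boot storm": 240.0}

changes = {
    job: percent_change(current_array[job], candidate_array[job])
    for job in current_array
}
for job, change in changes.items():
    # Negative values mean the job finished faster on the candidate array.
    print(f"{job}: {change:+.1f}% run time versus today")
```

The point of the design is that the baseline comes from your existing environment, so the comparison says something about your workload rather than about a synthetic benchmark.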
4. Flash storage for $/GB less than Spinning Disk? Yeah Right.
Another interesting statistic that's been observed recently from some of the All Flash Array (AFA) vendors is the claim of having lower cost per Gigabyte ($/GB) than spinning disk media.
Everyone knows that one day the above will be true; however, I'm afraid that right now this statement is another pure-marketing claim.
When digging more into this, I've noticed that the claim is only true when large compression and de-duplication savings on the flash side are compared against uncompressed and un-deduplicated data on the spinning media.
This a) assumes that the current spinning disk array does not have any compression or de-dupe running on it today, but more naively b) assumes that the dataset is highly compressible and susceptible to large de-duplication reductions - which the data in a large number of production environments is not.
For example, I recently encountered the above claim in a POC bake-off between a series of storage vendors. The customer had their head turned by claims of major cost savings from an All-Flash Array, with the promise that it would cost the same as (or even less than) the spinning disk or hybrid storage guys. However, the customer was wise in their POC criteria: they mandated that THEIR production data and working set be copied to the array to determine exactly what saving in primary space de-dupe and compression would deliver.
Whilst the AFA vendor promised reductions of 6-10:1, the real-world testing only delivered a 2-3:1 reduction of data (with compression bringing almost 2x more reduction in space than the de-dupe on the system, which is not uncommon to see). The end result was that the savings were nowhere near the marketing claims, and the AFA was almost 2x more expensive than the competition.
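The arithmetic behind that outcome is easy to sketch. The $/GB prices below are purely hypothetical (the customer's real prices are not given here); only the 6:1 promise and a measured ratio inside the 2-3:1 range come from the story above.

```python
def effective_cost_per_gb(cost_per_usable_gb, reduction_ratio):
    """Cost per GB of data actually stored, after compression and de-dupe."""
    return cost_per_usable_gb / reduction_ratio


# Hypothetical list prices per usable GB - substitute your own quotes.
AFA_COST_PER_USABLE_GB = 12.00
DISK_COST_PER_USABLE_GB = 2.50  # assumes no data reduction on the disk array

promised = effective_cost_per_gb(AFA_COST_PER_USABLE_GB, 6.0)  # vendor's 6:1 claim
measured = effective_cost_per_gb(AFA_COST_PER_USABLE_GB, 2.5)  # POC result, 2.5:1

print(f"Promised: ${promised:.2f}/GB vs disk at ${DISK_COST_PER_USABLE_GB:.2f}/GB")
print(f"Measured: ${measured:.2f}/GB, {measured / DISK_COST_PER_USABLE_GB:.2f}x the disk price")
```

With these illustrative prices the flash array looks cheaper than disk at the promised ratio, but comes out at nearly twice the price at the measured one. If the disk array also compresses or de-dupes, pass its own ratio through the same function before comparing.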
Storage array vendors refer to the capacity you get after data reduction as "Effective Capacity", and are starting to size solutions based on these figures. Whilst it sounds great in a presentation (who wouldn't want to get 300TB from 30TB of disk!?), it's very often the case that in real environments the figures are wildly off.
The moral of this point is simple: Never, EVER size to Effective Capacity (regardless of what your storage array vendor tells you), as it's never a guaranteed figure. ALWAYS size to Usable Capacity, and treat any primary storage reductions from compression and/or de-duplication as an additional bonus.
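To put numbers on the 300TB-from-30TB example, here is a small sizing sketch. The 10:1 ratio is the one implied by that example; the 2.5:1 "measured" ratio is an assumption picked from the 2-3:1 range seen in the POC above.

```python
def data_that_fits_tb(usable_tb, reduction_ratio):
    """How much data fits on the array at a given data-reduction ratio."""
    return usable_tb * reduction_ratio


def usable_needed_tb(data_tb, reduction_ratio):
    """Usable capacity required to hold a given amount of data."""
    return data_tb / reduction_ratio


USABLE_TB = 30.0        # what is physically available after RAID, spares etc.
DATA_TB = 300.0         # what the presentation says you can store
CLAIMED_RATIO = 10.0    # implied by "300TB from 30TB"
MEASURED_RATIO = 2.5    # assumed: a ratio actually seen with your own data

print(f"Claimed:  {data_that_fits_tb(USABLE_TB, CLAIMED_RATIO):.0f}TB fits on {USABLE_TB:.0f}TB usable")
print(f"Measured: {data_that_fits_tb(USABLE_TB, MEASURED_RATIO):.0f}TB fits on {USABLE_TB:.0f}TB usable")
print(f"Usable capacity needed for {DATA_TB:.0f}TB at the measured ratio: "
      f"{usable_needed_tb(DATA_TB, MEASURED_RATIO):.0f}TB")
```

Under these assumptions the array sized to Effective Capacity holds a quarter of what was promised, and you would need four times the usable capacity to make up the difference.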
I feel there could be a future blog post on this topic...
I hope some of the above tips are useful and have provided food for thought. I have more recommendations on how to navigate through the minefields of Enterprise storage today, but hopefully the above is a good starting point.