All things being Equal(logic)
From the Recliner...
OK, the title for this is cheesy, so shoot me.
A few months ago we decided to upgrade our SAN from an EMC Clariion Fiber Channel solution to an Equallogic iSCSI-based solution. We've always been high on iSCSI, and Equallogic seemed to have the right idea by actually designing their storage from the ground up as an iSCSI solution. We had used other vendors' iSCSI solutions (an older NetApp and an EMC AX150i), but they felt like iSCSI was being bolted on, especially the EMC implementation, which seemed to basically be "wherever there are FC ports, replace them with iSCSI ports, but use our same old technology for load balancing and failover." We actually purchased an Equallogic array for one of our smaller sites so that we could put it through its paces, and we really thought it performed well, was easy to configure, and seemed like the best storage solution we had ever seen. Well, after a forklift upgrade and living with EQL as our primary solution for several months, our enthusiasm has somewhat abated. We still think it's an excellent product, but just like all products, it has its shortcomings as well.
First, let me give you a little background about our previous setup. Our storage environment consisted of a 2Gb Fiber Channel SAN utilizing EMC Clariion storage: a couple of CX400 arrays at our remote sites and a CX700 in the datacenter. We were using Cisco MDS92xx series switches for our SAN fabric, and leveraged the IP services blade available in these devices to allow for a minimal amount of iSCSI connectivity from a few hosts and FCIP connectivity between our sites. We then used EMC Snapview for snapshots and Mirrorview/A to asynchronously replicate critical data from our ERP system to one of our secondary sites 700+ miles away via the WAN.
For the most part the setup worked well. It cost a good bit of money and was somewhat complicated to administer, but once set up and working, it pretty much just worked and didn't really need that much attention. We did have a few problems with the EMC array: a storage processor failure which eventually required a complete array reboot (and thus system downtime) to recover from, and a shelf of ATA drives which constantly failed, including a couple of data loss events. EMC support was usually very weak. After our third data loss event on the ATA drives in a span of 8 weeks or so -- each time blamed on firmware, or the version of FLARE code we were running, or some other random reason -- we told them we wanted all of the drives replaced. Then they told us the ATA drives simply aren't as reliable as FC drives and that we should expect to have data loss if we stored data on them. Now, that wasn't what we were sold. We of course knew that the ATA drives wouldn't perform as well, especially for systems with high IOPS requirements, but we didn't expect them to randomly lose data. We purchased the shelf as a "second tier" of storage for bulk data that needed to be online but didn't have high IO requirements, but it wasn't much good if we randomly lost the data we stored there.
So our experience with EMC was already not that great, and the sales side of EMC was even worse. When we approached them about upgrades we were shocked at the prices: they asked for nearly the original purchase price just to upgrade to newer head units. We decided to look elsewhere. We had always been high on iSCSI and didn't really have the bandwidth requirements to justify expenditures on 4Gb Fiber Channel, so iSCSI storage seemed like a good fit. When the Equallogic guys came to demo their product we were impressed; when we read reviews showing performance numbers we were more impressed. Simple management, great performance, excellent feature set (snapshot and replication built in). What else could you ask for in an iSCSI array?
So now we have a complete Equallogic SAN: a PS400E and PS3900XV in the datacenter, and a PS300E and PS100E at our two remote sites. We're making use of snapshots for backup, and replication for disaster recovery. Performance in most cases is still excellent, but we've hit a few performance bumps, and it's become more obvious that Equallogic does not quite have the "enterprise" feel that the EMC array had, especially with their tools. Here's my personal quick rundown of the good and the not so good that we've experienced so far:
The Good
- Installation -- I don't think I've ever installed a storage system that is easier than the Equallogic boxes to get up and running, although the NetApp boxes I've dealt with are pretty close. Take it out of the box, plug it in to the network and either run the Windows tool to configure the name and IP or (my choice) use the console cable to give the box a name, group name, and IP. Pretty simple.
- Configuration and Management Interface -- The Equallogic configuration GUI is simple, responsive, and fairly intuitive and most people will take only a few minutes to feel comfortable. The command line interface (available via telnet or ssh) doesn't seem to be as well thought out, and presents information in confusing layouts, but is still quite functional.
- General Performance -- Out-of-box performance for most applications is especially impressive. Using multiple NICs and/or iSCSI HBAs in concert with load balancing software like Microsoft MPIO with Equallogic-specific host integration tools, or Linux native dm-multipath, can easily provide 200-300MB/s throughput from a single host. Write performance is especially good, but there are a few corner cases where EQL suffers; we'll look at those in the next section.
The Not So Good
Please note: This section is long, but overall many of the gripes are not really that critical. I just think they are things you should be aware of before diving with both feet into a complete Equallogic solution, and it's information that their sales team isn't likely to share.
- Performance Monitoring -- The Equallogic "Performance Monitor" built into the GUI borders on uselessness and doesn't support any volume-level monitoring at all. Monitoring via SNMP is not as easy as it could be. They do include a tool that makes it fairly easy to set up an MRTG page with performance information for your volumes, but there appear to be some bugs in the SNMP data which cause graphed data to sometimes be inaccurate. I've been told EQL is aware that the SNMP data can sometimes be less than accurate and that they are working to fix that issue. A nice template for Cacti, or just a truly functional, cross-platform web application that could be run on a server and provide a web page to view historical performance data, would be nice.
- Performance Tuning -- The Equallogic is a great performing array; however, there are a few cases where it is less than ideal. For example, try running a benchmark with multiple parallel small reads, a fairly common scenario with large, transactional databases. Since this typically becomes an IOPS-bound operation, most arrays' throughput will degrade significantly in these cases, but the EQL dropped 90-95% of its throughput, which is a tremendous amount. It appears that in these cases the array makes poor decisions about what to cache and what not to cache, and about the optimal read-ahead values for such patterns. Of course our EMC array also dropped about 75-80% in this same scenario, but it allowed us to manually tweak settings for individual LUNs. By tweaking the settings for LUNs with these types of access patterns to use static prefetch values and balance the caching behavior to favor keeping data in cache, we were able to significantly improve the performance of the EMC array at these tasks. We suspect that such manual tuning would be just as beneficial to the Equallogic array, but we'll never know because they don't let you have access to those things.
- Management Interface -- Now I'm sure you're asking yourself "Wasn't this in the 'Good' section?" and you would be correct; however, even the good parts of the Equallogic have some bad. The management interface is fast and clean, but has glaring usability issues. For example, using the GUI there is no easy way to identify a host and determine what volumes it has access to. In the Clariion array's management tool (Navisphere) you can group a bunch of Fiber Channel WWIDs, iSCSI IP addresses, or initiator names into a logical group which represents that host; then you can select that host and see what volumes and snapshots it can access. We attempted to use CHAP names to accomplish this same goal with the Equallogic, creating CHAP names that matched our hostnames. This works well for assigning volume access to hosts; however, there is no view within the Equallogic GUI where you can select a CHAP username and see what volumes/snapshots it can access. You have to highlight each and every volume and look at the access tab to determine this information. Not too bad if you have only a few volumes, but it gets annoying as your volume count approaches dozens, or even hundreds.
Also, the management interface allows you to do things like delete a volume that has active connections to it. At pretty much any moment you are two clicks away from deleting a volume forever. As a somewhat paranoid storage admin this seems a little scary to me. The Clariion would not allow you to delete a volume without first removing all host access from the volume. This was simple to do, but was a very deliberate step and provided some extra protection from the possibility of an accidental deletion, since you first had to sever host connections and only then could you delete the volume. Being two clicks away from deleting a volume, much less a 2TB volume accessed by a half dozen VMware servers with a dozen or so VMs active on it, just seems too risky for an enterprise storage solution.
It's also not possible to manage more than a single "group" of Equallogic arrays with the management interface. With the Clariion we could see all of our arrays across all sites in a single console; now we need three consoles. Not a huge issue, but it would certainly be nice to manage all of your groups from a single console interface.
- Snapshot Management -- The Equallogic management interface makes creating snapshots a breeze, and you can schedule snapshot creation easily enough as well. The host integration toolkit also makes VSS-aware snapshots quite easy from a Windows host. However, if you're on a platform like Linux, or if you just want manual snapshots with your Windows systems, life might not be quite so good. Snapshots on the Clariion were relatively "static" devices with granular access. Really they were "containers" which you could present to hosts as a static volume, and you could simply make the data appear or disappear within this "container" at any time. You could take a volume, make two snapshots of it, and assign one snapshot to a backup server and a second snapshot to a development or training server. The backup server would not see, or be able to access, the development snapshots, and vice versa. If you wanted to refresh the snapshot data on the backup server you didn't have to delete the snapshot "container", rediscover a new "container", etc.; you simply unmounted the snapshot, refreshed the snapshot image in the container, and remounted the snapshot on the host. The host thought it was the same device just with different data, similar to taking a USB key, putting it in another system, putting some new files on it, and then bringing it back to the first system. The system will know it's the same USB key, but will still see the new data. Well, don't expect this level of flexibility in your Equallogic snapshots. If you need to refresh the snapshot on the backup server you have to delete and recreate the snapshot.
The new snapshot has a completely new UUID and looks like a completely different disk from a SAN perspective, meaning that hosts like Linux will create a new multipath map for each and every snapshot refresh.
You can choose to either give a host access to all of a volume's snapshots or none, but nothing in between. If you have a volume that gets snaps hourly, but then you create a special daily snapshot to mount on your backup server, the backup server will see the hourly snapshots as well. Even worse, it may be virtually impossible to determine which snapshots are which from the host side, as the only unique identifier is the cryptic iSCSI target name. Equallogic creates iSCSI target names with the date and time DOWN TO THE MILLISECOND, which means determining which volume is which from a host (rather than in the GUI) requires you to parse out dates and times. Technically there is a way to "name" the snapshot, but this appears to be usable only within the EQL management interface and is not used as part of the iSCSI name for the snapshot (although that would sure be nice). You could probably figure out some way to parse out the "snapshot name" and correlate that to an iSCSI name, but trying to script this is miserable.
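To give a sense of what that host-side parsing involves, here is a minimal Python sketch. The target-name layout it assumes (the volume's target name followed by `-YYYY-MM-DD-HH:MM:SS.fraction`) is a hypothetical stand-in rather than a documented Equallogic format, and the function names are made up for illustration, so check the names your own arrays produce before relying on the pattern.

```python
import re
from datetime import datetime

# ASSUMPTION: snapshot targets look like
#   "<volume target>-YYYY-MM-DD-HH:MM:SS.<fraction>"
# This layout is hypothetical; adjust the pattern to what your array emits.
SNAPSHOT_SUFFIX = re.compile(
    r"^(?P<base>.+)-(?P<date>\d{4}-\d{2}-\d{2})"
    r"-(?P<time>\d{2}:\d{2}:\d{2})\.(?P<frac>\d+)$"
)


def split_snapshot_target(target):
    """Return (volume_target, taken_at, fraction) or None if not a snapshot name."""
    match = SNAPSHOT_SUFFIX.match(target)
    if match is None:
        return None
    taken = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    return match["base"], taken, match["frac"]


def newest_snapshot(targets, volume_target):
    """Pick the most recent snapshot target belonging to one volume."""
    candidates = []
    for target in targets:
        parsed = split_snapshot_target(target)
        if parsed and parsed[0] == volume_target:
            _, taken, frac = parsed
            # Sub-second part breaks ties between snapshots in the same second.
            candidates.append((taken, float("0." + frac), target))
    return max(candidates)[2] if candidates else None
```

A backup script could feed this the target names discovered on the host and mount only what `newest_snapshot` returns, ignoring the hourly snapshots it was never meant to see.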
Also, snapshots on the Equallogic array are the most space-hungry snapshots I have ever seen in my life. With our EMC array a 500GB volume containing our ERP database would use about 10-20GB of snapshot space per 24-hour period; with the Equallogic this same setup ballooned to 60-70GB per 24 hours or more. Why the difference? Well, Equallogic support says that they allocate snapshot space in 16MB pages, so if one single byte of data in a "page" changes, the snapshot allocates 16MB. With a database that is likely to change many small blocks all over the filesystem, this causes significant snapshot bloat.
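The effect of page granularity is easy to see with a toy model. The sketch below is only an illustration under stated assumptions: writes land at uniformly random offsets (real database writes cluster far more, so the real-world gap is smaller), each write touches a single page, and the 256KB size is included simply as a smaller page for comparison. None of this reflects how the array is actually implemented.

```python
import random

KB, MB, GB = 1024, 1024**2, 1024**3


def snapshot_space(write_offsets, page_size):
    """Bytes a snapshot must hold if it reserves one full page per page touched."""
    touched_pages = {offset // page_size for offset in write_offsets}
    return len(touched_pages) * page_size


def compare_page_sizes(volume_size=500 * GB, writes=5000, seed=42,
                       page_sizes=(256 * KB, 16 * MB)):
    """Scatter small writes over a volume and total the space per page size."""
    rng = random.Random(seed)
    # ASSUMPTION: uniformly random offsets, one page per write.
    offsets = [rng.randrange(volume_size) for _ in range(writes)]
    return {size: snapshot_space(offsets, size) for size in page_sizes}


if __name__ == "__main__":
    for size, used in compare_page_sizes().items():
        print(f"{size // KB:>6} KB pages -> {used / GB:6.2f} GB reserved")
```

The same set of changed bytes costs one whole page per page touched, so the larger the page, the more untouched data gets dragged into the snapshot reserve along with each small write.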
As expected, they also use snapshots for the built-in replication function. When I asked why my local snapshot would balloon to 70+GB while the replication would replicate <20GB for a typical day (much more in line with the space used by the Clariion snapshots), they explained that the snapshots used by their replication engine allocate data in 256KB pages rather than the local snapshots' 16MB pages, thus not using so much unneeded space. Why they wouldn't use the same "page size" for both types of snapshots is left as a mystery, but my guess is that using excessive snapshot space on the local disk might lead to selling more/larger disk arrays, while using that same approach for the replication feature would leave them significantly inferior to their competitors.
- CLI and Storage Automation/Scripting -- And here we come to the biggest failing of them all: host-based scripting of storage actions. We were heavy users of navicli and admsnap, the command line utilities provided by EMC for host-based scripting of commands on the Clariion. We used these tools to make consistent snapshots of our Linux hosts running Oracle and for various other activities. We were told by Equallogic that they had the "Host Scripting Tools", a set of Perl wrappers that make it easy to run commands on the Equallogic array from the host. Our first thought was "We love Perl, that sounds great!" Oh, if only we had looked at this "toolkit" closer. While it's true that the "Host Scripting Tools" exist, they are basically just a very few functions that act as a wrapper around telnet to assist you in logging in to an array. There are no true functions at all. There's no function to list volumes, create volumes, list snapshots of a volume, create snapshots, delete snapshots, well, you get the idea. There are no functions to do anything except log on to the box via telnet.
You then issue commands exactly as you would via the CLI, and they return their results in the same unusable formats that you then have to try to parse out. They offer a sample script that creates a snapshot, but it doesn't even work right without a minor modification. It's a complete joke that they even have such a thing.
We tried to be positive and use this "toolkit" to create some scripts which created snapshots, activated snapshots, deleted snapshots, etc., but since all the commands to list anything require parsing the horrible output that comes from the CLI (fixed-width, indentation-wrapped, columnar output with crazy multi-line headers), we found this exceedingly frustrating. Equallogic support eventually provided me with some scripts they used internally which parsed the horrible output into something more usable, but if your CLI output is this unusable, how can you not include functions to parse the output in something you call a "scripting toolkit"? A toolkit should do more than wrap telnet.
We eventually overcame this and managed to script all of our required functions anyway, so the ability is there, but don't let Equallogic fool you into thinking there's some usable toolkit sitting on their website. We're currently developing our own true "toolkit" which will include simple functions for managing snapshots, listing volumes and access and perhaps other functions. When/if it gets into a decent, publishable state we'll make it available for download.
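For readers facing the same kind of output, here is a rough Python sketch of one way to turn fixed-width, wrapped, columnar text into records. It is not the toolkit mentioned above, nor the scripts Equallogic support supplied. The sample listing is invented for illustration and assumes a dashed ruler line under the headers and indented continuation lines for wrapped values; real CLI output may differ.

```python
import re

# Hypothetical sample, built programmatically so the column widths are exact.
ROW = "{:<15} {:<10} {:<10} {}"
SAMPLE = "\n".join([
    ROW.format("Name", "Size", "Snap", "Status"),
    ROW.format("", "", "Count", ""),          # second line of a multi-line header
    ROW.format("-" * 15, "-" * 10, "-" * 10, "-" * 7),
    ROW.format("erp-data", "500GB", "24", "online"),
    ROW.format("vmware-datasto", "2TB", "3", "online"),
    "  re01",                                 # wrapped tail of the long name
])


def parse_fixed_width(text):
    """Parse ruler-delimited fixed-width text into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ruler = next((i for i, ln in enumerate(lines)
                  if set(ln.strip()) <= {"-", " "}), None)
    if ruler is None:
        raise ValueError("no dashed ruler line found")
    starts = [m.start() for m in re.finditer(r"-+", lines[ruler])]

    def cells(line):
        # Each column runs from its start to the next column's start.
        ends = starts[1:] + [None]
        return [line[a:b].strip() for a, b in zip(starts, ends)]

    # Join stacked header fragments, e.g. "Snap" + "Count" -> "Snap Count".
    headers = [" ".join(parts).strip()
               for parts in zip(*(cells(h) for h in lines[:ruler]))]

    rows = []
    for line in lines[ruler + 1:]:
        values = cells(line)
        if line[0].isspace() and rows:
            # Indented line: glue the wrapped text onto the previous row.
            for key, extra in zip(headers, values):
                rows[-1][key] += extra
        else:
            rows.append(dict(zip(headers, values)))
    return rows
```

Calling `parse_fixed_width(SAMPLE)` yields one dict per volume keyed by the joined header text, which is far easier to filter and loop over than the raw columns.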
Now, after reading all of this you may think that I'm not very happy with the Equallogic arrays. Amazingly, you'd be wrong. I still really like the Equallogic arrays and think they are a strong option worth considering, especially in the iSCSI space, but make sure you think about everything and understand their limitations before purchasing. Also, being happy with a product is not the same as being completely satisfied. Every product I've ever used has room for improvement and its good and bad points, and the Equallogic is no exception, but dealing with Equallogic has generally been a good experience and they seem to have a genuine interest in hearing from customers and improving their products in areas that they can. I guess we'll see if that continues now that the Dell acquisition is showing its colors all over the place, but I'm hopeful.
Trackbacks
Three years of Equallogic
OK, so suddenly I have several emails and a couple of comments wanting followups to my Equallogic blog post...
Weblog: From the Recliner...
Tracked: Jun 01, 15:08

Friday, February 8. 2008 at 16:46 (Reply)
Monday, April 7. 2008 at 17:13 (Reply)
Monday, April 7. 2008 at 22:55 (Reply)
Of course, you can choose to not use hot spares (not really good practice unless you have onsite spares and near 24x7 onsite personnel), or use RAID5 with only a single hot spare, which lowers the overhead at the cost of slightly higher risk of data loss and perhaps slightly worse performance.
Other vendors would have nearly identical overhead if configured with RAID50 and two hot spares per shelf. Of course, some arrays allow a more flexible layout of RAID groups and spares, but most vendors seem to suggest at least one hot spare per shelf and, assuming two RAID groups per shelf, would have similar overhead.
On our older EMC equipment we had three shelves and usually two RAID groups per shelf, with a single hot spare per shelf, so that meant 78% of available space was usable. But the EMC also had overhead from vault drives (the first 4 drives of the array, which held mirrored groups of the array's OS) and a statically allocated reserve pool (a fixed pool of space required for snapshots; the EQL allocates snap space based on a percentage per volume), so the overall overhead was very similar.
If I had a vendor that was attempting to use this as a mark against EQL I'd question how exactly they can do better and ask them to provide their best practice data for allocating RAID groups and hot spares. I bet if you follow their own best practices their overhead would be nearly identical to EQL.
Later,
Tom
Sunday, July 6. 2008 at 11:26 (Reply)
Because of the page size, snapshot reserve often is 100% of the volume. Because of the replication schema, the 'Local Reserve' s/b 100%, to be in EqualLogics words, 'reliable'.
So a 200GB LUN that you want to snapshot and replicate will require as much as 600GBs of that remaining 62%.
A PS-5000X with 16 x 400GB SAS = 6.25 TB reduced by overhead down to 4.05TBs. If you want to either snap or replicate using 'reliable' Best Practice settings, you would have ~2 TBs for production data.
Otherwise a good array. $60K for each 2TBs of data....... And if you want to replicate as well.....Now you can see why Dell paid so much for EQL....a stream of high income from upgrades for additional usable capacity. A very good deal for Dell.....
Sunday, July 6. 2008 at 12:25 (Reply)
It's true that Equallogic's replication and snapshot implementation uses far too much overhead, just as I noted in my review above, however, in actual practice I've found that I can get by with much less snapshot and replication reserve than the "best practice" amounts. These "best practice" amounts will cover you even if a block in every page is changed, which is pretty uncommon for most workloads. Still, a smaller page size for snapshots would sure be nice.
Also, several other vendors seem to use a "pool" of shared storage for their snapshots which can lead to significant space savings over requiring a separate reserve space for each volume.
There are always many things to include when determining the costs of storage, including overhead, ongoing costs of support, the costs of administration, and cost of growing the storage. As a current owner of Equallogic and EMC, and a prior owner of Netapp, all I can say is that so far, in our use case, Equallogic is a clear winner in all categories except maybe the costs of expansion.
Later,
Tom
Friday, May 9. 2008 at 15:24 (Reply)
The latest Host Integration Tool Kit for Windows maps host disks to EqualLogic volumes. For example, you'll see the Microsoft Exchange Writer tree with your Exchange Server underneath. Then, you'll see the Exchange Storage Groups underneath the server. Click on them and you'll see a lot of array info, etc.
Not perfect but pretty good for a free tool.
Thanks for the feedback.
Friday, May 16. 2008 at 09:45 (Reply)
In the end we have to use completely different scripts and procedures for Windows from what we use for Linux. With our EMC box the scripts for creating, deleting, activating, and deactivating snapshots on the storage array were virtually identical. Only the mount and unmount sections were platform specific; now the scripts are completely different. That's a little frustrating, but admittedly not unworkable.
Friday, May 16. 2008 at 05:00 (Reply)
Friday, May 16. 2008 at 09:51 (Reply)
I've since heard many good things about Falconstor and their products from friends who have used them.
Friday, May 22. 2009 at 19:01 (Reply)
I was wondering how your experiences are today with the EQL boxes, one year later and probably a few updates later.
Thanks for sharing!
Monday, June 1. 2009 at 15:03 (Reply)
Monday, June 1. 2009 at 16:19 (Reply)
Monday, June 29. 2009 at 11:53 (Reply)
We are now evaluating two PS5000s. Are you still happy with your Equallogic? Is the support good enough for you? Best regards
Daniel
Wednesday, July 1. 2009 at 21:21 (Reply)
1. Overall still happy, but will still look for alternatives when the time comes.
2. Support has actually been very good, what little we have needed of it. The hardware has been rock solid so nothing there, but support has helped us a little with minor issues like scripting and a few minor performance problems.
Tuesday, September 15. 2009 at 10:45 (Reply)
I'm also astonished by the amount of snapshot space consumption - it's easily four times as much as snapping the same volumes was on the Netapp. We've got an older PS400 providing iSCSI storage for our ESX cluster and snapshot deletion on that is starting to cause problems if done during production hours.
Overall, I'm happy with EQL. Not coming from a storage background I find it much easier to deal with than Netapp - though that may be because we couldn't afford the NetApp bells and whistles and I pretty much had to use the CLI for everything :o)
Performance could be better, snapshotting could be better, scripting could be MUCH better, but we spent probably half as much on a solution which gives pretty much identical performance to a similar tier Netapp, so I can't really grumble about it!
Monday, March 8. 2010 at 06:11 (Reply)
Any expert advice is appreciated