Storage Soup - A SearchStorage.com blog


A data storage blog offering commentary on the storage industry, as well as a behind-the-scenes look at developments in storage management, SAN, NAS, backup, disaster recovery and storage strategy.

What’s up with CDP for 2008

Some analysts touted CDP as being the dark horse technology for corporate adoption in 2007. As we all know, that didn’t occur and the multitude of CDP technologies ended up confusing analysts, press and IT alike as they each tried to sort out the differences between available CDP products and what CDP’s true value proposition was. All of these factors contributed to spoiling CDP’s debut.

However, I anticipate CDP will make a comeback in 2008 for two reasons: corporate needs for data replication and higher availability. Data replication has been around for a long time (only recently under the moniker of continuous), so it is a mature technology and well understood by storage professionals in the field.

“Higher availability” is the more important feature of CDP. Companies now must choose between high availability and semi-availability. High availability is associated with synchronous replication software and provides application recoveries in seconds or minutes but at an extremely high cost. At the other extreme is backup software that delivers only semi-availability, so it can take hours, days or even weeks to recover data. CDP delivers higher availability, an acceptable compromise between these two extremes: it can quickly recover data (typically in under 30 minutes) to any point in time, at a price that is competitive with backup software.

CDP also complements deduplication. While some may view CDP and deduplication as competing technologies (and in some respects they are), the real goal of data protection is data recovery.

This is where CDP and deduplication part ways. CDP captures all changes to data but keeps the data for shorter periods of time, typically 3 to 30 days, to minimize data stores. Deduplication’s primary objective is data reduction, not data recovery. Faster recoveries may be a byproduct of deduplication since the data is kept on disk but it is not the focus of deduplication so recoveries from deduplicated data do not approach the granularity that CDP provides.
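To make the "any point in time" idea concrete, here is a minimal Python sketch of a CDP-style journal. It is a toy model, not any vendor's implementation: the class name `CdpJournal`, its methods and the block-level layout are all assumptions made for illustration. It records every write with a timestamp, rebuilds the data as of any moment inside the retention window, and folds older writes into a base image to keep the data store small.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class CdpJournal:
    """Toy CDP journal for one volume (hypothetical, for illustration only)."""
    retention_days: int = 30                      # the post cites 3 to 30 days
    _times: list = field(default_factory=list)    # write timestamps, ascending
    _writes: list = field(default_factory=list)   # (block_id, data) per write
    _base: dict = field(default_factory=dict)     # image at the oldest kept point

    def record(self, timestamp: float, block_id: int, data: bytes) -> None:
        """Capture a write as it happens; timestamps must not go backwards."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("writes must be recorded in time order")
        self._times.append(timestamp)
        self._writes.append((block_id, data))

    def recover(self, point_in_time: float) -> dict:
        """Rebuild the volume image as it stood at point_in_time."""
        image = dict(self._base)
        upto = bisect_right(self._times, point_in_time)
        for block_id, data in self._writes[:upto]:
            image[block_id] = data
        return image

    def prune(self, now: float) -> None:
        """Fold writes older than the retention window into the base image."""
        cutoff = now - self.retention_days * 86400
        keep_from = bisect_right(self._times, cutoff)
        for block_id, data in self._writes[:keep_from]:
            self._base[block_id] = data
        del self._times[:keep_from]
        del self._writes[:keep_from]
```

After `prune` runs, recovery points older than the retention window are no longer available, which is the trade-off the paragraph above describes: fine-grained recovery, but only for a limited number of days.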

So what’s in store for CDP in 2008? The staying power of a new data protection technology is now largely determined by whether it is adopted by small and midsize businesses. If it’s practical and works there, it will find its way into the enterprises because more and more enterprises work as a conglomeration of small businesses despite corporate consolidations. So, it is not a matter of if CDP will gain momentum in 2008, it is a question of how quickly it will become the predominant technology that companies use to protect all of their application data.

2008 recommendations for deduplication, encryption and VMware

As 2007 draws to a close, there are three technologies that appear near the top of many storage managers’ priority lists going into 2008.

· Deduplication

· Tape encryption

· VMware

The mix of old and new technologies is intriguing. One would think that as deduplication and VMware rise in importance, more companies would start to abandon storing data on tape. Yet that does not appear to be the case. Symantec’s Director of Product Marketing, Marty Ward, recently told me that the new encryption features in NetBackup 6.5 are its #2 most sought-after feature (deduplication is #1).

Don’t rush into a deduplication purchase decision. I have yet to talk to a user who doesn’t report faster backup times using a deduplicating backup appliance or backup software and ensuing reductions in data stores. However, I sense that users are rushing into purchasing decisions and not stepping back to look at what other options they have available.

ExaGrid System’s CEO, Bill Andrews, told me this past week that in 50% of its customer deals, the company is seeing no competition. I suspect this percentage probably holds true for Data Domain and Quantum as well. But storage managers should avoid rushing out and buying a deduplicating product to solve their backup problems. Taking just a few extra days to check out what other products are available, how each product adds more capacity and performance, and how viable the company behind the product is can save you some management headaches.

The big cautionary note with tape encryption is to verify how encryption keys are created and managed. So, I recommend using a third-party appliance to create and manage the encryption keys. Though appliances can encrypt the data, more are starting to work in conjunction with backup software and tape drives to provide encryption keys. When companies encrypt data stored to tape, most are hoping they never have to access the data again. So managers need to think in terms of how best to manage the recovery of data in five years, not five days. Encryption appliances create highly secure encryption keys, manage the keys long-term, and give companies assurance that they can manage the encryption keys and then recover the data years later.

Storage companies also need to account for the very real storage problems that server virtualization creates. One of the best things you can do in 2008 to prevent VMware from negatively impacting your environment is to change the way you back up VMware virtual machines (VMs). One approach is to use the latest versions of backup software that support the VMware Consolidated Backup (VCB) framework, which backs up just the VMDK file that contains the data for all VMs on a VMware server. The other is to install a host-based CDP or dedupe agent on each VM. This eliminates the overhead that backup software agents introduce on each VM. I recommend using CDP. If you are going to change your backup approach anyway, choose the one that gives you the most granular recovery options.

EMC Clariion firmware upgrades include hidden gems

The glitz and glamour of new product releases tend to overshadow the rather mundane task of performing firmware upgrades on storage systems. However, administrators who take the time to keep their storage systems up-to-date with the latest patches may find they can avoid some FC SAN “gotchas” as well as find some hidden gems that vendors are packaging in their latest firmware releases.

Prompting my thoughts on this topic was a recent conversation with a storage architect. He recently inherited a FC SAN where the firmware on the storage systems was two major releases back. The older code on these storage systems was becoming a problem because other devices on the SAN (switches, virtual tape libraries, and servers) had newer firmware with new features, and taking full advantage of those features required newer code on the storage systems as well.

I discussed this topic with EMC partly because the storage systems in question were EMC Clariion, but also because I know from personal experience that EMC releases firmware updates on a fairly regular basis.

In the case of its Clariions, EMC comes out with a major release every 9 to 12 months that includes major new functions. For instance, its December 2006 code release for the Clariion included a new proactive hot spare feature for improved high availability and a Quality of Service feature as a licensable add-on. Its August 2007 Clariion major release added new security features as well as iSCSI enhancements like native replication.

Another interesting feature included in the update is the Software Assistant. This tool scans the Clariion prior to starting a firmware upgrade and provides recommendations as to which code an administrator should load on the system. The Software Assistant also does a high availability check prior to actually starting the upgrade to confirm that the firmware upgrade can be completed without unexpectedly taking the system offline.

EMC recommends to customers that they install major firmware releases for its Clariions shortly after they are released (within 3 to 4 months).

However, there is a more pressing reason to ensure that firmware code is current. Firmware upgrades must be applied sequentially. If a Clariion system is two generations old, customers may need to upgrade to the intermediate release before upgrading to the newest release. Though this is generally not a big deal, it does add to the time needed to perform the firmware upgrade and makes it more difficult to back out of an upgrade should something go awry.
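The cost of falling behind can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption, namely that every intermediate major release must be installed in order with none skipped. The function `upgrade_path` and the release names in the usage note are hypothetical and do not correspond to actual EMC release identifiers.

```python
def upgrade_path(releases, current, target):
    """Return the releases to install, in order, to get from current to target.

    releases: all major releases, oldest first (hypothetical names).
    Assumes each major release must be applied in sequence, none skipped.
    """
    start = releases.index(current)   # raises ValueError if unknown
    end = releases.index(target)
    if end < start:
        raise ValueError("target is older than the installed release")
    return releases[start + 1 : end + 1]
```

For a system two releases back, `upgrade_path(["R1", "R2", "R3"], "R1", "R3")` yields two upgrade steps instead of one, and each extra step is another point at which a back-out may be needed.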

Kiss storage heterogeneity goodbye if HP-Symantec merger occurs

Over the last few weeks, storage insiders have been abuzz with speculation that a merger between HP and Symantec is imminent. Whether such talks are occurring, I cannot definitively say, but if a merger does occur, the whole corporate world might as well kiss goodbye any hopes it had of creating and managing a heterogeneous storage environment.

Obviously, I’m exaggerating a bit. Kissing heterogeneity goodbye won’t happen the day such a deal is signed (if it occurs), and it probably won’t ever completely happen. HP and Symantec will likely both pledge that heterogeneous support will remain part of their product roadmaps. And, it’s likely that is true. However, one can almost bet that when it comes time to prioritize which storage products are tested first in conjunction with future releases of Symantec’s Veritas storage software, HP’s storage products will find their way to the head of the line.

More disconcerting is what Symantec’s acquisition by HP (or by whoever else acquires or merges with it) would mean for the future of heterogeneous storage environments in general. At one time, Symantec was in the vanguard of supporting an enterprise heterogeneous storage environment. Yet, now no one is really shocked or even appears overly concerned when Symantec is mentioned as a candidate for an acquisition or merger by what is traditionally considered a storage hardware vendor.

This mindset is testimony to changing user concerns and priorities. It used to be that storage hardware was the primary cost in a user’s data center. Not anymore. Now, it is the management of the storage hardware, even if a user buys all of the hardware from the same storage vendor.

Managing storage hardware from multiple vendors has become a mind-bogglingly complex exercise. While at one time it may have been worthwhile to spend the extra time and money to verify that an HP-UX server worked with an IBM storage system, now it is questionable if that is still the case. Instead I sense an increased willingness on the part of users to pay a premium to buy all of their storage hardware and software from one vendor and avoid checking the multiple support matrixes that heterogeneous environments require.

The looming acquisition or merger of Symantec, regardless of by whom, signals the re-emergence of an old systems management philosophy. Companies no longer want a one trick pony for their storage management needs, even if that one trick pony manages heterogeneous storage environments. Instead more companies appear to want a return to simpler times where they buy all of their storage hardware and software from one vendor and it all works nicely together. Let’s just hope that if companies have to revert to this philosophy, it works better this time than it did in the past.

Tape is the only option

Just when I think that I have heard every reason for keeping data on tape, new arguments keep emerging. Now the latest is that tape is more energy efficient than disk.

My first real insight into this came a few weeks ago when I was speaking to Spectra Logic’s director of technical marketing, Molly Rector, who had just returned to Denver after meetings with Spectra Logic channel partners, resellers and users in the New York and Boston area. The feedback that she received from her meetings was that some data centers in the Northeast were running low on power and no longer able to obtain new power. In these cases, the shortage of power was forcing their customers to choose tape because it was more energy efficient than disk even though they wanted to buy disk for their backup environment.

While it may be true that tape consumes less power than disk, it is disconcerting that some companies find themselves in this predicament of needing to choose tape over disk because of something as seemingly preventable as an inadequate supply of power.

Keeping data on tape costs businesses in ways that are sometimes hard to measure. Legal discoveries, the personnel needed to manage tape and moving and storing tapes offsite all add to the costs of tape management and also consume power in more subtle ways. To conclude that the choice between disk and tape needs to start and stop with a company’s rate of energy consumption seems a bit archaic to me.

Tape may consume less power than disk, but that does not necessarily make tape a better choice. Disk and tape are both choices that companies need to have available to them and either one, if managed properly and looked at from a total cost of ownership, can save companies money and cut energy consumption in the process.

Companies in this situation are obviously looking at some hard choices in the near term, as their decision is less about disk versus tape and more about whether it is time to change how and even where they manage their data. In the Northeast, it appears some companies have already waited too long to make a decision because when the number of outlets left in the wall dictates what storage media they need to buy, the only choices left are unpleasant ones.

Beware of old disk drives

In doing some research recently on the problems associated with recovering data from old tapes, I found out that a similar set of problems exists when trying to recover data stored on old disks. This problem becomes especially pronounced if a company unplugs an old disk drive and puts it on the shelf or keeps it in production too long.

The problem that companies are more likely to encounter when storing a disk drive on the shelf is not necessarily data degradation on the disk drive platter but mechanical failures of the parts within the disk drive itself. Greg Schulz, the lead analyst with Minneapolis-based StorageIO, finds that the lubricants of the mechanical parts within the disk drive can settle. This can cause the disk drive to malfunction when the company attempts to power it up again for the first time in a long time.

Jim Reinert, VP of disaster recovery for Kroll Ontrack, a worldwide provider of data recovery services, says that the largest problem Kroll encounters with trying to recover data from old disk drives is repairing and replacing defective mechanical parts inside the disk drive. Failed motors and bad electronic circuit boards are just some of the components Kroll has had to repair before it can recover the data from the drive. This situation requires Kroll to find an exact match for the defective part, usually on the used market.

Of course, mechanical problems can also occur while the computer system is still in use. Reinert finds that some of the toughest data to recover is found on older, proprietary computer systems that are in use but break. Typically found in manufacturing and production environments, these are older computer systems that control a piece of equipment that everyone uses but no one manages. As a result, the data is not backed up nor does anyone know who created the application or how it runs.

So, what’s the best way to protect data on old disk drives? The best and simplest way is to avoid keeping data on old disk drives and migrate data to newer disk drives. Kroll Ontrack classifies disk drives over five years in age as “old” since by this time disk drive warranties have usually expired and parts for the disk drive are out of production.

Schulz is a little less dogmatic about the five-year cut-off. He finds that disk drives up to seven or eight years old are probably OK, depending on the condition in which they were stored or how they are used in production. He suggests spinning them up on a regular basis (once every 3 to 6 months), though he agrees that as disk drives age, administrators should migrate the data to newer drives.
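These rules of thumb are simple enough to turn into an inventory check. The Python sketch below is one possible way to do so and is not a tool from Kroll Ontrack or StorageIO. The function name `drive_actions`, the action labels and the exact thresholds (five years, eight years, 180 days) are assumptions drawn loosely from the figures quoted above.

```python
from datetime import date

OLD_AFTER_YEARS = 5        # Kroll Ontrack's threshold for an "old" drive
MAX_AGE_YEARS = 8          # upper end of Schulz's seven-to-eight-year range
SPIN_UP_EVERY_DAYS = 180   # roughly the six-month end of "every 3 to 6 months"


def drive_actions(manufactured: date, last_powered_on: date, today: date) -> list:
    """Suggest actions for one drive based on its age and idle time."""
    age_years = (today - manufactured).days / 365.25
    idle_days = (today - last_powered_on).days
    actions = []
    if age_years > MAX_AGE_YEARS:
        actions.append("migrate now")
    elif age_years > OLD_AFTER_YEARS:
        actions.append("plan migration")
    if idle_days > SPIN_UP_EVERY_DAYS:
        actions.append("spin up")
    return actions
```

A drive built in January 2001 and last powered on in January 2007 would, as of December 2007, be flagged both for a planned migration and for a spin-up.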

If a disk drive has already failed or you come across one of indeterminate age or condition and you don’t know what data is on it or its value to your business, your best bet is probably to send it to a data recovery specialist and keep your fingers crossed.

Three emerging storage ‘mega-trends’

It may be a little early in 2007 to start prognosticating about what is going to occur in 2008 and beyond. However, there are some major trends — I would almost classify them as “mega-trends” — that I see taking shape. These trends indicate that, at a higher level, storage management is shifting from managing bits and bytes to treating storage as a cheap, abundant commodity that can be used to solve specific business problems.

Nowhere is this more evident than in the increasing number of small and midsize businesses (SMBs) that are switching to online backup. Though this trend started some time ago (some vendors noticed a serious uptick in business about 18 months ago), it should only accelerate in 2008.

Backup is strategic to SMBs only in the sense that SMBs recognize they need to do it and that they need help doing it. If they can outsource it for about the same cost or slightly more than they are paying now with a high level of assurance that it will work, most will do it.

Contributing to this trend is that backup service providers are maturing to become managed service providers (MSPs). They no longer provide just online backup and support user-initiated recoveries. They are diversifying to provide an entire range of data management services that SMBs need such as archiving, data classification and different tiers of disaster recovery services.

MSPs still are at different stages in providing these services and, for now, users should still view these new service offerings with a fair amount of skepticism. However, it is reasonable to assume that by 2009 MSPs should have many of the kinks worked out and will offer more robust data management services.

Another trend that is emerging is the need for storage managers to develop a close relationship with their legal department. This is significant because the way IT manages data going forward will be driven as much by corporate legal departments as it is by internal business applications. “Just keep it all” or “Delete it after three years” may be good starting points for data management but the world has become much more complicated than that.

Andrew Cohen, who handles EMC’s legal department and corporate compliance, cites cost, legal statutes, defensible data management policies and e-discovery as the specific reasons that data management policies need to evolve and that IT and legal departments need to work more closely together. Yet, for storage managers to focus on broader business and legal issues, they must put into place a storage infrastructure that doesn’t require their constant attention and is self-managing and self-healing.

That leads to the last major mega-trend I see emerging in storage: clustered storage. Anyone who deals with storage on a day-to-day basis knows that storage is anything but self-managing and self-healing, especially when used in a storage network. If anything, I would characterize most current storage network designs as exactly the opposite: self-destructing and self-defeating.

Clustered storage is shaping up to take one of two forms: clustered storage systems and virtualized storage. From a best practices point of view, clustered storage systems (sometimes called grid storage) from NEC, Isilon Systems and Panasas, which can create one large logical storage pool, are probably the best option. However, that model often requires companies to standardize on a storage vendor’s product, which may or may not fit with how companies procure their storage.

Virtualized storage is accomplished using a network-based storage virtualization product such as EMC’s InVista or Incipient’s iNSP. These products aggregate existing storage systems to present one logical storage pool to the server infrastructure and create a common console to perform common storage management functions such as data migrations and provisioning.

How soon these emerging mega-trends come to pass remains to be seen. But dropping storage costs, the need for tighter relationships between IT and legal, and maturing storage technologies are contributing to the likelihood of these trends getting a foothold in 2008 and accelerating from there.

SNW’s winners and losers

Last week, I met with more vendors and was briefed on more new technologies than I thought possible in a 3-day period at Storage Networking World (SNW) in Dallas, TX. However, now that I am back in the comforts of Omaha, NE, (if one can ever call Omaha comfortable), here are some of the briefings and interviews that I found to be the most interesting. And some that I thought were totally unremarkable.

Sun’s director of storage marketing, Dave Kenyon, and I met under the pretense of doing an interview for an upcoming article for Storage magazine on VTLs that manage disk and tape. But, whatever Dave was on during our interview, I need to get me some of that. I’m guessing Dave was up all night with the SNW crowd and his coffee was just kicking in when we sat down for our 9:00 am interview on Wednesday morning, because he let it rip. From blasting how backup software manages disk to wondering aloud why open systems vendors and users fail to learn the same lessons that the mainframe folks learned years ago, Dave solved backup’s problems (and most of the world’s) in the 30 minutes we met.

I also met with Isilon Systems’ director of marketing, Brett Goodwin. In the last year, Isilon Systems has gone from Wall Street darling and supposed NetApp-killer to a stock price collapse and whispers on the street that their product was having problems.

Brett explained that Isilon Systems had initially set earnings expectations too high and then, when Isilon Systems failed to meet lowered earnings expectations, it was promptly punished by Wall Street. As far as the rumors about its IQ product not working well, it was more a matter of Isilon’s VARs selling into accounts that they had little or no business selling into. Isilon Systems’ IQ series operates best when it is used in conjunction with video streaming applications, not in most business environments where random file access is the norm.

On the other end of the spectrum, I had a most unremarkable briefing with SeaNodes. SeaNodes provides clustering software that pools the unused capacity on the internal hard drives of Linux servers and shares it among those servers. Now, I thought this idea was dumb five years ago when a company named Monosphere attempted to do something similar for Windows servers. Monosphere has since seen the light and moved on to more intelligent pursuits, so I was dumbfounded that another company would try the same thing.

In SeaNodes’ defense, at least they are just shooting for the clustered, high performance Linux server market that uses 500 and 750 GB internal drives, where the aggregate excess storage capacity on internal drives probably reaches the hundreds of TBs. However, users should only look at this technology if they are as geeky as the people who run clustered server computing farms and would rather be saving a few terabytes of storage than trying to figure out how they can squeeze time in their schedule to hit the golf course before the first snow of the season flies.

The budget category conundrum

Successful patterns of behavior are repeated. That adage is as good a reason as any as to why storage managers are reluctant to change their storage buying or management practices. Yet fundamental changes in how underlying data storage technologies work are forcing a subsequent change in storage management and procurement. Now is the time of year to lay the foundation for those changes.

The fourth quarter is typically when storage managers plan their budgets for 2008, but classifying new storage products is anything but cut and dried. The days of using budget categories like “backup software,” “disk” and “tape” are coming to an end as continuous data protection (CDP), data protection and recovery management (DPRM) software, disk cartridges, iSCSI storage systems and storage virtualization emerge. These technologies don’t quite fit into the tidy budget categories that storage managers have used over the years.

Storage managers are re-thinking and re-wording budget categories so category descriptions can be more inclusive of new storage technologies. For instance, “backup software” and “tape” might become “data protection software” and “data protection hardware,” respectively, while the “disk” category may be described as a “storage network.” Simple wording changes like these can help storage managers prepare their management teams for the fact that new storage technologies are coming.

Bringing new storage technologies into a company is never an easy task and the larger the company, the more difficult it becomes. However, sticking to storage technologies that have worked in the past is increasingly the wrong way to manage storage. Using new storage technologies, companies stand to get more mileage out of their storage while becoming more efficient in how they manage it. Examining and changing the wording in your budget is a simple way to start the process of change without putting either yourself, or your company, at undue risk.

CDP’s evolution takes shape

The evolution of the use of continuous data protection in companies is taking shape. BakBone Software’s inclusion of CDP as a new feature in its NetVault:Backup 8.0 release puts it in the growing number of products, such as Asigra’s TeleVaulting and InMage Systems’ DR-Scout, that use CDP to protect Windows and Linux servers.

The rationale for including CDP in backup is simple. Easy backup and recovery of standalone Linux and Windows servers remains a significant challenge for administrators. Companies still have too many of this class of servers with too few administrators, who are struggling to provide a cost-effective means to back up and recover them.

Using CDP as part of the backup client addresses this issue on several fronts. It replicates data to disk locally and remotely; it provides for fast recoveries to any point in time within the retention window (typically the past 3 to 30 days); and by creating and keeping a complete copy of the data on disk on another host, administrators can manipulate this copy of data in multiple ways.
