Storage Soup - A SearchStorage.com blog


A data storage blog offering commentary on the storage industry, as well as a behind-the-scenes look at developments in storage management, SAN, NAS, backup, disaster recovery and storage strategy.

Buying typical storage for video surveillance? Rethink that!

Up until now you (corporate IT) have not had to worry about video surveillance. That job was up to the security guys, those guys that wore uniforms and pretty much kept to themselves. But be prepared. If you are not already deeply involved in video surveillance equipment RFP creation, acquisition, installation and management, you will be very soon.

The world of video surveillance is changing so rapidly that the user and the traditional supplier are both in a state of frenzy. It is within this transformation that the role of IT is becoming increasingly critical. The reasons for the increase in video surveillance are pretty easy to understand. Post 9/11, enterprises as well as governments are all adding video surveillance to the security equation or increasing their use of it. Of course, casinos and banks have always been the leading users of video surveillance, but now everyone is in the game. On a typical day, a person living in a city may be videotaped in five or more places as he drives to work (and passes through specific traffic lights), parks his car in the company parking lot, enters the building, makes a trip to the bank at lunch, grabs a couple of items at the local Kmart and heads home. There are all kinds of privacy issues that can be debated, but I am staying away from that. At least for now. Right now, I am more interested in the technology and IT’s increasing role in video surveillance.

Traditional video surveillance equipment was not designed to deal with this onslaught and is gasping for air. It is being replaced almost completely with IP-based equipment. That’s where you come in. Until now, most video surveillance equipment was based on CCTV (closed circuit TV), which basically meant that the cameras, which recorded analog video, were hooked up via coaxial cable to a central point, where the video was taped on VCRs. Later, DVRs converted the analog signal to digital at the central location before storing it. But these technologies cannot cope with the flood of data from more and more cameras, or with the fact that cameras increasingly record at higher resolutions.

The latest crop of cameras records video in a digital format and compresses it using MJPEG or MPEG before transmitting it over a standard IP network to a central location that stores the data on scalable disk arrays. Once in the realm of IP, all the goodies we are used to in IT become available to an industry that still thinks of guards manning physical structures. Centralized management becomes feasible, and data can be accessed asymmetrically from multiple locations and replicated when appropriate. Another level of sophistication is being added at the end points. Now cameras can be activated when they detect motion or switch into a higher resolution if certain criteria are met. Video analytics allow software to recognize facial characteristics. Searches can be conducted for specific objects or people. You get the idea. It is like James Bond gadgetry becoming available to regular folks. But that is reflective of the world we live in.

I think you (IT) need to be prepared to play a major role in this transformation that is occurring. You are the resident experts in storage and, at this point, pretty well up on IP technologies as well. Video surveillance simply becomes another application you have to support. So, if you are not already deeply involved in the selection and day-to-day management of the video surveillance equipment, it is only a matter of time. Security people who used to make decisions on such purchases without any consultation will now insist on your involvement. You should gladly offer to help.

Another important thing to realize is that the type of storage you end up selecting for these applications will very likely be different from storage for other applications. For video surveillance, the attributes that matter for storage include cost effectiveness (dirt cheap), high scalability in both capacity and performance (you cannot afford to create islands of storage), a low entry price point, cost-effective availability (mirroring may be too expensive), protection from disk drive or nodal failure and, most importantly, an IP-based design. Everything else in this environment is IP-based, so making storage IP-based makes it easier to understand and manage. FC storage would bring in a level of complexity that is unnecessary here. Also, legacy architectures that have grafted on an IP (iSCSI) interface would not cut it here, because they would not meet the other requirements above. Storage players that I believe merit consideration include Pivot3, Intransa, LeftHand Networks and, to a lesser degree, EqualLogic (their price point may be too high for this application). There are other inexpensive storage offerings, such as those from Nexsan or Xyratex, but if an architecture does not allow clustering and presentation of a single system image as it scales, it misses a criterion that I consider absolutely necessary for this application. However, you may want them in the initial mix as you start the evaluation process. I am sure you have enough on your plate without adding yet another storage-hungry application. But the way the winds are blowing, you either proactively plan for this or you will get pulled in kicking and screaming.

At this rate the world will be green in a decade

Not a day goes by that I do not hear from yet another storage or server vendor that their offering, whatever it is, is green. This mania started about a year ago in earnest. Prior to that, the green movement was pretty much restricted to organizations outside of the computer industry. So, what is really going on? What has caused every software and hardware company to suddenly formulate a green message?

I have a theory, and you heard it here first. I think there is a fundamental grassroots movement towards green that has started in the U.S. and it is picking up momentum like no other I have seen in thirty years. This movement is more powerful than the Presidential elections and other important matters facing the country. It is bigger than Exxon and Mobil. It is bigger than GM and Ford. For years, the debate has been raging about global warming. No matter which side of the debate you place yourself on, the green movement has begun. And because it is now becoming fashionable, every company in every industry will feel the need to do something “green.” I believe we are now in phase 1 of this movement. In phase 1, each company takes stock of what it has in its product line and extracts what it can of a green message. Granted, most, if not all, of these companies had never thought of any of their products in terms of green before. Not in product development and not in marketing. Of course, good design practices prevailed and many resulted in lower power usage or smaller packaging, but they were hardly ever viewed within the context of green. So, in phase 1, what we are seeing is a recasting of the company message to incorporate green.

I see it every day. Sometimes I laugh when I see a storage company twisting and turning its message to incorporate green. Even to the point that more than one company has stated to me that they are so green that even their logo has green in it. Give me a break. The logo was done years ago when green was equated to the color of a person’s face when they saw a ghost.

But I frankly don’t care.

I am thrilled just to see the storage companies participate in the green movement. So what if 75% of what I see today is recasting of a message relating to an older product. So what if Manhattan’s energy and space crunch started the ball rolling. I think once the company is committed to the green message they will design their next product accordingly. They can’t escape it. That is why I believe the green movement will have a genuine impact in the next five years. Let the companies play the game. Play along with them. Give them slack for now. Because once they are in, they are in. I love it.

Before you think that there is nothing real in the products today let me restate something. There are technologies that have hit the market in the past three years that are making a serious green impact. Data deduplication is one such technology. It has hit the market on the secondary storage side first, that is, applied to backup/restore and archiving markets. When used in appropriate ways, one can reduce the amount of disk required by a factor of 20. No matter which way you look at it, 1 TB of storage uses a lot less power and requires a lot less cooling than 20 TB. Thin provisioning is another good example. I chose these examples to illustrate a point: it is not simply hardware technologies that deliver green. In fact, at Taneja Group we believe software will play a huge part in the greening of storage and servers. Not to say that hardware wouldn’t play a role. Look at Copan’s MAID technology, for instance. Or IBM and HP’s blade server technology. New techniques for airflow through racks, nanotechnology and new data center designs will all contribute. But, we believe the impact of software technologies will dominate, especially with installed hardware.

Green will soon become a competitive advantage. Because of the financial implications, real change will occur. Soon even the Mobils and the Exxons will have to yield to the pressure. That is how strong grassroots movements are. I believe the time is here. And I couldn’t be happier.

Note: Recently Taneja Group wrote a Technology In Depth paper on this topic. If you would like a copy please send a request through www.tanejagroup.com.

Cross correlation engines reaching into primary storage

You have seen my writings on (and may even have heard me speak about) the Cross Correlation (CC) analytics engine as a necessary part of a Data Protection Management (DPM) product. DPM products make your backup and restore environment work more efficiently. Recently, I have seen the application of CC techniques to solve problems on the primary storage side. And much to my pleasure, I have also seen the technique applied to manage application performance.

Several players are delivering products in the DPM market including Aptare, Bocada, Illuminator, Servergraph, Tek-Tools and WysDM, and most recently, Symantec, with their NetBackup Reporter product. These products, as a category, are delivering real value, based on my conversations with many of you. EMC, who resells WysDM as Backup Advisor, is apparently shipping in large quantities. All big data protection vendors have gotten religion on this recently, and they are all scrambling to add DPM functionality via in-house R&D or through a partnership.

To be sure, not all products are created equal in terms of the strength of the CC engine (or even the existence of one), which to me is the essence of the product. Without a sound CC engine, the best a product can do is rudimentary analysis and basically report on changes.

I have seen two new and interesting uses of CC recently. First, WysDM announced WysDM for File Servers. Essentially, that means the same CC engine is being used to look at NetApp filers (primary storage) to determine if the filer is behaving as it should. Much as before, the product gathers data from the application and through all hardware and software layers that reside between it and the filer, and applies analytics to determine if the system is behaving within acceptable boundaries. Are response times to file requests deteriorating? Is capacity being utilized efficiently? Is a file system ready to run out of storage? What needs to be done to solve the problem? Will an additional GE connection make a difference? You get the point.
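One of the questions above, whether a file system is about to run out of storage, can be illustrated with a very small trend calculation. This is only a sketch under loud assumptions: it is not how WysDM or any other product mentioned here works, it is not a cross correlation engine (it looks at a single metric in isolation), and the function name and the sample figures are hypothetical. It simply fits a straight line to daily usage samples and projects when the volume would fill.

```python
def days_until_full(used_gb_by_day, capacity_gb):
    """Estimate days until a volume fills, from equally spaced daily samples.

    Fits a least-squares straight line to the samples (hypothetical input:
    one used-capacity reading per day, oldest first). Returns None when
    usage is flat or shrinking, because no fill date can be projected.
    """
    n = len(used_gb_by_day)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    mean_x = (n - 1) / 2                      # mean of day indexes 0..n-1
    mean_y = sum(used_gb_by_day) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(used_gb_by_day))
    growth_gb_per_day = sxy / sxx

    if growth_gb_per_day <= 0:
        return None

    remaining_gb = max(capacity_gb - used_gb_by_day[-1], 0)
    return remaining_gb / growth_gb_per_day


if __name__ == "__main__":
    # Hypothetical readings for a 200 GB volume.
    print(days_until_full([100, 110, 120, 130], capacity_gb=200))
```

A real tool of the kind described would go much further, relating this trend to what is happening in the application, network and host layers before recommending an action.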

I know you are probably saying to yourself, “Don’t I get some of that information from the filer’s integral management tool?” Of course you do. But, just like on the data protection side, the amount and type of information about the environment that was being delivered before this tool was available was rudimentary and static. Unless one steps outside of the filer and looks at the entire picture from end to end, it is hard to determine the root cause of a problem that exists or is in the making. That can only be done with a sophisticated CC tool. And only a sophisticated tool will give you predictive information with a high degree of confidence.

Another company that has applied CC to the primary storage is Illuminator Software, whose DPM product now includes functionality about snapshots and replication. But, the product is still true to its data protection roots. In this case, the product provides information on the readiness of volumes from a data recoverability point of view. Whether the volume is protected using snapshots or replication or secondary disk or tape, its recoverability is established and reported on. The product also offers advice on the actions necessary to improve recoverability.

The third company, Akorri Networks, has applied a CC engine for an entirely different purpose: to provide insight into application performance. Of course, application recoverability is improved when application availability is improved so there is an underlying connection here. But, the overt focus is to provide insight into how storage resources are being used to deliver a certain level of performance at the application level. In other words, given a particular SLA for an application, does one have adequate or inadequate storage resources applied? Would extra resources (higher throughput storage, more storage, another pipe to storage, etc.) help to bring application performance back into SLA boundaries? Or would it be a waste? What would help the most? With this kind of information the right type and quantity of resources can be applied thus saving time and resources.

The progress in these areas has been truly phenomenal in the last three years, and yet we are still in the infancy stage of utilizing these tools. Most of these technologies have become available from smaller companies, whose reach is limited. Given that your environment is only getting more complex, it behooves you to check these out! Send me an email if you need any help.

At the brink of the data deduplication wars

OK folks, the data deduplication war has begun. At the center of the war are vendors such as Data Domain, Diligent Technologies, ExaGrid, FalconStor, Quantum and SEPATON. It is only a question of time before EMC, NetApp and Symantec join the fray. To understand what is happening, let’s start with what has happened in the last five years relative to disk-to-disk technologies. The early pioneers included the six vendors mentioned above. Diligent, Quantum and SEPATON brought their VTL products to market approximately three years ago and started marketing the value of secondary disk for backup and restore. By making disk look like tape, they correctly maintained that the backup procedures would require no alterations and yet you would see vast improvements in backup and restore speeds and reliability. I think all of them have adequately proved that value. Many of you have told me you have seen speed improvements of 3x in backups, with 30-50% improvements very common.

None of these vendors said a thing about data deduplication at the time they entered the market. Data Domain, on the other hand, took a very different tack. They came to market with a disk-based product targeting the same space but focused on data deduplication, front and center. Their premise from the beginning was that by eliminating duplication of data at a sub-file level one could keep months of backup data on disk and therefore have fast access to not just what was backed up yesterday but data that was months, even years old. When viewed through the data deduplication lens, Data Domain took the lion’s share of the market, with 53% of the storage with data deduplication in 2006, according to our estimates. Along with Avamar (now EMC), another data deduplication-centric backup software vendor, they presented an argument for changing the role of tape to that of very long term retention.

The liftoff for Data Domain took some time in the market and they focused initially on the SMB market. This was no surprise to me because all paradigm-shifting ideas take time to sink in. And frankly, you did the right thing in testing the waters before jumping in head first. But the idea made sense. If you could keep months of backup data on disk but do it at prices that came close to tape, why wouldn’t you? Once the concept was validated and you built trust in the vendor, you started buying hundreds of terabytes of secondary disk.

While Data Domain was pushing data deduplication, they were also inherently pushing disk as a medium for long term storage of backups. At the same time, others were presenting their VTL solutions and convincing you of the merits of secondary disk but without any data deduplication. But, behind the scenes, they all knew they had to add data deduplication as quickly as possible to compete in this nascent but $1B+ market. Each worked on different ways to squeeze redundancy out of backup data.

At the concept level, they all do the same thing. The way full and incremental backups have been done for years, there is a lot of redundancy built in. Take, for instance, the full backups that you typically do once a week. How much of that data is the same week to week? 90% would not be a bad guess. Why keep copying the same stupid thing again and again? Even with incremental backup, an existing file that has even a single byte changed is backed up again in full. Why? It is best to not get me going on that front. I happen to think that the legacy backup vendors did a miserable job on that front. But we will leave that aside for now. Back to data deduplication. The idea is to break the file into pieces and keep each unique piece only once, replacing redundant uses of it with small pointers that point to the original piece. As long as you keep doing full and incremental backups using legacy products from Symantec, EMC (Legato), IBM Tivoli, HP or CA, you will continue to see vast amounts of redundant data that can be eliminated. The value of eliminating this redundant data has been made abundantly clear in the past year by Data Domain customers.
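To make the "pieces and pointers" idea concrete, here is a minimal sketch. It assumes fixed-size pieces and uses a SHA-256 digest as the pointer; both are my simplifications for illustration, not a description of any vendor's product. The class name, the piece size and the sample data are all hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed piece size in bytes; real products differ


class DedupStore:
    """Toy store: keeps each unique piece once, backups are lists of pointers."""

    def __init__(self):
        self.chunks = {}    # digest -> piece bytes, stored only once
        self.backups = {}   # backup name -> ordered list of digests (pointers)

    def add_backup(self, name, data):
        pointers = []
        for offset in range(0, len(data), CHUNK_SIZE):
            piece = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            # Keep the piece only if this content has not been seen before.
            self.chunks.setdefault(digest, piece)
            pointers.append(digest)
        self.backups[name] = pointers

    def restore(self, name):
        # Follow the pointers to rebuild the original byte stream.
        return b"".join(self.chunks[d] for d in self.backups[name])

    def logical_bytes(self):
        # Size of all backups as the backup software sees them.
        return sum(len(self.chunks[d])
                   for ptrs in self.backups.values() for d in ptrs)

    def stored_bytes(self):
        # Size of the unique pieces actually kept on disk.
        return sum(len(piece) for piece in self.chunks.values())


if __name__ == "__main__":
    # Two weekly "full backups" that differ in only one of ten pieces.
    week1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
    week2 = week1[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE

    store = DedupStore()
    store.add_backup("week1", week1)
    store.add_backup("week2", week2)
    print(store.logical_bytes(), "logical bytes,",
          store.stored_bytes(), "bytes stored")
```

In this toy case the second full backup adds only the one piece that changed. The ratio grows with every additional backup that is mostly unchanged, which is why reduction ratios are usually quoted "over time".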

2006 saw data deduplication offerings from the VTL players: Diligent Technologies, FalconStor and Quantum (via its acquisition of ADIC, which had just acquired Rocksoft, an Australian vendor focused strictly on deduplication technologies). ExaGrid, an SMB player, uses the NAS interface and has deduplication integral to its product. Each does data deduplication differently. Some use hashing algorithms such as MD5, SHA-1 or SHA-2. Others use content awareness, versions or “similar” data patterns to identify byte-level differences. Each claims to get 20:1 data reduction ratios and more over time. Each presents its value proposition and achievable ROI, based on its internal testing. Some do inline data deduplication; others perform backups without deduplication first and then reduce the data in a separate process, after the backup is finished. Each presents its solution as the best. Are you surprised? I am not.

What is clear to me is the following:

1. The value proposition of using disk for backup and restore is clear. No one can argue that anymore. The proof points are abundant and clear.

2. The merits of data deduplication are also abundantly clear.

3. However, the merits of various methods of data deduplication and the resultant reduction ratios achieved are not clear to you today (in general).

4. The market for these is huge (Taneja Group has projected $1,022M for capacity-optimized (i.e. with data deduplication) versions of VTL and $1,615M for all capacity-optimized versions of disk-based products in 2010).

5. Both VTL and NAS interface will prevail. The battlefront is on data deduplication.

6. Vendors will do all they can in 2007 to convince you of their solution’s advantages. Hundreds of millions of dollars are at stake here.

7. By the end of this year we will see the separation between winners and losers. Of course, without de-duping I believe a product is dead in any case.

So, be prepared to see a barrage of data coming your way. I am suggesting to the vendor community that they run their products using a common dataset to identify the differences in approaches. I think you should insist on it. Without that, the onus is on you to convert their internal benchmarks to how it might perform in your environment. You may even need to try the system using your own data. This area of data protection is so important that I think we need some standard approach. We are doing our part in causing this to happen. You should do yours.

I think we have just seen the beginnings of a war between vendors on this issue alone. To make matters even more interesting, we will see EMC apply the data deduplication algorithms from their Avamar acquisition to other data protection products, maybe even the EMC Disk Library product (OEM’d from FalconStor). I expect NetApp to throw a volley out there soon. Symantec has data deduplication technology acquired from DCT a few years ago, but it is currently applied only to their PureDisk product. IBM and Sun, both OEMs of FalconStor, may use Single Instance Repository (SIR) from FalconStor or something else; no one is sure. I certainly am not. But I am certain that none of the major players in the data protection market dare stay out of this area.

Data deduplication is such a game changing technology that the smart ones know they have to play. What I can say to you is simple: Evaluate data deduplication technologies carefully before you standardize on one. Three years from now, you will be glad you did. Remember that for your environment whether you get 15:1 reduction ratio or 25:1 will translate into millions of dollars in terms of disk capacity purchased. I will be writing more about the subtle differences in these technologies. So stay tuned!
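The arithmetic behind that last point is simple enough to sketch. The function names below are mine, and the 3,000 TB of retained backup data in the example is a hypothetical figure chosen only to show how much the ratio matters; plug in your own retention numbers and price per terabyte.

```python
def disk_needed_tb(logical_tb, reduction_ratio):
    """Physical disk (TB) needed to hold logical_tb of backups at N:1 reduction."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return logical_tb / reduction_ratio


def extra_disk_tb(logical_tb, low_ratio, high_ratio):
    """Additional disk (TB) required at the lower ratio versus the higher one."""
    return (disk_needed_tb(logical_tb, low_ratio)
            - disk_needed_tb(logical_tb, high_ratio))


if __name__ == "__main__":
    logical_tb = 3000  # hypothetical: total backup data retained on disk
    for ratio in (15, 25):
        print(f"{ratio}:1 -> {disk_needed_tb(logical_tb, ratio):.0f} TB of disk")
    print(f"difference: {extra_disk_tb(logical_tb, 15, 25):.0f} TB")
```

Multiply the difference by what you actually pay per terabyte, including power, cooling and floor space, to see what a few points of reduction ratio are worth in your environment.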