A funny thing happened on East Carolina University's journey to creating a data-retention strategy. As part of a compliance project launched one and a half years ago, Brent Zimmer, systems specialist at the university, was working with attorneys and archivists to determine which data was most important to keep and for how long. But it soon became clear that it was just as important to identify which data should be thrown away.
Zimmer was aware of the importance of being able to quickly produce required information during litigation, "but the thing we never thought about was keeping data too long," he says. The risk lies in keeping data longer than required: even though you would not otherwise have to produce it, as long as it's discoverable, it could be used as evidence against you.
Like many organizations, East Carolina had its share of data to purge. "We never made anyone throw away anything unless they ran out of space on their quota," Zimmer says. Some users, he says, had e-mail dating back to 1996.
East Carolina is not unusual; many organizations hang on to more data than they need, for much longer than they should, according to John Merryman, services director at GlassHouse Technologies, a storage services provider. One reason is fear. "Companies are really sensitive because there's a perceived underhandedness to purging data," he says. "People might wonder, 'Why aren't you keeping all your records?'"
Another is the low cost of storage. Organizations have historically preferred to buy more disks rather than spend time and resources sorting through what they do and don't need. "Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes," says Kevin Beaver, founder of Principle Logic.
But thanks to e-discovery risk and burgeoning data volumes - 20 percent to 50 percent compound annual growth rate for some companies - the tide is starting to turn, according to Merryman. The average cost companies incur for electronic data discovery ranges from $1 million to $3 million (about Rs 5 crore to Rs 15 crore) per terabyte of data, according to GlassHouse.
A recent report from Gartner concurs. It states that the current explosion of data is outpacing the decline in storage prices, even before the resource costs for maintaining data are taken into account.
Estimating that the average employee might generate 10GB per year, at a cost of $5 (about Rs 250) per gigabyte to back it up, Gartner says a 5,000-worker company would face annual costs of $1.25 million (about Rs 6 crore) for five years of storage. And considering that many companies maintain multiple copies of data, thanks to test data, operational data and disaster recovery copies, not to mention backups, "there's an explosion of data in most companies," Merryman says.
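The Gartner figure can be reproduced with simple arithmetic. The sketch below is only a back-of-the-envelope check: it assumes the $5-per-gigabyte figure is an annual backup cost and that nothing is purged during the five years, neither of which the estimate spells out. The function and constant names are illustrative.

```python
# Back-of-the-envelope check of the Gartner estimate quoted above.
# Assumptions (not stated in the source): the $5/GB cost recurs annually,
# and all data generated over the retention window is still being kept.

EMPLOYEES = 5_000
GB_PER_EMPLOYEE_PER_YEAR = 10
BACKUP_COST_PER_GB = 5   # US dollars per gigabyte, assumed to be per year
YEARS_RETAINED = 5


def annual_backup_cost(employees, gb_per_employee_per_year, cost_per_gb, years_retained):
    """Annual cost of backing up everything accumulated over the retention window."""
    stored_gb = employees * gb_per_employee_per_year * years_retained
    return stored_gb * cost_per_gb


cost = annual_backup_cost(
    EMPLOYEES, GB_PER_EMPLOYEE_PER_YEAR, BACKUP_COST_PER_GB, YEARS_RETAINED
)
print(f"${cost:,}")  # $1,250,000
```

Under the same assumptions, a single year of data costs $250,000 to back up, so each additional year of unpurged data adds roughly that much to the annual bill.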
Aside from the costs, keeping all those records indefinitely is a gold mine for attorneys looking for evidence, he adds.
One way to address this problem is to set retention policies that reduce exposure to legal problems. But don't try to boil the ocean, Merryman advises. Instead, create policies from the application or business level down, rather than looking across the whole data landscape and letting policy bubble up. Also, create black-and-white rules that are easy to deal with.
For instance, roll all data types - such as e-mail, application and file data - into 10 to 30 categories of big-picture policies rather than hundreds of granular ones. "You need broader rules like 'accounting data needs to be retained six years,' not 'this annual report needs to be retained [for] five years,'" he says.
According to research from Enterprise Strategy Group, the average required retention period for files, e-mails and databases is on the rise. Most companies retain data for four to 10 years, says Brian Babineau, a senior analyst at ESG.
East Carolina University started with the low-hanging fruit, setting retention and purging policies for e-mail, medical records and security video. It archived that data on a new system based on Symantec's Enterprise Vault storage management software and EMC's Centera content-addressed storage (CAS) array. E-mails from the chancellor or dean are saved for seven years, Zimmer says, while faculty and staff e-mail gets purged after three years.
Meanwhile, security video is archived for 30 days - a good thing, since university police collect a terabyte per day. Patient records from the medical school need to be kept for 20 years after the patient is deceased. Beyond that, the job will get more difficult, Zimmer acknowledges. "There's a lot of other stuff that we don't know the retention [requirements] for, so that will be trickier," he says.
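Black-and-white rules of this kind are easy to express as a small lookup table. The following is a minimal sketch, not a description of how East Carolina's archive is actually configured: the category names and functions are hypothetical, years are approximated as 365 days, and the retention clock is assumed to start at creation for e-mail and video and at the patient's date of death for medical records.

```python
from datetime import date, timedelta

# Retention periods quoted in the article, in days.
# Category names are hypothetical labels; a year is approximated as 365 days.
RETENTION_DAYS = {
    "chancellor_dean_email": 7 * 365,
    "faculty_staff_email": 3 * 365,
    "security_video": 30,
    "patient_record": 20 * 365,  # clock starts at the patient's date of death
}


def purge_date(category, clock_start):
    """Return the date a record becomes eligible for purging, or None if no rule exists."""
    days = RETENTION_DAYS.get(category)
    if days is None:
        return None
    return clock_start + timedelta(days=days)


def is_purgeable(category, clock_start, today):
    """A record is purgeable only when a rule exists and its retention period has passed."""
    due = purge_date(category, clock_start)
    return due is not None and today >= due


print(purge_date("security_video", date(2024, 1, 1)))  # 2024-01-31
print(is_purgeable("faculty_staff_email", date(2020, 1, 1), date(2024, 1, 1)))  # True
print(is_purgeable("research_data", date(1996, 1, 1), date(2024, 1, 1)))  # False
```

Data with no known retention requirement is never reported as purgeable here, which mirrors the cautious position Zimmer describes for categories the university has not yet classified.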
The key to reducing data volumes, Gartner says, is a process called content valuation, which involves examining factors such as authorship authority, usage patterns, nature of content and business purpose.
According to Gartner, there are many ways to approach content valuation, including electronic records management, content management, enterprise search to identify what's a record and what's not, legal preservation software and policy management.
Archiving on the Rise
Partly because of increased data retention activity, companies are increasingly implementing disk-based archiving tiers in their storage architectures. This is a better place to retain data than tape backup systems, Babineau says, because the data is indexed, searchable and stored in single-instance format, all of which makes it easier to find what you need during e-discovery.
According to Robert Stevenson, managing director of storage research at The InfoPro in New York, archiving tiers have seen a 54 percent annual growth rate among users surveyed vs. 20 percent for tier 1 monolithic storage and 40 percent growth for tier 2 modular storage.
In the past three years, e-mail archiving has grown, with 48 percent of survey respondents saying they use it today vs. 39 percent two-and-a-half years ago. Database archiving is also up, with 36 percent using it vs. 21 percent two-and-a-half years ago.
Another reason for archiving growth is that companies are relying less on backup tapes for retention and more on disk-based storage. "Discovery is a difficult task, and if you have multiple copies in the backup environment, it's extremely expensive to retrieve, index, search and take it through the pre-production process of culling and narrowing down results," Merryman says. "It can turn discovery into a multi-million-dollar project."
The Urge to Purge
The seemingly simplest way to reduce data volumes is to delete the data you don't need. But this is much more easily said than done. The fact is, according to Merryman, outside of e-mail, the status quo is to do nothing. "Most legacy applications have never purged data, and new applications are rarely designed to accommodate purging," he says.
What's more, he says, deleting production data is complicated. In addition, the issues associated with legal, compliance and operational risks are often ambiguous, and few organizations have a process to accommodate a web of requirements for data retention.
"If you look at legacy data outside the application world, a lot of people have no idea what it is, but they're scared of getting rid of it," he says.
At one large bank in New York, Merryman says, he ran across hundreds of file extensions that no one knew about, as well as data inaccessible by currently maintained applications or interfaces.
The important thing is to start setting purging policies now rather than trying to apply them to old data. "If you address high-risk, high-volume applications and databases, you'll address 90 percent of the risk," he says. "If you target all 700 applications in your environment, you'll never get it done."
In fact, in a tiered storage environment, Merryman says, the business case is much better when you purge data rather than simply archiving it on lower cost disk.
"The cost of perpetually managing and refreshing huge amounts of data that's never been culled or purged is extremely high," he says. "So if you come up with a strategy to tier 70 percent of your data to cheap storage, and then you factor in the cost of managing, backing up and protecting it for disaster recovery, it's expensive."
Unfortunately, he says, most companies that develop tiering strategies figure they'll purge at some time in the future. "But that's the problem with purge," he says. "It's always 'later,' like cleaning out the basement."
Another difficulty with purging is the lack of a guarantee that you've deleted all instances of the data set. You might think you deleted all your old e-mail, but it may be stored on tape from two years ago, so it still exists. "Some companies figure if you can't delete it consistently, don't delete it at all because it's probably somewhere that no one knows about," Babineau says.
Still, he says, "if you invest in technology that helps you retain data, why not invest in technology that helps expire data when you don't need it anymore?"
For instance, all archiving systems have a 'delete' function, Merryman says, but no single product can purge data across all data types, such as messaging, unstructured and structured data.
Merryman's advice: First identify vendors with proven technologies, and then look at emerging vendors. Second, he says, see if the vendors support or plan to support SNIA Archiving Standards being developed by the 100-Year Archive Task Force. "This body of standards is young," he says, "but it's the only industrywide effort to standardize archiving methods."