Hadoop: How Open Source can Whittle Big Data to Size
Added 3rd Mar 2012
In 2011 'Big Data' was, next to 'Cloud', the most dropped buzzword of the year. In 2012 Big Data is set to become a serious issue that many IT organisations across the public and private sectors will need to come to grips with.
The challenge essentially comes down to this: How do you store the massive amounts of often-unstructured data generated by end users and then transform it into meaningful, useful information?
One tool that enterprises have turned to for help with this is Hadoop, an open source framework for the distributed processing of large amounts of data.
Hadoop lets organisations "analyse much greater amounts of information than they could previously," says its creator, Doug Cutting. "Hadoop was developed out of the technologies that search engines use to analyse the entire Web. Now it's being used in lots of other places."
In January this year Hadoop finally hit version 1.0. The software is now developed under the aegis of the Apache Software Foundation.
"The releases coming this year will effectively become Hadoop 2.0," Cutting says. "We're going to see enhanced performance, high-availability and an increased variety of distributed computing metaphors to better support more applications. Hadoop's becoming the kernel of a distributed operating system for Big Data."
Hadoop grew out of Nutch, a project Cutting was involved in that set out to build an open source search engine. Development of Nutch is also conducted under the patronage of the Apache Software Foundation.
"The Hadoop ecosystem now has more than a dozen projects around it," says Cutting. "This is a testament to the utility of the technology and its open source development model. Folks find it useful from the start. Then they want to enhance it, building new systems on top.
"Apache's community-based approach to software development lets users productively collaborate with other companies to build technologies from which they can all profitably share."
Hadoop setups are available from big names in the Cloud computing space, including Amazon (through Amazon Elastic MapReduce) and IBM; in December Microsoft announced a "limited preview" of Hadoop on its Windows Azure Cloud service. Hortonworks, a company set up by Yahoo (which runs a 42,000-node Hadoop environment and is a key driver of the project), and Cloudera, which employs Cutting as chief architect, also offer Hadoop-related services.
Cloudera offers a distribution of Big Data software called CDH -- Cloudera's Distribution Including Apache Hadoop. "This is open-source, Apache licensed software," Cutting says. "Folks can develop their applications against these APIs without fear of ever being locked into paying any one vendor."
The company sells support and a licence to its proprietary software, Cloudera Manager, which helps deploy and monitor CDH. The Oracle Big Data Appliance, released in January, runs CDH.
"Appliances are a great way to get a customer in the door, but most folks end up buying a customised cluster," Cutting says. "Some folks may find the appliance itself to be the right solution, but more frequently people want something that's more suited to their particular uses.
"Folks tend to start with a small proof-of-concept system, perhaps 10 or 20 nodes. Once they've gained some experience with this then they have an idea of both how big their production system needs to be and what its bottlenecks are. This informs the balance of storage, compute, memory and networking that will serve them best.
"Over time, as workloads evolve and grow, folks may gravitate towards common configurations, but we're not yet seeing a lot of one-size-fits-all solutions."
Cutting says that when he started Hadoop, which was named after his son's toy elephant, he didn't realise just how significant the project would end up being. "I thought it would probably be useful to lots of folks, but I didn't think much about how many or how they might use it," Cutting says. "I certainly didn't think that it would become the central component of a new paradigm for enterprise data computing."
However, the software is "ultimately the product of a community," he adds. "I contributed the name and parts of the software and am proud of these contributions. The Apache Software Foundation has been a wonderful home for my work over the past decade, and I am pleased to be able to help sustain it."
Cutting uses the example of a hypothetical large retailer to explain what Hadoop can do with an enterprise's data: "Instead of just being able to analyse national sales over the past month, it can with Hadoop analyse sales trends over many years. This lets them better manage pricing, inventory and other core aspects of their business: They get a higher resolution picture of their business.
"Similarly, credit card companies can better guess whether a transaction is fraudulent, banks can better guess whether someone is credit worthy, oil companies can better guess where to drill, and so on. In nearly every case they can use data they were formerly discarding to improve the quality and profitability of their products."
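To make the retailer example concrete, here is a minimal single-process sketch in Python of the map, shuffle and reduce steps behind MapReduce, the programming model Hadoop is best known for. It is an illustration only, not Hadoop's API: the sales records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and a real Hadoop job would spread the same steps across many machines.

```python
from collections import defaultdict

# Hypothetical sales records: (year, store, amount). Illustrative data only.
SALES = [
    (2009, "north", 120.0),
    (2009, "south", 80.0),
    (2010, "north", 150.0),
    (2010, "south", 95.0),
    (2011, "north", 170.0),
]


def map_phase(record):
    """Emit one (key, value) pair per input record: here, (year, amount)."""
    year, _store, amount = record
    yield year, amount


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result: total sales."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence and return totals per year."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    for year, total in run_job(SALES).items():
        print(year, total)
```

Because each record is mapped independently and each year is reduced independently, the work can in principle be split across as many nodes as there is data, which is what makes analysing many years of sales practical.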
Cutting predicts continued exponential growth in Big Data analytics. "We're still in the steep part of the adoption curve and will be for at least a few more years," he says.
"It will be a while before growth merely tracks that of the larger economy. Developing economies like China and India will fuel continued growth in this space."
In the government sphere, adoption of Big Data technologies has been mixed, Cutting says: Intelligence communities have been early adopters, but other parts of government may not have even begun grappling with it.
"Even folks who are already using these technologies will continue to expand their use for years, incorporating data from new sources and finding new applications," he adds. "We're still at an early stage of the adoption curve.
"Most industries are currently dipping their toes into Big Data. The ones to watch are the industries we expect to grow the most. For example, healthcare and telecom create huge amounts of data that's not yet used as effectively as it could be."