Some thoughts on Apache Hadoop
Apache Hadoop, the Java-based software framework for distributed storage and processing of very large data sets, has skyrocketed in popularity since its 1.0 release in December 2011 (the project itself dates back to 2006). Is the hype surrounding Hadoop for real, or will the next five years show that this relatively new technology was only a flash in the pan?
Big Data is Just Getting Started
Only time will reveal the answer, but what is clear is that companies' adoption of novel big data techniques is only in its early stages. A 2014 InformationWeek survey found that only 13 percent of responding organisations had adopted Hadoop, compared with 75 percent using Microsoft SQL Server and 47 percent using Oracle. Hadoop's popularity is due in part to its open-source license, which encouraged large early investments and contributions from powerhouse companies like Yahoo. Since the Hadoop ecosystem currently lacks a viable proprietary competitor, it has grown largely unchecked so far. Some estimates claim that the Hadoop-related hardware and software industry will grow to over $50 billion annually by 2020.
Hadoop Is Not a Cure-All
Some overzealous adopters of Hadoop have treated it as a magical elixir for any possible problem in data management and analytics. The truth is much less exciting. While Hadoop is great at batch processing of certain types of enormous datasets, it is not suited to tasks such as real-time analysis, graph analysis or supercomputing. The recent rise of another Apache project, Spark, as well as of other new open-source technologies like Ceph and Kafka, shows that Hadoop leaves real gaps. For instance, the Spark project claims that for certain in-memory workloads it can run up to 100 times faster than Hadoop MapReduce, which would represent a stunning jump in performance.
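To see why the MapReduce model fits batch work better than real-time analysis, it may help to look at the shape of a job. What follows is only a minimal single-process sketch in Python, not Hadoop's actual Java API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample log lines are invented for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key; this step needs all map output first."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in order, as one batch job would."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical input: a handful of log lines standing in for a huge dataset.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

On a real cluster each phase is spread across many machines and intermediate data is written to disk between phases, so no answer is available until the whole batch has finished. That latency is a large part of why the model suits overnight jobs better than live queries.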
Hadoop Is One of Many Options
Spark has been called the "next big thing" in big data, a title that Hadoop held not too long ago. And although Spark can run within the Hadoop ecosystem, nothing requires it to do so. Ceph is an open-source distributed storage system with its own POSIX-compatible file system. Inktank, the company behind Ceph, was acquired in April 2014 by Red Hat, which may look to integrate Ceph more closely with its own Linux and storage offerings.
The question now is perhaps whether Hadoop enthusiasts can bring these recent open-source technologies into the Hadoop ecosystem, or whether Hadoop will be eclipsed as its newer competitors catch up and eventually surpass it.