
Cloudera, Hortonworks, and MapR: Comparing The Top Three Hadoop Distributions

Posted on: August 14th, 2015 by Daniella Lundsberg

As leading companies look for easier and more efficient ways to analyze and use the massive amounts of disparate data at their disposal, Apache Hadoop rises to the occasion. Hadoop is a powerful software framework that makes it possible to process large data sets across clusters of computers. This design makes it easy to scale up quickly from a single server to thousands. With data sets distributed across commodity servers, companies can get up and running fairly economically, without the need for high-end hardware. What makes Hadoop even more attractive is the fact that it’s open source.
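To make the idea of spreading work across servers concrete, here is a minimal sketch of the map, shuffle, and reduce steps behind Hadoop's classic MapReduce model. It is a single-process illustration in plain Python, not Hadoop's API: the function names and the sample splits are assumptions made up for this example, and each inner list simply stands in for the block of data one server would hold.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Count words across all splits (hypothetical stand-ins for per-server data)."""
    emitted: List[Tuple[str, int]] = []
    for split in splits:  # on a real cluster, each split is mapped in parallel
        for line in split:
            emitted.extend(map_phase(line))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two made-up splits, as if the data set lived on two servers.
    sample_splits = [["big data big clusters"], ["data at rest", "data in motion"]]
    print(word_count(sample_splits))
```

Because the map step looks at one line at a time and the reduce step looks at one key at a time, the same logic can be spread over more servers as the data grows, which is the scaling property described above.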


But Apache’s standard open source software is far from an out-of-the-box solution: it has several limitations, and further development is required to make it enterprise-ready. Hadoop excels at running complex analytics against massive volumes of data. But, as a batch-and-load system, it lags in its ability to run near-real-time analytic queries. It also lags when it comes to streamlined data management and data governance. Luckily, its adaptable, modular architecture makes it relatively easy to add new enhancements and functionality.


As a natural evolution, a number of companies have stepped in to build on Hadoop’s framework to make it enterprise-ready. They’ve adjusted its code and bundled it together with sleek, user-friendly management tools and installers along with related technologies of their own, routine system updates, user training, and technical support. The most recognized of these Hadoop distributions are Cloudera, Hortonworks, and MapR.




Cloudera Inc. is one of the oldest and most widely known Hadoop vendors, touting the strongest client base and market penetration. Cloudera was founded in 2008 by leaders in the big data industry from companies like Google, Facebook, and Oracle. Cloudera offers both its open source distribution, Cloudera’s Distribution Including Apache Hadoop (CDH), and its proprietary Cloudera Management Suite. The company monetizes its open source distribution by offering paid support and services. To differentiate itself, Cloudera also provides proprietary value-added components.


Setting Cloudera apart is its proprietary Management Suite, which includes sought-after features like wizard-based deployment, dashboard management, and a resource management module to simplify capacity and expansion planning. Cloudera’s long-term objective, says the company, is to become an enterprise data hub, reducing the need for a separate data warehouse at companies that depend on one. Cloudera is largely open source, with just a few proprietary components, and its open source CDH distribution runs on standard Linux servers. This benefits users looking to minimize the risk of vendor lock-in and protects their ability to switch to a different Hadoop distribution at a later date with relative ease. Cloudera users include recognized brands like Groupon.




Hortonworks is a newer player on the market, founded in 2011 as an independent company spun off from Yahoo, which maintains its Hadoop infrastructure in-house. Hortonworks focuses solely on providing an open source platform and is the only commercial vendor to do so: MapR offers only a proprietary distribution, and Cloudera offers both proprietary and open source components. Its primary offering, Hortonworks Data Platform (HDP), is built upon Apache Hadoop and is enterprise-ready, complete with training and other support services.


Setting Hortonworks apart is the fact that it is a completely open enterprise data platform that’s free to use. This openness could lead to much faster improvements and updates. Its HDP 2.0 distribution may be downloaded directly from the company’s website and easily installed. Because Hortonworks is open source, it can be integrated more quickly and easily. Hortonworks is currently in use by eBay, Bloomberg, Spotify, and Samsung Electronics.




MapR provides a complete Hadoop distribution, though not based on Apache Hadoop itself, taking a notably different approach from Cloudera and Hortonworks. MapR has made Hadoop enterprise-grade by adding its own IP and enhancements that make it faster, more dependable, and more user-friendly. Having altered the file system, MapR offers its distribution solely as a proprietary solution. Additional functionality may be added using Apache’s open source Drill, Spark, and Solr. The company has bundled its solution with supplementary services including training and technical support.


Setting MapR apart are its ease of use, enterprise-grade features, and reliability. The company also claims to be the only distribution offering full data protection with no single points of failure. The proprietary MapR-FS file system is positioned as more production-ready, and its implementation differs from its counterparts because it is written in C rather than Java. MapR is a complete distribution that includes Pig, Sqoop, and Hive with no Java dependencies, independent of the Apache Software Foundation. It’s currently in use by leading companies including Cisco, Boeing, and


Choosing The Right Distribution


How much importance does your company place on technical support, expanded functionality, and system dependability? Are you looking to embrace the flexibility of open source to mitigate the risk of vendor lock-in, or does your company need a solution that can make a rapid impact on business and overall profitability?


Though similar in several ways, each vendor has its own strengths and weaknesses. When choosing the distribution that’s right for your organization, consider the added value offered by each option while balancing cost and risk. Companies will also want to weigh performance, scalability, reliability, data access, and manageability against both their short- and long-term goals.
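One simple way to structure that weighing is a weighted scoring sheet. The sketch below is only an illustration: the weights, the ratings, and the names option_a and option_b are hypothetical placeholders, not assessments of any real vendor, and you would substitute your own criteria and numbers.

```python
from typing import Dict, List, Tuple


def weighted_score(weights: Dict[str, float], ratings: Dict[str, float]) -> float:
    """Return the weighted average of the ratings, normalized by total weight."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def rank_options(
    weights: Dict[str, float], candidates: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Score every candidate and return them sorted from best to worst."""
    scores = {name: weighted_score(weights, ratings) for name, ratings in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: how much each criterion matters to your organization.
    weights = {
        "performance": 3,
        "scalability": 2,
        "reliability": 3,
        "data_access": 1,
        "manageability": 1,
    }
    # Hypothetical 1-5 ratings for two unnamed options; placeholders only.
    candidates = {
        "option_a": {"performance": 4, "scalability": 3, "reliability": 5,
                     "data_access": 3, "manageability": 2},
        "option_b": {"performance": 3, "scalability": 5, "reliability": 3,
                     "data_access": 4, "manageability": 5},
    }
    for name, score in rank_options(weights, candidates):
        print(f"{name}: {score:.2f}")
```

The value of a sheet like this is less the final number than the conversation it forces about which criteria matter most for your short- and long-term goals.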


American Digital: We make big data meaningful.


All of the records and files and facts and figures you’ve amassed over decades offer tremendous value in the form of new revenue and business opportunities. To unlock that value, though, businesses need an advanced and scalable technology solution.

American Digital helps organizations tap into the value of their big data assets, optimizing data and converting it into actionable real-time reports and analytics accessible through one administrative dashboard that’s viewable on any PC or mobile device. We work with all industries – from healthcare organizations that constantly update patient records to online retailers tracking ecommerce orders and social media reviews. Our solutions provide the means to easily collect infinite amounts of data by the minute and optimize it for real-time analysis. Get a complete picture, with essential insight gleaned from existing data at rest and data in motion.

American Digital manages the entire process – from planning through solutions design, implementation, and governance. Shift from a business-intelligence-focused to a big-data-focused organization, supported by a scalable solution able to unite disparate data formats and types. Improve decision-making, quickly identify business trends, and mitigate risk with a richer and more interactive analytics environment.
