Posts Tagged ‘Hadoop’

Cloudera, Hortonworks, and MapR: Comparing The Top Three Hadoop Distributions

Posted on: August 14th, 2015 by Daniella Lundsberg

As leading companies look for easier and more efficient ways to analyze and use the massive amounts of disparate data at their disposal, Apache Hadoop rises to the occasion. Hadoop is a powerful software framework for processing large data sets across clusters of computers. This design makes it easy to scale up quickly from a single server to thousands. Because data sets are distributed across commodity servers, companies can get up and running fairly economically, without the need for high-end hardware. What makes Hadoop even more attractive is the fact that it's open source.
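Hadoop's original processing model, MapReduce, works roughly like this: map tasks process separate splits of the input in parallel, the framework shuffles the intermediate key/value pairs so that all values for one key end up together, and reduce tasks combine each group into a final result. The following is a minimal, single-process Python sketch of that model using the classic word-count example. It is not Hadoop's Java API or Hadoop Streaming; the function names and sample input are hypothetical, and on a real cluster each phase would run as distributed tasks across many nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values that share a key into a single result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Hypothetical local driver: on a cluster, each split would be mapped on a
    # different node; here every phase runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample_splits = ["big data on commodity servers", "Big clusters of commodity servers"]
    print(word_count(sample_splits))
```

Because each map call only sees its own split, the map work can be spread across as many machines as there are splits, which is what lets the same job scale from one server to thousands.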


But Apache's standard open source software is far from an out-of-the-box solution: several limitations must be addressed and further development is required to make it enterprise-ready. Hadoop excels at running complex analytics against massive volumes of data, but as a batch-and-load system, it lags in its ability to run near real-time analytic queries. It also lags in streamlined data management and data governance. Luckily, its adaptable, modular architecture makes it relatively easy to add new enhancements and functionality.


As a natural evolution, a number of companies have stepped in to build on Hadoop's framework and make it enterprise-ready. They've adapted its code and bundled it with sleek, user-friendly management tools and installers, related technologies of their own, routine system updates, user training, and technical support. The most recognized of these Hadoop distributions are Cloudera, Hortonworks, and MapR.




Cloudera Inc. offers one of the oldest and most widely known Hadoop distributions and touts the strongest client base and market penetration. Cloudera was founded in 2008 by big data leaders from companies like Google, Facebook, and Oracle. Cloudera offers both its open source distribution, CDH (Cloudera's Distribution Including Apache Hadoop), and its proprietary Cloudera Management Suite. The company builds its business on its open source distribution by offering paid support and services, and it differentiates itself with proprietary value-added components.


Setting Cloudera apart is its proprietary Management Suite, which includes sought-after features like wizard-based deployment, dashboard management, and a resource management module that simplifies capacity and expansion planning. Cloudera's long-term objective, the company says, is to become an enterprise data hub, reducing the need for a separate data warehouse at companies that adopt it. Cloudera is largely open source, with just a few proprietary components. This benefits users looking to minimize the risk of vendor lock-in and preserves the ability to switch to a different Hadoop distribution later with relative ease. Cloudera users include recognized brands like Groupon.




Hortonworks, a newer player founded in 2011, was spun off from Yahoo as an independent company; Yahoo maintains its Hadoop infrastructure in-house. Hortonworks focuses solely on providing an open source platform and is the only commercial vendor to do so: MapR offers only a proprietary distribution, and Cloudera offers both proprietary and open source components. Its primary offering, Hortonworks Data Platform (HDP), is built on Apache Hadoop and is enterprise-ready, complete with training and other support services.


Setting Hortonworks apart is the fact that it is a completely open enterprise data platform that's free to use, which could lead to much faster improvements and updates. Its HDP 2.0 distribution can be downloaded directly from the Hortonworks website and installed easily. Because Hortonworks is open source, it can be integrated more quickly and easily. Hortonworks is currently in use by eBay, Bloomberg, Spotify, and Samsung Electronics.




MapR provides a complete Hadoop distribution, though not one based on Apache Hadoop itself, taking a notably different approach from Cloudera and Hortonworks. MapR has made Hadoop enterprise-grade by adding its own IP and enhancements that make it faster, more dependable, and more user-friendly. Because MapR replaced the file system with its own, it is offered solely as a proprietary solution. Additional functionality can be added using Apache's open source Drill, Spark, and Solr. The company bundles its solution with supplementary services, including training and technical support.


Setting MapR apart is its ease of use, enterprise-grade features, and reliability. The company also claims to be the only distribution offering full data protection with no single points of failure. MapR positions its proprietary MapR-FS file system as more production-ready; its implementation differs slightly from its counterparts' because it is written in C rather than Java. MapR is a complete distribution, independent of the Apache Software Foundation, that includes Pig, Sqoop, and Hive, while its file system has no Java dependencies. It's currently in use by leading companies including Cisco, Boeing, and


Choosing The Right Distribution


How much importance does your company place on technical support, expanded functionality, and system dependability? Are you looking to embrace the flexibility of open source to mitigate the risk of vendor lock-in, or does your company need a solution that can make a rapid impact on business and overall profitability?


Though similar in several ways, each vendor has its own strengths and weaknesses. When choosing the distribution that's right for your organization, consider the added value offered by each option while balancing cost and risk. Companies will also want to weigh performance, scalability, reliability, data access, and manageability against both their short- and long-term goals.


American Digital: We make big data meaningful.


All of the records and files and facts and figures you’ve amassed over decades offer tremendous value in the form of new revenue and business opportunities. To unlock that value, though, businesses need an advanced and scalable technology solution.

American Digital helps organizations tap into the value of their big data assets, optimizing data and converting it into actionable real-time reports and analytics accessible through one administrative dashboard that's viewable on any PC or mobile device. We work with all industries – from healthcare organizations that constantly update patient records to online retailers tracking ecommerce orders and social media reviews. Our solutions make it easy to collect virtually unlimited amounts of data by the minute and optimize it for real-time analysis. Get a complete picture, with essential insight gleaned from existing data at rest and data in motion.

American Digital manages the entire process – from planning through solutions design, implementation, and governance. Shift from a business intelligence focus to a big data focus, supported by a scalable solution able to unite disparate data formats and types. Improve decision-making, quickly identify business trends, and mitigate risk with a richer and more interactive analytics environment.

The Anatomy of a Big Data Solution for Enterprise

Posted on: July 13th, 2015 by Daniella Lundsberg

The entire point of Big Data is to unlock the value that will drive better decision-making, higher operational efficiencies, customer loyalty, behavioral insight, and a host of other business outcomes that can positively impact your organization’s bottom line. Getting there requires an infrastructure that can collect, store, access, analyze and manage all the various forms of data inside your servers, or in the cloud, and allow you to convert it into actionable intelligence. And, by the way, it needs to integrate into your existing environment.

Hardware, Software, Platforms

IT infrastructure has evolved over decades: from mainframes that handled yesteryear's version of high-volume transactions, to online transaction processing (OLTP) databases that became widely accessible in the form of CRM, ERP, and e-commerce systems, to data warehouses that combined all of this transactional data with software for analytical insight, marking the rise of Business Intelligence (BI). Today, the evolution continues with HP ConvergedSystems, which bring together compute, storage, and networking (including HP ConvergedSystem for Big Data optimized for SAP HANA), and the HP Vertica Big Data Analytics Platform.

HP Vertica is a standards-based relational database that supports SQL and JDBC/ODBC and integrates tightly with all popular BI and visualization tools. According to HP, it can handle SQL and Big Data analytic workloads at 30 percent of the cost of traditional data warehouse solutions, runs queries 50 to 1,000 times faster, offers petabyte-scale storage with up to 30 times more data per server, and is open enough to work with any BI/ETL tool and with Hadoop. Organizations can use HP Vertica to manage and analyze massive volumes of data quickly and reliably without the limits or compromises of traditional enterprise data warehouses.

Healthcare industry innovators, business collaboration platform providers, multimedia entertainment companies, and mobile game developers are among the legions of HP Vertica believers. (See for yourself.)

How About Hadoop?

The HP Vertica Analytics Platform and Hadoop are highly complementary systems for Big Data analytics. While HP Vertica is ideal for interactive, blazing-fast analytics, Hadoop is well suited to batch-oriented data processing and low-cost data storage. Used together, they give organizations a powerful combined set of analytics capabilities for extracting significantly more value from massive amounts of structured, unstructured, and semi-structured data.

Avoid Frankenstein’s Monster

Every organization's technology environment and business requirements are unique, giving rise to the need for tailored solutions. Simply bolting parts onto your existing environment could cause the villagers to revolt. Before you embark on your Big Data quest, consult with a partner like American Digital to understand what a successful enterprise implementation involves.


Join us in Chicago on July 23 at our Big Data Symposium. Meet tech execs from HP, SAP, and Hortonworks, along with Big Data guru and Fortune 50 consultant Chris Surdak. Register here.

How Do You Know When It’s Time To Tackle Big Data?

Posted on: July 9th, 2015 by Daniella Lundsberg

Organizations in healthcare, education, and manufacturing, along with Fortune 1000 companies across a wide swath of industries, are candidates for Big Data implementations. Now is the time for information technology and line of business leaders to come together to understand when and how to tackle Big Data.

What exactly is Big Data?

Big Data is characterized by the four “Vs”:

Volume – the vast amount of data that is generated every second (This infographic¹ illustrates where all this data comes from.)

Variety – the different forms of data, both structured, like patient health records, and unstructured, like your fitness tracker's stats

Velocity – the speed at which data is generated and the speed at which it moves around (think skateboarding-cat viral videos)

Veracity – the trustworthiness of data, especially that of unstructured data, from social media, for example

You can, and should, also add a fifth “V”:

Value – The “V” that matters most is the ability to turn all that data into business value.

How can I extract value from my Big Data?

More than ever, companies are trying to understand how Big Data can help their organizations operate more efficiently and better serve their customers. It is vitally important to determine the requirements of each line of business and develop use cases that illustrate real-world scenarios. For instance, an industry use case shows how healthcare and life sciences companies can use the HP Big Data Platform to improve patient analytics across multiple aspects of operations. The value is seen in many areas:

  • Patient outcomes can be improved by using analytics to prevent complications, increase the effectiveness of treatments, and manage predictive care.
  • Organizations can generate all the metrics they need at a moment’s notice to stay compliant with healthcare reform mandates.
  • Deep insights from real-time analyses of clinical data can help inform medical researchers.

(See use cases for Financial Services, Public Sector and other industries here.)

Where do I start?

Before you undertake a Big Data initiative, consider what kind of business value you want to derive and consult with an expert, like American Digital, who can help your organization tap into the value of your Big Data assets. We provide everything you need to profit from Big Data from assessments to strategic planning and use case development. We work with top technology partners like HP, SAP, and Hortonworks to provide custom Big Data & Analytics solutions from design to implementation and governance. As a Platinum HP partner, we are certified to support and implement Big Data solutions including HP’s Vertica, a versatile offering that allows clients access to an analytics platform that is designed to exploit a wide variety of data while enabling them to accelerate business value from simplified reporting and analysis processes.

Find out if it’s time for your organization to tackle Big Data. We’re here to help.

If you're in Chicago on July 23, join us for our free 2015 American Digital Big Data Symposium. Tech execs from HP, SAP, and Hortonworks will be in attendance. Plus, you can meet Big Data guru, Fortune 50 consultant, and rocket scientist Chris Surdak. Register here.

¹ Data Never Sleeps 2.0, DOMO



