
Analytic Frameworks

Apache Apex

AT A GLANCE A Hadoop YARN-native big data processing platform, Apex enables real-time stream as well as batch processing for your big data. Apex provides the following benefits: High scalability and performance; Fault tolerance and state management; Hadoop-native YARN & HDFS implementation; Event processing guarantees; Separation …


Apache Zeppelin


AT A GLANCE Described as a collaborative data analytics and visualization tool for Apache Spark and Apache Flink, Apache’s incubating Zeppelin project is a web-based tool for data scientists to collaborate over large-scale data exploration. Zeppelin is independent of the execution framework, and its interpreter allows any language or data-processing …


Apache Kudu


AT A GLANCE A new addition to the open source Apache Hadoop ecosystem, Apache Kudu (incubating) completes Hadoop’s storage layer to enable fast analytics on fast data. Currently, a limited-functionality version of Kudu is available as a Beta. Like most modern analytic data stores, Kudu internally organizes its data by …


Pivotal HAWQ


AT A GLANCE Pivotal’s HAWQ is a closed-source product offered as part of their Pivotal HD stack, their proprietary distribution of Hadoop. Pivotal claims that HAWQ is the “world’s fastest SQL engine on Hadoop” and that it has been in development for 10 years. PROS Full SQL syntax support; Interoperability with Hive …


Apache Ignite


AT A GLANCE The new (late 2014) web site says that Apache’s Ignite in-memory fabric is a “high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.” The goal of an In-Memory …


Actian (ParAccel)


AT A GLANCE Actian boldly claims that its SQL-on-Hadoop product, formerly known as ParAccel, is the “#1 SQL in Hadoop Analytics Platform”. Actian positions its new platform as an industrialized platform that is priced disruptively and delivers rich, easy-to-use functionality. In particular that the new platform is not …


Twitter Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations. So, while a word-counting aggregation in pure Scala might look like this: def wordCount(source: Iterable[String], store: MutableMap[String, Long]) = source.flatMap { sentence => toWords(sentence).map(_ -> 1L) }.foreach { case (k, v) …


Apache Drill


AT A GLANCE Apache Drill is an open source version of Google’s Dremel system, which is available as an infrastructure service called Google BigQuery. In recent years, open source systems have emerged to address the need for scalable batch processing (as in Hadoop) and stream processing (Storm, Apache S4). Hadoop, …


Kangaroo

Kangaroo is an open-source project from Conductor for writing MapReduce jobs that consume data from Kafka. The introductory post explains Conductor’s use case: loading data from Kafka to HBase by way of a MapReduce job using the HFileOutputFormat. Unlike other solutions, which are limited to a single InputSplit per Kafka partition, Kangaroo …


Pydoop

Pydoop is a Python MapReduce and HDFS API for Hadoop, built upon the C++ Pipes and the C libhdfs APIs, that lets you write full-fledged MapReduce applications with HDFS access. Pydoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython: being a CPython package, …
