Apache Ambari

Apache Ambari aims to be an intuitive, easy-to-use Hadoop management web UI backed by RESTful APIs. Ambari was donated by the Hortonworks team to the ASF. It serves as a management interface for Hadoop and other common applications in the Hadoop ecosystem. Ambari is under heavy development, and it …

Apache Falcon

Apache Falcon is a data management framework for simplifying data lifecycle management and processing pipelines on Hadoop. It enables users to configure, manage, and orchestrate data motion, pipeline processing, disaster recovery, and data retention workflows. Instead of hard-coding complex data lifecycle capabilities, Hadoop applications can rely on Falcon for these …

Apache Oozie

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs …

LinkedIn Norbert

Norbert is a library that provides easy cluster management and workload distribution. With Norbert, you can quickly distribute a simple client/server architecture to create a highly scalable architecture capable of handling heavy traffic. Implemented in Scala, Norbert wraps ZooKeeper and Netty and uses Protocol Buffers for transport to make it easy …

Apache Curator

At a glance: Curator is a set of Java libraries that make using Apache ZooKeeper much easier. New users of ZooKeeper are surprised to learn that a significant amount of connection management must be done manually. For example, when the ZooKeeper client connects to the ensemble it must negotiate a …

Google Protocol Buffers

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from …

Apache Avro

Apache Avro is a framework for data modeling, serialization, and Remote Procedure Calls (RPC). Avro data is described by a schema, and one interesting feature is that the schema is stored in the same file as the data it describes, so files are self-describing. Avro does not require code generation. …

Apache ZooKeeper

At a glance: ZooKeeper is a coordination service that aims to provide the tools needed to build correct distributed applications. Several projects in the Hadoop ecosystem, such as Apache HBase, Storm, and Kafka, use ZooKeeper to coordinate the cluster and provide highly available distributed services. Developed at Yahoo, ZooKeeper is an application library with two principal …

Apache Thrift

At a glance: The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages. It’s …

Cloudera Morphlines

At a glance: Cloudera’s Morphlines is an open source framework that is said to reduce the time and skills necessary to integrate, build, and change Hadoop processing applications that extract, transform, and load (ETL) data into Apache Solr, Apache HBase, HDFS, enterprise data warehouses, or analytic online dashboards. Cloudera Morphlines is an …
