Apache Zeppelin

AT A GLANCE Said to be a collaborative data analytics and visualization tool for Apache Spark and Apache Flink, Apache’s incubating Zeppelin project is a web-based tool for data scientists to collaborate over large-scale data exploration. Zeppelin is independent of the execution framework, and its interpreter allows any language or data-processing …

Apache Geode

AT A GLANCE The initial version of this data store was a distributed caching product that allowed both C++ and Java applications to share objects in a scale-out environment at high speeds. GemFire was launched as a result of lessons learned from its predecessor – an object-oriented database and …

Amazon Kinesis

AT A GLANCE Amazon’s Kinesis is a cloud-based service for real-time data processing over large, distributed data streams. It claims to be able to continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, …

Apache Chukwa

AT A GLANCE Apache Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a toolkit for displaying, monitoring and analyzing results to …

Riak

AT A GLANCE Two products: Riak KV is a distributed NoSQL database that is highly available, scalable and easy to operate. It automatically distributes data across the cluster to ensure fast performance and fault tolerance. Riak KV Enterprise includes multi-cluster replication, ensuring low latency and robust business continuity. Riak TS is a …

PredictionIO

AT A GLANCE Said to “eliminate the friction between software development, data science and production deployment,” PredictionIO is an open-source machine learning server for developers and data scientists to build and deploy predictive applications. The core part of the tool is an engine deployment platform built on top of Apache …

Apache Giraph

AT A GLANCE Apache’s Giraph project is said to be “a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes.” Giraph is in use at companies like Facebook and PayPal to help represent and analyze the billions (or even trillions) of connections …

Couchbase

AT A GLANCE Couchbase Server is an open source, distributed database engineered for scalability, performance, and availability. It’s a general purpose database that can be deployed as a document database, key-value store, and/or distributed cache. Its memory-centric, multi-threaded architecture leverages integrated caching, memory-optimized indexes, and memory-to-memory replication to provide consistent …

Apache Kudu

AT A GLANCE A new addition to the open source Apache Hadoop ecosystem, Apache Kudu (incubating) completes Hadoop’s storage layer to enable fast analytics on fast data. Currently, a limited-functionality version of Kudu is available as a Beta. Like most modern analytic data stores, Kudu internally organizes its data by …

Oryx 2

AT A GLANCE A realization of Nathan Marz’s lambda architecture that is built on Apache’s Spark and Kafka projects, Oryx 2 is a “framework for building that includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering”. It consists of three tiers, each of which builds on the one …
