Breaking News

Splice Machine

The concept behind it sounds like a dare: take the Hadoop NoSQL data store and use it to create a SQL relational database solution that can scale as easily as Hadoop. After a beta testing period that began in May of 2014, it’s available to integrate with traditional Hadoop jobs and backed …

Read More »

Apache Apex

AT A GLANCE A Hadoop YARN-native big data processing platform, Apex enables real-time stream as well as batch processing for your big data. Apex provides the following benefits: high scalability and performance; fault tolerance and state management; Hadoop-native YARN & HDFS implementation; event processing guarantees; separation …

Read More »

Apache Zeppelin


AT A GLANCE Said to be a collaborative data analytics and visualization tool for Apache Spark and Apache Flink, Apache’s incubating Zeppelin project is a web-based tool for data scientists to collaborate over large-scale data exploration. Zeppelin is independent of the execution framework, and its interpreter allows any language or data-processing …

Read More »

Apache Geode

AT A GLANCE The initial version of this data store was a distributed caching product that allowed both C++ and Java applications to share objects in a scale-out environment at high speeds. GemFire was launched as a result of lessons learned from its predecessor – an object-oriented database and …

Read More »

Amazon Kinesis

AT A GLANCE Amazon’s Kinesis is a cloud-based service for real-time data processing over large, distributed data streams. It claims to be able to continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, …

Read More »

Apache Chukwa

AT A GLANCE Apache Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a toolkit for displaying, monitoring and analyzing results to …

Read More »

Riak


AT A GLANCE Two products: Riak KV is a distributed NoSQL database that is highly available, scalable and easy to operate. It automatically distributes data across the cluster to ensure fast performance and fault tolerance. Riak KV Enterprise includes multi-cluster replication, ensuring low latency and robust business continuity. Riak TS is a …

Read More »

PredictionIO

AT A GLANCE Said to “eliminate the friction between software development, data science and production deployment,” PredictionIO is an open-source machine learning server for developers and data scientists to build and deploy predictive applications. The core part of the tool is an engine deployment platform built on top of Apache …

Read More »

Apache Giraph

AT A GLANCE Apache’s Giraph project is said to be “a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes.” Giraph is in use at companies like Facebook and PayPal to help represent and analyze the billions (or even trillions) of connections …

Read More »

Couchbase Server


AT A GLANCE Couchbase Server is an open source, distributed database engineered for scalability, performance, and availability. It’s a general-purpose database that can be deployed as a document database, key-value store, and/or distributed cache. Its memory-centric, multi-threaded architecture leverages integrated caching, memory-optimized indexes, and memory-to-memory replication to provide consistent …

Read More »