Monthly Archives: August 2016

Apache Zeppelin

AT A GLANCE Said to be a collaborative data analytics and visualization tool for Apache Spark and Apache Flink, Apache’s incubating Zeppelin project is a web-based tool for data scientists to collaborate over large-scale data exploration. Zeppelin is independent of the execution framework, and its interpreter allows any language or data-processing …

Apache Geode

AT A GLANCE The initial version of this data store was a distributed caching product that allowed both C++ and Java applications to share objects in a scale-out environment at high speeds. GemFire was launched as a result of lessons learned from its predecessor – an object-oriented database and …

Amazon Kinesis

AT A GLANCE Amazon’s Kinesis is a cloud-based service for real-time data processing over large, distributed data streams. It claims to be able to continuously capture and store terabytes of data per hour from hundreds of thousands of sources such as website clickstreams, financial transactions, social media feeds, IT logs, …

Apache Chukwa

AT A GLANCE Apache Chukwa is an open source data collection system for monitoring large distributed systems. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a toolkit for displaying, monitoring and analyzing results to …

Riak

AT A GLANCE Two products: Riak KV is a distributed NoSQL database that is highly available, scalable and easy to operate. It automatically distributes data across the cluster to ensure fast performance and fault tolerance. Riak KV Enterprise includes multi-cluster replication, ensuring low latency and robust business continuity. Riak TS is a …

PredictionIO

AT A GLANCE Said to “eliminate the friction between software development, data science and production deployment,” PredictionIO is an open-source machine learning server for developers and data scientists to build and deploy predictive applications. The core part of the tool is an engine deployment platform built on top of Apache …

Apache Giraph

AT A GLANCE Apache’s Giraph project is said to be “a scalable, fault-tolerant implementation of graph-processing algorithms in Apache Hadoop clusters of up to thousands of computing nodes.” Giraph is in use at companies like Facebook and PayPal to help represent and analyze the billions (or even trillions) of connections …
