Apache Kudu

AT A GLANCE A new addition to the open source Apache Hadoop ecosystem, Apache Kudu (incubating) completes Hadoop’s storage layer to enable fast analytics on fast data. Currently, a limited-functionality version of Kudu is available as a Beta. Like most modern analytic data stores, Kudu internally organizes its data by …

Oryx 2

AT A GLANCE A realization of Nathan Marz’s lambda architecture that is built on Apache’s Spark and Kafka projects, Oryx 2 is a “framework for building that includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering”. It consists of three tiers, each of which builds on the one …

Apache Aurora

AT A GLANCE Apache Mesos is a cluster manager that provides resource isolation and sharing across distributed applications. You might think of it as a “kernel” for your data center. Aurora, a new project as of late 2014, is a service scheduler that runs on top of Mesos, enabling you to …

Memcached

AT A GLANCE It’s entirely likely you will eventually encounter a situation where you need very fast access to a large amount of data for a short period of time. For example, let’s say you want to send an email to your customers and prospects letting them know about new …

Fluentd

AT A GLANCE Conceived by Sadayuki “Sada” Furuhashi in 2011, Fluentd is an open source data collector that claims to unify data collection and consumption for better use and understanding of data. Fluentd tries to structure data as JSON as much as possible: this allows it to unify …

Gobblin

AT A GLANCE Gobblin claims to be a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources, e.g., databases, REST APIs, FTP/SFTP servers, filers, etc., into Hadoop. Gobblin handles the common routine tasks required for all data ingestion ETLs, including …

CockroachDB

AT A GLANCE At technology giants like Google, Amazon, and Facebook, engineers have pioneered techniques that help make their websites just as hard to kill. If a server goes on the fritz, a series of servers shut down, or even an entire data center goes dark, these sites are supposed …

Aerospike

AT A GLANCE Claiming to be “ideal for real-time big data or context driven applications that must sense and respond right now”, Aerospike is an open-source, in-memory NoSQL database and key-value datastore. The company claims that it “operates at in-memory speed and global scale with enterprise-grade reliability.” ARCHITECTURE A typical cluster has …

SAP HANA

AT A GLANCE SAP didn’t invent in-memory data stores, but over the last five years it has done lots to focus attention on the benefits and possibilities of the technology given advances in compute capacity and RAM affordability. Its HANA data store started out as an in-memory database, but has …

MemSQL

AT A GLANCE Traditional relational databases (Oracle, DB2, MySQL) and most NoSQL data stores keep their data on disk. Because RAM now costs significantly less than it used to, and with the advent of 64-bit computing, one can equip a standard, off-the-shelf server with hundreds …
