Gobblin

AT A GLANCE

Gobblin claims to be a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources (e.g., databases, REST APIs, FTP/SFTP servers, and filers) into Hadoop.

Gobblin handles the routine tasks common to all data ingestion ETL pipelines, including job and task scheduling, task partitioning, error handling, state management, data quality checking, and data publishing. It ingests data from different sources within the same execution framework and manages the metadata of those sources in one place. Combined with other features such as automatic scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, this makes Gobblin an easy-to-use, self-service, and efficient data ingestion framework.
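To make the routine tasks listed above more concrete, here is a minimal Python sketch of one ingestion run: pull records, partition them into tasks, transform and quality-check each one, publish the result, and save state for the next run. It is not Gobblin's API. Every name in it (`JobState`, `partition`, `run_task`, `run_job`, the integer `id` watermark) is a hypothetical stand-in chosen for illustration, and the state handling is assumed to be a simple per-source high watermark.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class JobState:
    """Hypothetical job state: highest record id already ingested, per source."""
    watermarks: Dict[str, int] = field(default_factory=dict)


def partition(records: List[Record], num_tasks: int) -> List[List[Record]]:
    """Split the pulled records round-robin into num_tasks chunks."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def run_task(
    records: List[Record],
    transform: Callable[[Record], Record],
    is_valid: Callable[[Record], bool],
) -> List[Record]:
    """Transform each record and keep only those that pass the quality check."""
    staged = []
    for record in records:
        converted = transform(record)
        if is_valid(converted):
            staged.append(converted)
    return staged


def run_job(
    source_name: str,
    pull: Callable[[], List[Record]],
    transform: Callable[[Record], Record],
    is_valid: Callable[[Record], bool],
    state: JobState,
    published: List[Record],
    num_tasks: int = 2,
) -> int:
    """Ingest records newer than the saved watermark; return how many were published."""
    low = state.watermarks.get(source_name, 0)
    new_records = [r for r in pull() if r["id"] > low]
    if not new_records:
        return 0

    staged: List[Record] = []
    for chunk in partition(new_records, num_tasks):
        staged.extend(run_task(chunk, transform, is_valid))

    # Publish only after every task has finished, then advance the watermark.
    published.extend(staged)
    state.watermarks[source_name] = max(r["id"] for r in new_records)
    return len(staged)


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": ""}, {"id": 3, "name": "c"}]
    state, sink = JobState(), []
    count = run_job(
        "db",
        pull=lambda: rows,
        transform=lambda r: {**r, "name": r["name"].upper()},
        is_valid=lambda r: bool(r["name"]),
        state=state,
        published=sink,
    )
    print(count, sink, state.watermarks)
```

Because the same `run_job` is called regardless of where `pull` gets its records, a different source only needs a different `pull` function, which is the sense in which one execution framework can serve many sources. A second run with the same `state` ingests nothing, since the watermark has already moved past every record.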

Gobblin is still in early development and is hosted on GitHub.


About davidn
