Some Hadoop-Related Projects

A few commonly used Hadoop-related projects include:

HBase: A scalable, distributed database that supports structured data storage for large tables.
Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Pig: A high-level data-flow language and execution framework for parallel computation.
Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
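
To give a feel for the data-flow style that tools such as Pig and Spark express, here is a minimal single-machine sketch of a word count written as a load, split, and count pipeline. It uses only the Python standard library and does not call any Pig or Spark API; the function name `word_count` and the sample input are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Count words using a load -> flat-map -> group-and-count data flow."""
    # Flat-map step: split every line into lowercase words.
    words = chain.from_iterable(line.lower().split() for line in lines)
    # Group-and-count step: Counter stands in for the shuffle/reduce phase
    # that a distributed engine would run across many machines.
    return Counter(words)


# Hypothetical sample input; a real job would read its data from HDFS.
sample_lines = ["Hadoop stores data", "Spark processes Hadoop data"]
counts = word_count(sample_lines)
print(counts.most_common())
```

On a cluster, the same three stages would be distributed across nodes by the framework; the sketch only shows the shape of the computation, not how any of the projects above implements it.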

The next section lists the steps involved in solving a typical large data problem.
