Some Hadoop-Related Projects

A few commonly used Hadoop-related projects include:

  • HBase: A scalable, distributed database that supports structured data storage for large tables

  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying

  • Pig: A high-level data-flow language and execution framework for parallel computation

  • Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation
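
To give a feel for the data-flow style that tools such as Pig and Spark expose, here is a minimal sketch in plain Python. It does not use the Pig Latin or Spark APIs; the function names (`load`, `keep_long`, `count`, `run_pipeline`) and the sample input are hypothetical and chosen only for illustration. Each stage consumes the output of the previous one, which is the general shape of a load, filter, aggregate pipeline.

```python
from collections import Counter

def load(lines):
    """Stage 1: split each input line into lowercase words."""
    for line in lines:
        yield from line.lower().split()

def keep_long(words, min_len=4):
    """Stage 2: keep only words with at least min_len characters."""
    for word in words:
        if len(word) >= min_len:
            yield word

def count(words):
    """Stage 3: aggregate the remaining words into per-word counts."""
    return Counter(words)

def run_pipeline(lines, min_len=4):
    """Chain the stages: load -> filter -> aggregate."""
    return count(keep_long(load(lines), min_len))

# Hypothetical sample input, standing in for a much larger data set.
sample = ["Hadoop stores big data", "Spark and Pig process big data"]
counts = run_pipeline(sample)
print(counts.most_common())
```

In a real cluster framework the same stages would be distributed across many machines; this sketch runs them in a single process purely to show how the steps compose.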

The next section lists the steps involved in solving a typical large data problem.
