Some Hadoop-Related Projects
A few commonly used Hadoop-related projects include:
HBase: A scalable, distributed database that supports structured data storage for large tables
Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying
Pig: A high-level data-flow language and execution framework for parallel computation
Spark: A fast, general-purpose compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
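To make the idea of a data-flow style of computation more concrete, here is a minimal sketch in plain Python. It does not use the Pig or Spark APIs. It only imitates, on a single machine, the filter, group, and aggregate steps that such tools would run in parallel across a cluster. The sample records and the function name total_bytes_per_user are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (user, bytes_transferred)
records = [
    ("alice", 120),
    ("bob", 300),
    ("alice", 80),
    ("carol", 0),
    ("bob", 50),
]


def total_bytes_per_user(rows):
    """Drop empty transfers, then group by user and sum the bytes."""
    totals = defaultdict(int)
    for user, nbytes in rows:
        if nbytes > 0:              # filter step
            totals[user] += nbytes  # group-and-aggregate step
    return dict(totals)


totals = total_bytes_per_user(records)
print(totals)
```

In a real deployment the same three steps would typically be written as a short Pig script or a Spark job, and the framework would take care of splitting the input and distributing the work.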
The next section lists the steps involved in solving a typical large data problem.