CS-GY 9223-D: Programming for Big Data
main
main
  • Introduction
  • Big Data
  • Hadoop
    • An Introduction to Hadoop
    • The Main Components of Hadoop
    • Some Hadoop Related Projects
    • A Typical Large Data Problem
    • The Google File System vs HDFS
    • Data Types in Hadoop
    • Programming in Hadoop
    • Common Examples of MapReduce Jobs
    • Advantages and Disadvantages of MapReduce
  • Pig
    • An Introduction to Pig
    • Components of Pig
    • An Example Data Analysis Task Using Pig
Powered by GitBook
On this page

Was this helpful?

  1. Hadoop

Some Hadoop Related Projects

A few commonly-used Hadoop related projects include:

  • HBase: A scalable, distributed database that supports structured data storage for large tables

  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying

  • Pig: A high-level data-flow language and execution framework for parallel computation

  • Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation

The next section lists the steps involved in solving a typical large data problem.

PreviousThe Main Components of HadoopNextA Typical Large Data Problem

Last updated 4 years ago

Was this helpful?