OLAP
Last updated
Last updated
OLAP stands for Online Analytical Processing.
It allows for interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay).
Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
Measure attributes
measure some value
can be aggregated upon
Ex. the attribute quantity of the sales relation
Dimension attributes
define the dimensions on which measure attributes (or aggregates thereof) are viewed
Ex. the attributes item_name, color, and clothes_size of the sales relation
Example (sales relation):
The above shows the cross-tabulation table (or cross-tab or pivot table) of the sales table by item_name and color.
Values for one of the dimension attributes form the row headers
Values for another dimension attribute form the column headers
Other dimension attributes are listed on top
Values in individual cells are (aggregates of) the values of the dimension attributes that specify the cell
A data cube is a multidimensional generalization of a cross-tab.
Cross-tabs can be used as views on a data cube.
The cube operation computes a union of group by’s on every subset of the specified attributes.
This computes the union of eight different groupings of the sales relation:
{ (item_name, color, size),
(item_name, color),
(item_name, size),
(color, size),
(item_name),
(color),
(size),
( ) }
where ( ) denotes an empty group by list.
For each grouping, the result contains null for attributes not present in the grouping.
Pivoting: changing the dimensions used in a cross-tab
Slicing: creating a cross-tab for fixed values only
Sometimes called dicing, particularly when values for multiple dimensions are fixed
Rollup: moving from finer-granularity data to a coarser granularity
Drill down: The opposite operation - that of moving from coarser granularity data to finer-granularity data
The earliest OLAP systems used multidimensional arrays in memory to store data cubes, and are referred to as multidimensional OLAP (MOLAP) systems
OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems
Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems