# An Example Data Analysis Task Using Pig

The goal is to find the top 10 most-visited pages in each category.

Given **data**:

![](https://692505235-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M5-0RGlgjkePcvzcwjB%2F-M5-0TDUmbHuXFo5fa3Q%2F-M5-0VM5tBzGkba1T1Xu%2FDataAnalysisPig.PNG?generation=1586990811423509\&alt=media)The **Data Flow Graph** for the task is shown below:

![](https://692505235-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M5-0RGlgjkePcvzcwjB%2F-M5-0TDUmbHuXFo5fa3Q%2F-M5-0VM7gilmRDc-fx5-%2FDataAnalysisPig_DFG.PNG?generation=1586990811424381\&alt=media)

**Pig Latin** code for the task:

```
visits = load ‘/data/visits’ as (user, url, time);
gVisits = group visits by url;
visitCounts  = foreach gVisits generate url, count(visits);

urlInfo = load ‘/data/urlInfo’ as (url, category, pRank);
visitCounts = join visitCounts by url, urlInfo by url;

gCategories = group visitCounts by category;
topUrls = foreach gCategories generate top(visitCounts,10);

store topUrls into ‘/data/topUrls’;
```
