ITechShree-Data-Analytics-Technologies

Apache Pig Overview



Apache Pig is an abstraction over MapReduce. It is a tool, or platform, used to analyze large sets of data. It uses a high-level language known as Pig Latin, which provides various operators and lets programmers develop their own functions for reading, writing, analyzing, and processing data.
A pig can eat anything; similarly, Apache Pig can analyze any kind of data, whether structured or unstructured. That is the idea behind the name.

Programmers write scripts in Pig Latin, and the Apache Pig engine accepts these scripts as input and internally converts them into MapReduce jobs.
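To get a feel for what "converted into MapReduce jobs" means, here is a small Python sketch that imitates the three phases a group-and-count job goes through: map, shuffle, and reduce. This is only an in-memory illustration, not how Pig itself is implemented; the function names and the sample records are made up for this example.

```python
from collections import defaultdict

def map_phase(records):
    """Emit one (key, 1) pair per record, like a mapper would."""
    for city, _amount in records:
        yield city, 1

def shuffle(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key, like a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical sample data: (city, order amount)
records = [("Pune", 120), ("Delhi", 80), ("Pune", 45)]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```

With Pig, the programmer only describes the grouping and counting in Pig Latin, and the engine takes care of producing the equivalent of these phases.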


There are two modes in which Apache Pig runs, namely Local mode and MapReduce mode.
Local Mode
In this mode, Apache Pig executes in a single JVM and is used for development, experimenting, and prototyping.
Here, all the files are installed and run from your local host and local file system.
To access the Grunt shell (Pig's command shell) in this mode, we execute:
pig -x local
MapReduce Mode
It is also known as Hadoop mode, and it is the default mode in Apache Pig. Here, we load or process data that resides in the Hadoop Distributed File System (HDFS). In this mode, whenever we execute Pig Latin statements to process the data, a MapReduce job is invoked in the back end to perform the operation on the data in HDFS.

To access the Grunt shell in this mode, we execute either of the following:
pig
pig -x mapreduce

Separately from the execution modes above, Pig Latin can be run in three ways: interactive mode, batch mode, and embedded mode.

Interactive Mode (Grunt shell): In this mode, Apache Pig runs in the Grunt shell. In this shell, we can enter Pig Latin statements and get the output (using the Dump operator).
Batch Mode (Script): Here, we write Pig Latin commands in a script file with the .pig extension, store it at some location, and then run it from the terminal by passing the file's path to the pig command.
Embedded Mode (UDF): We can define our own functions (User Defined Functions) in programming languages such as Java and use them in our scripts.


Important features of Apache Pig
i. Inbuilt API
It provides a large set of operators, such as join, sort, and filter, to perform many common operations.
ii. Ease of programming
Pig avoids the complex code that MapReduce programming requires. Instead, it uses Pig Latin, which is similar to SQL. These queries are internally converted to MapReduce jobs, which makes programming easier.
iii. Optimization
In Apache Pig, tasks optimize their execution automatically. Hence, programmers only need to focus on the semantics of the language rather than on efficiency.
iv. User Defined Functions
Users can develop their own functions (User Defined Functions) and invoke or embed them to read, process, and write data, alongside the existing operators.
v. Data flexibility
Pig handles all kinds of data, whether structured or unstructured, and stores the results in HDFS.
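The optimization feature in the list above can be pictured with a toy example. The Python sketch below treats a script as a list of steps and moves a filter ahead of a sort, so that fewer rows need to be sorted while the final result stays the same. Pig's real optimizer works on a logical plan with many more rules; the plan format and the optimize function here are assumptions made only for illustration.

```python
def optimize(plan):
    """Return a copy of the plan with each FILTER moved ahead of an ORDER
    step that directly precedes it (filtering first means less data to sort)."""
    steps = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(len(steps) - 1):
            if steps[i][0] == "ORDER" and steps[i + 1][0] == "FILTER":
                steps[i], steps[i + 1] = steps[i + 1], steps[i]
                changed = True
    return steps

# Hypothetical plan: (operator, argument) pairs in the order they were written
plan = [("LOAD", "orders"), ("ORDER", "amount"), ("FILTER", "amount > 100")]
optimized = optimize(plan)
for step in optimized:
    print(step)
```

The programmer writes the steps in whatever order reads naturally, and a rewrite like this happens behind the scenes.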







Pig Architecture

Now let's move on to the Pig architecture and walk through the diagram below.




[Figure: Apache Pig architecture (ITechShree: Pig Overview)]




Initially, Pig scripts are handled by the Parser, which checks the syntax of the script, does type checking, and performs other miscellaneous checks. Its output is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are the nodes and the data flows are the edges.
Next, the logical plan (the DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Then the compiler compiles the optimized logical plan into a series of MapReduce jobs.
Finally, the MapReduce jobs are submitted to Hadoop in sorted order, where they are executed to produce the desired results.
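The DAG idea from the steps above can be tried out in a few lines of Python. In this sketch, each operator is a node and each entry lists the operators whose output it consumes (the edges). A topological sort then gives an order in which every operator runs only after its inputs are ready, which is the same kind of ordering needed when dependent jobs are submitted. The operator names are hypothetical, and this is not Pig's internal plan format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical logical plan: key = operator (node),
# value = set of operators it reads from (incoming edges).
plan = {
    "load_orders": set(),
    "load_users": set(),
    "join": {"load_orders", "load_users"},
    "filter": {"join"},
    "store": {"filter"},
}

# static_order() yields each node only after all of its inputs.
order = list(TopologicalSorter(plan).static_order())
print(order)
```

The two load steps have no dependency on each other, so either may come first; join, filter, and store always follow in that sequence.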

That covers the basics of Apache Pig. Next, please go through Pig Latin scripts and commands such as load and store to start writing scripts.

See you in the next blog!


