Apache Pig is an abstraction over MapReduce. It is a platform for analyzing large data sets using a high-level language known as Pig Latin. Pig Latin provides various operators, and programmers can also develop their own functions for reading, writing, analyzing, and processing data.
A pig can eat anything. Similarly, Apache Pig can analyze any kind of data, whether it is structured or unstructured, and that is where the name comes from.
Programmers write scripts in Pig Latin. The Apache Pig engine accepts these scripts as input and internally converts them into MapReduce jobs.
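To get a feel for what Pig abstracts away, here is a rough sketch in Python of the map, shuffle, and reduce phases for a simple word count. It is an in-memory imitation only, not the code Pig generates, and all function names are made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, imitating the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together in memory."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["pig eats anything", "Pig Latin runs on Hadoop"]
    print(word_count(sample))
```

With Pig, the programmer describes the same data flow in a few Pig Latin statements and leaves these phases to the engine.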
Apache Pig can run in two execution modes, namely, Local Mode and MapReduce Mode.
Local Mode
In this mode, Apache Pig runs in a single JVM. It is used for development, experimenting, and prototyping.
Here, all the files are installed and run from your local host and local file system.
To access the command line or Grunt shell in this mode, we need to execute
pig -x local
MapReduce Mode
It is also known as Hadoop mode, and it is the default mode in Apache Pig. Here, we load and process data that resides in the Hadoop Distributed File System (HDFS). In this mode, whenever we execute Pig Latin statements to process the data, a MapReduce job is invoked in the back end to perform the operation on the data in HDFS.
To access the command line or Grunt shell in this mode, we need to execute
pig or pig -x mapreduce
Apart from the execution modes above, Pig scripts can be run in three ways, namely, interactive mode, batch mode, and embedded mode.
Interactive Mode (Grunt shell): In this mode, Apache Pig is run in the Grunt shell. In this shell, we can enter Pig Latin statements and get the output (using the Dump operator).
Batch Mode (Script): Here, we run a script file with the .pig extension. We write the Pig Latin commands in a file, store it in some location, and then run that file from the terminal by giving its path.
Embedded Mode (UDF): We can define our own functions (User Defined Functions) in programming languages such as Java and use them in our scripts.
Important features of Apache Pig.
i. Inbuilt API
It provides a large set of operators, such as join, sort, and filter, to perform many kinds of operations.
ii. Ease of programming
Pig does not require the complex code used in MapReduce programming. Instead, Pig uses Pig Latin, which is similar to SQL. These queries are internally converted to MapReduce jobs, which makes programming much easier.
iii. Optimization
In Apache Pig, tasks optimize their execution automatically. Hence, programmers only need to focus on the semantics of the language rather than on efficiency.
iv. User Defined Functions
Users can develop their own functions (User Defined Functions) and invoke or embed them alongside the existing operators to read, process, and write data.
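As an illustration, here is a minimal sketch of what a UDF might look like when written in Python rather than Java. The function name normalize_name and its schema are hypothetical. The pig_util import assumes an environment where Pig's Python UDF support is available, so the sketch falls back to a stand-in decorator when it is not.

```python
try:
    # Assumption: this module is importable when Pig runs the UDF.
    from pig_util import outputSchema
except ImportError:
    # Stand-in so the file can be run and tested without Pig installed.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("name:chararray")
def normalize_name(name):
    """Collapse extra whitespace and title-case a name; pass nulls through."""
    if name is None:
        return None
    return " ".join(name.split()).title()


if __name__ == "__main__":
    print(normalize_name("  apache   pig "))
```

How such a file is registered in a script depends on the Pig version and the scripting engine in use, so check the UDF documentation for your installation.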
v. Data flexibility
Pig can analyze any kind of data, whether it is structured or unstructured.
Pig Architecture
Now let's move on to the Pig architecture and walk through the diagram below.
Initially, Pig scripts are handled by the Parser, which checks the syntax, does type checking, and performs other miscellaneous checks. Its output is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
In the DAG, the logical operators of the script are the nodes and the data flows are the edges.
After that, the logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection and pushdown.
Then the compiler compiles the optimized logical plan into a series of MapReduce jobs.
Finally, the MapReduce jobs are submitted to Hadoop in sorted order and produce the desired results.
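The parse and optimize steps above can be sketched with a toy model in Python. This is not Pig's actual parser or optimizer. The Op class, the one-line "KIND arg" script format, and the single pushdown rule (move a filter ahead of a sort so that fewer rows are sorted) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One logical operator in a toy plan (hypothetical, not Pig's classes)."""
    kind: str       # e.g. "LOAD", "ORDER", "FILTER", "STORE"
    arg: str = ""


def parse(script):
    """Turn 'KIND arg' lines into a linear plan, the simplest kind of DAG."""
    plan = []
    for line in script.strip().splitlines():
        kind, _, arg = line.strip().partition(" ")
        plan.append(Op(kind.upper(), arg))
    return plan


def push_filters_up(plan):
    """Swap each FILTER with an ORDER directly before it, until none remain."""
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(plan)):
            if plan[i].kind == "FILTER" and plan[i - 1].kind == "ORDER":
                plan[i - 1], plan[i] = plan[i], plan[i - 1]
                changed = True
    return plan


if __name__ == "__main__":
    script = """
    LOAD students.txt
    ORDER age
    FILTER age > 18
    STORE out
    """
    optimized = push_filters_up(parse(script))
    print([op.kind for op in optimized])
```

In this toy plan the filter ends up before the sort, which is the general idea behind a pushdown: do the cheap row elimination first so the expensive step touches less data.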
That covers the basics of Apache Pig. Next, please go through Pig Latin scripts and commands such as store and load so that you can start writing scripts of your own.
See you in the next blog!