ITechShree-Data-Analytics-Technologies

Important commands in Apache Sqoop


Sqoop import:
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password>
--target-dir /<target directory where data needs to be imported> ;

It imports data from a relational database table into an HDFS directory.

Once Sqoop has imported data into a target directory, the import command cannot be run again for the same target directory unless that directory is deleted first or a different target directory is specified.
--delete-target-dir should be added to the Sqoop command when the same target directory is reused, so that the existing directory is deleted and recreated.
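
For example, a minimal import with hypothetical values (host 192.168.1.10:5432, database testdb, table employees, user pguser and HDFS path /user/hadoop/employees are all assumptions used for illustration):

sqoop import \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees \
  --delete-target-dir \
  --target-dir /user/hadoop/employees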


Sqoop free-form SQL import:
The free-form query import is one of the advanced features of Sqoop. Along with the --query parameter, the --split-by parameter must also be specified so that the imported data can be sliced into multiple parallel tasks. The query must contain the $CONDITIONS token in its WHERE clause, and --query cannot be combined with --table.
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--username <username of postgresql user> --password <password>
--query <SQL query containing $CONDITIONS in its WHERE clause>
[ Example: --query "select * from test_table where \$CONDITIONS" ]
--split-by <primary key column of the postgresql table>
--target-dir /<target directory where data needs to be imported> ;
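
As a sketch, a free-form import with the same hypothetical connection details, querying a table test_table sliced on a numeric key column id (all assumed names):

sqoop import \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --query "select * from test_table where \$CONDITIONS" \
  --split-by id \
  --target-dir /user/hadoop/test_table_query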


Sqoop import to HBase table:
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password>
--hbase-table <HBase table name>
--column-family <column family name>
--hbase-row-key <row key column in the HBase table>
-m <number of mappers> ;
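
As a sketch, importing a hypothetical PostgreSQL table employees into an HBase table emp with column family info and row key column emp_id (all assumed names):

sqoop import \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees \
  --hbase-table emp \
  --column-family info \
  --hbase-row-key emp_id \
  -m 4

If the HBase table does not already exist, the --hbase-create-table option can be added so that Sqoop creates it.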


Sqoop export:
sqoop export
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password>
--export-dir /<HDFS directory from which data needs to be exported> ;

It exports data from an HDFS directory to a relational database table.
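
As a sketch, exporting a hypothetical HDFS directory /user/hadoop/employees into an existing PostgreSQL table employees_copy (assumed names):

sqoop export \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees_copy \
  --export-dir /user/hadoop/employees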


Import all tables:
It imports all the tables from the RDBMS database server into HDFS.
sqoop import-all-tables
--connect jdbc:postgresql://<ip address:port number>/<database name>
--username <username of postgresql user> --password <password> ;
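
As a sketch with hypothetical connection details, using the optional --warehouse-dir flag so that each imported table gets its own subdirectory under one parent HDFS directory:

sqoop import-all-tables \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --warehouse-dir /user/hadoop/testdb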



List databases:
sqoop list-databases
--connect jdbc:postgresql://<ip address:port number>/<database name>
--username <username of postgresql user> --password <password> ;

It lists the databases present on the relational database server.
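
As a sketch with hypothetical connection details (connecting to the default postgres database only to authenticate):

sqoop list-databases \
  --connect jdbc:postgresql://192.168.1.10:5432/postgres \
  --username pguser --password pgpass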


List tables:

It lists the tables of a particular database on the RDBMS database server.
sqoop list-tables
--connect jdbc:postgresql://<ip address:port number>/<database name>
--username <username of postgresql user> --password <password> ;
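
As a sketch with hypothetical connection details:

sqoop list-tables \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass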


Codegen:
Every table in the database has a Data Access Object (DAO) class that contains 'getter' and 'setter' methods to initialize objects in an object-oriented application. The Sqoop codegen command generates this DAO class automatically, in Java, based on the table schema.


sqoop codegen
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password> ;
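
As a sketch with hypothetical values, where the optional --outdir and --bindir flags choose where the generated .java source and the compiled classes are written:

sqoop codegen \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees \
  --outdir /tmp/sqoop_codegen/src \
  --bindir /tmp/sqoop_codegen/bin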



Sqoop incremental import:
It appends new rows of the source table on each import, based on a continually increasing row id (check column). Sqoop imports only the rows where the check column has a value greater than the one specified with --last-value.

sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password>
--target-dir /<target directory where data needs to be imported>
--incremental append
--check-column <incrementing column name of the postgresql table>
--last-value <last value of the check column already imported into the target> ;
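
As a sketch, an append-mode incremental import of a hypothetical table employees whose rows up to id 1000 were already imported:

sqoop import \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees \
  --target-dir /user/hadoop/employees \
  --incremental append \
  --check-column id \
  --last-value 1000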

An alternative mode supported by Sqoop, which also picks up rows that were updated in the source table, is called lastmodified mode.
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table <postgresql table name>
--username <username of postgresql user> --password <password>
--target-dir /<target directory where data needs to be imported>
--incremental lastmodified
--check-column <timestamp column of the postgresql table that is updated on change>
--last-value <timestamp of the previous import>
--merge-key <key column in the postgresql table> ;

The lastmodified mode works together with --merge-key: it compares the existing data in the target directory with the newly imported data, and when the same merge-key value appears in the old data, Sqoop merges the old and new rows and writes the merged result to the target directory.
Append mode in Sqoop incremental import works with the existing target directory (if the directory does not exist, it is created) and stores the last state value in the metastore. Lastmodified mode also works with an existing target directory, but only if the --merge-key argument is specified; if it is not, Sqoop throws an error when the target directory already exists.
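
As a sketch, a lastmodified-mode incremental import of the same hypothetical employees table, assuming a timestamp column updated_at and primary key id, with the previous import taken at 2023-01-01 00:00:00:

sqoop import \
  --connect jdbc:postgresql://192.168.1.10:5432/testdb \
  --username pguser --password pgpass \
  --table employees \
  --target-dir /user/hadoop/employees \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2023-01-01 00:00:00" \
  --merge-key id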

Please see the next blog for running a Sqoop import using Java.
