Sqoop import:
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
--target-dir /<target directory where data needs to be imported> ;
It imports data from relational database table to hdfs directory .
Once sqoop imported data in respective target directory , we can not execute the import command again for the same target dir unless deleting that target directory where data imported earlier or specify different target directory.
--delete-target-dir should be used along with sqoop command if same target is used so that it will delete the existing one and recreate again.
Sqoop free form SQL import:
The free-form query import is one of the advanced features of Sqoop. Additionally with
--query parameter , the --split-by parameter also is needed to mention for slicing the data imported into multiple parallel tasks.
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
--query <Please write sql query>
[ Example: --query “select * from test_table where \$CONDITIONS” ]
--split-by <primary column of postgressql table>
--target-dir /<target directory where data needs to be imported> ;
Sqoop import to Hbase table:
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
--query <Please write sql query>
[ Example: --query “select * from test_table where \$CONDITIONS” ]
--hbase-table <Hbase atble name>
--column-family <column families name>
--hbase-row-key <row key column in habse table>
-m <number of mappers> ;
Sqoop export:
sqoop export
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
-- export -dir /<target directory where data needs to be exported> ;
It exports data from hdfs directory to relational database table.
Import all tables :
It imports all the tables from the RDBMS database server to the HDFS
Sqoop import-all-tables
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password> ;
list database :
sqoop list- databases
–connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password> ;
It lists the databases present in the relational database.
list tables :
It lists out the tables of a particular database in RDBMS database server .
sqoop list - tables
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password> ;
Codegen :
Every table in the database has one Data Access Object class that contains ‘getter’ and ‘setter’ methods to initialize objects in object-oriented application. Sqoop Codegen command generates the DAO class automatically. It generates DAO class in Java, based on the Table Schema structure.
sqoop codegen
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password> ;
Sqoop incremental import:
It appends new rows continually with increasing row id values when importing source table .Sqoop imports rows where the check column has a value greater than the one specified with –last-value.
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
--target-dir /<target directory where data needs to be imported>
--incremental append
--check-column <updated column name of postgressql table >
--last-value < last value of imported check-column in target> ;
Alternate way supported by Sqoop where updated rows of the source table is being added is called lastmodified mode.
sqoop import
--connect jdbc:postgresql://<ip address:port number>/<database name>
--table < postgressql table name>
--username <username of postgressql user> --password < password>
--target-dir /<target directory where data needs to be imported>
--incremental lastmodified
--check-column <updated column name of postgressql table >
-- merge-key <key column in postgressql table> ;
last modified mode works with merge-key, this mode compares the existing old data in the directory with the new data . when same merge key appears in the old data then sqoop merges that with the new data and new data after merge will be written to the target directory.
Append mode in sqoop incremental import works with the existing target directory(if directory not exists then creates new) and stores the last state value in metastore . But Lastmodified will work with the existing target directory, if merge-key argument is mentioned. If we won't mention then sqoop will throw an error if the target directory already exists.
Please find next blog for running SQOOP import using JAVA.
0 Comments
Please do not enter any spam link in the comment box