1. import
The import tool imports a single table from an RDBMS into HDFS. Each row of the table becomes one record in HDFS. Records can be stored as text files (one record per line) or in binary form as Avro data files or SequenceFiles.
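The file format is chosen at import time. A minimal sketch (the connection settings and the my_user table are borrowed from the example later in this article): --as-avrodatafile stores the records as binary Avro data files instead of the default delimited text, and --as-sequencefile / --as-textfile select the other two formats in the same way:
./bin/sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql \
--table my_user \
--as-avrodatafile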
2. Command usage
$ ./bin/sqoop help # list the tools that Sqoop supports
The output shows:
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
If we are not sure how to use a particular tool, we can look up its usage with the following command:
sqoop help COMMAND
For example:
./bin/sqoop help list-databases
The output is shown below (excerpt):
Common arguments:
   --connect <jdbc-uri>             Specify JDBC connect string
   --help                           Print usage instructions
   --password <password>            Set authentication password
   --temporary-rootdir <rootdir>    Defines the temporary root directory for the import
   --username <username>            Set authentication username
Example:
./bin/sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password mysql
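The list-tables tool from the command list above works the same way. A minimal sketch, assuming the same MySQL instance and the test database used later in this article:
./bin/sqoop list-tables \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql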
3. Execution steps of the import command
①. Retrieve metadata (the metadata of the table in the relational database)
②. Submit map tasks (there are no reduce tasks), as sketched below
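Because the job is map-only, its parallelism is simply the number of map tasks. A minimal sketch (connection settings and the my_user table are the ones from the example in the next section): --split-by names the column used to partition the rows among the mappers, and --num-mappers sets how many mappers run:
./bin/sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql \
--table my_user \
--split-by id \
--num-mappers 2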
4. A small import example
1. Create the my_user table in the relational database MySQL
mysql> create table my_user(
-> id INT,
-> name VARCHAR(100),
-> PRIMARY KEY (id)
-> );
2. Insert data
mysql> insert into my_user values(1, "zhangsna");
mysql> insert into my_user values(2, "lisi");
mysql> insert into my_user values(3, "wangwu");
mysql> select * from my_user;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsna |
| 2 | lisi |
| 3 | wangwu |
+----+----------+
3 rows in set (0.00 sec)
3. Run the import command
sqoop$ ./bin/sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql \
--table my_user
By default the files are imported under the current user's HDFS home directory (my home directory is /user/hadoop); each part-m-* file is the output of one map task. View the imported files:
./bin/hdfs dfs -ls -R /user/hadoop
drwxr-xr-x - hadoop supergroup 0 2018-08-11 01:11 /user/hadoop/my_user
-rw-r--r-- 1 hadoop supergroup 0 2018-08-11 01:11 /user/hadoop/my_user/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 11 2018-08-11 01:11 /user/hadoop/my_user/part-m-00000
-rw-r--r-- 1 hadoop supergroup 7 2018-08-11 01:11 /user/hadoop/my_user/part-m-00001
-rw-r--r-- 1 hadoop supergroup 9 2018-08-11 01:11 /user/hadoop/my_user/part-m-00002
hadoop$ ./bin/hdfs dfs -cat /user/hadoop/my_user/part*
1,zhangsna
2,lisi
3,wangwu
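Note that re-running the same import fails if the target directory already exists in HDFS. A minimal sketch, assuming the same connection settings: the --delete-target-dir flag removes the directory before the import so the job can be re-run:
sqoop$ ./bin/sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql \
--table my_user \
--delete-target-dir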
5. Other parameters of the import command
1. --target-dir <dir>: specify the target HDFS directory for the import
2. --num-mappers <n>: specify the number of map tasks (both parameters are shown in the sketch below)
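A minimal sketch combining both parameters (connection settings as in the example above; the target path is a hypothetical choice):
sqoop$ ./bin/sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--username root \
--password mysql \
--table my_user \
--target-dir /user/hadoop/sqoop_import/my_user \
--num-mappers 1
With --num-mappers 1 all rows land in a single part-m-00000 file instead of the several part files seen above.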