CentOS 7: Setting Up Sqoop 1.4.7 (Hadoop 3.0.3 + HBase + Hive + ZooKeeper)

By 你猜 · 2021-03-31 17:20:33

1. Set up a Hadoop cluster on CentOS 7

Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94549964

2. Set up HBase on CentOS 7

Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94616635

3. Set up Hive on CentOS 7

Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94576029

4. Set up Sqoop on CentOS 7

Step 1: Download Sqoop. Mirror address: http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/

Step 2: From the official site I chose version 1.4.7.

Step 3: Upload the archive to the target directory /usr/local/hadoop with an SFTP/FTP tool (WinSCP, FileZilla, etc.).

Step 4: Extract the archive into the current directory and rename the resulting folder:

tar zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz   # extract
mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7    # rename the folder

Step 5: Add the Sqoop environment variables to the global profile. Edit /etc/profile (vi /etc/profile) and append:

export SQOOP_HOME=/usr/local/hadoop/sqoop-1.4.7 
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin

Run source /etc/profile so the new environment variables take effect in the current shell.
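A quick sanity check that the two lines above actually landed on the PATH (a self-contained sketch: it sets the same values inline rather than relying on /etc/profile, so it can be dry-run in any shell):

```shell
# Set the two variables as in /etc/profile, then confirm sqoop's bin
# directory is present on the PATH.
export SQOOP_HOME=/usr/local/hadoop/sqoop-1.4.7
export PATH="$PATH:$SQOOP_HOME/bin"
case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) echo "PATH ok" ;;
  *)                     echo "PATH missing sqoop" ;;
esac
```

On the real host, running sqoop from any directory afterwards is the final confirmation.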

Step 6: Change into the Sqoop configuration directory (/usr/local/hadoop/sqoop-1.4.7/conf):

cd /usr/local/hadoop/sqoop-1.4.7/conf

Make a copy of sqoop-env-template.sh named sqoop-env.sh:

cp sqoop-env-template.sh sqoop-env.sh

Then edit sqoop-env.sh; I have added comments to the relevant lines:

# Licensed to the Apache Software Foundation (ASF) under one or more 
# contributor license agreements.  See the NOTICE file distributed with 
# this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 
# (the "License"); you may not use this file except in compliance with 
# the License.  You may obtain a copy of the License at 
# 
#     http://www.apache.org/licenses/LICENSE-2.0 
# 
# Unless required by applicable law or agreed to in writing, software 
# distributed under the License is distributed on an "AS IS" BASIS, 
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and 
# limitations under the License. 
 
# included in all the hadoop scripts with source command 
# should not be executable directly 
# also should not be passed any arguments, since we need original $* 
 
# Set Hadoop-specific environment variables here. 
 
#Set path to where bin/hadoop is available 
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-3.0.3  # Hadoop installation directory 
 
#Set path to where hadoop-*-core.jar is available 
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-3.0.3  # Hadoop installation directory 
 
#set the path to where bin/hbase is available 
#export HBASE_HOME= 
 
#Set the path to where bin/hive is available 
export HIVE_HOME=/usr/local/hadoop/hive-2.3.5  # Hive installation directory 
 
#Set the path for where zookeper config dir is 
#export ZOOCFGDIR=
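Since this cluster also runs HBase and ZooKeeper, the two entries left commented out above can be filled in the same way if Sqoop needs them. The paths below are assumptions, not values from this series; substitute your actual install locations:

```shell
#set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/hadoop/hbase          # assumed path; adjust to yours
#Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/hadoop/zookeeper/conf  # assumed path; adjust to yours
```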

Step 7: Copy the MySQL 8 JDBC driver jar into the lib folder of the Sqoop installation (/usr/local/hadoop/sqoop-1.4.7/lib).
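For reference, the copy is a single command once the driver has been downloaded; the jar file name below is an assumption, since it depends on the Connector/J release you fetched:

```shell
# Copy the MySQL Connector/J jar (name varies by version) into Sqoop's lib dir.
cp mysql-connector-java-8.0.16.jar /usr/local/hadoop/sqoop-1.4.7/lib/
```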

Tip: Sqoop is just a tool. Once installed, commands that do not involve Hive or Hadoop can be run without starting either of them; sqoop help, for example, works on its own. Hive-related commands, however, require Hive and Hadoop to be running first.

A short Sqoop walkthrough follows:

1. The help command: sqoop help

[root@master /]# sqoop help 
2019-07-09 16:07:22,624 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 
usage: sqoop COMMAND [ARGS] 
 
Available commands: 
  codegen            Generate code to interact with database records 
  create-hive-table  Import a table definition into Hive 
  eval               Evaluate a SQL statement and display the results 
  export             Export an HDFS directory to a database table 
  help               List available commands 
  import             Import a table from a database to HDFS 
  import-all-tables  Import tables from a database to HDFS 
  import-mainframe   Import datasets from a mainframe server to HDFS 
  job                Work with saved jobs 
  list-databases     List available databases on a server 
  list-tables        List available tables in a database 
  merge              Merge results of incremental imports 
  metastore          Run a standalone Sqoop metastore 
  version            Display version information 
 
See 'sqoop help COMMAND' for information on a specific command.

2. List the tables in a given MySQL 8 database

Command: sqoop list-tables --username admini --password '123456' --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8

[root@master /]# sqoop   list-tables  --username  admini --password  '123456'     --connect     jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8 
2019-07-09 16:09:14,346 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 
2019-07-09 16:09:14,424 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
2019-07-09 16:09:14,539 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary. 
user_test
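The WARN line in the output above deserves attention: a password given on the command line ends up in shell history and in ps output. Besides -P (interactive prompt), Sqoop 1.4.x also accepts --password-file. A sketch of that variant, with the file location chosen arbitrarily:

```shell
# Write the password with no trailing newline (printf, not echo) and
# restrict the file's permissions to the owner.
printf '123456' > "$HOME/.sqoop.pwd"
chmod 400 "$HOME/.sqoop.pwd"
```

Then replace --password '123456' in the command above with --password-file "file://$HOME/.sqoop.pwd".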

3. Import a table from a MySQL 8 database into Hive

Command: sqoop import --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8 --username admini --password 123456 --table user_test -m 1 --hive-import --create-hive-table --hive-table emp_mysql

[root@master bin]# sqoop import  --connect  jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8  --username admini  --password 123456  --table user_test  -m 1  --hive-import  --create-hive-table   --hive-table  emp_mysql 
2019-07-09 15:02:36,323 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7 
2019-07-09 15:02:36,398 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
2019-07-09 15:02:36,399 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 
2019-07-09 15:02:36,399 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 
2019-07-09 15:02:36,522 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
2019-07-09 15:02:36,531 INFO tool.CodeGenTool: Beginning code generation 
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary. 
2019-07-09 15:02:37,124 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1 
2019-07-09 15:02:37,165 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1 
2019-07-09 15:02:37,175 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop/hadoop-3.0.3 
Note: /tmp/sqoop-root/compile/68589ffcf4797f9934b62d0918a31c53/user_test.java uses or overrides a deprecated API. 
Note: Recompile with -Xlint:deprecation for details. 
2019-07-09 15:02:40,008 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/68589ffcf4797f9934b62d0918a31c53/user_test.jar 
2019-07-09 15:02:40,023 WARN manager.MySQLManager: It looks like you are importing from mysql. 
2019-07-09 15:02:40,023 WARN manager.MySQLManager: This transfer can be faster! Use the --direct 
2019-07-09 15:02:40,023 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path. 
2019-07-09 15:02:40,023 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql) 
2019-07-09 15:02:40,032 INFO mapreduce.ImportJobBase: Beginning import of user_test 
2019-07-09 15:02:40,033 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 
2019-07-09 15:02:40,174 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 
2019-07-09 15:02:41,056 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 
2019-07-09 15:02:41,192 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.60.204:8032 
2019-07-09 15:02:42,312 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1562653081455_0001 
2019-07-09 15:02:50,559 INFO db.DBInputFormat: Using read commited transaction isolation 
2019-07-09 15:02:51,051 INFO mapreduce.JobSubmitter: number of splits:1 
2019-07-09 15:02:51,128 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled 
2019-07-09 15:02:51,639 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562653081455_0001 
2019-07-09 15:02:51,645 INFO mapreduce.JobSubmitter: Executing with tokens: [] 
2019-07-09 15:02:51,905 INFO conf.Configuration: resource-types.xml not found 
2019-07-09 15:02:51,905 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'. 
2019-07-09 15:02:52,608 INFO impl.YarnClientImpl: Submitted application application_1562653081455_0001 
2019-07-09 15:02:52,682 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1562653081455_0001/ 
2019-07-09 15:02:52,682 INFO mapreduce.Job: Running job: job_1562653081455_0001 
2019-07-09 15:03:03,997 INFO mapreduce.Job: Job job_1562653081455_0001 running in uber mode : false 
2019-07-09 15:03:04,003 INFO mapreduce.Job:  map 0% reduce 0% 
2019-07-09 15:03:12,250 INFO mapreduce.Job:  map 100% reduce 0% 
2019-07-09 15:03:13,276 INFO mapreduce.Job: Job job_1562653081455_0001 completed successfully 
2019-07-09 15:03:13,465 INFO mapreduce.Job: Counters: 32 
	File System Counters 
		FILE: Number of bytes read=0 
		FILE: Number of bytes written=211175 
		FILE: Number of read operations=0 
		FILE: Number of large read operations=0 
		FILE: Number of write operations=0 
		HDFS: Number of bytes read=87 
		HDFS: Number of bytes written=17 
		HDFS: Number of read operations=6 
		HDFS: Number of large read operations=0 
		HDFS: Number of write operations=2 
	Job Counters  
		Launched map tasks=1 
		Other local map tasks=1 
		Total time spent by all maps in occupied slots (ms)=4996 
		Total time spent by all reduces in occupied slots (ms)=0 
		Total time spent by all map tasks (ms)=4996 
		Total vcore-milliseconds taken by all map tasks=4996 
		Total megabyte-milliseconds taken by all map tasks=5115904 
	Map-Reduce Framework 
		Map input records=2 
		Map output records=2 
		Input split bytes=87 
		Spilled Records=0 
		Failed Shuffles=0 
		Merged Map outputs=0 
		GC time elapsed (ms)=533 
		CPU time spent (ms)=2520 
		Physical memory (bytes) snapshot=225423360 
		Virtual memory (bytes) snapshot=2804236288 
		Total committed heap usage (bytes)=159383552 
		Peak Map Physical memory (bytes)=225423360 
		Peak Map Virtual memory (bytes)=2804236288 
	File Input Format Counters  
		Bytes Read=0 
	File Output Format Counters  
		Bytes Written=17 
2019-07-09 15:03:13,488 INFO mapreduce.ImportJobBase: Transferred 17 bytes in 32.414 seconds (0.5245 bytes/sec) 
2019-07-09 15:03:13,494 INFO mapreduce.ImportJobBase: Retrieved 2 records. 
2019-07-09 15:03:13,494 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table user_test 
2019-07-09 15:03:13,552 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1 
2019-07-09 15:03:13,570 INFO hive.HiveImport: Loading uploaded data into Hive 
2019-07-09 15:03:13,582 INFO conf.HiveConf: Found configuration file null 
2019-07-09 15:03:20,586 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings. 
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-3.0.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
2019-07-09 15:03:20,617 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] 
2019-07-09 15:03:26,335 INFO hive.HiveImport:  
2019-07-09 15:03:26,335 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hadoop/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true 
2019-07-09 15:03:36,139 INFO hive.HiveImport: OK 
2019-07-09 15:03:36,147 INFO hive.HiveImport: Time taken: 8.225 seconds 
2019-07-09 15:03:36,740 INFO hive.HiveImport: Loading data to table default.emp_mysql 
2019-07-09 15:03:38,031 INFO hive.HiveImport: OK 
2019-07-09 15:03:38,031 INFO hive.HiveImport: Time taken: 1.876 seconds 
2019-07-09 15:03:38,610 INFO hive.HiveImport: Hive import complete. 
2019-07-09 15:03:38,629 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory. 
[root@master bin]# pwd 
/usr/local/hadoop/hive-2.3.5/bin 
[root@master bin]# hive shell 
SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-3.0.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] 
 
Logging initialized using configuration in jar:file:/usr/local/hadoop/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true 
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. 
hive> use databases; 
FAILED: SemanticException [Error 10072]: Database does not exist: databases 
hive> show databases; 
OK 
db_hive_edu 
default 
user_test 
Time taken: 0.294 seconds, Fetched: 3 row(s) 
hive> use default; 
OK 
Time taken: 0.078 seconds 
hive> show tables; 
OK 
emp_mysql 
pokes 
Time taken: 0.083 seconds, Fetched: 2 row(s) 
hive> select * from emp_mysql; 
OK 
28	1	zzg 
28	2	wz 
Time taken: 1.579 seconds, Fetched: 2 row(s)
