CentOS 7: Setting Up Sqoop 1.4.7 (Hadoop 3.0.3 + HBase + Hive + ZooKeeper)
1. Set up a Hadoop cluster on CentOS 7
Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94549964
2. Set up HBase on CentOS 7
Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94616635
3. Set up Hive on CentOS 7
Reference: https://blog.csdn.net/zhouzhiwengang/article/details/94576029
4. Set up Sqoop on CentOS 7
Step 1: Download Sqoop from the official mirror: http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/
Step 2: Version 1.4.7 is used here, i.e. the sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz binary distribution.
Step 3: Upload the archive to the target directory /usr/local/hadoop with an FTP/SFTP tool (WinSCP, FileZilla, etc.), or from the command line as sketched below.
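A plain scp does the same job as the GUI tools. The master IP below is taken from the connection strings later in this article; substitute your own host:
scp sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz root@192.168.60.204:/usr/local/hadoop/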
Step 4: Extract the archive in the current directory and rename the resulting folder:
tar zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz # extract
mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7 # rename the folder
Step 5: Add a SQOOP_HOME entry to the global environment by editing /etc/profile (vi /etc/profile):
export SQOOP_HOME=/usr/local/hadoop/sqoop-1.4.7
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin
Run source /etc/profile to make the new environment variables take effect.
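A quick sanity check that the variables landed (expected values follow from the settings above):
echo $SQOOP_HOME # expect /usr/local/hadoop/sqoop-1.4.7
which sqoop # expect /usr/local/hadoop/sqoop-1.4.7/bin/sqoop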
Step 6: Change into the conf directory of the Sqoop installation:
cd /usr/local/hadoop/sqoop-1.4.7/conf
Copy sqoop-env-template.sh to sqoop-env.sh:
cp sqoop-env-template.sh sqoop-env.sh
Then edit sqoop-env.sh. The relevant portion of the file is shown below, with comments added to the settings:
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-3.0.3 # Hadoop installation directory
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-3.0.3 # Hadoop installation directory
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/hadoop/hive-2.3.5 # Hive installation directory
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
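This article's stack also includes HBase and ZooKeeper; if Sqoop should talk to them, uncomment HBASE_HOME and ZOOCFGDIR and point them at your installations. The paths below are assumptions based on the /usr/local/hadoop layout used throughout, not values from this setup; adjust them to your environment:
export HBASE_HOME=/usr/local/hadoop/hbase # assumed HBase installation directory
export ZOOCFGDIR=/usr/local/hadoop/zookeeper/conf # assumed ZooKeeper config directory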
Step 7: Copy the MySQL 8 JDBC driver JAR into the lib folder of the Sqoop installation (/usr/local/hadoop/sqoop-1.4.7/lib), for example as below.
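Assuming the driver JAR was uploaded to /usr/local/hadoop (the Connector/J file name is illustrative; use the version you downloaded):
cp /usr/local/hadoop/mysql-connector-java-8.0.16.jar /usr/local/hadoop/sqoop-1.4.7/lib/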
Note: Sqoop is just a tool. Once it is installed, commands that do not involve Hive or Hadoop can be run without starting Hive or Hadoop; sqoop help, for example, works right away. Hive-related commands, on the other hand, require Hive and Hadoop to be started first.
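Another check that needs no running services is the version command (listed in the help output below); it only prints build information:
sqoop version # should report Sqoop 1.4.7, matching the "Running Sqoop version" lines in the logs below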
A short Sqoop tutorial follows.
1. The help command: sqoop help
[root@master /]# sqoop help
2019-07-09 16:07:22,624 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
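For example, to see every option of the import tool used in section 3 below:
sqoop help import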
2. List the tables in a given MySQL 8 database
Command: sqoop list-tables --username admini --password '123456' --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8
[root@master /]# sqoop list-tables --username admini --password '123456' --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8
2019-07-09 16:09:14,346 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2019-07-09 16:09:14,424 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2019-07-09 16:09:14,539 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
user_test
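The list-databases command from the help listing works the same way, and the -P flag suggested by the warning above prompts for the password instead of exposing it on the command line:
sqoop list-databases -P --username admini --connect jdbc:mysql://192.168.60.206:3306/?characterEncoding=UTF-8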
3. Import a table from a MySQL 8 database into Hive
Command: sqoop import --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8 --username admini --password 123456 --table user_test -m 1 --hive-import --create-hive-table --hive-table emp_mysql
Here -m 1 runs the job with a single map task, --hive-import loads the imported data into Hive, and --create-hive-table creates the target Hive table emp_mysql (the job fails if that table already exists).
[root@master bin]# sqoop import --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8 --username admini --password 123456 --table user_test -m 1 --hive-import --create-hive-table --hive-table emp_mysql
2019-07-09 15:02:36,323 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2019-07-09 15:02:36,398 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2019-07-09 15:02:36,399 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
2019-07-09 15:02:36,399 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
2019-07-09 15:02:36,522 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2019-07-09 15:02:36,531 INFO tool.CodeGenTool: Beginning code generation
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2019-07-09 15:02:37,124 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1
2019-07-09 15:02:37,165 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1
2019-07-09 15:02:37,175 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop/hadoop-3.0.3
Note: /tmp/sqoop-root/compile/68589ffcf4797f9934b62d0918a31c53/user_test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-07-09 15:02:40,008 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/68589ffcf4797f9934b62d0918a31c53/user_test.jar
2019-07-09 15:02:40,023 WARN manager.MySQLManager: It looks like you are importing from mysql.
2019-07-09 15:02:40,023 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2019-07-09 15:02:40,023 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2019-07-09 15:02:40,023 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2019-07-09 15:02:40,032 INFO mapreduce.ImportJobBase: Beginning import of user_test
2019-07-09 15:02:40,033 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-07-09 15:02:40,174 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-07-09 15:02:41,056 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2019-07-09 15:02:41,192 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.60.204:8032
2019-07-09 15:02:42,312 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1562653081455_0001
2019-07-09 15:02:50,559 INFO db.DBInputFormat: Using read commited transaction isolation
2019-07-09 15:02:51,051 INFO mapreduce.JobSubmitter: number of splits:1
2019-07-09 15:02:51,128 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-07-09 15:02:51,639 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562653081455_0001
2019-07-09 15:02:51,645 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-07-09 15:02:51,905 INFO conf.Configuration: resource-types.xml not found
2019-07-09 15:02:51,905 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-07-09 15:02:52,608 INFO impl.YarnClientImpl: Submitted application application_1562653081455_0001
2019-07-09 15:02:52,682 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1562653081455_0001/
2019-07-09 15:02:52,682 INFO mapreduce.Job: Running job: job_1562653081455_0001
2019-07-09 15:03:03,997 INFO mapreduce.Job: Job job_1562653081455_0001 running in uber mode : false
2019-07-09 15:03:04,003 INFO mapreduce.Job: map 0% reduce 0%
2019-07-09 15:03:12,250 INFO mapreduce.Job: map 100% reduce 0%
2019-07-09 15:03:13,276 INFO mapreduce.Job: Job job_1562653081455_0001 completed successfully
2019-07-09 15:03:13,465 INFO mapreduce.Job: Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=211175
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=17
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4996
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4996
        Total vcore-milliseconds taken by all map tasks=4996
        Total megabyte-milliseconds taken by all map tasks=5115904
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=533
        CPU time spent (ms)=2520
        Physical memory (bytes) snapshot=225423360
        Virtual memory (bytes) snapshot=2804236288
        Total committed heap usage (bytes)=159383552
        Peak Map Physical memory (bytes)=225423360
        Peak Map Virtual memory (bytes)=2804236288
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=17
2019-07-09 15:03:13,488 INFO mapreduce.ImportJobBase: Transferred 17 bytes in 32.414 seconds (0.5245 bytes/sec)
2019-07-09 15:03:13,494 INFO mapreduce.ImportJobBase: Retrieved 2 records.
2019-07-09 15:03:13,494 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table user_test
2019-07-09 15:03:13,552 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_test` AS t LIMIT 1
2019-07-09 15:03:13,570 INFO hive.HiveImport: Loading uploaded data into Hive
2019-07-09 15:03:13,582 INFO conf.HiveConf: Found configuration file null
2019-07-09 15:03:20,586 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-3.0.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2019-07-09 15:03:20,587 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2019-07-09 15:03:20,617 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2019-07-09 15:03:26,335 INFO hive.HiveImport:
2019-07-09 15:03:26,335 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hadoop/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
2019-07-09 15:03:36,139 INFO hive.HiveImport: OK
2019-07-09 15:03:36,147 INFO hive.HiveImport: Time taken: 8.225 seconds
2019-07-09 15:03:36,740 INFO hive.HiveImport: Loading data to table default.emp_mysql
2019-07-09 15:03:38,031 INFO hive.HiveImport: OK
2019-07-09 15:03:38,031 INFO hive.HiveImport: Time taken: 1.876 seconds
2019-07-09 15:03:38,610 INFO hive.HiveImport: Hive import complete.
2019-07-09 15:03:38,629 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
After the import completes, verify the result from the Hive shell:
[root@master bin]# pwd
/usr/local/hadoop/hive-2.3.5/bin
[root@master bin]# hive shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive-2.3.5/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-3.0.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/hadoop/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
db_hive_edu
default
user_test
Time taken: 0.294 seconds, Fetched: 3 row(s)
hive> use default;
OK
Time taken: 0.078 seconds
hive> show tables;
OK
emp_mysql
pokes
Time taken: 0.083 seconds, Fetched: 2 row(s)
hive> select * from emp_mysql;
OK
28 1 zzg
28 2 wz
Time taken: 1.579 seconds, Fetched: 2 row(s)
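Going the other direction, the export tool from the help listing above writes an HDFS directory back into a relational table. A minimal sketch, assuming a pre-created, column-compatible MySQL table user_test_copy and the default Hive warehouse location for emp_mysql (both are assumptions, not part of this setup); because the import used Hive's default delimiters, the \001 field separator must be declared explicitly:
sqoop export --connect jdbc:mysql://192.168.60.206:3306/test?characterEncoding=UTF-8 --username admini --password '123456' --table user_test_copy --export-dir /user/hive/warehouse/emp_mysql --input-fields-terminated-by '\001' -m 1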