Preliminary Preparation
- CentOS 7
- JDK 1.8 (install it yourself; not covered here)
- Download a Hadoop 3.x release from https://hadoop.apache.org/releases.html
- Download a Scala 2.12.x release (.tgz file) from https://www.scala-lang.org/download/
- Download a Spark 2.3.x release from http://spark.apache.org/downloads.html
  - Choose a package type: Pre-built for Apache Hadoop 2.7 and later
- After all downloads finish, place the archives in the /opt/sources directory, then extract each one into /opt (a sketch of this step follows the list).
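A minimal sketch of the download-and-extract step, assuming the exact releases used later in this guide; the mirror URLs are illustrative, so prefer the links from the release pages above:
mkdir -p /opt/sources
cd /opt/sources
# Download the three archives (example mirrors; versions match the rest of this guide)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz
wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
# Extract each archive's top-level directory into /opt
tar -xzf hadoop-3.1.1.tar.gz -C /opt
tar -xzf scala-2.12.6.tgz -C /opt
tar -xzf spark-2.3.1-bin-hadoop2.7.tgz -C /opt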
Global Settings
Add the following to /etc/profile:
## Java
export JAVA_HOME=/opt/jdk1.8.0_181
## Hadoop
export HADOOP_HOME=/opt/hadoop-3.1.1
## HBase
export HBASE_HOME=/opt/hbase-1.4.6
## ZooKeeper
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.12
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH
## Scala
export SCALA_HOME=/opt/scala-2.12.6
export PATH=$PATH:$SCALA_HOME/bin
## Spark
export SPARK_HOME=/opt/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
Once that is done, remember to run: source /etc/profile
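A quick sanity check that the new variables took effect, as a sketch:
source /etc/profile
# Each of these should resolve from the updated PATH
echo "$JAVA_HOME"    # should print /opt/jdk1.8.0_181
java -version
hadoop version       # should report Hadoop 3.1.1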
Hadoop 3.x Pseudo-Distributed Configuration
- /opt/hadoop-3.1.1/etc/hadoop/hadoop-env.sh
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/opt/jdk1.8.0_181
- /opt/hadoop-3.1.1/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/users/hadoop/hadoop/tmp</value>
  </property>
  <property>
    <!-- fs.default.name is the deprecated alias of fs.defaultFS; both work in Hadoop 3 -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- /opt/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/users/hadoop/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/users/hadoop/hadoop/name</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:8100</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
- Add the following to both /opt/hadoop-3.1.1/sbin/start-dfs.sh and /opt/hadoop-3.1.1/sbin/stop-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
- (Optional) Add the following to both /opt/hadoop-3.1.1/sbin/start-yarn.sh and /opt/hadoop-3.1.1/sbin/stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
- Format the HDFS NameNode: hdfs namenode -format
- Start HDFS: start-dfs.sh (a start-and-verify sketch follows this list)
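Before the first format, it can help to pre-create the directories referenced in core-site.xml and hdfs-site.xml, then confirm the daemons are up with jps; a minimal sketch, assuming a root shell:
# Directories from hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir
mkdir -p /home/users/hadoop/hadoop/{tmp,name,data}
hdfs namenode -format
start-dfs.sh
jps    # expect NameNode, DataNode, and SecondaryNameNode among the output
# The HDFS web UI listens on the dfs.http.address port: http://localhost:8100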
Test That Scala Works
[root@localhost ~]# scala -version
Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
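Beyond the version banner, a one-line expression confirms the runner actually evaluates code, as a sketch using the scala runner's -e flag:
scala -e 'println(s"2 + 2 = ${2 + 2}")'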
Test That Spark Works
[root@localhost ~]# spark-shell
2018-09-13 15:13:57 WARN Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.123.101 instead (on interface enp0s3)
2018-09-13 15:13:57 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-09-13 15:13:57 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.123.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1536822855033).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.1
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
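Note that the banner reports Scala 2.11.8 even though Scala 2.12.6 was installed system-wide: the pre-built Spark 2.3.x package bundles its own Scala 2.11 runtime, so this is expected. Exit the prompt with :quit. As one more smoke test, Spark ships a SparkPi example that can be run from the shell; a sketch, with output abridged:
run-example SparkPi 10
# Expect a line like "Pi is roughly 3.14..." near the end of the output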