Prerequisites

  1. CentOS 7
  2. JDK 1.8 (prepare it yourself; not covered in this article)
  3. Download a Hadoop 3.x release from https://hadoop.apache.org/releases.html
  4. Download Scala 2.12.x (the .tgz file) from https://www.scala-lang.org/download/
  5. Download Spark 2.3.x from http://spark.apache.org/downloads.html
    • Choose a package type: Pre-built for Apache Hadoop 2.7 and later
  6. Once everything is downloaded, put the archives under /opt/sources and extract each of them into /opt (see the sketch below).
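
Purely for reference, a minimal sketch of that layout, assuming the archive names below (adjust them to the exact versions you downloaded):

# Assumed archive names; they match the directory names used in /etc/profile below
cd /opt/sources
tar -zxf hadoop-3.1.1.tar.gz -C /opt
tar -zxf scala-2.12.6.tgz -C /opt
tar -zxf spark-2.3.1-bin-hadoop2.7.tgz -C /opt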

Global settings

Add the following to /etc/profile:

## Java
export JAVA_HOME=/opt/jdk1.8.0_181

## Hadoop
export HADOOP_HOME=/opt/hadoop-3.1.1

## HBase
export HBASE_HOME=/opt/hbase-1.4.6

## ZooKeeper
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.12
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

## Scala
export SCALA_HOME=/opt/scala-2.12.6
export PATH=$PATH:$SCALA_HOME/bin

## Spark
export SPARK_HOME=/opt/spark-2.3.1-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

After editing, remember to run: source /etc/profile
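
To confirm the variables took effect in the current shell, a quick check might look like this (version banners should match the downloads above):

# The paths should resolve to the directories configured above
echo $JAVA_HOME $SCALA_HOME $SPARK_HOME
hadoop version
java -version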

Hadoop 3.x pseudo-distributed configuration

  1. /opt/hadoop-3.1.1/etc/hadoop/hadoop-env.sh

    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
    export JAVA_HOME=/opt/jdk1.8.0_181
    
  2. /opt/hadoop-3.1.1/etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/users/hadoop/hadoop/tmp</value>
        </property>
        <property>
            <!-- fs.defaultFS is the current name of the deprecated fs.default.name -->
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
  3. /opt/hadoop-3.1.1/etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/users/hadoop/hadoop/data</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/users/hadoop/hadoop/name</value>
        </property>
        <property>
            <!-- dfs.namenode.http-address is the current name of the deprecated dfs.http.address;
                 this serves the NameNode web UI on port 8100 instead of the default 9870 -->
            <name>dfs.namenode.http-address</name>
            <value>0.0.0.0:8100</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    
  4. /opt/hadoop-3.1.1/sbin/start-dfs.sh and /opt/hadoop-3.1.1/sbin/stop-dfs.sh

    HDFS_DATANODE_USER=root
    HDFS_DATANODE_SECURE_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    
  5. (Optional) /opt/hadoop-3.1.1/sbin/start-yarn.sh and /opt/hadoop-3.1.1/sbin/stop-yarn.sh

    YARN_RESOURCEMANAGER_USER=root
    HDFS_DATANODE_SECURE_USER=yarn
    YARN_NODEMANAGER_USER=root
    
  6. hdfs namenode -format

  7. start-dfs.sh
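
If the daemons start cleanly, a quick sanity check could look like the following (note that start-dfs.sh relies on passwordless SSH to localhost, and the NameNode web UI is served on port 8100 as configured in hdfs-site.xml):

# NameNode, DataNode and SecondaryNameNode should all show up
jps

# Write a file into HDFS and list it back
hdfs dfs -mkdir -p /user/root
hdfs dfs -put /etc/hosts /user/root/
hdfs dfs -ls /user/root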

Verify that Scala works

[root@localhost ~]# scala -version

Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

Verify that Spark works

[root@localhost ~]# spark-shell
2018-09-13 15:13:57 WARN  Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.123.101 instead (on interface enp0s3)
2018-09-13 15:13:57 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-09-13 15:13:57 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.123.101:4040
Spark context available as 'sc' (master = local[*], app id = local-1536822855033).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
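
Note that the banner reports Scala 2.11.8: the pre-built Spark 2.3.x package bundles its own Scala 2.11 runtime, so this is expected even though Scala 2.12.6 is installed system-wide. As a final sanity check, a trivial job can be run right in the shell; the expected result would be:

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0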