mtitek.com  MTI TEK

Install Apache Hive
  1. References
  2. Create "hive" user
  3. Install Hive
  4. Switch to "hive" user
  5. Update "~/.profile" file
  6. Create "warehouse" directory in HDFS
  7. Configure "${HIVE_HOME}/conf/hive-site.xml"
  8. Configure "${HIVE_HOME}/bin/hive-config.sh"
  9. Create Hive metastore database (PostgreSQL)
  10. Start HiveServer
  11. Hive Ports/Web UIs
  12. Start Hive console
  13. Start Hive MetaStore

  1. References
    See these pages for more details about Apache Hive:
    https://hortonworks.com/apache/hive/
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

    See these pages for details on installing Apache Hadoop, PostgreSQL, and MySQL:
    Install Apache Hadoop
    Install PostgreSQL (database)
    Install MySQL (database)
  2. Create "hive" user
    $ sudo addgroup hive
    $ sudo adduser --ingroup hive hive
    $ sudo usermod -a -G hadoop hive
    
  3. Install Hive
    Download Apache Hive: https://hive.apache.org/downloads.html

    Extract the file "apache-hive-3.0.0-bin.tar.gz" into the folder where you want to install Hive (this guide uses /opt, which creates /opt/apache-hive-3.0.0-bin):
    $ sudo tar -xf ~/Downloads/apache-hive-3.0.0-bin.tar.gz -C /opt/
    $ sudo chmod -R 755 /opt/apache-hive-3.0.0-bin
    $ sudo chown -R hive:hive /opt/apache-hive-3.0.0-bin
    
    Note: In the following sections, the environment variable ${HIVE_HOME} will refer to this location '/opt/apache-hive-3.0.0-bin'
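    The extraction steps above can be wrapped in a small script. This is a sketch, assuming bash, GNU tar, and an archive with a single top-level directory (as Apache release tarballs have); the function name install_hive is hypothetical. It prints the resulting directory, which is the value to use for ${HIVE_HOME}.

```shell
#!/usr/bin/env bash
# Extract a Hive release tarball and print the resulting HIVE_HOME.
# Assumption: the archive has one top-level directory (e.g. apache-hive-3.0.0-bin/).
install_hive() {
  local tarball=$1 dest=${2:-/opt} owner=${3:-}
  local top

  # First entry of the archive listing, e.g. "apache-hive-3.0.0-bin/"
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  if [ -z "$top" ]; then
    echo "cannot read top-level directory from $tarball" >&2
    return 1
  fi

  tar -xzf "$tarball" -C "$dest" || return 1
  chmod -R 755 "$dest/$top"
  # Optional owner such as hive:hive (skipped when empty)
  if [ -n "$owner" ]; then
    chown -R "$owner" "$dest/$top" || return 1
  fi

  echo "$dest/$top"
}

# Usage (run as root when the destination is /opt):
#   install_hive ~/Downloads/apache-hive-3.0.0-bin.tar.gz /opt hive:hive
```

    Changing the owner to hive:hive requires root, so run the script with sudo when installing under /opt.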
  4. Switch to "hive" user
    $ su - hive
    
  5. Update "~/.profile" file
    $ vi ~/.profile
    export JAVA_HOME="/opt/jdk1.8.0_172"
    
    export HIVE_HOME="/opt/apache-hive-3.0.0-bin"
    
    export HADOOP_HOME="/opt/hadoop-3.0.3"
    
    export CLASSPATH="$CLASSPATH:$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/lib/*"
    
    export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH"
    
    Load ".profile" environment variables:
    $ source ~/.profile
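    A quick way to catch typos in ~/.profile is to check that each exported home directory exists and that its bin/ folder is on PATH. A minimal bash sketch; check_hive_env is a hypothetical helper name, and it only checks the three variables set above.

```shell
#!/usr/bin/env bash
# Verify JAVA_HOME, HADOOP_HOME and HIVE_HOME: set, existing, and bin/ on PATH.
# Returns non-zero (and reports on stderr) if any check fails.
check_hive_env() {
  local status=0 var dir
  for var in JAVA_HOME HADOOP_HOME HIVE_HOME; do
    dir=${!var:-}   # indirect expansion: value of the variable named by $var
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points to a missing directory: $dir" >&2
      status=1
    else
      case ":$PATH:" in
        *":$dir/bin:"*) ;;   # bin directory already on PATH
        *) echo "$dir/bin is not on PATH" >&2; status=1 ;;
      esac
    fi
  done
  return "$status"
}

# Usage, after "source ~/.profile":
#   check_hive_env && echo "environment looks good"
```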
    
  6. Create "warehouse" directory in HDFS
    $ su - hadoop
    
    $ hadoop fs -mkdir /hive /hive/warehouse
    $ hadoop fs -chmod -R 775 /hive
    $ hadoop fs -chown -R hive:hadoop /hive
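    The same HDFS setup can be made repeatable with "hadoop fs -mkdir -p", which does not fail if the directories already exist, followed by a "hadoop fs -test -d" check. A sketch, assuming it runs as the "hadoop" user with hadoop on PATH; setup_hive_warehouse is a hypothetical name.

```shell
#!/usr/bin/env bash
# Create the Hive warehouse directory in HDFS and hand it to the hive user.
setup_hive_warehouse() {
  local warehouse=${1:-/hive/warehouse} owner=${2:-hive:hadoop}
  local top
  # Top-level directory of the warehouse path, e.g. /hive for /hive/warehouse
  top="/$(printf '%s' "${warehouse#/}" | cut -d/ -f1)"

  hadoop fs -mkdir -p "$warehouse" || return 1
  hadoop fs -chmod -R 775 "$top" || return 1
  hadoop fs -chown -R "$owner" "$top" || return 1
  # -test -d succeeds only if the path exists and is a directory
  hadoop fs -test -d "$warehouse" || return 1
  echo "warehouse ready: $warehouse"
}

# Usage:
#   setup_hive_warehouse /hive/warehouse hive:hadoop
```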
    
  7. Configure "${HIVE_HOME}/conf/hive-site.xml"
    Note: Use this configuration if you are using a PostgreSQL metastore:
    $ vi ${HIVE_HOME}/conf/hive-site.xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hive.metastore.local</name>
            <value>true</value>
        </property>
    
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/hive/warehouse</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>org.postgresql.Driver</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:postgresql://localhost:5432/hivemetastoredb</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>postgres</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>postgres</value>
        </property>
    
        <property>
            <name>hive.server2.thrift.port</name>
            <value>10000</value>
        </property>
    
        <property>
            <name>hive.server2.enable.doAs</name>
            <value>true</value>
        </property>
    
        <property>
            <name>hive.execution.engine</name>
            <value>mr</value>
        </property>
    
        <property>
            <name>hive.metastore.port</name>
            <value>9083</value>
        </property>
    
        <property>
            <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
            <value>true</value>
        </property>
    </configuration>
    

    Note: If you are using a MySQL metastore, use these connection properties instead of the PostgreSQL ones:
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hivemetastoredb?createDatabaseIfNotExist=true</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>admin</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>admin</value>
    </property>
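    Because hive-site.xml is XML, special characters in values such as passwords must be escaped (a "&" has to be written as "&amp;"). The sketch below prints the four javax.jdo.option.* connection properties for either database, using the driver classes and JDBC URLs shown above; the helper names (xml_escape, hive_property, jdo_properties) are hypothetical, and the default ports 5432 and 3306 are assumed.

```shell
#!/usr/bin/env bash
# Escape the XML special characters &, < and > in a value.
xml_escape() {
  # & must be replaced first so the entities added afterwards are not re-escaped
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one <property> element in the layout used by hive-site.xml.
hive_property() {
  printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' \
    "$1" "$(xml_escape "$2")"
}

# Print the metastore connection properties for postgres or mysql.
jdo_properties() {
  local db_type=$1 host=${2:-localhost} db=${3:-hivemetastoredb} user=$4 password=$5
  local driver url
  case "$db_type" in
    postgres)
      driver=org.postgresql.Driver
      url="jdbc:postgresql://${host}:5432/${db}" ;;
    mysql)
      driver=com.mysql.jdbc.Driver
      url="jdbc:mysql://${host}:3306/${db}?createDatabaseIfNotExist=true" ;;
    *)
      echo "unsupported database type: $db_type" >&2
      return 1 ;;
  esac
  hive_property javax.jdo.option.ConnectionDriverName "$driver"
  hive_property javax.jdo.option.ConnectionURL "$url"
  hive_property javax.jdo.option.ConnectionUserName "$user"
  hive_property javax.jdo.option.ConnectionPassword "$password"
}
```

    For example, "jdo_properties postgres localhost hivemetastoredb postgres postgres" prints the PostgreSQL block, ready to paste inside <configuration>.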
    
  8. Configure "${HIVE_HOME}/bin/hive-config.sh"
    Edit the file "hive-config.sh" to export the "HADOOP_HOME" environment variable and, optionally, raise the default "HADOOP_HEAPSIZE" from 256 to 1024:
    $ vi ${HIVE_HOME}/bin/hive-config.sh
    export HADOOP_HOME="/opt/hadoop-3.0.3"
    #export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
    export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-1024}
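    The "${HADOOP_HEAPSIZE:-1024}" syntax is shell parameter expansion with a default: a value already exported by the caller wins, and 1024 is used only when HADOOP_HEAPSIZE is unset or empty. A small bash demonstration (heap_setting is a hypothetical helper that mirrors the line above):

```shell
#!/usr/bin/env bash
# Same expansion as in hive-config.sh: caller's value, else 1024.
heap_setting() {
  echo "${HADOOP_HEAPSIZE:-1024}"
}

unset HADOOP_HEAPSIZE
heap_setting                       # prints 1024
HADOOP_HEAPSIZE=4096 heap_setting  # prints 4096
HADOOP_HEAPSIZE= heap_setting      # prints 1024 (":-" treats empty as unset)
```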
    
  9. Create Hive metastore database (PostgreSQL)
    Create the PostgreSQL database "hivemetastoredb":
    $ su - postgres
    
    $ createdb -h localhost -p 5432 -U postgres --password hivemetastoredb
    
    Create the Hive schema:
    $ su - hive
    
    $ ${HIVE_HOME}/bin/schematool -initSchema -dbType postgres
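    Running "-initSchema" against a metastore that is already initialized typically fails, so scripted installs often check first with "schematool -info". A sketch, assuming "-info" exits with a non-zero status while the schema does not exist yet; init_metastore_schema is a hypothetical name, and passing mysql as the argument selects the MySQL scripts instead.

```shell
#!/usr/bin/env bash
# Initialize the Hive metastore schema only if it is not there yet.
init_metastore_schema() {
  local db_type=${1:-postgres}
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set" >&2
    return 1
  fi
  local tool="$HIVE_HOME/bin/schematool"

  # Assumption: -info fails while the metastore tables do not exist
  if "$tool" -info -dbType "$db_type" >/dev/null 2>&1; then
    echo "metastore schema already initialized"
    return 0
  fi
  "$tool" -initSchema -dbType "$db_type"
}

# Usage, as the hive user:
#   init_metastore_schema postgres
```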
    
    Note: To create the Hive metastore database with MySQL instead:
    $ su - hive
    
    $ cd ${HIVE_HOME}/scripts/metastore/upgrade/mysql/
    
    $ mysql -h localhost -u admin -p
    mysql> CREATE DATABASE hivemetastoredb;
    mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
    mysql> GRANT ALL PRIVILEGES ON hivemetastoredb.* TO 'hive'@'%';
    mysql> flush privileges;
    mysql> use hivemetastoredb;
    mysql> source hive-schema-3.0.0.mysql.sql;
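    The database and user setup can also be sent to the mysql client in one go with "mysql -e". A sketch that only prints the SQL, assuming MySQL 5.7 or later (for CREATE USER IF NOT EXISTS) and credentials without single quotes; metastore_mysql_sql is a hypothetical helper, and hive/hive are the example credentials from above.

```shell
#!/usr/bin/env bash
# Print the SQL that creates the metastore database and a user with access to it.
metastore_mysql_sql() {
  local db=${1:-hivemetastoredb} user=${2:-hive} password=${3:-hive}
  cat <<SQL
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
SQL
}

# Usage:
#   mysql -h localhost -u admin -p -e "$(metastore_mysql_sql)"
```

    The schema itself is still loaded with the "source hive-schema-3.0.0.mysql.sql" step shown above.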
    
  10. Start HiveServer
    Execute "hiveserver2":
    $ cd ${HIVE_HOME}
    $ nohup hiveserver2 &
    
    Alternatively, run the hiveserver2 service through the "hive" command:
    $ cd ${HIVE_HOME}
    $ nohup hive --service hiveserver2 &
    
    Start HiveServer using custom parameters:
    $ cd ${HIVE_HOME}
    $ nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console &
    
    Notes:
    ▸ Hive log files can be found in "/tmp/hive/hive.log" (the default location is /tmp/<user name>/hive.log)
    ▸ The Hive Jetty web app stores its temporary files in "/tmp/jetty*"

    Hive process info:
    ▸ Java virtual machine process status tool: jps

    ▸ Display process info: ps -fp <pid> | less
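    HiveServer2 can take a while to open its Thrift port after "nohup ... &" returns. A minimal bash sketch that polls a TCP port until it accepts connections, assuming bash's /dev/tcp redirection and the coreutils "timeout" command are available; wait_for_port is a hypothetical name.

```shell
#!/usr/bin/env bash
# Poll HOST:PORT once per second until a TCP connection succeeds,
# giving up after MAX_WAIT seconds (default 60).
wait_for_port() {
  local host=$1 port=$2 max_wait=${3:-60} waited=0
  # Redirecting to /dev/tcp/HOST/PORT makes bash open a TCP socket;
  # "timeout 1" bounds each attempt in case the connect hangs.
  until timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "${host}:${port} not reachable after ${max_wait}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${host}:${port} is accepting connections"
}

# Usage after "nohup hiveserver2 &":
#   wait_for_port localhost 10000 120
```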
  11. Hive Ports/Web UIs
    Ports Used by Hive:
    HiveServer2: 10000 (hive.server2.thrift.port)
    
    HiveServer2 Web UI: 10002 (hive.server2.webui.port)
    
    Metastore: 9083 (hive.metastore.port)
    
    ▸ HiveServer2 web UI: http://localhost:10002
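    Since these ports are set in hive-site.xml, a script can read them instead of hard-coding them. A sketch using awk, assuming each <name> is followed by its <value> on a later line, as in the file above (it is not a general XML parser); hive_conf_value is a hypothetical name, and the optional third argument is a fallback such as the 10002 web UI default.

```shell
#!/usr/bin/env bash
# Print the <value> of a property from a Hadoop-style XML config file,
# or DEFAULT (third argument) when the property is absent.
hive_conf_value() {
  local file=$1 key=$2 default=${3:-} value
  value=$(awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }' "$file")
  echo "${value:-$default}"
}

# Usage:
#   conf="$HIVE_HOME/conf/hive-site.xml"
#   hive_conf_value "$conf" hive.server2.thrift.port 10000
#   hive_conf_value "$conf" hive.server2.webui.port 10002
```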

  12. Start Hive console
    $ hive
    
    Start Hive console using custom parameters:
    $ hive --hiveconf hive.root.logger=DEBUG,console
    
    Some Hive commands:
    hive> create database test;
    hive> use test;
    hive> show tables;
    hive> analyze table testtable compute statistics;
    hive> drop database test cascade;
    
    Running Hive commands from the Linux shell:
    $ hive --database test -f test.hql
    
    $ hadoop fs -ls hdfs://localhost:8020/hive/warehouse/test
    
    $ hadoop fs -ls hdfs://localhost:8020/tmp/hive
    
    $ hive --database test -e 'show tables'
    
    $ hive --database test -e "DROP TABLE IF EXISTS testtable"
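    Values can also be passed with --hivevar (listed in the help output below) and referenced in the query as ${hivevar:name}. The single quotes keep the shell from expanding ${hivevar:tbl}, so Hive performs the substitution. A sketch; hive_drop_table is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Drop a table via the Hive CLI, passing the table name as a Hive variable.
hive_drop_table() {
  local db=$1 table=$2
  # Single quotes: the shell leaves ${hivevar:tbl} alone and Hive substitutes it
  hive --database "$db" --hivevar "tbl=${table}" -e 'DROP TABLE IF EXISTS ${hivevar:tbl}'
}

# Usage:
#   hive_drop_table test testtable
```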
    
    Print Hive help:
    $ hive -H
     -d,--define <key=value>          Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
        --database <databasename>     Specify the database to use
     -e <quoted-query-string>         SQL from command line
     -f <filename>                    SQL from files
     -H,--help                        Print help information
        --hiveconf <property=value>   Use value for given property
        --hivevar <key=value>         Variable substitution to apply to Hive commands. e.g. --hivevar A=B
     -i <filename>                    Initialization SQL file
     -S,--silent                      Silent mode in interactive shell
     -v,--verbose                     Verbose mode (echo executed SQL to the console)
    
  13. Start Hive MetaStore
    The metastore listens on the port set by the "hive.metastore.port" property in "${HIVE_HOME}/conf/hive-site.xml" (9083 in the configuration above).

    Start Hive MetaStore:
    $ nohup hive --service metastore &
    
    Hive MetaStore process info:
    ▸ Java virtual machine process status tool: jps

    ▸ Display process info: ps -fp <pid> | less

    ▸ Display process info: ps -aef | grep -i org.apache.hadoop.hive.metastore.HiveMetaStore

    ▸ Display process info: lsof -i:9083

    ▸ Telnet Hive MetaStore server: telnet localhost 9083
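    A plain "ps | grep <class name>" also matches the grep process itself. One way around this is to filter a ps snapshot in awk, passing the class name through the environment so it does not appear on awk's own command line. A sketch; java_pids_for_class and metastore_pids are hypothetical names.

```shell
#!/usr/bin/env bash
# Print the PIDs of processes whose command line contains a Java class name.
# Input: "PID ARGS..." lines on stdin, e.g. from "ps -eo pid,args".
java_pids_for_class() {
  # The class name goes through the environment, not awk's arguments,
  # so the awk process itself never matches.
  CLASS_NAME="$1" awk 'index($0, ENVIRON["CLASS_NAME"]) { print $1 }'
}

metastore_pids() {
  ps -eo pid,args | java_pids_for_class org.apache.hadoop.hive.metastore.HiveMetaStore
}

# Usage:
#   metastore_pids    # empty output means no metastore process was found
```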


© mtitek.com