Big Data | Install and configure Apache Hive (HiveServer, Hive MetaStore)
  1. References
  2. Create "hive" user
  3. Install Hive
  4. Create "warehouse" directory in hdfs
  5. Create Hive metastore database (PostgreSQL)
  6. Switch to "hive" user
  7. Update "~/.profile" file
  8. Configure "${HIVE_HOME}/conf/hive-site.xml"
  9. Configure "${HIVE_HOME}/bin/hive-config.sh"
  10. Create Hive schema (PostgreSQL)
  11. Start HiveServer
  12. Hive Ports/Web UIs
  13. HiveServer: status, log files
  14. Start Hive MetaStore
  15. Hive MetaStore: status, log files
  16. Launch Hive console
  17. Beeline – Command Line Shell

  1. References
    See these pages for more details about Apache Hive:
    https://cwiki.apache.org/confluence/display/Hive/GettingStarted
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

    See these pages for more details about installing Apache Hadoop, PostgreSQL, and MySQL:
    Install Apache Hadoop
    Install PostgreSQL (database)
    Install MySQL (database)
  2. Create "hive" user
    $ sudo addgroup hive
    $ sudo adduser --ingroup hive hive
    $ sudo usermod -a -G hadoop hive
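
    To confirm that the "hive" user was created and belongs to both the "hive" and "hadoop" groups, a quick check:
    $ id hive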
  3. Install Hive
    Download Apache Hive: https://hive.apache.org/downloads.html

    Extract the file "apache-hive-3.1.2-bin.tar.gz" into the folder where you want to install Hive, e.g. '/opt/apache-hive-3.1.2-bin':
    $ tar -xf ~/Downloads/apache-hive-3.1.2-bin.tar.gz -C /opt/
    $ chmod -R 755 /opt/apache-hive-3.1.2-bin
    $ sudo chown -R hive:hive /opt/apache-hive-3.1.2-bin

    Note: In the following sections, the environment variable ${HIVE_HOME} will refer to this location '/opt/apache-hive-3.1.2-bin'
  4. Create "warehouse" directory in hdfs
    $ su - hadoop
    
    $ hdfs dfs -mkdir /hive /hive/warehouse
    $ hdfs dfs -chmod -R 775 /hive
    $ hdfs dfs -chown -R hive:hadoop /hive
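
    To verify the directory layout, ownership, and permissions in HDFS:
    $ hdfs dfs -ls -R /hive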
  5. Create Hive metastore database (PostgreSQL)
    Create the PostgreSQL database "hivemetastoredb".
    $ su - postgres
    
    $ createdb -h localhost -p 5432 -U postgres --password hivemetastoredb
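
    To verify that the database was created, list the databases visible to the "postgres" user:
    $ psql -h localhost -p 5432 -U postgres -l | grep hivemetastoredb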
  6. Switch to "hive" user
    $ su - hive
  7. Update "~/.profile" file
    $ vi ~/.profile
    export JAVA_HOME="/opt/jdk1.8.0_172"
    
    export HIVE_HOME="/opt/apache-hive-3.1.2-bin"
    
    export HADOOP_HOME="/opt/hadoop-3.3.0"
    
    export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib:$HADOOP_HOME/share/hadoop/common/lib
    
    PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH"

    Load ".profile" environment variables:
    $ source ~/.profile
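
    To confirm that the environment variables are set and the Hive binaries are on the PATH:
    $ echo $HIVE_HOME
    /opt/apache-hive-3.1.2-bin

    $ which hive
    /opt/apache-hive-3.1.2-bin/bin/hive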
  8. Configure "${HIVE_HOME}/conf/hive-site.xml"
    Note: Use this configuration if you are using a PostgreSQL metastore:
    $ vi ${HIVE_HOME}/conf/hive-site.xml

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hive.metastore.local</name>
            <value>true</value>
        </property>
    
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/hive/warehouse</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>org.postgresql.Driver</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:postgresql://localhost:5432/hivemetastoredb</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>postgres</value>
        </property>
    
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>postgres</value>
        </property>
    
        <property>
            <name>hive.server2.thrift.port</name>
            <value>10000</value>
        </property>
    
        <property>
            <name>hive.server2.enable.doAs</name>
            <value>true</value>
        </property>
    
        <property>
            <name>hive.execution.engine</name>
            <value>mr</value>
        </property>
    
        <property>
            <name>hive.metastore.port</name>
            <value>9083</value>
        </property>
    
        <property>
            <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
            <value>true</value>
        </property>
    </configuration>

    Note: If you are using a MySQL metastore, use these connection properties instead:
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hivemetastoredb?createDatabaseIfNotExist=true</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>admin</value>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>admin</value>
    </property>
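
    Note: The JDBC driver of the chosen metastore database must be on Hive's classpath. Check whether a suitable driver JAR is already present in "${HIVE_HOME}/lib", and copy it there if it is missing (the file name below is illustrative; use the driver version you downloaded):
    $ ls ${HIVE_HOME}/lib/ | grep -iE 'postgresql|mysql'
    $ cp ~/Downloads/postgresql-42.2.19.jar ${HIVE_HOME}/lib/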
  9. Configure "${HIVE_HOME}/bin/hive-config.sh"
    Edit the file "hive-config.sh": export the "HADOOP_HOME" environment variable and, optionally, increase "HADOOP_HEAPSIZE".
    $ vi ${HIVE_HOME}/bin/hive-config.sh
    export HADOOP_HOME="/opt/hadoop-3.3.0"
    #export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
    export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-1024}
  10. Create Hive schema (PostgreSQL)
    Initialize the Hive metastore schema:
    $ ${HIVE_HOME}/bin/schematool -initSchema -dbType postgres

    Note: You might get this error:
    Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
        at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
        at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

    See:
    https://issues.apache.org/jira/browse/HIVE-22915
    https://issues.apache.org/jira/browse/HIVE-22718

    To fix the issue, locate the guava JARs shipped with Hadoop and Hive:
    $ find /opt/hadoop-3.3.0/ -type f -name "guava-*.jar"
    /opt/hadoop-3.3.0/share/hadoop/hdfs/lib/guava-27.0-jre.jar

    $ find /opt/apache-hive-3.1.2-bin/ -type f -name "guava-*.jar"
    /opt/apache-hive-3.1.2-bin/lib/guava-19.0.jar

    Replace Hive's guava-19.0.jar with Hadoop's guava-27.0-jre.jar:
    $ mv /opt/apache-hive-3.1.2-bin/lib/guava-19.0.jar /opt/apache-hive-3.1.2-bin/lib/guava-19.0.jar.bak
    $ cp /opt/hadoop-3.3.0/share/hadoop/hdfs/lib/guava-27.0-jre.jar /opt/apache-hive-3.1.2-bin/lib/

    Note: To create the Hive metastore database with MySQL instead:
    $ cd ${HIVE_HOME}/scripts/metastore/upgrade/mysql/
    $ mysql -h localhost -u admin -p
    mysql> CREATE DATABASE hivemetastoredb;
    mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
    mysql> GRANT ALL ON hivemetastoredb.* TO 'hive'@'%';
    mysql> flush privileges;
    mysql> use hivemetastoredb;
    mysql> source hive-schema-3.0.0.mysql.sql;
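
    In either case, schematool can be used to verify that the metastore schema was initialized (shown here for the PostgreSQL setup; use "-dbType mysql" for MySQL):
    $ ${HIVE_HOME}/bin/schematool -info -dbType postgres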
  11. Start HiveServer
    $ mkdir ~/hiveserver2log
    $ cd ~/hiveserver2log

    Execute "hiveserver2" directly:
    $ nohup hiveserver2 &

    Alternatively, execute "hive" to run the hiveserver2 service:
    $ nohup hive --service hiveserver2 &

    Start HiveServer using custom parameters:
    $ nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console &

    Check the "nohup.out" file:
    $ tail -f ~/hiveserver2log/nohup.out
    Starting HiveServer2
    Hive Session ID = 8db09895-1081-41ee-a2ce-a813cf37aa54
    Hive Session ID = 5eb2ba61-db1a-4b97-8667-fe5ed9c221b5
    Hive Session ID = 39187ac2-cc98-457a-8c80-4b1b116fe858
    Hive Session ID = 533469ed-b6b9-4935-9107-bd0d782c647f
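
    It can take some time for HiveServer2 to finish starting. To confirm that it is listening on its Thrift port (10000, see "hive.server2.thrift.port" above):
    $ lsof -i:10000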
  12. Hive Ports/Web UIs
    Ports used by Hive:
    • HiveServer2 (Thrift): 10000 (hive.server2.thrift.port)
    • HiveServer2 Web UI: 10002 (hive.server2.webui.port)
    • Hive MetaStore: 9083 (hive.metastore.port)

    ► HiveServer web UI: http://localhost:10002

    [Screenshot: HiveServer2 web UI]
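
    A quick way to check from the command line that the web UI is up (it should return HTTP status 200 once the server is running):
    $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:10002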
  13. HiveServer: status, log files
    Hive process info:
    • Java virtual machine process status tool: jps
      $ jps -ml
      13956 org.apache.hadoop.util.RunJar /opt/apache-hive-3.1.2-bin/lib/hive-service-3.1.2.jar org.apache.hive.service.server.HiveServer2

    • Display process info: ps -fp <pid> | less
      $ ps -fp 9299 | less
      UID        PID  PPID  C STIME TTY          TIME CMD
      hive      9299  8966  9 11:18 pts/3    00:00:14 /opt/jdk1.8.0_172/bin/java -Dproc_jar -Dproc_hiveserver2
      -Dlog4j.configurationFile=hive-log4j2.properties
      -Djava.util.logging.config.file=/opt/apache-hive-3.1.2-bin/conf/parquet-logging.properties
      -Djline.terminal=jline.UnsupportedTerminal -Dyarn.log.dir=/opt/hadoop-3.3.0/logs
      -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/opt/hadoop-3.3.0
      -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/hadoop-3.3.0/lib/native -Xmx1024m
      -Dhadoop.log.dir=/opt/hadoop-3.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop-3.3.0
      -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml
      -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
      /opt/apache-hive-3.1.2-bin/lib/hive-service-3.1.2.jar org.apache.hive.service.server.HiveServer2

    Hive log files:
    • The Hive log file can be found at "/tmp/hive/hive.log"
    • The Hive Jetty web app files are under "/tmp/jetty*"

    $ ls -al /tmp/ | grep hive
    drwxrwxr-x hive hive hive
    drwxrwxr-x hive hive jetty-0.0.0.0-10002-hiveserver2-_-any-2412071273215690221.dir
    drwxrwxr-x hive hive hadoop-unjar4204305336241959435

    $ ls -al /tmp/hive
    -rw-rw-r-- hive hive hive.log
  14. Start Hive MetaStore
    $ mkdir ~/hivemetastorelog
    $ cd ~/hivemetastorelog

    Start Hive MetaStore:
    $ nohup hive --service metastore &

    Check the "nohup.out" file:
    $ tail -f ~/hivemetastorelog/nohup.out
    Starting Hive Metastore Server
  15. Hive MetaStore: status, log files
    Hive MetaStore process info:
    • Java virtual machine process status tool: jps
      $ jps -ml
      14392 org.apache.hadoop.util.RunJar /opt/apache-hive-3.1.2-bin/lib/hive-metastore-3.1.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore

    • Display process info: ps -fp <pid> | less
      $ ps -fp 10527 | less
      UID        PID  PPID  C STIME TTY          TIME CMD
      hive     10527  8966  5 11:40 pts/3    00:00:08 /opt/jdk1.8.0_172/bin/java -Dproc_jar -Dproc_metastore
      -Dlog4j.configurationFile=hive-log4j2.properties
      -Djava.util.logging.config.file=/opt/apache-hive-3.1.2-bin/conf/parquet-logging.properties
      -Dyarn.log.dir=/opt/hadoop-3.3.0/logs -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/opt/hadoop-3.3.0
      -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/hadoop-3.3.0/lib/native -Xmx1024m
      -Dhadoop.log.dir=/opt/hadoop-3.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop-3.3.0
      -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml
      -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
      /opt/apache-hive-3.1.2-bin/lib/hive-metastore-3.1.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore

    • Display process info: ps -aef | grep -i org.apache.hadoop.hive.metastore.HiveMetaStore

    • Display process info: lsof -i:9083
      See "hive.metastore.port" property setup in "${HIVE_HOME}/conf/hive-site.xml" (9083).
      $ lsof -i:9083
      COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
      java    10527 hive  484u  IPv6 438850      0t0  TCP *:9083 (LISTEN)

    • Telnet Hive MetaStore server: telnet localhost 9083

    Hive MetaStore log files:
    • The Hive MetaStore log file can be found at "/tmp/hive/hive.log"
  16. Launch Hive console
    $ hive

    Launch the Hive console with custom parameters:
    $ hive --hiveconf hive.root.logger=DEBUG,console

    Use the command exit; to quit the Hive console.

    Print Hive help:
    $ hive -H
    --hiveconf <property=value>  |  Use value for given property
    --hivevar <key=value>        |  Variable substitution to apply to Hive commands. e.g. --hivevar A=B
    --database <databasename>    |  Specify the database to use
     -d,--define <key=value>     |  Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
     -e <quoted-query-string>    |  SQL from command line
     -f <filename>               |  SQL from files
     -H,--help                   |  Print help information
     -i <filename>               |  Initialization SQL file
     -S,--silent                 |  Silent mode in interactive shell
     -v,--verbose                |  Verbose mode (echo executed SQL to the console)

    Some Hive commands:
    hive> create database test;
    hive> use test;
    hive> show tables;
    hive> Analyze table test compute statistics;
    hive> drop database test cascade;
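
    As an illustration, the two test tables that appear later in the Beeline section could be created with statements like these (the column definitions and the external table location are assumptions made for the example):
    hive> use test;
    hive> create table test_internal_table (id int, name string);
    hive> create external table test_external_table (id int, name string) location '/hive/external/test';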

    Running Hive commands from the Linux shell:
    $ hive --database test -f test.hql
    
    $ hive --database test -e 'show tables'
    
    $ hive --database test -e "Drop TABLE IF EXISTS test"

    Listing the contents of the Hive HDFS directories:
    $ hdfs dfs -ls hdfs://localhost:8020/hive/warehouse/
    drwxr-xr-x   - hive hadoop          0 hdfs://localhost:8020/hive/warehouse/test.db

    $ hdfs dfs -ls hdfs://localhost:8020/tmp/hive
    drwx-wx-wx   - hive hadoop          0 hdfs://localhost:8020/tmp/hive/_resultscache_
    drwx------   - hive hadoop          0 hdfs://localhost:8020/tmp/hive/hive
  17. Beeline – Command Line Shell
    See this page for more details about Beeline:
    https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

    Launch Beeline:
    $ beeline
    Beeline version 3.1.2 by Apache Hive
    beeline>

    Use the command !exit to quit Beeline.

    Connect to Hive:
    beeline> !connect jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: hive
    Enter password for jdbc:hive2://localhost:10000: ****

    You might get the following error when connecting as the "hive" user (the same error can occur with other users):
    [main]: WARN jdbc.HiveConnection: Failed to connect to localhost:10000
    Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000:
    Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
    User: hive is not allowed to impersonate hive (state=08S01,code=0)

    To fix this error, edit the file "${HADOOP_HOME}/etc/hadoop/core-site.xml" and add the following properties:
    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>
    
    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>
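
    Restart Hadoop so that the proxyuser settings take effect (a minimal sketch, assuming the standard Hadoop sbin scripts are run as the "hadoop" user):
    $ su - hadoop

    $ ${HADOOP_HOME}/sbin/stop-yarn.sh && ${HADOOP_HOME}/sbin/stop-dfs.sh
    $ ${HADOOP_HOME}/sbin/start-dfs.sh && ${HADOOP_HOME}/sbin/start-yarn.sh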

    You should be able to connect to Hive after applying the changes and restarting Hadoop:
    beeline> !connect jdbc:hive2://localhost:10000
    Connecting to jdbc:hive2://localhost:10000
    Enter username for jdbc:hive2://localhost:10000: hive
    Enter password for jdbc:hive2://localhost:10000: ****
    Connected to: Apache Hive (version 3.1.2)
    Driver: Hive JDBC (version 3.1.2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://localhost:10000>

    To connect directly to a specific database:
    beeline> !connect jdbc:hive2://localhost:10000/test
    Connecting to jdbc:hive2://localhost:10000/test
    Enter username for jdbc:hive2://localhost:10000/test: hive
    Enter password for jdbc:hive2://localhost:10000/test: ****
    Connected to: Apache Hive (version 3.1.2)
    Driver: Hive JDBC (version 3.1.2)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://localhost:10000/test>

    Show Tables:
    0: jdbc:hive2://localhost:10000/test> show tables;
    +----------------------+
    |       tab_name       |
    +----------------------+
    | test_external_table  |
    | test_internal_table  |
    +----------------------+
    2 rows selected (0.073 seconds)
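
    Beeline can also run non-interactively, which is convenient for scripting (the connection URL and credentials are the ones used above):
    $ beeline -u "jdbc:hive2://localhost:10000/test" -n hive -p hive -e "show tables;"
    $ beeline -u "jdbc:hive2://localhost:10000/test" -n hive -p hive -f test.hql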