Access Hive Tables using Spark SQL
  1. Notes
  2. Spark SQL - Sample application
    1. pom.xml file
    2. Java code
    3. Execution output
  3. Errors

  1. Notes
    Please make sure that Hadoop and the Hive Metastore are running properly.

    See these pages for more details about Apache Hadoop and Apache Hive:
    Install and configure Apache Hadoop (single node cluster)
    Install and configure Apache Hive

    You can find below the "pom.xml" file that lists the required dependencies to execute the Spark SQL sample application.

    The Java code shows how to create the Spark session:
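    A minimal sketch, assuming a local master and default Hive configuration; the application name is a placeholder:

      import org.apache.spark.sql.SparkSession;

      // Create a Spark session with Hive support enabled,
      // so that Spark SQL can read from the Hive Metastore.
      SparkSession spark = SparkSession.builder()
          .appName("spark-sql-hive-sample")
          .master("local[*]")
          .enableHiveSupport()
          .getOrCreate();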
    How to list existing Hive databases:
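    For example, using the session created above:

      // Print the Hive databases visible to the session.
      spark.sql("SHOW DATABASES").show();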
    How to list existing tables of a specific Hive database:
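    For example, assuming a database named "test_db" (the name also used in the Errors section below):

      // Print the tables of the "test_db" database.
      spark.sql("SHOW TABLES IN test_db").show();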
    How to insert new data into a table:
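    A sketch, assuming a table "test_db.test_table_1"; the columns and the sample row are assumptions:

      // Append one row to the Hive table.
      spark.sql("INSERT INTO test_db.test_table_1 VALUES (1, 'value1')");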
    How to print data of a table:
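    For example:

      // Print the content of the Hive table.
      spark.sql("SELECT * FROM test_db.test_table_1").show();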
    The code below uses the Spark local master URL.
    You can also use the Spark standalone cluster master URL, as sketched here:
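    A sketch of the two master URL values passed to the session builder shown above; the standalone host is a placeholder, and 7077 is the default port of a standalone master:

      // Local master URL: run Spark in-process, using all available cores.
      .master("local[*]")

      // Spark standalone cluster master URL.
      .master("spark://<host>:7077")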
    Please make sure that Spark is running properly.

    See this page for more details about Apache Spark:
    Install and configure Apache Spark (standalone)

    See this page for more details about Spark master URLs:
    https://spark.apache.org/docs/latest/submitting-applications.html#master-urls
  2. Spark SQL - Sample application
    1. pom.xml file
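      A minimal sketch of the "pom.xml" file, assuming Spark 3.x built for Scala 2.12; the group/artifact coordinates and the Spark version are assumptions and should match your installation:

        <project xmlns="http://maven.apache.org/POM/4.0.0"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
          <modelVersion>4.0.0</modelVersion>

          <groupId>com.example</groupId>
          <artifactId>spark-sql-hive-sample</artifactId>
          <version>1.0-SNAPSHOT</version>

          <properties>
            <maven.compiler.source>11</maven.compiler.source>
            <maven.compiler.target>11</maven.compiler.target>
          </properties>

          <dependencies>
            <!-- Spark SQL (DataFrame / SQL API) -->
            <dependency>
              <groupId>org.apache.spark</groupId>
              <artifactId>spark-sql_2.12</artifactId>
              <version>3.5.1</version>
            </dependency>
            <!-- Hive integration for Spark SQL (required by enableHiveSupport) -->
            <dependency>
              <groupId>org.apache.spark</groupId>
              <artifactId>spark-hive_2.12</artifactId>
              <version>3.5.1</version>
            </dependency>
          </dependencies>
        </project>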
    2. Java code
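      A minimal sketch assembling the snippets from the Notes section; the class name, the "test_db"/"test_table_1" schema, and the sample row are assumptions:

        import org.apache.spark.sql.SparkSession;

        public class SparkSqlHiveSample {
            public static void main(String[] args) {
                // Create a Spark session with Hive support enabled.
                SparkSession spark = SparkSession.builder()
                    .appName("spark-sql-hive-sample")
                    .master("local[*]")
                    .enableHiveSupport()
                    .getOrCreate();

                // List the existing Hive databases.
                spark.sql("SHOW DATABASES").show();

                // Create the sample database and table if they do not exist yet.
                spark.sql("CREATE DATABASE IF NOT EXISTS test_db");
                spark.sql("CREATE TABLE IF NOT EXISTS test_db.test_table_1 (id INT, name STRING)");

                // List the tables of the "test_db" database.
                spark.sql("SHOW TABLES IN test_db").show();

                // Insert new data into the table.
                spark.sql("INSERT INTO test_db.test_table_1 VALUES (1, 'value1')");

                // Print the data of the table.
                spark.sql("SELECT * FROM test_db.test_table_1").show();

                spark.stop();
            }
        }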
    3. Execution output
  3. Errors
    You might get these errors when running the code above:

    • You might get an error when running the code above, typically a "RuntimeException" stating that the root scratch dir "/tmp/hive" on HDFS should be writable:
      By default, the configuration "hive.exec.scratchdir" is set to "/tmp/hive".

      In some cases, the folder "/tmp/hive" may be owned by another user whose processes run on the same host where you are running the Spark SQL application.

      To fix the issue, either assign write permission on the folder to the group or to all users ("sudo chmod -R 777 /tmp/hive/"),
      or set the configuration "hive.exec.scratchdir" to another directory that the user running the Spark SQL application can write to:
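      A minimal sketch, setting the value on the Spark session builder; the directory path is a placeholder and should point to a folder owned by your user:

        // Point the Hive scratch directory at a folder this user can write to.
        SparkSession spark = SparkSession.builder()
            .appName("spark-sql-hive-sample")
            .master("local[*]")
            .config("hive.exec.scratchdir", "/tmp/hive-myuser") // placeholder path
            .enableHiveSupport()
            .getOrCreate();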
    • You might get an error when inserting data into the Hive table, typically an "AccessControlException" ("Permission denied") on the table's HDFS location:
      One solution is to run the Spark SQL application as a user that has permission to write to the HDFS location "/test_db/test_table_1".
      Or you can assign write permission on the HDFS location to the group or to all users ("hdfs dfs -chmod -R 777 /test_db/test_table_1").
      You might also consider using ACLs instead:
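      A sketch using HDFS extended ACLs; "myuser" is a placeholder, and ACLs must be enabled on the NameNode ("dfs.namenode.acls.enabled" set to "true"):

        # Grant the user running the Spark SQL application write access.
        hdfs dfs -setfacl -R -m user:myuser:rwx /test_db/test_table_1

        # Verify the resulting ACL entries.
        hdfs dfs -getfacl /test_db/test_table_1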