Install and configure Apache Spark (standalone)
  1. References
  2. Create "spark" user
  3. Install Spark
  4. Switch to "spark" user
  5. Update "~/.profile" file
  6. Start Spark
  7. Stop Spark
  8. Spark Ports/Web UIs
  9. Spark: status, log files
  10. Spark "start-all.sh" permission denied: "spark@localhost: Permission denied (publickey, password)"

  1. References
    See these pages for more details about Cluster Modes and Apache Spark Standalone Mode:
    https://spark.apache.org/docs/latest/spark-standalone.html
    https://spark.apache.org/docs/latest/cluster-overview.html

    See this page for more details about Apache Spark Configuration:
    https://spark.apache.org/docs/latest/configuration.html

    See these pages for more details about Apache Hadoop, Apache Hive:
    Install Apache Hadoop
    Install Apache Hive
  2. Create "spark" user
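    The commands for this step are missing from the page; a minimal sketch, assuming a dedicated system user named "spark" with its own home directory:

    ```shell
    # Create a dedicated "spark" user with a home directory and bash as login shell.
    sudo useradd -m -s /bin/bash spark

    # Set a password for the new user (prompts interactively).
    sudo passwd spark
    ```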
  3. Install Spark
    Download Apache Spark: https://spark.apache.org/downloads.html

    - Choose a Spark release: 3.0.0 (Jun 18 2020)
    - Choose a package type: Pre-built for Apache Hadoop 3.2 and later
    - Download Spark: spark-3.0.0-bin-hadoop3.2.tgz

    Extract the file "spark-3.0.0-bin-hadoop3.2.tgz" into the folder where you want to install Spark, e.g. '/opt/spark-3.0.0-bin-hadoop3.2'.

    Note: In the following sections, the environment variable ${SPARK_HOME} refers to this location: '/opt/spark-3.0.0-bin-hadoop3.2'.
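    The download and extraction step can be sketched as follows (the mirror URL is an assumption; pick a mirror from the downloads page above):

    ```shell
    cd /tmp
    # Download the chosen release (URL is an assumption; use the downloads page).
    wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz

    # Extract into /opt; this creates /opt/spark-3.0.0-bin-hadoop3.2.
    sudo tar -xzf spark-3.0.0-bin-hadoop3.2.tgz -C /opt

    # Give the "spark" user ownership of the installation.
    sudo chown -R spark:spark /opt/spark-3.0.0-bin-hadoop3.2
    ```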
  4. Switch to "spark" user
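    The command for this step is missing from the page; presumably it is simply:

    ```shell
    # Switch to the "spark" user ("-" loads its login environment).
    su - spark
    ```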
  5. Update "~/.profile" file

    Load ".profile" environment variables:
  6. Start Spark

    To Start the Spark Master and Workers separately:
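    The start commands themselves are missing from the page; presumably they are the standard standalone scripts (the master URL below assumes the default host and port):

    ```shell
    # Start master and workers together:
    ${SPARK_HOME}/sbin/start-all.sh

    # Or start them separately (in Spark 3.0 the worker script is named start-slave.sh;
    # newer releases rename it to start-worker.sh):
    ${SPARK_HOME}/sbin/start-master.sh
    ${SPARK_HOME}/sbin/start-slave.sh spark://localhost:7077
    ```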
    You might get this error "Permission denied (publickey, password)" when you start Spark.
    To fix this error, see Spark "start-all.sh" permission denied: "spark@localhost: Permission denied (publickey, password)".
  7. Stop Spark
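    The stop commands are missing from the page; presumably they mirror the start scripts:

    ```shell
    # Stop all standalone daemons (master and workers):
    ${SPARK_HOME}/sbin/stop-all.sh

    # Or stop them separately (in Spark 3.0 the worker script is named stop-slave.sh):
    ${SPARK_HOME}/sbin/stop-slave.sh
    ${SPARK_HOME}/sbin/stop-master.sh
    ```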
  8. Spark Ports/Web UIs
    Ports Used by Spark:
    ► Spark web UI: http://localhost:8080

    (screenshot: Spark web UI)

    ► Spark shell application UI (see below how to start the Spark shell; make sure the port is 4040): http://localhost:4040

    (screenshot: Spark shell application UI)
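    A sketch of starting the Spark shell against the standalone master (the master URL assumes the default host and port):

    ```shell
    # Start the Spark shell; its application UI is served on port 4040
    # (4041, 4042, ... if 4040 is already taken by another application).
    ${SPARK_HOME}/bin/spark-shell --master spark://localhost:7077
    ```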
  9. Spark: status, log files
    Spark process info:
    • Java virtual machine process status tool: jps

    • Display process info: ps -fp <pid> | less


    Spark log files:
    • Spark log files can be found in "${SPARK_HOME}/logs/"
    • Spark workers log files can be found in "${SPARK_HOME}/work/"
    • Spark pid files: "/tmp/spark-*.pid"
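    Putting the above together, a quick health check might look like this (the "Master"/"Worker" process names assume the standalone daemons are running):

    ```shell
    # List running JVM processes; a healthy standalone setup shows "Master" and "Worker".
    jps

    # Follow the most recent master/worker log output.
    tail -f ${SPARK_HOME}/logs/*.out
    ```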




  10. Spark "start-all.sh" permission denied: "spark@localhost: Permission denied (publickey, password)"
    To fix this issue you have to generate an SSH key:
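    The key-generation commands are missing from the page; a minimal sketch, run as the "spark" user (an RSA key with an empty passphrase is an assumption):

    ```shell
    # Make sure ~/.ssh exists with the right permissions.
    mkdir -p ~/.ssh && chmod 700 ~/.ssh

    # Generate a key pair with an empty passphrase.
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    # Authorize the key for local SSH logins.
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ```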
    Optional: You may also need to edit the file "sshd_config" and update "PubkeyAuthentication" and "AllowUsers" variables.
    Reload SSH configs.
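    The sshd settings in question are likely along these lines (the AllowUsers value is an assumption; edit "/etc/ssh/sshd_config" as root):

    ```shell
    # In /etc/ssh/sshd_config, make sure these lines are present:
    #   PubkeyAuthentication yes
    #   AllowUsers spark
    # Then reload the SSH daemon (the service name on Ubuntu is "ssh"):
    sudo systemctl reload ssh
    ```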
    To test SSH connection:
    To debug SSH connection:
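    The test and debug commands are missing from the page; presumably:

    ```shell
    # Test the SSH connection (should log in without a password prompt):
    ssh spark@localhost

    # Debug the SSH connection (verbose output; repeat -v for more detail):
    ssh -v spark@localhost
    ```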
© 2025  mtitek