Big Data | Install and configure Apache Hadoop (single node cluster)
  1. References
  2. Create "hadoop" user
  3. Install Hadoop
  4. Switch to "hadoop" user
  5. Update "~/.profile" file
  6. Create "hadoop.tmp.dir" directory
  7. Configure "${HADOOP_HOME}/etc/hadoop/core-site.xml"
  8. Configure "${HADOOP_HOME}/etc/hadoop/hdfs-site.xml"
  9. Configure "${HADOOP_HOME}/etc/hadoop/mapred-site.xml"
  10. Configure "${HADOOP_HOME}/etc/hadoop/hadoop-env.sh"
  11. Format HDFS filesystem
  12. Start single-node Hadoop cluster
  13. Set permission for "/" node in hdfs
  14. Hadoop Ports/Web UIs
  15. Hadoop: status, log files
  16. Stop single-node Hadoop cluster
  17. Uninstall Hadoop
  18. Hadoop "start-all.sh" permission denied: "ssh localhost: Permission denied (publickey, password)"

  1. References
    See this page for more details about Apache Hadoop:
    https://hadoop.apache.org/docs/current/
  2. Create "hadoop" user
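    A minimal sketch on Ubuntu (the use of "adduser" is an assumption; the page does not show the exact command):
      # Create a dedicated "hadoop" user (prompts for a password and user details).
      sudo adduser hadoop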
  3. Install Hadoop
    Download Apache Hadoop: http://hadoop.apache.org/releases.html

    Extract the file "hadoop-3.3.0.tar.gz" into the folder where you want to install Hadoop, e.g. '/opt/hadoop-3.3.0'.

    Note: In the following sections, the environment variable ${HADOOP_HOME} will refer to this location '/opt/hadoop-3.3.0'.
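    A possible download-and-extract sequence, assuming version 3.3.0 and '/opt' as the target folder (the mirror URL is an example; use the link from the releases page above):
      # Download the Hadoop binary release.
      wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
      # Extract it under /opt, which creates /opt/hadoop-3.3.0.
      sudo tar -xzf hadoop-3.3.0.tar.gz -C /opt
      # Give the "hadoop" user ownership of the installation (assumes the user created above).
      sudo chown -R hadoop:hadoop /opt/hadoop-3.3.0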
  4. Switch to "hadoop" user
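    For example (assuming the "hadoop" user created above):
      su - hadoop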
  5. Update "~/.profile" file
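    A minimal sketch of the lines to add (the JAVA_HOME path is an example and depends on your JDK installation):
      export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
      export HADOOP_HOME=/opt/hadoop-3.3.0
      export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin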

    Load ".profile" environment variables:
    Print Hadoop version:
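    For example:
      hadoop version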
  6. Create "hadoop.tmp.dir" directory
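    A minimal sketch, assuming "hadoop.tmp.dir" will point to '/opt/hadoop-3.3.0/tmp' (this path is an example; it must match the value set in "core-site.xml" below):
      mkdir -p /opt/hadoop-3.3.0/tmp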
  7. Configure "${HADOOP_HOME}/etc/hadoop/core-site.xml"
    See this page for more detail:
    https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
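    A minimal single-node sketch (the port 9000 and the "hadoop.tmp.dir" path are common examples, not values taken from the page):
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/hadoop-3.3.0/tmp</value>
        </property>
      </configuration>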


  8. Configure "${HADOOP_HOME}/etc/hadoop/hdfs-site.xml"
    See this page for more detail:
    https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
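    A minimal single-node sketch (a replication factor of 1 is the usual choice when there is a single DataNode):
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
      </configuration>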


  9. Configure "${HADOOP_HOME}/etc/hadoop/mapred-site.xml"
    See this page for more detail:
    https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
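    A minimal sketch that runs MapReduce jobs on YARN (depending on the Hadoop version, additional classpath-related properties may be needed):
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>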


  10. Configure "${HADOOP_HOME}/etc/hadoop/hadoop-env.sh"
    Edit the file "hadoop-env.sh" and export the "JAVA_HOME" environment variable.
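    For example (the JDK path is an example and depends on your installation):
      export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64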
  11. Format HDFS filesystem
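    For example (run as the "hadoop" user; formatting erases any existing HDFS data):
      hdfs namenode -format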
  12. Start single-node Hadoop cluster
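    For example (assuming ${HADOOP_HOME}/sbin is on the PATH, as set in "~/.profile" above):
      # Starts the HDFS and YARN daemons (equivalent to start-dfs.sh followed by start-yarn.sh).
      start-all.sh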

    You might get this error "Permission denied (publickey, password)" when you start Hadoop.
    To fix this error, see section 18: Hadoop "start-all.sh" permission denied: "ssh localhost: Permission denied (publickey, password)".
  13. Set permission for "/" node in hdfs
    Check permission:
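    For example (the "-d" option lists the "/" directory itself rather than its contents):
      hdfs dfs -ls -d /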
    Set permission:
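    For example (the mode 755 is an example; adjust it to your needs):
      hdfs dfs -chmod 755 /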
  14. Hadoop Ports/Web UIs
    Ports Used by Hadoop:
    • HDFS Web UI: http://localhost:9870

    • Resource Manager: http://localhost:8088

    • Node Manager: http://localhost:8042

  15. Hadoop: status, log files
    Hadoop process info:
    • Java virtual machine process status tool: jps

    • Display process info: ps -fp <pid> | less

    • Display active TCP connections: sudo netstat -plten
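    A possible way to combine the commands above, assuming the cluster is running (the "NameNode" filter is just an example):
      # Show full process details for the NameNode daemon found by jps.
      ps -fp $(jps | awk '$2 == "NameNode" {print $1}') | less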

    Hadoop log files:
    • Hadoop log files can be found in "${HADOOP_HOME}/logs/"
    • Hadoop jetty web app in "/tmp/jetty*"
    • Hadoop pid files: "/tmp/hadoop-*.pid"


  16. Stop single-node Hadoop cluster
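    For example (assuming ${HADOOP_HOME}/sbin is on the PATH):
      # Stops the YARN and HDFS daemons.
      stop-all.sh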
  17. Uninstall Hadoop
    Make sure that Hadoop is not running (see above how to stop Hadoop).
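    A possible cleanup sketch, assuming the installation location used above (adapt the paths if yours differ):
      # Remove the Hadoop installation directory.
      sudo rm -rf /opt/hadoop-3.3.0
      # Remove temporary data, jetty files, and pid files left under /tmp.
      rm -rf /tmp/hadoop-* /tmp/jetty*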


    Note: You also need to delete the Hadoop environment variables from the "~/.profile" file.
  18. Hadoop "start-all.sh" permission denied: "ssh localhost: Permission denied (publickey, password)"
    To fix this issue, generate an SSH key:
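    A common sequence, run as the "hadoop" user (the RSA key type and the empty passphrase are typical choices, not requirements):
      # Generate an RSA key pair with an empty passphrase.
      ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
      # Authorize the key for password-less SSH to localhost.
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      chmod 600 ~/.ssh/authorized_keys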
    Optional: You may also need to edit the file "sshd_config" and update "PubkeyAuthentication" and "AllowUsers" variables.
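    A sketch of the relevant lines in "/etc/ssh/sshd_config" (the "AllowUsers" entry is an example; only set it if you restrict SSH logins):
      PubkeyAuthentication yes
      AllowUsers hadoop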
    Reload SSH configs:
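    For example (the service may be named "sshd" on some distributions):
      sudo systemctl restart ssh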
    To test SSH connection:
    To debug SSH connection:
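    For example:
      # Test the connection (it should log in without a password prompt).
      ssh localhost
      # Debug the connection with verbose output.
      ssh -v localhost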