Big Data
  1. Big Data ecosystem
  2. Apache Hadoop
    • Install and configure Apache Hadoop (single-node cluster) (3.3.0)
    • HDFS Commands
      • HDFS - DFS Commands
      • HDFS - DFSADMIN Commands
    • ORC/Parquet/Avro Tools
      • ORC Tools (1.5.4)
      • Parquet Tools (1.9.0)
      • Avro Tools (1.9.0)
  3. Apache Hive
    • Install and configure Apache Hive (HiveServer, Hive MetaStore) (3.1.2)
    • Manage Hive Databases
  4. Apache Spark
    • Install and configure Apache Spark (standalone) (3.0.0)
    • Access Hive Tables using Spark SQL
    • Spark Tools
      • Spark Interactive Shell (Scala): spark-shell
      • Spark Interactive Shell (Python): pyspark
      • Spark Interactive Shell (R): sparkR
      • Submitting Applications: spark-submit
      • Spark SQL CLI: spark-sql
    • Spark API: RDD, DataFrame, Dataset
  5. Apache ZooKeeper
  6. Apache Solr
  7. Install and configure Apache Kafka
  8. Install and configure MongoDB (4.2)
  9. Install and configure Apache Nutch (2.3.1)
  10. Python (3.11.4)
© 2025  mtitek