Big Data | Parquet Tools
  1. References
  2. Quick notes about Parquet
  3. Usage
  4. Command: cat
  5. Command: head
  6. Command: schema
  7. Command: meta
  8. Command: dump
  9. Command: merge

  1. References
    See these pages for more details about Parquet Tools:
    https://github.com/apache/parquet-mr/tree/master/parquet-tools
    https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools
  2. Quick notes about Parquet
    Parquet is a columnar (column-oriented) storage format for Hadoop.

    Columnar Layout:
      r{1}-c{1} , r{2}-c{1} , r{3}-c{1} , ...
      r{1}-c{2} , r{2}-c{2} , r{3}-c{2} , ...
      r{1}-c{3} , r{2}-c{3} , r{3}-c{3} , ...

    Row-based Layout:
      r{1}-c{1} , r{1}-c{2} , r{1}-c{3} , ...
      r{2}-c{1} , r{2}-c{2} , r{2}-c{3} , ...
      r{3}-c{1} , r{3}-c{2} , r{3}-c{3} , ...
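
    The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the storage order, not of how Parquet actually encodes data; the table contents and the function names (row_based_layout, columnar_layout) are made up for the example.

```python
# A 3x3 table: each inner list is one row, each position in it is one column.
rows = [
    ["r1-c1", "r1-c2", "r1-c3"],
    ["r2-c1", "r2-c2", "r2-c3"],
    ["r3-c1", "r3-c2", "r3-c3"],
]


def row_based_layout(table):
    """Emit values row after row (all columns of row 1, then row 2, ...)."""
    return [value for row in table for value in row]


def columnar_layout(table):
    """Emit values column after column (all rows of column 1, then column 2, ...)."""
    return [value for column in zip(*table) for value in column]


print(row_based_layout(rows))
print(columnar_layout(rows))
```

    Both functions return the same nine values; only the order differs. In the columnar order, the values of one column sit next to each other, which is what lets a reader scan a single column without touching the others.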

    Key concepts
    - Block Size
    - Row Group
    - Page

    File:
    - Magic Number
    - Row group 0
     - Column 1
       - Page 0
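
    As a rough illustration of the file structure, the sketch below checks for the magic number. It assumes the standard unencrypted Parquet layout, where the file starts and ends with the 4-byte marker PAR1 and the 4 bytes before the trailing marker hold the footer length as a little-endian integer. The function names and the sample bytes are hypothetical; the sample is a stand-in, not a real Parquet file.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker expected at the start and at the end of the file


def has_parquet_magic(data: bytes) -> bool:
    """Return True if the bytes start and end with the Parquet magic number."""
    # 12 bytes is the smallest size that can hold: magic + footer length + magic.
    return len(data) >= 12 and data[:4] == MAGIC and data[-4:] == MAGIC


def footer_length(data: bytes) -> int:
    """Read the 4-byte little-endian footer length stored before the trailing magic."""
    if not has_parquet_magic(data):
        raise ValueError("missing Parquet magic number")
    (length,) = struct.unpack("<I", data[-8:-4])
    return length


# Hypothetical stand-in for file contents: magic, body, footer, footer length, magic.
fake_footer = b"\x00" * 10
sample = MAGIC + b"row-group-bytes" + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC

print(has_parquet_magic(sample))
print(footer_length(sample))
```

    For a real file, the same checks could be applied to bytes read with open(path, "rb"); for large files it would be better to seek to the first and last few bytes rather than read everything.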
  3. Usage
    - Usage (hadoop): hadoop jar parquet-tools-*.jar COMMAND [GENERIC-OPTIONS] [COMMAND-OPTIONS] PARQUET-FILE-PATH

    - Usage (local): java -jar parquet-tools-*.jar COMMAND [GENERIC-OPTIONS] [COMMAND-OPTIONS] PARQUET-FILE-PATH

    Commands:

    Generic options:

    parquet-tools-*.jar prints help when invoked without parameters or with the "-h" or "--help" parameter:
    hadoop jar parquet-tools-*.jar --help

    To print the help of a specific command, use the following syntax:
    hadoop jar parquet-tools-*.jar COMMAND --help
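
    When the tool is driven from a script, the usage pattern above can be assembled as an argument list. The sketch below is one possible way to do that; build_command, the jar name parquet-tools.jar, and the path /tmp/example.parquet are placeholders invented for the example, and the function only builds the command without running it.

```python
import shlex


def build_command(command, parquet_path=None, options=(), jar="parquet-tools.jar", use_hadoop=True):
    """Assemble: launcher, jar, COMMAND, options, then the optional PARQUET-FILE-PATH."""
    # "hadoop jar" runs on a cluster, "java -jar" runs locally.
    launcher = ["hadoop", "jar"] if use_hadoop else ["java", "-jar"]
    args = launcher + [jar, command, *options]
    if parquet_path is not None:
        args.append(parquet_path)
    return args


# Local invocation of the schema command on a placeholder file.
print(shlex.join(build_command("schema", "/tmp/example.parquet", use_hadoop=False)))

# Help for a specific command; no file path is needed.
print(shlex.join(build_command("head", options=["--help"])))
```

    The returned list can be passed to subprocess.run as is. Note that a wildcard such as parquet-tools-*.jar is expanded by the shell, so a script that does not go through a shell needs the actual jar file name.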
  4. Command: cat
    Prints out the content of a given Parquet file.

    • Usage:

      Command options:

    • Example:
  5. Command: head
    Prints out the first n records of a given Parquet file (default: n = 5).

    • Usage:

      Command options:

    • Example:

  6. Command: schema
    Prints out the schema of a given Parquet file.

    • Usage:

      Command options:

    • Example:
  7. Command: meta
    Prints out the metadata of a given Parquet file.

    • Usage:

    • Example:
  8. Command: dump
    Prints out the row groups and metadata of a given Parquet file.

    • Usage:

      Command options:

    • Example:
  9. Command: merge
    Merges multiple Parquet files into one Parquet file.

    • Usage:

    • Example:
© 2025  mtitek