Big Data | ORC Tools
  1. References
  2. Usage
  3. Command: meta
  4. Command: data
  5. Command: scan
  6. Command: convert
  7. Command: json-schema

  1. References
    ORC (Optimized Row Columnar) is a columnar (column-oriented) storage format for Hadoop.

    Columnar Layout:
      r{1}-c{1} , r{2}-c{1} , r{3}-c{1} , ...
      r{1}-c{2} , r{2}-c{2} , r{3}-c{2} , ...
      r{1}-c{3} , r{2}-c{3} , r{3}-c{3} , ...

    Row-based Layout:
      r{1}-c{1} , r{1}-c{2} , r{1}-c{3} , ...
      r{2}-c{1} , r{2}-c{2} , r{2}-c{3} , ...
      r{3}-c{1} , r{3}-c{2} , r{3}-c{3} , ...
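The difference between the two layouts is only the order in which cells are written out. The following is a minimal sketch of that ordering for a 3x3 table; it is an illustration, not how ORC physically encodes its files, and the names `table`, `row_based`, and `columnar` are made up for this example.

```python
# Build a 3x3 table whose cells are labelled by row and column number.
table = [[f"r{r}-c{c}" for c in range(1, 4)] for r in range(1, 4)]

# Row-based layout: write every cell of row 1, then row 2, then row 3.
row_based = [cell for row in table for cell in row]

# Columnar layout: write every cell of column 1, then column 2, then column 3.
# zip(*table) transposes the table so that each tuple is one column.
columnar = [cell for column in zip(*table) for cell in column]

print("Row-based:", ", ".join(row_based))
print("Columnar: ", ", ".join(columnar))
```

In the columnar ordering all values of one column sit next to each other, which is why a reader that needs only one column can skip the rest.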

    See these pages for more details about ORC Tools:
    https://orc.apache.org/docs/java-tools.html
    http://repo1.maven.org/maven2/org/apache/orc/orc-tools/1.5.4/
  2. Usage
    - Usage (hadoop): hadoop jar orc-tools-*-uber.jar [--help] [--define X=Y] <command> <args>

    - Usage (local): java -jar orc-tools-*-uber.jar [--help] [--define X=Y] <command> <args>

    Commands covered on this page: meta, data, scan, convert, json-schema.

    orc-tools-*-uber.jar prints its help when invoked without parameters or with the "--help" parameter:
    hadoop jar orc-tools-*-uber.jar --help

    To print the help for a specific command, use the following syntax:
    hadoop jar orc-tools-*-uber.jar COMMAND --help
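When the tool is driven from a script, the two usage forms above can be assembled programmatically. The sketch below only builds and prints the command line; it does not run anything. The helper name `orc_tools_argv`, the default jar file name (taken from the 1.5.4 version in the Maven link), and the file name `example.orc` are assumptions for illustration.

```python
import shlex

def orc_tools_argv(command, args=(), jar="orc-tools-1.5.4-uber.jar",
                   use_hadoop=False, defines=None):
    """Build the argument list for one orc-tools invocation (nothing is executed)."""
    # Usage (hadoop): hadoop jar <jar> ...   Usage (local): java -jar <jar> ...
    argv = ["hadoop", "jar", jar] if use_hadoop else ["java", "-jar", jar]
    # Optional --define X=Y pairs go before the command name.
    for key, value in (defines or {}).items():
        argv += ["--define", f"{key}={value}"]
    argv.append(command)
    argv += list(args)
    return argv

# Help for a specific command, local form.
print(shlex.join(orc_tools_argv("meta", ["--help"])))
# A command applied to a hypothetical file, hadoop form.
print(shlex.join(orc_tools_argv("data", ["example.orc"], use_hadoop=True)))
```

The returned list can be passed to `subprocess.run` unchanged once the jar is actually available on the machine.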
  3. Command: meta
    Print the metadata about the ORC file.

    • Usage:

    • Example:
  4. Command: data
    Print the data from the ORC file.

    • Usage:

    • Example:
  5. Command: scan
    Scan the ORC file.

    • Usage:

    • Example:
  6. Command: convert
    Convert CSV and JSON files to ORC.

    • Usage:

      Options:

    • Example:
  7. Command: json-schema
    Scan JSON files to determine their schema.
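To give an idea of what "determining a schema" from JSON means, here is a simplified sketch that reads one JSON object per line and derives an ORC-style `struct<...>` type string. It is not the algorithm used by the json-schema command, which may choose narrower or different types; the function names and the type-widening rules below are assumptions made for this illustration.

```python
import json

def orc_type(value):
    """Map one JSON value to a coarse ORC type name (simplified)."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"               # strings and anything else in this sketch

def infer_schema(lines):
    """Infer a flat struct type from an iterable of JSON-object lines."""
    fields = {}                   # field name -> type, in first-seen order
    for line in lines:
        for name, value in json.loads(line).items():
            if value is None:     # nulls carry no type information
                continue
            seen, new = fields.get(name), orc_type(value)
            if seen is None or seen == new:
                fields[name] = new
            elif {seen, new} == {"bigint", "double"}:
                fields[name] = "double"   # widen integers to floating point
            else:
                fields[name] = "string"   # fall back on conflicting types
    return "struct<" + ",".join(f"{n}:{t}" for n, t in fields.items()) + ">"

sample = ['{"id": 1, "name": "a", "score": 2}',
          '{"id": 2, "name": "b", "score": 2.5}']
print(infer_schema(sample))
```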

    • Usage:

      Options:

    • Example:
© 2025  mtitek