Big Data | ORC Tools
  1. References
  2. Usage
  3. Command: meta
  4. Command: data
  5. Command: scan
  6. Command: convert
  7. Command: json-schema

  1. References
    ORC is a columnar (column-oriented) storage format for Hadoop.

    Columnar layout (all values of a column are stored together; r = row, c = column):
      r1-c1, r2-c1, r3-c1, ...
      r1-c2, r2-c2, r3-c2, ...
      r1-c3, r2-c3, r3-c3, ...

    Row-based layout (all values of a row are stored together):
      r1-c1, r1-c2, r1-c3, ...
      r2-c1, r2-c2, r2-c3, ...
      r3-c1, r3-c2, r3-c3, ...
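
    To make the difference between the two layouts concrete, the following shell sketch enumerates the cells of a hypothetical 3x3 table in both orders. The function names row_layout and column_layout and the table size are illustrative assumptions, and a label such as r1-c1 stands for row 1, column 1. It only shows the ordering of values; a real ORC file also adds stripes, indexes, and compression.

```shell
# Hypothetical table size used only for illustration.
rows=3
cols=3

# Row-based layout: emit every column of row 1, then every column of row 2, ...
row_layout() {
  out=""
  r=1
  while [ "$r" -le "$rows" ]; do
    c=1
    while [ "$c" -le "$cols" ]; do
      out="$out r${r}-c${c}"
      c=$((c + 1))
    done
    r=$((r + 1))
  done
  echo "${out# }"   # strip the leading space
}

# Columnar layout: emit every row of column 1, then every row of column 2, ...
column_layout() {
  out=""
  c=1
  while [ "$c" -le "$cols" ]; do
    r=1
    while [ "$r" -le "$rows" ]; do
      out="$out r${r}-c${c}"
      r=$((r + 1))
    done
    c=$((c + 1))
  done
  echo "${out# }"   # strip the leading space
}

row_layout      # prints: r1-c1 r1-c2 r1-c3 r2-c1 r2-c2 r2-c3 r3-c1 r3-c2 r3-c3
column_layout   # prints: r1-c1 r2-c1 r3-c1 r1-c2 r2-c2 r3-c2 r1-c3 r2-c3 r3-c3
```

    In the columnar order, a query that reads only column c2 touches one contiguous run of values, which is the main reason columnar formats suit analytical workloads.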

    See these pages for more details about ORC Tools:
    https://orc.apache.org/docs/java-tools.html
    http://repo1.maven.org/maven2/org/apache/orc/orc-tools/1.5.4/
  2. Usage
    - Usage (hadoop): hadoop jar orc-tools-*-uber.jar [--help] [--define X=Y] <command> <args>

    - Usage (local): java -jar orc-tools-*-uber.jar [--help] [--define X=Y] <command> <args>

    Commands:
           meta  print the metadata about the ORC file.
           data  print the data from the ORC file.
           scan  scan the ORC file.
        convert  convert CSV and JSON files to ORC.
    json-schema  scan JSON files to determine their schema.
            key  print information about the keys.

    orc-tools-*-uber.jar prints its help when invoked without parameters or with the "--help" parameter:
    hadoop jar orc-tools-*-uber.jar --help

    To print the help of a specific command, use the following syntax:
    hadoop jar orc-tools-*-uber.jar COMMAND --help
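
    Because the two usage forms differ only in the launcher, a small shell function can hide the choice. This is a minimal sketch, not part of ORC Tools: the function name orc_tools and the variables ORC_JAR and ORC_MODE are assumptions introduced here, and the default jar name simply matches the version used in the examples on this page.

```shell
# Assumed defaults; override them in the environment if your setup differs.
ORC_JAR="${ORC_JAR:-orc-tools-1.5.4-uber.jar}"
ORC_MODE="${ORC_MODE:-hadoop}"   # "hadoop" or "local"

# orc_tools <command> <args>
# Runs the uber jar through "hadoop jar" or "java -jar" depending on ORC_MODE.
orc_tools() {
  case "$ORC_MODE" in
    hadoop) hadoop jar "$ORC_JAR" "$@" ;;
    local)  java -jar "$ORC_JAR" "$@" ;;
    *)
      echo "unknown ORC_MODE: $ORC_MODE (expected hadoop or local)" >&2
      return 2
      ;;
  esac
}
```

    With the function defined, "orc_tools meta hdfs://localhost:8020/test1.orc" runs through Hadoop, and setting ORC_MODE=local first runs the same command with a plain JVM.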
  3. Command: meta
    Print the metadata about the ORC file.

    • Usage:
      $ hadoop jar orc-tools-1.5.4-uber.jar meta --help
      usage: meta <input>
      
      where <input> is the ORC file whose metadata is printed to standard output.

    • Example:
      $ hadoop jar orc-tools-1.5.4-uber.jar meta hdfs://localhost:8020/test1.orc
  4. Command: data
    Print the data from the ORC file.

    • Usage:
      $ hadoop jar orc-tools-1.5.4-uber.jar data --help
      usage: data <input>
      
      where <input> is the ORC file whose data is printed to standard output.

    • Example:
      $ hadoop jar orc-tools-1.5.4-uber.jar data hdfs://localhost:8020/test1.orc
  5. Command: scan
    Scan the ORC file.

    • Usage:
      $ hadoop jar orc-tools-1.5.4-uber.jar scan --help
      usage: scan <input>
      
      where <input> is the ORC file to scan; the scan information is printed to standard output.

    • Example:
      $ hadoop jar orc-tools-1.5.4-uber.jar scan hdfs://localhost:8020/test1.orc
  6. Command: convert
    Convert CSV and JSON files to ORC.

    • Usage:
      $ hadoop jar orc-tools-1.5.4-uber.jar convert --help
      usage: convert <option> <input>
      
      where <input> is the CSV or JSON file to convert to an ORC file.

      Options:
      -e,--escape <arg>           |  CSV escape character.
      -h,--help                   |  Provide help.
      -H,--header <arg>           |  CSV header lines.
      -n,--null <arg>             |  CSV null string.
      -o,--output <arg>           |  Output filename.
      -q,--quote <arg>            |  CSV quote character.
      -s,--schema <arg>           |  The schema to write in to the file.
      -S,--separator <arg>        |  CSV separator character.
      -t,--timestampformat <arg>  |  Timestamp Format.

    • Example:
      $ hadoop jar orc-tools-1.5.4-uber.jar convert hdfs://localhost:8020/test1.json -o hdfs://localhost:8020/test1.orc
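
    The example above converts a JSON file. For a CSV file, the CSV options listed above usually need to be combined with an explicit schema. The following is a sketch under stated assumptions: the function name orc_convert_csv, the ";" separator, and the single header line are hypothetical choices, and it assumes that -H takes the number of header lines to skip and that -s accepts an ORC type description such as struct<id:int,name:string>.

```shell
# Assumed jar name; it matches the version used in the examples on this page.
ORC_JAR="${ORC_JAR:-orc-tools-1.5.4-uber.jar}"

# orc_convert_csv <input.csv> <output.orc> <schema>
# Converts a ";"-separated CSV file with one header line to ORC.
orc_convert_csv() {
  if [ "$#" -ne 3 ]; then
    echo "usage: orc_convert_csv <input.csv> <output.orc> <schema>" >&2
    return 2
  fi
  # -H: header lines, -S: separator, -s: schema, -o: output file
  hadoop jar "$ORC_JAR" convert \
    -H 1 \
    -S ";" \
    -s "$3" \
    -o "$2" \
    "$1"
}
```

    A call could look like: orc_convert_csv hdfs://localhost:8020/test1.csv hdfs://localhost:8020/test1.orc "struct<id:int,name:string>". The schema is quoted so that the shell does not treat "<" and ">" as redirections.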
  7. Command: json-schema
    Scan JSON files to determine their schema.

    • Usage:
      $ hadoop jar orc-tools-1.5.4-uber.jar json-schema --help
      usage: json-schema <option> <input>
      
      where <input> is the JSON file to scan; its inferred schema is printed to standard output.

      Options:
      -f,--flat    |  Print types as flat list of types.
      -h,--help    |  Provide help.
      -p,--pretty  |  Pretty print the schema.
      -t,--table   |  Print types as Hive table declaration.

    • Example:
      $ hadoop jar orc-tools-1.5.4-uber.jar json-schema hdfs://localhost:8020/test1.json
© 2025  mtitek