Big Data | Avro Tools
  1. References
  2. Usage
  3. Command: getschema
  4. Command: getmeta
  5. Command: tojson
  6. Command: fromjson

  1. References
    Avro is a row-based data serialization and storage format for Hadoop: each record is stored with all of its fields together, unlike columnar formats, which group all values of a field together.

    Row-based Layout:
      r{1}-c{1} , r{1}-c{2} , r{1}-c{3} , ...
      r{2}-c{1} , r{2}-c{2} , r{2}-c{3} , ...
      r{3}-c{1} , r{3}-c{2} , r{3}-c{3} , ...

    Columnar Layout:
      r{1}-c{1} , r{2}-c{1} , r{3}-c{1} , ...
      r{1}-c{2} , r{2}-c{2} , r{3}-c{2} , ...
      r{1}-c{3} , r{2}-c{3} , r{3}-c{3} , ...
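
    As a quick illustration (not Avro code; the cell values are just labels), the same 3x3 grid of record fields can be serialized in either order:

      // Illustrative only (not Avro code): write the same records row by row vs. column by column.
      public class LayoutDemo {
          public static void main(String[] args) {
              String[][] rows = {
                  { "r1-c1", "r1-c2", "r1-c3" },
                  { "r2-c1", "r2-c2", "r2-c3" },
                  { "r3-c1", "r3-c2", "r3-c3" }
              };

              StringBuilder rowBased = new StringBuilder();   // all fields of a record together
              for (String[] row : rows) {
                  for (String cell : row) {
                      rowBased.append(cell).append(' ');
                  }
              }

              StringBuilder columnar = new StringBuilder();   // all values of a field together
              for (int c = 0; c < rows[0].length; c++) {
                  for (String[] row : rows) {
                      columnar.append(row[c]).append(' ');
                  }
              }

              System.out.println("row-based: " + rowBased);
              System.out.println("columnar:  " + columnar);
          }
      }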

    See these pages for more details about Avro Tools:
    http://avro.apache.org/docs/current/gettingstartedjava.html
    https://mvnrepository.com/artifact/org.apache.avro/avro-tools
    http://apache.forsale.plus/avro/
  2. Usage
    - Usage (hadoop): hadoop jar avro-tools-*.jar <command> <args>

    - Usage (local): java -jar avro-tools-*.jar <command> <args>

    Commands:
        getschema  Prints out schema of an Avro data file.
          getmeta  Prints out the metadata of an Avro data file.
           tojson  Dumps an Avro data file as JSON, record per line or pretty.
         fromjson  Reads JSON records and writes an Avro data file.
        canonical  Converts an Avro Schema to its canonical form.
              cat  Extracts samples from files.
          compile  Generates Java code for the given schema.
           concat  Concatenates avro files without re-compressing.
      fingerprint  Returns the fingerprint for the schemas.
       fragtojson  Renders a binary-encoded Avro datum as JSON.
         fromtext  Imports a text file into an avro data file.
              idl  Generates a JSON schema from an Avro IDL file.
     idl2schemata  Extract JSON schemata of the types from an Avro IDL file.
           induce  Induce schema/protocol from Java class/interface via reflection.
       jsontofrag  Renders a JSON-encoded Avro datum as binary.
           random  Creates a file with randomly generated instances of a schema.
          recodec  Alters the codec of a data file.
           repair  Recovers data from a corrupt Avro Data file.
      rpcprotocol  Output the protocol of a RPC service.
       rpcreceive  Opens an RPC Server and listens for one message.
          rpcsend  Sends a single RPC message.
           tether  Run a tethered mapreduce job.
           totext  Converts an Avro data file to a text file.
         totrevni  Converts an Avro data file to a Trevni file.
      trevni_meta  Dumps a Trevni file’s metadata as JSON.
    trevni_random  Create a Trevni file filled with random instances of a schema.
    trevni_tojson  Dumps a Trevni file as JSON.
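
    The command examples in the following sections operate on a file named test1.avro whose schema is shown under getschema. As a point of reference, here is a minimal sketch, using the Avro Java API, of how such a file could be produced; the local file name, the record values, and the snappy codec are assumptions chosen to match those examples:

      import java.io.File;
      import org.apache.avro.Schema;
      import org.apache.avro.file.CodecFactory;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;

      public class WriteTest1 {
          public static void main(String[] args) throws Exception {
              // Schema matching the one printed by 'getschema' below (doc attributes omitted).
              String schemaJson = "{\"type\":\"record\",\"name\":\"test1\",\"namespace\":\"com.mtitek\","
                  + "\"fields\":[{\"name\":\"field1\",\"type\":\"long\"},"
                  + "{\"name\":\"field2\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
              Schema schema = new Schema.Parser().parse(schemaJson);

              GenericRecord record = new GenericData.Record(schema);
              record.put("field1", 123L);
              record.put("field2", "abc");

              // Write a snappy-compressed Avro data file (requires snappy-java on the classpath).
              try (DataFileWriter<GenericRecord> writer =
                       new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                  writer.setCodec(CodecFactory.snappyCodec());
                  writer.create(schema, new File("test1.avro"));
                  writer.append(record);
              }
          }
      }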
  3. Command: getschema
    • Usage:
      $ hadoop jar avro-tools-1.9.0.jar getschema
      usage: getschema <input>
      
      where <input> is the avro file to print its schema to standard output.

    • Example:
      $ hadoop jar avro-tools-1.9.0.jar getschema hdfs://localhost:8020/test1.avro
      {
        "type" : "record",
        "name" : "test1",
        "namespace" : "com.mtitek",
        "doc" : "",
        "fields" : [ {
          "name" : "field1",
          "type" : "long",
          "doc" : ""
        }, {
          "name" : "field2",
          "type" : [ "null", "string" ],
          "doc" : "",
          "default" : null
        } ]
      }
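
    • Programmatic equivalent (a minimal sketch; the local file name is an assumption). The same writer schema embedded in the file header can be read with the Avro Java API:

      import java.io.File;
      import org.apache.avro.file.DataFileReader;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericRecord;

      public class PrintSchema {
          public static void main(String[] args) throws Exception {
              // Open the data file and print the schema embedded in its header.
              try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                      new File("test1.avro"), new GenericDatumReader<GenericRecord>())) {
                  System.out.println(reader.getSchema().toString(true)); // true = pretty-printed JSON
              }
          }
      }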
  4. Command: getmeta
    • Usage:
      $ hadoop jar avro-tools-1.9.0.jar getmeta
      usage: getmeta <input> --key [String]
      
      where <input> is the avro file to print its metadata to standard output.

      Options:
      --key [String]  |  Metadata key.

    • Example:
      $ hadoop jar avro-tools-1.9.0.jar getmeta hdfs://localhost:8020/test1.avro
      avro.schema {"type":"record","name":"test1","namespace":"com.mtitek","doc":"","fields":[{"name":"field1","type":"long","doc":""},{"name":"field2","type":["null","string"],"doc":"","default":null}]}
      avro.codec snappy

      $ hadoop jar avro-tools-1.9.0.jar getmeta hdfs://localhost:8020/test1.avro --key "avro.schema"
      {"type":"record","name":"test1","namespace":"com.mtitek","doc":"","fields":[{"name":"field1","type":"long","doc":""},{"name":"field2","type":["null","string"],"doc":"","default":null}]}

      $ hadoop jar avro-tools-1.9.0.jar getmeta hdfs://localhost:8020/test1.avro --key "avro.codec"
      snappy
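
    • Programmatic equivalent (a minimal sketch; the local file name is an assumption):

      import java.io.File;
      import org.apache.avro.file.DataFileReader;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericRecord;

      public class PrintMeta {
          public static void main(String[] args) throws Exception {
              try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                      new File("test1.avro"), new GenericDatumReader<GenericRecord>())) {
                  // "avro.schema" and "avro.codec" are metadata keys written by Avro itself.
                  System.out.println("avro.schema " + reader.getMetaString("avro.schema"));
                  System.out.println("avro.codec " + reader.getMetaString("avro.codec"));
              }
          }
      }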
  5. Command: tojson
    • Usage:
      $ hadoop jar avro-tools-1.9.0.jar tojson
      usage: tojson [--pretty] [--head[=X]] <input>
      
        Dumps an Avro data file as JSON, record per line or pretty.
      
      where:
      <input> is the avro file to convert to json file.
      A dash ('-') can be given as an input file to use stdin

      Options:
      --head [String]  |  Converts the first X records (default is 10).
      --pretty         |  Turns on pretty printing.

    • Example:
      $ hadoop jar avro-tools-1.9.0.jar tojson hdfs://localhost:8020/test1.avro
      {"field1":{"long":123},"field2":{"string":"abc"}}
  6. Command: fromjson
    • Usage:
      $ hadoop jar avro-tools-1.9.0.jar fromjson
      usage: fromjson [OPTIONS] <input>
      
      where <input> is the json file to convert to avro file.

      Options:
      --schema [String]       |  Schema.
      --schema-file [String]  |  Schema File.
      --codec <String>        |  Compression codec (default: null).
      --level <Integer>       |  Compression level (only applies to deflate and xz) (default: -1).

    • Example:
      $ hadoop jar avro-tools-1.9.0.jar fromjson hdfs://localhost:8020/test1.json --schema-file hdfs://localhost:8020/test1.schema --codec snappy
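
    • Programmatic equivalent (a minimal sketch; the local file names are assumptions). The JSON input must use Avro's JSON encoding, where non-null union values are wrapped with their type, e.g. {"field1": 123, "field2": {"string": "abc"}}; records are read until the end of the stream:

      import java.io.EOFException;
      import java.io.File;
      import java.io.FileInputStream;
      import java.io.InputStream;
      import org.apache.avro.Schema;
      import org.apache.avro.file.CodecFactory;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.Decoder;
      import org.apache.avro.io.DecoderFactory;

      public class FromJson {
          public static void main(String[] args) throws Exception {
              Schema schema = new Schema.Parser().parse(new File("test1.schema"));
              GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);

              try (InputStream in = new FileInputStream("test1.json");
                   DataFileWriter<GenericRecord> writer =
                       new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                  writer.setCodec(CodecFactory.snappyCodec());
                  writer.create(schema, new File("test1.avro"));

                  // Decode JSON records one by one and append them to the Avro data file.
                  Decoder decoder = DecoderFactory.get().jsonDecoder(schema, in);
                  while (true) {
                      GenericRecord record;
                      try {
                          record = datumReader.read(null, decoder);
                      } catch (EOFException eof) {
                          break; // no more JSON records in the input
                      }
                      writer.append(record);
                  }
              }
          }
      }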