dfs
Run a filesystem command on the file systems supported in Hadoop.

hdfs dfs COMMAND [GENERIC-OPTIONS] [COMMAND-OPTIONS]

Generic options:
-conf <configuration file>             specify an application configuration file.
-D <property=value>                    define a value for a given property.
-fs <file:///|hdfs://namenode:port>    specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from the configuration.
-jt <local|resourcemanager:port>       specify a ResourceManager.
-files <file1,...>                     specify a comma-separated list of files to be copied to the MapReduce cluster.
-libjars <jar1,...>                    specify a comma-separated list of jar files to be included in the classpath.
-archives <archive1,...>               specify a comma-separated list of archives to be unarchived on the compute machines.
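The generic options go before the command-specific options; a minimal sketch, assuming a single-node cluster at localhost:8020 (the property value and paths are illustrative):
$ hdfs dfs -D dfs.replication=1 -put /home/hadoop/test1.txt hdfs://localhost:8020/
$ hdfs dfs -fs hdfs://localhost:8020 -ls /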
hdfs dfs prints help when invoked without parameters or with the "-help" parameter:
hdfs dfs
hdfs dfs -help

Print help for a specific command:
hdfs dfs -help COMMAND
$ hdfs dfs -help help
-help [cmd ...]: Displays help for given command or all commands if none is specified.

Print usage for a specific command:
hdfs dfs -usage COMMAND
$ hdfs dfs -usage help
-usage [cmd ...]: Usage: hadoop fs [generic options] -help [cmd ...]
$ hdfs dfs -help ls
-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]:
List the contents that match the specified file pattern. If path is not specified, the contents of /user/<currentUser> will be listed. For a directory a list of its direct children is returned (unless -d option is specified).
Directory entries are of the form: permissions - userId groupId sizeOfDirectory(in bytes) modificationDate(yyyy-MM-dd HH:mm) directoryName
File entries are of the form: permissions numberOfReplicas userId groupId sizeOfFile(in bytes) modificationDate(yyyy-MM-dd HH:mm) fileName
-C  Display the paths of files and directories only.
-d  Directories are listed as plain files.
-h  Formats the sizes of files in a human-readable fashion rather than a number of bytes.
-q  Print ? instead of non-printable characters.
-R  Recursively list the contents of directories.
-t  Sort files by modification time (most recent first).
-S  Sort files by size.
-r  Reverse the order of the sort.
-u  Use time of last access instead of modification for display and sorting.
-e  Display the erasure coding policy of files and directories.
$ hdfs dfs -ls hdfs://localhost:8020/
$ hdfs dfs -ls -R hdfs://localhost:8020/
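The listing options can be combined, e.g. human-readable sizes sorted by modification time (illustrative):
$ hdfs dfs -ls -h -t hdfs://localhost:8020/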
$ hdfs dfs -help count
-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...:
Count the number of directories, files and bytes under the paths that match the specified file pattern.
The output columns are: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
or, with the -q option: QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
-h  shows file sizes in human readable format.
-v  displays a header line.
-x  excludes snapshots from being calculated.
-t  displays quota by storage types. It should be used with the -q or -u option, otherwise it will be ignored. If a comma-separated list of storage types is given after the -t option, it displays the quota and usage for the specified types. Otherwise, it displays the quota and usage for all the storage types that support quota. The list of possible storage types (case insensitive): ram_disk, ssd, disk and archive. It can also pass the value '', 'all' or 'ALL' to specify all the storage types.
-u  shows the quota and the usage against the quota without the detailed content summary.
-e  shows the erasure coding policy.
$ hdfs dfs -count -h -v hdfs://localhost:8020/
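To also show the quota columns described above, add -q (illustrative):
$ hdfs dfs -count -q -h -v hdfs://localhost:8020/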
$ hdfs dfs -help find
-find <path> ... <expression> ...:
Finds all files that match the specified expression and applies selected actions to them. If no <path> is specified then defaults to the current working directory. If no expression is specified then defaults to -print.
The following primary expressions are recognised:
-name pattern
-iname pattern
Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.
-print
-print0
Always evaluates to true. Causes the current pathname to be written to standard output followed by a newline. If the -print0 expression is used then an ASCII NULL character is appended rather than a newline.
The following operators are recognised:
expression -a expression
expression -and expression
expression expression
Logical AND operator for joining two expressions. Returns true if both child expressions return true. Implied by the juxtaposition of two expressions and so does not need to be explicitly specified. The second expression will not be applied if the first fails.
$ hdfs dfs -find hdfs://localhost:8020/ -name test1.txt
$ hdfs dfs -help checksum -checksum <src> ... Dump checksum information for files that match the file pattern <src> to stdout. Note that this requires a round-trip to a datanode storing each block of the file, and thus is not efficient to run on a large number of files. The checksum of a file depends on its content, block size and the checksum algorithm and parameters used for creating the file.
$ hdfs dfs -checksum hdfs://localhost:8020/test1.txt
$ hdfs dfs -help stat -stat [format] <path> ... Print statistics about the file/directory at <path> in the specified format. Format accepts: permissions in octal (%a) and symbolic (%A), filesize in bytes (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), access date (%x, %X), modification date (%y, %Y). %x and %y show UTC date as "yyyy-MM-dd HH:mm:ss" and %X and %Y show milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.
$ hdfs dfs -stat "%n %b %F %u" hdfs://localhost:8020/test1.txt
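Other format specifiers from the list above can be combined in the same way, e.g. octal/symbolic permissions and the modification date (illustrative):
$ hdfs dfs -stat "%a %A %y" hdfs://localhost:8020/test1.txt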
$ hdfs dfs -help test
-test -[defsz] <path>:
Answer various questions about <path>, with result via exit status.
-d  return 0 if <path> is a directory.
-e  return 0 if <path> exists.
-f  return 0 if <path> is a file.
-s  return 0 if file <path> is greater than zero bytes in size.
-w  return 0 if file <path> exists and write permission is granted.
-r  return 0 if file <path> exists and read permission is granted.
-z  return 0 if file <path> is zero bytes in size, else return 1.
$ hdfs dfs -test -f hdfs://localhost:8020/test1.txt
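Since -test reports its result only through the exit status, it is usually combined with a shell conditional (illustrative):
$ hdfs dfs -test -e hdfs://localhost:8020/test1.txt && echo "exists" || echo "missing"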
$ hdfs dfs -help cat -cat [-ignoreCrc] <src> ... Fetch all files that match the file pattern <src> and display their content on stdout.
$ hdfs dfs -cat hdfs://localhost:8020/test1.txt
$ hdfs dfs -help text -text [-ignoreCrc] <src> ... Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream and Avro.
$ hdfs dfs -text hdfs://localhost:8020/test1.txt
$ hdfs dfs -help head -head <file> Show the first 1KB of the file.
$ hdfs dfs -head hdfs://localhost:8020/test1.txt
$ hdfs dfs -help tail -tail [-f] <file> Show the last 1KB of the file. -f Shows appended data as the file grows.
$ hdfs dfs -tail hdfs://localhost:8020/test1.txt
$ hdfs dfs -tail -f hdfs://localhost:8020/test1.txt
$ hdfs dfs -help mkdir -mkdir [-p] <path> ... Create a directory in specified location. -p Do not fail if the directory already exists
$ hdfs dfs -mkdir hdfs://localhost:8020/test
$ hdfs dfs -mkdir -p hdfs://localhost:8020/abc/abc1
$ hdfs dfs -help rmdir -rmdir [--ignore-fail-on-non-empty] <dir> ... Removes the directory entry specified by each directory argument, provided it is empty.
$ hdfs dfs -rmdir hdfs://localhost:8020/test
$ hdfs dfs -rmdir --ignore-fail-on-non-empty hdfs://localhost:8020/abc
$ hdfs dfs -help put
-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>:
Copy files from the local file system into fs. Copying fails if the file already exists, unless the -f option is given.
-p  Preserves access and modification times, ownership and the mode.
-f  Overwrites the destination if it already exists.
-l  Allow DataNode to lazily persist the file to disk. Forces replication factor of 1. This option will result in reduced durability. Use with care.
-d  Skip creation of temporary file(<dst>._COPYING_).
$ hdfs dfs -put /home/hadoop/test1.txt hdfs://localhost:8020/abc/
$ hdfs dfs -help get -get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> Copy files that match the file pattern <src> to the local name. <src> is kept. When copying multiple files, the destination must be a directory. -f overwrites the destination if it already exists. -p preserves access and modification times, ownership and the mode.
$ hdfs dfs -get hdfs://localhost:8020/abc/test1.txt /home/hadoop/test1-copy.txt
$ hdfs dfs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>:
Copy files from the local file system into fs. Copying fails if the file already exists, unless the -f option is given.
-p  Preserves access and modification times, ownership and the mode.
-f  Overwrites the destination if it already exists.
-t <thread count>  Number of threads to be used, default is 1.
-l  Allow DataNode to lazily persist the file to disk. Forces replication factor of 1. This option will result in reduced durability. Use with care.
-d  Skip creation of temporary file(<dst>._COPYING_).
$ hdfs dfs -copyFromLocal /home/hadoop/test1.txt hdfs://localhost:8020/abc/abc1
$ hdfs dfs -help copyToLocal -copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> Identical to the -get command.
$ hdfs dfs -copyToLocal hdfs://localhost:8020/abc/test1.txt /home/hadoop/test1-copy.txt
$ hdfs dfs -help moveFromLocal -moveFromLocal <localsrc> ... <dst> Same as -put, except that the source is deleted after it's copied.
$ hdfs dfs -moveFromLocal /home/hadoop/test1-copy.txt hdfs://localhost:8020/
$ hdfs dfs -help cp
-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>:
Copy files that match the file pattern <src> to a destination. When copying multiple files, the destination must be a directory.
-p  preserves status [topax] (timestamps, ownership, permission, ACLs, XAttr). If -p is specified with no <arg>, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission.
-f  overwrites the destination if it already exists.
-d  will skip creation of temporary file(<dst>._COPYING_).
Raw namespace extended attributes are preserved if (1) they are supported (HDFS only) and (2) all of the source and target pathnames are in the /.reserved/raw hierarchy. Raw namespace xattr preservation is determined solely by the presence (or absence) of the /.reserved/raw prefix and not by the -p option.
$ hdfs dfs -cp hdfs://localhost:8020/test1.txt hdfs://localhost:8020/test2.txt
$ hdfs dfs -help mv -mv <src> ... <dst> Move files that match the specified file pattern <src> to a destination <dst>. When moving multiple files, the destination must be a directory.
$ hdfs dfs -mv hdfs://localhost:8020/test1-copy.txt hdfs://localhost:8020/test1-copy-new.txt
$ hdfs dfs -help rm
-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...:
Delete all files that match the specified file pattern. Equivalent to the Unix command "rm <src>".
-f  If the file does not exist, do not display a diagnostic message or modify the exit status to reflect an error.
-[rR]  Recursively deletes directories.
-skipTrash  bypasses trash, if enabled, and immediately deletes <src>.
-safely  requires safety confirmation, if enabled, before deleting a large directory with more than <hadoop.shell.delete.limit.num.files> files. A delay is expected when walking over a large directory recursively to count the number of files to be deleted before the confirmation.
$ hdfs dfs -rm hdfs://localhost:8020/test1.txt
$ hdfs dfs -rm -r hdfs://localhost:8020/abc
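If trash is enabled, -skipTrash bypasses it and deletes the file immediately (illustrative):
$ hdfs dfs -rm -skipTrash hdfs://localhost:8020/test1.txt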
$ hdfs dfs -help expunge -expunge Delete files from the trash that are older than the retention threshold.
$ hdfs dfs -expunge
$ hdfs dfs -help appendToFile
-appendToFile <localsrc> ... <dst>:
Appends the contents of all the given local files to the given dst file. The dst file will be created if it does not exist. If <localsrc> is -, then the input is read from stdin (press Enter to confirm your input, then Ctrl+C to exit).
$ hdfs dfs -appendToFile /home/hadoop/test1.txt hdfs://localhost:8020/test2.txt
$ hdfs dfs -appendToFile - hdfs://localhost:8020/test2.txt
$ hdfs dfs -help truncate -truncate [-w] <length> <path> ... Truncate all files that match the specified file pattern to the specified length. -w Requests that the command wait for block recovery to complete, if necessary.
$ hdfs dfs -truncate 6 hdfs://localhost:8020/test1.txt
$ hdfs dfs -help touch
-touch [-a] [-m] [-t TIMESTAMP] [-c] <path> ...:
Updates the access and modification times of the file specified by the <path> to the current time. If the file does not exist, then a zero length file is created at <path> with current time as the timestamp of that <path>.
-a  Change only the access time
-m  Change only the modification time
-t TIMESTAMP  Use specified timestamp (in format yyyyMMddHHmmss) instead of current time
-c  Do not create any files
$ hdfs dfs -touch hdfs://localhost:8020/test1.txt
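To set an explicit modification time in the yyyyMMddHHmmss format described above (the timestamp value is illustrative):
$ hdfs dfs -touch -m -t 20240101120000 hdfs://localhost:8020/test1.txt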
$ hdfs dfs -help touchz -touchz <path> ... Creates a file of zero length at <path> with current time as the timestamp of that <path>. An error is returned if the file exists with non-zero length
$ hdfs dfs -touchz hdfs://localhost:8020/test1.txt
$ hdfs dfs -help df -df [-h] [<path> ...] Shows the capacity, free and used space of the filesystem. If the filesystem has multiple partitions, and no path to a particular partition is specified, then the status of the root partitions will be shown. -h Formats the sizes of files in a human-readable fashion rather than a number of bytes.
$ hdfs dfs -df -h hdfs://localhost:8020/
$ hdfs dfs -help du
-du [-s] [-h] [-v] [-x] <path> ...:
Show the amount of space, in bytes, used by the files that match the specified file pattern.
-s  Rather than showing the size of each individual file that matches the pattern, shows the total (summary) size.
-h  Formats the sizes of files in a human-readable fashion rather than a number of bytes.
-v  displays a header line.
-x  Excludes snapshots from being counted.
Note that, even without the -s option, this only shows size summaries one level deep into a directory.
The output is in the form: size disk_space_consumed name(full path)
$ hdfs dfs -du -s -h -v hdfs://localhost:8020/
$ hdfs dfs -help chmod
-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...:
Changes permissions of a file. This works similar to the shell's chmod command with a few exceptions.
-R  modifies the files recursively. This is the only option currently supported.
<MODE>  Mode is the same as mode used for the shell's command. The only letters recognized are 'rwxXt', e.g. +t,a+r,g-w,+rwx,o=r.
<OCTALMODE>  Mode specified in 3 or 4 digits. If 4 digits, the first may be 1 or 0 to turn the sticky bit on or off, respectively. Unlike the shell command, it is not possible to specify only part of the mode, e.g. 754 is same as u=rwx,g=rx,o=r.
If none of 'augo' is specified, 'a' is assumed and unlike the shell command, no umask is applied.
$ hdfs dfs -chmod 755 hdfs://localhost:8020/test1.txt
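Symbolic modes work as well, and -R applies the change recursively (illustrative):
$ hdfs dfs -chmod -R g-w hdfs://localhost:8020/abc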
$ hdfs dfs -help chown
-chown [-R] [OWNER][:[GROUP]] PATH...:
Changes owner and group of a file. This is similar to the shell's chown command with a few exceptions.
-R  modifies the files recursively. This is the only option currently supported.
If only the owner or group is specified, then only the owner or group is modified. The owner and group names may only consist of digits, alphabet, and any of [-_./@a-zA-Z0-9]. The names are case sensitive.
WARNING: Avoid using '.' to separate user name and group though Linux allows it. If user names have dots in them and you are using the local file system, you might see surprising results since the shell command 'chown' is used for local files.
$ hdfs dfs -chown hadoop hdfs://localhost:8020/test1.txt
$ hdfs dfs -chown hadoop:hadoop hdfs://localhost:8020/test1.txt
$ hdfs dfs -chown :hadoop hdfs://localhost:8020/test1.txt
$ hdfs dfs -help chgrp -chgrp [-R] GROUP PATH... This is equivalent to -chown ... :GROUP ...
$ hdfs dfs -chgrp hadoop hdfs://localhost:8020/test1.txt
$ hdfs dfs -help setfacl
-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]:
Sets Access Control Lists (ACLs) of files and directories.
-b  Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.
-k  Remove the default ACL.
-R  Apply operations to all files and directories recursively.
-m  Modify ACL. New entries are added to the ACL, and existing entries are retained.
-x  Remove specified ACL entries. Other ACL entries are retained.
--set  Fully replace the ACL, discarding all existing entries. The <acl_spec> must include entries for user, group, and others for compatibility with permission bits.
<acl_spec>  Comma separated list of ACL entries.
<path>  File or directory to modify.
Note: You might get this error: "setfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false."
Edit "${hadoop}/etc/hadoop/hdfs-site.xml" and add the following:
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
$ hdfs dfs -setfacl -m user:mtitek:r-- hdfs://localhost:8020/test1.txt
$ hdfs dfs -setfacl -x user:mtitek hdfs://localhost:8020/test1.txt
$ hdfs dfs -setfacl --set user:mtitek:rw-,user::rwx,group::r--,other::r-- hdfs://localhost:8020/test1.txt
$ hdfs dfs -help getfacl -getfacl [-R] <path> Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL. -R List the ACLs of all files and directories recursively. <path> File or directory to list.
$ hdfs dfs -getfacl hdfs://localhost:8020/test1.txt
$ hdfs dfs -help setfattr
-setfattr {-n name [-v value] | -x name} <path>:
Sets an extended attribute name and value for a file or directory.
-n name  The extended attribute name.
-v value  The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding.
-x name  Remove the extended attribute.
<path>  The file or directory.
$ hdfs dfs -setfattr -n "user.attr1" -v "attr1v1" hdfs://localhost:8020/test1.txt
$ hdfs dfs -help getfattr
-getfattr [-R] {-n name | -d} [-e en] <path>:
Displays the extended attribute names and values (if any) for a file or directory.
-R  Recursively list the attributes for all files and directories.
-n name  Dump the named extended attribute value.
-d  Dump all extended attribute values associated with pathname.
-e <encoding>  Encode values after retrieving them. Valid encodings are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), and values encoded as hexadecimal and base64 are prefixed with 0x and 0s, respectively.
<path>  The file or directory.
$ hdfs dfs -getfattr -d hdfs://localhost:8020/test1.txt
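To read back a single attribute, e.g. the one set in the setfattr example above, use -n:
$ hdfs dfs -getfattr -n "user.attr1" hdfs://localhost:8020/test1.txt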
$ hdfs dfs -help getmerge -getmerge [-nl] [-skip-empty-file] <src> <localdst> Get all the files in the directories that match the source file pattern and merge and sort them to only one file on local fs. <src> is kept. -nl Add a newline character at the end of each file. -skip-empty-file Do not add new line character for empty file.
$ hdfs dfs -getmerge "hdfs://localhost:8020/*.txt" /home/hadoop/mergefile1.txt
$ hdfs dfs -help setrep -setrep [-R] [-w] <rep> <path> ... Set the replication level of a file. If <path> is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at <path>. The EC files will be ignored here. -w It requests that the command waits for the replication to complete. This can potentially take a very long time. -R It is accepted for backwards compatibility. It has no effect.
$ hdfs dfs -setrep 1 hdfs://localhost:8020/test1.txt
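To block until the replication change has actually completed, add -w (this can take a long time; illustrative):
$ hdfs dfs -setrep -w 1 hdfs://localhost:8020/test1.txt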
$ hdfs dfs -help createSnapshot
-createSnapshot <snapshotDir> [<snapshotName>]:
Create a snapshot on a directory.
Note: You first need to allow snapshots on the directory (make it snapshottable): hdfs dfsadmin -allowSnapshot <path>
$ hdfs dfs -createSnapshot hdfs://localhost:8020/abc abcshn
$ hdfs dfs -help renameSnapshot -renameSnapshot <snapshotDir> <oldName> <newName> Rename a snapshot from oldName to newName.
$ hdfs dfs -renameSnapshot hdfs://localhost:8020/abc abcshn abcshnnew
$ hdfs dfs -help deleteSnapshot -deleteSnapshot <snapshotDir> <snapshotName> Delete a snapshot from a directory.
$ hdfs dfs -deleteSnapshot hdfs://localhost:8020/abc abcshnnew