Backend: Basic Usage of the HDFS File System Shell

@[TOC]

Here we use the command line to walk briefly through some HDFS file system shell operations.

The supported commands are listed at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile. Most of them are used much like their Linux counterparts.

# Operate on the local file system (file:// scheme)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls file:///
Found 19 items
dr-xr-xr-x   - root root      24576 2023-02-18 14:47 file:///bin
dr-xr-xr-x   - root root       4096 2022-06-13 10:41 file:///boot
drwxr-xr-x   - root root       3140 2023-02-28 20:17 file:///dev
......
# Operate on the HDFS file system (explicit hdfs:// URI)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls hdfs://hadoop01:8020/
Found 1 items
drwxrwx---   - hadoopdeploy supergroup          0 2023-02-19 17:20 hdfs://hadoop01:8020/tmp
# No scheme given: operate on the file system configured by fs.defaultFS (HDFS here)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /
Found 1 items
drwxrwx---   - hadoopdeploy supergroup          0 2023-02-19 17:20 /tmp
[hadoopdeploy@hadoop01 ~]$
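
The three listings above differ only in how the path is written. As a rough illustration, the shell sketch below classifies a path argument in the same way. `which_fs` is a hypothetical helper, not part of Hadoop, and it only inspects the URI prefix; it does not contact any cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which file system a path argument would target,
# judging only by its URI scheme prefix.
which_fs() {
  case "$1" in
    file://*) echo "local file system" ;;
    hdfs://*) echo "HDFS at the given NameNode address" ;;
    *://*)    echo "another file system scheme" ;;
    *)        echo "file system configured by fs.defaultFS" ;;
  esac
}

which_fs "file:///"
which_fs "hdfs://hadoop01:8020/"
which_fs "/"
```

In the session above, the last case resolves to HDFS because fs.defaultFS points at hdfs://hadoop01:8020.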

File name	Content
1.txt	aaa
2.txt	bbb
3.txt	ccc

Syntax: hadoop fs -mkdir [-p] <paths>
-p: create the parent directories if they do not exist.

[hadoopdeploy@hadoop01 sbin]$ hadoop fs -mkdir -p /bigdata/hadoop
[hadoopdeploy@hadoop01 sbin]$

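Syntax: hadoop fs -put [-f] [-p] [-d] [-t <thread count>] [-q <thread pool queue size>] [- | <localsrc> ...] <dst>
-f: overwrite the destination if it already exists.
-p: preserve access and modification times, ownership, and permissions.
-d: skip creating the temporary file with the ._COPYING_ suffix.
-t: number of threads to use, default 1. Useful when uploading a directory that contains more than one file.
-q: thread pool queue size, default 1024. Takes effect only when the thread count is greater than 1.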

# Create three files: 1.txt, 2.txt, and 3.txt
[hadoopdeploy@hadoop01 ~]$ echo aaa > 1.txt
[hadoopdeploy@hadoop01 ~]$ echo bbb > 2.txt
[hadoopdeploy@hadoop01 ~]$ echo ccc > 3.txt
# Upload the local 1.txt to the /bigdata/hadoop directory on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
# The upload fails because 1.txt already exists in /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
put: `/bigdata/hadoop/1.txt': File exists
# With -f, the destination file is overwritten if it already exists
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f 1.txt /bigdata/hadoop
# List the files in /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 1 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
# Upload several files using multiple threads and a wildcard
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f -t 3 *.txt /bigdata/hadoop
# List the files in /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
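
Rather than relying on the `File exists` error, a script can check for the destination first. The sketch below is one possible way to do that with `hadoop fs -test -e`, which returns 0 when the path exists. `put_if_absent` is a hypothetical helper name, and the demo call only runs where a `hadoop` command is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a local file only when the destination path
# does not exist yet, instead of relying on the "File exists" error.
put_if_absent() {
  local src="$1" dst_dir="$2"
  local dst="$dst_dir/$(basename "$src")"
  if hadoop fs -test -e "$dst"; then
    echo "skip: $dst already exists"
  else
    hadoop fs -put -p "$src" "$dst_dir" && echo "uploaded: $dst"
  fi
}

# Run the demo only where the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  put_if_absent 1.txt /bigdata/hadoop
fi
```

When overwriting is acceptable, `-put -f` as shown in the session above is simpler.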

Syntax: hadoop fs -ls [-h] [-R] <paths>
-h: show file sizes in a human-readable form, for example in K or M instead of raw bytes.
-R: list recursively.

# List the directories and files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/
Found 1 items
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
# -R lists recursively
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R /bigdata/
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# -h shows human-readable sizes such as K or M (no visible change here because the files are only a few bytes)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R -h /bigdata/
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt

Syntax: hadoop fs -cat [-ignoreCrc] URI [URI ...]
-ignoreCrc: disable checksum verification.
Note: be careful with large files, because this command prints the entire file.

# Show the contents of 1.txt and 2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat -ignoreCrc /bigdata/hadoop/1.txt /bigdata/hadoop/2.txt
aaa
bbb
[hadoopdeploy@hadoop01 ~]$

Syntax: hadoop fs -head URI
Displays the first kilobyte of the file to stdout.

# Show the first kilobyte of 1.txt (the whole file here, since it is only 4 bytes)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -head /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$

Syntax: hadoop fs -tail [-f] URI
Displays the last kilobyte of the file to stdout.
-f: keep printing appended data as the file grows, as in Unix.

# Show the last kilobyte of 1.txt (the whole file here, since it is only 4 bytes)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -tail /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$

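Syntax: hadoop fs -appendToFile <localsrc> ... <dst>
Appends a single source, or multiple sources, from the local file system to the destination file system. It can also read from standard input (when localsrc is -) and append that input to the destination file system.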

# Show the contents of 1.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# Show the contents of 2.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
# Append the contents of the local 1.txt to 2.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -appendToFile 1.txt /bigdata/hadoop/2.txt
# Show the contents of 2.txt again
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
aaa
[hadoopdeploy@hadoop01 ~]$
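
The session above appends a local file. The standard-input form mentioned in the description (localsrc is `-`) might look like the sketch below. `append_line` is a hypothetical helper, the text `ddd` is made up for illustration, and the demo call only runs where a `hadoop` command is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one line of text to an HDFS file by passing
# "-" as the local source, so appendToFile reads from standard input.
append_line() {
  local text="$1" dst="$2"
  printf '%s\n' "$text" | hadoop fs -appendToFile - "$dst"
}

# Run the demo only where the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  append_line "ddd" /bigdata/hadoop/2.txt
fi
```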

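Syntax: hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>

Copies files to the local file system. Files that fail the CRC check can be copied with the -ignorecrc option. Files and their CRCs can be copied with the -crc option.

-f: overwrite the destination if it already exists.
-p: preserve access and modification times, ownership, and permissions.
-t: number of threads to use, default 1. Useful when downloading a directory that contains more than one file.
-q: thread pool queue size, default 1024. Takes effect only when the thread count is greater than 1.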

# Download 1.txt from HDFS to 1.txt.download in the current local directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
# Check that 1.txt.download exists
[hadoopdeploy@hadoop01 ~]$ ls
1.txt  1.txt.download  2.txt  3.txt
# Download again; this fails because 1.txt.download already exists locally
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
get: `./1.txt.download': File exists
# Use -f to overwrite the existing file
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f /bigdata/hadoop/1.txt ./1.txt.download
# Multi-threaded download of several files; this fails because with multiple sources the destination must be an existing directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt ./123.txt.download
get: `./123.txt.download': No such file or directory
# Multi-threaded download into the current directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt .
[hadoopdeploy@hadoop01 ~]$
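
The failed attempt above suggests a small precaution when several files are downloaded at once: create the local target directory first. A minimal sketch follows, assuming a hypothetical helper `get_all` and a made-up directory name `./download`. The demo call only runs where a `hadoop` command is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every path matching an HDFS glob into a local
# directory, creating the directory first because -get with several sources
# needs an existing directory as its destination.
get_all() {
  local pattern="$1" local_dir="$2"
  mkdir -p "$local_dir" || return 1
  # Quote the pattern so the HDFS shell, not the local shell, expands it.
  hadoop fs -get -f -t 3 "$pattern" "$local_dir"
}

# Run the demo only where the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  get_all '/bigdata/hadoop/*.txt' ./download
fi
```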

Syntax: hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>

Concatenates the contents of the source files into the local destination file.

-nl: add a newline character at the end of each file.
-skip-empty-file: skip empty files.

# Contents of 1.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# Contents of 3.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/3.txt
ccc
# Merge 1.txt and 3.txt from HDFS into the local file merge.txt; -nl adds a newline after each file, -skip-empty-file skips empty files
[hadoopdeploy@hadoop01 ~]$ hadoop fs -getmerge -nl -skip-empty-file /bigdata/hadoop/1.txt /bigdata/hadoop/3.txt ./merge.txt
# Show merge.txt
[hadoopdeploy@hadoop01 ~]$ cat merge.txt
aaa

ccc

[hadoopdeploy@hadoop01 ~]$
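
The blank lines in merge.txt appear because each source file already ends with a newline and -nl adds one more. The following is a purely local analogy of that behavior, not how Hadoop implements it. `merge_nl` and the temporary file names are hypothetical.

```shell
#!/usr/bin/env bash
# Local analogy only: concatenate files, appending a newline after each one
# and skipping empty files, similar in spirit to getmerge -nl -skip-empty-file.
merge_nl() {
  local f
  for f in "$@"; do
    [ -s "$f" ] || continue   # skip empty (or missing) files
    cat "$f"
    printf '\n'
  done
}

workdir="$(mktemp -d)"
echo aaa > "$workdir/1.txt"
: > "$workdir/empty.txt"      # an empty file, to show that it is skipped
echo ccc > "$workdir/3.txt"
merge_nl "$workdir/1.txt" "$workdir/empty.txt" "$workdir/3.txt" > "$workdir/merge.txt"
cat "$workdir/merge.txt"
```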

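Syntax: hadoop fs -cp [-f] [-p | -p[topax]] [-t <thread count>] [-q <thread pool queue size>] URI [URI ...] <dest>

-f: overwrite the destination if it already exists.
-t: number of threads to use, default 1. Useful when copying a directory that contains more than one file.
-q: thread pool queue size, default 1024. Takes effect only when the thread count is greater than 1.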

# List the files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 1 items
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:55 /bigdata/hadoop
# List the files under /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# Copy every file under /bigdata/hadoop into /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cp /bigdata/hadoop/* /bigdata
# List the files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 4 items
-rw-r--r--   2 hadoopdeploy supergroup          4 2023-02-28 13:17 /bigdata/1.txt
-rw-r--r--   2 hadoopdeploy supergroup          8 2023-02-28 13:17 /bigdata/2.txt
-rw-r--r--   2 hadoopdeploy supergroup          4 2023-02-28 13:17 /bigdata/3.txt
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:55 /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$

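Syntax: hadoop fs -mv URI [URI ...] <dest>
Moves files from the source to the destination. Multiple sources are allowed, in which case the destination must be a directory. Moving files across file systems is not permitted.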

# List the files under /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# Rename 1.txt to 1-new-name.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -mv /bigdata/hadoop/1.txt /bigdata/hadoop/1-new-name.txt
# List the files under /bigdata/hadoop; 1.txt has been renamed
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1-new-name.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$

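Syntax: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory, the command recursively changes the replication factor of all files in the directory tree rooted at path. Erasure-coded (EC) files are ignored by this command.
-R: accepted for backwards compatibility. It has no effect.
-w: requests that the command wait for the replication to complete. This can take a long time.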

# Set the replication factor of 1-new-name.txt to 3
[hadoopdeploy@hadoop01 ~]$ hadoop fs -setrep -w 3 /bigdata/hadoop/1-new-name.txt
Replication 3 set: /bigdata/hadoop/1-new-name.txt
Waiting for /bigdata/hadoop/1-new-name.txt .... done
[hadoopdeploy@hadoop01 ~]$

Syntax: hadoop fs -df [-h] URI [URI ...]
Displays the free space of the file system.

[hadoopdeploy@hadoop01 ~]$ hadoop fs -df /bigdata/hadoop
Filesystem                   Size     Used    Available  Use%
hdfs://hadoop01:8020  27697086464  1228800  17716019200    0%
# -h shows human-readable sizes
[hadoopdeploy@hadoop01 ~]$ hadoop fs -df -h /bigdata/hadoop
Filesystem              Size   Used  Available  Use%
hdfs://hadoop01:8020  25.8 G  1.2 M     16.5 G    0%

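Syntax: hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]
-s: show an aggregate summary instead of one line per file.
-h: show sizes in a human-readable form.
-v: print the column names as a header line.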

[hadoopdeploy@hadoop01 ~]$ hadoop fs -du /bigdata/hadoop
4  12  /bigdata/hadoop/1-new-name.txt
8  16  /bigdata/hadoop/2.txt
4  8   /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s /bigdata/hadoop
16  36  /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h /bigdata/hadoop
16  36  /bigdata/hadoop
# 16 is the total size of all files under /bigdata/hadoop
# 36 is the total disk space consumed by all replicas of those files
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h -v /bigdata/hadoop
SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
16    36                                     /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$
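
The second column can be reproduced from the per-file sizes and replication factors. The sketch below hard-codes the values from this session (1-new-name.txt has 3 replicas after setrep, the other two files have 2). The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# size_in_bytes:replication pairs taken from the listings above
# (1-new-name.txt was raised to 3 replicas by setrep; the others keep 2).
files="4:3 8:2 4:2"

size=0
consumed=0
for entry in $files; do
  bytes=${entry%%:*}       # part before the colon
  replicas=${entry##*:}    # part after the colon
  size=$((size + bytes))
  consumed=$((consumed + bytes * replicas))
done

echo "$size  $consumed"   # compare with the two columns of: hadoop fs -du -s
```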

[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
# Add execute permission to 2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -chmod +x /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rwxrwxr-x   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$
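
The change from rw-rw-r-- to rwxrwxr-x can also be read in octal. The short sketch below works out the resulting mode. Passing that octal value to `-chmod` is shown as an assumption of equivalent usage, and the command only runs where `hadoop` is available.

```shell
#!/usr/bin/env bash
# rw-rw-r-- is octal 664; "+x" here set the execute bit for user, group and
# others (octal 111), which gives rwxrwxr-x, i.e. 775.
before=0664
after=$(printf '%o' $((before | 0111)))
echo "$after"

# Assumption: the same change could be requested with an explicit octal mode.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -chmod "$after" /bigdata/hadoop/2.txt
fi
```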

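Syntax: hadoop fs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI ...]
If trash is enabled, the file system moves deleted files to a trash directory instead of removing them.
Trash is currently disabled by default. It can be enabled by setting fs.trash.interval (in core-site.xml) to a value greater than zero.

-f: do not display a diagnostic message or change the exit status to reflect an error if the file does not exist.
-R: recursively delete the directory and everything under it.
-r: equivalent to -R.
-skipTrash: bypass the trash, if enabled, and delete the specified files immediately. This is useful when files must be deleted from an over-quota directory.
-safely: require a safety confirmation before deleting a directory whose total number of files is greater than hadoop.shell.delete.limited.num.files (in core-site.xml, default 100).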

# Delete 2.txt; trash is enabled on my cluster, so the deleted file is moved to the trash
[hadoopdeploy@hadoop01 ~]$ hadoop fs -rm /bigdata/hadoop/2.txt
2023-02-28 22:04:51,302 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hadoop01:8020/bigdata/hadoop/2.txt' to trash at: hdfs://hadoop01:8020/user/hadoopdeploy/.Trash/Current/bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$
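
Because the file was only moved, it can be brought back with an ordinary `-mv`. The sketch below is a rough illustration under two assumptions: `hdfs getconf -confKey` reports the client-side value of fs.trash.interval, and the trash location follows the pattern visible in the log line above. `trash_enabled` and `trash_path` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the client-side configuration sets
# fs.trash.interval (in minutes) to a value greater than zero.
trash_enabled() {
  local interval
  interval="$(hdfs getconf -confKey fs.trash.interval 2>/dev/null)" || return 2
  [ "${interval:-0}" -gt 0 ]
}

# Hypothetical helper: where a deleted path ends up, following the pattern
# visible in the log line above: /user/<user>/.Trash/Current<original path>.
trash_path() {
  local user="$1" path="$2"
  echo "/user/$user/.Trash/Current$path"
}

trash_path hadoopdeploy /bigdata/hadoop/2.txt

# Restoring is then an ordinary move out of the trash directory.
if command -v hadoop >/dev/null 2>&1 && trash_enabled; then
  hadoop fs -mv "$(trash_path hadoopdeploy /bigdata/hadoop/2.txt)" /bigdata/hadoop/
fi
```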

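Some readers may wonder how anyone can remember so many commands. If we have access to the HDFS web interface, the same operations can be performed there.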

1. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile

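1. Background

2. Which commands the HDFS file system shell provides

3. Determining which file system the shell operates on

4. Preparing the following local files

5. HDFS file system shell

5.1 mkdir: create a directory

5.2 put: upload files

5.3 ls: list directories or files

5.4 cat: view file contents

5.5 head: view the first kilobyte of a file

5.6 tail: view the last kilobyte of a file

5.7 appendToFile: append data to an HDFS file

5.8 get: download files

5.9 getmerge: merge and download

5.10 cp: copy files

5.11 mv: move files

5.12 setrep: change the replication factor of a file

5.13 df: display free space

5.14 du: report the size of directories or files

5.15 chgrp, chmod, chown: change a file's group, permissions, and owner

5.16 rm: delete files or directories

6. Web UI operations

7. Reference links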