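This post grew out of a friend's request for help. I never pinned down the root cause, but the debugging process was interesting in its own right.
Background
A friend asked me for help: the same Docker image ran fine on a test machine, but on Alibaba Cloud it failed with an error saying the user does not exist.
The output on Alibaba Cloud was: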
```shell
# docker run --rm -it image:tag
docker: Error response from daemon: linux spec user: unable to find user www-data: no matching entries in passwd file.
ERRO[0000] error waiting for container: context canceled
```
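- The image name is replaced with `image:tag` throughout; the error has little to do with the specific image.
- Judging from the error message, Docker could not find the user `www-data` in `/etc/passwd` and therefore concluded that the user does not exist.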
Debugging
Starting the container as `root` still reports that the user cannot be found:
```shell
# docker run --rm -it --user root image:tag
docker: Error response from daemon: linux spec user: unable to find user root: no matching entries in passwd file.
```
- So `root` is looked up in `/etc/passwd` as well.
Passing a numeric UID instead changes the error message:
```shell
# docker run --rm -it --user $(id -u) image:tag
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"docker-php-entrypoint\": executable file not found in $PATH": unknown.
```
- So the image has an entrypoint configured.
- But why can the entrypoint not be found?
Try a different entrypoint:
```shell
# docker run --rm -it --user $(id -u) --entrypoint 'ls' image:tag
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"ls\": executable file not found in $PATH": unknown.
```
- `ls` cannot be found either? Then try the absolute path `/bin/ls`:
```shell
# docker run --rm -it --user $(id -u) --entrypoint '/bin/ls' image:tag
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"/bin/ls\": stat /bin/ls: no such file or directory": unknown.
```
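- This time the error is different: `/bin/ls` itself cannot be found.
- I suspect a filesystem problem: nothing under `/` can be found.
Try bind-mounting the host's `/bin/ls` into the container: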
```shell
# docker run --rm -it --user $(id -u) -v '/bin/ls':'/bin/ls' --entrypoint '/bin/ls' image:tag
standard_init_linux.go:190: exec user process caused "no such file or directory"
```
- At this point it is fairly certain that the filesystem inside the container is broken.
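A side note on that last error, stated as a general Linux fact rather than something confirmed for this image: exec can report "no such file or directory" for a binary that does exist when the dynamic loader named inside the binary is missing. A bind mount of `/bin/ls` brings in only the binary, not its loader or shared libraries, so the error fits an otherwise empty root filesystem. The sketch below uses `ldd` on the host to list what a binary expects to find at run time; `list_runtime_deps` is a hypothetical helper name.

```shell
# List the dynamic loader and shared libraries a binary needs at exec time.
# For a static binary, ldd reports that it is not a dynamic executable.
# stderr is merged into stdout so that message is not lost.
list_runtime_deps() {
    ldd "$1" 2>&1
}

# Usage on the host:
#   list_runtime_deps /bin/ls
```

Every path in that output would also have to exist inside the container for the mounted binary to start.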
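Dead end
For the moment I could not find a way to trace this any further. Neither `docker inspect` nor `docker history` showed anything abnormal about the image, and `docker logs` showed no other errors from container startup.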
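A way forward
Another friend found this issue: Error response from daemon: OCI runtime create failed – when running a Node.js Docker image
The error type there is not quite the same, but it made me realize that I had forgotten to check the Docker daemon's own logs the whole time!
Checking the daemon log with `journalctl -fu docker.service` showed the same error as in the issue: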
```shell
... level=error msg="stream copy error: reading from a closed fifo"
```
This may be an unfixed bug in Docker.
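Since the decisive clue was sitting in the daemon log, a small filter makes it easier to spot next time. This is a minimal sketch that assumes the daemon writes a `level=error` field, as in the line above; `daemon_errors` is a hypothetical helper name.

```shell
# Keep only error-level entries from Docker daemon log lines read on stdin.
# Matches the literal `level=error` field; prints nothing if there are none.
daemon_errors() {
    grep -F 'level=error' || true
}

# Usage on a systemd host (same unit name as above):
#   journalctl -u docker.service --no-pager | daemon_errors
```

Reproducing the failing `docker run` first and then running the filter keeps the relevant lines close to the end of the output.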
TODO
Why does `--user root` trigger a lookup in the `passwd` file, while `--user $(id -u)` skips it?
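One possible explanation, offered as an assumption that was not verified during this investigation: Docker appears to treat a non-numeric `--user` value as a name that must be resolved through the container's `/etc/passwd`, whereas a numeric value can be used directly as a UID even when no entry matches. The sketch below models that rule against any passwd-format file. `resolve_user` is a hypothetical helper and is not Docker's actual code.

```shell
# Simplified model (an assumption, not Docker's implementation) of how a
# --user value could be resolved against a passwd-format file.
# Usage: resolve_user <passwd-file> <name-or-uid>
resolve_user() {
    passwd_file=$1
    spec=$2

    # Look for an entry whose name (field 1) or UID (field 3) matches.
    # A missing or unreadable file is treated like a file with no entries.
    uid=$(awk -F: -v s="$spec" '$1 == s || $3 == s { print $3; exit }' \
        "$passwd_file" 2>/dev/null) || uid=''
    if [ -n "$uid" ]; then
        printf '%s\n' "$uid"
        return 0
    fi

    # No entry: a purely numeric value can still be used as a raw UID,
    # while a name has nothing to fall back on.
    case $spec in
        ''|*[!0-9]*)
            echo "unable to find user $spec: no matching entries in passwd file" >&2
            return 1
            ;;
        *)
            printf '%s\n' "$spec"
            ;;
    esac
}
```

Under this model an empty or missing `passwd` file makes `root` fail while `0` still resolves, which matches the behavior seen above.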