Shocking! rsync file sync keeps failing? A step-by-step troubleshooting guide to end your sync headaches!
2024-11-08 14:11:38
Jinyu
Problem
rsync client: exits abnormally with an error
The output from the failed release is shown below:
# rsync -avz --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
receiving incremental file list
...
rsync: read error: Connection reset by peer (104)
rsync error: error in rsync protocol data stream (code 12) at io.c(759) [receiver=3.0.6]
rsync: connection unexpectedly closed (99 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [generator=3.0.6]
rsync client: process hangs
# rsync -avzP --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
^C
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(551) [generator=3.0.9]
rsync error: received SIGUSR1 (code 19) at main.c(1298) [receiver=3.0.9]
Only on the second, or even the third, attempt did it succeed:
# rsync -avzP --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
60606801 100%   16.13MB/s    0:00:03 (xfer#1, to-check=0/3)
sent 21035 bytes  received 43167123 bytes  5758421.07 bytes/sec
total size is 60607326  speedup is 1.40
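A quick search showed that others had already run into a similar rsync problem. Quoting from that blog:
Although you may have already set the --timeout option for the rsyncd daemon (that is, in rsyncd.conf), under some circumstances this option may have no effect at all. A very unstable network can cause a large number of TCP connections to time out, which in turn makes rsync processes fail. Even though the broken TCP connection is gone, the rsync process may, for various reasons (for example, being stuck in an uninterruptible state while waiting for I/O), linger on the system and eventually become a zombie process.
According to its man page, the timeout of the rsync command itself defaults to 0, meaning there is no timeout at all, so a running rsync process will wait forever for the remote end to respond. Setting the timeout option in rsyncd.conf for the rsyncd daemon and also passing the timeout option on the rsync client command line has proven in practice to eliminate this problem.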
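The quoted explanation mentions two different conditions: a process stuck in an uninterruptible state, and a zombie. On Linux these appear as different state letters in ps output (D and Z respectively), so it can help to check which one a lingering rsync process is actually in. The following is a minimal sketch, assuming a Linux host with the procps version of ps; the function names describe_state and list_rsync_states are hypothetical helpers, not part of rsync.

```shell
#!/bin/sh
# Map the first letter of a Linux process state (as printed by `ps -o stat=`)
# to a short description. Trailing modifier characters such as "+" or "s"
# are ignored.
describe_state() {
    case "$1" in
        D*) echo "uninterruptible sleep (usually waiting on I/O)" ;;
        Z*) echo "zombie (exited but not yet reaped by its parent)" ;;
        S*) echo "interruptible sleep" ;;
        R*) echo "running or runnable" ;;
        T*) echo "stopped" ;;
        *)  echo "other" ;;
    esac
}

# Print PID, state, elapsed time and a description for every process whose
# command name is rsync. `-C` is a procps (Linux) option; separate -o options
# with empty headers suppress the header line.
list_rsync_states() {
    ps -C rsync -o pid= -o stat= -o etime= | while read -r pid stat etime; do
        printf '%s\t%s\t%s\t%s\n' "$pid" "$stat" "$etime" "$(describe_state "$stat")"
    done
}

# Usage (prints nothing if no rsync process is running):
#   list_rsync_states
```

A process that stays in state D or S for a long elapsed time with no progress is the "hang" described above; a Z entry has already exited and is only waiting for its parent to collect it.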
After adding the --timeout parameter and running the command again, the client did indeed exit with an error instead of hanging:
# rsync -avzP --timeout=60 --delete --exclude='.git' --exclude='.svn' rsync://<rsync_srv>:<rsync_port>/path/to/folder /tmp/rsync-test
Password:
receiving incremental file list
./
<rsync-test-pkg>-SNAPSHOT.jar
20114253  33%   19.18MB/s    0:00:02
[receiver] io timeout after 60 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [receiver=3.0.9]
rsync: connection unexpectedly closed (115 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
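Two observations so far, that a later attempt often succeeds and that --timeout turns a hang into a prompt failure, suggest wrapping the transfer in a bounded retry. The following is a minimal sketch rather than the setup used in this post: the helper names retry and is_transient are hypothetical, and treating only exit codes 12 and 30 (the two seen in the output above) as retryable is an assumption you may want to adjust.

```shell
#!/bin/sh
# Return 0 if an rsync exit code looks transient and worth retrying.
# Assumption: only the codes seen in this post are treated as transient:
#   12 = error in rsync protocol data stream
#   30 = timeout in data send/receive
is_transient() {
    case "$1" in
        12|30) return 0 ;;
        *)     return 1 ;;
    esac
}

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, fails with a non-transient code,
# or MAX_ATTEMPTS is reached. Returns the last exit code of COMMAND.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while :; do
        if "$@"; then
            return 0
        else
            retry_rc=$?
        fi
        if ! is_transient "$retry_rc" || [ "$retry_attempt" -ge "$retry_max" ]; then
            return "$retry_rc"
        fi
        echo "attempt $retry_attempt failed with code $retry_rc, retrying in ${retry_delay}s" >&2
        retry_attempt=$((retry_attempt + 1))
        sleep "$retry_delay"
    done
}

# Usage (placeholders as in the post; fill in your own server and port):
#   retry 3 5 rsync -avzP --timeout=60 --delete --exclude='.git' --exclude='.svn' \
#       "rsync://<rsync_srv>:<rsync_port>/path/to/folder" /tmp/rsync-test
```

Without --timeout the wrapper would not help, because a hung rsync never returns an exit code for retry to act on.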
rsync server: error log
2018/10/26 14:40:30 [4228] name lookup failed for <rsync-client>: Name or service not known
2018/10/26 14:40:30 [4228] connect from UNKNOWN (<rsync-client>)
2018/10/26 14:40:30 [4228] rsync on path/to/folder from UNKNOWN (<rsync-client>)
2018/10/26 14:40:30 [4228] building file list
2018/10/26 14:40:35 [4228] rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Connection timed out (110)
2018/10/26 14:40:35 [4228] rsync error: error in rsync protocol data stream (code 12) at io.c(1525) [sender=3.0.6]
rsync client: troubleshooting with strace
lstat("<rsync-test-pkg>-SNAPSHOT.jar", 0x7fff7d6e0000) = -1 ENOENT (No such file or directory)
select(5, [4], [3], [3], {30, 0}) = 2 (in [4], out [3], left {29, 999998})
select(5, [4], [], NULL, {30, 0}) = 1 (in [4], left {29, 999999})
read(4, "34", 8184) = 4
write(3, "2671103240", 26) = 26
select(5, [4], [], NULL, {30, 0}./<rsync-test-pkg>-SNAPSHOT.jar
) = 0 (Timeout)
8.79MB/s    0:00:02
select(5, [4], [], NULL, {30, 0}) = 0 (Timeout)
select(5, [4], [], NULL, {30, 0}[receiver] io timeout after 60 seconds -- exiting
rsync error: timeout in data send/receive (code 30) at io.c(140) [receiver=3.0.9]
) = 1 (in [4], left {28, 559609})
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15649, si_status=30, si_utime=20, si_stime=5} ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 30}], WNOHANG, NULL) = 15649
wait4(-1, 0x7fff7d6e11e4, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn() = 1
read(4, "", 8184) = 0
write(2, "rsync: connection unexpectedly c"..., 77rsync: connection unexpectedly closed (115 bytes received so far) [generator]
) = 77
write(2, " ", 1) = 1
rt_sigaction(SIGUSR1, {SIG_IGN, [], SA_RESTORER, 0x7f1e73cca670}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {SIG_IGN, [], SA_RESTORER, 0x7f1e73cca670}, NULL, 8) = 0
getpid() = 15648
kill(15649, SIGUSR1) = -1 ESRCH (No such process)
write(2, "rsync error: error in rsync prot"..., 89rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
) = 89
write(2, " ", 1) = 1
exit_group(12) = ?
+++ exited with 12 +++
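In the trace above, rsync's own messages are mixed into the system-call lines because strace writes to stderr by default, the same stream rsync uses. Writing the trace to a file with strace -o keeps the two apart, and the timed-out select calls can then be counted mechanically. The following is a small sketch under that assumption; the file name rsync.trace and the helper count_select_timeouts are hypothetical.

```shell
#!/bin/sh
# Count select() calls that returned 0, i.e. waited the full interval
# without the socket becoming readable. Expects a trace written with
# `strace -o FILE`, where each system call sits on one line.
count_select_timeouts() {
    # grep -c exits non-zero when the count is 0; `|| true` keeps the
    # function usable under `set -e` while still printing "0".
    grep -c 'select(.*) = 0 (Timeout)' "$1" || true
}

# Usage (placeholders as in the post):
#   strace -f -tt -o rsync.trace rsync -avzP --timeout=60 --delete \
#       "rsync://<rsync_srv>:<rsync_port>/path/to/folder" /tmp/rsync-test
#   count_select_timeouts rsync.trace
```

Here -f follows the child process that rsync forks, and -tt adds timestamps so that the gaps between calls are visible in the saved trace.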
Root cause: network quality
Within the same data-center network as the release system, this is what our packet capture looked like:
TCP protocol
# wget http://<sitename>/<rsync-test-pkg>-SNAPSHOT.jar
--2018-10-26 10:57:45--  http://<sitename>/<rsync-test-pkg>-SNAPSHOT.jar
Resolving <sitename> (<sitename>)... 10.20.51.127
Connecting to <sitename> (<sitename>)|10.20.51.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 60606801 (58M) [application/java-archive]
Saving to: ‘<rsync-test-pkg>-SNAPSHOT.jar’
100%[==================================================================================================================================================================>] 60,606,801  9.81MB/s   in 5.2s
2018-10-26 10:57:50 (11.1 MB/s) - ‘<rsync-test-pkg>-SNAPSHOT.jar’ saved [60606801/60606801]
However, capturing the traffic of this download showed that packets were still being dropped: