
A Strange Bug Caused by Concurrency

Dutor

  Tair core dumped again.

Core was generated by `sbin/tair_server -f etc/dataserver.conf'.
Program terminated with signal 11, Segmentation fault.
(gdb) f
#0  0x000000000048229b in request_processor::process () at request_processor.cpp:1340
        req->r->opacket = resp;
(gdb) p/a req
$36 = 0x2aadba4d0d80
(gdb) p/a resp
$37 = 0x2aadba4d7e60
(gdb) p/a req->r
$38 = 0x2aacf3986da0
(gdb) p/a &req->r->opacket
$39 = 0x2aacf3986de0
(gdb) disassemble $rip-9, +15
Dump of assembler code from 0x482292 to 0x4822a1:
   # load address of resp to rax
   0x0000000000482292 <request_processor::process()+98>:       mov    0x10(%rsp),%rax
   # load address of req->r to rdx
   0x0000000000482297 <request_processor::process()+103>:      mov    0x20(%r13),%rdx
   # assign address of resp to req->r->opacket
=> 0x000000000048229b <request_processor::process()+107>:      mov    %rax,0x40(%rdx)
   0x000000000048229f <request_processor::process()+111>:      mov    0x10(%rsp),%rdi
End of assembler dump.
(gdb) x/a $rsp+0x10
0x4f56fcd0:     0x2aadba4d7e60 # address of resp
(gdb) p/a $rax
$40 = 0x2aadba4d7e60 # address of resp
(gdb) p/a $r13 # address of req
$41 = 0x2aadba4d0d80
(gdb) x/a $r13+0x20 # address of req->r
0x2aadba4d0da0: 0x2aacf3986da0
(gdb) p/a $rdx
$42 = 0x0

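  Yet another segmentation fault, this time on a plain assignment: req->r->opacket = resp; Ordinarily that means one of req, req->r, or the address of req->r->opacket is invalid, but inspected in the core they are all valid addresses. The disassembly shows the crash at the instruction mov %rax,0x40(%rdx), where %rdx is NULL, meaning req->r was NULL! Yet %rdx was loaded from %r13 + 0x20, and that location holds 0x2aacf3986da0, not NULL!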
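The address arithmetic behind that observation can be checked directly. What follows is a minimal sketch under assumed struct layouts: the type names, the padding members, and the helper field_address are invented for illustration, and only the offsets 0x20 and 0x40 and the addresses come from the gdb session.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts: only the offsets 0x20 and 0x40 come from the
// disassembly; the padding and type names are placeholders, not Tair's code.
struct Packet;
struct Connection {
    char pad[0x40];
    Packet* opacket;   // written by: mov %rax,0x40(%rdx)
};
struct Request {
    char pad[0x20];
    Connection* r;     // read by: mov 0x20(%r13),%rdx
};

static_assert(offsetof(Request, r) == 0x20, "req->r lives at req + 0x20");
static_assert(offsetof(Connection, opacket) == 0x40, "opacket lives at r + 0x40");

// Address of a field, given the object's base address and the field offset.
constexpr std::uint64_t field_address(std::uint64_t base, std::uint64_t offset) {
    return base + offset;
}

// Values copied from the gdb session.
constexpr std::uint64_t kReq  = 0x2aadba4d0d80;  // $r13, i.e. req
constexpr std::uint64_t kReqR = 0x2aacf3986da0;  // req->r as seen in the core

// req + 0x20 is the slot gdb examined with x/a $r13+0x20.
static_assert(field_address(kReq, offsetof(Request, r)) == 0x2aadba4d0da0,
              "matches the address printed by x/a $r13+0x20");
// req->r + 0x40 is what gdb printed for &req->r->opacket.
static_assert(field_address(kReqR, offsetof(Connection, opacket)) == 0x2aacf3986de0,
              "matches p/a &req->r->opacket");
```

Both static_asserts hold, so the memory image is self-consistent. The only value that disagrees with it is the copy of req->r left in %rdx.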
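  Only one explanation fits: the value first loaded from (%r13+0x20), that is req->r, into %rdx was NULL, and before 0x40(%rdx) was accessed, req->r was assigned a non-NULL value. That makes it a concurrency problem.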
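The suspected interleaving can be replayed deterministically. This is a minimal sketch, not Tair's code: Request, Connection, Observation, and replay_interleaving are hypothetical names. The field r is made a std::atomic here only so that the demonstration stays well-defined C++, since two threads racing on a plain pointer, as the crash suggests, would be undefined behavior. A flag forces the writer to run only after the reader's load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the types involved in the crash.
struct Connection { void* opacket = nullptr; };
struct Request { std::atomic<Connection*> r{nullptr}; };

struct Observation {
    Connection* loaded;     // what the reader's "register" held (like %rdx)
    Connection* in_memory;  // what req.r holds when inspected afterwards
};

// Replays the interleaving: 1) reader loads req.r and gets NULL,
// 2) writer stores a non-NULL pointer, 3) memory is inspected.
Observation replay_interleaving() {
    static Connection conn;
    Request req;
    std::atomic<bool> reader_loaded{false};

    std::thread writer([&] {
        // Wait until the reader has taken its (stale) copy.
        while (!reader_loaded.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        req.r.store(&conn, std::memory_order_release);
    });

    // Reader: a single load, like mov 0x20(%r13),%rdx.
    Connection* loaded = req.r.load(std::memory_order_acquire);
    reader_loaded.store(true, std::memory_order_release);
    writer.join();

    // A dereference of `loaded` here would fault, even though req.r
    // in memory is now a valid pointer.
    return Observation{loaded, req.r.load(std::memory_order_acquire)};
}
```

Calling replay_interleaving() returns a NULL loaded value next to a non-NULL in_memory value, the same mismatch as $rdx versus x/a $r13+0x20 in the core.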
  The same kind of oddity can show up as a failed assert(var != 0) where var turns out to be non-zero when you inspect it.
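One way to make such an assert failure less confusing is to assert on a local copy, so the value that was tested is the one kept in the frame. This is a minimal sketch under assumptions: checked_read is a hypothetical helper, and var is assumed to be a shared std::atomic<int>.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: reads the shared variable exactly once and
// asserts on that copy, so the check and the returned value cannot diverge.
int checked_read(const std::atomic<int>& var) {
    const int snapshot = var.load(std::memory_order_acquire);
    assert(snapshot != 0);  // tests the copy, not a second read of var
    return snapshot;
}
```

If the assert fires, the snapshot local in the failing frame (when the compiler has not optimized it away) shows the value that was tested, whatever var has become by the time the core is written.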
  When a bug looks impossible, think concurrency.
