Friday, August 29, 2014

How ASLR exposed an old bug

Recently I was investigating an issue where OpenAFS server processes where crashing on start-up if ASLR (Address Space Layout Randomization) is enabled. All of them were crashing in the same place. Initially I enabled ASLR globally and restarted AFS services:
$ sxadm enable -c model=all aslr
$ svcadm restart ms/afs/server
This resulted in core files from all daemons, let's look at one of them:
$ mdb core.fileserver.1407857175
Loading modules: [ libc.so.1 libnvpair.so.1 ld.so.1 ]
> ::status
debugging core file of fileserver (64-bit) from XXXX
file: /usr/afs/bin/fileserver
initial argv: /usr/afs/bin/fileserver -p 123 -pctspare 20 -L -busyat 50 -rxpck 2000 -rxbind
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault), addr=ffffffffb7a94b20
> ::stack
libc.so.1`memset+0x31c()
afsconf_OpenInternal+0x965()
afsconf_Open+0x125()
main+0x118()
0x44611c()
>
All the other daemons crashed in the same place. Let's take a closer look at the core.
> afsconf_OpenInternal+0x965::dis
afsconf_OpenInternal+0x930: movl $0x0,0xfffffffffffffce4(%rbp)
afsconf_OpenInternal+0x93a: movq 0xfffffffffffffce0(%rbp),%r9
afsconf_OpenInternal+0x941: movq -0x8(%rbp),%r8
afsconf_OpenInternal+0x945: movq %r9,0x18(%r8)
afsconf_OpenInternal+0x949: movq -0x8(%rbp),%rdi
afsconf_OpenInternal+0x94d: movl $0x0,%eax
afsconf_OpenInternal+0x952: call +0x1619 
afsconf_OpenInternal+0x957: movq -0x8(%rbp),%rdi
afsconf_OpenInternal+0x95b: movl $0x0,%eax
afsconf_OpenInternal+0x960: call +0x8ecb <_afsconf_loadrealms>
afsconf_OpenInternal+0x965: movl $0x0,-0x24(%rbp)
afsconf_OpenInternal+0x96c: jmp +0x6 
afsconf_OpenInternal+0x96e: nop
afsconf_OpenInternal+0x970: jmp +0x2 
afsconf_OpenInternal+0x972: nop
afsconf_OpenInternal+0x974: movl -0x24(%rbp),%eax
afsconf_OpenInternal+0x977: leave
afsconf_OpenInternal+0x978: ret
0x4e4349: nop
0x4e434c: nop
ParseHostLine: pushq %rbp
>
It actually crashes in _afsconf_LoadRealms(), we need a little bit more debug info:
$ truss -u a.out -u :: -vall /usr/afs/bin/fileserver $args
...
/1: -> _afsconf_LoadRealms(0x831790290, 0x1, 0x3, 0x0, 0x5bbbe8, 0x8317952bc)
/1: -> libc:malloc(0x28, 0x1, 0x3, 0x0, 0x28, 0x8317952bc)
...
/1: <- -="" 0x8317965c0="" libc:malloc=""> libc:memset(0x317965c0, 0x0, 0x28, 0x0, 0x28, 0x8317952bc)
/1: Incurred fault #6, FLTBOUNDS %pc = 0x7FFD55C802CC
/1: siginfo: SIGSEGV SEGV_MAPERR addr=0x317965C0
/1: Received signal #11, SIGSEGV [default]
/1: siginfo: SIGSEGV SEGV_MAPERR addr=0x317965C0
It fails just after first malloc() followed by memset() in _afsconf_LoadRealms(), the corresponding source code is:
local_realms = malloc(sizeof(struct afsconf_realms));
if (!local_realms) {
code = ENOMEM;
goto cleanup;
}
memset(local_realms, 0, sizeof(struct afsconf_realms));
queue_Init(&local_realms->list);
The code looks fine... but notice in the above truss output that memset() is using a different pointer to what malloc() returned. Might be a bug in truss but since this is where it crashes it is probably real. Let's confirm it with other tool and maybe we can also spot some pattern.
$ dtrace -n 'pid$target::_afsconf_LoadRealms:entry{self->in=1}' \
  -n 'pid$target::memset:entry/self->in/{printf("%p %d %d", arg0, arg1, arg2);}' \
  -n 'pid$target::malloc:entry/self->in/{trace(arg0);}' \
  -n 'pid$target::malloc:return/self->in/{printf("%p, %p", arg0,arg1);}' \
  -c "/usr/afs/bin/fileserver $args"

CPU ID FUNCTION:NAME
3 99435 malloc:entry 40
3 99437 malloc:return 54, c62324a50
3 99433 memset:entry 62324a50 0 40

$ dtrace -n 'pid$target::_afsconf_LoadRealms:entry{self->in=1}' \
  -n 'pid$target::memset:entry/self->in/{printf("%p %d %d", arg0, arg1, arg2);}' \
  -n 'pid$target::malloc:entry/self->in/{trace(arg0);}' \
  -n 'pid$target::malloc:return/self->in/{printf("%p, %p", arg0,arg1);}' \
  -c "/usr/afs/bin/fileserver $args"

CPU ID FUNCTION:NAME
3 99435 malloc:entry 40
3 99437 malloc:return 54, 10288d120
3 99433 memset:entry 288d120 0 40

$ dtrace -n 'pid$target::_afsconf_LoadRealms:entry{self->in=1}' \
  -n 'pid$target::memset:entry/self->in/{printf("%p %d %d", arg0, arg1, arg2);}' \
  -n 'pid$target::malloc:entry/self->in/{trace(arg0);}' \
  -n 'pid$target::malloc:return/self->in/{printf("%p, %p", arg0,arg1);}' \
  -c "/usr/afs/bin/fileserver $args"

CPU ID FUNCTION:NAME
3 99435 malloc:entry 40
3 99437 malloc:return 54, de9479a10
3 99433 memset:entry ffffffffe9479a10 0 40
It looks like the lowest 4 bytes in the pointer returned from malloc() and passed to memset() are always preserved, while the top 4 bytes are mangled. I was curious how it looks like when ASLR is disabled:
$ elfedit -e 'dyn:sunw_aslr disable' /usr/afs/bin/fileserver
$ dtrace -n 'pid$target::_afsconf_LoadRealms:entry{self->in=1}' \
  -n 'pid$target::_afsconf_LoadRealms:return{self->in=0}' \
  -n 'pid$target::memset:entry/self->in/{printf("%p %d %d", arg0, arg1, arg2);}' \
  -n 'pid$target::malloc:entry/self->in/{trace(arg0);}' \
  -n 'pid$target::malloc:return/self->in/{printf("%p, %p", arg0,arg1);}' \
  -c "/usr/afs/bin/fileserver $args"

1 99436 malloc:entry 40
1 99438 malloc:return 54, 5bd170
1 99434 memset:entry 5bd170 0 40
... [ it continues as it doesn't crash ]
Now the pointer passed to memset() is the same as the one returned from malloc() - notice however that it is 32bit (all daemons are compiled as 64bit). Let's have a look at the core again where it actually fails:
_afsconf_LoadRealms+0x59: call -0xa7c36 
_afsconf_LoadRealms+0x5e: movl %eax,%eax
_afsconf_LoadRealms+0x60: cltq
_afsconf_LoadRealms+0x62: movq %rax,%r8
_afsconf_LoadRealms+0x65: movq %r8,-0x20(%rbp)
_afsconf_LoadRealms+0x69: movq -0x20(%rbp),%r8
_afsconf_LoadRealms+0x6d: cmpq $0x0,%r8
_afsconf_LoadRealms+0x71: jne +0xd <_afsconf_loadrealms x80="">
_afsconf_LoadRealms+0x73: movl $0xc,-0x18(%rbp)
_afsconf_LoadRealms+0x7a: jmp +0x1f5 <_afsconf_loadrealms x274="">
_afsconf_LoadRealms+0x7f: nop
_afsconf_LoadRealms+0x80: movl $0x28,-0x48(%rbp)
_afsconf_LoadRealms+0x87: movl $0x0,-0x44(%rbp)
_afsconf_LoadRealms+0x8e: movq -0x48(%rbp),%r8
_afsconf_LoadRealms+0x92: movq %r8,%rdx
_afsconf_LoadRealms+0x95: movl $0x0,%esi
_afsconf_LoadRealms+0x9a: movq -0x20(%rbp),%rdi
_afsconf_LoadRealms+0x9e: movl $0x0,%eax
_afsconf_LoadRealms+0xa3: call -0xa7d10 

Bingo! See the movl and cltq instructions just after returning from malloc(). This means that malloc() is returning a 64bit address but compiler thinks it returns a 32bit address, so it clears the top 4 bytes and then expands the pointer back to 64 bits and this is what is being passed to memset(). With ASLR disabled it just happens we get a low address that the lowest 4 bytes are enough to address it so we don't get the issue, with ASLR most of the time we end up with much higher address where you can't just chop of the top four bytes.

Compilers do it if they have an implicit function declaration and then they assume the return is an int which on x86_64 means 32 bits. The fix was trivial - all that was required was to add #include <stdlib.h> and recompile - now compiler knows that malloc() returns 64 bit pointer, the movl, cltq instructions are gone and we get no more crashes.

Friday, August 08, 2014

Monitoring Dropped UDP packets

I needed to monitor incoming UDP packets being dropped by kernel and report some basic information on each of them (src and dst IP:port is enough). There are multiple probes which one could us, particularly ip:::drop-in is the best candidate, however it is not fully implemented so it doesn't provide all the same arguments other probes in IP provider have (this seems easy to fix). We can intercept ip_drop_input() function though (which is where the ip::drop-in probe is located as well) and juggle pointers to get what we want - once the drop-in probe is fixed the below script will be shorter and easier to understand. 
# ./udp_dropped_packets.d
2014 Aug  6 14:32:14 dropped UDP packet [udpIfStatsInOverflows] from 10.10.1.4:52537 to 10.10.2.9:7001
2014 Aug  6 14:32:15 dropped UDP packet [udpIfStatsInOverflows] from 10.10.1.4:52537 to 10.10.2.9:7001
2014 Aug  6 14:32:17 dropped UDP packet [l2 multicast not forwarded] from 10.10.3.11:137 to 10.10.3.9:137

Notice that the script prints reason why a packet was dropped, if you are interested only in drops due to lack of buffer space then you need to grep the output for udpIfStatsInOverflows.

The script has only been tested on Solaris 11.1 and 11.2 but it probably works on 11.0 and Illumos based distros as well.

# cat udp_dropped_packets.d
#!/usr/sbin/dtrace -qa

/* Robert Milkowski */

fbt::ip_drop_input:entry
/ ((ipha_t *)args[1]->b_rptr)->ipha_protocol == IPPROTO_UDP /
{
  this->ipha = (ipha_t *)args[1]->b_rptr;
  this->ip_src = inet_ntoa(&this->ipha->ipha_src);
  this->ip_dst = inet_ntoa(&this->ipha->ipha_dst);

  this->iph_hdr_length = (int)(this->ipha->ipha_version_and_hdr_length & 0xF) << 2;
  this->udpha = (udpha_t *)((char *)this->ipha + this->iph_hdr_length);
  this->src_port = ntohs(this->udpha->uha_src_port);
  this->dst_port = ntohs(this->udpha->uha_dst_port);

  printf("%Y dropped UDP packed [%s] from %s:%d to %s:%d\n", walltimestamp,
          stringof arg0,
          this->ip_src, this->src_port,
          this->ip_dst, this->dst_port
        );
}