thread local caching in glibc malloc

Swing

2018-05-29

Summary

cache, pwn

TCACHE (2.26+)

Thread local storage caching

Singly linked list in chunk’s fd (sim. to FastBins)

All sizes

Major performance enhancement

Not even trying to be secure :/

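glibc 2.26 added a new mechanism called tcache. According to the official overview, it improves performance by giving each thread its own cache of chunks up to a certain size. The Ubuntu 18.04 release already uses this mechanism.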

New structures

The new mechanism introduces two structures, tcache_entry and tcache_perthread_struct, both of which are very simple. By default, each tcache bin holds at most 7 chunks.

/* We overlay this structure on the user-data portion of a chunk when the chunk is stored in the per-thread cache.  */
typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the per-thread cache (hence "tcache_perthread_struct").  Keeping overall size low is mildly important.  Note that COUNTS and ENTRIES are redundant (we could have just counted the linked list each time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

static __thread tcache_perthread_struct *tcache = NULL;
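
To make the indexing of counts and entries more concrete, here is a minimal sketch of how a request size could map to a tcache bin. It mirrors the request2size and csize2tidx macros from glibc's malloc.c, but the constants are hard-coded assumptions for x86-64 (SIZE_SZ of 8, 16-byte alignment, MINSIZE of 32, 64 bins), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 values; other targets use different numbers. */
#define SIZE_SZ            8UL
#define MALLOC_ALIGNMENT   16UL
#define MALLOC_ALIGN_MASK  (MALLOC_ALIGNMENT - 1)
#define MINSIZE            32UL
#define TCACHE_MAX_BINS    64UL

/* Round a user request up to a chunk size (modeled on request2size). */
static size_t request_to_chunk_size(size_t req)
{
    size_t padded = req + SIZE_SZ + MALLOC_ALIGN_MASK;
    return padded < MINSIZE ? MINSIZE : (padded & ~MALLOC_ALIGN_MASK);
}

/* Map a chunk size to a bin index (modeled on csize2tidx). */
static size_t chunk_size_to_tcache_index(size_t chunk_size)
{
    assert(chunk_size >= MINSIZE);
    return (chunk_size - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT;
}

/* Hypothetical helper: bin index for a request, or -1 if tcache skips it. */
static long tcache_index_for_request(size_t req)
{
    size_t idx = chunk_size_to_tcache_index(request_to_chunk_size(req));
    return idx < TCACHE_MAX_BINS ? (long) idx : -1L;
}
```

Under these assumptions a 24-byte request lands in bin 0, a 1032-byte request lands in the last bin (63), and a 1033-byte request is too large to be cached.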

Using tcache

Chunks can end up in the thread caches in multiple ways:

- upon free: before the fastbin code in _int_free, if the chunk has an appropriate size and the corresponding bin isn’t full
- upon malloc, there are 3 places where caches are filled:
  - if a fast chunk is returned, the other chunks from the corresponding fastbin are used to fill the appropriate tcache bin.
  - the same is done if a small chunk is returned by malloc.
  - in the binning code, exact size matches are first put in the tcache instead of returning immediately.

Chunks are taken from the tcache:

- in __libc_malloc, before _int_malloc.
- after the binning code, if at least one exact match was found.
- there can also be a limit on the number of chunks that are put in the tcache in a run of the binning code. If that’s reached, the last one found is returned. However, this is unlimited by default.

Some observations:

- the tcache fill code in the fast path of malloc will reverse the order of the chunks.
- cached chunks won’t be coalesced:
  - neither on free of neighboring chunks
  - nor with top when they are freed
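
The free-side fill and the order reversal can be illustrated with a toy model. This is only a sketch: toy_chunk, toy_bin, toy_free and toy_malloc are hypothetical names, a single size class is assumed, and MODEL_FILL_COUNT stands in for the default limit of 7 chunks per bin mentioned earlier. It is not glibc's code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FILL_COUNT 7   /* assumed default per-bin limit */

/* A toy chunk: an id for tracing plus the single forward link. */
struct toy_chunk {
    int id;
    struct toy_chunk *next;
};

/* A toy LIFO bin, used for both the fastbin and the tcache bin. */
struct toy_bin {
    struct toy_chunk *head;
    int count;
};

static void toy_push(struct toy_bin *bin, struct toy_chunk *c)
{
    c->next = bin->head;
    bin->head = c;
    bin->count++;
}

static struct toy_chunk *toy_pop(struct toy_bin *bin)
{
    struct toy_chunk *c = bin->head;
    if (c != NULL) {
        bin->head = c->next;
        bin->count--;
    }
    return c;
}

/* free: prefer the tcache bin, fall back to the fastbin once it is full. */
static void toy_free(struct toy_bin *tcache_bin, struct toy_bin *fastbin,
                     struct toy_chunk *c)
{
    if (tcache_bin->count < MODEL_FILL_COUNT)
        toy_push(tcache_bin, c);
    else
        toy_push(fastbin, c);
}

/* malloc fast path: serve from tcache first; otherwise take the fastbin
   head and move the remaining fastbin chunks into the tcache bin. */
static struct toy_chunk *toy_malloc(struct toy_bin *tcache_bin,
                                    struct toy_bin *fastbin)
{
    struct toy_chunk *victim = toy_pop(tcache_bin);
    if (victim != NULL)
        return victim;
    victim = toy_pop(fastbin);
    if (victim == NULL)
        return NULL;
    while (tcache_bin->count < MODEL_FILL_COUNT && fastbin->head != NULL)
        toy_push(tcache_bin, toy_pop(fastbin));
    return victim;
}
```

Freeing chunks 0 through 9 in this model leaves 0..6 in the tcache bin and 7..9 in the fastbin. Once the tcache bin has been drained, the next three allocations return 9, 7, 8 rather than the 9, 8, 7 a plain fastbin would give, which is the reversal noted above.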

tcache in pwn

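In pursuit of performance, tcache drops many of the checks that had been built up over time. This means that a fake or maliciously corrupted chunk can be used as long as it passes the alignment and wrap-around checks.

What does that mean in practice? Put plainly, many glibc pwn techniques become simpler. A few examples follow: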

The House of Spirit

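House of Spirit is one of the techniques from the Malloc Maleficarum.

The core of the technique is to forge a fastbin chunk at the target location and free it, so that a later allocation returns a chunk at the chosen address. In the past, for a fake fastbin chunk to be freed and placed into the corresponding fastbin list, it had to get past several checks: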

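- The fake chunk's IS_MMAPPED bit must not be 1, because free handles mmapped chunks separately.
- The fake chunk's address must be aligned (MALLOC_ALIGN_MASK).
- The fake chunk's size must fit the corresponding fastbin, and must also be aligned.
- The size of the fake chunk's next chunk must be larger than 2 * SIZE_SZ and smaller than av->system_mem.
- The head of the corresponding fastbin list must not be the fake chunk itself, i.e. it must not look like a double free.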

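Under tcache, however, the fake chunk only needs to satisfy the following:
- its address is 2*SIZE_SZ aligned
- its size field lies between MINSIZE and the maximum cached chunk size (1032/516 bytes)
- on x64, the size field must also not have its fourth LSB set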
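
The three conditions above can be written down as a small predicate. This is a sketch under assumed x86-64 constants, and fake_chunk_fits_tcache is a hypothetical name, not a glibc function. Only the listed conditions are modeled; anything else free does (for example how it treats the low flag bits, which are simply masked off here) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 values; other targets use different numbers. */
#define SIZE_SZ            8UL
#define MALLOC_ALIGNMENT   (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK  (MALLOC_ALIGNMENT - 1)
#define MINSIZE            32UL
#define TCACHE_MAX_BINS    64UL
#define FLAG_BITS          7UL   /* the three low flag bits of a size field */

/* Returns 1 if a fake chunk with user pointer mem_addr and the given raw
   size field meets the three listed conditions in this model, else 0. */
static int fake_chunk_fits_tcache(uintptr_t mem_addr, size_t size_field)
{
    size_t size = size_field & ~FLAG_BITS;

    if (mem_addr & MALLOC_ALIGN_MASK)
        return 0;                 /* address not 2*SIZE_SZ aligned */
    if (size & MALLOC_ALIGN_MASK)
        return 0;                 /* on x64 this is the fourth LSB */
    if (size < MINSIZE)
        return 0;                 /* too small to be a chunk */

    /* Too large for any tcache bin? */
    size_t idx = (size - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT;
    return idx < TCACHE_MAX_BINS;
}
```

With these assumed constants the largest accepted size field is 0x410, which corresponds to the 1032 bytes of user data quoted above; 0x420 is rejected, and so is 0x48 because of its fourth LSB.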

Overlapping chunks

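Overlapping chunks is a common exploitation technique in glibc pwn. It usually works by overwriting heap metadata, for example the size field of an already freed chunk. With tcache in place, this approach still works.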

tcache poisoning

tcache hands cached chunks back out with the following code:

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

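tcache_get is the code responsible for removing a chunk from a tcache bin. Corrupting the next pointer of the tcache_entry inside a cached chunk makes it return a completely arbitrary chunk.

In this setting, building a circular tcache bin through a double free is also simpler than a fastbin double free, because free does not check whether the chunk is already the first member of the bin.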
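
Both effects can be reproduced in a small stand-alone model. This is a sketch rather than glibc's code: model_put and model_get are hypothetical helpers shaped like the tcache put and get paths, stack objects stand in for heap chunks, and the "attacker write" is a plain assignment.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BINS 64

/* Same shape as tcache_entry: the link lives in the chunk's user data. */
typedef struct model_entry {
    struct model_entry *next;
} model_entry;

typedef struct model_tcache {
    char counts[MODEL_BINS];
    model_entry *entries[MODEL_BINS];
} model_tcache;

/* Push a freed chunk onto the head of its bin; no membership check. */
static void model_put(model_tcache *tc, void *mem, size_t idx)
{
    model_entry *e = (model_entry *) mem;
    assert(idx < MODEL_BINS);
    e->next = tc->entries[idx];
    tc->entries[idx] = e;
    ++(tc->counts[idx]);
}

/* Pop the head of a bin, trusting whatever e->next contains. */
static void *model_get(model_tcache *tc, size_t idx)
{
    assert(idx < MODEL_BINS);
    model_entry *e = tc->entries[idx];
    assert(e != NULL);
    tc->entries[idx] = e->next;
    --(tc->counts[idx]);
    return (void *) e;
}

/* Poisoning: corrupt next of a cached chunk; the second get returns it. */
static int demo_poisoning(void)
{
    model_tcache tc = {{0}, {NULL}};
    model_entry chunk = {NULL};    /* stands in for a freed chunk */
    model_entry target = {NULL};   /* stands in for attacker-chosen memory */

    model_put(&tc, &chunk, 0);
    chunk.next = &target;          /* the use-after-free or overflow write */

    void *first = model_get(&tc, 0);
    void *second = model_get(&tc, 0);
    return first == (void *) &chunk && second == (void *) &target;
}

/* Double free: the chunk links to itself and is handed out repeatedly. */
static int demo_double_free(void)
{
    model_tcache tc = {{0}, {NULL}};
    model_entry chunk = {NULL};

    model_put(&tc, &chunk, 0);
    model_put(&tc, &chunk, 0);     /* chunk.next now points at chunk */

    void *first = model_get(&tc, 0);
    void *second = model_get(&tc, 0);
    return first == (void *) &chunk && second == (void *) &chunk;
}
```

Both demo functions return 1 in this model: the poisoned bin yields the target on the second request, and the doubly freed chunk comes back twice in a row.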

Smallbin cache filling bck write

tcache also drops some checks here, such as the bck->fd != victim check done when unlinking. This means:

- House of Lore could be used (although we have better methods)
- an uncontrolled write similar to the unsorted bck write could be achieved

tcache_perthread_structs as corruption targets

Hmm, because the per-thread management is not rigorous, it is quite likely to allow some forms of corruption, such as races.

Conclusion

This is downright idiotic!!!!!!

link

http://tukan.farm/2017/07/08/tcache/