
Incorrect memory view after running self-modifying code #820

@jbremer


Hi!

We're in the process of integrating Unicorn Engine into Cuckoo Sandbox. Its first use is unpacking payloads encoded with shikata ga nai (a Metasploit encoder). While working on this, we encountered an interesting shellcode sample that Unicorn emulates incorrectly. Some additional information follows.

First, here is the decoding stub. What's notable (and, I'm fairly sure, related to the bug) is that the shikata ga nai stub decodes not only the payload but also part of the decoder stub itself.
In this particular sample, the immediate operand of the loop instruction (at offset 0x1a) is decoded by the very first xor, and I suspect this causes TCG's translated code to get out of sync with the modified memory.

➜  cuckoo git:(master) ✗ ndisasm -b32 tests/files/shellcode/shikata/5.bin|head -n20
00000000  DBD0              fcmovnbe st0
00000002  D97424F4          fnstenv [esp-0xc]
00000006  5F                pop edi
00000007  B8E67741BC        mov eax,0xbc4177e6
0000000C  31C9              xor ecx,ecx
0000000E  B158              mov cl,0x58
00000010  31471A            xor [edi+0x1a],eax
00000013  03471A            add eax,[edi+0x1a]
00000016  83C704            add edi,byte +0x4
00000019  E213              loop 0x2e
0000001B  8BA93EDB742A      mov ebp,[ecx+0x2a74db3e]
00000021  5F                pop edi
00000022  52                push edx
00000023  91                xchg eax,ecx
00000024  1B5F00            sbb ebx,[edi+0x0]
00000027  D10C6F            ror dword [edi+ebp*2],1
0000002A  43                inc ebx
0000002B  B7A0              mov bh,0xa0
0000002D  0401              add al,0x1
0000002F  2C32              sub al,0x32
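For comparison, the loop at 0x10 to 0x19 can be replayed in plain Python, without any emulator. This is a minimal sketch for this one sample, not code from Cuckoo, Metasploit or Unicorn; `decode`, `SC_HEX` and the constants are hypothetical names. It assumes `edi` starts at offset 0 (the `fnstenv [esp-0xc]` / `pop edi` pair retrieves the saved FPU instruction pointer, i.e. the address of the `fcmovnbe`), and that every `loop` iteration branches back to the `xor` at 0x10, which is what a real CPU does once the immediate has been decoded to 0xf5.

```python
import struct

# Same sample as the reproduction script below (hypothetical constant name).
SC_HEX = (
    "dbd0d97424f45fb8e67741bc31c9b15831471a03471a83c704e2138ba93edb742"
    "a5f52911b5f00d10c6f43b7a004012c32688d43f3c7eb6a047bcfed868603ceb7"
    "48560fffb59a5da8b20872dd8f90f9ad1e901e6520b1b0fd7b1132d1f7182c363"
    "dd3c78cc9e201dd32486cd1c091a8d63ae4c024c6fe16561c8b8cf0d72b69003b"
    "adfa0ef0baa512076fde2f8c8e31a6d6b495e28dd58c4e63eacf30dc4e9bdd09e"
    "3c689a39e8c4954170424cd83bef47a0d38fa50609d5708d1720bc6ef22d2b1f0"
    "1e77ed64a22b4210ffda64e0175064e0e7460ca6d7ad862648a641aff7f0917a8"
    "e3b3eec91f12168c2a6f227b61e9d2c6db1664d5b5bf2bb3b0c8388c3cc0a0ea9"
    "c85ca43187344d08b94352419618ff394ff7d2bb777cd31102425e90423679cca"
    "c0ddb5bb2bb7124244495a4b42c95a4f4acc6ccac08bbe9b284a8a11fae2912c8"
    "b0959d08e283f51a92a2e4e44f31286ebdb2ae8efe4170e5e511b2590ed4cb993"
    "1614311fda3c8b573d46323595f0c85c5fe98bc05"
)

KEY = 0xBC4177E6   # mov eax, 0xbc4177e6
COUNT = 0x58       # mov cl, 0x58 (number of loop iterations)
OFFSET = 0x1A      # displacement of xor [edi+0x1a], eax


def decode(sc: bytes) -> bytes:
    """Replay the xor/add/add/loop body of the stub on a copy of sc.

    Assumes edi == 0 (start of the buffer) and that the loop always
    jumps back to the xor, as it does once its immediate is decoded.
    """
    buf = bytearray(sc)
    key = KEY
    pos = OFFSET
    for _ in range(COUNT):
        (word,) = struct.unpack_from("<I", buf, pos)
        word ^= key                          # xor [edi+0x1a], eax
        struct.pack_into("<I", buf, pos, word)
        key = (key + word) & 0xFFFFFFFF      # add eax, [edi+0x1a] (decoded value)
        pos += 4                             # add edi, byte +0x4
    return bytes(buf)


if __name__ == "__main__":
    out = decode(bytes.fromhex(SC_HEX))
    for x in range(0x18, 0x2A):
        print("0x%08x: 0x%02x" % (x, out[x]))
```

On this sample the sketch yields `f5 fc e8 82 00 00 00 60 89 e5 31 c0 64 8b 50 30` at offsets 0x1a to 0x29: the decoded `loop` immediate, then `cld`, `call` (rel32 0x82), `pushad`, `mov ebp,esp`, `xor eax,eax` and `mov edx,fs:[eax+0x30]`.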

The expected output after running the shellcode is shown in the following screenshot.
[Screenshot: expected output (x64dbg)]

The actual output can be reproduced with the following script (using unicorn==1.0.0).

import unicorn
import unicorn.x86_const as x86

# Encoded sample (same bytes as the disassembly above), as a hex string.
sc = (
    "dbd0d97424f45fb8e67741bc31c9b15831471a03471a83c704e2138ba93edb742"
    "a5f52911b5f00d10c6f43b7a004012c32688d43f3c7eb6a047bcfed868603ceb7"
    "48560fffb59a5da8b20872dd8f90f9ad1e901e6520b1b0fd7b1132d1f7182c363"
    "dd3c78cc9e201dd32486cd1c091a8d63ae4c024c6fe16561c8b8cf0d72b69003b"
    "adfa0ef0baa512076fde2f8c8e31a6d6b495e28dd58c4e63eacf30dc4e9bdd09e"
    "3c689a39e8c4954170424cd83bef47a0d38fa50609d5708d1720bc6ef22d2b1f0"
    "1e77ed64a22b4210ffda64e0175064e0e7460ca6d7ad862648a641aff7f0917a8"
    "e3b3eec91f12168c2a6f227b61e9d2c6db1664d5b5bf2bb3b0c8388c3cc0a0ea9"
    "c85ca43187344d08b94352419618ff394ff7d2bb777cd31102425e90423679cca"
    "c0ddb5bb2bb7124244495a4b42c95a4f4acc6ccac08bbe9b284a8a11fae2912c8"
    "b0959d08e283f51a92a2e4e44f31286ebdb2ae8efe4170e5e511b2590ed4cb993"
    "1614311fda3c8b573d46323595f0c85c5fe98bc05"
).decode("hex")

def main():
    uc = unicorn.Uc(unicorn.UC_ARCH_X86, unicorn.UC_MODE_32)
    # Map 8 KiB at 0x1000: the shellcode lives at 0x1000 and the stack
    # starts at 0x2000 (fnstenv [esp-0xc] also writes into this region).
    uc.mem_map(0x1000, 0x2000)
    uc.mem_write(0x1000, sc)
    uc.reg_write(x86.UC_X86_REG_ESP, 0x2000)
    # 0x166 = 6 setup instructions + 0x58 loop iterations * 4 instructions.
    uc.emu_start(0x1000, 0, count=0x166)
    out = uc.mem_read(0x1000, 0x2000)

    # Print each offset with its original byte and the byte after emulation.
    for x in xrange(len(sc)):
        print "0x%08x: 0x%02x => 0x%02x" % (x, ord(sc[x]), out[x])
    print str(out)

if __name__ == "__main__":
    main()

On the one hand, the remainder of the shellcode is decoded correctly: the www3.chrome-up.date string shows up near the end of it. On the other hand, the bytes just after the loop instruction are not decoded properly. The relevant part of the script's output is as follows.

0x0000000f: 0x58 => 0x58
0x00000010: 0x31 => 0x31
0x00000011: 0x47 => 0x47
0x00000012: 0x1a => 0x1a
0x00000013: 0x03 => 0x03
0x00000014: 0x47 => 0x47
0x00000015: 0x1a => 0x1a
0x00000016: 0x83 => 0x83
0x00000017: 0xc7 => 0xc7
0x00000018: 0x04 => 0x04
0x00000019: 0xe2 => 0xe2  ; the loop instruction
0x0000001a: 0x13 => 0xf5  ; the correct immediate after decoding
0x0000001b: 0x8b => 0x8b  ; definitely not the `cld` instruction (see `x64dbg` screenshot)
0x0000001c: 0xa9 => 0xa9  ; definitely not a `call` instruction
0x0000001d: 0x3e => 0x3e
0x0000001e: 0xdb => 0x00  ; interestingly enough this byte is correct!
0x0000001f: 0x74 => 0x77  ; ^ that's the first byte of the 2nd xor
0x00000020: 0x2a => 0xc1
0x00000021: 0x5f => 0xa5
0x00000022: 0x52 => 0x89  ; and so is this one (first byte, 3rd xor)
0x00000023: 0x91 => 0xeb
0x00000024: 0x1b => 0xb7
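The annotations above can be checked by hand. A minimal sketch, assuming (as the disassembly shows) that the first xor uses the initial key 0xbc4177e6; `first_dword` and `loop_target` are hypothetical helpers. It decodes the dword at 0x1a and computes where the 2-byte `loop rel8` at 0x19 branches: with the original immediate 0x13 it goes forward to 0x2e (as ndisasm reports), with the decoded 0xf5 (-11) it goes back to the xor at 0x10.

```python
import struct

KEY = 0xBC4177E6                     # initial eax from mov eax, 0xbc4177e6
ENCODED = bytes.fromhex("138ba93e")  # original bytes at offsets 0x1a..0x1d


def first_dword(encoded: bytes, key: int) -> bytes:
    """Apply xor [edi+0x1a], eax to one little-endian dword."""
    (word,) = struct.unpack("<I", encoded)
    return struct.pack("<I", word ^ key)


def loop_target(insn_addr: int, rel8: int) -> int:
    """Branch target of a 2-byte loop rel8: next instruction + signed rel8."""
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return insn_addr + 2 + disp


decoded = first_dword(ENCODED, KEY)
print(decoded.hex())                          # f5fce882
print(hex(loop_target(0x19, ENCODED[0])))     # 0x2e (undecoded immediate)
print(hex(loop_target(0x19, decoded[0])))     # 0x10 (decoded immediate)
```

So 0x1b should become 0xfc (`cld`) and 0x1c should become 0xe8 (`call`), which is why the unchanged 0x8b and 0xa9 in the output are wrong.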

At this point I don't know much about the Unicorn internals, but I sure do hope that somebody can pick up this issue! Thanks in advance! :-)
