
@mlodic (Contributor) commented May 12, 2020

Hi @decalage2, and thank you again for your amazing work on this tool.

Over the last few days I have been working on some XLM macros, and I noticed that you were aware of the problems with this kind of old macro (e.g. #556). Your recent commits solved this problem when running the tool from the command line, but I was still stuck when using VBA_Parser as a library.

For instance, this sample https://www.virustotal.com/gui/file/553d2198eeb77a376e3fbc9af9f40d8ad2960dd6f5fc88f2146c8665765629fd/detection contains an encrypted XLM macro.

When calling
vbaparser = VBA_Parser(filepath)
only the encrypted file is analyzed. I could not find a way, through VBA_Parser, to check whether the file is encrypted and, if so, to extract and analyze the decrypted file, and therefore the actual malicious macro.

So I added two simple methods to VBA_Parser:

  • detect_is_encrypted to detect if the file is encrypted
  • decrypt_file to decrypt the file if encrypted

In this way I can call:
decrypted_file_name = vbaparser.decrypt_file()

and, if the decrypted file is extracted:
vbaparser = VBA_Parser(decrypted_file_name)
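A minimal sketch of this flow, assuming (as described above) that detect_is_encrypted() returns a bool and that decrypt_file() returns the path of the decrypted file. It is an extra assumption that decrypt_file() returns None when decryption fails. The helper parser_for_analysis and the EncryptionAwareParser protocol are hypothetical; the parser class is passed in so the sketch runs without oletools installed:

```python
from typing import Callable, Optional, Protocol


class EncryptionAwareParser(Protocol):
    """Hypothetical interface: the two methods proposed in this PR."""

    def detect_is_encrypted(self) -> bool: ...

    def decrypt_file(self) -> Optional[str]: ...


def parser_for_analysis(
    filepath: str,
    open_parser: Callable[[str], EncryptionAwareParser],
) -> EncryptionAwareParser:
    """Return a parser for the decrypted file when decryption succeeds,
    otherwise the parser for the original file."""
    parser = open_parser(filepath)
    if parser.detect_is_encrypted():
        decrypted_file_name = parser.decrypt_file()
        # Assumption: a falsy result (e.g. None) means decryption failed,
        # so we fall back to analyzing the original, encrypted file.
        if decrypted_file_name:
            return open_parser(decrypted_file_name)
    return parser
```

With oletools, the intended usage would be something like parser_for_analysis(filepath, VBA_Parser).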

If you are OK with these changes, please merge them and release a patch version so that I can use it in my tools.
Thank you very much again!

@mlodic (Contributor, Author) commented May 19, 2020

Hi @decalage2, regarding issue #415: today we found new malicious XLSM documents that leverage XLM macros.

Reference:

With the latest available version, the tool reports that there are no macros in these documents.
This is because the function open_openxml only looks for OLE objects.
However, while looping over the subfiles extracted from the archive, we can check whether any of the XML files contains a macro sheet by looking for the XML tag <xm:macrosheet>. In that case, we can run VBA_Parser on that file too. The result is awesome, and the related IOCs are extracted as well.
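A minimal sketch of this check using only the standard library; the helper name find_macrosheet_parts is hypothetical and this is not the code of the commit. It searches for the tag without the closing ">", because the root element of a macro sheet carries namespace declarations (e.g. <xm:macrosheet xmlns:xm="...">). Note that "xm" is only the conventional prefix:

```python
import zipfile
from typing import IO, List, Union

# Literal tag searched for; "xm" is the conventional namespace prefix.
MACROSHEET_TAG = b"<xm:macrosheet"


def find_macrosheet_parts(ooxml: Union[str, IO[bytes]]) -> List[str]:
    """Return the names of XML parts in an OpenXML (ZIP) file that
    contain an XLM macro sheet."""
    hits = []
    with zipfile.ZipFile(ooxml) as archive:
        for name in archive.namelist():
            # Only XML parts can hold a macro sheet definition.
            if not name.lower().endswith(".xml"):
                continue
            if MACROSHEET_TAG in archive.read(name):
                hits.append(name)
    return hits
```

As described above, each matching part is then analyzed with VBA_Parser as well.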

I added a new commit on top of the previous one to solve this problem as well. Please give me feedback when possible. Thank you!

@mlodic changed the title (May 19, 2020): added "detect_is_encrypted" and "decrypt_file" methods in VBA_Parser → improvements to analysis to XLM macros (encrypted ones + contained in XLSM)
@mlodic changed the title (May 19, 2020): improvements to analysis to XLM macros (encrypted ones + contained in XLSM) → improvements to analysis of XLM macros (encrypted ones + contained in XLSM)
@decalage2 self-requested a review May 19, 2020 15:34
@decalage2 self-assigned this May 19, 2020
@decalage2 added this to the oletools 0.56 milestone May 19, 2020
@mlodic (Contributor, Author) commented Jun 17, 2020

Hi @decalage2, I have added a new commit that allows olevba to detect template injection in OpenXML documents. I found several documents in the wild that still use this technique to evade detection. The URL used to download the template from a remote location is correctly extracted as an IOC.
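One common form of template injection in Word documents is an attachedTemplate relationship with TargetMode="External", typically in word/_rels/settings.xml.rels. The sketch below is not necessarily how the commit implements the detection: it scans every .rels part for such relationships and returns their Target URLs, i.e. the IOC mentioned above. The function name find_external_templates is hypothetical:

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import IO, List, Union

# Namespace of OPC relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_external_templates(ooxml: Union[str, IO[bytes]]) -> List[str]:
    """Return the URLs of remote templates referenced by an OpenXML file."""
    urls = []
    with zipfile.ZipFile(ooxml) as archive:
        for name in archive.namelist():
            # Relationships live in parts such as word/_rels/settings.xml.rels.
            if not name.endswith(".rels"):
                continue
            # For untrusted input, defusedxml is a safer drop-in parser.
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                is_template = rel.get("Type", "").endswith("/attachedTemplate")
                if is_template and rel.get("TargetMode") == "External":
                    urls.append(rel.get("Target", ""))
    return urls
```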

Reference:

@mlodic changed the title (Jun 17, 2020): improvements to analysis of XLM macros (encrypted ones + contained in XLSM) → improvements to analysis of XLM macros (encrypted ones + contained in XLSM) + template injection

@mlodic (Contributor, Author) commented Jul 21, 2020

I have added the commit from my colleague's pull request #591, and I have also changed how detection is handled for the CALL and REGISTER functions.

During his research, he found that olevba incorrectly reports those functions for some samples, where the parser output only says: Could contain following functions: <function name>. To avoid misunderstanding, I changed the detection to state clearly that, when that string is matched, the "CALL" or "REGISTER" function was not necessarily found, only that it "could" be there.

Samples:

Output:
' 0006 40 FORMULA : Cell Formula - R45899C114 len=18 ptgNameV *INCOMPLETE FORMULA PARSING* Could contain following functions: EXEC - Remaining, unparsed expression: bytearray(b'n\x00\x00\x00\x17\x06\x00pLBrGQ\x0eA\xac\x00')

' 0006 31 FORMULA : Cell Formula - R56495C91 len=9 ptgNameV *INCOMPLETE FORMULA PARSING* Could contain following functions: REGISTER - Remaining, unparsed expression: bytearray(b'\x95\x00\x00\x00B\x01v\x80')

Current detection:
|Suspicious|EXEC                |May run an executable file or a system       |
|          |                    |command using Excel 4 Macros (XLM/XLF)       |
|Suspicious|REGISTER            |May call a DLL using Excel 4 Macros (XLM/XLF)|

Proposed new detection:

|Suspicious|Could contain       |Could contain a function that allows to run  |
|          |following functions:|an executable file or a system command using |
|          |EXEC                |Excel 4 Macros (XLM/XLF)                     |
|Suspicious|Could contain       |Could contain a function that allows to call |
|          |following functions:|a DLL using Excel 4 Macros (XLM/XLF)         |
|          |REGISTER            |                                             |

@mlodic (Contributor, Author) commented Sep 28, 2020

Any update on this?

@@ -811,6 +828,14 @@ def __init__(self, stream_path, variable, expected, value):
),
'May run an executable file or a system command on a Mac (if combined with libc.dylib)':
('system', 'popen', r'exec[lv][ep]?'),
'May run an executable file or a system command using Excel 4 Macros (XLM/XLF)':
@decalage2 (Owner):

I don't understand the change in lines 831-838. Could you please explain why those regexes are needed, and why there are two similar-looking keyword entries for EXEC and for REGISTER?

@mlodic (Contributor, Author) Sep 29, 2020:

This is to avoid cases where oletools outputs:

|Suspicious|EXEC                |May run an executable file or a system       |
|          |                    |command using Excel 4 Macros (XLM/XLF)       |
|Suspicious|REGISTER            |May call a DLL using Excel 4 Macros (XLM/XLF)|

when, in fact, neither REGISTER nor EXEC is actually called.

In recent XLM macro-based samples, those keywords are extracted from output like this:

' 0006     40 FORMULA : Cell Formula - R45899C114 len=18 ptgNameV  *INCOMPLETE FORMULA PARSING* Could contain following functions: EXEC - Remaining, unparsed expression: bytearray(b'n\x00\x00\x00\x17\x06\x00pLBrGQ\x0eA\xac\x00')
' 0006     31 FORMULA : Cell Formula - R56495C91 len=9 ptgNameV  *INCOMPLETE FORMULA PARSING* Could contain following functions: REGISTER - Remaining, unparsed expression: bytearray(b'\x95\x00\x00\x00B\x01v\x80')

These are false positives caused by incomplete parsing of the formulas: the function actually used in these cells is RUN, not EXEC or REGISTER.

With this change, oletools no longer shows these detections, which can mislead analysts. It reports REGISTER and EXEC only when they are not preceded by that phrase, i.e. when they are actually extracted from the code. Otherwise, it explicitly reports the match as "Could contain following functions: EXEC" rather than just "EXEC". I hope this is clear enough.
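The filtering described above can be sketched with two regexes. These patterns and the classify_xlm_keywords helper are hypothetical, not the ones in lines 831-838 of the PR: occurrences preceded by the hint phrase are collected and removed first, and EXEC/REGISTER count as definite matches only if they still appear in what remains:

```python
import re
from typing import Dict, List

# Phrase printed by the BIFF/XLM parsing when formula parsing is incomplete.
HINT = "Could contain following functions:"

# Hypothetical patterns, not the ones from the PR.
HINTED = re.compile(re.escape(HINT) + r"\s*(EXEC|REGISTER)\b")
KEYWORD = re.compile(r"\b(EXEC|REGISTER)\b")


def classify_xlm_keywords(text: str) -> Dict[str, List[str]]:
    """Split EXEC/REGISTER matches into definite and 'could contain' ones."""
    # Keywords that only appear right after the hint phrase are uncertain.
    could_contain = HINTED.findall(text)
    # Drop the hinted occurrences, then look for keywords in what is left.
    remaining = HINTED.sub("", text)
    definite = KEYWORD.findall(remaining)
    return {
        "definite": sorted(set(definite)),
        "could_contain": sorted(set(could_contain)),
    }
```

For an output line like the EXEC one quoted above, this reports EXEC under could_contain and nothing under definite.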

@decalage2 (Owner):

Do you have samples that trigger this issue with EXEC/REGISTER, so that I can check them? (I don't have VT access; please upload them here in a password-protected zip, or link to any.run, Hybrid Analysis or similar.)
Thanks!

@mlodic (Contributor, Author):

That behavior is caused by plugin_biff here.

This is a sample you can check with v0.55 and v0.56: https://app.any.run/tasks/921d3825-a2f7-4152-9b25-6dd9adb1cd49

@decalage2 (Owner) left a comment:

This PR contains a lot of different changes, so it's not easy to approve everything at once, but OK.
Thank you for the improvements!

@mlodic (Contributor, Author) commented Sep 29, 2020

Thank you @decalage2. I aggregated all of our findings from the last few months into a single PR; I know it could be hard to review everything. Please let me know if we can help in any way.
