rtfobj: bug with regex on Python 3 (unicode instead of bytes) #692

@decalage2

Some regexes are not explicitly typed as bytes, so on Python 3 they are unicode strings (str). This raises an exception when they are used to scan RTF data, which is bytes. It happens for OLE2Link objects. The first run below (Python 2.7) succeeds, the second (Python 3.9) fails:

rtfobj 0.60 on Python 2.7.18 - http://decalage.info/python/oletools
===============================================================================
File: 'sample_with_external_link_to_doc.rtf' - size: 50810 bytes
---+----------+---------------------------------------------------------------
id |index     |OLE Object
---+----------+---------------------------------------------------------------
0  |00002A8Fh |format_id: 2 (Embedded)
   |          |class name: 'OLE2Link'
   |          |data size: 2560
   |          |MD5 = 'a8f34530b8f91fc93ef5113f4be1601a'
   |          |CLSID: 88D96A0C-F192-11D4-A65F-0040963251E5
   |          |SAX XML Reader 6.0 (msxml6.dll)
   |          |Possibly an exploit for the OLE2Link vulnerability (VU#921560,
   |          |CVE-2017-0199)
   |          |URL extracted: https://raw.githubusercontent.com/decalage2/olet
   |          |ools/master/tests/test-data/msodde/harmless-clean.doc
---+----------+---------------------------------------------------------------

c:\>py -3 rtfobj.py sample_with_external_link_to_doc.rtf
rtfobj 0.60 on Python 3.9.0 - http://decalage.info/python/oletools
===============================================================================
File: 'sample_with_external_link_to_doc.rtf' - size: 50810 bytes
---+----------+---------------------------------------------------------------
id |index     |OLE Object
---+----------+---------------------------------------------------------------
Traceback (most recent call last):
  File "rtfobj.py", line 1085, in <module>
    main()
  File "rtfobj.py", line 1080, in main
    process_file(container, filename, data, output_dir=options.output_dir,
  File "rtfobj.py", line 927, in process_file
    found_list =  re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}',data)
  File "C:\Program Files\Python39\lib\re.py", line 241, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
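
The failure can be reproduced outside rtfobj. The following is a minimal sketch, assuming Python 3: the pattern is copied from the traceback above, while `data` and `find_hex_blobs_str_pattern` are hypothetical stand-ins and not names from rtfobj.

```python
import re

# Hypothetical stand-in for RTF file content, which is bytes on Python 3.
data = b'{\\rtf1 ' + b'0a' * 64 + b'}'


def find_hex_blobs_str_pattern(data):
    # Same pattern as in the traceback: a str pattern applied to bytes data.
    return re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}', data)


try:
    find_hex_blobs_str_pattern(data)
    error = None
except TypeError as exc:
    # Python 3 refuses to mix a str pattern with a bytes-like subject.
    error = str(exc)

print(error)
```

On Python 2.7 the same call does not raise, because plain string literals are already byte strings there.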

Solution: make the regex patterns byte strings, i.e. add the b prefix to each pattern that is applied to RTF data.
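
A sketch of what that change could look like for the pattern in the traceback. This is not the actual patch: `HEX_BLOB`, `find_hex_blobs` and `sample` are hypothetical names used only for illustration. The `br'...'` prefix is used because it is accepted by both Python 2.7 and Python 3.

```python
import re

# Bytes pattern: matches runs of at least 128 hex digits, CR or LF.
# br'...' is valid on Python 2.7 and Python 3 alike.
HEX_BLOB = re.compile(br'[a-fA-F0-9\x0D\x0A]{128,}')


def find_hex_blobs(data):
    """Return all long hex runs found in bytes data (hypothetical helper)."""
    return HEX_BLOB.findall(data)


# Hypothetical sample: 128 hex digits, a CRLF, then 16 more hex digits.
sample = b'{\\rtf1 ' + b'0A' * 64 + b'\r\n' + b'ff' * 8 + b'}'
blobs = find_hex_blobs(sample)
print(len(blobs), len(blobs[0]))
```

Because the pattern and the data are both bytes, the matches come back as bytes as well, so any later processing of `blobs` has to treat them as bytes too.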
