Motivation

Reverse Engineering

The problem originated in reverse engineering, where we decompile binary code to recover source code. Two versions of a function are involved: the original function before compilation, f1, and the recovered function after decompilation, f2. Does f1=f2 hold? In most cases the two do not look the same, and they may also behave differently in some edge cases.

Bug Detection

Suppose we have two different versions of OpenSSL, and some functions were changed in the latest update. A careless engineer made a small typo that no one noticed, and the code appeared to behave correctly. In production, however, the program crashed on some edge case, say n=114514. Before publishing the new version, we would like to know whether the new function and the old function do the same thing and, if not, where the new function improves or regresses.

KLEE: A Symbolic

The traditional approach to the f1=f2 problem is static code analysis; here we propose a new method based on symbolic execution, built upon KLEE.
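As a rough illustration of the idea, the sketch below shows what a KLEE harness for checking f1=f2 might look like. The bodies of f1 and f2, the helper outputs_agree, and the USE_KLEE build flag are all hypothetical and chosen only for this example; the only KLEE call used is klee_make_symbolic. It is a minimal sketch under these assumptions, not the project's actual harness.

```c
#include <assert.h>

#ifdef USE_KLEE /* hypothetical build flag for this sketch, not defined by KLEE */
#include <klee/klee.h>
#endif

/* Hypothetical "old" function: remainder modulo 7. */
unsigned f1(unsigned n) { return n % 7u; }

/* Hypothetical "new" function with a deliberate edge-case bug. */
unsigned f2(unsigned n) {
    if (n == 114514u) {
        return 0u; /* wrong: 114514 % 7 is 1 */
    }
    return n % 7u;
}

/* Returns 1 when both functions agree on input n, 0 otherwise. */
int outputs_agree(unsigned n) { return f1(n) == f2(n); }

#ifdef USE_KLEE
int main(void) {
    unsigned n;
    /* Let KLEE treat n as an unconstrained symbolic input. */
    klee_make_symbolic(&n, sizeof n, "n");
    /* KLEE searches for a value of n that makes this assertion fail. */
    assert(outputs_agree(n));
    return 0;
}
#endif
```

When built for KLEE with the hypothetical flag defined (for example, compiled to LLVM bitcode with clang and then passed to klee), the symbolic n lets KLEE explore both sides of the n == 114514 branch, so the failing input should be found without having to guess it.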

Implementation

Pipeline

TODO

Test of LLM

The aim of these tests is to judge whether code generated by an LLM can be relied upon. An interesting situation arises:

  1. Our initial task is to test whether f1=f2.
  2. We send the signatures of both functions to the LLM and receive main.c #1. In the test, we compare it with a hand-written harness, main.c #2. The task now becomes judging whether main1=main2.
  3. The intuitive approach is to count the number of assertion failures in the output of ktest-tool.

The problem is that if the main function contains no actual code, an assertion failure can never happen, so the count of assertion errors is 0. But if f1 and f2 happen to be identical, we would also expect no assertion errors. The assertion count alone therefore cannot distinguish an empty harness from a correct one, and the test result is misleading.
To make the test results more trustworthy, we studied the output of KLEE and found files with the suffix .ktest. Each such file corresponds to one execution path that KLEE explored, so their number reflects the branching in the input function. Consequently, if main is empty, the output directory contains no .ktest files beyond, at most, one for the single trivial path.
In this case, at least, the count of .ktest files is meaningful.
We also use the content of a .ktest file, which records the concrete input values that drive execution down a particular branch.
Note that these checks still do not guarantee that the test works in every possible scenario, but they should suffice for simple cases.
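The .ktest count described above could be computed with a small helper like the one below. This is only a sketch: the function names has_suffix and count_ktest_files are hypothetical, and it assumes a POSIX system (dirent.h) and that the KLEE output directory is passed in by the caller.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name ends with suffix, 0 otherwise. */
int has_suffix(const char *name, const char *suffix) {
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(suffix);
    return name_len >= suffix_len &&
           strcmp(name + (name_len - suffix_len), suffix) == 0;
}

/* Counts entries in dir_path whose names end in ".ktest".
   Returns -1 if the directory cannot be opened. */
int count_ktest_files(const char *dir_path) {
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (has_suffix(entry->d_name, ".ktest")) {
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

A caller might pass the path of the latest KLEE output directory (KLEE maintains a klee-last link to it) and treat a count of 0 or 1 as a hint that the generated main.c did not exercise any branches. The threshold is a judgment call for this sketch, not a rule taken from KLEE.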