Motivation
Reverse Engineering
The problem originates in reverse engineering, where we decompile binary code to recover its source code. In this process, two versions of a function are involved: the original function before compilation
Bug Detection
Suppose we have two different versions of OpenSSL, and some functions were changed in the latest update. A careless engineer made a small typo that no one noticed, and the code appeared to behave correctly. But in the production environment, for some edge cases, say
KLEE: A Symbolic
The traditional way of doing this stupid
Implementation
Pipeline
TODO
Testing the LLM
The tests aim to judge whether the code generated by the LLM can be relied upon. An interesting problem arises here:
- Our initial task is to test if .
- We send the signatures of both functions to the LLM and receive main.c #1. In the test, we compare it with another main.c written by hand (main.c #2). The task now becomes judging if .
- The intuitive answer is to count the number of assertion failures in the output of ktest-tool.
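For concreteness, a harness of the kind described above (a main.c run under KLEE) could look like the following minimal sketch. The two functions are hypothetical stand-ins for an old and a new version of a function, the second carrying a boundary typo; the USE_KLEE macro is our own convention for switching between a KLEE build and an ordinary native build. Only klee_make_symbolic comes from KLEE itself (klee/klee.h).

```c
#include <assert.h>

/* Hypothetical old version: accepts lengths in [0, 256). */
static int is_valid_len_v1(int n) { return n >= 0 && n < 256; }

/* Hypothetical new version with a typo: <= instead of <, so 256 is accepted. */
static int is_valid_len_v2(int n) { return n >= 0 && n <= 256; }

/* The equivalence property: both versions must agree on every input. */
static void check_equivalent(int n) {
    assert(is_valid_len_v1(n) == is_valid_len_v2(n));
}

#ifdef USE_KLEE /* hypothetical macro: define it only when building for KLEE */
#include <klee/klee.h>

int main(void) {
    int n;
    /* Let KLEE treat every byte of n as unknown and explore all paths. */
    klee_make_symbolic(&n, sizeof n, "n");
    check_equivalent(n);
    return 0;
}
#endif
```

Compiled to LLVM bitcode with -DUSE_KLEE and run under klee, the assertion should fail only on the path where n == 256. Note that if main did nothing but return 0, no assertion could ever fail, which is exactly the weakness discussed next.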
The problem is that if the main function contains no code at all, no assertion can ever fail, so the count of assertion failures will be 0. And if
To make the test results more trustworthy, we studied KLEE's output and found files with the .ktest suffix. KLEE writes one such test case per execution path it explores, so the number of these files reflects the number of paths through our input function. As a result, if main is empty, there will be no .ktest files in the output directory.
At least in this case, the count of .ktest files is meaningful.
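The two checks above (count assertion failures, but treat a run with no .ktest files as inconclusive rather than as a pass) can be combined into a single verdict. The following is a minimal POSIX C sketch. It assumes each failed assertion leaves a testNNNNNN.assert.err file in the KLEE output directory, and counts those files in place of parsing ktest-tool output; the verdict names and helper functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* Returns 1 if name ends with suffix, 0 otherwise. */
static int has_suffix(const char *name, const char *suffix) {
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Counts directory entries whose names end with suffix.
   Returns -1 if the directory cannot be opened. */
static long count_with_suffix(const char *dir, const char *suffix) {
    DIR *d = opendir(dir);
    if (d == NULL) return -1;
    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (has_suffix(e->d_name, suffix)) count++;
    closedir(d);
    return count;
}

enum verdict { VERDICT_ERROR, VERDICT_VACUOUS, VERDICT_FAIL, VERDICT_PASS };

/* Judges one KLEE output directory (hypothetical helper). */
static enum verdict judge_klee_run(const char *outdir) {
    long tests = count_with_suffix(outdir, ".ktest");
    /* Assumption: KLEE names assertion-failure reports testNNNNNN.assert.err. */
    long failures = count_with_suffix(outdir, ".assert.err");
    if (tests < 0 || failures < 0) return VERDICT_ERROR;
    if (tests == 0) return VERDICT_VACUOUS; /* nothing explored, e.g. empty main */
    if (failures > 0) return VERDICT_FAIL;
    return VERDICT_PASS;
}
```

Returning a separate VERDICT_VACUOUS keeps "zero assertion failures because nothing ran" distinct from a genuine pass, which is the whole point of looking at the .ktest count.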
We also use the contents of individual .ktest files: each one records the concrete values of the symbolic variables that lead execution down a particular path.
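Once ktest-tool has shown the raw bytes of an object in a failing .ktest file, that input can be replayed natively to confirm the disagreement outside KLEE. The sketch below assumes the symbolic variable was a single int and that replay runs on a machine with the same byte order and int size as the KLEE run; the two function versions and the helper names are hypothetical. KLEE also ships a replay library (libkleeRuntest, pointed at a test case through the KTEST_FILE environment variable) that does this for a whole program; the manual version here only illustrates what a .ktest object holds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical old and new versions; the new one has a boundary typo. */
static int is_valid_len_v1(int n) { return n >= 0 && n < 256; }
static int is_valid_len_v2(int n) { return n >= 0 && n <= 256; }

/* Rebuilds an int from the raw bytes of one .ktest object.
   Returns 0 if the object size does not match an int. */
static int int_from_ktest_bytes(const unsigned char *bytes, size_t len, int *out) {
    if (len != sizeof *out) return 0;
    memcpy(out, bytes, sizeof *out); /* same host byte order assumed */
    return 1;
}

/* Replays one recovered input through both versions.
   Returns 1 if they disagree, 0 if they agree, -1 if the bytes are unusable. */
static int replay_diverges(const unsigned char *bytes, size_t len) {
    int n;
    if (!int_from_ktest_bytes(bytes, len, &n)) return -1;
    return is_valid_len_v1(n) != is_valid_len_v2(n);
}
```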
Note that these checks still do not guarantee that the test works in every possible scenario, but they should suffice for simple cases.