RevEng.AI's decompilation models have shown promise in testing, outperforming both Ghidra and the most directly comparable LLM4Decompile model, though they are not perfect and still make mistakes. RevEng.AI is exploring several avenues for improvement: expanding the dataset to cover more languages and compilers, exploring alternative intermediate representations as model inputs, developing more sophisticated evaluation metrics, and integrating semantic-correctness feedback directly into the training process.
Key takeaways:
- RevEng.AI has developed AI models that can convert low-level assembly code back into human-readable source code, a process known as decompilation, which is a key challenge in reverse engineering.
- The AI-powered decompilation approach learns patterns directly from large datasets, producing output that resembles human-written code, rather than being limited by the predefined rules and heuristics of traditional decompilers.
- The AI decompiler outperformed traditional decompilers like Ghidra, as well as other AI models, on HumanEval, a standard benchmark for code generation tasks.
- Future improvements for the AI decompiler include supporting more languages and compilers, exploring alternative intermediate representations as model inputs, developing better evaluation metrics, and integrating semantic correctness feedback into the training process.