Xiang Gao

Automated Software Vulnerability Repair

As every developer knows, maintaining software and fixing software bugs are difficult and extremely time-consuming. Researchers have proposed automated patch generation techniques to help developers fix software bugs. However, it is very hard to ensure that automatically generated patches reflect the intent of developers. This project aims to efficiently generate high-quality program patches.

Fix2Fit & VulnFix are techniques that generate patches by combining test generation with patch generation. Existing repair techniques take a test suite as the correctness criterion, which may lead to overfitting patches: the patched program passes the given tests but still fails on tests outside the suite. Fix2Fit and VulnFix alleviate the overfitting problem by using intelligent test generation to filter out overfitting patches, which provides greater confidence in the correctness of the suggested patches.
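The filtering idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the three candidate patches, the buffer, and the `generate_inputs` helper are invented for illustration, simple enumeration stands in for the tools' test generation, and the only oracle assumed is "the patched program does not crash".

```python
from typing import Callable, List, Optional, Sequence, Tuple

Patch = Callable[[Sequence[int], int], Optional[int]]


def patch_equal(buf: Sequence[int], i: int) -> Optional[int]:
    # Overfitting candidate: guards only the exact index seen in the failing test.
    if i == len(buf):
        return None
    return buf[i]


def patch_upper(buf: Sequence[int], i: int) -> Optional[int]:
    # Better, but still crashes for indices below -len(buf).
    if i >= len(buf):
        return None
    return buf[i]


def patch_range(buf: Sequence[int], i: int) -> Optional[int]:
    # Guards both ends of the valid index range.
    if not 0 <= i < len(buf):
        return None
    return buf[i]


def crashes(patch: Patch, buf: Sequence[int], i: int) -> bool:
    """Crash-freedom oracle: True if the patched code raises IndexError."""
    try:
        patch(buf, i)
    except IndexError:
        return True
    return False


def generate_inputs(lo: int, hi: int) -> List[int]:
    # Stand-in for test generation; a real tool would not enumerate blindly.
    return list(range(lo, hi + 1))


def filter_patches(
    candidates: Sequence[Patch],
    buf: Sequence[int],
    given_tests: Sequence[int],
    generated: Sequence[int],
) -> Tuple[List[Patch], List[Patch]]:
    # Step 1: keep patches that pass the developer-provided tests.
    plausible = [p for p in candidates
                 if not any(crashes(p, buf, i) for i in given_tests)]
    # Step 2: discard plausible patches that crash on a generated input.
    survivors = [p for p in plausible
                 if not any(crashes(p, buf, i) for i in generated)]
    return plausible, survivors


BUF = [10, 20, 30, 40]
CANDIDATES = [patch_equal, patch_upper, patch_range]
plausible, survivors = filter_patches(
    CANDIDATES, BUF, given_tests=[0, 4], generated=generate_inputs(-8, 8)
)
print([p.__name__ for p in plausible], [p.__name__ for p in survivors])
```

In this toy setting all three candidates pass the given tests, but only the patch that guards the whole index range survives the generated inputs.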

ExtractFix presents a repair method that fixes program vulnerabilities based on semantic reasoning. Given a vulnerability as evidenced by an exploit, ExtractFix extracts a constraint representing the vulnerability. The extracted constraint then serves as a proof obligation that our synthesized patch should satisfy. Semantic reasoning in terms of the extracted constraint ensures generated patches completely fix the vulnerabilities.
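A rough Python sketch of the "constraint as proof obligation" idea follows. It is not the tool's implementation: the constraint, the two candidate guards, and the function names are hypothetical, and a bounded exhaustive check stands in for the semantic (symbolic) reasoning described above.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Predicate = Callable[[int, int], bool]


def safe_access(i: int, n: int) -> bool:
    # Hypothetical extracted constraint: reading buf[i] from a buffer
    # of size n is safe exactly when 0 <= i < n.
    return 0 <= i < n


def guard_partial(i: int, n: int) -> bool:
    # Candidate patch guard that blocks the exploit but forgets negative indices.
    return i < n


def guard_full(i: int, n: int) -> bool:
    # Candidate patch guard that checks both bounds.
    return 0 <= i and i < n


def satisfies_obligation(
    guard: Predicate, constraint: Predicate, domain: Iterable[Tuple[int, int]]
) -> bool:
    """Check that whenever the guard lets execution reach the access,
    the constraint holds. Bounded enumeration is an assumption made to
    keep the sketch dependency-free."""
    return all(constraint(i, n) for i, n in domain if guard(i, n))


# Small bounded domain of (index, buffer size) pairs.
DOMAIN = list(product(range(-4, 9), range(0, 5)))

print(satisfies_obligation(guard_partial, safe_access, DOMAIN))
print(satisfies_obligation(guard_full, safe_access, DOMAIN))
```

A guard that merely blocks the observed exploit can still violate the constraint for other inputs, which is why the obligation is checked against the constraint rather than against the exploit alone.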

F1X presents a repair method that fixes program bugs efficiently. At its core, F1X proposes a partition-based exploration strategy to efficiently explore the candidate search space. F1X generates more repairs and finds repairs faster than state-of-the-art techniques.
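One way to picture a partition-based exploration is sketched below in Python. It assumes that candidates are grouped by the value they produce on a test input, so that the program runs once per group instead of once per candidate. The toy `program`, the candidate conditions, and the tests are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Condition = Callable[[int], bool]


def program(x: int, cond_value: bool) -> str:
    # Program under repair; the suspicious condition is abstracted as a
    # value that the candidate patch supplies.
    return "big" if cond_value else "small"


CANDIDATES: Dict[str, Condition] = {
    "x > 0": lambda x: x > 0,
    "x >= 0": lambda x: x >= 0,
    "x > 5": lambda x: x > 5,
    "x >= 5": lambda x: x >= 5,
    "x != 0": lambda x: x != 0,
}

TESTS: List[Tuple[int, str]] = [(6, "big"), (5, "small"), (0, "small")]


def repair(
    candidates: Dict[str, Condition], tests: List[Tuple[int, str]]
) -> Tuple[List[str], int]:
    remaining = dict(candidates)
    executions = 0
    for x, expected in tests:
        # Partition the remaining candidates by the value they take on x.
        # Evaluating a candidate expression is cheap compared with a test run.
        partitions: Dict[bool, List[str]] = {}
        for name, cond in remaining.items():
            partitions.setdefault(cond(x), []).append(name)
        kept: List[str] = []
        for value, names in partitions.items():
            executions += 1  # one program run decides the whole partition
            if program(x, value) == expected:
                kept.extend(names)
        remaining = {name: remaining[name] for name in kept}
    return sorted(remaining), executions


patches, runs = repair(CANDIDATES, TESTS)
print(patches, runs)
```

On this toy input the search needs four program runs to narrow five candidates down to one, whereas running every candidate on every test would take up to fifteen.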

Program Synthesis for Program Transformation

During program development or maintenance, developers usually perform some repetitive code edits, such as boilerplate code edits (e.g., equality comparisons or constructors), code refactorings (e.g., rename class, extract method), and quick fixes (e.g., fix possible NullReferenceException). To automate these edits, tool builders implement code transformations that manipulate the Abstract Syntax Tree (AST) of the user’s code to produce the desired code edit. The aim of this project is to automatically synthesize high-quality program transformation rules using examples of code edits.

Semi-supervised synthesis is a technique that can automatically synthesize high-quality program transformation rules. Unlike traditional techniques, which synthesize rules only from concrete edits that are instances of the general transformation, our approach also exploits additional inputs (program subtrees) that are marked as positive or negative depending on whether the transformation applies to them. This makes it vastly more effective at predicting edits while requiring significantly less past edit data.
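The role of the positive and negative inputs can be sketched in Python as follows. This is a deliberately simplified illustration: the candidate rules are hypothetical regular-expression rewrites over code strings rather than AST transformations, and "synthesis" is reduced to selecting the candidates consistent with all the evidence.

```python
import re
from typing import List, Optional, Sequence, Tuple

Rule = Tuple[str, str]  # (guard pattern, rewrite template)

# Hypothetical candidate rules, from most general to most specific.
CANDIDATE_RULES: List[Rule] = [
    (r"(\w+) == (\w+)", r"\1 is \2"),   # rewrites every equality
    (r"(\w+) == null", r"\1 is null"),  # rewrites only null checks
    (r"a == null", r"a is null"),       # rewrites only the seen example
]


def apply_rule(rule: Rule, subtree: str) -> Optional[str]:
    """Return the rewritten code, or None if the rule does not apply."""
    pattern, template = rule
    match = re.fullmatch(pattern, subtree)
    return match.expand(template) if match else None


def synthesize(
    edits: Sequence[Tuple[str, str]],
    positives: Sequence[str] = (),
    negatives: Sequence[str] = (),
) -> List[Rule]:
    """Keep rules that reproduce every edit, apply to every positive
    input, and do not apply to any negative input."""
    return [
        rule for rule in CANDIDATE_RULES
        if all(apply_rule(rule, before) == after for before, after in edits)
        and all(apply_rule(rule, p) is not None for p in positives)
        and all(apply_rule(rule, n) is None for n in negatives)
    ]


EDITS = [("a == null", "a is null")]
from_edits_only = synthesize(EDITS)
with_extra_inputs = synthesize(EDITS, positives=["b == null"], negatives=["a == b"])
print(len(from_edits_only), with_extra_inputs)
```

With the single edit alone, all three candidates remain consistent; adding one positive and one negative input rules out both the over-general and the over-specific candidate.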

FixMorph is a technique that performs automated patch backporting in Linux. Linux kernel developers often fix kernel bugs by introducing a patch into the mainline version of the Linux kernel source tree. However, the patch should often also be “backported” to one or more older stable kernel versions. FixMorph synthesizes a transformation rule from the mainline patch and automatically backports the patch to older stable versions by applying the learned rule.

Software Engineering for Artificial Intelligence (SE4AI)

Due to poor interpretability, a large number of parameters, huge data requirements, and poor reliability, AI models, the core of intelligent software systems, suffer from poor reusability, high testing overhead, and high security risks during development, testing, and deployment. AI models are often regarded as "Software 2.0". We address these problems from a software engineering perspective: we aim to apply software engineering techniques and notions to AI model engineering to improve the models' usability and robustness.

CNNSplitter & SeaM are techniques that re-engineer a trained DNN model to improve its reusability and security. Specifically, given a target problem and a trained model, CNNSplitter/SeaM searches for the model’s weights that are relevant to the target problem. The re-engineered model, which retains only the relevant weights, is then reused to solve the target problem, which can reduce the reuse overhead and the vulnerability inheritance rate.
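The re-engineering idea described above can be sketched in Python on a toy linear classifier. The greedy masking loop is an assumption made for brevity and is not claimed to be the search the tools use; the weights, the data, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[List[float]]
Sample = Tuple[Sequence[float], int]


def predict(weights: Weights, x: Sequence[float], classes: Sequence[int]) -> int:
    # Linear scorer restricted to the classes of the target problem.
    def score(c: int) -> float:
        return sum(w * v for w, v in zip(weights[c], x))
    return max(classes, key=score)


def accuracy(weights: Weights, data: Sequence[Sample], classes: Sequence[int]) -> float:
    hits = sum(predict(weights, x, classes) == y for x, y in data)
    return hits / len(data)


def reengineer(weights: Weights, data: Sequence[Sample], classes: Sequence[int]) -> Weights:
    """Greedily zero out weights whose removal does not reduce accuracy
    on the target problem; the surviving weights are deemed relevant."""
    pruned = [row[:] for row in weights]
    baseline = accuracy(pruned, data, classes)
    for c in range(len(pruned)):
        for j in range(len(pruned[c])):
            saved = pruned[c][j]
            pruned[c][j] = 0.0
            if accuracy(pruned, data, classes) < baseline:
                pruned[c][j] = saved  # removal hurts the target task: keep it
    return pruned


def count_nonzero(weights: Weights) -> int:
    return sum(w != 0.0 for row in weights for w in row)


# Trained three-class model; the target problem only involves classes 0 and 1.
W: Weights = [
    [2.0, 0.1, 0.0, 0.0],
    [0.0, 2.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 2.0],
]
TARGET_CLASSES = [0, 1]
DATA: List[Sample] = [
    ([1, 0, 0, 0], 0),
    ([0, 1, 0, 0], 1),
    ([1, 0, 1, 0], 0),
    ([0, 1, 1, 0], 1),
]

pruned = reengineer(W, DATA, TARGET_CLASSES)
print(count_nonzero(W), count_nonzero(pruned))
```

The pruned model keeps its accuracy on the target problem while carrying fewer weights, including none of those that belong only to the unused class.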

Sensei is a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Sensei casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of input data to use for training the DNN. Sensei improves the robust accuracy of the DNN, compared to the state of the art, by up to 11.9%, and by 5.5% on average.
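A minimal Python sketch of genetic search over input variants is given below. It assumes that the "most suitable" variant is the one on which the current model incurs the highest loss, and it uses a one-dimensional input with an additive perturbation in place of real image transformations. The model, the loss, and all parameter values are hypothetical.

```python
import random
from typing import Callable


def model(v: float) -> float:
    # Toy stand-in for the DNN currently being trained.
    return 2.0 * v


def loss(v: float, y: float) -> float:
    return (model(v) - y) ** 2


def clamp(d: float, bound: float) -> float:
    return min(bound, max(-bound, d))


def genetic_augment(
    x: float,
    y: float,
    loss_fn: Callable[[float, float], float],
    bound: float = 1.0,
    pop_size: int = 6,
    generations: int = 5,
    seed: int = 0,
) -> float:
    """Search for a perturbation d with |d| <= bound that maximizes the
    loss on the variant x + d; the label y is assumed to be unchanged."""
    rng = random.Random(seed)
    # Start from the unmodified input plus random perturbations.
    population = [0.0] + [rng.uniform(-bound, bound) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Fitness of a perturbation is the loss on the perturbed input.
        population.sort(key=lambda d: loss_fn(x + d, y), reverse=True)
        parents = population[: pop_size // 2]  # elitism: the fittest survive
        children = [clamp(p + rng.gauss(0.0, 0.1 * bound), bound) for p in parents]
        population = parents + children
    return max(population, key=lambda d: loss_fn(x + d, y))


X, Y = 1.0, 3.0
best = genetic_augment(X, Y, loss)
print(best, loss(X + best, Y))
```

Because the fittest perturbations are carried over unchanged between generations and the unmodified input is part of the initial population, the returned variant is never easier for the model than the original input. In a training loop, the selected variant would replace or accompany the original sample for that step.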