General document | A Review of Machine Learning in Software Vulnerability Research

Abstract

Searching for and identifying vulnerabilities in computer software has a long and rich history, whether for preventative or malicious purposes. In this paper, we investigate the use of Machine Learning (ML) techniques in Software Vulnerability Research (SVR), discussing previous and current efforts to illustrate how ML is utilised by academia and industry in this area. We find that the primary focus is not only on discovering new approaches, but also on helping SVR practitioners by simplifying and automating their processes. Considering the variety of applications already in evidence, we believe ML will continue to assist SVR in the future as new areas of use are explored and improved algorithms that enhance existing functionality become available.

Executive Summary

Computer software, like any other product, may contain faults. Some may be benign, while others can have serious implications for the operation and security of the systems on which they are deployed. Flaws in software that an attacker can exploit for malicious purposes belong to the class of software vulnerabilities. Research into software vulnerabilities has a long and rich history, whether for preventative or malicious purposes, and is of increased relevance to Defence as it seeks to maintain a high level of information assurance for its systems.

In this report we review the use of Machine Learning (ML) techniques in Software Vulnerability Research (SVR), and discuss previous and current efforts to illustrate how ML is utilised by researchers in this area. The report is intended as an accessible introduction for readers not yet immersed in both fields of study, and aims to encourage them to identify, and pursue further research into, problems that fit their specific goals. For this reason, the description of each article is kept short, highlighting only the techniques employed and their intended purposes. A short introduction to the basic concepts and techniques of both Software Vulnerability Research and Machine Learning is also provided.

From this review, we find that the primary benefit of ML in SVR lies not only in the discovery of new approaches, but also in helping SVR practitioners by simplifying and automating their processes. Considering the variety of applications already in evidence, we conclude that ML will continue to assist SVR in the future as new areas of use are explored and improved algorithms that enhance existing functionality become available.

Key information

Authors

Tamas Abraham and Olivier de Vel

Publication number

DST-Group-GD-0979

Publication type

General document

Publish Date

October 2017

Classification

Unclassified - public release

Keywords

software vulnerability research, machine learning, source code analysis, binary code analysis, computer security, software security, program analysis