Automatic Understanding of the Semantics of Source Code for Identifying Sensitive Code Fragments
Source code is the set of instructions that programmers write for a computer to execute in order to complete a desired task. Every operating system and application on a computer or mobile device is a runnable, compiled version of source code. Experienced programmers can readily browse and understand source code in different programming languages because they have technical background that everyday users lack. They can identify parts of the source code that are of interest (e.g., code that would make the program run 10x faster if improved) or that pose a threat (e.g., code that could expose clients’ personal information if reverse-engineered). With billions of lines of code held within private companies or in public code repositories, asking experienced programmers to identify these parts manually does not scale. This project aims to find a way to detect such parts automatically using machine learning.