Better Code Representation for Machine Learning

Jenni, Raphael. Better Code Representation for Machine Learning.

There is a more recent version of this item available.
Full text not available from this repository.

Abstract

Using machine learning on code is becoming increasingly common, with approaches based on code paths or on BERT among the options available. This paper focuses on improving part of the input vector by creating a more compact embedding. It also explores and discusses ways to reduce the amount of data fed into a model when working with code changes. The results presented in this paper show that the input data can be compressed into a latent space of half the original size, representing differences and similarities between code paths in a very compact way while still maintaining an accuracy of 99%. Moreover, it is shown that with proper preprocessing, the amount of data fed into a code-change model can be reduced by around 84%.
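The abstract does not disclose the embedding architecture, so the following is only a minimal sketch of the general idea: an autoencoder that compresses a code-path embedding into a latent space of half the original size. The input dimension of 128 and the single linear layer per side are illustrative assumptions, not the authors' model.

import torch
import torch.nn as nn

class PathAutoencoder(nn.Module):
    # Compresses an embedding to half its size ("half the input data size")
    # and learns to reconstruct it; the encoder output is the compact code.
    def __init__(self, input_dim: int = 128):
        super().__init__()
        latent_dim = input_dim // 2
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, input_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = PathAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(32, 128)  # placeholder for real path embeddings
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(batch), batch)  # reconstruction loss
loss.backward()
optimizer.step()
compact = model.encoder(batch)  # 64-dimensional vectors for the downstream model

The roughly 84% reduction for code changes likewise suggests discarding unchanged context before a diff reaches the model. A hypothetical sketch, with Python's difflib standing in for whatever preprocessing the paper actually applies:

import difflib

def changed_lines(before: str, after: str) -> list[str]:
    # Keep only added or removed lines; drop file headers, hunk markers,
    # and unchanged context that would inflate the model input.
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

before = "def add(a, b):\n    return a + b\n"
after = "def add(a, b):\n    return (a or 0) + (b or 0)\n"
print(changed_lines(before, after))  # only the edited lines remain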

Item Type: Article
Subjects: Area of Application > Development Tools
Technologies > Programming Languages
Divisions: Master of Advanced Studies in Software Engineering
Contributors: Bläser, Luc (Thesis advisor)
Date Deposited: 19 Sep 2022 07:38
Last Modified: 19 Sep 2022 07:38
URI: https://eprints.ost.ch/id/eprint/1065
