Automated transaction classification using machine learning algorithms

An organization's procurement process is key to understanding its costs. Organizations gather large amounts of data from different sources (e.g. income statements, balance sheets, general ledger lines). This information is heterogeneous, mixing structured and unstructured data, and it must be cleaned and consolidated into a taxonomy to enable category management. The objective is to group like-for-like items and services into categories from a supply market analysis point of view, so that category management can be applied across the entire spend. Given the nature of the available data, supervised and unsupervised machine learning algorithms are natural choices for this kind of problem. PwC already has a first iteration of a classification product, dubbed SAM (Spend Analysis Machine), which is based on supervised learning for text classification using general ledger accounts and supplier characteristics.

Charles Ashby-Léporé, Jean Hounkpe
Faculty Supervisor: 
Maciej Augustyniak, Manuel Morales
Project Year: