Novel Corrective and Training Procedures for Neural Network Compliance
In AI safety, compliance means that a model adheres to its operational specifications at runtime, avoiding adverse events for the end user. This proposal pursues model compliance in two ways: (i) applying corrective measures to a non-compliant Machine Learning (ML) model, and (ii) enforcing compliance throughout the model's training process. We aim to achieve the first by removing gradient information associated with the features that bias the model. For the second, we look at incorporating constraints from the world of satisfiability, encoding the desired specifications, into the gradient-based methods used for training. Real-world applications for both approaches include bias and robustness compliance in finance, for example ensuring that loans are granted to clients fairly.
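As a rough illustration of both ideas (not the proposal's actual method; all names, data, and the specific penalty are hypothetical), the sketch below trains a small logistic model where (i) the gradient component of a biasing feature is zeroed so its weight never moves, a minimal form of corrective gradient-information removal, and (ii) a differentiable penalty on the gap between group-wise mean predictions, one common relaxation of a fairness specification, is folded into the gradient update:

```python
import numpy as np

# Hypothetical sketch: logistic regression on synthetic data where one
# feature leaks protected-group membership and would bias the model.
rng = np.random.default_rng(0)
n, d = 400, 4
SENSITIVE = 0                       # index of the biasing feature (assumed)
group = rng.integers(0, 2, size=n)  # protected-group membership (0 = A, 1 = B)

X = rng.normal(size=(n, d))
X[:, SENSITIVE] = group + 0.1 * rng.normal(size=n)  # feature leaks the group
y = (X[:, 1] + 0.5 * X[:, 2] + group > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr, lam = 0.1, 5.0                  # step size and penalty weight (assumed)
A, B = group == 0, group == 1
for _ in range(1000):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n        # standard logistic-loss gradient

    # (ii) specification as a differentiable penalty:
    #      (mean_A(p) - mean_B(p))^2, with its exact gradient in w,
    #      added to the training gradient.
    gap = p[A].mean() - p[B].mean()
    s = p * (1 - p)                 # derivative of sigmoid w.r.t. its input
    dgap = (s[A, None] * X[A]).mean(0) - (s[B, None] * X[B]).mean(0)
    grad += lam * 2 * gap * dgap

    # (i) corrective removal of gradient information for the biasing
    #     feature: its weight stays at zero and cannot drive predictions.
    grad[SENSITIVE] = 0.0
    w -= lr * grad
```

In a corrective (post-hoc) setting, the same masking idea would be applied while fine-tuning an already-trained non-compliant model rather than from a zero initialization; the penalty term stands in for the richer satisfiability-derived constraints the proposal has in mind.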