Corporations face heavy scrutiny, especially when they file their annual financial reports with the government. If a corporation makes a mistake, or an employee submits fraudulent information, or it merely appears that either has happened, the government may ask the corporation to amend the filing. An amendment request causes the share price to suffer and forces the corporation to painstakingly redo the report at great expense. Caseware will sell software that can analyze these reports and determine whether an amendment request is likely.
For this project, a data mining, visualization, and modeling technique will be developed and tested specifically for emails, using publicly available datasets. The mining will consist of gathering email and other potentially related datasets and cleaning them. Cleaning will consist of removing duplicate or unnecessary information and labeling the data with basic information to ease training in the later steps. Next, the data will be visualized in some form (graphs, charts, etc.) so that it can be more easily understood and a training model can be developed.
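The cleaning step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function name `clean_emails`, the duplicate rule (same sender, subject, and whitespace-normalized body), and the label fields (`sender_domain`, `word_count`, `is_reply`) are all hypothetical choices made for the example. It assumes each message is available as a raw RFC 822 text string with a single plain-text body.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr


def clean_emails(raw_messages):
    """Parse raw email strings, drop duplicates, and attach basic labels.

    Hypothetical helper: the duplicate rule and label fields are assumptions.
    """
    seen = set()
    cleaned = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        payload = msg.get_payload()
        # Multipart messages return a list; this sketch keeps plain text only.
        body = payload.strip() if isinstance(payload, str) else ""
        sender = parseaddr(msg.get("From", ""))[1].lower()
        subject = (msg.get("Subject") or "").strip()

        # Duplicate key: sender, lowercased subject, whitespace-normalized body.
        normalized = "\n".join([sender, subject.lower(), " ".join(body.split())])
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue  # already kept an equivalent message
        seen.add(key)

        # Basic labels intended to ease later training steps.
        cleaned.append({
            "sender": sender,
            "sender_domain": sender.rpartition("@")[2],
            "subject": subject,
            "body": body,
            "word_count": len(body.split()),
            "is_reply": subject.lower().startswith("re:"),
        })
    return cleaned
```

Hashing a normalized form of each message, rather than comparing raw text, lets copies that differ only in spacing or trailing blank lines count as duplicates. A real dataset would likely need a more careful rule, for example one that also handles forwarded copies and quoted replies.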
Firms meet legally established accounting rules by including notes in financial statements that essentially hide key information in plain sight. That is, the notes contain information necessary to understand the statements, but because of their volume and arcane language they may be uninterpretable even by lawyers. For example, the notes in the Enron case were found to be decipherable by a human being, but only after substantial effort. A software application, however, may be able to interpret such notes.