Network attacks are becoming more complex every day, so tools that can detect these sophisticated attacks on networks are essential for identifying malicious behavior and preventing attacks and intrusions. Machine learning is a promising approach for building intrusion detection engines, but such engines require sufficient high-quality data for training. The purpose of this project is to analyze the shortcomings of existing public datasets and the challenges involved in selecting appropriate machine learning techniques and settings for them.
Credit card payments are one of the most common transaction methods in daily life, used for online shopping, e-commerce, and mobile payment. However, with the extensive use of credit cards, numerous fraudulent transactions occur every year and cause huge economic losses. To improve detection performance, this project proposes a transformer-based model for fraud detection. The proposed model omits convolutional and recurrent operations and relies solely on attention mechanisms to extract dependencies in sequential transaction data.
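The attention mechanism at the core of any transformer can be sketched in a few lines. The snippet below is an illustrative numpy implementation of scaled dot-product self-attention, not the project's actual model; the toy "transaction sequence" dimensions are assumptions chosen for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V, weights

# Toy "transaction sequence": 4 time steps, each an 8-dimensional embedding.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(X, X, X)       # self-attention: Q = K = V = X
print(out.shape)        # (4, 8) -- one context vector per transaction
print(w.sum(axis=1))    # each row of attention weights sums to 1
```

Because every position attends to every other position directly, dependencies between distant transactions are captured without the step-by-step recurrence an RNN would need.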
The Firefighter Problem is a deterministic, discrete-time model of the spread of a fire over the nodes of a graph. If we model a financial network as a graph whose nodes are bank accounts, then an edge between two accounts represents a transaction between them. Imagine a suspicious bank account whose transactions are possibly tied to money laundering; we view this account as the place where a fire breaks out. Accounts that receive money from the suspicious account are then considered suspect.
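A minimal sketch of this dynamic, assuming one defended node per round and a simple highest-degree defense heuristic (both assumptions for illustration, not the project's strategy):

```python
def firefighter(adj, source, defend_per_step=1):
    """Discrete-time Firefighter process: each round the defender protects a
    few nodes, then the fire spreads to every undefended neighbor of a
    burning node. Returns (burning nodes, defended nodes)."""
    burning, defended = {source}, set()
    while True:
        threatened = {v for u in burning for v in adj[u]
                      if v not in burning and v not in defended}
        if not threatened:
            return burning, defended
        # Illustrative heuristic: defend the threatened node(s) of highest degree.
        for v in sorted(threatened, key=lambda v: len(adj[v]),
                        reverse=True)[:defend_per_step]:
            defended.add(v)
        burning |= {v for v in threatened if v not in defended}

# Path graph 0-1-2-3-4 with the fire breaking out at node 0:
# defending node 1 in the first round saves nodes 2, 3, and 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
burned, saved = firefighter(adj, source=0)
print(sorted(burned))   # [0]
print(sorted(saved))    # [1]
```

In the money-laundering reading, "defending" a node corresponds to freezing or investigating an account before illicit funds reach it.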
As financial networks grow, limiting the spread of illicit funds becomes an increasingly important aspect of security. We model this spread within a network-like structure called a graph and examine how best to defend against it. This problem has been examined in the past; we extend that work by restricting the defender's ability to defend the network, allowing our models to behave more like the real world in certain applications.
This project aims to enlarge image datasets without running additional experiments or collecting physical checks. Instead, image data augmentation is performed with generative adversarial networks (GANs), which generate new images from the original images using different algorithms. A GAN consists of two components: a generator and a discriminator.
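The two roles can be illustrated with a toy numpy sketch. This is only a forward pass under simplifying assumptions (a linear generator, a logistic discriminator, 2-dimensional "images"), not a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

def generator(z, W):
    # Generator: maps random noise to synthetic samples (a linear map here).
    return z @ W

def discriminator(x, w):
    # Discriminator: probability that each sample is real (logistic model).
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0) in the losses below

real = rng.normal(loc=3.0, size=(64, 2))     # stand-in for real image features
z = rng.normal(size=(64, 4))                 # noise fed to the generator
W, w = rng.normal(size=(4, 2)), rng.normal(size=2)

fake = generator(z, W)
d_real, d_fake = discriminator(real, w), discriminator(fake, w)

# Adversarial objectives: D maximizes log D(real) + log(1 - D(fake));
# G is trained to make D(fake) large.
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
g_loss = -np.mean(np.log(d_fake))
print(fake.shape)   # (64, 2) -- one synthetic sample per noise vector
```

Training alternates gradient steps on these two losses until the discriminator can no longer tell generated samples from real ones; the generated samples then serve as augmented data.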
Over the past few years, Graph Convolutional Networks (GCNs) have achieved state-of-the-art performance in machine learning tasks on graph data and have been widely applied to many real-world applications across different fields, such as traffic prediction, user behavior analysis, and fraud detection. However, real-world networks often have heterogeneous degree distributions, such as power laws.
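The standard GCN layer, and the degree normalization that the heterogeneous-degree issue concerns, can be written compactly. The sketch below implements the usual propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W) on a tiny assumed graph; it is a didactic example, not the project's architecture:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: each node's new features are a degree-normalized
    average over its neighbors (plus itself), then a linear transform
    followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Tiny 4-node path graph, 3 input features, 2 output features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
out = gcn_layer(A, H, W)
print(out.shape)   # (4, 2)
```

Note that the D^-1/2 factors scale contributions by node degree, so hubs in a power-law network are aggregated very differently from low-degree nodes, which is exactly where heterogeneous degree distributions complicate learning.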
At a high level, the goal of this project is to create a system for producing synthetic datasets based on real data. As a large financial crime detection firm, Verafin deals with large volumes of sensitive data which must be kept private; however, it is also interested in collaborating with academics to gain new insights into its data.
Contagious diseases, such as SARS and COVID-19, cause enormous damage to human life and the world economy. Pathogens spread among individuals through the contact network. Most social networks have been observed to follow a power-law degree distribution, implying that hubs exist in these networks. Finding the underlying super-spreaders (hubs) and isolating or immunizing them can decrease pathogen spread dramatically.
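The effect of immunizing a hub can be demonstrated with a small deterministic contagion sketch. The hub-and-spoke graph below is an assumed toy stand-in for a power-law network:

```python
def si_spread(adj, source, immune=frozenset()):
    """Deterministic SI contagion: every step, each newly infected node
    infects all of its non-immune neighbors; returns the final infected set."""
    if source in immune:
        return set()
    infected, frontier = {source}, {source}
    while frontier:
        frontier = {v for u in frontier for v in adj[u]
                    if v not in infected and v not in immune}
        infected |= frontier
    return infected

# Two star clusters joined through hub node 0 (a crude power-law shape).
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0],
       5: [0, 6, 7, 8], 6: [5], 7: [5], 8: [5]}

no_defense = si_spread(adj, source=1)
hub_immunized = si_spread(adj, source=1, immune=frozenset({0}))
print(len(no_defense), len(hub_immunized))   # 9 1
```

Without intervention the infection reaches all nine nodes, while immunizing the single hub confines it to the source, which is the intuition behind targeting super-spreaders.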
Data collection over time is a common practice in many large organizations, including financial institutions and health care providers, often with the goal of using this data to predict future challenges and opportunities. While the data may contain valuable information, it is often unstructured, coming from different sources and recorded at different times. This lack of structure makes extracting useful information difficult, as most standard statistical and machine learning tools are designed to work with data in a fixed structure.