Advancing Data Science Research for Social Good

Owing to the rapid development of technologies such as the Internet of Things, collecting data is easier and cheaper than ever before. As a result, municipal governments and urban centres across Canada are being inundated with data that have the potential to improve public services. However, local governments often lack the data expertise needed to extract insight from these overwhelming datasets, which are frequently unstructured and “dirty” (i.e., incomplete, inaccurate, and/or erroneous).

Creating moments for shoppers: Impact of time on effectiveness of notifications

In recent years, internet usage through mobile devices has grown rapidly, and today the majority of online traffic comes from mobile devices. This means that, most of the time, users view a retailer's website, review items or complete a purchase on their phones. Accordingly, retailers have turned to building platforms that engage and re-engage users through push notifications. However, a retailer must take multiple factors, such as the time of day, into account when sending out a notification.

Shoppers Persona Analysis: Statistical Learning of Shoppers’ Behaviour

The aim of the project is to segment shoppers into distinct groups. Shoppers have different preferences: for instance, some tend to buy online in the morning, while others prefer to purchase at night. If shoppers can be grouped by their shopping behaviour, the retailer can devise personalized sales strategies that better serve its customers. For example, the retailer could send push notifications in the morning to the group of shoppers who tend to buy in the morning.
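The grouping idea described above can be sketched in a few lines. This is only an illustrative sketch, not the project's method: the time-of-day boundaries, the function names `time_bucket` and `group_shoppers`, and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def time_bucket(hour):
    """Map an hour of day (0-23) to a coarse label (boundaries are assumed)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"


def group_shoppers(purchases):
    """Group shopper ids by the time bucket in which they buy most often.

    `purchases` is an iterable of (shopper_id, hour_of_day) pairs.
    """
    counts = defaultdict(Counter)
    for shopper_id, hour in purchases:
        counts[shopper_id][time_bucket(hour)] += 1

    groups = defaultdict(list)
    for shopper_id, bucket_counts in counts.items():
        # most_common(1) yields the bucket with the highest purchase count
        preferred, _ = bucket_counts.most_common(1)[0]
        groups[preferred].append(shopper_id)
    return dict(groups)


# Hypothetical purchase log: (shopper id, hour of purchase)
purchases = [("a", 8), ("a", 9), ("a", 21), ("b", 22), ("b", 23), ("c", 7)]
groups = group_shoppers(purchases)
```

Here shoppers "a" and "c" fall into the morning group and "b" into the night group, so a morning notification would target only the first two. A statistical learning approach would replace the fixed buckets with groups learned from the data.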

Webpage customer persona discovery and push notification guidelines

Mobile phones receive notifications from many companies every day, but it is unclear whether these notifications have a significant impact on customers’ behaviour. Knowing the impact of these notifications would provide useful insights to marketing strategists. Because user behaviour determines the efficacy of push notifications, this project first aims to build a behavioural model that groups customers by their website navigation behaviour.

Estimation and Prediction of Censored Arrival Processes with Censoring for Replenishable Item Purchases

The aim of the project is to predict future customer demand for repeat-purchase items from available customer purchase records. However, the purchase history of a single customer may be too short to base predictions on. In addition, some purchase records may be missing because of sales events at competitors’ locations. Thus, treating each customer as a replicate of the average customer and averaging inter-purchase times to predict future demand is likely to be an inadequate approach.
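To make the limitation concrete, the naive baseline mentioned above (averaging inter-purchase times) can be sketched as follows. This is the simple approach the abstract argues against, not the project's proposed method; the function name `predict_next_purchase` and the dates are hypothetical.

```python
from datetime import date, timedelta


def predict_next_purchase(purchase_dates):
    """Naive baseline: last purchase date plus the mean inter-purchase gap."""
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        return None  # a single record gives no gap to average
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return dates[-1] + timedelta(days=round(mean_gap))


# Hypothetical customer who actually buys every 30 days
full_history = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 3, 1)]
# Same customer, but the middle purchase was never recorded
censored_history = [date(2020, 1, 1), date(2020, 3, 1)]

full_prediction = predict_next_purchase(full_history)
censored_prediction = predict_next_purchase(censored_history)
```

With the full history the baseline predicts 31 March 2020; with the middle record missing, the apparent gap doubles and the prediction slips to 30 April 2020. A single unobserved purchase is enough to bias the average, which is why censoring needs to be modelled explicitly.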

Approximations of Exotic Option Pricing Models

To manage the risks associated with trading complex financial products, CIBC carries out computationally expensive calculations to assess its exposure to various market factors. To make these computations more feasible, it is of interest to find new, efficient approximate formulas for computing such exposures. It is also important to complement these formulas with a rigorous justification of the resulting approximation error. Such justification will give confidence to the practitioners at CIBC who will use these formulas in the future.

Statistical and Physiological Beat Modelling of Seismocardiogram Signal

The seismocardiogram (SCG) is a signal captured by placing an accelerometer on the human chest. It carries important timing information, such as the opening and closing of the heart valves. In addition to this timing information, the non-invasive nature of the signal makes it an attractive solution for remote monitoring of patients with heart conditions.
The morphology of the SCG signal changes with different heart conditions and diseases. A mathematical model represents the morphology of a signal in terms of certain parameters.

Deconvolution of Whole Blood Transcriptome based on mRNA-Seq data

Gene expression in blood is strongly affected by the types and proportions of blood cells. Cell composition therefore needs to be taken into account when looking for signatures specific to a condition. The difficulty is that cell composition must be assessed on fresh blood, i.e. at the time of blood collection. If this has not been done, the only way to assess it is to predict it using the methodology suggested in this proposal. Thus, when a blood cell count is not available, the cell composition can be inferred from existing next-generation sequencing data sets.

A statistical method for competing risk survival analysis with clustered big data

Over the last few years, a data revolution has occurred with the emergence of “Big data”. In the medical field, the term big data refers to databases that are large in terms of patients and/or information drawn from varied sources. Such data are heterogeneous, however, because they arise from different medical centers. Furthermore, traditional statistical methods cannot be applied directly to these large databases: the major problems are multicollinearity and overfitting. Many regularization methods have been proposed to adapt classical methods. Mittal et al.

A new method for educational assessment: measuring association via LOC index

Our objective is to develop new techniques, building on earlier work and especially on the extension by Qoyyimi and Zitikis (2015), that perform well in measuring certain kinds of relationships between students’ marks in different subjects. To understand such relationships, many studies apply indices such as the Pearson correlation, which is used to measure association between variables of interest in a variety of research areas, including education. We extend this line of work.
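For reference, the Pearson correlation mentioned above can be computed as in the sketch below. It illustrates only the classical index, not the LOC index that the project develops; the function name `pearson` and the marks are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical marks for four students in two subjects (linear relationship)
maths = [60, 70, 80, 90]
physics = [65, 75, 85, 95]
r_linear = pearson(maths, physics)

# A monotone but non-linear relationship
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
r_curved = pearson(x, y)
```

The first pair gives a correlation of 1 up to floating-point rounding, while the second gives a value below 1 even though `y` is completely determined by `x`. The Pearson index captures only linear association, which is one reason to consider alternative indices.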
