Improving statistical literacy and accessibility of big data using dimension reduction

Information and data are more accessible and abundant than ever before, but this abundance presents challenges in the face of rising misinformation and mistrust in science. Large-scale datasets, or “big data,” are particularly suited to a set of statistical analyses called dimension reduction techniques, which reduce the complexity of a dataset while preserving much of its information, but these analyses are currently difficult and costly for the average citizen to learn and implement. This project aims to integrate dimension reduction techniques into an online platform that individuals can use to automatically run and visualize these analyses to answer questions about large, publicly available datasets. The platform will present the results using plain language and visualizations rather than test statistics to help users better understand and interpret them. This will make analysis and exploration of big datasets more accessible, keeping pace with the increasing general availability of such data, and it will promote statistical literacy in the general population.

Carina Fan
Faculty Supervisor: Brian Levine
Partner University: