As machine learning datasets grow rapidly in size, it has become increasingly important to store them reliably and in a distributed manner. Large-scale distributed storage systems such as GFS, HDFS, and Amazon S3 can store large volumes of data reliably. However, these systems share an intrinsic shortcoming: they provide good read/write performance only for large files, and therefore cannot efficiently handle frequent read/write operations on a large number of small files. In machine learning training, the ability to shuffle the data points in a dataset is crucial for avoiding local minima and overfitting, which requires that data points be accessed in random order, preferably efficiently. The main focus of this project is to find a way to store machine learning datasets on distributed file systems while maintaining competitive random-read performance for shuffling data points. TO BE CONT’D
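To make the trade-off concrete, here is a minimal sketch of one common way to reconcile large files with random access: pack many small records into a single large file and keep a separate index of byte offsets, so a shuffled pass becomes a series of seeks into one file. This is an illustration only, not the project's design, which the abstract does not describe. The names `pack_records`, `read_record`, and `shuffled_reader` are hypothetical, and a local file stands in for a distributed file system.

```python
import random


def pack_records(records, data_path):
    """Write all records back to back into one large file.

    Returns an index: a list of (offset, length) pairs, one per record.
    (Hypothetical helper; a local file stands in for a distributed store.)
    """
    index = []
    with open(data_path, "wb") as f:
        for rec in records:
            index.append((f.tell(), len(rec)))
            f.write(rec)
    return index


def read_record(f, index, i):
    """Random access to record i: seek to its offset and read its bytes."""
    offset, length = index[i]
    f.seek(offset)
    return f.read(length)


def shuffled_reader(data_path, index, seed=None):
    """Yield (record_id, bytes) for every record in a shuffled order."""
    order = list(range(len(index)))
    random.Random(seed).shuffle(order)  # shuffle the ids, not the data
    with open(data_path, "rb") as f:
        for i in order:
            yield i, read_record(f, index, i)
```

Only the small index is shuffled, so the stored data never moves; each epoch can use a different `seed` to obtain a new order. Whether the seeks are fast enough on a particular distributed file system is exactly the kind of question the project raises, and this sketch does not answer it.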
Yashar Ganjali
Hongbo Fan
Uber Advanced Technologies Group
Computer science
Information and communications technologies
Accelerate