Related projects
Discover more projects across a range of sectors and disciplines, from AI to cleantech to social innovation.
All data-driven solutions today start with the ingestion (input) of data. Typically that data is messy and unlabelled, yet downstream consumers of the data benefit from well-labelled data. Data labelling (assigning categories, data types, privacy and sensitivity tags, source characteristics, etc.) is usually an error-prone, time-consuming, manual effort, and there are no readily available off-the-shelf tools that perform it reliably today. This project aims to design and build a configurable, scalable, automated tool for classifying the data fields of a given data source. The tool will be a software product that generates one or more labels for given input data sources and data fields, including at least entity type, entity context, and privacy/sensitivity tags, using natural language processing (NLP) in conjunction with a heuristic-based expert rule system. The tool will provide the partner organisation with a competitive edge in the data science/ML/AI market, helping it grow its customer base by attracting companies and organisations that lack the technical skill set to build their own data processing systems. This in turn will lead to increased revenue for the partner organisation.
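To make the heuristic rule component concrete, the following is a minimal sketch of how a rule-based field labeller might work. It is an illustration only and not the project's implementation: the rule tables, the `FieldLabel` type, the `label_field` function, the label names, and the 0.8 match threshold are all assumptions introduced here, and the NLP component described above is not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical value patterns; a real rule system would be far more thorough.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")

# (entity type, sensitivity tag, pattern). Order matters: ISO dates also
# satisfy the loose phone pattern, so the date rule is checked first.
VALUE_RULES = [
    ("email", "pii", EMAIL_RE),
    ("date", "none", DATE_RE),
    ("phone", "pii", PHONE_RE),
]

# Fallback hints matched against the field name (illustrative only).
NAME_HINTS = {
    "email": ("email", "pii"),
    "phone": ("phone", "pii"),
    "dob": ("date", "pii"),
    "name": ("person_name", "pii"),
}


@dataclass(frozen=True)
class FieldLabel:
    entity_type: str
    sensitivity: str


def label_field(name, values, threshold=0.8):
    """Label one data field from its name and a sample of its values."""
    # Ignore missing and blank values when sampling.
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]

    # First try the value rules: a rule fires when enough samples match it.
    if samples:
        for entity_type, sensitivity, pattern in VALUE_RULES:
            hits = sum(1 for s in samples if pattern.match(s))
            if hits / len(samples) >= threshold:
                return FieldLabel(entity_type, sensitivity)

    # Otherwise fall back to substring hints in the field name.
    lowered = name.lower()
    for hint, (entity_type, sensitivity) in NAME_HINTS.items():
        if hint in lowered:
            return FieldLabel(entity_type, sensitivity)

    # Nothing matched: flag the field for manual review.
    return FieldLabel("unknown", "unreviewed")
```

In this sketch, a call such as `label_field("contact", ["a@example.com", "b@example.org"])` is labelled from its values, while a field like `full_name` whose values match no pattern falls back to the name hints. Fields that match nothing are tagged for manual review rather than guessed at.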
Mark Chignell
Scribble Data Inc.
Computer science
Information and cultural industries
University of Toronto
Business Strategy Internship
The strong support from governments across Canada, international partners, universities, colleges, companies, and community organizations has enabled Mitacs to focus on the core idea that talent and partnerships power innovation, and that innovation creates a better future.