Meta-information Extraction from Large-scale Streaming News for Entity-level Media Intelligence and Reporting

Gnowit is an Ottawa-based information services company that uses artificial intelligence and machine learning to automate the monitoring of web sources at scale, providing real-time briefings and notifications for competitive intelligence, evidence-based policy research and media monitoring. The company currently monitors more than 40,000 web sources and generates at least 1.2 million fully analysed documents daily. Gnowit’s customers currently rely only on traditional Boolean full-text queries and simple meta-information-based filters to extract the documents that interest them. This approach lets through an undesirable amount of noise and requires substantial improvement. Our main goals are to (i) create a new set of customer-facing filters based on the geographic location of news publications, genres, central topics and themes, (ii) extract meta-information that can be applied to web sources, individual articles and segments of documents, and (iii) develop entity-level analytics pipelines. Applying these tags to individual sources and documents at this scale is beyond the capacity of manual effort, so the task could benefit from techniques from the fields of natural language processing and deep learning. Additionally, we will contribute to machine learning research by developing innovative methods for tackling the interpretability challenges associated with deep learning models.

Faculty Supervisor:

Burak Kantarci


Haruna Isah; Tuerxun Waili


Gnowit Inc


Engineering - computer / electrical



University of Ottawa



