Controllable Generation of Text - Towards generating monographs - QC-146
Preferred Disciplines: Machine Learning, Computational Linguistics, Natural Language Processing, Artificial Intelligence, Deep Learning, Cognitive Science, Abstractive Summarization, Narrative Generation, Extracted Associations (Master, PhD, Post-Doc)
Company: Marketmuse Inc.
Project Length: 24-36 months (6 units)
Desired start date: As soon as possible
Location: Montreal, Quebec
No. of Positions: 3
Preferences: No preferences
- To be added
The Oxford English Dictionary defines a monograph as "a detailed written study of a single specialized subject or an aspect of it". While the word is somewhat aspirational in this context, MarketMuse already provides the tools for non-expert human writers to create such documents. We aim for our tools to support the quality and comprehensiveness of a monograph.
Our users can of course choose the length and depth of their content, and most current use cases are blog posts, web pages, and whitepapers. Yet even these can take a long time to write, even when the writer has a great outline or content plan to work from. M4 therefore sets its sights on machine-generating drafts of genuinely useful and informative content, which content marketers can then edit and adapt, bringing together the strengths of humans and AI. M4 was created to disrupt outmoded paradigms in writing.
We will crawl, then walk, then run. We aim to be able to parameterize the tone, intent, conceptual distribution, sophistication, length, and other features of custom-generated content. Generated content will help people learn better and communicate faster than they could before. In the future, the technologies we create here could be used in educational tools for custom lesson plans, in assistive agents, and in fun ways of discovering one's own unknown unknowns.
MarketMuse has already built differentiated capabilities that we are well positioned to leverage on this road to content generation. We will feed these neural models with our strong existing and evolving capabilities in distributional semantics, knowledge graph generation, recursive discourse-aware topic models, and content plan generation, and augment them with attention mechanisms, various kinds of embeddings, and creative conditional neural architectures. Following this plan will enable us to generate comprehensive monographs.
Our main research areas will therefore be:
Natural language generation
- Our focus will be on researching neural network architectures capable of generating long, coherent, comprehensive, and controllable text. Architectures and techniques we will particularly research include, but are not limited to, hierarchical latent variable encoder-decoder models, conditional generative adversarial networks, variational autoencoders, sequence-to-sequence models, recursive and recurrent neural networks, neural memory networks, long short-term memory (LSTM) networks, and attention mechanisms.
- Controllable generation of text requires us to model and represent the many nuances that human language exhibits. Robust and disentangled representations, from discourse structure down to word and phrase semantics, will be paramount to our success in generating long, coherent, and comprehensive text. We will therefore accompany our NLG research with inquiries into the geometry of semantics within semantic vector space models, as well as into techniques for learning representations at the discourse, document, paragraph, and sentence levels. Ultimately, we will combine these representations with our neural natural language generation architectures to generate high-quality text.
Background and required skills
Automatic generation of long, coherent and comprehensive monographs
- controlled generation of sentences
- controlled generation of paragraphs
- controlled generation of complete monographs
- “controlled” means that we want to control the conceptual distribution (semantics), tone, intent, and sophistication of the generated text
- NLG-supportive representation learning including, but not limited to, manifold and predicate learning in semantic vector space models, disentangled representation learning, specific embedding spaces for representing hierarchies, and NLG-suitable discourse representation techniques
- To be defined
Expertise and Skills Needed:
- Programming skills (Python, Java, etc.) and extensive experience with neural network training frameworks such as PyTorch, TensorFlow, etc.
- Experience with generative deep learning techniques in a text, language, or NLP domain
- Experience with attention mechanisms, conditional language models, and recurrent neural networks (RNNs)
- Knowledge of information theory, multivariate calculus, and multilinear algebra solid enough to specify novel relevant deep learning and machine learning algorithms
- Extensive experience with algebraic embedding spaces and distributional semantics
- Thorough understanding of differential geometry and algebra
For more info or to apply to this applied research position, please
- Check your eligibility and find more information about open projects.
- Interested students need to get approval from their supervisor and send their CV along with a link to their supervisor’s university webpage by applying through the webform or directly to Gabriel Garcia-Curiel at ggarciacuriel(a)mitacs.ca.