Democratizing Natural Language Processing (NLP) through self-serve models - ON-451
Project type: Research
Desired discipline(s): Computer science, Mathematical Sciences
Company: Phase AI Technologies Inc.
Project Length: 4 to 6 months
Preferred start date: As soon as possible.
Language requirement: English
No. of positions: 1
Desired education level: PhD
About the company:
Phase AI is a community for data and AI professionals to connect, learn, and advance their careers. We host webinars with thought leaders from across North America, regularly blog about the latest topics in data/AI, and use custom software and NLP to match job candidates with employers. Phase AI emerged in response to the needs of new graduates and experienced data/AI professionals facing challenges upskilling and job seeking during COVID-19.
Describe the project:
Phase AI is seeking a Natural Language Processing (NLP) Researcher to support the development of a platform that automates NLP and makes it more accessible to everyday businesses. This is a mature NLP research project, and the student will report to the data scientist leading the work. The NLP Researcher will be required to:
- Understand existing NLP data sets and build models against them. These data sets are ready for analysis and modeling.
- Work with pre-developed AutoNLP tools that can be integrated into the existing NLP modeling pipeline.
- Test new research libraries and state-of-the-art approaches to build improved models. Phase AI already has a pipeline that uses standard approaches to build NLP models automatically.
- Score new models against existing model performance to measure the success of improvements or new architectures. We expect to use ROC AUC and accuracy for classification tasks, but are open to exploring other approaches with the NLP Researcher.
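As an illustration of the scoring step described above, the following is a minimal sketch of comparing a candidate model against a baseline on a binary classification task using ROC AUC and accuracy. It is not Phase AI's pipeline: the function names (`accuracy`, `roc_auc`, `compare_models`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and the metrics are written in plain Python so the sketch has no dependencies.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of correct predictions after thresholding the scores.

    The 0.5 threshold is an assumption for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example
    outscores a random negative one (ties count as half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_models(y_true: Sequence[int],
                   baseline_scores: Sequence[float],
                   candidate_scores: Sequence[float]) -> Dict[str, float]:
    """Score a hypothetical baseline and candidate on the same held-out labels."""
    report = {
        "baseline_accuracy": accuracy(y_true, baseline_scores),
        "baseline_roc_auc": roc_auc(y_true, baseline_scores),
        "candidate_accuracy": accuracy(y_true, candidate_scores),
        "candidate_roc_auc": roc_auc(y_true, candidate_scores),
    }
    report["roc_auc_gain"] = report["candidate_roc_auc"] - report["baseline_roc_auc"]
    return report


# Toy held-out labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 1, 1]
baseline_scores = [0.1, 0.4, 0.35, 0.8]
candidate_scores = [0.1, 0.2, 0.7, 0.9]

report = compare_models(y_true, baseline_scores, candidate_scores)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Both models are scored on the same held-out labels so that the difference reflects the models and not the data split. In practice a library implementation of these metrics, along with confidence intervals or repeated splits, would likely be preferable to hand-rolled functions.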
The NLP Researcher will work with Phase AI to brainstorm new model architectures, feature engineering approaches, and pipeline changes that will maximize model performance.
More broadly, this work will require the following skillsets:
- Ability to review research literature: NLP is a fast-moving field with numerous new developments that could be tested.
- Project management: the NLP Researcher will work with a supervisor, but will be expected to manage the timelines and experiments they are running.
- Collaboration: execute against our product roadmap in close collaboration with our technology leader, who previously founded a Y Combinator ML company and has led ML projects at McKinsey and IBM.
- Write research reports or publications: Share novel learnings and compelling experiments from your research.
We are looking for a PhD candidate with:
- Advanced Python programming experience (our entire codebase is in Python; this is an absolute must)
- Experience with advanced ML packages, including but not limited to PyTorch, Keras, and scikit-learn
- Experience with NLP toolkits like spaCy, Hugging Face, NLTK, or others
- Understanding of machine learning and how to structure and evaluate NLP and ML experiments
- Understanding of ML architectures and data engineering principles is an asset (e.g., AutoML, AutoNLP, experiment design, ML pipelining)
- Understanding of broader Python packages for web development and software-as-a-service (Django, Psycopg2 for Postgres) is an asset, but not required
- Curious, thoughtful, and self-driven with an interest in gaining startup experience
- Interest and experience in publishing NLP research
- Product experience is an asset