Collocation-focused Writing Assistance for Adult Non-Native Writers
The goal of this project is to research and develop writing-assistance software for advanced adult non-native writers of English. I intend to approach the project in as language-general a way as possible, so that the method can easily be extended to French. By the end of the project, I expect to have prototype software that can do the following:
1. Build a lexicon of good (native English) collocations from a large internet corpus
2. Distinguish native and non-native texts based on the presence or absence of these collocations
3. Identify particular instances of bad (non-native) collocations in texts
4. For each bad collocation, propose a small set of candidate replacements from the good-collocation lexicon, from which the user can choose
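To make steps 2 to 4 concrete, here is a minimal sketch. It assumes a hypothetical, hard-coded toy lexicon of adjacent word pairs (in the project this would be the output of step 1). It also assumes a simple heuristic that is not part of the proposal: a bigram can only be judged if its second word appears as the second word of some lexicon entry. Under that assumption, "checkable" bigrams found in the lexicon count as good, and the others are flagged as suspect. The names `LEXICON`, `flag_collocations`, `native_score` and `suggest` are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy lexicon of "good" (native) adjacent word pairs.
# In the real system this would come from step 1, not be hard-coded.
LEXICON = {
    ("strong", "tea"),
    ("heavy", "rain"),
    ("pay", "attention"),
    ("make", "decision"),
}

# Index lexicon entries by their second word for quick candidate lookup.
BY_SECOND = defaultdict(set)
for first, second in LEXICON:
    BY_SECOND[second].add(first)


def bigrams(tokens):
    """Return the list of adjacent word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))


def flag_collocations(tokens):
    """Split checkable bigrams into known-good and suspect ones (step 3).

    Assumed heuristic: a bigram is checkable only if its second word is the
    second word of some lexicon entry; it is suspect if it is checkable but
    not itself in the lexicon. All other bigrams are ignored.
    """
    good, suspect = [], []
    for pair in bigrams(tokens):
        if pair[1] not in BY_SECOND:
            continue  # no evidence either way for this pair
        if pair in LEXICON:
            good.append(pair)
        else:
            suspect.append(pair)
    return good, suspect


def native_score(tokens):
    """Fraction of checkable bigrams that are known good collocations (step 2).

    Returns None when the text contains no checkable bigrams.
    """
    good, suspect = flag_collocations(tokens)
    checked = len(good) + len(suspect)
    return len(good) / checked if checked else None


def suggest(pair):
    """Candidate replacements that share the suspect pair's second word (step 4)."""
    return sorted((first, pair[1]) for first in BY_SECOND[pair[1]])


tokens = "we drank powerful tea in the heavy rain".split()
good, suspect = flag_collocations(tokens)
print(suspect, native_score(tokens), suggest(suspect[0]))
```

Here "powerful tea" is flagged and "strong tea" is offered as a replacement, while "heavy rain" counts as good, giving a score of 0.5. Real collocations are often non-adjacent (e.g. "make a decision"), so a practical version would likely need a context window or a parser rather than plain bigrams.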
Step 1 is a fairly well-studied problem (e.g. Shone and Jurafsky, 2001), and I have already done some preliminary work on a related task, improving word prediction, during an earlier internship at Quillsoft. In brief, we can use phrase frequencies together with more sophisticated statistical association measures, such as pointwise mutual information, to identify word combinations that seem particularly salient...
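As an illustration of the association measures mentioned above, the sketch below ranks adjacent word pairs by pointwise mutual information, PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities estimated by maximum likelihood from raw counts. The function name `pmi_bigrams` and the `min_count` cut-off are assumptions made for this example. The cut-off is included because PMI is known to overrate very rare pairs. This is a sketch only, not the method used at Quillsoft.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), where P(x, y) is estimated
    from bigram counts and P(x), P(y) from unigram counts. Pairs seen fewer
    than `min_count` times (a hypothetical threshold) are dropped, since
    PMI is unreliable and inflated for rare events.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (x, y), c_xy in bigram_counts.items():
        if c_xy < min_count:
            continue
        p_xy = c_xy / n_bigrams
        p_x = unigram_counts[x] / n_unigrams
        p_y = unigram_counts[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-PMI (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


corpus = ("strong tea is good . strong tea is hot . "
          "the tea is cold . the rain is cold .").split()
for pair, score in pmi_bigrams(corpus):
    print(pair, round(score, 3))
```

On this tiny corpus "strong tea" comes out on top, because "strong" never occurs without "tea". On a large internet corpus one would normally combine such a score with a raw frequency threshold, as the paragraph above suggests.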