Non-intrusive failure diagnosis for distributed systems
Software failures can be catastrophic. For example, a software failure contributed to the 2003 Northeast blackout, which lasted 7 hours and left more than 55 million people in Ontario and the U.S. without power. Unfortunately, such failures are dauntingly difficult to diagnose because the underlying software systems are extremely complex. This research is the first to propose non-intrusive failure diagnosis, that is, diagnosis that does not require any modification to the software being diagnosed. The expected results include new knowledge on novel non-intrusive failure diagnosis techniques and its dissemination, software artefacts such as the failure diagnosis tools themselves, and possibly patents and technology transfer. The importance of the problem and the experimental nature of this research, together with the supervisor's world-leading expertise in this area, provide the student with an invaluable training opportunity. If successful, the research results can be used directly by our industrial partner to improve the availability of its systems and reduce its system management costs.