    Operating System Support to Detect Application Hangs

    Second International Workshop on Verification and Evaluation of Computer and Communication Systems (VECoS 2008)

    Leeds, UK, 2 - 3 July 2008


    Gabriella Carrozza, Marcello Cinque, Domenico Cotroneo and Roberto Natella


    On-line failure detection is an essential means to control and assess the dependability of complex and critical software systems. In such a context, effective detection strategies are required to minimize the possibility of catastrophic consequences. This objective is, however, difficult to achieve in complex systems, especially because of the several sources of non-determinism (e.g., multi-threading and distributed interaction) that may lead to software hangs, i.e., states in which the system is active but no longer capable of delivering its services. The paper proposes a detection approach to uncover application hangs. It exploits multiple indirect data gathered at the operating system level to monitor the system and to trigger alarms if the observed behavior deviates from the expected one. Fault injection experiments conducted on a research prototype show that combining several operating system monitors leads to a high quality of detection at an acceptable overhead.


    PDF Version of this Paper (345kb)