DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems

Title: DScope: Detecting Real-World Data Corruption Hang Bugs in Cloud Server Systems
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Dai, Ting; He, Jingzhu; Gu, Xiaohui; Lu, Shan; Wang, Peipei
Conference Name: Proceedings of the ACM Symposium on Cloud Computing
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-6011-1
Keywords: composability, cyber physical systems, data corruption, False Data Detection, Human Behavior, performance bug detection, pubcrawl, resilience, Resiliency, static analysis

Cloud server systems such as Hadoop and Cassandra have enabled many real-world data-intensive applications to run inside computing clouds. However, those systems suffer from many data-corruption and performance problems that are notoriously difficult to debug because diagnostic information is lacking. In this paper, we present DScope, a tool that statically detects data-corruption-related software hang bugs in cloud server systems. DScope statically analyzes I/O operations and loops in a software package and identifies loops whose exit conditions can be affected by I/O operations through returned data, returned error codes, or I/O exception handling. After identifying the loops that are prone to hang under data corruption, DScope conducts loop bound and loop stride analysis to prune false positives. We have implemented DScope and evaluated it on 9 common cloud server systems. Our results show that DScope detects 42 real software hang bugs, including 29 newly discovered ones. In contrast, existing bug detection tools miss most of those bugs.
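
To make the bug class concrete, the following is a minimal sketch of a loop whose exit condition depends only on data returned by an I/O call. It is an illustration written for this record, not code from the paper or from any of the evaluated systems. The function names and the `max_iters` guard are hypothetical; the guard exists only so the demonstration terminates instead of actually hanging.

```python
import io


def skip_fully_hang_prone(stream, n, max_iters=1000):
    """Skip n bytes. The exit condition depends solely on returned I/O data.

    If the stream is truncated (for example by corruption), read() returns
    b"" and the loop stride becomes 0, so `remaining` never reaches 0.
    max_iters is an illustrative guard, not part of the real-world pattern.
    Returns True if n bytes were skipped, False if the guard fired.
    """
    remaining = n
    iters = 0
    while remaining > 0:
        chunk = stream.read(remaining)  # returns b"" at end of stream
        remaining -= len(chunk)         # zero stride when no data arrives
        iters += 1
        if iters >= max_iters:
            return False                # would spin forever without the guard
    return True


def skip_fully_safe(stream, n):
    """Skip n bytes, failing fast when the stream ends early."""
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # No progress is possible, so surface the problem instead of looping.
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        remaining -= len(chunk)


intact = io.BytesIO(b"x" * 16)
truncated = io.BytesIO(b"x" * 4)  # stands in for a corrupted, shortened file
completed_intact = skip_fully_hang_prone(intact, 16)
completed_truncated = skip_fully_hang_prone(truncated, 16)
```

With the intact stream the first loop finishes normally. With the truncated stream it makes no progress once the data runs out and only stops because of the guard. The second function shows one common remedy: treat a zero-length read as an error so that the loop stride can never be zero.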

Citation Key: dai_dscope:_2018