A New Global Consistent Checkpoint Based on OS Virtualization

  • Hongliang Yu
  • Xiaojia Xiang
  • Jiwu Shu

Abstract

Checkpointing can save and restore application state when faults occur, and it is becoming critical to large information systems. Unfortunately, existing checkpoint tools have limitations: they are not transparent to applications, they ignore file system state, and they support cluster checkpointing poorly. We present a lightweight cluster checkpoint system based on OS virtualization. First, a virtual container, the IPG (Isolated Process Group), wraps all target applications together and produces checkpoints transparently and completely. Second, each IPG has an independent namespace built on an exclusively owned LV (Logical Volume), which is checkpointed synchronously with the IPG’s memory to guarantee consistency. Finally, distributed applications can be deployed across many IPGs, and a cluster checkpoint protocol orchestrates all IPGs to produce global checkpoints. Experimental and evaluation results show that applications running in IPGs incur no overhead, and that our prototype system is more stable than traditional library-based checkpoint tools.
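
The abstract does not give the details of the cluster checkpoint protocol, so the following is only a minimal illustrative sketch of one common way such coordination can work: freeze every IPG, snapshot memory and volume together while frozen, then resume. The class `IPG`, its fields, and the function `global_checkpoint` are hypothetical names introduced here, not the authors' implementation. The sketch also ignores in-flight network messages, which a real protocol would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class IPG:
    """Hypothetical stand-in for one Isolated Process Group on a node."""
    name: str
    memory: dict = field(default_factory=dict)   # stands in for process state
    volume: dict = field(default_factory=dict)   # stands in for the IPG's LV
    frozen: bool = False
    healthy: bool = True                         # set False to simulate a fault

    def freeze(self) -> bool:
        # Phase 1: stop the group's processes so state cannot change.
        if not self.healthy:
            return False
        self.frozen = True
        return True

    def snapshot(self) -> dict:
        # Memory and volume are captured while frozen, so they match.
        assert self.frozen, "snapshot requires a frozen IPG"
        return {"memory": dict(self.memory), "volume": dict(self.volume)}

    def resume(self) -> None:
        self.frozen = False


def global_checkpoint(ipgs):
    """Return {name: snapshot} if every IPG took part, else None."""
    try:
        # Phase 1: freeze everyone; abort if any IPG cannot freeze.
        if not all(ipg.freeze() for ipg in ipgs):
            return None
        # Phase 2: snapshot every frozen IPG.
        return {ipg.name: ipg.snapshot() for ipg in ipgs}
    finally:
        # Always resume, whether the checkpoint committed or aborted.
        for ipg in ipgs:
            ipg.resume()
```

In this sketch a global checkpoint is all-or-nothing: if any IPG cannot be frozen, no snapshot is kept, so a partial set of per-node checkpoints is never mistaken for a consistent global one.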
Published
2010-11-30