A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication

International Journal of Mobile Computing and Application
© 2017 by SSRG - IJMCA Journal
Volume 4 Issue 2
Year of Publication: 2017
Authors: S. Veerapandi and Dr. K. Alagarsamy
How to Cite?

S. Veerapandi and Dr. K. Alagarsamy, "A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication," SSRG International Journal of Mobile Computing and Application, vol. 4, no. 2, pp. 1-7, 2017. Crossref, https://doi.org/10.14445/23939141/IJMCA-V4I3P101

Abstract:

Protecting a distributed environment against failures has become increasingly important. Many techniques have been developed, each with its own merits and drawbacks. The efficiency of a fault-tolerance algorithm depends on how much replication it performs and on the degree of fault tolerance it achieves. We propose a new method that combines checkpointing with replication to ensure consistency in a distributed environment. The method is also easy to implement.

Keywords:

FTPA, PLR, GiFT
