The Analytics of Clouds and Big Data Computing

International Journal of Computer Science and Engineering
© 2016 by SSRG - IJCSE Journal
Volume 3 Issue 11
Year of Publication: 2016
Author: Dr. E. Kesavulu Reddy

How to Cite?

Dr. E. Kesavulu Reddy, "The Analytics of Clouds and Big Data Computing," SSRG International Journal of Computer Science and Engineering, vol. 3, no. 11, pp. 31-35, 2016.


Knowledge Discovery in Data (KDD) aims to extract non-obvious information through careful and detailed analysis and interpretation. Analytics comprises KDD techniques, data mining, text mining, statistical and quantitative analysis, explanatory and predictive models, and advanced and interactive visualization, all used to drive decisions and actions. Cloud computing is a versatile technology that can support a wide range of applications. Implementing data mining techniques on top of cloud computing allows users to retrieve meaningful information from a virtually integrated data warehouse, which can reduce infrastructure and storage costs. In this way, data mining can extract useful and potentially valuable information from data held in the cloud. Big Data is usually defined by three characteristics, called the 3Vs (Volume, Velocity, and Variety), and refers to data that are too large, dynamic, and complex to be captured, stored, managed, and analyzed with traditional data management tools. This paper surveys approaches, environments, and technologies in areas that are key to Big Data analytics capabilities and discusses how they help in building analytics solutions for Clouds.
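To make the idea of mining data spread across cloud storage more concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the method of this paper. It assumes transactions are already split into partitions (standing in for nodes of a cloud store), counts items per partition in a map step, merges the partial counts in a reduce step, and keeps only items that reach a minimum support. The names `mine_partition`, `merge_counts`, `frequent_items`, `min_support`, and the sample data are all hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical example data: each inner list is one data partition
# (standing in for a node of a cloud store); each transaction is a list of items.
example_partitions = [
    [["milk", "bread"], ["bread", "eggs"]],
    [["milk", "bread", "eggs"], ["milk"]],
]


def mine_partition(transactions):
    """Map step: count how many transactions in one partition contain each item."""
    counts = Counter()
    for tx in transactions:
        counts.update(set(tx))  # set() so an item is counted once per transaction
    return counts


def merge_counts(a, b):
    """Reduce step: combine the partial counts of two partitions."""
    return a + b


def frequent_items(partitions, min_support):
    """Return {item: count} for items found in at least min_support transactions."""
    partial_counts = [mine_partition(p) for p in partitions]  # independent, so parallelizable
    total = reduce(merge_counts, partial_counts, Counter())
    return {item: n for item, n in total.items() if n >= min_support}


if __name__ == "__main__":
    print(frequent_items(example_partitions, min_support=3))
```

With `min_support=3`, the sample data yields milk and bread (3 transactions each), while eggs (2 transactions) is filtered out. Because each partition is counted independently, the map step can run on separate machines, and only the small per-partition counters have to be moved and merged; this is the usual reason such a pattern scales to large data volumes.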


Data Mining, Data Management, Cloud Computing, Big Data.

