TDT- An Efficient Clustering Algorithm for Large Database

International Journal of Computer Science and Engineering
© 2015 by SSRG - IJCSE Journal
Volume 2 Issue 2
Year of Publication : 2015
Authors : Ms. Kritika Maheshwari, Mr. M.Rajsekaran

How to Cite?

Ms. Kritika Maheshwari, Mr. M.Rajsekaran, "TDT- An Efficient Clustering Algorithm for Large Database," SSRG International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 17-21, 2015. Crossref, https://doi.org/10.14445/23488387/IJCSE-V2I2P104

Abstract:

Text documents in online forums often come with a large amount of side information. This side information takes different forms: links in the document, access behavior from web histories, or other non-textual attributes embedded in the text document. Such attributes can carry a great deal of information that is useful for clustering. However, the relative importance of this side information is difficult to estimate, especially when some of it is noisy. In such cases, incorporating side information into the clustering process is risky, because it can either improve the quality of the clustering or add noise to it. A principled way of performing the clustering is therefore needed, one that maximizes the benefit of using side information and allows search queries to be answered efficiently and effectively. This paper describes an algorithm for text clustering with side information, the COATES algorithm.

Keywords:

Clustering process, Dimensionality reduction, Side information, Topic detection and tracking
