Hashing And Clustering Based Novelty Detection

International Journal of Computer Science and Engineering
© 2019 by SSRG - IJCSE Journal
Volume 6 Issue 6
Year of Publication: 2019
Authors: Pooja Goyal, Sushil Kumar, Komal Kumar Bhatia

How to Cite?

Pooja Goyal, Sushil Kumar, Komal Kumar Bhatia, "Hashing And Clustering Based Novelty Detection," SSRG International Journal of Computer Science and Engineering, vol. 6, no. 6, pp. 1-9, 2019. Crossref, https://doi.org/10.14445/23488387/IJCSE-V6I6P101

Abstract:

Novelty detection is the task of identifying new information in an incoming stream of documents. Many novelty detection techniques are available at present. To improve detection performance, we propose a new framework for novelty detection based on hashing and clustering. The method consists of two phases. The first phase comprises sentence splitting, stopword removal, text preprocessing, and conversion of the N-grams of the processed documents into hashes. The second phase comprises clustering and novelty detection based on a fixed threshold. The method helps identify a large number of novel documents by detecting new information in the data. It is also useful for finding redundancy in documents and for returning relevant documents in response to a query. The main aim of this work is to find relevant, redundancy-free documents and to deliver information to the user as quickly as feasible.
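
The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function, the N-gram size, the similarity measure, the clustering algorithm, or the threshold value, so SHA-1 over word trigrams, Jaccard similarity, a simple single-pass (leader) clustering, and a threshold of 0.5 are all assumptions made here for illustration. The names `fingerprint`, `jaccard`, and `detect_novelty` are likewise hypothetical.

```python
import hashlib
import re

# Assumed, illustrative settings; none of these values come from the paper.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "on", "and", "to", "for"}
NGRAM_SIZE = 3
NOVELTY_THRESHOLD = 0.5


def fingerprint(text, n=NGRAM_SIZE):
    """Phase 1: split into sentences, drop stopwords, hash word n-grams."""
    hashes = set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = [t for t in re.findall(r"[a-z0-9]+", sentence)
                  if t not in STOPWORDS]
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            digest = hashlib.sha1(gram.encode("utf-8")).hexdigest()
            hashes.add(int(digest[:8], 16))  # keep 32 bits of the digest
    return hashes


def jaccard(a, b):
    """Overlap between two hash sets, in the range 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_novelty(documents, threshold=NOVELTY_THRESHOLD):
    """Phase 2: single-pass clustering with a fixed novelty threshold.

    Returns one boolean per document: True if the document is novel.
    """
    clusters = []  # each cluster is the union of its members' hashes
    labels = []
    for doc in documents:
        fp = fingerprint(doc)
        best_idx, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            sim = jaccard(fp, cluster)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            clusters[best_idx] |= fp   # redundant: merge into nearest cluster
            labels.append(False)
        else:
            clusters.append(set(fp))   # novel: start a new cluster
            labels.append(True)
    return labels
```

Calling `detect_novelty` on a list of document strings returns a list of flags in stream order, so a document that repeats an earlier one is marked as redundant while an unrelated one is marked as novel. Because each cluster here is stored as the union of its members' hashes, Jaccard similarity against a large cluster tends to shrink; a containment measure would be one alternative under the same assumptions.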

Keywords:

Clustering, Hashing, Novelty Detection, Plagiarism.
