Analyzing Tagging Behavior in Clustering Similar Web Resources through Interactive Visual Demonstration

International Journal of Computer Science and Engineering
© 2014 by SSRG - IJCSE Journal
Volume 1 Issue 10
Year of Publication : 2014
Authors : Marjan Farsi

Every day, millions of web pages are annotated with freely chosen words, called tags, and saved in social tagging systems. These tagging-based systems let internet users store their web resources online so that they are accessible and retrievable from anywhere in the world. By organizing data and providing search over the tripartite elements (user, tag, bookmark URL), these sites offer useful information services to internet users. In recent years, as social bookmarking sites have developed and spread, many web mining researchers have been motivated to study the data acquired from these sites in order to discover new information. Consequently, techniques for web crawling and data extraction, classification and clustering algorithms, and data visualization methods and tools have been applied to this end. The acquired and clustered data are usually analyzed to uncover the hidden statistical or behavioral facts and concepts embedded in the relations among the tripartite elements. This paper analyzes and discusses one aspect of these behavioral facts and concepts: the effect of tagging behavior on finding web pages that are similar in content, based on the common tags of the extracted URLs. All of the required data come from one of these social bookmarking sites. The similarity is explored by running a Java application, implemented for this purpose, that groups similar web pages into clusters by applying similarity measurement algorithms and the k-means clustering technique. The investigation is both quantitative and qualitative: the statistical facts about tagging behavior in finding similar web pages, generated by the application, are reported in Excel sheet format, and the processed data are represented visually as graphs using the 'Prefuse' visualization tool. The relationships between the visual objects in each graph are discussed and analyzed from the point of view of tagging behavior.
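
The idea of grouping pages by their common tags can be illustrated with a minimal Java sketch. It is not the application described in the paper: the abstract does not name the similarity measures used, so Jaccard similarity over tag sets is assumed here, and the class name `TagSimilarity`, the helper methods, and the sample tags are all hypothetical. Each page is also encoded as a binary tag vector and clustered with a basic k-means that seeds its centroids from the first k pages.

```java
import java.util.*;

// Hypothetical illustration: pages are represented only by their tag sets.
class TagSimilarity {

    // Jaccard similarity (an assumed measure): shared tags / all distinct tags.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    // Encode each page as a binary vector over the sorted tag vocabulary.
    static double[][] toVectors(List<Set<String>> pages) {
        TreeSet<String> vocabulary = new TreeSet<>();
        for (Set<String> tags : pages) vocabulary.addAll(tags);
        List<String> terms = new ArrayList<>(vocabulary);
        double[][] vectors = new double[pages.size()][terms.size()];
        for (int i = 0; i < pages.size(); i++)
            for (int j = 0; j < terms.size(); j++)
                vectors[i][j] = pages.get(i).contains(terms.get(j)) ? 1.0 : 0.0;
        return vectors;
    }

    static double squaredDistance(double[] x, double[] y) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) sum += (x[d] - y[d]) * (x[d] - y[d]);
        return sum;
    }

    // Basic k-means; centroids are seeded from the first k points (requires k <= n).
    static int[] kMeans(double[][] points, int k, int maxIterations) {
        int n = points.length, dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) best = c;
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break; // assignments are stable, so stop
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] mean = new double[dim];
                int count = 0;
                for (int i = 0; i < n; i++) {
                    if (labels[i] != c) continue;
                    count++;
                    for (int d = 0; d < dim; d++) mean[d] += points[i][d];
                }
                if (count == 0) continue; // keep the old centroid for an empty cluster
                for (int d = 0; d < dim; d++) mean[d] /= count;
                centroids[c] = mean;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up tag sets for four bookmarked pages.
        List<Set<String>> pages = new ArrayList<>();
        pages.add(new HashSet<>(Arrays.asList("java", "programming", "tutorial")));
        pages.add(new HashSet<>(Arrays.asList("java", "programming", "code")));
        pages.add(new HashSet<>(Arrays.asList("recipe", "cooking", "food")));
        pages.add(new HashSet<>(Arrays.asList("cooking", "food", "dinner")));
        System.out.println("Jaccard(page 0, page 1) = " + jaccard(pages.get(0), pages.get(1)));
        System.out.println("Cluster labels: " + Arrays.toString(kMeans(toVectors(pages), 2, 100)));
    }
}
```

In this sketch, pages that share most of their tags lie close together as binary vectors and therefore tend to fall into the same cluster. Seeding the centroids from the first k pages keeps the example deterministic; a real application would more likely use random or k-means++ initialization.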


Key Words:

Data visualization tool, k-means clustering, similarity measurement algorithms, social bookmarking sites, tagging behavior, web mining