A new approach to represent textual documents using CVSM


  • Dr. Brahmananda Reddy
  • Dr. Y. Sagar
  • Dr. P. Subhash






Keywords: Text Mining, Vector Space Model, Conceptual Vector Space Model, WordNet, NLTK, Clustering.


Advances in technology have produced vast amounts of data, most of it unstructured. Text mining, the process of extracting high-quality information from text, is valuable for discovering and retrieving useful information from such data. Text mining commonly relies on the Vector Space Model (VSM), which transforms unstructured text into structured data using a traditional keyword-based approach. One problem with this approach is that, when a user submits a query, only the documents that match the query's keywords are retrieved, so documents that express the same content with different vocabulary are missed. To overcome this, this paper describes a Conceptual Vector Space Model (CVSM), which helps categorize documents that have the same content but may use different vocabulary. The CVSM is implemented with the help of WordNet and the Natural Language Toolkit (NLTK). Clustering algorithms are then applied to it to form clusters based on concepts.
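The difference between keyword-based and concept-based representations can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the `CONCEPTS` table, the concept labels, and the two sample documents are hypothetical, and the table stands in for a WordNet lookup (for example through NLTK's `wordnet.synsets`) so that the example runs without any corpus download.

```python
import math
from collections import Counter

# Hypothetical word-to-concept table. It stands in for a WordNet lookup;
# the concept labels are illustrative, not real synset names.
CONCEPTS = {
    "car": "motor_vehicle",
    "automobile": "motor_vehicle",
    "doctor": "medical_practitioner",
    "physician": "medical_practitioner",
}


def keyword_vector(text):
    """Keyword-based VSM: one dimension per surface word, weighted by count."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Concept-based variant: map each word to its concept when one is known."""
    return Counter(CONCEPTS.get(token, token) for token in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two sample documents with the same content but different vocabulary.
doc_a = "the doctor bought a car"
doc_b = "the physician bought an automobile"

keyword_sim = cosine(keyword_vector(doc_a), keyword_vector(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(f"keyword similarity: {keyword_sim:.2f}")
print(f"concept similarity: {concept_sim:.2f}")
```

On these two sample sentences the keyword vectors share only "the" and "bought", giving a cosine similarity of about 0.4, while the concept vectors also share both mapped concepts, giving about 0.8. A clustering algorithm applied to the concept vectors would therefore be more likely to group the two documents together.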



