Neural Network-based Document Clustering using WordNet Ontologies

Chihli Hung, Stefan Wermter
International Journal of Hybrid Intelligent Systems, Volume 1, pages 127-142, 2004
Three novel text vector representation approaches for neural network-based document clustering are proposed: the extended significance vector model (ESVM), the hypernym significance vector model (HSVM), and the hybrid vector space model (HyM). ESVM extracts the relationship between words and their preferred classified labels. HSVM exploits a semantic relationship from the WordNet ontology: a more general term, the hypernym, substitutes for terms with similar concepts. This hypernym relationship supplements the neural model in document clustering. HyM combines a TFxIDF vector with a hypernym significance vector, retaining the advantages and reducing the disadvantages of both unsupervised and supervised vector representation approaches. In our experiments, a self-organising map (SOM) model based on the HyM text vector representation improves classification accuracy and reduces the average quantization error (AQE) on 10,000 full-text articles.
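The abstract gives no formulas, so the following is only a minimal sketch of how a hybrid vector of this kind might be assembled, not the authors' implementation. Several parts are assumptions made for illustration: the toy `HYPERNYMS` map stands in for a real WordNet lookup, a term's significance is taken to be its relative frequency per class label, a document's significance vector is the average over its terms, and "combination" is read as simple concatenation of the TFxIDF part and the hypernym significance part. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy hypernym map standing in for a WordNet lookup (assumption).
HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}


def to_hypernyms(tokens):
    """Replace each term by its more general term, if the toy map has one."""
    return [HYPERNYMS.get(t, t) for t in tokens]


def significance_vectors(docs, labels, classes):
    """Per-term class distribution: count(term, class) / count(term).

    This form of the significance vector is an assumption, not taken
    from the abstract.
    """
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for t in tokens:
            counts[t][label] += 1
    return {t: [c[k] / sum(c.values()) for k in classes] for t, c in counts.items()}


def doc_significance(tokens, sig, n_classes):
    """Average the significance vectors of the known terms in one document."""
    vecs = [sig[t] for t in tokens if t in sig]
    if not vecs:
        return [0.0] * n_classes
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def tfidf(docs):
    """Plain TFxIDF: raw term frequency times log(N / document frequency)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def hybrid_vectors(docs, labels, classes):
    """Concatenate the unsupervised TFxIDF part with the supervised part."""
    hyper_docs = [to_hypernyms(d) for d in docs]
    sig = significance_vectors(hyper_docs, labels, classes)
    _, tfidf_rows = tfidf(docs)
    return [
        row + doc_significance(h, sig, len(classes))
        for row, h in zip(tfidf_rows, hyper_docs)
    ]


# Toy corpus with made-up labels, for illustration only.
docs = [["dog", "chases", "cat"], ["bus", "passes", "car"], ["cat", "sleeps"]]
labels = ["pets", "traffic", "pets"]
classes = ["pets", "traffic"]
vectors = hybrid_vectors(docs, labels, classes)
```

Each resulting vector has one TFxIDF component per vocabulary term followed by one significance component per class. Vectors of this shape could then be fed to a SOM implementation for clustering, which is outside the scope of this sketch.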

 

@Article{HW04,
  author    = {Hung, Chihli and Wermter, Stefan},
  title     = {Neural Network-based Document Clustering using WordNet Ontologies},
  journal   = {International Journal of Hybrid Intelligent Systems},
  volume    = {1},
  pages     = {127--142},
  year      = {2004},
  publisher = {IOS Press},
}