A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization

Michael Philip Oakes, Stefan Wermter, Nandita Tripathi
International Journal of Computational Intelligence and Applications, Volume 14, Number 4, doi: 10.1142/S1469026815500200 - Dec 2015
Nowadays, documents are increasingly associated with multi-level category hierarchies rather than a flat category scheme. As the volume and diversity of documents grow, so do the size and complexity of the corresponding category hierarchies. To access such hierarchically classified documents in real time, we need fast automatic methods to navigate these hierarchies. Today’s data domains, such as medicine and politics, also differ greatly from one another. These distinct domains can be handled by different classifiers. A document representation system which incorporates the inherent category structure of the data should also add useful semantic content to the data vectors and thus lead to better separability of classes. In this paper we present a scalable meta-classifier to tackle the problem of multi-level data classification in the presence of large datasets. To speed up the classification process, we use a search-based method to detect the level 1 category of a test document. For this purpose we use a category-hierarchy-based vector representation. We evaluate the meta-classifier by scaling both to longer documents and to a larger category set, and show it to be robust in both cases. We test the architecture of our meta-classifier using six different base classifiers (Random Forest, C4.5, Multilayer Perceptron, Naïve Bayes, BayesNet and PART). We observe that performance varies only slightly across the different architectures, and that all of them perform much better than the corresponding single baseline classifiers. We conclude that the substantial potential lies in the meta-classifier architecture itself, rather than in the base classifiers, and that it is this architecture that improves classification performance.
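The two-stage idea in the abstract (a search step that picks the level 1 category, followed by a classifier specific to that category) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the bag-of-words vectors, the cosine-similarity search, the nearest-centroid stand-in for the base classifiers, the `MetaClassifier` class name and the toy training data. In particular, it does not reproduce the category-hierarchy-based vector representation the authors describe.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (assumed representation, not the paper's)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MetaClassifier:
    """Hypothetical two-stage classifier: search for level 1, classify level 2."""

    def fit(self, docs):
        # docs: iterable of (text, level1_label, level2_label)
        level1 = defaultdict(Counter)
        level2 = defaultdict(lambda: defaultdict(Counter))
        for text, c1, c2 in docs:
            vec = vectorize(text)
            level1[c1].update(vec)      # summed vector per level 1 category
            level2[c1][c2].update(vec)  # summed vector per level 2 category
        self.level1 = dict(level1)
        self.level2 = {c1: dict(subs) for c1, subs in level2.items()}
        return self

    def predict(self, text):
        vec = vectorize(text)
        # Stage 1: search for the most similar level 1 category.
        c1 = max(self.level1, key=lambda c: cosine(vec, self.level1[c]))
        # Stage 2: only the classifier for that category is consulted.
        # A nearest-centroid rule stands in for a real base classifier.
        local = self.level2[c1]
        c2 = max(local, key=lambda c: cosine(vec, local[c]))
        return c1, c2


# Toy data, invented for this sketch.
TRAIN = [
    ("patient heart surgery hospital", "medicine", "cardiology"),
    ("skin rash cream patient", "medicine", "dermatology"),
    ("election vote parliament campaign", "politics", "elections"),
    ("treaty embassy diplomat summit", "politics", "diplomacy"),
]

model = MetaClassifier().fit(TRAIN)
print(model.predict("the diplomat signed a treaty"))
```

Because stage 2 only compares the document against the subcategories of the selected level 1 category, the work per test document grows with the size of one branch rather than with the whole hierarchy, which is the kind of speed-up the abstract motivates.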

 

@Article{OWT15, 
 	 author =  {Oakes, Michael Philip and Wermter, Stefan and Tripathi, Nandita},  
 	 title = {A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization}, 
 	 journal = {International Journal of Computational Intelligence and Applications},
 	 number = {4},
 	 volume = {14},
 	 pages = {},
 	 year = {2015},
 	 month = {Dec},
 	 publisher = {World Scientific Publishing},
 	 doi = {10.1142/S1469026815500200}, 
 }