A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization
International Journal of Computational Intelligence and Applications,
Volume 14,
Number 4,
doi: 10.1142/S1469026815500200
Dec 2015
Nowadays, documents are increasingly associated with multi-level category hierarchies
rather than a flat category scheme. As the volume and diversity of documents grow, so do the size
and complexity of the corresponding category hierarchies. To be able to access such hierarchically
classified documents in real time, we need fast automatic methods to navigate these hierarchies.
Today's data also come from very different domains, such as medicine and politics, and these distinct
domains can be handled by different classifiers. A document representation system that
incorporates the inherent category structure of the data should also add useful semantic content to
the data vectors and thus lead to better separability of classes. In this paper we present a scalable
meta-classifier to tackle today's problem of multi-level data classification on large
datasets. To speed up the classification process, we use a search-based method to detect the level-1
category of a test document. For this purpose we use a category-hierarchy-based vector
representation. We evaluate the meta-classifier by scaling it to both longer documents and a
larger category set, and show it to be robust in both cases. We test the architecture of our
meta-classifier using six different base classifiers (Random Forest, C4.5, Multilayer Perceptron, Naïve
Bayes, BayesNet and PART). We observe that, although performance varies only slightly across the
different architectures, all of them perform much better than the corresponding single baseline
classifiers. We conclude that the improvement in classification performance comes from the
meta-classifier architecture itself rather than from the choice of base classifier, and that this
architecture has substantial potential.
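The abstract describes the two-stage idea only at a high level: a search step picks the level-1 (domain) category, and a domain-specific base classifier then handles the finer categories. The following is a minimal sketch of that general pattern under loudly stated assumptions, not the authors' implementation: a plain term-frequency vector stands in for the category-hierarchy-based representation, the "search" is assumed to be a k-nearest-neighbour retrieval over training documents with a majority vote, and a toy nearest-centroid classifier stands in for Random Forest, C4.5 and the other base classifiers. The names `vectorize`, `cosine`, `NearestCentroid`, `MetaClassifier` and the example data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Term-frequency vector. ASSUMPTION: stands in for the paper's
    category-hierarchy-based representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NearestCentroid:
    """Hypothetical toy base classifier (stand-in for Random Forest, C4.5, ...)."""

    def fit(self, docs, labels):
        # One summed term-frequency vector per label.
        self.centroids = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.centroids[label].update(vectorize(doc))
        return self

    def predict(self, doc):
        v = vectorize(doc)
        return max(self.centroids, key=lambda c: cosine(v, self.centroids[c]))


class MetaClassifier:
    """Hypothetical two-stage sketch: search picks the level-1 category,
    then that domain's own base classifier picks the level-2 category."""

    def __init__(self, k=3, base=NearestCentroid):
        self.k = k
        self.base = base

    def fit(self, samples):
        # samples: list of (text, level1_label, level2_label)
        self.index = [(vectorize(t), l1) for t, l1, _ in samples]
        by_domain = defaultdict(lambda: ([], []))
        for text, l1, l2 in samples:
            by_domain[l1][0].append(text)
            by_domain[l1][1].append(l2)
        # One base classifier per level-1 domain.
        self.experts = {l1: self.base().fit(d, y) for l1, (d, y) in by_domain.items()}
        return self

    def predict(self, text):
        v = vectorize(text)
        # Stage 1 (assumed kNN search): retrieve the k most similar
        # training documents and take a majority vote on their level-1 labels.
        hits = sorted(self.index, key=lambda e: cosine(v, e[0]), reverse=True)[: self.k]
        level1 = Counter(l1 for _, l1 in hits).most_common(1)[0][0]
        # Stage 2: only the classifier trained for that domain is consulted.
        return level1, self.experts[level1].predict(text)


# Invented toy data for illustration only (not the paper's datasets).
train = [
    ("tumour treated with chemotherapy", "medicine", "oncology"),
    ("chemotherapy dose for tumour patients", "medicine", "oncology"),
    ("heart attack and cardiac arrest", "medicine", "cardiology"),
    ("cardiac surgery on the heart valve", "medicine", "cardiology"),
    ("parliament passed the budget vote", "politics", "legislature"),
    ("the senate vote on the new bill", "politics", "legislature"),
    ("voters went to the polls in the election", "politics", "elections"),
    ("election campaign and polls results", "politics", "elections"),
]
model = MetaClassifier(k=3).fit(train)
print(model.predict("new chemotherapy for tumour"))  # ('medicine', 'oncology')
```

Routing each document to a single domain classifier means the second stage only has to separate the sub-categories of one domain, which is one plausible reason such an architecture can outperform a single flat classifier; swapping `base` for a different classifier class mirrors, in spirit, the paper's comparison of six base classifiers.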
@Article{OWT15,
  author    = {Oakes, Michael Philip and Wermter, Stefan and Tripathi, Nandita},
  title     = {A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization},
  journal   = {International Journal of Computational Intelligence and Applications},
  volume    = {14},
  number    = {4},
  year      = {2015},
  month     = {Dec},
  publisher = {World Scientific Publishing},
  doi       = {10.1142/S1469026815500200},
}