Hybrid classifiers based on semantic data subspaces for two-level text categorization

Michael Philip Oakes, Stefan Wermter

International Journal of Hybrid Intelligent Systems, Volume 10, Number 1, pages 33--41, doi: 10.3233/HIS-130163 - Mar 2013


Many organizations now keep their data in multi-level category hierarchies for easier management. One example is the Reuters Corpus, in which news items are categorized in a hierarchy of up to five levels. The volume and diversity of documents in such category hierarchies are also growing daily, which makes it difficult for a traditional classifier to handle multi-level categorization of such a varied document space efficiently. In this paper, we present hybrid classifiers built from various two-classifier and four-classifier combinations for two-level text categorization. We show that the classification accuracy of the hybrid combination is higher than the accuracy of each of the corresponding single classifiers. The constituent classifiers of the hybrid combination operate on different subspaces obtained by semantically separating the data. Our experiments show that dividing a document space into different semantic subspaces increases the efficiency of such hybrid classifier combinations. We further show that hierarchies with more categories at the first level benefit more from this general hybrid architecture.
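The general idea of two-level categorization over separate subspaces can be illustrated with a small sketch. This is not the authors' method and does not reproduce their two- or four-classifier hybrid combinations: it assumes a hypothetical nearest-centroid classifier on bag-of-words counts as a stand-in for the real constituent classifiers, and it approximates each "subspace" as the set of training documents under one first-level category. A first-level classifier routes a document to a top-level category, and a second-level classifier trained only on that category's documents picks the subcategory.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lower-cased token counts: a deliberately simple document representation."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Hypothetical stand-in classifier: picks the label whose summed term vector
    is most cosine-similar to the document (cosine ignores scale, so sum == centroid)."""

    def fit(self, docs, labels):
        self.centroids = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.centroids[label].update(bag_of_words(doc))
        return self

    def predict(self, doc):
        vec = bag_of_words(doc)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


class TwoLevelClassifier:
    """Level 1 routes a document to a top-level category; one level-2 classifier per
    top-level category, trained only on that category's documents, picks the subcategory."""

    def fit(self, docs, top_labels, sub_labels):
        self.top = CentroidClassifier().fit(docs, top_labels)
        groups = defaultdict(lambda: ([], []))  # top label -> (docs, sub labels)
        for doc, top, sub in zip(docs, top_labels, sub_labels):
            groups[top][0].append(doc)
            groups[top][1].append(sub)
        self.sub = {top: CentroidClassifier().fit(d, s) for top, (d, s) in groups.items()}
        return self

    def predict(self, doc):
        top = self.top.predict(doc)
        return top, self.sub[top].predict(doc)


# Toy, made-up training data for illustration only (not from the paper).
docs = [
    "stock market shares fall",
    "bank interest rates rise",
    "football team wins cup",
    "tennis player wins open",
]
top_labels = ["economy", "economy", "sport", "sport"]
sub_labels = ["markets", "banking", "football", "tennis"]
clf = TwoLevelClassifier().fit(docs, top_labels, sub_labels)

print(clf.predict("shares on the stock market"))  # ('economy', 'markets')
print(clf.predict("team wins the cup"))           # ('sport', 'football')
```

Splitting the second level by first-level category means each second-level classifier only has to separate the subcategories of one branch, which is one plausible reading of why hierarchies with more first-level categories could benefit more from this kind of architecture; the sketch does not test that claim.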

@Article{OW13,
 	 author =  {Oakes, Michael Philip and Wermter, Stefan},
 	 title = {Hybrid classifiers based on semantic data subspaces for two-level text categorization},
 	 journal = {International Journal of Hybrid Intelligent Systems},
 	 number = {1},
 	 volume = {10},
 	 pages = {33--41},
 	 year = {2013},
 	 month = {Mar},
 	 publisher = {IOS Press Amsterdam},
 	 doi = {10.3233/HIS-130163},
 }