Primitive Operation Aggregation Algorithms for Improving Taxonomies for Large-Scale Hierarchical Classifiers - Yahoo! JAPAN R&D


CONFERENCE (INTERNATIONAL) Primitive Operation Aggregation Algorithms for Improving Taxonomies for Large-Scale Hierarchical Classifiers

The Fifth International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2011)

February 01, 2013

Naive implementations of hierarchical classifiers that classify documents into large-scale taxonomy structures may face a trade-off between relevance and efficiency. To address this problem, we focused on taxonomy modification algorithms that gradually improve the relevance performance of large-scale hierarchical classifiers. We developed four taxonomy modification algorithms that aggregate primitive operations before investigating hierarchical relevance performance. All but one produced taxonomy sequences that generate classifiers exhibiting practical efficiency. One algorithm, which strictly maintains balanced proportions of taxonomy structures, generated a taxonomy sequence producing classifiers that exhibit stable relevance performance. Another algorithm, which roughly maintains the proportions of the taxonomy structure but strictly maintains the maximum size of the training corpus for each local classifier, generated a taxonomy producing a classifier that exhibited the best relevance performance in our experiment. The base classification system we developed for this experiment places a local classifier at each parent node of the taxonomy. It is able to classify documents into directed-acyclic-graph structured taxonomies. The system reached the level of practical hierarchical classification systems that efficiently and relevantly classify documents into over 10,000 taxonomy classes.
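
The local-classifier-per-parent-node approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the taxonomy, the keyword sets, and the functions `local_classifier` and `classify` are all hypothetical, and simple keyword overlap stands in for trained models. It only shows how a document might be routed top-down through a directed acyclic graph in which one class ("tablets") has two parents.

```python
from collections import deque

# Hypothetical toy taxonomy: parent -> children. "tablets" has two parents,
# so the structure is a DAG rather than a tree.
TAXONOMY = {
    "root": ["electronics", "books"],
    "electronics": ["phones", "tablets"],
    "books": ["ebooks", "tablets"],
}

# Hypothetical keyword sets standing in for trained local models.
KEYWORDS = {
    "electronics": {"battery", "screen", "charger"},
    "books": {"novel", "author", "reading"},
    "phones": {"call", "sim"},
    "tablets": {"screen", "stylus", "reading"},
    "ebooks": {"novel", "download"},
}


def local_classifier(parent, tokens):
    """Return the children of `parent` whose keywords overlap the document."""
    return [c for c in TAXONOMY.get(parent, []) if KEYWORDS[c] & tokens]


def classify(text):
    """Route a document top-down, consulting one local classifier per parent."""
    tokens = set(text.lower().split())
    queue = deque(["root"])
    visited = {"root"}  # a DAG node may be reachable via several parents
    predicted = []
    while queue:
        node = queue.popleft()
        for child in local_classifier(node, tokens):
            if child in visited:
                continue
            visited.add(child)
            predicted.append(child)
            queue.append(child)
    return predicted
```

Because each local classifier only distinguishes among the children of one parent, its training set stays small relative to the whole taxonomy; the `visited` set ensures a class with several parents is reported once.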

Semantic Web
