WU Sen, JIANG Dan-dan, WANG Qiang. HABOS clustering algorithm for categorical data[J]. Chinese Journal of Engineering, 2016, 38(7): 1017-1024. DOI: 10.13374/j.issn2095-9389.2016.07.018

HABOS clustering algorithm for categorical data

  • The clustering algorithm based on sparse feature vector for categorical attributes (CABOSFVC) is an efficient clustering method for high-dimensional categorical data. It uses sparse feature dissimilarity (SFD) to measure distance and sparse feature vectors to compress the data. However, CABOSFVC depends on an SFD upper-limit parameter for which no configuration guidance exists. To address this parameter sensitivity, this paper proposes HABOS, a new heuristic hierarchical clustering algorithm for categorical data based on SFD. Subject to an upper limit on the number of clusters, the algorithm applies agglomerative hierarchical clustering and uses a new SFD-based internal clustering validation index (CVISFD) to evaluate the results heuristically and select the best clustering level. Three UCI benchmark data sets were used to compare the improved algorithm with traditional ones. The empirical tests show that HABOS effectively improves clustering accuracy and stability.
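
The general workflow summarized in the abstract (agglomerative merging of categorical objects, then choosing the clustering level with an internal validation index under an upper limit on the number of clusters) can be illustrated with a minimal Python sketch. The abstract does not give the formulas for SFD or CVISFD, so the sketch uses simplified stand-ins that are assumptions of this example: `set_dissimilarity` (the share of attributes on which a group disagrees, divided by group size) and `validation_index` (mean within-cluster mismatch minus mean between-cluster mismatch). The function names, the parameter `max_clusters`, and the toy rows are hypothetical. This is not the authors' implementation.

```python
from itertools import combinations


def set_dissimilarity(rows):
    """Stand-in for a set dissimilarity (NOT the paper's SFD formula):
    share of attributes on which the rows disagree, divided by group size."""
    n_attrs = len(rows[0])
    differing = sum(1 for j in range(n_attrs) if len({r[j] for r in rows}) > 1)
    return differing / (len(rows) * n_attrs)


def mismatch(a, b):
    """Fraction of attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def validation_index(data, clusters):
    """Stand-in internal index (NOT the paper's CVISFD): mean within-cluster
    mismatch minus mean between-cluster mismatch. Lower is better."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    within, between = [], []
    for i, j in combinations(range(len(data)), 2):
        bucket = within if label[i] == label[j] else between
        bucket.append(mismatch(data[i], data[j]))
    if not within or not between:
        return float("inf")
    return sum(within) / len(within) - sum(between) / len(between)


def agglomerate_with_level_selection(data, max_clusters):
    """Merge bottom-up; among levels with at most max_clusters clusters,
    return the level with the lowest validation index."""
    clusters = [[i] for i in range(len(data))]
    best_score, best = float("inf"), clusters
    while len(clusters) > 2:
        # Merge the pair whose union has the smallest set dissimilarity.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: set_dissimilarity(
                [data[i] for i in clusters[p[0]] + clusters[p[1]]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        if len(clusters) <= max_clusters:
            score = validation_index(data, clusters)
            if score < best_score:
                best_score, best = score, clusters
    return sorted(sorted(c) for c in best)


# Toy categorical rows (hypothetical data): two visibly distinct groups.
rows = [
    ("a", "x", "p", "m"),
    ("a", "x", "p", "n"),
    ("a", "x", "q", "m"),
    ("b", "y", "r", "k"),
    ("b", "y", "r", "l"),
    ("b", "y", "s", "k"),
]
result = agglomerate_with_level_selection(rows, max_clusters=4)
```

On these toy rows, the two-cluster level scores best under the stand-in measures. The point of the design is that the user supplies only an upper bound on the number of clusters, and the validation index picks the level, rather than a dissimilarity threshold having to be tuned by hand.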
