 
Given a partially labelled dataset, Generalized Category Discovery (GCD) aims to categorize all unlabelled images, regardless of whether they belong to known or unknown classes. Existing approaches typically depend on either single-level semantics or manually designed abstract hierarchies, both of which limit their generalizability and scalability.
To address these limitations, we introduce a SEmantic-aware hierArchical Learning framework (SEAL), guided by naturally occurring and easily accessible hierarchical structures. Within SEAL, we propose a Hierarchical Semantic-Guided Soft Contrastive Learning approach that exploits hierarchical similarity to generate informative soft negatives, addressing the limitation of conventional contrastive losses that treat all negatives equally. Furthermore, a Cross-Granularity Consistency (CGC) module is designed to align predictions across different levels of granularity.
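To make the soft contrastive idea concrete, here is a minimal PyTorch-style sketch (an illustrative reading, not the paper's exact implementation): negatives that share a coarse parent with the anchor are down-weighted by a hypothetical sibling_weight factor, so semantically close negatives are repelled less aggressively than unrelated ones. The two-views-per-image batch layout is also an assumption made for illustration.

import torch

def hierarchical_soft_contrastive_loss(feats, coarse_labels,
                                       temperature=0.1, sibling_weight=0.5):
    # feats:         (2B, D) L2-normalised embeddings of two views per image,
    #                arranged so that feats[i] and feats[i + B] are positives.
    # coarse_labels: (2B,) coarse (parent) class id of each view.
    n = feats.size(0)
    b = n // 2
    idx = torch.arange(n, device=feats.device)
    sim = feats @ feats.t() / temperature              # pairwise logits
    self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float('-inf'))    # exclude self-pairs

    # Soft negative weights: full weight for cross-parent pairs,
    # sibling_weight (< 1) for negatives that share a coarse parent.
    same_parent = coarse_labels.unsqueeze(0) == coarse_labels.unsqueeze(1)
    neg_weight = torch.where(same_parent,
                             torch.full_like(sim, sibling_weight),
                             torch.ones_like(sim))

    pos_idx = idx.roll(b)                              # index of each positive
    pos = sim[idx, pos_idx]
    exp_sim = torch.exp(sim) * neg_weight              # exp(-inf) = 0 removes self
    # Denominator: weighted negatives plus the (unweighted) positive term.
    denom = exp_sim.sum(dim=1) - exp_sim[idx, pos_idx] + torch.exp(pos)
    return (torch.log(denom) - pos).mean()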
SEAL consistently achieves state-of-the-art performance on fine-grained benchmarks, including the SSB benchmark, Oxford-Pet, and the Herbarium19 dataset, and further demonstrates generalization on coarse-grained datasets. 
Overview of our proposed SEAL framework.
In contrast to prior GCD approaches that either rely solely on single-granularity information or depend on abstract hierarchies, we embed explicit semantic structure via three key elements: (1) a semantic-aware multi-task framework; (2) a cross-granularity consistency module to align predictions across levels; and (3) a hierarchical soft contrastive learning strategy to mitigate the "equivalent negative" assumption by weighting dissimilarity according to semantic proximity.
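As one way to read the cross-granularity consistency idea in code, the sketch below (an illustrative assumption, not the paper's exact formulation) marginalises the fine head's softmax up a hypothetical fine_to_coarse lookup table and penalises, via a KL term, coarse predictions that disagree with the aggregated fine prediction.

import torch
import torch.nn.functional as F

def cross_granularity_consistency(fine_logits, coarse_logits, fine_to_coarse):
    # fine_logits:    (B, C_fine) logits from the fine-grained head.
    # coarse_logits:  (B, C_coarse) logits from the coarse head.
    # fine_to_coarse: (C_fine,) long tensor mapping each fine class to its
    #                 parent coarse class (hypothetical mapping for this sketch).
    fine_prob = F.softmax(fine_logits, dim=1)
    # Sum the probabilities of fine classes that share a parent to obtain
    # the coarse distribution implied by the fine prediction.
    implied = fine_prob.new_zeros(fine_prob.size(0), coarse_logits.size(1))
    implied.index_add_(1, fine_to_coarse, fine_prob)
    # KL(implied || coarse): penalise coarse predictions that are
    # inconsistent with the aggregated fine prediction.
    return F.kl_div(F.log_softmax(coarse_logits, dim=1), implied,
                    reduction='batchmean')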
We present benchmark results of our method and compare it with state-of-the-art GCD techniques as well as three strong baselines adapted from novel category discovery. All methods are built on DINO and DINOv2 pre-trained backbones. This comparative evaluation covers the fine-grained SSB benchmark; additional results are presented in the paper. Our method consistently achieves state-of-the-art performance on SSB with both the DINO and DINOv2 pre-trained backbones.
 
We present a t-SNE visualization comparing the feature representations learned by the baseline and by our method. For clarity, we randomly select 20 categories, including 10 from the Old set and 10 from the New set. As shown in the figure below, our method yields tighter, better-separated clusters, indicating stronger inter-class discrimination. The zoomed-in view further reveals that the model preserves coarse-to-fine semantics: visually diverse subcategories within the broader Cab group lie close together, yet each remains distinct. This confirms that our method captures hierarchical structure while retaining fine-grained separability.
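For reference, a visualization of this kind can be reproduced with a short script along the following lines (a sketch assuming scikit-learn's TSNE and per-image embeddings already extracted from the trained backbone; the 20-category subsampling mirrors the protocol above).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, n_classes=20, seed=0):
    # features: (N, D) array of image embeddings from the trained backbone.
    # labels:   (N,) integer class ids; a random subset of classes is shown.
    rng = np.random.default_rng(seed)
    chosen = rng.choice(np.unique(labels), size=n_classes, replace=False)
    mask = np.isin(labels, chosen)
    emb = TSNE(n_components=2, init='pca',
               random_state=seed).fit_transform(features[mask])
    plt.figure(figsize=(6, 6))
    plt.scatter(emb[:, 0], emb[:, 1], c=labels[mask], cmap='tab20', s=5)
    plt.axis('off')
    plt.savefig('tsne.png', dpi=300)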
 
This work is supported by the National Natural Science Foundation of China (Grant No. 62306251), the Hong Kong Research Grants Council Early Career Scheme (Grant No. 27208022), the Hong Kong Research Grants Council General Research Fund (Grant No. 17211024), and the HKU Seed Fund for Basic Research.
@inproceedings{He2025SEAL,
  author    = {Zhenqi He and Yuanpei Liu and Kai Han},
  title     = {SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2025},
}