
CoHD uses hierarchical decoding and a counting mechanism to segment objects described by referring expressions more accurately, especially when an expression refers to several objects or to none.
In a recent paper, researchers from Tsinghua University, Huazhong University of Science and Technology, and Tencent TEG introduce CoHD (Counting-Aware Hierarchical Decoding), a framework for Generalized Referring Expression Segmentation (GRES). GRES extends classic referring expression segmentation (RES), in which an expression refers to exactly one object, to scenarios where an expression may refer to multiple objects or to no object at all. Earlier GRES methods tend to encode object information of every granularity into a single representation; CoHD aims to represent objects of different granularities more precisely and comprehensively.
The key technical innovation in CoHD is its hierarchical decoding mechanism, which uses a visual-linguistic hierarchy to decouple intricate referring semantics into different granularities, so that objects can be represented precisely at several levels of detail. CoHD also incorporates a counting ability to handle multi-target and no-target expressions. A single binary object-existence classifier treats these cases ambiguously: it gives an expression that matches one object and an expression that matches several objects the same "target exists" label.
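To make the idea of decoding at several granularities concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each level of a hypothetical visual-linguistic hierarchy has already been pooled into a feature vector and that the expression has been encoded as a single sentence vector. The levels are then mixed with softmax weights derived from their similarity to that sentence vector, so the expression decides which granularity dominates. The function names and the scaled dot-product scoring are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selection_weights(levels: Sequence[Sequence[float]],
                      sentence: Sequence[float]) -> List[float]:
    """Hypothetical: score each granularity level against the sentence vector."""
    scale = 1.0 / math.sqrt(len(sentence))  # scaled dot product, as in attention
    scores = [sum(f * q for f, q in zip(level, sentence)) * scale
              for level in levels]
    return softmax(scores)


def aggregate_hierarchy(levels: Sequence[Sequence[float]],
                        sentence: Sequence[float]) -> List[float]:
    """Hypothetical: fuse per-level features with expression-dependent weights.

    levels[k] is the pooled feature vector of granularity level k
    (e.g. coarse -> fine); all levels share one dimensionality.
    """
    weights = selection_weights(levels, sentence)
    dim = len(levels[0])
    # Weighted sum of the level vectors, one output dimension at a time.
    return [sum(w * level[i] for w, level in zip(weights, levels))
            for i in range(dim)]


# Toy example: two 2-d levels and a sentence vector aligned with the coarse level.
coarse, fine = [1.0, 0.0], [0.0, 1.0]
fused = aggregate_hierarchy([coarse, fine], [10.0, 0.0])
print([round(x, 3) for x in fused])
```

With a sentence vector aligned to the coarse level, nearly all of the weight goes to that level; a sentence vector equally similar to both levels splits the weight evenly. CoHD's actual aggregation operates on learned feature maps and is more involved than this single weighting step.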
For practitioners, this framework offers several advantages:
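Hierarchical Decoding: objects at different levels of detail are represented separately instead of being squeezed into a single representation, which makes their segmentation more precise.
Counting Ability: distinguishing no-target, single-target, and multi-target expressions removes the ambiguity of a plain binary existence label.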
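Under the hood, CoHD combines three components:
Visual-Linguistic Hierarchy: the intricate referring semantics are decoupled into different granularities, so that coarse and fine object information is represented at separate levels of the hierarchy.
Dynamic Selective Aggregation: the hierarchical representations are aggregated dynamically through intra- and inter-selection, letting the levels of the hierarchy reinforce one another.
Counting Mechanism: multiple-, single-, and non-target scenarios are encoded as count-level and category-level supervision, giving the model a more complete perception of the referred objects.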
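The counting idea can be illustrated with a small Python sketch showing how ground-truth annotations might be turned into count-level and category-level targets. This is an assumption-laden illustration, not the paper's code: the function names, the clipping of counts into `max_count + 1` bins, the multi-hot category vector, and the choice of cross-entropy and binary cross-entropy losses are all hypothetical choices made for the example.

```python
import math
from typing import Any, Dict, Sequence


def counting_targets(instance_categories: Sequence[int], num_categories: int,
                     max_count: int = 3) -> Dict[str, Any]:
    """Hypothetical supervision targets for one referring expression.

    instance_categories holds the category id of every object the expression
    refers to (empty for a no-target expression).
    """
    n = len(instance_categories)
    scenario = "no-target" if n == 0 else "single" if n == 1 else "multiple"
    category_multi_hot = [0.0] * num_categories
    for c in instance_categories:
        category_multi_hot[c] = 1.0
    return {
        "exists": n > 0,                      # what a binary existence head sees
        "scenario": scenario,
        "count_class": min(n, max_count),     # counts >= max_count share one bin
        "category_multi_hot": category_multi_hot,
    }


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def binary_cross_entropy(logits: Sequence[float],
                         targets: Sequence[float]) -> float:
    """Mean BCE-with-logits, in the stable form max(x,0) - x*y + log(1+e^-|x|)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(targets)


def counting_loss(count_logits: Sequence[float],
                  category_logits: Sequence[float],
                  targets: Dict[str, Any]) -> float:
    """Hypothetical: count-level CE plus category-level BCE."""
    return (cross_entropy(count_logits, targets["count_class"])
            + binary_cross_entropy(category_logits, targets["category_multi_hot"]))


# Toy annotations: category ids of the objects each expression refers to.
none = counting_targets([], num_categories=6)
one = counting_targets([4], num_categories=6)
many = counting_targets([2, 2, 5], num_categories=6)
print(one["exists"], many["exists"])            # True True
print(one["count_class"], many["count_class"])  # 1 3
loss = counting_loss([0.0] * 4, [0.0] * 6, many)
```

In the example, `exists` is True for both the single-target and the three-object expression, which is exactly the ambiguity a plain binary existence label leaves; the count class (1 versus 3) and the category vector tell them apart.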
CoHD was evaluated on four benchmarks: gRefCOCO, Ref-ZOM, R-RefCOCO, and RefCOCO. The authors report that it outperforms state-of-the-art GRES methods by a clear margin.
CoHD targets two weaknesses of earlier GRES methods: encoding objects of every granularity into one representation, and relying on a single binary existence label for all referent scenarios. Decoupling object information by granularity and adding count-aware supervision addresses both, and the results on multiple benchmarks support the approach.
Original source: https://arxiv.org/pdf/2405.15658
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
The Engineer, This Week's Edition, 28 May 2024