Precise environmental perception is critical to the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through information sharing, it faces a fundamental communication-performance trade-off. Existing communication-efficient approaches typically assume MB-level data transmission per collaboration round, which may fail under practical network constraints. To address these issues, we propose InfoCom, an information-aware framework that establishes a theoretical foundation for communication-efficient collaborative perception via extended Information Bottleneck principles. Departing from mainstream feature-manipulation approaches, InfoCom introduces a novel information purification paradigm that extracts the minimal sufficient task-critical information under Information Bottleneck constraints. Its core innovations are: i) an Information-Aware Encoding that condenses features into minimal messages while preserving perception-relevant information; ii) a Sparse Mask Generation that identifies spatial cues at negligible communication cost; and iii) a Multi-Scale Decoding that progressively recovers perceptual information through mask-guided mechanisms rather than simple feature reconstruction. Comprehensive experiments across multiple datasets demonstrate that InfoCom achieves near-lossless perception while reducing communication overhead from megabyte to kilobyte scale: 440-fold and 90-fold reductions per agent compared with Where2comm and ERMVP, respectively.
InfoCom is a communication-efficient collaborative perception framework based on a novel information purification paradigm, consisting of three core modules: (1) Information-Aware Encoding condenses task-critical information from high-dimensional intermediate features into minimal sufficient representations by extending the Information Bottleneck principle; (2) Sparse Mask Generation identifies essential spatial cues with minimal communication overhead; (3) Multi-Scale Decoding progressively recovers perceptual information through mask-guided reconstruction.
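The encode-mask-decode pipeline can be sketched as below. This is an illustrative mock-up, not the paper's implementation: the random projection stands in for the learned Information-Aware Encoder, the activation-energy heuristic stands in for the learned Sparse Mask Generation, and all shapes and the keep ratio are made-up toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def information_aware_encode(feat, proj):
    """Condense a high-dimensional feature map (H, W, C) into a compact
    message (H, W, d). `proj` is a random stand-in for the encoder that
    the paper trains under the Information Bottleneck objective."""
    return feat @ proj  # (H, W, C) @ (C, d) -> (H, W, d)

def sparse_mask(feat, keep_ratio=0.05):
    """Keep only the spatial cells with the highest activation energy.
    A binary (H, W) mask costs roughly H*W bits, i.e. negligible bandwidth."""
    energy = np.abs(feat).sum(axis=-1)               # (H, W)
    thresh = np.quantile(energy, 1.0 - keep_ratio)
    return energy >= thresh                          # boolean (H, W)

def build_message(feat, proj, keep_ratio=0.05):
    """Transmit the mask plus condensed features at masked locations only."""
    mask = sparse_mask(feat, keep_ratio)
    msg = information_aware_encode(feat, proj)[mask]  # (K, d)
    return mask, msg

# Toy numbers: a 100x100 BEV grid with 256 channels, condensed to d=8.
H, W, C, d = 100, 100, 256, 8
feat = rng.standard_normal((H, W, C)).astype(np.float32)
proj = rng.standard_normal((C, d)).astype(np.float32)

mask, msg = build_message(feat, proj)
raw_kb = feat.nbytes / 1024                           # full feature map
sent_kb = (msg.astype(np.float32).nbytes + mask.size / 8) / 1024
print(f"raw: {raw_kb:.0f} KB, sent: {sent_kb:.1f} KB")
```

Even in this toy setting, transmitting the mask plus masked low-dimensional features drops the payload from megabyte to kilobyte scale, which is the effect the three modules are designed to achieve.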
Experimental results highlight three strengths of InfoCom. i) \textit{Exceptional communication efficiency}: InfoCom requires only kilobyte-level communication volume, comparable to Late Collaboration and significantly lower than other feature-based solutions. Specifically, its bandwidth consumption is over 400 times lower than Where2comm's, only 1\% of ERMVP's, and over 4000 times lower than Standard Collaboration's. ii) \textit{Superior perception performance}: despite minimal communication overhead, InfoCom maintains perception performance on par with the bandwidth-intensive Standard Collaboration while significantly outperforming Where2comm; among the baselines, ERMVP shows the smallest performance gap to InfoCom. iii) \textit{Optimal communication-performance trade-off}: InfoCom achieves state-of-the-art performance gain per unit bandwidth. For example, on the OPV2V dataset, InfoCom attains an average performance gain of $1.8 \times 10^{-2}$ per kilobyte, substantially exceeding Where2comm ($3.2 \times 10^{-5}$) and ERMVP ($1.7 \times 10^{-4}$).
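The trade-off comparison follows directly from the reported per-kilobyte gains; the short computation below reproduces the implied ratios from the numbers quoted above.

```python
# Average performance gain per kilobyte on OPV2V, as reported above.
gain_per_kb = {
    "InfoCom":    1.8e-2,
    "Where2comm": 3.2e-5,
    "ERMVP":      1.7e-4,
}

# Efficiency of InfoCom relative to each baseline.
ratio_w2c = gain_per_kb["InfoCom"] / gain_per_kb["Where2comm"]
ratio_ermvp = gain_per_kb["InfoCom"] / gain_per_kb["ERMVP"]
print(f"vs Where2comm: {ratio_w2c:.0f}x, vs ERMVP: {ratio_ermvp:.0f}x")
```

This puts InfoCom's bandwidth efficiency at roughly two to three orders of magnitude above both baselines.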
@inproceedings{Wei2026infocom,
author = {Quanmin Wei and Penglin Dai and Wei Li and Bingyi Liu and Xiao Wu},
title = {InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information Bottleneck},
booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
year = {2026}
}