artificial intelligence computer Qualcomm Qualcomm Arduino Ventuno Q – single-board computer for robotics and AI systems
SOURCE: HI-TECH.UA
MAR 13, 2026
NVIDIA AI releases C-RADIOv4 vision backbone unifying SigLIP2, DINOv3, SAM3 for classification, dense prediction, segmentation workloads at scale
SOURCE: MARKTECHPOST.COM
FEB 06, 2026
By
-
February 6, 2026
How do you combine SigLIP2, DINOv3, and SAM3 into a single vision backbone without sacrificing dense or segmentation performance? NVIDIA’s C-RADIOv4 is a new agglomerative vision backbone that distills three strong teacher models, SigLIP2-g-384, DINOv3-7B, and SAM3, into a single student encoder. It extends the AM-RADIO and RADIOv2.5 line, keeping similar computational cost while improving dense prediction quality, resolution robustness, and drop-in compatibility with SAM3.
The key idea is simple. Instead of choosing between a vision language model, a self supervised dense model, and a segmentation model, C-RADIOv4 tries to approximate all three at once with one backbone.

https://www.arxiv.org/pdf/2601.17237
The RADIO family uses agglomerative distillation. A single ViT style student is trained to match both dense feature maps and summary tokens from several heterogeneous teachers.

Earlier RADIO models combined DFN CLIP, DINOv2, and SAM. They already supported multi resolution training but showed ‘mode switching’, where the representation changed qualitatively as input resolution changed. Later work such as PHI-S, RADIOv2.5, and FeatSharp added better multi resolution distillation and regularization, but the teacher set was still limited.
C-RADIOv4 upgrades the teachers:
The student is trained so that its dense features match DINOv3 and SAM3, while its summary tokens match SigLIP2 and DINOv3. This gives one encoder that can support classification, retrieval, dense prediction, and segmentation.
C-RADIOv4 uses stochastic multi resolution training rather than a small fixed set of resolutions.
Training samples input sizes from two partitions:
SigLIP2 operates natively at 384 pixels. Its features are upsampled by a factor of 3 using FeatSharp to align with 1152 pixel SAM3 features. SAM3 is trained with mosaic augmentation at 1152 × 1152.
This design smooths the performance curve over resolution and improves low resolution behavior. For example, on ADE20k linear probing, C-RADIOv4-H reaches around:
The scaling trend is close to DINOv3-7B while using roughly an order of magnitude fewer parameters.
Distilling from large vision models tends to copy their artifacts, not just their useful structure. SigLIP2 has border noise patterns, and ViTDet style models can show window boundary artifacts. Direct feature regression can force the student to reproduce those patterns.
C-RADIOv4 introduces two shift equivariant mechanisms to suppress such noise:
In addition, training uses DAMP, which injects multiplicative noise into weights. This further improves robustness to corruptions and small distribution shifts.
The summary loss in previous RADIO models used cosine distance between student and teacher embeddings. Cosine distance removes magnitude but not directional dispersion on the sphere. Some teachers, such as SigLIP2, produce embeddings concentrated in a narrow cone, while DINOv3 variants produce more spread out embeddings.
If raw cosine distance is used, teachers with wider angular dispersion contribute larger losses and dominate optimization. In practice, DINOv3 tended to overshadow SigLIP2 in the summary term.
C-RADIOv4 replaces this with an angle normalized loss. The squared angle between student and teacher embeddings is divided by the teacher’s angular dispersion. Measured dispersions show SigLIP2-g-384 around 0.694, while DINOv3-H+ and DINOv3-7B are around 2.12 and 2.19. Normalizing by these values equalizes their influence and preserves both vision language and dense semantics.
On ImageNet-1k zero shot classification, C-RADIOv4-H reaches about 83.09 % top-1 accuracy. It matches or improves on RADIOv2.5-H and C-RADIOv3-H across resolutions, with the best performance near 1024 px.
On k-NN classification, C-RADIOv4-H improves over RADIOv2.5 and C-RADIOv3, and matches or surpasses DINOv3 starting around 256 px. DINOv3 peaks near 192–256 px and then degrades, while C-RADIOv4 keeps stable or improving performance at higher resolutions.
Dense and 3D aware metrics show the intended tradeoff. On ADE20k, PASCAL VOC, NAVI, and SPair, C-RADIOv4-H and the SO400M variant outperform earlier RADIO models and are competitive with DINOv3-7B on dense benchmarks. For C-RADIOv4-H, typical scores are:

https://www.arxiv.org/pdf/2601.17237
On Probe3d, which includes Depth Normals, Surface Normals, NAVI, and SPair, C-RADIOv4-H achieves the best NAVI and SPair scores in the RADIO family. Depth and Surface metrics are close to those of C-RADIOv3-H, with small differences in either direction, rather than a uniform improvement.
C-RADIOv4 is designed to be a drop in replacement for the Perception Encoder backbone in SAM3. The SAM3 decoder and memory components remain unchanged. A reference implementation is provided in a SAM3 fork. Qualitative examples show that segmentation behavior is preserved for both text prompts such as “shoe”, “helmet”, “bike”, “spectator” and box prompts, and in some reported cases C-RADIOv4 based SAM3 resolves failure cases from the original encoder.
For deployment, C-RADIOv4 exposes a ViTDet-mode configuration. Most transformer blocks use windowed attention, while a few use global attention. Supported window sizes range from 6 × 6 to 32 × 32 tokens, subject to divisibility with patch size and image resolution. On an A100, the SO400M model with window size at most 12 is faster than the SAM3 ViT-L+ encoder across a wide range of input sizes, and the Huge model with window size 8 is close in latency.
This makes C-RADIOv4 a practical backbone for high resolution dense tasks where full global attention at all layers is too expensive.
Check out the Paper, Repo, Model-1 and Model-2. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
LATEST NEWS
Gene Editing
Scientists Discover Simple Trick That Boosts mRNA Therapy Delivery 20-Fold
MAR 15, 2026
WHAT'S TRENDING
Data Science
5 Imaginative Data Science Projects That Can Make Your Portfolio Stand Out
OCT 05, 2022
SOURCE: HI-TECH.UA
MAR 13, 2026
SOURCE: BIOENGINEER.ORG
MAR 07, 2026
SOURCE: GOODMENPROJECT.COM
MAR 07, 2026
SOURCE: MOTORSPORT.NEXTGEN-AUTO.COM
FEB 27, 2026
SOURCE: RETAILTECHINNOVATIONHUB.COM
FEB 27, 2026
SOURCE: NEWS.UCSB.EDU
FEB 20, 2026