Meta AI: All You Need to Know about DINO and SAM


SOURCE: AIMAGAZINE.COM
JAN 17, 2026

By Tom Chapman


Meta AI's segment anything model (SAM) specialises in image segmentation. Picture: Meta AI/YouTube

Meta AI's DINO and SAM are reshaping tasks from everyday image understanding to high-stakes applications such as medical triage in emergencies

AI is transforming how we extract meaning from visual data.

Two of Meta AI’s most influential innovations in computer vision are DINO and SAM.

Though rooted in the same research ecosystem, they serve distinct purposes and together are reshaping tasks from everyday image understanding to high-stakes applications such as medical triage in emergencies.

DINO: Self-supervised vision foundation learning

DINO, short for self-distillation with no labels, is Meta AI’s approach to creating powerful visual representations without relying on labelled data.


Traditionally, computer vision models require vast quantities of human-annotated images to learn to recognise objects or scenes.

DINO upends this by using a self-supervised learning (SSL) framework, where the model learns from the inherent structure of images themselves.

At its core, DINO uses a student–teacher setup.

A teacher network, updated as a moving average of the student rather than by gradient descent, guides the student to produce similar visual representations from different augmented views of the same image.

Over time, the student learns generalisable visual features that capture semantic information across image domains.
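The student–teacher dynamic can be illustrated with a deliberately tiny numpy sketch. Here a linear-plus-softmax model stands in for the real vision transformer backbone, additive noise stands in for DINO's crop and colour augmentations, and all dimensions and hyperparameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(weights, x):
    """Toy 'backbone': a linear map plus softmax, standing in for a ViT + projection head."""
    z = x @ weights
    e = np.exp(z - z.max())
    return e / e.sum()

def view(x):
    """A random 'view' of the same input, here just additive noise
    (DINO itself uses crops, colour jitter and other augmentations)."""
    return x + rng.normal(scale=0.05, size=x.shape)

dim, feat = 8, 4
student = rng.normal(size=(dim, feat))
teacher = rng.normal(size=(dim, feat))   # starts out different from the student
momentum, lr = 0.99, 0.05

x = rng.normal(size=dim)                 # one toy 'image'
initial_gap = np.abs(embed(student, x) - embed(teacher, x)).max()

for _ in range(500):
    v_t, v_s = view(x), view(x)
    target = embed(teacher, v_t)         # teacher output on one view: the training target
    pred = embed(student, v_s)           # student output on a different view
    # Cross-entropy gradient for a linear + softmax model: outer(input, pred - target)
    student -= lr * np.outer(v_s, pred - target)
    # Teacher weights track the student as an exponential moving average (no gradients)
    teacher = momentum * teacher + (1 - momentum) * student

final_gap = np.abs(embed(student, x) - embed(teacher, x)).max()
```

After training, the student's output distribution on the shared input has moved toward the teacher's, which is the self-distillation signal DINO learns from at scale.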

The DINOv2 family, the next iteration, was trained on around 142 million diverse images without labels, yielding features that are robust enough to support many tasks out of the box – from image classification to depth estimation, semantic segmentation and beyond – often without fine-tuning.

Now, DINOv3 scales SSL for images to produce Meta AI's strongest universal vision backbones, enabling breakthrough performance across diverse domains.

SAM for flexible segmentation

While DINO focuses on learning general visual features, the segment anything model (SAM) specialises in image segmentation – the task of dividing an image into meaningful regions or objects.


SAM is designed to be 'promptable': given simple prompts like clicks, boxes or text, it generates high-quality, pixel-accurate masks for objects or regions of interest.

SAM stands out because it is trained to generalise broadly.

It can produce accurate segmentation results even for objects it hasn’t seen before, making it a flexible tool for many domains.

The model’s capability to respond to different kinds of prompts means it can be deployed interactively for annotation, automated workflows or as part of more complex vision pipelines.
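The prompt-in, mask-out interface can be illustrated with a toy flood-fill. This is emphatically not SAM's learned mask decoder, only a stand-in for the interaction pattern: a point prompt goes in, a pixel-accurate mask comes out:

```python
import numpy as np
from collections import deque

def segment_from_click(image, click, tol=10):
    """Toy 'promptable' segmentation: given a click prompt (row, col),
    flood-fill the connected region of similar intensity and return a
    boolean mask. A stand-in for SAM's learned decoder, not the real thing."""
    h, w = image.shape
    seed = int(image[click])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([click])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < h and 0 <= c < w) or mask[r, c]:
            continue
        if abs(int(image[r, c]) - seed) > tol:
            continue  # pixel too different from the clicked one
        mask[r, c] = True
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask

# A 6x6 image with a bright square "object" on a dark background
img = np.zeros((6, 6), dtype=np.uint8)
img[1:4, 1:4] = 200
mask = segment_from_click(img, (2, 2))   # click inside the object
```

Where this hand-written rule only follows raw intensity, SAM's decoder has learned what object boundaries look like, which is why it generalises to objects it has never seen.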

Since its initial launch, Meta has continued to refine SAM.

The model now supports accelerated segmentation and integration with multimodal systems that blend vision with natural language or other AI components, widening its applicability from healthcare to robotics and augmented reality.

The DARPA triage challenge

Advancements in computer vision, robotics and machine learning are now being put to the test in some of the most demanding environments imaginable.

The three-year Triage Challenge launched by the US Defense Advanced Research Projects Agency (DARPA) aims to transform medical triage using autonomous systems.

The objective is to detect physiological signatures that indicate injury severity using stand-off sensors mounted on drones and robots, operating in low- or no-connectivity environments.

To simulate real-world mass casualty incidents (MCIs), challenge scenarios include darkness, fog, dust, loud explosions and flashing lights.

Casualties may be buried under rubble or obscured from view.

Teams are scored on how many victims they identify, how accurately they classify injuries and how quickly they flag urgent cases before life-saving intervention windows close.

DINO and SAM are two of Meta AI’s most influential innovations in computer vision. Picture: Meta AI

Pushing the boundaries of medical AI

The University of Pennsylvania’s Penn Robotic Non-contact Triage and Observation (PRONTO) team brings together surgeons from Penn Medicine with robotics and computer vision researchers from Penn Engineering and the GRASP lab.

Their approach combines autonomous aerial and ground robots with Meta AI’s DINO and SAM models to perform rapid, non-contact injury assessment.

During Phase 1 of the DARPA Triage Challenge in 2024, PRONTO deployed a drone to survey the scene and locate victims, alongside a ground robot for stable imaging and vital-sign capture.

Visual data from these platforms was processed using SAM 2, enabling robust segmentation of bodies and regions of interest even in degraded visual conditions.

PRONTO’s system runs multiple parallel injury-classification pipelines that integrate SAM, DINO and Grounding DINO, an open-vocabulary object detection model.

Grounding DINO allows the system to use text prompts such as 'wound?' or 'blood?' to identify injury-related features within segmented image regions.
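The matching step behind such text prompts can be sketched as cosine similarity in a shared vision–language embedding space. The vectors below are hand-made toy embeddings, not outputs of the real encoders, and the region names are hypothetical:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy pre-computed embeddings; in an open-vocabulary detector these would
# come from learned image-region and text encoders aligned in one space.
region_embeddings = {
    "region_A": np.array([0.9, 0.1, 0.0]),
    "region_B": np.array([0.1, 0.9, 0.2]),
}
text_embeddings = {
    "wound": np.array([1.0, 0.0, 0.1]),
    "blood": np.array([0.0, 1.0, 0.1]),
}

def detect(prompt, threshold=0.8):
    """Return the regions whose embedding matches the text prompt's embedding."""
    emb = text_embeddings[prompt]
    return [name for name, r in region_embeddings.items()
            if cosine(r, emb) >= threshold]
```

The real model learns this alignment from paired image–text data, so a free-form phrase can retrieve matching regions without any class list fixed in advance.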

DINO then extracts high-level visual features, which feed into customised neural networks trained to detect and characterise injuries.

This approach allows the system to estimate heart rate, respiration, awareness and the presence of wounds or amputations without physical contact.
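The frozen-backbone-plus-task-head pattern described above can be sketched in a few lines. The weights, dimensions and the `injury_probability` head below are all invented for illustration and are not PRONTO's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)
backbone_w = rng.normal(size=(64, 16))   # frozen "DINO-like" feature extractor
head_w = rng.normal(size=16)             # small task-specific classifier head

def extract_features(region_pixels):
    """Frozen backbone: project a flattened segmented region to a feature vector."""
    return np.tanh(region_pixels @ backbone_w)

def injury_probability(region_pixels):
    """Task head: a logistic score over the backbone's features."""
    logit = extract_features(region_pixels) @ head_w
    return 1.0 / (1.0 + np.exp(-logit))

region = rng.normal(size=64)             # stand-in for a flattened segmented region
p = injury_probability(region)
```

The design advantage is that one expensive, general-purpose backbone can be shared across many such small heads, one per clinical signal, which is what makes running several classification pipelines in parallel practical.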

Each casualty’s location and clinical signature are visualised on a mobile interface for first responders, enabling medics to prioritise limited resources more effectively.

One of the most significant outcomes of the DARPA challenge is data.

Historically, there has been little definitive evidence comparing the effectiveness of different triage techniques across scenarios.

By capturing detailed, standardised datasets across increasingly realistic simulations, DARPA is creating an infrastructure for evidence-based evaluation of mass casualty response strategies.

As the challenge moves into its final phase, teams will explore how newer versions of DINO and SAM can further improve triage performance.

Professor Eric Eaton, Team Lead for PRONTO at the University of Pennsylvania

“We are really interested in making this application work in the real world,” says Professor Eric Eaton, Team Lead for PRONTO.

“The people I have on my team are trauma surgeons that deal with this in the trenches every day and researchers working on state-of-the-art robotics and machine learning.

“Together, we are looking to develop technologies that could be useful in saving lives.”

Once confined to research benchmarks, foundation vision models like DINO and SAM are demonstrating how general-purpose AI can be adapted to meet some of the most urgent challenges in emergency medicine and defence, where every second truly counts.