VLM Visual Language Model Perception

Context is king: How Avride uses cloud VLMs as a safety net for delivery robots

It is important to clarify: we do not use VLMs to drive the robot. Using a heavy cloud model to steer in real time would ...

Free-text answers and LLMs reveal hidden reasons behind human choices

Why do people make the choices they do? Researchers from the Center Synergy of Systems (SynoSys) at TUD Dresden University of Technology, the Max Planck Institute for Human Development, and the ...

The Lancet (Opinion)

Deception in clinical large language models: an under-recognised safety risk

Large language models (LLMs) are rapidly being integrated into clinical workflows, supporting tasks such as diagnosis ...

News-Medical.Net

Study reveals hidden brain circuit behind flexible visual thinking

Nuttida Rungratsameetaweemana is challenging a story neuroscience has told for decades. According to the conventional account ...

Interesting Engineering on MSN

Video: New AI model gives humanoid robots 90 percent success in complex missions

Flexion Robotics has introduced Reflect v1.0, a robotics intelligence platform that enables humanoid robots ...

Communications of the ACM

The Race to Reliable Visual Understanding

“The biggest innovation over the last year is that inference-time scaling techniques that have been pioneered in natural language models have now come to visual language models,” said Eric Heim, chief ...

IEEE

Revisiting InternVL: A Systematic Technical Framework for Building Powerful Open-Source Vision-Language Models

Abstract: Building a powerful vision-language model (VLM) necessitates a holistic system design encompassing model architecture, data curation, and training paradigms. In this paper, we present a ...

InfoQ

The AI Productivity Paradox in Test Automation: Moving beyond Structural Validation to Perception and Intent

Microsoft

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

While on-policy distillation offers dense supervision for training small reasoning models, its optimization dynamics in the multimodal domain remain under-explored. In this work, we challenge the ...

IEEE

Component-Specific Prompt Tuning for Deepfake Detection

Abstract: With the development of deep learning technology, the facial images generated by deepfake technology have reached a level of authenticity that is difficult to distinguish, posing a serious ...

Frontiers

ActionX: pre-training action experts with reinforcement learning for vision-language action models

Vision-Language Action (VLA) models have enabled language-driven robotic manipulation by integrating language instructions, visual perception, and action generation. However, existing VLA approaches ...
