Publications

Journal Articles

Vero: An Open RL Recipe for General Visual Reasoning

ECCV, 2026

We introduce Vero, a family of fully open vision-language models (VLMs) designed for general visual reasoning across diverse domains such as charts, science, and spatial understanding. By scaling reinforcement learning (RL) data with the 600K-sample Vero-600K dataset and applying task-routed rewards, Vero achieves state-of-the-art performance on 30 challenging benchmarks, demonstrating that broad data coverage is the key driver of strong RL scaling. All data, code, and models are publicly released.

Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu
Download Paper | Download Slides

Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models

In submission to IEEE TMI, 2025

We propose a hierarchical input-dependent state space model (SSM) that exploits the linear scaling of SSMs with sequence length to make decisions over full-length videos. Our framework couples a temporally consistent visual encoder with an SSM head that propagates temporal information. The temporal module consists of two components: a local aggregation block for fine-grained dynamics and a global relation block for long-range dependencies.

Haoyang Wu, Tsun-Hsuan Wang, Mathias Lechner, Ramin Hasani, Jennifer A. Eckhoff, Paul Pak, Ozanan R. Meireles, Guy Rosman, Yutong Ban, Daniela Rus
Download Paper | Download Slides