Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models

Haoyang Wu, Tsun-Hsuan Wang, Mathias Lechner, Ramin Hasani, Jennifer A. Eckhoff, Paul Pak, Ozanan R. Meireles, Guy Rosman, Yutong Ban, Daniela Rus

In submission to IEEE TMI, 2025

We propose a hierarchical input-dependent state space model (SSM) for surgical phase recognition. Because the cost of an SSM scales linearly with sequence length, the model can make decisions over full-length videos. Our framework couples a temporally consistent visual encoder with an SSM head that propagates temporal information across frames. The temporal module consists of two components: a local aggregation block that captures fine-grained dynamics and a global relation block that models long-range dependencies.
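To make the two-level structure concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a diagonal gated recurrence h_t = a_t * h_(t-1) + (1 - a_t) * x_t whose decay a_t is computed from the current frame feature (the "input-dependent" part), a local block that resets its state every `window` frames, and a global block that scans the whole sequence. The names `selective_scan`, `hierarchical_head`, `init_params`, the gate parameterization, and the window size are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(frames, gate_w, gate_b, reset_every=None):
    """Run h_t = a_t * h_{t-1} + (1 - a_t) * x_t over a list of feature vectors.

    The decay a_t = sigmoid(gate_w * x_t + gate_b) is computed per channel from
    the current input. One pass over the sequence, so cost is linear in length.
    If reset_every is set, the state restarts at each window boundary.
    """
    dim = len(gate_w)
    state = [0.0] * dim
    outputs = []
    for t, x in enumerate(frames):
        if reset_every is not None and t % reset_every == 0:
            state = [0.0] * dim  # start a new local window
        decay = [sigmoid(gate_w[d] * x[d] + gate_b[d]) for d in range(dim)]
        state = [decay[d] * state[d] + (1.0 - decay[d]) * x[d] for d in range(dim)]
        outputs.append(state)
    return outputs


def hierarchical_head(frames, params, window=16):
    """Local windowed scan, then a global scan, then per-frame phase logits."""
    local = selective_scan(frames, params["local_w"], params["local_b"],
                           reset_every=window)
    global_ = selective_scan(local, params["global_w"], params["global_b"])
    return [[sum(h[d] * row[d] for d in range(len(h))) for row in params["cls_w"]]
            for h in global_]


def init_params(dim, num_phases, seed=0):
    """Random illustrative weights; a real model would learn these."""
    rng = random.Random(seed)
    vec = lambda: [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {
        "local_w": vec(), "local_b": vec(),
        "global_w": vec(), "global_b": vec(),
        "cls_w": [vec() for _ in range(num_phases)],
    }


# Stand-in for encoder output: 100 frames of 8-dimensional features.
rng = random.Random(1)
video = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
params = init_params(dim=8, num_phases=7)
logits = hierarchical_head(video, params)
predicted = [max(range(7), key=row.__getitem__) for row in logits]
```

Both scans are causal in this sketch, so the prediction for a frame depends only on that frame and earlier ones. Whether the actual model is causal or bidirectional is not stated in the abstract above.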