Shrikar Madhu

Embodied AI · Computer Vision · World Models

I'm a Graduate Researcher at NYU Courant, working with Prof. Saining Xie on long-context video understanding, efficient sequence modeling, and spatial reasoning in vision systems.

My broader interests sit at the intersection of computer vision, continual learning, and spatial supersensing for embodied AI — building visual systems that perceive, reason about, and act in physical environments.

Before NYU, I was a Pre-Doctoral Fellow at the Kotak IISc AI-ML Research Centre, where I was advised by Prof. Suresh Sundaram and published at CVPR 2025. I also co-founded Physi.Fit — an idea that started with my own ACL injury and became a funded AI healthcare platform.

sm12761@nyu.edu · GitHub · LinkedIn · Scholar · CV

Affiliations

New York University, Courant Institute
Graduate Researcher, Computer Vision Lab · Prof. Saining Xie

2025 – Present

Kotak IISc AI-ML Research Centre
Pre-Doctoral Scholar · Dr. Suresh Sundaram

2024 – 2025

Indian Institute of Science, AI & Robotics Lab
Research Assistant

2022 – 2023

PES University
B.Tech. Computer Science & Engineering (Data Science) · GPA 3.8

2019 – 2023

Publications & Patents

CVPR 2025

VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation

D. Manjunath, S. Madhu, et al.

Continual learning framework using visual prompt tuning (<1M parameters) for panoptic segmentation in autonomous driving. Achieves state-of-the-art 62.3 mIoU on Cityscapes while reducing catastrophic forgetting by 53%.

Paper · Project

Submitted · ICRA 2025

IndraEye: Infrared Electro-Optical UAV-based Perception Dataset

D. Manjunath, S. Madhu, et al.

First multi-sensor, multi-domain slant-angle EO-IR dataset (50K+ aligned pairs) for UAV perception. Addresses occlusion and scale challenges in aerial object detection and semantic segmentation. Pix2Pix GAN cross-modal synthesis achieves 0.85 SSIM, improving all-weather perception by 28%.

Paper

ACM ICMVA 2023 · Oral

Detection of Conversational Health in a Multimodal Conversation Graph

S. Madhu, et al.

Graph labeling framework for ranking multimodal conversations via emotional concordance scoring. Uses late fusion in graph attention networks to learn node importance through representation learning — outperforming prior state-of-the-art.

Paper

U.S. Patent 18/352,300

Method and System for Conducting Interactive Rehabilitation Sessions with Continuous Monitoring

S. Madhu, Y. Mahamuni, K. Suresh

Patented deep learning framework that fuses video-based pose estimation with multimodal biomarkers via attention-based GNNs for real-time orthopedic rehabilitation monitoring. Deployed across 17 healthcare facilities.

USPTO

Experience

Graduate Researcher
NYU Computer Vision Lab · Prof. Saining Xie

Sep 2025 – Present

Built Pico-LLM (Gated Linear Attention) with O(N) complexity in sequence length, outperforming Transformers 8–16× on 5k-token sequences and reaching 94% reasoning accuracy via GRPO-based RL post-training.
Long-context video understanding system on 8× A100 GPUs with custom CUDA kernels: 7% mAP improvement, 40% memory reduction over baseline.
Sparse transformer spatial reasoning: 50+ ablation experiments across 3 benchmarks, 2.1× inference speedup.

Pre-Doctoral Scholar
Kotak IISc AI-ML Research Centre

Jun 2024 – Jul 2025

Developed VISTA-CLIP for continual panoptic segmentation, published at CVPR 2025: SOTA 62.3 mIoU with visual prompt tuning and 53% less catastrophic forgetting.
Edge deployment with selective parameter adaptation and knowledge distillation: 3.2× lower latency at 98% accuracy on embedded devices.
Curated the first multi-sensor EO-IR slant-angle aerial perception dataset (50K+ pairs), submitted to ICRA 2025.

Co-Founder & AI Research Engineer
Physi.Fit, Inc.

May 2022 – Jul 2025

Grew an idea that began with a personal ACL injury into a $250K seed-funded (100X.VC) platform deployed across 17 facilities and serving 100+ patients. Acquired by a Khosla Ventures-backed healthcare company.
30 FPS real-time pose estimation pipeline (Mobile MediaPipe, CNNs, Video-LLaVA) across 10K+ sessions — 92% motion accuracy, 25% reduction in therapy time.
Granted U.S. patent for deep learning framework for continuous monitoring in rehabilitation.

Research Assistant
IISc AI & Robotics Lab

Jul 2022 – Sep 2023

Hybrid spiking neural network (SEFRON + Temporal-SNN) with STDP learning: +4.35% over CNN on CIFAR-10 at 60% lower energy — targeting neuromorphic edge AI applications.

Honors & Awards

CVPR 2025 Publication
Co-author · VISTA-CLIP · Continual Panoptic Segmentation

2025

Kotak IISc Pre-Doctoral Fellowship
Competitive AI-ML research grant at the Indian Institute of Science

2024

Qualcomm AI Hackathon: Finalist
RAG system with 85% retrieval accuracy

2025

Times Invest BLR: Top 10 AI Startups
Physi.Fit recognized among the top AI startups in Bangalore

2023

IIT Bombay Robotics Cup: Runner-Up
National robotics competition

2022

VMware Hackathon — National Finalist

2023

PESU Venture Labs × ACM-W Hackathon — Winner

2021

U.S. Patent Granted
AI-based rehabilitation monitoring system · USPTO No. 18/352,300

2023

Technical

Vision & ML: Computer Vision, Continual Learning, Embodied AI, VLMs, Multimodal Learning, RL (GRPO, PPO), Panoptic Segmentation

Frameworks: PyTorch, JAX, TensorFlow, LangChain, LlamaIndex, CUDA

Languages: Python, C/C++, Java, SQL, Shell

Infra: Distributed Training, AWS, GCP, Docker, Kubernetes