Shrikar Madhu

Embodied AI · Computer Vision · World Models

I'm a Graduate Researcher at NYU Courant, working with Prof. Saining Xie on long-context video understanding, efficient sequence modeling, and spatial reasoning in vision systems.

My broader interests sit at the intersection of computer vision, continual learning, and spatial supersensing for embodied AI — building visual systems that perceive, reason about, and act in physical environments.

Before NYU, I was a Pre-Doctoral Fellow at the Kotak IISc AI-ML Research Centre, where I was advised by Prof. Suresh Sundaram and published at CVPR 2025. I also co-founded Physi.Fit — an idea that started with my own ACL injury and became a funded AI healthcare platform.

Affiliations

New York University, Courant Institute · Graduate Researcher, Computer Vision Lab · Prof. Saining Xie
2025 – Present
Kotak IISc AI-ML Research Centre · Pre-Doctoral Scholar · Prof. Suresh Sundaram
2024 – 2025
Indian Institute of Science, AI & Robotics Lab · Research Assistant
2022 – 2023
PES University · B.Tech. Computer Science & Engineering (Data Science) · GPA 3.8
2019 – 2023

Publications & Patents

CVPR 2025

VISTA-CLIP: Visual Incremental Self-Tuned Adaptation for Efficient Continual Panoptic Segmentation

D. Manjunath, S. Madhu, et al.

Continual learning framework using visual prompt tuning (<1M parameters) for panoptic segmentation in autonomous driving. Achieves state-of-the-art 62.3 mIoU on Cityscapes while reducing catastrophic forgetting by 53%.

Submitted · ICRA 2025

IndraEye: Infrared Electro-Optical UAV-based Perception Dataset

D. Manjunath, S. Madhu, et al.

First multi-sensor, multi-domain slant-angle EO-IR dataset (50K+ aligned pairs) for UAV perception. Addresses occlusion and scale challenges in aerial object detection and semantic segmentation. Pix2Pix GAN cross-modal synthesis achieves 0.85 SSIM, improving all-weather perception by 28%.

ACM ICMVA 2023 · Oral

Detection of Conversational Health in a Multimodal Conversation Graph

S. Madhu, et al.

Graph labeling framework for ranking multimodal conversations via emotional concordance scoring. Uses late fusion in graph attention networks to learn node importance through representation learning — outperforming prior state-of-the-art.

US Patent 18/352,300

Method and System for Conducting Interactive Rehabilitation Sessions with Continuous Monitoring

S. Madhu, Y. Mahamuni, K. Suresh

Patented deep learning framework fusing video-based pose estimation with multi-modal biomarkers via attention-based GNNs, for real-time orthopedic rehabilitation monitoring. Deployed across 17 healthcare facilities.

Experience

Graduate Researcher · NYU Computer Vision Lab · Prof. Saining Xie

Sep 2025 – Present
  • Built Pico-LLM, a Gated Linear Attention model with O(N) complexity in sequence length, outperforming Transformers 8–16× on 5k-token sequences with 94% reasoning accuracy via GRPO-based RL post-training.
  • Long-context video understanding system on 8× A100 GPUs with custom CUDA kernels: 7% mAP improvement, 40% memory reduction over baseline.
  • Sparse transformer spatial reasoning: 50+ ablation experiments across 3 benchmarks, 2.1× inference speedup.

Pre-Doctoral Scholar · Kotak IISc AI-ML Research Centre

Jun 2024 – Jul 2025
  • Developed VISTA-CLIP for continual panoptic segmentation (published at CVPR 2025): SOTA 62.3 mIoU on Cityscapes with visual prompt tuning and 53% less catastrophic forgetting.
  • Edge deployment with selective parameter adaptation and knowledge distillation: 3.2× lower latency at 98% accuracy on embedded devices.
  • Curated the first multi-sensor EO-IR slant-angle aerial perception dataset (50K+ pairs), submitted to ICRA 2025.

Co-Founder & AI Research Engineer · Physi.Fit, Inc.

May 2022 – Jul 2025
  • Grew an idea sparked by a personal ACL injury into a $250K seed-funded (100X.VC) platform deployed across 17 facilities and serving 100+ patients. Acquired by a Khosla Ventures-backed healthcare company.
  • 30 FPS real-time pose estimation pipeline (Mobile MediaPipe, CNNs, Video-LLaVA) across 10K+ sessions — 92% motion accuracy, 25% reduction in therapy time.
  • Granted a U.S. patent for a deep learning framework for continuous monitoring in rehabilitation.

Research Assistant · IISc AI & Robotics Lab

Jul 2022 – Sep 2023
  • Hybrid spiking neural network (SEFRON + Temporal-SNN) with STDP learning: +4.35% over CNN on CIFAR-10 at 60% lower energy — targeting neuromorphic edge AI applications.

Honors & Awards

CVPR 2025 Publication · First author · VISTA-CLIP · Continual Panoptic Segmentation
2025
Kotak IISc Pre-Doctoral Fellowship · Competitive AI-ML research grant at the Indian Institute of Science
2024
Qualcomm AI Hackathon · Finalist · RAG system with 85% retrieval accuracy
2025
Times Invest BLR · Top 10 AI Startups · Physi.Fit recognized among top AI startups in Bangalore
2023
IIT Bombay Robotics Cup · Runner-Up · National robotics competition
2022
VMware Hackathon · National Finalist
2023
PESU Venture Labs × ACM-W Hackathon · Winner
2021
U.S. Patent Granted · AI-based rehabilitation monitoring system · PTO #18/352,300
2023

Technical

Vision & ML: Computer Vision, Continual Learning, Embodied AI, VLMs, Multimodal Learning, RL (GRPO, PPO), Panoptic Segmentation
Frameworks: PyTorch, JAX, TensorFlow, LangChain, LlamaIndex, CUDA
Languages: Python, C/C++, Java, SQL, Shell
Infra: Distributed Training, AWS, GCP, Docker, Kubernetes