Leshu Li - Passionate Researcher | Hardware-Software Co-design for Efficient AI

About Me

Welcome! I am a first-year Ph.D. student in Computer Science at New York University's Courant Institute, advised by Prof. Sai Qian Zhang. My research sits at the intersection of computer architecture and machine learning systems, with a focus on hardware-software co-design for efficient AI across algorithms, architectures, and hardware. Before joining NYU, I had the privilege of working with Prof. Yang (Katie) Zhao at the University of Minnesota and Prof. Yingyan (Celine) Lin at Georgia Tech.

Education

New York University, Courant Institute
Ph.D. in Computer Science
University of Minnesota, Twin Cities
M.S. in Electrical and Computer Engineering
Sichuan University
B.Eng. in Telecommunications Engineering

Research Interests

Efficient AI
Hardware-Software Co-design
Computer Architecture

Publications & Manuscripts

RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction

L. Li*, J. Qin*, P. Jie, Z. Wan, H. Qu, Y. Han, P. Zhen, H. Zhang, Y. Cao, T. Cheng, Y. Zhao

58th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025 | Architecture, Algorithm

Real-time 3D Gaussian Splatting SLAM system with multi-level redundancy reduction for edge devices.

Pocket-SLAM: Rendering-Area-Aware Pruning for Memory-Efficient 3DGS-SLAM

L. Li, P. Jie, Y. Zhao

International Conference on Robotics and Automation (ICRA), 2026 | Algorithm

Memory-efficient 3DGS-SLAM with rendering-area-aware pruning for large-scale deployment.

Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR

Z. Ye, Y. Fu, J. Zhang, L. Li, Y. Zhang, S. Li, C. Wan, C. Wan, C. Li, S. Prathipati, Y. Lin

31st IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025 | Architecture, Algorithm

Dedicated hardware module for efficient Gaussian-based rendering on edge devices for AR/VR applications.

3D Gaussian Rendering Can Be Sparser: Efficient Rendering via Learned Fragment Pruning

Z. Ye, C. Wan, C. Li, J. Hong, S. Li, L. Li, Y. Zhang, Y. C. Lin

Conference on Neural Information Processing Systems (NeurIPS), 2024 | Algorithm

Efficient 3D Gaussian rendering through learned fragment pruning techniques.

LAMB: A Training-Free Method to Enhance the Long-Context Understanding of SSMs via Attention-Guided Token Filtering

Z. Ye, Z. Wang, K. Xia, J. Hong, L. Li, L. Whalen, C. Wan, Y. Fu, Y. C. Lin, S. Kundu

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025 | Algorithm

Training-free method for enhancing long-context understanding in State Space Models.

DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

L. Li, F. Yu, B. McDanel, S. Q. Zhang

Machine Learning and Systems (MLSys), 2026 (Under Review) | System

Distributed speculative decoding approach for efficient large model serving across edge and cloud infrastructure.

Experience

Research Experience

Research Assistant

Zhao Lab, University of Minnesota | Jun. 2024 – Present

Advisor: Prof. Yang (Katie) Zhao | Minneapolis, MN, USA

Conducting research on optimizing 3DGS-based SLAM for edge devices
Developing memory-efficient 3DGS SLAM for large-scale outdoor deployment

Research Intern

EIC Lab, Georgia Institute of Technology | Mar. 2024 – May 2025

Advisor: Prof. Yingyan (Celine) Lin | Atlanta, GA, USA

Improved 3DGS rendering performance on edge devices
Enhanced Diffusion Large Language Models (dLLM & LLaDA)

Research Intern

Sai Lab, New York University | Aug. 2025 – Present

Advisors: Prof. Sai Qian Zhang & Prof. Bradley McDanel | New York, NY, USA

Designing a distributed speculative decoding system for efficient LLM inference
Developing efficient generative AI for drug delivery with human-in-the-loop strategies

Industry Experience

LLM Pre-training Intern

REDstar@hi Lab, Xiaohongshu (REDnote) | Sep. 2025 – Present

Shanghai, China

Collaborating with the Infra team on training and inference bottlenecks
Exploring efficient scaling strategies and GPU-friendly architectures
Investigating Attention, MoE, and optimizer strategies using Megatron and FSDP

Curriculum Vitae

You can download my full CV in PDF format below:

Download CV (PDF)

Contact

Feel free to reach out if you're interested in collaboration or have any questions about my research.

Email

lileshu0412@gmail.com

Location

University of Minnesota, Twin Cities
Minneapolis, MN, USA
