Publications

For the complete list, please see my Google Scholar Profile.

Preprints

  1. ArXiv
    POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
    Aditya K Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, Ramachandran Ramjee, and Ashish Panwar
    arXiv preprint arXiv:2410.18038

2025

  1. ASPLOS
    vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
    Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, and Ashish Panwar
    ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2025