About me

I am a researcher and engineer working on efficient AI, with a particular focus on sparse attention in long-context inference. My current work is grounded in a theoretical background built over fifteen years of research in networking, scheduling, and distributed optimization.

Since the rise of large language models, I have shifted my main research effort toward AI. What carried over from my earlier work is not only mathematical technique, but also a way of thinking about efficiency, constraints, and tradeoffs, and an insistence on deployable system design. I am especially interested in problems where theory and engineering need to reinforce each other.

My earlier research centered on networking theory. That experience continues to shape how I approach modern AI problems: reasoning about limited memory, long context, sparse computation, and performance bottlenecks in systems that must work at scale.

I see myself as both a researcher and an engineer. In the LLM era, good ideas matter, but the ability to implement, test, and scale them quickly matters just as much. A central theme of my work is turning theoretical structure into mechanisms that are useful in real AI systems.