Abstract:
Many constrained control problems in queueing and scheduling admit elegant structures, yet reinforcement learning methods rarely exploit them. In this talk, I will present a framework for structured reinforcement learning that uncovers and leverages the threshold form of optimal policies in constrained Markov decision processes. Using this insight, we design a natural policy-gradient algorithm that provably converges to the global optimum while learning only a single scalar per queue. I will illustrate the framework through dynamic prioritization in wireless streaming systems, showing how theory-driven structure enables both faster learning and real-time deployment.
Archana Bura, Postdoctoral Researcher
Department of Electrical and Computer Engineering, UC San Diego
More information about the speaker: Archana Bura – Google Scholar
Email informed-ai@bristol.ac.uk to register your interest.