Search results: 1-3 of 3 records found for "Management Science Regret Bounds". Query time: 0.093 seconds.
Regret Bounds for Reinforcement Learning with Policy Advice
Regret Bounds; Reinforcement Learning; Policy Advice
2013/6/13
In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with p...
Further Optimal Regret Bounds for Thompson Sampling
Further Optimal Regret Bounds; Thompson Sampling
2012/11/23
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several s...
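The abstract above describes Thompson Sampling only in words, so a minimal sketch of the usual Beta-Bernoulli variant may help make the idea concrete. It is not taken from the listed paper: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, the 0/1 rewards, and the parameters `true_means`, `horizon`, and `seed` are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Illustrative Beta-Bernoulli Thompson Sampling (assumed setup).

    true_means: hypothetical success probability of each arm, unknown to the learner.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # number of 1-rewards observed per arm
    failures = [0] * n_arms   # number of 0-rewards observed per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (uniform prior, so the posterior is Beta(successes + 1, failures + 1)).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate a Bernoulli reward from the arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts for the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward
```

Calling `thompson_sampling([0.1, 0.9], horizon=1000)` should concentrate most pulls on the second arm. The randomization comes entirely from the posterior draws, so no separate exploration parameter is needed in this sketch.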
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ step...