Contextual Bandits with Cross-learning

Balseiro, Santiago; Golrezaei, Negin; Mahdian, Mohammad; Mirrokni, Vahab; Schneider, Jon

arXiv:1809.09582v2 (cs)

[Submitted on 25 Sep 2018 (v1), revised 3 Jan 2020 (this version, v2), latest version 15 Nov 2021 (v3)]

Abstract: In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).
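
To make the cross-learning feedback concrete, here is a minimal Python sketch of the auction example from the abstract. It is not the authors' algorithm: it is a plain UCB-style rule whose statistics are updated for every context after each round, and everything in it is an assumption made for illustration, including the function name `cross_learning_ucb`, the first-price payoff `(value - bid)` on a win, the uniform highest competing bid, and the confidence radius (rewards here lie in $[-1, 1]$, so the radius is only indicative).

```python
import math
import random


def cross_learning_ucb(values, bids, rounds, seed=0):
    """UCB-style bidder for a repeated first-price auction (illustrative only).

    values: candidate valuations (the contexts); bids: candidate bids (the actions).
    """
    rng = random.Random(seed)
    C, K = len(values), len(bids)
    pulls = [0] * K                        # times each bid was played, shared by all contexts
    sums = [[0.0] * K for _ in range(C)]   # cumulative reward per (context, bid)
    total = 0.0

    for t in range(1, rounds + 1):
        c = rng.randrange(C)               # stochastic context: index of this round's valuation

        # Play every bid once, then pick the bid with the largest upper confidence bound for c.
        untried = [a for a in range(K) if pulls[a] == 0]
        if untried:
            a = untried[0]
        else:
            a = max(
                range(K),
                key=lambda j: sums[c][j] / pulls[j]
                + math.sqrt(2 * math.log(t) / pulls[j]),
            )

        competing = rng.random()           # assumed: highest competing bid is uniform on [0, 1)
        win = bids[a] >= competing

        # Cross-learning: the win/lose outcome of bid a determines r_{a,t}(c') for every c'.
        for cp in range(C):
            sums[cp][a] += (values[cp] - bids[a]) if win else 0.0
        pulls[a] += 1
        total += (values[c] - bids[a]) if win else 0.0

    return pulls, sums, total
```

Because one observation of bid `a` updates the estimate for every valuation, the pull counts are indexed by action only. That shared count is the intuition behind a regret bound without the factor $C$; a learner without cross-learning would need a separate count for each (context, action) pair.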

Comments:	48 pages, 5 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1809.09582 [cs.LG]
	(or arXiv:1809.09582v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1809.09582

Submission history

From: Negin Golrezaei
[v1] Tue, 25 Sep 2018 16:40:44 UTC (16 KB)
[v2] Fri, 3 Jan 2020 23:04:28 UTC (1,682 KB)
[v3] Mon, 15 Nov 2021 19:22:44 UTC (1,779 KB)
