Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved algorithms for linear stochastic bandits, NIPS, 2011.

A. Agarwal, D. P. Foster, D. J. Hsu, S. M. Kakade, and A. Rakhlin, Stochastic convex optimization with bandit feedback, NIPS, pp. 1035-1043, 2011.

R. Agrawal, The continuum-armed bandit problem, SIAM J. Control Optim., vol. 33, no. 6, pp. 1926-1951, 1995.

S. Agrawal and N. Goyal, Thompson sampling for contextual bandits with linear payoffs, ICML, 2013.

B. Awerbuch and R. Kleinberg, Online linear optimization and adaptive routing, J. Comput. Syst. Sci., vol. 74, no. 1, pp. 97-114, 2008.

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol. 5, pp. 1-122, 2012.

A. Burnetas and M. Katehakis, Optimal adaptive policies for sequential allocation problems, Advances in Applied Mathematics, vol. 17, no. 2, pp. 122-142, 1996.

A. Carpentier and M. Valko, Revealing graph bandits for maximizing local influence, AISTATS, 2016. URL: https://hal.archives-ouvertes.fr/hal-01304020

N. Cesa-Bianchi and G. Lugosi, Combinatorial bandits, J. Comput. Syst. Sci., vol. 78, no. 5, pp. 1404-1422, 2012.

W. Chen, Y. Wang, and Y. Yuan, Combinatorial multi-armed bandit: General framework and applications, ICML, 2013.

R. Combes, S. Magureanu, A. Proutiere, and C. Laroche, Learning to rank: Regret lower bound and efficient algorithms, SIGMETRICS, 2015. URL: https://hal.archives-ouvertes.fr/hal-01257894

R. Combes and A. Proutiere, Unimodal bandits: Regret lower bounds and optimal algorithms, ICML, 2014. URL: https://hal.archives-ouvertes.fr/hal-01092662

R. Combes, S. Talebi, A. Proutiere, and M. Lelarge, Combinatorial bandits revisited, NIPS, 2015. URL: https://hal.archives-ouvertes.fr/hal-01257796

V. Dani, T. Hayes, and S. Kakade, Stochastic linear optimization under bandit feedback, COLT, 2008.

A. Durand and C. Gagné, Thompson sampling for combinatorial bandits and its application to online feature selection, Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

S. Filippi, O. Cappé, A. Garivier, and C. Szepesvári, Parametric bandits: The generalized linear case, NIPS, pp. 586-594, 2010.

Y. Gai, B. Krishnamachari, and R. Jain, Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations, IEEE/ACM Trans. on Networking, vol. 20, no. 5, pp. 1466-1478, 2012.

A. Garivier and O. Cappé, The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, 2011.

K. Glashoff and S. Gustafson, Linear Optimization and Approximation, 1983.

A. Gopalan, S. Mannor, and Y. Mansour, Thompson sampling for complex online problems, ICML, 2014.

T. L. Graves and T. L. Lai, Asymptotically efficient adaptive choice of control laws in controlled Markov chains, SIAM J. Control and Optimization, vol. 35, no. 3, pp. 715-743, 1997.

A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, vol. 8, no. 10, 2007.

U. Herkenrath, The n-armed bandit with unimodal structure, Metrika, vol. 30, no. 1, pp. 195-210, 1983.

J. Honda and A. Takemura, An asymptotically optimal bandit algorithm for bounded support models, COLT, 2010.

E. Kaufmann, O. Cappé, and A. Garivier, On the complexity of best-arm identification in multi-armed bandit models, Journal of Machine Learning Research, vol. 17, no. 1, pp. 1-42, 2016. URL: https://hal.archives-ouvertes.fr/hal-01024894

E. Kaufmann, N. Korda, and R. Munos, Thompson sampling: An asymptotically optimal finite-time analysis, ALT, 2012. URL: https://hal.archives-ouvertes.fr/hal-00830033

J. Komiyama, J. Honda, H. Kashima, and H. Nakagawa, Regret lower bound and optimal algorithm in dueling bandit problem, COLT, 2015.

B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvári, Cascading bandits: Learning to rank in the cascade model, NIPS, 2015.

B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvári, Tight regret bounds for stochastic combinatorial semi-bandits, AISTATS, 2015.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, no. 1, pp. 4-22, 1985.

T. Lattimore and C. Szepesvári, The end of optimism? An asymptotic analysis of finite-armed linear bandits, 2016.

S. Magureanu, R. Combes, and A. Proutiere, Lipschitz bandits: Regret lower bounds and optimal algorithms, COLT, 2014. URL: https://hal.archives-ouvertes.fr/hal-01092791

H. Robbins, Some aspects of the sequential design of experiments, in Herbert Robbins Selected Papers, pp. 169-177, 1985.

P. Rusmevichientong and J. Tsitsiklis, Linearly parameterized bandits, Math. Oper. Res., vol. 35, no. 2, 2010.

Z. Wen, A. Ashkan, H. Eydgahi, and B. Kveton, Efficient learning in large-scale combinatorial semi-bandits, ICML, 2015.

J. Yu and S. Mannor, Unimodal bandits, ICML, 2011.