# An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits

```
@article{Takemura2019AnAR,
  title   = {An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits},
  author  = {Kei Takemura and Shinji Ito},
  journal = {2019 IEEE International Conference on Data Mining (ICDM)},
  year    = {2019},
  pages   = {1318-1323}
}
```

Combinatorial linear semi-bandits (CLS) are a widely applicable framework for sequential decision-making, in which a learner chooses a subset of arms from a given set of arms associated with feature vectors. Existing algorithms perform poorly in the clustered case, in which the feature vectors form several large clusters. This shortcoming is critical in practice because clustered features arise in many applications, including recommender systems. In this paper, we clarify why such a shortcoming occurs, and…
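The CLS decision step described above can be sketched concretely: in each round the learner scores every arm by its feature vector under a current parameter estimate and selects a size-k "super arm". The top-k oracle below covers only the simplest (uniform-matroid) constraint, and the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def choose_super_arm(theta_hat, x_arms, k):
    """One CLS decision step (illustrative sketch, uniform-matroid case):
    score each arm by its estimated linear reward <x, theta_hat> and
    return the indices of the k highest-scoring arms as the super arm."""
    scores = x_arms @ theta_hat            # estimated reward of each arm
    top_k = np.argsort(scores)[-k:][::-1]  # k best arms, best first
    return top_k.tolist()
```

In the semi-bandit feedback model, the learner then observes a reward for each arm in the chosen super arm, which is what makes per-arm linear estimation possible.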

#### One Citation

Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

- Mathematics, Computer Science
- AAAI
- 2021

This paper shows that the C²UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound Õ(d√(kT) + dk) under partition matroid constraints, and proposes an algorithm that modifies the reward estimates of arms in the C²UCB algorithm and demonstrates that it enjoys the optimal regret bound.

#### References

Showing 1–10 of 25 references

Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation

- Computer Science
- AAAI
- 2017

By adapting a minimum-cost maximum-flow network, a practical algorithm based on Thompson sampling is derived for the (contextual) combinatorial problem, thus resolving the problem of computational intractability.

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

- Computer Science, Mathematics
- ICML
- 2015

This paper considers efficient learning in large-scale combinatorial semi-bandits with linear generalization, and proposes two learning algorithms, Combinatorial Linear Thompson Sampling (CombLinTS) and CombLinUCB, which are computationally efficient and provably statistically efficient under reasonable assumptions.

Thompson Sampling for Contextual Bandits with Linear Payoffs

- Computer Science, Mathematics
- ICML
- 2013

A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions is designed and analyzed, for the setting in which the contexts are provided by an adaptive adversary.
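The linear-payoff Thompson Sampling scheme summarized above can be sketched as follows. The Gaussian-posterior form and the names `A`, `b`, and `v` are illustrative assumptions for the sketch, not the paper's exact algorithm or notation.

```python
import numpy as np

def lin_ts_choose(A, b, x_arms, v=1.0, rng=None):
    """One round of linear Thompson Sampling (sketch).
    A : (d, d) regularized Gram matrix, I + sum of x x^T over past rounds
    b : (d,)   sum of observed reward * x over past rounds
    Samples a parameter from a Gaussian centered at the ridge estimate
    and plays the arm that looks best under the sampled parameter."""
    rng = np.random.default_rng() if rng is None else rng
    theta_hat = np.linalg.solve(A, b)      # ridge estimate of theta
    cov = v ** 2 * np.linalg.inv(A)        # posterior-style covariance
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    return int(np.argmax(x_arms @ theta_tilde))
```

The randomness of `theta_tilde` is what drives exploration: arms whose features are poorly covered by `A` get a wide sampling spread and are occasionally chosen.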

Stochastic Linear Optimization under Bandit Feedback

- Mathematics, Computer Science
- COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem is given in terms of both upper and lower bounds on the regret, and two variants of an algorithm based on the idea of "upper confidence bounds" are presented.
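The "upper confidence bound" idea for linear rewards can be sketched as below: play the arm maximizing estimated reward plus an exploration width derived from a confidence ellipsoid. The scaling constant `alpha` here is a placeholder for the paper's confidence radius, not its exact value.

```python
import numpy as np

def lin_ucb_choose(A, b, x_arms, alpha=1.0):
    """One round of a linear-UCB rule (sketch).
    Chooses argmax over arms of <x, theta_hat> + alpha * sqrt(x^T A^{-1} x),
    i.e. estimated mean reward plus an ellipsoidal confidence width."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    means = x_arms @ theta_hat
    widths = alpha * np.sqrt(np.einsum('kd,de,ke->k', x_arms, A_inv, x_arms))
    return int(np.argmax(means + widths))
```

Unlike the Thompson-sampling rule, this choice is deterministic given the history: exploration comes from the `widths` term, which is large for arms in directions the Gram matrix `A` has not yet covered.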

Linear Thompson Sampling Revisited

- Computer Science, Mathematics
- AISTATS
- 2017

Thompson sampling can be seen as a generic randomized algorithm in which the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional √d regret factor compared to a UCB-like approach.

Contextual Combinatorial Bandit and its Application on Diversified Online Recommendation

- Computer Science
- SDM
- 2014

Experiments conducted on a real-world movie recommendation dataset demonstrate that the principled approach, called contextual combinatorial bandit, can effectively address the above challenges and hence improve the performance of the recommendation task.

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

- Computer Science, Mathematics
- COLT
- 2012

For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.

Improved Algorithms for Linear Stochastic Bandits

- Computer Science, Mathematics
- NIPS
- 2011

A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor; experiments show a vast improvement in practice.

Contextual Bandits with Linear Payoff Functions

- Mathematics, Computer Science
- AISTATS
- 2011

An O(√(Td ln³(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.

Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations

- Mathematics, Computer Science
- IEEE/ACM Transactions on Networking
- 2012

New efficient policies are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown variables, for this combinatorial multi-armed bandit problem.