Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
Published in Proceedings of the 18th ACM Conference on Recommender Systems, 2024
We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset of actions for each context.
Recommended citation: Shimizu, Tatsuhiro, Koichi Tanaka, Ren Kishimoto, Haruka Kiyohara, Masahiro Nomura, and Yuta Saito. "Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits." In Proceedings of the 18th ACM Conference on Recommender Systems, pp. 733-741. 2024. http://tatsu432.github.io/files/OPCB.pdf