Simulating Bandit Learning from User Feedback for Extractive Question Answering / Gao et al. 2022