Learning with Abandonment

Consider a demand response provider that wants to learn a personalized policy for each user, but faces the risk that a user abandons the platform if she is dissatisfied with its actions. For example, the platform wants to personalize thermostat control for a user, but risks having her unsubscribe permanently if its control actions displease her. We propose a general thresholded learning model for such scenarios and discuss the structure of optimal policies.