This technique, called “apprenticeship scheduling,” captures this domain knowledge in the form of a scheduling policy. Its objective is to learn scheduling policies from expert demonstrations and to validate that the schedules these policies produce are of comparable quality to those generated by human or synthetic experts. The approach makes efficient use of domain-expert demonstrations without requiring training inside an environment emulator. Rather than explicitly modeling a reward function and relying on dynamic programming or constraint solvers, which become computationally infeasible for the large-scale problems of interest, it uses action-driven learning to extract domain experts' strategies for scheduling tasks efficiently.

The approach uses pairwise comparisons between the actions taken (e.g., scheduling agent a to complete task T_i at time t) and the alternative actions available at that moment to learn the relevant model parameters and the scheduling policy demonstrated by the training examples. It is validated on two data sets: a synthetic set of solutions to a variety of scheduling problems, and a real-world set of demonstrations from human experts solving a variant of the weapon-to-target assignment problem. These synthetic and real-world domains represent two of the most challenging classes within a well-established problem taxonomy.
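The pairwise formulation can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: each candidate action (agent, task, time) is assumed to be summarized by a numeric feature vector, and a plain logistic-regression classifier trained by stochastic gradient descent stands in for whatever model the method actually uses. For every demonstration step, the features of the action the expert took are differenced against each alternative to form a positive example, and the reversed difference forms a negative one; at scheduling time, the hypothetical `choose_action` picks the candidate with the largest summed pairwise preference. All function names, features, and toy data here are hypothetical.

```python
"""Hypothetical sketch: learning a scheduling policy from pairwise comparisons.

Assumptions (not from the source): each candidate action (agent, task, time)
is summarized by a numeric feature vector, and logistic regression stands in
for whatever classifier the actual method uses.
"""
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# One demonstration step: the candidate actions' features and the index the expert chose.
Step = Tuple[Sequence[Vector], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pairwise_examples(steps: Sequence[Step]) -> Tuple[List[Vector], List[int]]:
    """Turn each expert choice into labeled feature differences."""
    X: List[Vector] = []
    y: List[int] = []
    for candidates, chosen in steps:
        phi_chosen = candidates[chosen]
        for j, phi_other in enumerate(candidates):
            if j == chosen:
                continue
            diff = [a - b for a, b in zip(phi_chosen, phi_other)]
            X.append(diff)
            y.append(1)  # chosen action preferred over alternative j
            X.append([-d for d in diff])
            y.append(0)  # alternative j not preferred over the chosen action
    return X, y


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5,
                   epochs: int = 200) -> Tuple[Vector, float]:
    """Fit weights with plain stochastic gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b)
            grad = p - yi
            w = [wk - lr * grad * xk for wk, xk in zip(w, xi)]
            b -= lr * grad
    return w, b


def choose_action(candidates: Sequence[Vector], w: Vector, b: float) -> int:
    """Pick the candidate with the highest summed pairwise preference."""
    def prefer(i: int, j: int) -> float:
        # Predicted probability that candidate i is preferred over candidate j.
        diff = [a - c for a, c in zip(candidates[i], candidates[j])]
        return sigmoid(sum(wk * dk for wk, dk in zip(w, diff)) + b)

    n = len(candidates)
    scores = [sum(prefer(i, j) for j in range(n) if j != i) for i in range(n)]
    return max(range(n), key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy features: [urgency, travel_distance]; the "expert" always picks the most urgent task.
    demos = [
        ([[0.1, 0.5], [0.8, 0.9], [0.4, 0.2]], 1),
        ([[0.7, 0.1], [0.3, 0.6], [0.2, 0.3]], 0),
        ([[0.5, 0.7], [0.6, 0.2], [0.9, 0.4]], 2),
    ]
    X, y = pairwise_examples(demos)
    w, b = train_logistic(X, y)
    print(choose_action([[0.2, 0.5], [0.9, 0.5], [0.5, 0.5]], w, b))  # -> 1
```

Because the classifier only sees feature differences, it learns which action the expert prefers relative to the alternatives available at that moment, rather than an absolute value for each action.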