This invention is a system for allocating resources under specified time and labor constraints. Nearly every profession needs optimized resource allocation, which makes this technology applicable to a wide variety of fields, including healthcare, manufacturing, and military engagements.
Coordinating agents to complete a set of tasks with temporal and resource constraints is a challenging problem that requires human domain experts to employ knowledge paradigms learned through years of apprenticeship. Manually codifying this domain knowledge within a computational framework is necessary to scale beyond the “single-expert, single-trainee” apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes, which makes codifying this knowledge laborious. The inventors have developed a new approach that captures domain-expert heuristics through a pairwise ranking formulation and accurately learns multifaceted heuristics on both synthetic and real-world data sets.
This technique, called “apprenticeship scheduling,” captures this domain knowledge in the form of a scheduling policy. Its objective is to learn scheduling policies from expert demonstrations and to validate that the schedules these policies produce are of comparable quality to those generated by human or synthetic experts. The approach uses domain-expert demonstrations efficiently and does not need to be trained within an environment emulator. Rather than explicitly modeling a reward function and relying on dynamic programming or constraint solvers, which become computationally infeasible for large-scale problems of interest, the inventors use action-driven learning to extract the strategies of domain experts and schedule tasks efficiently.
This approach uses pairwise comparisons between the action an expert took (e.g., schedule agent a to complete task T_i at time t) and the actions the expert did not take at that moment to learn the relevant model parameters and the scheduling policies demonstrated by the training examples. The approach is validated using both a synthetic data set of solutions for a variety of scheduling problems and a real-world data set of demonstrations from human experts solving a variant of the weapon-to-target assignment problem. The synthetic and real-world problem domains used to empirically validate the approach represent two of the most challenging classes within a well-established class taxonomy.
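The pairwise comparison idea described above can be illustrated with a minimal sketch. It is not the inventors' implementation: the two features (deadline slack and travel distance), the synthetic "expert" that always picks the task with the least slack, and the use of a small logistic-regression classifier are all assumptions chosen for illustration, and every function name is hypothetical. Each demonstrated decision becomes a set of training examples by subtracting the features of each unchosen action from those of the chosen action (a positive example) and reversing the difference (a negative example).

```python
import math
import random


def make_demonstrations(n, n_candidates=4, seed=0):
    """Synthetic expert (an assumption): always picks the lowest deadline slack."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        # Hypothetical features per candidate action: [deadline_slack, travel_distance]
        candidates = [[rng.random(), rng.random()] for _ in range(n_candidates)]
        chosen = min(range(n_candidates), key=lambda i: candidates[i][0])
        demos.append((candidates, chosen))
    return demos


def pairwise_examples(demonstrations):
    """Turn each demonstrated choice into labeled feature differences."""
    examples = []
    for candidates, chosen in demonstrations:
        best = candidates[chosen]
        for j, other in enumerate(candidates):
            if j == chosen:
                continue
            diff = [b - o for b, o in zip(best, other)]
            examples.append((diff, 1))                # chosen ranked above other
            examples.append(([-d for d in diff], 0))  # other ranked below chosen
    return examples


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(examples, epochs=50, lr=0.1, seed=0):
    """Fit a linear pairwise-preference model by stochastic gradient ascent."""
    rng = random.Random(seed)
    examples = list(examples)
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for k in range(len(w)):
                w[k] += lr * (y - p) * x[k]
    return w


def pick_action(w, candidates):
    """Choose the candidate with the highest summed pairwise preference."""
    def preference(a, b):
        return sigmoid(sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)))

    scores = [
        sum(preference(a, b) for j, b in enumerate(candidates) if j != i)
        for i, a in enumerate(candidates)
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    demos = make_demonstrations(200)
    weights = train_logistic(pairwise_examples(demos))
    print("learned weights:", weights)
    print("chosen index:", pick_action(weights, [[0.9, 0.5], [0.1, 0.5], [0.5, 0.5]]))
```

In this sketch the learned policy never enumerates a state space or evaluates a reward function; it only compares the candidate actions available at the current decision point, which is the property the model-free description relies on. Any classifier that accepts feature differences could be substituted for the logistic model used here.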
- Approach allows human decision-making heuristics to be applied to problems that expand beyond a one-on-one apprenticeship model
- Model-free approach does not require enumerating or iterating through a large state-space
- Approach can be trained to solve scheduling problems on both synthetic and real-world data sets