ICAPS-2019


Personalized Medication and Activity Planning in PDDL+ • Fares K. Alaboud, Andrew Coles
The emergence of planners capable of reasoning with continuous dynamics, as expressed in PDDL+, has increased the range of problems that fall within the capabilities of PDDL planners. One such problem is planning patients’ activities and medication regimes, taking into account non-linear medication pharmacokinetics. In this paper we explore the application of contemporary PDDL+ planners to this problem. To address their performance limitations, we present a linearize–validate cycle: tasks are solved by iteratively refining a linear approximation of the domain, which is solved by a linear planner and then validated at each stage against the full non-linear semantics. In doing so, we bring this domain within the capabilities of current planners, which we demonstrate in our evaluation using OPTIC.

Temporal Brittleness Analysis of Task Networks for Planetary Rovers • Tiago Stegun Vaquero, Steve Chien, Jagriti Agrawal, Wayne Chi, Terrance Huntsberger
We propose a new method to analyze the temporal brittleness of task networks, which allows the detection and enumeration of activities that, with modest variation in task execution duration, make the execution of the task network dynamically uncontrollable. In this method, we introduce a metric for measuring an activity’s brittleness, defined as the degree of acceptable deviation from its nominal duration, and describe how that measurement is mapped to task network structure. Complementary to existing work on plan robustness analysis, which informs how likely a task network is to succeed, the proposed analysis and metric go deeper: they pinpoint the sources of potential brittleness due to temporal constraints and help human designers and/or automated task network generators (e.g., schedulers/planners) focus on addressing sources of undesirable brittleness. We apply the approach to a set of task networks (called sol types) in development for NASA’s next planetary rover and present common patterns that are sources of brittleness. These techniques are currently under evaluation for potential use supporting operations of the Mars 2020 rover.

Optimizing Parameters for Uncertain Execution and Rescheduling Robustness • Wayne Chi, Jagriti Agrawal, Steve Chien, Elyse Fosse, Usha Guduri
We describe the use of Monte Carlo simulation to optimize a schedule for both execution and rescheduling robustness, as well as activity score, in the face of execution uncertainties. We apply these techniques to the problem of optimizing a schedule for a planetary rover with very limited onboard computation. We search in the activity input parameter space where (a) the onboard scheduler is a one-shot, non-backtracking scheduler in which an activity, once placed, is never moved or deleted, and (b) the activity priority determines the order in which activities are considered for placement in the schedule. We show that simulation-driven search for activity parameters outperforms alternative schemes that use static priority assignment. Our approach can be viewed as using simulation feedback to determine problem-specific heuristics, much like squeaky wheel optimization. These techniques are currently baselined for use in the ground operations of NASA's next planetary rover, the Mars 2020 rover.

The Clustered Dial-a-Ride Problem • Fabian Feitsch, Sabine Storandt
We study a generalization of the classical dial-a-ride problem, with an application to public transport planning in rural areas. In the classical dial-a-ride problem, n users each specify a pick-up and a delivery location, and the aim is to plan the least-cost route that serves all requests. This can be modeled as a traveling salesman problem in a complete graph with precedence constraints (pick-ups need to happen before deliveries). In this paper, we consider the clustered dial-a-ride problem, where we do not operate on a complete graph but on a graph composed of serially numbered cliques, where each clique is connected to the next one via a single edge. This setting is inspired by door-to-door transportation for people from remote villages who want to get to another village or the next city by a bus that operates on demand. We argue that if the optimal route exhibits certain structural properties, it can be computed significantly faster. To make use of this observation, we devise a classification algorithm that can decide whether the optimal route exhibits these structural properties before computing it. Extensive experiments on artificial and real-world instances reveal that the majority of optimal routes indeed have the desired properties and that our classifier is an efficient tool for recognizing the respective instances.

Exact Methods for Extended Rotating Workforce Scheduling Problems • Lucas Kletzander, Nysret Musliu, Johannes Gärtner, Werner Schafhauser, Thomas Krennwallner
In many professions, daily demand for different shifts varies during the week. The rotating workforce scheduling problem deals with the creation of repeating schedules for such demand and is therefore of high practical relevance. This paper investigates solving this real-life problem with several new, practically relevant features. These include early recognition of certain infeasibility criteria, complex rest time constraints regarding weekly rest time, and optimization goals dealing with the optimal assignment of free weekends. We introduce a state-of-the-art constraint model and evaluate it with different extensions. The evaluation shows that many real-life instances can be solved to optimality using a constraint solver. Our approach is under deployment in a state-of-the-art commercial solver for rotating workforce scheduling.
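To make the "repeating schedule" idea in the rotating workforce abstract concrete, here is a minimal Python sketch of how a candidate cyclic schedule could be checked. It is not the authors' constraint model; the encoding is a hypothetical simplification: the schedule is a list of weeks, each a list of seven shift codes ("D", "N", or "-" for a day off), and every employee cycles through all weeks in turn. Under that assumption, demand for shift s on weekday d is met exactly when the number of weeks with s on day d equals the demand, and a limit on consecutive work days must be checked with wrap-around from the last week back to the first.

```python
from typing import Dict, List

OFF = "-"  # hypothetical code for a day off


def meets_demand(schedule: List[List[str]], demand: Dict[str, List[int]]) -> bool:
    """Each row is one week. In a rotating schedule every employee cycles
    through all rows, so the coverage of a weekday is simply a column count."""
    for shift, per_day in demand.items():
        for day in range(7):
            covered = sum(1 for week in schedule if week[day] == shift)
            if covered != per_day[day]:
                return False
    return True


def max_consecutive_work_days(schedule: List[List[str]]) -> int:
    """Longest run of working days, treating the schedule as a cycle."""
    days = [d for week in schedule for d in week]  # flatten week by week
    if OFF not in days:
        return len(days)  # no day off at all: every day is worked
    longest = run = 0
    # Scan the sequence twice so that runs wrapping from the last week
    # back into the first week are counted in full.
    for d in days + days:
        run = run + 1 if d != OFF else 0
        longest = max(longest, run)
    return longest


# Hypothetical three-week example.
schedule = [
    ["D", "D", "D", "D", "D", "-", "-"],
    ["N", "N", "N", "N", "N", "-", "-"],
    ["-", "-", "-", "-", "-", "D", "D"],
]
demand = {"D": [1, 1, 1, 1, 1, 1, 1], "N": [1, 1, 1, 1, 1, 0, 0]}
print(meets_demand(schedule, demand))       # True
print(max_consecutive_work_days(schedule))  # 7: week 3's weekend wraps into week 1
```

The wrap-around run of seven days in this example is exactly the kind of violation a rest-time constraint in a real model would have to forbid; a constraint solver would enforce such rules during search rather than checking a finished schedule.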
Towards Automating Crime Prevention through Environmental Design (CPTED) Analysis to Predict Burglary • Leanne Monchuk, Simon Parkinson, James Kitchen
The design of the built environment (such as housing developments and street networks) can increase the opportunity for crime and disorder to occur. For example, a housing development with poor surveillance can provide an opportunity for offenders to commit residential burglary and avoid detection. Crime Prevention through Environmental Design (CPTED) aims to reduce crime and disorder through the design and manipulation of the built environment. The police typically play an important role in the delivery and application of CPTED by assessing planning applications, identifying design features that may provide an opportunity for crime, and offering remedial advice. In England and Wales, it is common practice for police specialists (Designing out Crime Officers, DOCOs) to review architectural site plans during the planning process. However, owing to significant cuts to policing budgets, the number of DOCOs in post is falling whilst the demand for new housing is increasing. In this novel work, we demonstrate that key knowledge about the opportunities for crime and disorder within the built environment can be elicited from a purposive sample of 28 experienced DOCOs, encoded in a domain model, and utilised by Automated Planning techniques to automatically assess architectural site plans for future crime risk.

ZAC: A Zone pAth Construction Approach for Effective Real-Time Ride Sharing • Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet · UTRC Best Applications Paper Award
Real-time ridesharing systems such as UberPool, Lyft Line and GrabShare have become hugely popular, as they reduce costs for customers, improve per-trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the right requests to travel in available vehicles in real time, so that the objective (e.g., requests served, revenue or delay) is optimized. The most relevant existing work has focused on generating, in real time, as many relevant feasible combinations of requests (referred to as trips) as possible, where feasibility is with respect to the delay available to customers. Unfortunately, since the number of trips increases exponentially with vehicle capacity and the number of requests, such an approach has to employ ad hoc heuristics to identify relevant trips. To that end, we propose an approach that generates many zone paths (a zone being an abstraction of individual locations), where each zone path can represent multiple trips (combinations of requests), and assigns available vehicles to these zone paths to optimize the objective. The key advantage of our approach is that these zone paths are generated using a combination of offline and online methods, consequently allowing the generation of many more relevant combinations in real time than competing approaches. We demonstrate that our approach outperforms (with respect to both objective and runtime) the current best approach for ridesharing on both real-world and synthetic datasets.

Reinforcement Learning Based Querying in Camera Networks for Efficient Target Tracking • Anil Sharma, Saket Anand, Sanjit Kaul
Surveillance camera networks are a useful monitoring infrastructure that can be used for various visual analytics applications, where high-level inferences and predictions can be made based on target tracking across the network. Most multi-camera tracking work focuses on re-identification and trajectory association problems. However, as camera networks grow in size, the volume of data generated becomes enormous, and scalable processing of this data is imperative for deploying practical solutions. In this paper, we address the largely overlooked problem of scheduling cameras for processing by selecting the one where the target is most likely to appear next. The inter-camera handover can then be performed on the selected cameras via re-identification or another target association technique. We model this scheduling problem using reinforcement learning and learn the camera selection policy using Q-learning. We do not assume knowledge of the camera network topology, but we observe that the resulting policy implicitly learns it. We evaluate our approach on the NLPR MCT dataset, a real multi-camera multi-target tracking benchmark, and show that the proposed policy substantially reduces the number of frames that need to be processed, at the cost of a small reduction in recall.

Solution Approaches for an Automotive Paint Shop Scheduling Problem • Felix Winter, Emir Demirović, Nysret Musliu, Christoph Mrkvicka
In the paint shops of the automotive supply industry, a large number of synthetic material pieces need to be painted every day to provide the wide variety of items required for car manufacturing. Because of the sophisticated automated production process and the tight due dates requested by car manufacturers, finding an optimized production schedule is a challenging task that is currently performed by multiple human planners. In this paper we formulate and solve a novel real-life paint shop scheduling problem from the automotive supply industry, which introduces unique constraints and objectives that do not appear in the existing literature. Additionally, we provide a new collection of benchmark instances based on real-life planning scenarios that can be used to evaluate solution techniques for the problem. Exact methods are only able to solve a few of the smaller instances in reasonable running time. Therefore, we propose a metaheuristic method based on local search that uses novel neighborhood relations and various ways to escape local optima. Our approach is able to provide feasible solutions for all instances within reasonable running time.

Mixed Integer Programming versus Evolutionary Computation for Optimizing a Hard Real-World Staff Assignment Problem • Jannik Peters, Daniel Stephan, Isabel Amon, Hans Gawendowicz, Julius Lischeid, Lennart Salabarria, Jonas Umland, Felix Werner, Martin S. Krejca, Ralf Rothenberger, Timo Kötzing, Tobias Friedrich
Assigning staff to engagements according to hard constraints while optimizing several objectives is a task encountered by many companies on a regular basis. Simplified versions of such assignment problems are NP-hard. Despite this, a typical approach to solving them consists of formulating them as mixed integer programming (MIP) problems and using a state-of-the-art solver to get solutions that closely approximate the optimum. In this paper, we consider a complex real-world staff assignment problem encountered by the professional services company KPMG, with the goal of finding an algorithm that solves it faster and with a better solution than a commercial MIP solver. We follow the evolutionary algorithm (EA) metaheuristic and design a search heuristic that iteratively improves a solution using domain-specific mutation operators. Furthermore, we use a flow algorithm to optimally solve a subproblem, which tremendously reduces the search space for the EA. For our real-world instance of the assignment problem, given the same total time budget of 100 hours, a parallel EA approach finds a solution that is only 1.7% away from an upper bound for the (unknown) optimum within under five hours, while the MIP solver Gurobi still has a gap of 10.5%.
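For readers unfamiliar with the evolutionary approach mentioned in the staff assignment abstract, the sketch below is a generic (1+1) EA in Python, not the authors' algorithm. It keeps a single assignment, mutates it by moving one engagement to a random staff member, and keeps the offspring whenever it is no worse. The toy objective (total hours assigned beyond each staff member's capacity), the instance, and all function names are hypothetical; the paper's domain-specific mutation operators and flow-based subproblem are not modeled.

```python
import random
from typing import List, Tuple


def overload(assign: List[int], hours: List[int], capacity: List[int]) -> int:
    """Hypothetical objective: total hours assigned beyond each staff
    member's capacity (lower is better, 0 means feasible)."""
    load = [0] * len(capacity)
    for engagement, staff in enumerate(assign):
        load[staff] += hours[engagement]
    return sum(max(0, l - c) for l, c in zip(load, capacity))


def one_plus_one_ea(hours: List[int], capacity: List[int],
                    iters: int = 2000, seed: int = 0) -> Tuple[List[int], int]:
    rng = random.Random(seed)
    n_staff = len(capacity)
    # Random initial assignment: engagement i -> staff member parent[i].
    parent = [rng.randrange(n_staff) for _ in hours]
    parent_fit = overload(parent, hours, capacity)
    for _ in range(iters):
        child = parent[:]
        # Mutation: reassign one randomly chosen engagement.
        child[rng.randrange(len(child))] = rng.randrange(n_staff)
        child_fit = overload(child, hours, capacity)
        if child_fit <= parent_fit:  # elitist acceptance: never get worse
            parent, parent_fit = child, child_fit
    return parent, parent_fit


print(one_plus_one_ea([8, 5, 5, 4, 3, 3], [10, 10, 8]))
```

Accepting equally good offspring lets the search drift across plateaus; a real solver for the KPMG problem would need richer operators and multiple objectives, which is where the paper's domain-specific design comes in.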

Towards Stable Symbol Grounding with Zero-Suppressed State AutoEncoder • Masataro Asai, Hiroshi Kajino
While classical planning has been an active branch of AI, its applicability is limited to tasks precisely modeled by humans. Fully automated high-level agents should instead be able to find a symbolic representation of an unknown environment without supervision; otherwise, they suffer from the knowledge acquisition bottleneck. Meanwhile, Latplan (Asai and Fukunaga 2018) partially resolves this bottleneck with a neural network called the State AutoEncoder (SAE). The SAE obtains a propositional representation of image-based puzzle domains through unsupervised learning, generates a state space and performs classical planning. In this paper, we identify the problematic, stochastic behavior of SAE-produced propositions as a new sub-problem of the symbol grounding problem: the symbol stability problem. Informally, symbols are stable when their referents (e.g., propositional values) do not change under small perturbations of the observation, and unstable symbols are harmful for symbolic reasoning. We analyze the problem in Latplan both formally and empirically, and propose the “Zero-Suppressed SAE”, an enhancement that stabilizes the propositions. We show that it finds more stable propositions and more compact representations, resulting in an improved success rate for Latplan. It is robust against various hyperparameters, which eases the tuning effort, and also provides a weight pruning capability as a side effect.

Deep Policies for Width-Based Planning in Pixel Domains • Miquel Junyent, Anders Jonsson, Vicenç Gómez
Width-based planning has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. For example, Bandres et al. (2018) introduced a rollout version of the Iterated Width algorithm whose performance compares well with humans and learning methods in the pixel setting of the Atari games suite. In this setting, planning is done online using the “screen” states and selecting actions by looking ahead into the future. However, this algorithm is purely exploratory and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task, e.g., the B-PROST pixel features. In this work, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Surprisingly, we observe that the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. We compare π-IW with previous width-based methods and with AlphaZero in simple environments and show that π-IW has superior performance. We also show that our proposed algorithm outperforms previous width-based methods in the pixel setting of the Atari games suite.

Unsupervised Grounding of Plannable First-Order Logic Representation from Images • Masataro Asai
Recently, there has been increasing interest in the Reinforcement Learning community in obtaining the relational structures of the environment. However, the resulting “relations” are not the discrete, logical predicates compatible with symbolic reasoning such as classical planning or goal recognition. Meanwhile, Latplan (Asai and Fukunaga 2018) bridged the gap between deep-learning perceptual systems and symbolic classical planners. One key component of the system is a neural network called the State AutoEncoder (SAE), which encodes an image-based input into a propositional representation compatible with classical planning. To get the best of both worlds, we propose the First-Order State AutoEncoder, an unsupervised architecture for grounding first-order logic predicates. Each predicate models a relationship between objects by taking interpretable arguments and returning a propositional value. In experiments using the 8-Puzzle and a photorealistic Blocksworld environment, we show that (1) the resulting predicates capture interpretable relations (e.g., spatial ones), (2) they help obtain a compact, abstract model of the environment, and (3) the resulting model is compatible with symbolic classical planning.

Resource Constrained Deep Reinforcement Learning • Abhinav Bhatia, Pradeep Varakantham, Akshat Kumar
In urban environments, supply resources have to be constantly matched to the “right” locations (where customer demand is present) so as to improve quality of life. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in EMS (Emergency Management Systems); vehicles (cars, bikes, scooters, etc.) have to be matched to docking stations so as to reduce lost demand in shared mobility systems. Such problem domains are challenging owing to demand uncertainty, combinatorial action spaces (due to allocation) and constraints on the allocation of resources (e.g., resource capacity, minimum and maximum number of resources at locations and regions). Existing systems typically employ myopic and greedy optimization approaches to optimize the allocation of supply resources to locations. Such approaches are typically unable to handle surges or variance in demand patterns well. Recent research has demonstrated the ability of Deep RL methods to adapt well to highly uncertain environments. However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on the allocation of resources. To that end, we have developed three approaches on top of the well-known actor-critic approach DDPG (Deep Deterministic Policy Gradient) that are able to handle constraints on resource allocation. More importantly, we demonstrate that they outperform leading approaches on simulators validated on semi-real and real data sets.

Fast Feature Selection for Linear Value Function Approximation • Bahram Behzadian, Soheil Gharatappeh, Marek Petrik
Linear value function approximation is a standard approach to solving reinforcement learning problems with large state spaces. Since designing good approximation features is difficult, automatic feature selection is an important research topic. We propose a new method for feature selection that is based on a low-rank factorization of the transition matrix. Our approach derives features directly from high-dimensional raw inputs, such as image data. The method is easy to implement using SVD, and our experiments show that it is faster and more stable than alternative methods.

Learning Interpretable Models Expressed in Linear Temporal Logic • Alberto Camacho, Sheila A. McIlraith
We examine the problem of learning models that characterize the high-level behaviour of a system based on observation traces. Our aim is to develop models that are human-interpretable. To this end, we introduce the problem of learning a Linear Temporal Logic (LTL) formula that parsimoniously captures a given set of positive and negative example traces. Our approach to learning LTL exploits a symbolic state representation, searching through a space of labeled skeleton formulae to construct an alternating automaton that models observed behaviour, from which the LTL formula can be read off. Construction of interpretable behaviour models is central to a diversity of applications related to planning and plan recognition. We showcase the relevance and significance of our work in the context of behaviour description and discrimination: (i) active learning of a human-interpretable behaviour model that describes observed examples obtained by interaction with an oracle; and (ii) passive learning of a classifier that discriminates individual agents based on the human-interpretable signature way in which they perform particular tasks. Experiments demonstrate the effectiveness of our symbolic model learning approach in providing human-interpretable models and classifiers from reduced example sets.

Entropy based Independent Learning in Anonymous Multi-Agent Settings • Tanvi Verma, Pradeep Varakantham, Hoong Chuin Lau
Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services: for instance, Uber, Lyft and Grab match taxis to customers, while Ubereats, Deliveroo, FoodPanda, etc. match restaurants to customers. In these online-to-offline service problems, individuals who are responsible for supply (e.g., taxi drivers, delivery bikes or delivery van drivers) earn more by being at the “right” place at the “right” time. We are interested in developing approaches that learn to guide individuals to be in the “right” place at the “right” time (to maximize revenue) in the presence of other similar “learning” individuals and only local aggregated observations of other agents’ states (e.g., only the number of other taxis in the same zone as the current agent). Existing approaches in Multi-Agent Reinforcement Learning (MARL) either do not scale (e.g., to about 40000 taxis/cars for a city like Singapore) or rely on assumptions of a common objective (or action coordination) or on centralized learning, which are not viable. A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) depends only on the number and not on the identity of the agents. We model these problems using the Anonymous MARL (AyMARL) model. To ensure scalability and individual learning, we focus on improving the performance of independent reinforcement learning methods, specifically Deep Q-Networks (DQN) and Advantage Actor Critic (A2C), for AyMARL. The key contribution of this paper is in employing the principle of maximum entropy to provide a general framework of independent learning that is both empirically effective (even with only local aggregated information of agent state distribution) and theoretically justified. Finally, our approaches provide a significant improvement over existing approaches with respect to joint and individual revenue, on a generic simulator for online-to-offline services and on a real-world taxi problem. More importantly, this is achieved while having the least variance in revenues earned by the learning individuals, an indicator of fairness.

Learning Classical Planning Strategies with Policy Gradient • Pawel Gomoluch, Dalal Alrajeh, Alessandra Russo
A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed using a trainable stochastic policy, mapping the state of the search to a probability distribution over the approaches. This enables using policy gradient to learn search strategies tailored to a specific distribution of planning problems and a selected performance metric, e.g., the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner’s state. We then train the system on randomly generated problems from five IPC domains using three different performance metrics. Our experimental results show that the learner is able to discover domain-specific search strategies, improving the planner’s performance relative to the baselines of plain best-first search and a uniform policy.

Size-Independent Neural Transfer for RDDL Planning • Sankalp Garg, Aniket Bajpai, Mausam
Neural planners for RDDL MDPs produce deep reactive policies in an offline fashion. These scale well with large domains, but are sample-inefficient and time-consuming to train from scratch for each new problem. To mitigate this, recent work has studied neural transfer learning, so that a generic planner trained on other problems of the same domain can rapidly transfer to a new problem. However, this approach only transfers across problems of the same size. We present the first method for neural transfer of RDDL MDPs that can transfer across problems of different sizes. Our architecture has two key innovations to achieve size independence: (1) a state encoder, which outputs a fixed-length state embedding by max pooling over a varying number of object embeddings, and (2) a single parameter-tied action decoder that projects object embeddings into action probabilities for the final policy. On the two challenging RDDL domains of SysAdmin and Game of Life, our approach powerfully transfers across problem sizes and has superior learning curves compared with training from scratch.
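The two size-independence ingredients named in the Size-Independent Neural Transfer abstract can be illustrated with a toy Python sketch. Nothing here is learned and the architecture is much simpler than the paper's: the weights are fixed, hypothetical numbers, and the "object embeddings" are hand-written lists. An element-wise max over any number of object embeddings gives a state embedding of fixed length, and one shared weight vector scores every object, so the same parameters yield a probability distribution over per-object actions whatever the number of objects.

```python
import math
from typing import List


def max_pool(object_embs: List[List[float]]) -> List[float]:
    """Element-wise max over a variable number of equal-length embeddings,
    giving a state embedding whose length does not depend on the object count."""
    return [max(column) for column in zip(*object_embs)]


def action_probs(object_embs: List[List[float]], w: List[float]) -> List[float]:
    """Parameter-tied decoder: the same weights w score every object's
    [object embedding ++ pooled state embedding]; a softmax over the scores
    gives one action probability per object."""
    state = max_pool(object_embs)
    scores = [sum(wi * xi for wi, xi in zip(w, emb + state)) for emb in object_embs]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-dimensional object embeddings; w has length 2 * 2.
w = [1.0, -0.5, 0.2, 0.2]
small = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.4]]
large = small + [[0.9, 0.1], [0.3, 0.8]]
print(len(max_pool(small)), len(max_pool(large)))  # 2 2
print(len(action_probs(large, w)))                 # 5
```

Because no parameter's shape depends on the number of objects, weights fitted on the three-object instance can be applied unchanged to the five-object one, which is the property that makes transfer across problem sizes possible.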

Replanning for Situated Robots • Michael Cashmore, Andrew Coles, Bence Cserna, Erez Karpas, Daniele Magazzeni, Wheeler Ruml
Planning enables intelligent agents, such as robots, to act so as to achieve their long-term goals. To make the planning process tractable, a relatively low-fidelity model of the world is often used, which sometimes leads to the need to replan. The typical view of replanning is that the robot is given the current state, the goal, and possibly some data from the previous planning process. However, for robots (or teams of robots) that exist in continuous physical space, act concurrently, have deadlines, or must otherwise consider durative actions, things are not so simple. In this paper, we address the problem of replanning for situated robots. Building on previous work on situated temporal planning, we frame the replanning problem as a situated temporal planning problem, where currently executing actions are handled via Timed Initial Literals (TILs), under the assumption that actions cannot be interrupted. We then relax this assumption and address situated replanning with interruptible actions. We bridge the gap between the low-level model of the robot and the high-level model used for planning with the novel notion of a bail-out action generator, which relies on the low-level model to generate high-level actions that describe possible ways to interrupt currently executing actions. Because actions can be interrupted at different times during their execution, we also propose a novel algorithm to handle temporal planning with time-dependent durations.

A Hierarchical Approach to Active Semantic Mapping Using Probabilistic Logic and Information Reward POMDPs • Tiago S. Veiga, Miguel Silva, Rodrigo Ventura, Pedro U. Lima
Maintaining a semantic map of a complex and dynamic environment, where the uncertainty originates in both noisy perception and unexpected changes, is a challenging problem. In particular, we focus on the problem of maintaining a semantic map of an environment by a mobile agent. In this paper we address this problem in a hierarchical fashion. First, we employ a probabilistic logic model representing the semantic map, as well as the associated uncertainty. Second, we model the interaction of the robot with the environment with a set of information-reward POMDP models, one for each partition of the environment (e.g., a room). The environment is partitioned in order to address the scalability limitations of POMDP models over very large state spaces. We then use probabilistic inference to determine which POMDP and policy to execute next. Experimental results show the efficiency of this architecture in real domestic service robotic scenarios.

Mars On-site Shared Analytics, Information, and Computing • Joshua Vander Hook, Tiago Stegun Vaquero, Federico Rossi, Martina Troesch, Marc Sanchez-Net, Joshua Schoolcraft, Jean-Pierre de la Croix, Steve Chien
We study the use of distributed computation in a representative multi-robot planetary exploration mission. We model a network of small rovers with access to computing resources from a static base station, based on current design efforts and extrapolation from the Mars 2020 rover autonomy. The key algorithmic problem is the simultaneous scheduling of computation, communication, and caching of data, as informed by an autonomous mission planner. We consider minimum makespan scheduling and present a consensus-backed scheduler for shared-world, distributed scheduling based on an Integer Linear Program. We validate the pipeline with simulation and field results. Our results are intended to provide a baseline comparison and a motivating application domain for future research into network-aware decentralized scheduling and resource allocation.
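To make "minimum makespan scheduling" in the Mars On-site Shared Analytics abstract concrete, the Python sketch below solves a tiny instance exactly by enumerating every assignment, which is the optimum an ILP solver would find far more efficiently. It is not the paper's consensus-backed ILP scheduler. It assumes, hypothetically, that each computation task has a work amount and each node (a rover or the base station) has a processing speed, and it ignores communication and caching entirely.

```python
from itertools import product
from typing import List, Tuple


def min_makespan(work: List[float], speed: List[float]) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustively assign each task to a node. A node's finish time is its
    total assigned work divided by its speed; the makespan is the latest
    finish time. Returns (best makespan, assignment), where assignment[i]
    is the node index for task i. Cost grows as len(speed) ** len(work)."""
    best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
    for assign in product(range(len(speed)), repeat=len(work)):
        finish = [0.0] * len(speed)
        for task, node in enumerate(assign):
            finish[node] += work[task] / speed[node]
        makespan = max(finish)
        if makespan < best[0]:
            best = (makespan, assign)
    return best


# Hypothetical instance: node 0 is a rover, node 1 a base station twice as fast.
print(min_makespan([4, 3, 2, 2], [1.0, 2.0])[0])  # 4.0
```

The exponential enumeration is only viable for a handful of tasks; formulating the same objective as an Integer Linear Program, as the abstract describes, is what makes realistic instances tractable.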

Provable Infinite-Horizon Real-Time Planning for Repetitive Tasks • Fahad Islam, Oren Salzman, Maxim Likhachev
In manufacturing and automation settings, robots often have to perform highly repetitive manipulation tasks in structured environments. In this work we are interested in settings where tasks are similar, yet not identical (e.g., due to uncertain orientation of objects), and motion planning needs to be extremely fast. Preprocessing-based approaches prove to be very beneficial in these settings: they analyze the configuration space offline to generate auxiliary information that can then be used in the query phase to speed up planning. Typically, the tighter the requirement on query times, the larger the memory footprint. In particular, for high-dimensional spaces, providing real-time planning capabilities is extremely challenging. While there are planners that guarantee real-time performance by limiting the planning horizon, we are not aware of general-purpose planners capable of doing so for an infinite horizon (i.e., planning to the goal). To this end, we propose a preprocessing-based method that provides provable bounds on the query time while incurring only a small amount of memory overhead in the query phase. We evaluate our method on a 7-DOF robot arm and show a speedup of over tenfold in query time compared to the PRM algorithm, while provably guaranteeing a maximum query time of less than 3 milliseconds.

Learning Heuristic Functions for Mobile Robot Path Planning Using Deep Neural Networks • Takeshi Takahashi, He Sun, Dong Tian, Yebin Wang
Prevailing path planning algorithms such as A*, D*, and their variants rely on heuristic functions to guide the search, so their computational efficiency is solely determined by how well the heuristic function approximates the true path cost. In this study, we propose a novel approach to learn heuristic functions using a deep neural network (DNN) to improve computational efficiency. Even though DNNs have been widely used for object segmentation, natural language processing, and perception, their role in helping to solve path planning has not been well investigated. This work shows how DNNs can be applied to path planning problems and what kinds of loss functions are suitable for learning such a heuristic. Our preliminary results show that an appropriately designed and trained DNN can learn a heuristic that effectively guides conventional path planning algorithms and speeds up path generation.

Goal Reasoning in a CLIPS-based Executive for Integrated Planning and Execution • Tim Niemueller, Till Hofmann, Gerhard Lakemeyer
The close integration of planning and execution is a challenging problem. Key questions are how to organize and explicitly represent the program flow to enable reasoning about it, how to dynamically create goals from run-time information and decide online which to pursue, and how to unify representations used during planning and execution. In this work, we present an integrated system that uses a goal reasoning model which represents this flow and supports dynamic goal generation. With an explicit world model representation, it allows reasoning about the current state of the world, the progress of the execution flow, and which goals should be pursued, postponed, or abandoned. Our executive implements a specific goal lifecycle with compound goal types that combine sub-goals by conjunctions, disjunctions, or concurrency, or that impose temporal constraints. Goals also provide a frame of reference for execution monitoring. The current system can use PDDL as the underlying modeling language, with extensions to aid execution, and it contains well-defined extension points for domain-specific code. It has been used successfully in several scenarios.
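
The compound goal types in the goal reasoning abstract can be pictured with a small evaluator. This Python sketch is a hypothetical simplification, not the CLIPS executive's goal lifecycle: it assumes only three outcomes (pending, completed, failed) and the usual all-of/any-of semantics for conjunctions and disjunctions, and it does not model concurrency or temporal constraints. The names `Outcome`, `Goal`, `AndGoal`, and `OrGoal` are illustrative only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome set; a real goal lifecycle has more states.
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Goal:
    """Leaf goal whose outcome is set by the executive after execution."""
    name: str
    outcome: Outcome = Outcome.PENDING

    def evaluate(self) -> Outcome:
        return self.outcome


@dataclass
class AndGoal:
    """Conjunction: fails as soon as one sub-goal fails, completes when all complete."""
    name: str
    subgoals: list = field(default_factory=list)

    def evaluate(self) -> Outcome:
        outcomes = [g.evaluate() for g in self.subgoals]
        if Outcome.FAILED in outcomes:
            return Outcome.FAILED
        if all(o is Outcome.COMPLETED for o in outcomes):
            return Outcome.COMPLETED
        return Outcome.PENDING


@dataclass
class OrGoal:
    """Disjunction: completes as soon as one sub-goal completes, fails only if all fail."""
    name: str
    subgoals: list = field(default_factory=list)

    def evaluate(self) -> Outcome:
        outcomes = [g.evaluate() for g in self.subgoals]
        if Outcome.COMPLETED in outcomes:
            return Outcome.COMPLETED
        if all(o is Outcome.FAILED for o in outcomes):
            return Outcome.FAILED
        return Outcome.PENDING


# Hypothetical task: fetch a part from either shelf, then drive to a station.
fetch = OrGoal("fetch-part", [Goal("from-shelf-1"), Goal("from-shelf-2")])
deliver = AndGoal("deliver", [fetch, Goal("drive-to-station")])
fetch.subgoals[0].outcome = Outcome.FAILED
print(deliver.evaluate())  # Outcome.PENDING (shelf 2 is still an option)
```

Because compound goals only inspect their children's outcomes, they nest freely, which mirrors how the abstract describes sub-goals being combined into larger goals.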

POMDP-based Candy Server: Lessons Learned from a Seven Day Demo • Marcus Hoerger, Joshua Mun Liang Song, Hanna Kurniawati, Alberto Elfes
An autonomous robot must decide on a good strategy to achieve its long-term goal, despite various types of uncertainty. The Partially Observable Markov Decision Process (POMDP) is a principled framework for such decision-making problems. Despite the computational intractability of solving POMDPs, the past decade has seen substantial advances in POMDP solvers. This paper presents our experience in enabling online POMDP solving to become the sole motion planner for a robot manipulation demo at IEEE SIMPAR and ICRA 2018. The demo scenario is a candy-serving robot: a 6-DOF robot arm must pick up a cup placed on a table by a user, use the cup to scoop candies from a box, and put the cup of candies back on the table. The average perception error is ~3 cm (≈ the radius of the cup), affecting the position of the cup and the surface level of the candies. This paper presents a strategy to alleviate the curse of history that plagues this scenario, the perception system and its integration with the planner, and lessons learned in enabling an online POMDP solver to become the sole motion planner for the entire task. The POMDP-based system was tested in a seven-day live demo at the two conferences. In this demo, 150 runs were attempted and 98% of them were successful. We also conducted further experiments to test the capability of our POMDP-based system when the environment is relatively cluttered with obstacles and when the user moves the cup while the robot tries to pick it up. In both cases, our POMDP-based system achieves a success rate of 90% or above.

POMHDP: Search-based Belief Space Planning using Multiple Heuristics • Sung-Kyun Kim, Oren Salzman, Maxim Likhachev
Robots operating in the real world encounter substantial uncertainty that cannot be modeled deterministically before the actual execution. This gives rise to the need for robust motion planning under uncertainty, also known as belief space planning. Belief space planning can be formulated as a Partially Observable Markov Decision Process (POMDP). However, computing optimal policies for non-trivial POMDPs is computationally intractable. Building upon recent progress from the search community, we propose a novel anytime POMDP solver, Partially Observable Multi-Heuristic Dynamic Programming (POMHDP), that leverages multiple heuristics to efficiently compute high-quality solutions while guaranteeing asymptotic convergence to an optimal policy. Through iterative forward search, POMHDP utilizes domain knowledge to solve POMDPs with specific goals and an infinite horizon. We demonstrate the efficacy of our proposed framework on a real-world, highly complex truck unloading application.

Trajectory Tracking Control for Robotic Vehicles using Counterexample Guided Training of Neural Networks • Arthur Clavière, Souradeep Dutta, Sriram Sankaranarayanan
We investigate approaches to train neural networks for controlling vehicles to follow a fixed reference trajectory robustly, while respecting limits on their velocities and accelerations. Here robustness means that if a vehicle starts inside a fixed region around the reference trajectory, it remains within this region while moving along the reference from an initial set to a target set. We consider the combination of two ideas in this paper: (a) demonstrations of the correct control obtained from a model-predictive controller (MPC) and (b) falsification approaches that actively search for violations of the property, given a current candidate. Thus, our approach builds an initial training set using the MPC loop and creates a first candidate neural network controller. This controller is repeatedly analyzed using falsification that searches for counterexample trajectories, and the resulting counterexamples are used to create new training examples. This process proceeds iteratively until the falsifier no longer succeeds within a given computational budget. We propose falsification approaches using a combination of random sampling and gradient descent to systematically search for violations. We evaluate our combined approach on a variety of benchmarks that involve controlling dynamical models of cars and quadrotor aircraft.

Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluations via Event-based Toggles • Aditya Mandalika, Sanjiban Choudhury, Oren Salzman, Siddhartha Srinivasa · Best Student Paper Award
Lazy search algorithms are efficient at solving problems where edge evaluation is the bottleneck in computation, as is the case in robotic motion planning. The optimal algorithm in this class, LazySP, does so by lazily restricting edge evaluation to the shortest path only. However, this comes at the expense of search effort: LazySP has to recompute the search tree every time an edge is found to be invalid. This can become prohibitively expensive in large graphs or highly cluttered environments. Our key insight is that edge evaluation and search effort must be balanced to minimize the total planning time. Our contribution is twofold. First, we propose a framework, Generalized Lazy Search (GLS), that seamlessly toggles between search and evaluation to manage wasted effort. We show that for a particular choice of toggle, GLS is provably more efficient than LazySP. Second, we leverage prior experience in the form of edge probabilities to derive policies within the GLS framework that minimize expected planning time. We show that GLS armed with such priors significantly outperforms competitive baselines on a number of simulated environments in ℝ² and 7-DoF manipulation.
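
The LazySP behavior that GLS generalizes (plan on optimistic edges, evaluate only edges on the current shortest path, re-plan after each invalid edge) can be sketched in a few lines of Python. This is a minimal illustration, not GLS: it assumes an undirected weighted graph, a forward edge selector, and a cheap `is_valid` callback standing in for the expensive collision check; all names and the example graph are hypothetical.

```python
import heapq


def shortest_path(nodes, edges, start, goal):
    """Dijkstra over the edges not yet known to be invalid; returns a node list or None."""
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))  # undirected graph
    dist, parent = {start: 0}, {start: None}
    frontier = [(0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(frontier, (d + w, v))
    return None


def lazy_sp(nodes, edges, start, goal, is_valid):
    """LazySP-style loop. Returns (path or None, number of edge evaluations)."""
    edges = dict(edges)  # edges not yet known to be invalid
    valid = set()        # edges already evaluated and found valid
    evaluations = 0
    while True:
        path = shortest_path(nodes, edges, start, goal)  # search effort
        if path is None:
            return None, evaluations
        for u, v in zip(path, path[1:]):  # forward edge selector
            key = (u, v) if (u, v) in edges else (v, u)
            if key in valid:
                continue
            evaluations += 1  # the expensive step lazy search tries to minimize
            if is_valid(*key):
                valid.add(key)
            else:
                del edges[key]  # prune, then re-plan from scratch
                break
        else:
            return path, evaluations  # every edge on the path is valid


nodes = ["S", "A", "B", "G"]
edges = {("S", "A"): 1, ("A", "G"): 1, ("S", "B"): 2, ("B", "G"): 2}
blocked = {frozenset(("A", "G"))}  # hypothetical collision
print(lazy_sp(nodes, edges, "S", "G", lambda u, v: frozenset((u, v)) not in blocked))
# (['S', 'B', 'G'], 4)
```

Each `break` triggers a full Dijkstra re-run; that repeated search effort is exactly the cost the GLS abstract proposes to balance against edge evaluations.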

Open-world Reasoning for Service Robots • Yuqian Jiang, Nick Walker, Justin Hart, Peter Stone
A service robot accepting verbal commands from a human operator is likely to encounter requests that reference objects not currently represented in its knowledge base. In domestic or office settings, constructing a complete knowledge base would be cumbersome and unlikely to succeed in most real-world deployments. The world that such a robot operates in is thus “open” in the sense that some objects that it must act on in the real world are not described in its internal representation. When an operator gives a command referencing an object that the robot has not yet observed (and thus has not incorporated into its knowledge base), we can think of the object as being hypothetical to the robot. This paper presents a novel method for closing the robot’s world model for planning purposes by introducing hypothetical objects into the robot’s knowledge base, reasoning about these hypothetical objects, and acting on these hypotheses in the real world. We use our implementation of this method on a domestic service robot as an illustrative demonstration to explore how it works in practice.

Speeding Up Search-based Motion Planning via Conservative Heuristics • Ishani Chatterjee, Maxim Likhachev, Ashwin Khadke, Manuela Veloso
Weighted A* search (wA*) is a popular tool for robot motion planning. Its efficiency, however, depends on the quality of the heuristic function used. In fact, it has been shown that the correlation between the heuristic function and the true cost-to-goal heavily affects the efficiency of the search when a large weight is placed on the heuristic. Motivated by this observation, we investigate the problem of computing heuristics that explicitly aim to minimize the amount of effort the search has to do to find a feasible plan. The key observation we exploit is that while heuristics try to guide the search along what looks like an optimal path towards the goal, there are other paths that are clearly suboptimal yet much easier to compute. For example, in motion planning domains like footstep planning for humanoids, a heuristic that guides the search along a path away from obstacles is less likely to encounter local minima than a heuristic that guides the search along an optimal but close-to-obstacles path. We use this observation to define the concept of conservative heuristics and propose a simple algorithm for computing such a heuristic function. Experimental analysis on (1) humanoid footstep planning (simulation), (2) path planning for a UAV (simulation), and (3) a real-world experiment in footstep planning for a NAO robot shows the utility of the approach.
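
The weighted A* search that the conservative-heuristics abstract builds on orders nodes by f = g + w·h, so with a large w the heuristic dominates which nodes get expanded. The Python sketch below is standard wA* on a 4-connected grid, not the paper's conservative heuristic computation; the grid encoding, the Manhattan default, and the `h` parameter (where a different heuristic could be plugged in) are assumptions for illustration.

```python
import heapq


def manhattan(cell, goal):
    """Admissible, consistent heuristic for a 4-connected unit-cost grid."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def weighted_astar(grid, start, goal, w=1.0, h=manhattan):
    """Weighted A* without re-expansions: pops cells in order of f = g + w * h.

    grid is a list of strings where '#' marks an obstacle (assumed encoding).
    Returns (path_cost, expansions); path_cost is None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}
    open_list = [(w * h(start, goal), start)]
    closed = set()
    expansions = 0
    while open_list:
        _, cell = heapq.heappop(open_list)
        if cell in closed:
            continue  # stale duplicate left in the heap
        if cell == goal:
            return g[cell], expansions
        closed.add(cell)
        expansions += 1
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            if nxt in closed:
                continue  # this wA* variant never reopens closed cells
            ng = g[cell] + 1
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                heapq.heappush(open_list, (ng + w * h(nxt, goal), nxt))
    return None, expansions
```

With w = 1 and a consistent heuristic this returns optimal costs; with w > 1 the returned cost is guaranteed to be at most w times optimal. Running the same grid with w = 1.0 and a larger w is a quick way to watch the expansion count change, and a heuristic in the spirit of the paper would be passed through `h`.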

About ICAPS

Contact
