Byron Boots

email: bboots@cs.washington.edu
office: Bill and Melinda Gates Center (CSE2) 210
telephone: (206) 616-8017

I am the Amazon Professor of Machine Learning in the Paul G. Allen School of Computer Science and Engineering (CSE) at the University of Washington where I direct the UW Robot Learning Laboratory. I am also a Principal Research Scientist in the Seattle Robotics Lab at NVIDIA Research, and I am co-chair of the IEEE Robotics and Automation Society Technical Committee on Robot Learning.

My group performs fundamental and applied research in machine learning, artificial intelligence, and robotics with a focus on developing theory and systems that tightly integrate perception, learning, and control. Our work touches on a range of problems including computer vision, state estimation, localization and mapping, high-speed navigation, motion planning, and robotic manipulation. The algorithms that we develop use and extend theory from deep learning and neural networks, nonparametric statistics, graphical models, nonconvex optimization, quantum physics, online learning, reinforcement learning, and optimal control. (See Google Scholar and Publications below.) We have been honored with several awards for our work.

Prior to joining the faculty at the University of Washington, I was an Assistant Professor in the School of Interactive Computing within the College of Computing at Georgia Tech, and, before that, I was a post-doc in the Robotics and State Estimation Lab directed by Dieter Fox at the University of Washington. I received my Ph.D. from the Machine Learning Department in the School of Computer Science at Carnegie Mellon University where I was a member of the Sense, Learn, Act (SELECT) Lab, which was co-directed by Carlos Guestrin and my advisor Geoff Gordon.

Teaching

  • CSE478: Autonomous Robotics -- Winter 2023

  • Previous Courses at UW:
  • CSE/AMATH 579: Intelligent Control Through Learning and Optimization -- Fall 2022
  • CSE478: Autonomous Robotics -- Winter 2022
  • CSEP546: Machine Learning -- Fall 2021
  • CSE446: Machine Learning -- Winter 2021
  • CSE599U: Reinforcement Learning -- Fall 2020
  • CSE599W: Reinforcement Learning -- Spring 2020
  • CSE446: Machine Learning -- Winter 2020

  • Previous Courses at Georgia Tech:
  • CS8803: Adaptive Control and Reinforcement Learning -- Spring 2019
  • CS4641/7641: Machine Learning -- Fall 2018
  • CS8803: Statistical Techniques in Robotics -- Spring 2018
  • CS8803: Statistical Techniques in Robotics -- Spring 2017
  • CS4641: Machine Learning -- Fall 2016
  • CS8803: Statistical Techniques in Robotics -- Spring 2016
  • CS8803: Statistical Techniques in Robotics -- Spring 2015
  • CS4001: Computing, Society, and Professionalism -- Fall 2014

Refereed Conference & Journal Publications
    J. Sacks & B. Boots. Learning to Optimize in Model Predictive Control. 2022 Proceedings of the 2022 IEEE Conference on Robotics and Automation (ICRA-2022)
    Abstract: Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions. Recently, significant work has focused on using machine learning to improve the performance of MPC, often by learning or fine-tuning the dynamics or cost function. In contrast, we focus on learning to optimize more effectively, i.e., on improving the update rule within MPC. We show that this is particularly useful in sampling-based MPC, where we often wish to minimize the number of samples for computational reasons. Unfortunately, this computational efficiency comes at a cost in performance: fewer samples result in noisier updates. We show that we can contend with this noise by learning how to update the control distribution more effectively, making better use of the few samples that we have. Our learned controllers are trained via imitation learning to mimic an expert that has access to substantially more samples. We test the efficacy of our approach on multiple simulated robotics tasks in sample-constrained regimes and demonstrate that our approach can outperform an MPC controller with the same number of samples.
    BibTeX:
    @inproceedings{Sacks-ICRA-22,  
    Author    = "{Sacks, Jacob and Boots, Byron}",
    booktitle =  "{IEEE} International Conference on Robotics and Automation ({ICRA}) ", 
    Title     = "{Learning to Optimize in Model Predictive Control}",   
    year      = {2022}
    }
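To make the setting concrete, here is a minimal sketch (not the paper's code) of the standard exponentially-weighted update that the learned optimizer replaces; the function name and array shapes are illustrative.

```python
import numpy as np

def mppi_update(samples, costs, temperature=1.0):
    """One exponentially-weighted (MPPI-style) update of the control
    distribution mean. Each sampled control sequence is weighted by its
    exponentiated negative cost; with only a few samples these weights
    are noisy, which is the regime the paper's learned update targets.
    (Illustrative sketch, not the paper's implementation.)"""
    costs = np.asarray(costs, dtype=float)
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # New mean = weighted average of the sampled control sequences.
    return (weights[:, None] * np.asarray(samples, dtype=float)).sum(axis=0)
```

With two samples and one far cheaper than the other, the updated mean moves almost entirely onto the low-cost sample; the paper's contribution is a learned rule that extracts more from such small sample sets.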
    S. Adhikary & B. Boots. Sampling Over Riemannian Manifolds Using Kernel Herding. 2022 Proceedings of the 2022 IEEE Conference on Robotics and Automation (ICRA-2022)
    Abstract: Kernel herding is a deterministic sampling algorithm designed to draw 'super samples' from probability distributions when provided with their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Empirical expectations of functions in the RKHS formed using these super samples tend to converge even faster than random sampling from the true distribution itself. Standard implementations of kernel herding have been restricted to sampling over flat Euclidean spaces, which is not ideal for applications such as robotics where more general Riemannian manifolds may be appropriate. We propose to adapt kernel herding to Riemannian manifolds by (1) using geometry-aware kernels that incorporate the appropriate distance metric for the manifold and (2) using Riemannian optimization to constrain herded samples to lie on the manifold. We evaluate our approach on problems involving various manifolds commonly used in robotics including the SO(3) manifold of rotation matrices, the spherical manifold used to encode unit quaternions, and the manifold of symmetric positive definite matrices. We demonstrate that our approach outperforms existing alternatives on the task of resampling from an empirical distribution of weighted particles, a problem encountered in applications such as particle filtering. We also demonstrate how Riemannian kernel herding can be used as part of the kernel recursive approximate Bayesian computation algorithm to estimate parameters of black-box simulators, including inertia matrices of an Adroit robot hand simulator. Our results confirm that exploiting geometric information through our approach to kernel herding yields better results than alternatives including standard kernel herding with heuristic projections.
    BibTeX:
    @inproceedings{Adhikary-ICRA-22,  
    Author    = "{Adhikary, Sandesh and Boots, Byron}",
    booktitle =  "{IEEE} International Conference on Robotics and Automation ({ICRA}) ", 
    Title     = "{Sampling Over Riemannian Manifolds Using Kernel Herding}",   
    year      = {2022}
    }
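The greedy herding rule itself is compact; the sketch below runs it over a finite candidate set with a Euclidean RBF kernel, i.e., the flat-space baseline the paper generalizes (all names are illustrative).

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Euclidean RBF kernel matrix between two point sets."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-gamma * d ** 2)

def kernel_herding(candidates, weights, n_samples, gamma=1.0):
    """Greedy kernel herding restricted to a finite candidate set.
    Each pick maximizes the target's kernel mean embedding minus the
    running embedding of the samples chosen so far. The paper swaps in
    geometry-aware kernels and Riemannian optimization so the same rule
    applies on manifolds. (Illustrative sketch.)"""
    X = np.asarray(candidates, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    K = rbf(X, X, gamma)
    mu = K @ w                      # kernel mean embedding at each candidate
    picked, running = [], np.zeros(len(X))
    for t in range(n_samples):
        i = int(np.argmax(mu - running / (t + 1)))
        picked.append(i)
        running += K[:, i]
    return X[picked]
```

On the paper's resampling task, `candidates` would be the weighted particles; the Riemannian version replaces `rbf` with a manifold kernel and the finite argmax with Riemannian optimization over the manifold.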
    A. Lambert, B. Hou, R. Scalise, S. Srinivasa, & B. Boots. Stein Variational Probabilistic Roadmaps. 2022 Proceedings of the 2022 IEEE Conference on Robotics and Automation (ICRA-2022)
    Abstract: Efficient and reliable generation of global path plans is necessary for safe execution and deployment of autonomous systems. In order to generate planning graphs which adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies which often fail to generalize to new and varied surroundings. Further, many of these approaches are not designed to contend with partial observability. We posit that such uncertainty in environment geometry can, in fact, help drive the sampling process in generating feasible, and probabilistically safe planning graphs. We propose a method for Probabilistic Roadmaps which relies on particle-based Variational Inference to efficiently cover the posterior distribution over feasible regions in configuration space. Our approach, Stein Variational Probabilistic Roadmap (SV-PRM), results in sample-efficient generation of planning graphs and large improvements over traditional sampling approaches. We demonstrate the approach on a variety of challenging planning problems, including real-world probabilistic occupancy maps and high-DOF manipulation problems common in robotics.
    BibTeX:
    @inproceedings{Lambert-ICRA-22,  
    Author    = "{Lambert, Alexander and Hou, Brian and Scalise, Rosario and Srinivasa, Siddhartha and Boots, Byron}",
    booktitle =  "{IEEE} International Conference on Robotics and Automation ({ICRA}) ", 
    Title     = "{Stein Variational Probabilistic Roadmaps}",   
    year      = {2022}
    }
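The particle update at the heart of SV-PRM is Stein variational gradient descent; below is a minimal Euclidean sketch, assuming an RBF kernel and a user-supplied score function grad log p (illustrative, not the paper's implementation).

```python
import numpy as np

def svgd_step(particles, grad_logp, stepsize=0.1, bandwidth=1.0):
    """One Stein variational gradient descent update with an RBF kernel.
    The kernel-weighted score term attracts particles toward high
    posterior density; the kernel-gradient term repels them from each
    other so the particle set covers the distribution -- the mechanism
    SV-PRM uses to cover feasible configuration space. (Sketch only.)"""
    X = np.asarray(particles, dtype=float)
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]            # pairwise x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))
    scores = np.stack([grad_logp(x) for x in X])    # grad log p at each particle
    attract = K @ scores
    repel = (K[:, :, None] * diff).sum(axis=1) / bandwidth ** 2
    return X + stepsize * (attract + repel) / n
```

Iterating this step drives the particles toward a sample-based approximation of the posterior over feasible regions.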
    H. Nichols, M. Jimenez, Z. Goddard, M. Sparapany, B. Boots, & A. Mazumdar. Adversarial Sampling-Based Motion Planning. 2022 IEEE Robotics and Automation Letters (Presented at ICRA-2022)
    Abstract: There are many scenarios in which a mobile agent may not want its path to be predictable. Examples include preserving privacy or confusing an adversary. However, this desire for deception can conflict with the need for a low path cost. Optimal plans such as those produced by RRT* may have low path cost, but their optimality makes them predictable. Similarly, a deceptive path that features numerous zig-zags may take too long to reach the goal. We address this trade-off by drawing inspiration from adversarial machine learning. We propose a new planning algorithm, which we title Adversarial RRT*. Adversarial RRT* attempts to deceive machine learning classifiers by incorporating a predicted measure of deception into the planner cost function. Adversarial RRT* considers both path cost and a measure of predicted deceptiveness in order to produce a low-cost trajectory that still has deceptive properties. We demonstrate the performance of Adversarial RRT*, with entropy as a measure of deception, using a Dubins car, and show how the percentage of paths misclassified increases from 18% to 53% while keeping path cost within 21% of the optimal planner. In addition, Adversarial RRT* is able to deceive a separate classifier that was designed independently. The reduction in classification accuracy from 48% to 30% for this separate classifier illustrates that the proposed methods may have broader deceptive performance.
    BibTeX:
    @inproceedings{Nichols-RAL-22,  
    Author    = "{Nichols, Hayden and Jimenez, Mark and Goddard, Zachary and Sparapany, Michael and Boots, Byron and Mazumdar, Anirban}",
    booktitle =  "IEEE Robotics and Automation Letters ", 
    Title     = "{Adversarial Sampling-Based Motion Planning}",   
    year      = {2022}
    }
    K. Van Wyk, M. Xie, A. Li, M. Rana, B. Babich, B. Peele, Q. Wan, I. Akinola, B. Sundaralingam, D. Fox, B. Boots, & N. Ratliff. Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior. 2022 IEEE Robotics and Automation Letters (Presented at ICRA-2022)
    Abstract: Classical mechanical systems are central to controller design in energy shaping methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design.
    BibTeX:
    @inproceedings{VanWyk-RAL-22,  
    Author    = "{Van Wyk, Karl and Xie, Man and Li, Anqi and Rana, Muhammad and Babich, Buck and Peele, Bryan and Wan, Qian and Akinola, Iretiayo and Sundaralingam, Balakumar and Fox, Dieter  and Boots, Byron and Ratliff, Nathan }",
    booktitle =  "IEEE Robotics and Automation Letters ", 
    Title     = "{Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior}",   
    year      = {2022}
    }
    M. Bhardwaj, S. Choudhury, B. Boots, & S. Srinivasa. Leveraging Experience in Lazy Search. 2021 Autonomous Robots (AURO)
    Abstract: Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminates future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework as well as regret guarantees on our imitation-learning-based approach to motion planning.
    BibTeX:
    @inproceedings{Bhardwaj-AURO-21,  
    Author    = "{Bhardwaj, Mohak and Choudhury, Sanjiban and Boots, Byron and Srinivasa, Siddhartha}",
    booktitle =  "Autonomous Robots ({AURO})", 
    Title     = "{Leveraging Experience in Lazy Search}",   
    year      = {2021}
    }
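The lazy search loop that the paper learns a selector for can be sketched as follows; the default selector here simply picks the first unevaluated edge, which is the slot the learned policy replaces (all names are illustrative).

```python
import heapq

def dijkstra(adj, src, dst, blocked):
    """Shortest path treating every non-blocked edge as feasible."""
    pq, seen = [(0.0, src, [src])], set()
    while pq:
        d, u, path = heapq.heappop(pq)
        if u == dst:
            return path
        if u in seen:
            continue
        seen.add(u)
        for v, w in adj.get(u, {}).items():
            if (u, v) not in blocked and v not in seen:
                heapq.heappush(pq, (d + w, v, path + [v]))
    return None

def lazy_sp(adj, src, dst, is_valid, select=lambda edges: edges[0]):
    """Lazy shortest-path search with a pluggable edge selector.
    Repeatedly find the shortest potentially feasible path, evaluate one
    edge on it (the expensive collision check), and prune invalid edges.
    The paper learns `select` by imitating clairvoyant oracles."""
    evaluated, blocked = set(), set()
    while True:
        path = dijkstra(adj, src, dst, blocked)
        if path is None:
            return None
        pending = [(u, v) for u, v in zip(path, path[1:])
                   if (u, v) not in evaluated]
        if not pending:
            return path          # every edge on the path checked out
        edge = select(pending)   # edge-selection policy
        evaluated.add(edge)
        if not is_valid(edge):
            blocked.add(edge)
```

A better `select` invalidates many candidate paths per evaluation, which is exactly what the learned selector is trained to do.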
    A. Shaban, A. Rahimi, T. Ajanthan, B. Boots, & Richard Hartley. Few-shot Weakly-Supervised Object Detection via Directional Statistics. 2021 Winter Conference on Applications of Computer Vision (WACV)
    Abstract: Detecting novel objects from few examples has become an emerging topic in computer vision recently. However, these methods need fully annotated training images to learn new object categories which limits their applicability in real world scenarios such as field robotics. In this work, we propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD). In these tasks, only image-level labels, which are much cheaper to acquire, are available. We find that operating on features extracted from the last layer of a pre-trained Faster-RCNN is more effective compared to previous episodic learning based few-shot COL methods. Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps. As a probabilistic model, we employ von Mises-Fisher (vMF) distribution which captures the semantic information better than Gaussian distribution when applied to the pre-trained embedding space. When the novel objects are localized, we utilize them to learn a linear appearance model to detect novel classes in new images. Our extensive experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
    BibTeX:
    @inproceedings{Shaban-WACV-21,  
    Author    = "{Shaban, Amirreza and Rahimi, Amir and Ajanthan, Thalaiyasingam and Boots, Byron and Hartley, Richard}",
    booktitle =  "Winter Conference on Applications of Computer Vision (WACV)", 
    Title     = "{Few-shot Weakly-Supervised Object Detection via Directional Statistics}",   
    year      = {2021}
    }
    Y. Yang, T. Zhang, E. Coumans, J. Tan, & B. Boots. Fast and Efficient Locomotion via Learned Gait Transitions. (Finalist for Best Systems Paper) 2021 Proceedings of the 5th Annual Conference on Robot Learning (CoRL-2021)
    Abstract: We focus on the problem of developing energy efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use reinforcement learning to train a high-level gait policy that specifies gait patterns of each foot, while the low-level whole-body controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speed than baseline controllers.
    BibTeX:
    @inproceedings{Yang-CoRL-21,  
    Author    = "{Yang, Yuxiang and Zhang, Tingnan and Coumans, Erwin and Tan, Jie and Boots, Byron}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{Fast and Efficient Locomotion via Learned Gait Transitions}",   
    year      = {2021}
    }
    M. Bhardwaj, B. Sundaralingam, A. Mousavian, N. Ratliff, D. Fox, F. Ramos, & B. Boots. STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation. (Selected for Oral Presentation) 2021 Proceedings of the 5th Annual Conference on Robot Learning (CoRL-2021)
    Abstract: Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a system for fast, joint space sampling-based MPC for manipulators that is efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 8 ms (125 Hz) to compute the next control command. Further, our method can tightly integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of dynamic manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. We also provide highly optimized, open-source code to be used by the wider robot learning and control community.
    BibTeX:
    @inproceedings{Bhardwaj-CoRL-21,  
    Author    = "{Bhardwaj, Mohak and Sundaralingam, Balakumar and Mousavian, Arsalan and Ratliff, Nathan and Fox, Dieter and Ramos, Fabio and Boots, Byron}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation}",   
    year      = {2021}
    }
    B. Yang, G. Habibi, P. Lancaster, B. Boots, & J. Smith. Motivating Physical Activity via Competitive Human-Robot Interaction. (Selected for Oral Presentation) 2021 Proceedings of the 5th Annual Conference on Robot Learning (CoRL-2021)
    Abstract: This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor through iterative multi-agent reinforcement learning, and show that it can perform well against human competitors. Our user study additionally found that our system was able to continuously create challenging and enjoyable interactions for humans and the majority of human subjects considered the system to be entertaining and useful for improving the quality of their exercise.
    BibTeX:
    @inproceedings{YangB-CoRL-21,  
    Author    = "{Yang, B and Habibi, Golnaz and Lancaster, Patrick and Boots, Byron and Smith, Joshua}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{Motivating Physical Activity via Competitive Human-Robot Interaction}",   
    year      = {2021}
    }
    A. Shaban, X. Meng, J. Lee, B. Boots, & D. Fox. Semantic Terrain Classification for Off-Road Autonomous Driving. 2021 Proceedings of the 5th Annual Conference on Robot Learning (CoRL-2021)
    Abstract: In this paper, we focus on the problem of estimating the traversability of terrain for autonomous off-road navigation. To produce dense and accurate local traversability maps, a robot must reason about geometric and semantic properties of the environment. To achieve this goal, we develop a novel Bird's Eye View Network (BEVNet), a deep neural network that directly predicts dense traversability maps from sparse LiDAR inputs. BEVNet processes both geometric and semantic information in a temporally consistent fashion. More importantly, it uses learned priors and history to predict traversability in unseen space and into the future, allowing a robot to better appraise its situation. We quantitatively evaluate BEVNet on both on-road and off-road scenarios, and show that it outperforms a variety of strong baselines.
    BibTeX:
    @inproceedings{Shaban-CoRL-21,  
    Author    = "{Shaban, Amirreza and Meng, Xiangyun and Lee, JoonHo and Boots, Byron and Fox, Dieter}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{Semantic Terrain Classification for Off-Road Autonomous Driving}",   
    year      = {2021}
    }
    J. Sacks, K. Choi, K. Bruss, S. Buerger, J. Su, A. Mazumdar, & B. Boots. Machine Learning Methods for Estimating Down-hole Depth of Cut. 2021 Proceedings of the 2021 Geothermal Rising Conference
    Abstract: Depth of cut (DOC) refers to the depth a bit penetrates into the rock during drilling. This is an important quantity for estimating drilling performance. In general, DOC is determined by dividing the rate of penetration (ROP) by the rotational speed. Surface based sensors at the top of the drill string are used to determine both ROP and rotational speed. However, ROP measurements using top-hole sensors are noisy and often require taking a derivative. Filtering reduces the update rate, and both top-hole linear and angular velocity can be delayed relative to downhole behavior. In this work, we describe recent progress towards estimating ROP and DOC using down-hole sensing. We assume downhole measurements of torque, weight-on-bit (WOB), and rotational speed and anticipate that these measurements are physically realizable. Our hypothesis is that these measurements can provide more rapid and accurate measures of drilling performance. We examine a range of machine learning techniques for estimating ROP and DOC based on this local sensing paradigm. We show how machine learning can provide rapid and accurate performance when evaluated on experimental data taken from Sandia’s Hard Rock Drilling Facility. These results have the potential to enable better drilling assessment, improved control, and extended component life-times.
    BibTeX:
    @inproceedings{Sacks_GR-21,  
    Author    = "{Sacks, Jacob and Choi, Kevin and Bruss, Kathryn and Su, Jiann-Cherng and Buerger, Stephen P. and Mazumdar, Anirban and  Boots, Byron}",
    booktitle =  "Proceedings of the 2021 Geothermal Rising Conference", 
    Title     = "{Machine Learning Methods for Estimating Down-hole Depth of Cut}",   
    year      = {2021}
    }
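The baseline quantity here is simple: depth of cut is the rate of penetration divided by rotational speed. A small unit-converting helper makes that concrete (illustrative sketch; the units chosen are an assumption, not the paper's convention):

```python
def depth_of_cut(rop_m_per_hr, rpm):
    """Depth of cut in mm per revolution: rate of penetration (ROP)
    divided by rotational speed, converting ROP from m/hr to mm/min.
    (Illustrative helper; the paper instead estimates ROP and DOC from
    down-hole torque, WOB, and rotational-speed measurements.)"""
    mm_per_min = rop_m_per_hr * 1000.0 / 60.0
    return mm_per_min / rpm
```

For example, a bit advancing at 6 m/hr while rotating at 100 rpm cuts 1 mm per revolution; the machine learning methods in the paper aim to recover this quantity without noisy top-hole ROP derivatives.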
    M. Rana, A. Li, D. Fox, S. Chernova, B. Boots, & N. Ratliff. Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees. 2021 Proceedings of the International Conference on Intelligent Robots and Systems (IROS-2021)
    Abstract: Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the user an interface to 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned. We derive an end-to-end learning objective function that is suitable for the multi-task problem, emphasizing the deviation of motions on task spaces. Furthermore, the motion generated from the learned policy class is guaranteed to be stable. We validate the effectiveness of our proposed learning framework through qualitative and quantitative evaluations on three robotic tasks on a 7-DOF Rethink Sawyer robot.
    BibTeX:
    @inproceedings{Rana-IROS-21,  
    Author    = "{Rana, M Asif and Li, Anqi and Fox, Dieter and Chernova, Sonia and Boots, Byron and Ratliff, Nathan}",
    booktitle =  "{IEEE/RSJ} International Conference on Intelligent Robots and Systems ({IROS})", 
    Title     = "{Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees}",   
    year      = {2021}
    }
    N. Wagener, B. Boots, & C. Cheng. Safe Reinforcement Learning Using Advantage-Based Intervention. 2021 Proceedings of the 38th International Conference on Machine Learning (ICML-2021)
    Abstract: Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention.
    BibTeX:
    @inproceedings{Wagener-ICML-21,  
    Author    = "{Wagener, Nolan and Boots, Byron and Cheng, Ching-An}",
    booktitle =  "International Conference on Machine Learning ({ICML})", 
    Title     = "{Safe Reinforcement Learning Using Advantage-Based Intervention}",   
    year      = {2021}
    }
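Schematically, the intervention mechanism executes the learner's proposed action only when an advantage-based safety criterion holds, and otherwise hands control to a known backup policy. All names below are illustrative, not the paper's API:

```python
def gated_action(state, learner_policy, backup_policy, advantage, threshold=0.0):
    """Advantage-based intervention gate, schematically: execute the
    learner's proposal only if its advantage with respect to a safety
    value function stays below a threshold; otherwise fall back to a
    known safe backup policy. (Illustrative sketch of the idea, not
    the paper's implementation.)"""
    action = learner_policy(state)
    if advantage(state, action) <= threshold:
        return action, False           # proposal accepted
    return backup_policy(state), True  # intervention triggered
```

The learner can then be trained with any off-the-shelf unconstrained RL algorithm, with the gate keeping behavior safe during training.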
    X. Yan, B. Boots, & C. Cheng. Explaining Fast Improvement in Online Imitation Learning. 2021 Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI-2021)
    Abstract: Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonstrated empirical successes in many applications and interestingly, its policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this phenomenon. We show that adopting a sufficiently expressive policy class in online IL has two benefits: both the policy improvement speed increases and the performance bias decreases.
    BibTeX:
    @inproceedings{Yan-UAI-21,  
    Author    = "{Yan, Xinyan and Boots, Byron and Cheng, Ching-An}",
    booktitle =  "International Conference on Uncertainty in Artificial Intelligence ({UAI})", 
    Title     = "{Explaining Fast Improvement in Online Imitation Learning}",   
    year      = {2021}
    }
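The online IL protocol the analysis concerns can be sketched as a DAgger-style loop, where `learn` stands in for any no-regret fitting routine (all names are illustrative):

```python
def online_imitation(expert, learn, rollout, init_policy, rounds=5):
    """DAgger-style online imitation learning loop: roll out the current
    policy, query the expert on the states visited, aggregate the
    labeled data, and refit. With a no-regret learner this provably
    approaches expert-like performance. (Illustrative sketch.)"""
    data, policy = [], init_policy
    for _ in range(rounds):
        for s in rollout(policy):
            data.append((s, expert(s)))  # expert labels on visited states
        policy = learn(data)             # fit on the aggregated dataset
    return policy
```

The paper's point is that when the policy class for `learn` is sufficiently expressive, each round improves the policy faster than the generic no-regret bound suggests.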
    L. Barcelos, A. Lambert, R. Oliveira, P. Borges, B. Boots, & Fabio Ramos. Dual Online Stein Variational Inference for Control and Dynamics. 2021 Proceedings of Robotics: Science and Systems XVII (RSS-2021)
    Abstract: Model predictive control (MPC) schemes have a proven track record for delivering aggressive and robust performance in many challenging control tasks, coping with nonlinear system dynamics, constraints, and observational noise. Despite their success, these methods often rely on simple control distributions, which can limit their performance in highly uncertain and complex environments. MPC frameworks must be able to accommodate changing distributions over system parameters, based on the most recent measurements. In this paper, we devise an implicit variational inference algorithm able to estimate distributions over model parameters and control inputs on-the-fly. The method incorporates Stein Variational gradient descent to approximate the target distributions as a collection of particles, and performs updates based on a Bayesian formulation. This enables the approximation of complex multi-modal posterior distributions, typically occurring in challenging and realistic robot navigation tasks. We demonstrate our approach on both simulated and real-world experiments requiring real-time execution in the face of dynamically changing environments.
    BibTeX:
    @inproceedings{Barcelos-RSS-21,  
    Author    = "{Barcelos, Lucas and Lambert, Alexander and Oliveira, Rafael and Borges, Paulo and Boots, Byron and Ramos, Fabio}",
    booktitle =  "Robotics: Science and Systems ({R:SS})", 
    Title     = "{Dual Online Stein Variational Inference for Control and Dynamics}",   
    year      = {2021}
    }
    A. Li, C. Cheng, A. Rana, M. Xie, K. Van Wyk, N. Ratliff, & B. Boots. RMP^2: A Structured Composable Policy Class for Robot Learning. 2021 Proceedings of Robotics: Science and Systems XVII (RSS-2021)
    Abstract: We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots. However, implementing a system for end-to-end learning of RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces to designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning that is suitable for encoding domain knowledge. Our experiments show that using the structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space.
    BibTeX:
    @inproceedings{Li-RSS-21,  
    Author    = "{Li, Anqi and Cheng, Ching-An and Rana, M Asif and Xie, Man and Van Wyk, Karl and Ratliff, Nathan and Boots, Byron}",
    booktitle =  "Robotics: Science and Systems ({R:SS})", 
    Title     = "{{RMP}$^2$: A Structured Composable Policy Class for Robot Learning}",   
    year      = {2021}
    }
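For intuition, the RMPflow combination rule that RMP2 computes with automatic differentiation can be sketched in a few lines. This is a toy version with hand-supplied Jacobians and the curvature terms omitted, not the paper's implementation: each task contributes a force f_i and a metric M_i, pulled back through its task-map Jacobian J_i and resolved into one configuration-space acceleration.

```python
import numpy as np

def combine_rmps(jacobians, forces, metrics):
    """Pull task-space RMPs (f_i, M_i) back through task-map Jacobians J_i
    and resolve them into a single configuration-space acceleration:
        a = (sum_i J_i^T M_i J_i)^+ (sum_i J_i^T f_i).
    A minimal sketch of the RMPflow combination rule; curvature terms
    and the tree-structured message passing are omitted."""
    dof = jacobians[0].shape[1]
    M = np.zeros((dof, dof))
    f = np.zeros(dof)
    for J, fi, Mi in zip(jacobians, forces, metrics):
        M += J.T @ Mi @ J   # pulled-back metric
        f += J.T @ fi       # pulled-back force
    return np.linalg.pinv(M) @ f
```

With two one-dimensional tasks, the resolved acceleration is the metric-weighted compromise between their desired accelerations, which is the behavior RMPflow generalizes to trees of task maps.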
    L. Ke, J. Wang, T. Bhattacharjee, B. Boots, & S. Srinivasa. Grasping with Chopsticks: Combating Covariate Shift in Model-free Imitation Learning for Fine Manipulation. (48% Acceptance Rate) 2021 Proceedings of the 2021 IEEE Conference on Robotics and Automation (ICRA-2021)
Abstract: Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine manipulation, we explore model-free imitation learning, which traditionally suffers from the covariate shift phenomenon that causes poor generalization. We propose two approaches to reduce covariate shift, neither of which requires access to an interactive expert or a model, unlike previous approaches. First, we alleviate single-step prediction errors by applying an invariant operator to increase the data support at critical steps for grasping. Second, we generate synthetic corrective labels by adding bounded noise and combining parametric and non-parametric methods to prevent error accumulation. We demonstrate our methods on a real chopstick-equipped robot that we built, and observe the agent's success rate increase from 37.3% to 80%, which is comparable to the human expert performance of 82.6%.
    BibTeX:
    @inproceedings{Ke-ICRA-21,  
    Author    = "{Ke, Liyiming and Wang, Jingqiang and Bhattacharjee, Tapomayukh and Boots, Byron and Srinivasa, Siddhartha}",
    booktitle =  "{IEEE} International Conference on Robotics and Automation ({ICRA})", 
    Title     = "{Grasping with Chopsticks: Combating Covariate Shift in Model-free Imitation Learning for Fine Manipulation}",   
    year      = {2021}
    }
    N. Hatch & B. Boots. The Value of Planning for Infinite-Horizon Model Predictive Control. (48% Acceptance Rate) 2021 Proceedings of the 2021 IEEE Conference on Robotics and Automation (ICRA-2021)
Abstract: Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, will result in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MPC. If the value function is accurate, it effectively allows MPC to reason over an infinite horizon. Unfortunately, Reinforcement Learning (RL) solutions to value function approximation can be difficult to realize for robotics tasks. In this paper, we suggest a more efficient method for value function approximation that applies to goal-directed problems, like reaching and navigation. In these problems, MPC is often formulated to track a path or trajectory returned by a planner. However, this strategy is brittle in that unexpected perturbations to the robot will require replanning, which can be costly at runtime. Instead, we show how the intermediate data structures used by modern planners can be interpreted as an approximate value function. We show that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
    BibTeX:
    @inproceedings{Hatch-ICRA-21,  
    Author    = "{Hatch, Nathan and Boots, Byron}",
    booktitle =  "{IEEE} International Conference on Robotics and Automation ({ICRA})", 
    Title     = "{The Value of Planning for Infinite-Horizon Model Predictive Control}",   
    year      = {2021}
    }
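The idea of reusing a planner's intermediate data structure as an MPC terminal value can be illustrated on a toy grid world (a sketch with made-up dynamics, not the paper's system): backward Dijkstra from the goal produces a cost-to-go map over the whole workspace, which a short-horizon MPC can query directly instead of tracking a single planned path.

```python
import heapq

def cost_to_go(grid, goal):
    """Backward Dijkstra from the goal on a 4-connected grid (0 = free,
    1 = obstacle). The resulting distance map is exactly a planner-derived
    approximate value function: it covers every reachable state, not just
    one path, so no replanning is needed after a perturbation."""
    rows, cols = len(grid), len(grid[0])
    V = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > V[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1.0
                if nd < V.get((nr, nc), float("inf")):
                    V[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return V

def one_step_mpc(V, state):
    """A one-step-horizon MPC that uses V as its terminal cost."""
    r, c = state
    moves = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    return min((s for s in moves if s in V), key=lambda s: 1.0 + V[s])
```

Because V is defined everywhere, the controller recovers from any perturbation by simply descending the value map again.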
    S. Adhikary, S. Srinivasan, J. Miller, G. Rabusseau, & B. Boots. Quantum Tensor Networks, Stochastic Processes, and Weighted Automata. (30% Acceptance Rate) 2021 Proceedings of the 24th Conference on Artificial Intelligence and Statistics (AISTATS-2021)
Abstract: Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
    BibTeX:
    @inproceedings{Adhikary-AISTATS-21,  
    Author    = "{Adhikary, Sandesh and Srinivasan, Siddarth and Miller, Jacob and Rabusseau, Guillaume and Boots, Byron}",
    booktitle =  "International Conference on Artificial Intelligence and Statistics ({AISTATS})", 
    Title     = "{Quantum Tensor Networks, Stochastic Processes, and Weighted Automata}",   
    year      = {2021}
    }
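A minimal illustration of the shared form these equivalences rest on: a hidden Markov model's sequence probabilities can be written as a product of observation-indexed matrices, which is exactly the shape of a stochastic weighted automaton or a uniform matrix product state. The toy parameters below are made up for the example.

```python
import numpy as np

# Toy 2-state HMM written as a stochastic weighted automaton:
#   p(x_1 .. x_n) = pi^T A_{x_1} ... A_{x_n} 1,   A_o = diag(O[:, o]) @ T
pi = np.array([0.6, 0.4])                      # initial state distribution
T = np.array([[0.7, 0.3], [0.2, 0.8]])         # T[i, j] = p(s'=j | s=i)
O = np.array([[0.9, 0.1], [0.3, 0.7]])         # O[i, o] = p(obs=o | s=i)
A = [np.diag(O[:, o]) @ T for o in range(2)]   # observable operators

def seq_prob(seq):
    """Probability of an observation sequence as a product of matrices."""
    v = pi.copy()
    for o in seq:
        v = v @ A[o]
    return v.sum()
```

Summing over all sequences of a fixed length recovers total probability 1, since the operators A_o sum to the (row-stochastic) transition matrix.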
    M. Bhardwaj, S. Choudhury, & B. Boots. Blending MPC & Value Function Approximation for Efficient Reinforcement Learning. (29% Acceptance Rate) 2021 Proceedings of the 9th International Conference on Learning Representations (ICLR-2021)
Abstract: Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter λ, similar to the trace decay parameter in TD(λ), we can systematically trade off learned value estimates against the local Q-function approximations. We present a theoretical analysis that shows how error from inaccurate models in MPC and value function estimation in RL can be balanced. We further propose an algorithm that changes λ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy of our approach on challenging high-dimensional manipulation tasks with biased models in simulation. We demonstrate that our approach can obtain performance comparable with MPC with access to true dynamics even under severe model bias and is more sample efficient as compared to model-free RL.
    BibTeX:
    @inproceedings{Bhardwaj-ICLR-21,  
    Author    = "{Bhardwaj, Mohak and Choudhury, Sanjiban and Boots, Byron}",
    booktitle =  "International Conference on Learning Representations ({ICLR})", 
    Title     = "{Blending MPC {\&} Value Function Approximation for Efficient Reinforcement Learning}",   
    year      = {2021}
    }
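The λ-weighted blend described above can be sketched in a few lines. This is a toy illustration of the TD(λ)-style geometric weighting between h-step model-based returns and a learned terminal value, not the paper's implementation; the costs, values, and horizon here are made up.

```python
def blended_estimate(costs, values, lam):
    """Blend h-step model returns with a learned value, TD(lambda)-style.
    costs[t]  : model-predicted cost at step t of an H-step MPC rollout
    values[h] : learned value of the state reached after h+1 steps
    lam = 0 trusts the learned value after one step; lam = 1 trusts the
    model for the full horizon (plain MPC with a terminal value)."""
    H = len(costs)
    q, running = [], 0.0
    for h in range(1, H + 1):
        running += costs[h - 1]
        q.append(running + values[h - 1])   # h-step estimate
    est = 0.0
    for h in range(1, H):
        est += (1 - lam) * lam ** (h - 1) * q[h - 1]
    est += lam ** (H - 1) * q[H - 1]        # remaining mass on full horizon
    return est
```

Annealing `lam` toward 0 over training shifts trust from the (possibly biased) model toward the improving value estimates, which is the scheduling idea in the paper.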
    A. Lambert, A. Fishman, D. Fox, B. Boots, & Fabio Ramos. Stein Variational Model Predictive Control.
    (34% Acceptance Rate)
    2020 Proceedings of the 4th Annual Conference on Robot Learning
    (CoRL-2020)
    Abstract: Decision making under uncertainty is critical to real-world, autonomous systems. Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex probability distributions. In this paper, we propose a generalization of MPC that represents a multitude of solutions as posterior distributions. By casting MPC as a Bayesian inference problem, we employ variational methods for posterior computation, naturally encoding the complexity and multi-modality of the decision making problem. We propose a Stein variational gradient descent method to estimate the posterior over control parameters, given a cost function and a sequence of state observations. We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
    BibTeX:
    @inproceedings{Lambert-CoRL-20,  
    Author    = "{Lambert, Alexander and Fishman, Adam and Fox, Dieter and Boots, Byron and Ramos, Fabio}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{Stein Variational Model Predictive Control}",   
    year      = {2020}
    }
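The core update the paper applies to control parameters is Stein variational gradient descent. A generic one-dimensional SVGD sketch (toy target distribution, not the paper's control posterior) shows the two forces at work: a kernel-weighted attraction toward high density and a repulsion that keeps the particle set multi-modal rather than collapsing to a single solution.

```python
import math

def svgd_step(particles, grad_logp, step=0.1, h=1.0):
    """One SVGD update with an RBF kernel of bandwidth h.
    The first term pulls particles toward high-probability regions;
    the second (kernel gradient) term pushes particles apart."""
    n = len(particles)
    out = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * h))
            phi += k * grad_logp(xj) + k * (xi - xj) / h
        out.append(xi + step * phi / n)
    return out
```

Run on a standard normal target (grad log p(x) = -x), the particles drift toward the mode while the repulsive term keeps them spread out, approximating the posterior rather than a point estimate.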
    X. Da, Z. Xie, D. Hoeller, B. Boots, A. Anandkumar, Y. Zhu, B. Babich, & A. Garg. Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion. (34% Acceptance Rate) 2020 Proceedings of the 4th Annual Conference on Robot Learning
    (CoRL-2020)
Abstract: We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute the primitives. Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85 percent more energy efficient and is more robust compared to baseline methods. We also deploy the controller on a physical robot without any randomization or adaptation scheme.
    BibTeX:
    @inproceedings{Da-CoRL-20,  
    Author    = "{Da, Xingye and Xie, Zhaoming and Hoeller, David and Boots, Byron and Anandkumar, Animashree and Zhu, Yuke and Babich, Buck and Garg, Animesh}",
    booktitle =  "Conference on Robot Learning ({CoRL})", 
    Title     = "{Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion}",   
    year      = {2020}
    }
    H. Izadinia, B. Boots, & S. Seitz. Nonprehensile Riemannian Motion Predictive Control. 2020 Proceedings of the 17th International Symposium on Experimental Robotics.
    (ISER-2020)
    Abstract: Nonprehensile manipulation involves long horizon underactuated object interactions and physical contact with different objects that can inherently introduce a high degree of uncertainty. In this work, we introduce a novel Real-to-Sim reward analysis technique, called Riemannian Motion Predictive Control (RMPC), to reliably imagine and predict the outcome of taking possible actions for a real robotic platform. Our proposed RMPC benefits from Riemannian motion policy and second order dynamic model to compute the acceleration command and control the robot at every location on the surface. Our approach creates a 3D object-level recomposed model of the real scene where we can simulate the effect of different trajectories. We produce a closed-loop controller to reactively push objects in a continuous action space. We evaluate the performance of our RMPC approach by conducting experiments on a real robot platform as well as simulation and compare against several baselines. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
    BibTeX:
@inproceedings{Izadinia-ISER-20,  
Author    = "{Izadinia, Hamid and Boots, Byron and Seitz, Steven}",
    booktitle =  "International Symposium on Experimental Robotics ({ISER})", 
    Title     = "{Nonprehensile {R}iemannian Motion Predictive Control}",   
    year      = {2020}
    }
    J. Guerin, S. Thiery, E. Nyiri, O. Gibaru, & B. Boots. Combining pretrained CNN feature extractors to enhance clustering of complex natural images. 2020 Neurocomputing
Abstract: Recently, a common starting point for solving complex unsupervised image classification tasks is to use generic features, extracted with deep Convolutional Neural Networks (CNN) pretrained on a large and versatile dataset (ImageNet). However, in most research, the CNN architecture for feature extraction is chosen arbitrarily, without justification. This paper aims at providing insight on the use of pretrained CNN features for image clustering (IC). First, extensive experiments are conducted and show that, for a given dataset, the choice of the CNN architecture for feature extraction has a huge impact on the final clustering. These experiments also demonstrate that proper extractor selection for a given IC task is difficult. To solve this issue, we propose to rephrase the IC problem as a multi-view clustering (MVC) problem that considers features extracted from different architectures as different “views” of the same data. This approach is based on the assumption that information contained in the different CNNs may be complementary, even when pretrained on the same data. We then propose a multi-input neural network architecture that is trained end-to-end to solve the MVC problem effectively. This approach is tested on eight natural image datasets, and produces state-of-the-art results for IC.
    BibTeX:
    @inproceedings{Guerin-Neurocomputing-20,  
Author    = "{Guerin, Joris and Thiery, Stephane and Nyiri, Eric and Gibaru, Olivier and Boots, Byron}",
booktitle =  "Neurocomputing ", 
Title     = "{Combining Pretrained CNN Feature Extractors to Enhance Clustering of Complex Natural Images}",   
    year      = {2020}
    }
    A. Rahimi, A. Shaban, C. Cheng, B. Boots, & R. Hartley. Intra Order-Preserving Functions for Calibration of Multi-Class Neural Networks. (20% Acceptance Rate) 2020 Proceedings of Advances in Neural Information Processing Systems 34 (NeurIPS-2020)
    Abstract: Predicting calibrated confidence scores for multi-class deep networks is important for avoiding rare but costly mistakes. A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores while maintaining the network's accuracy. However, previous post-hoc calibration techniques work only with simple calibration functions, potentially lacking sufficient representation to calibrate the complex function landscape of deep networks. In this work, we aim to learn general post-hoc calibration functions that can preserve the top-k predictions of any deep network. We call this family of functions intra order-preserving functions. We propose a new neural network architecture that represents a class of intra order-preserving functions by combining common neural network components. Additionally, we introduce order-invariant and diagonal sub-families, which can act as regularization for better generalization when the training data size is small. We show the effectiveness of the proposed method across a wide range of datasets and classifiers. Our method outperforms state-of-the-art post-hoc calibration methods, namely temperature scaling and Dirichlet calibration, in multiple settings.
    BibTeX:
    @inproceedings{Shaban-NeurIPS-20,  
    Author    = "{Rahimi, Amir and Shaban, Amirreza and Cheng, Ching-An and Boots, Byron and Hartley, Richard}",
    booktitle =  "Advances in Neural Information Processing Systems ({NeurIPS}) ", 
    Title     = "{Intra Order-Preserving Functions for Calibration of Multi-Class Neural Networks}",   
    year      = {2020}
    }
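For context on the family of functions the paper generalizes: temperature scaling, the baseline named in the abstract, is the simplest intra order-preserving map. Rescaling all logits by a single temperature T > 0 leaves the class ranking (and hence accuracy) unchanged while softening or sharpening confidences. A minimal sketch, with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax. Dividing logits by T > 0 is a monotone
    map, so the ranking of classes (top-k predictions) is preserved;
    T > 1 softens the confidence scores, T < 1 sharpens them."""
    z = [l / temperature for l in logits]
    m = max(z)                                # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]
```

The paper's architecture learns far richer calibration maps, but each must satisfy the same order-preservation property this one-parameter map has trivially.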
    A. Rahimi, A. Shaban, T. Ajanthan, R. Hartley, & B. Boots. Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization. (27% Acceptance Rate) 2020 Proceedings of the European Conference on Computer Vision
    (ECCV-2020)
Abstract: Weakly Supervised Object Localization (WSOL) methods have become increasingly popular since they only require image level labels as opposed to expensive bounding box annotations required by fully supervised algorithms. Typically, a WSOL model is first trained to predict class generic objectness scores on an off-the-shelf fully supervised source dataset and then it is progressively adapted to learn the objects in the weakly supervised target dataset. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer and propose to learn a classwise pairwise similarity function that directly compares two input proposals as well. The combined localization model and the estimated object annotations are jointly learned in an alternating optimization paradigm as is typically done in standard WSOL methods. In contrast to the existing work that learns pairwise similarities, our proposed approach optimizes a unified objective with convergence guarantee and it is computationally efficient for large-scale applications. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function. For instance, in the ILSVRC dataset, the Correct Localization (CorLoc) performance improves from 72.7% to 78.2%, which is a new state-of-the-art for the weakly supervised object localization task.
    BibTeX:
    @inproceedings{Shaban-ECCV-20,  
    Author    = "{Rahimi, Amir and Shaban, Amirreza and Ajanthan, Thalaiyasingam and Hartley, Richard and Boots, Byron}",
    booktitle =  "European Conference on Computer Vision ({ECCV}) ", 
    Title     = "{Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization}",   
    year      = {2020}
    }
    A. Fishman, C. Paxton, W. Yang, D. Fox, B. Boots, & N. Ratliff. Collaborative Interaction Models for Optimized Human Robot Teamwork. (47% Acceptance Rate) 2020 Proceedings of the International Conference on Intelligent Robots and Systems (IROS-2020)
Abstract: Effective human-robot collaboration requires informed anticipation. The robot must simultaneously anticipate what the human will do and react quickly and intuitively when its predictions are wrong. Additionally, the robot must plan its actions to account for the human’s own plan, but with the knowledge that the human’s behavior will change based on what the robot actually does. This cyclical game of predicting a human’s future actions and generating a corresponding motion plan is extremely difficult to model using standard techniques. In this work, we describe a novel framework for finding optimal trajectories in a multi-agent collaborative setting. We use Model Predictive Control (MPC) to simultaneously plan for the robot while predicting the actions of its external collaborators. We use human-robot handovers to demonstrate that with a strong model of the collaborator, our framework produces fluid, reactive human-robot interactions in novel, cluttered environments. Our method efficiently generates coordinated trajectories, and achieves a high success rate in handover, even in the presence of large amounts of sensor noise.
    BibTeX:
    @inproceedings{Fishman-IROS-20,  
    Author    = "{Fishman, Adam and Paxton, Chris and Yang, Wei and Fox, Dieter and Boots, Byron and Ratliff, Nathan}",
    booktitle =  "{IEEE/RSJ} International Conference on Intelligent Robots and Systems ({IROS}) ", 
    Title     = "{Collaborative Interaction Models for Optimized Human Robot Teamwork}",   
    year      = {2020}
    }
    C. Cheng, M. Mukadam, J. Issac, S. Birchfield, D. Fox, B. Boots, & N. Ratliff. RMPflow: A Geometric Framework for Generation of Multi-Task Motion Policies. 2020 IEEE Transactions on Automation Science and Engineering (T-ASE)
    Abstract: We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can consistently combine these local policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMPflow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems.
    BibTeX:
    @inproceedings{Cheng-TSAE-20,  
    Author    = "{Cheng, Ching-An and Mukadam, Mustafa and Issac, Jan and Birchfield, Stan and Fox, Dieter and Boots, Byron and Ratliff, Nathan}",
    booktitle = {IEEE Transactions on Automation Science and Engineering}, 
    Title     = "{RMPflow: A Geometric Framework for Generation of Multi-Task Motion Policies}",   
    year      = {2020}
    }
    M. Bhardwaj, Ankur Handa, Dieter Fox, & B. Boots. Information Theoretic Model Predictive Q-Learning. 2020 Proceedings of Machine Learning Research (Presented at L4DC-2020)
Abstract: Model-free Reinforcement Learning (RL) algorithms work well in sequential decision-making problems when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both of these assumptions can be violated in real world problems such as robotics, where querying the system can be prohibitively expensive and real-world dynamics can be difficult to model accurately. Although sim-to-real approaches such as domain randomization attempt to mitigate the effects of biased simulation, they can still suffer from optimization challenges such as local minima and hand-designed distributions for randomization, making it difficult to learn an accurate global value function or policy that directly transfers to the real world. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real robots in a systematic manner.
    BibTeX:
    @inproceedings{Bhardwaj-L4DC-20,  
    Author    = "Bhardwaj, Mohak and Handa, Ankur and Fox, Dieter and Boots, Byron",
    booktitle = {Proceedings of Machine Learning Research }, 
    Title     = "{Information Theoretic Model Predictive Q-Learning}",   
    year      = {2020}
    }
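Information theoretic MPC optimizes controls by weighting sampled trajectories with exponentiated negative cost; a generic MPPI-style update can be sketched as follows (a toy quadratic cost and made-up hyperparameters, not the paper's algorithm, which additionally folds in a learned Q-function):

```python
import math
import random

def mppi_update(nominal, cost_fn, n_samples=64, sigma=0.5, lam=1.0, seed=0):
    """One information-theoretic MPC update: perturb the nominal control
    sequence with Gaussian noise, weight each sample by exp(-cost/lambda),
    and return the weighted average as the new nominal sequence."""
    rng = random.Random(seed)
    samples = [[u + rng.gauss(0.0, sigma) for u in nominal]
               for _ in range(n_samples)]
    costs = [cost_fn(u) for u in samples]
    beta = min(costs)  # subtract the best cost for numerical stability
    w = [math.exp(-(c - beta) / lam) for c in costs]
    z = sum(w)
    return [sum(w[k] * samples[k][t] for k in range(n_samples)) / z
            for t in range(len(nominal))]
```

Iterating the update concentrates the nominal controls around the cost minimizer; the temperature lam controls how sharply the weighting favors low-cost samples.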
    A. Rana, A. Li, D. Fox, B. Boots, F. Ramos and N. Ratliff. Euclideanizing Flows: Diffeomorphic Reductions for Learning Stable Dynamical Systems. 2020 Proceedings of Machine Learning Research (Presented at L4DC-2020)
Abstract: Execution of complex tasks in robotics requires motions that have complex geometric structure. We present an approach which allows robots to learn such motions from a few human demonstrations. The motions are encoded as rollouts of a dynamical system on a Riemannian manifold. Additional structure is imposed which guarantees smooth convergent motions to a goal location. The aforementioned structure involves viewing motions on an observed Riemannian manifold as deformations of straight lines on a latent Euclidean space. The observed and latent spaces are related through a diffeomorphism. Thus, this paper presents an approach for learning flexible diffeomorphisms, resulting in a stable dynamical system. The efficacy of this approach is demonstrated through validation on an established benchmark as well as demonstrations collected on a real-world robotic system.
    BibTeX:
    @inproceedings{Rana-L4DC-20,  
Author    = "Rana, M Asif and Li, Anqi and Fox, Dieter and Boots, Byron and Ramos, Fabio and Ratliff, Nathan",
    booktitle = {Proceedings of Machine Learning Research }, 
    Title     = "{Euclideanizing Flows: Diffeomorphic Reductions for Learning Stable Dynamical Systems}",   
    year      = {2020}
    }
    A. Foris, N. Wagener, B. Boots, & A. Mazumdar. Exploiting Singular Configurations for Controllable, Low-Power Friction Enhancement on Unmanned Ground Vehicles. 2020 IEEE Robotics and Automation Letters (Presented at ICRA-2020)
    Abstract: This paper describes the design, validation, and performance of a new type of adaptive wheel morphology for unmanned ground vehicles. Our adaptive wheel morphology uses a spiral cam to create a system that enables controllable deployment of high friction surfaces. The overall design is modular, battery powered, and can be mounted directly to the wheels of a vehicle without additional wiring. The use of a tailored cam profile exploits a singular configuration to minimize power consumption when deployed and protects the actuator from external forces. Component-level experiments demonstrate that friction on ice and grass can be increased by up to 200%. Two prototypes were also incorporated directly into a 1:5 scale radio-controlled rally car. The devices were able to controllably deploy, increase friction, and greatly improve acceleration capacity on a slippery, synthetic ice surface.
    BibTeX:
    @inproceedings{Foris-ICRA-20,  
Author    = "Adam Foris and Nolan Wagener and Byron Boots and Anirban Mazumdar",
    booktitle = {Proceedings of the IEEE Conference on Robotics and Automation (ICRA)}, 
    Title     = "Exploiting Singular Configurations for Controllable, Low-Power Friction Enhancement on Unmanned Ground Vehicles",   
    year      = {2020}
    }
    M. Bhardwaj, B. Boots, & M. Mukadam. Differentiable Gaussian Process Motion Planning.
    (42% Acceptance Rate)
    2020 Proceedings of the 2020 IEEE Conference on Robotics and Automation (ICRA-2020)
    Abstract: Modern trajectory optimization based approaches to motion planning are fast, easy to implement, and effective on a wide range of robotics tasks. However, trajectory optimization algorithms have parameters that are typically set in advance (and rarely discussed in detail). Setting these parameters properly can have a significant impact on the practical performance of the algorithm, sometimes making the difference between finding a feasible plan or failing at the task entirely. We propose a method for leveraging past experience to learn how to automatically adapt the parameters of Gaussian Process Motion Planning (GPMP) algorithms. Specifically, we propose a differentiable extension to the GPMP2 algorithm, so that it can be trained end-to-end from data. We perform several experiments that validate our algorithm and illustrate the benefits of our proposed learning-based approach to motion planning.
    BibTeX:
    @inproceedings{Bhardwaj-ICRA-20,  
Author    = "Mohak Bhardwaj and Byron Boots and Mustafa Mukadam",
booktitle = {Proceedings of the IEEE Conference on Robotics and Automation (ICRA)}, 
Title     = "Differentiable Gaussian Process Motion Planning",   
    year      = {2020}
    }
    A. Mandlekar, F. Ramos, B. Boots, F. Li, A. Garg, & D. Fox. IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data. (42% Acceptance Rate) 2020 Proceedings of the 2020 IEEE Conference on Robotics and Automation (ICRA-2020)
    Abstract: Learning from offline task demonstrations is a problem of great interest in robotics. For simple short-horizon manipulation tasks with modest variation in task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater variations. The data can exhibit substantial diversity and consist of suboptimal solution approaches. In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets. IRIS factorizes the control problem into a goal-conditioned low-level controller that imitates short demonstration sequences and a high-level goal selection mechanism that sets goals for the low-level and selectively combines parts of suboptimal solutions leading to more successful task completions. We evaluate IRIS across three datasets, including the RoboTurk Cans dataset collected by humans via crowdsourcing, and show that performant policies can be learned from purely offline learning.
    BibTeX:
    @inproceedings{Mandlekar-ICRA-20,  
Author    = "Ajay Mandlekar and Fabio Ramos and Byron Boots and Fei-Fei Li and Animesh Garg and Dieter Fox",
    booktitle = {Proceedings of the IEEE Conference on Robotics and Automation (ICRA)}, 
    Title     = "IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data",   
    year      = {2020}
    }
    C. Cheng, R. Tachet des Combes, B. Boots, & G. Gordon. A Reduction from Reinforcement Learning to No-Regret Online Learning. 2020 Proceedings of the 23rd Conference on Artificial Intelligence and Statistics (AISTATS-2020)
    Abstract: We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of states and actions, though at the cost of potential approximation bias.
    BibTeX:
@inproceedings{Cheng20A,
Author = "Ching-An Cheng and Remi Tachet des Combes and Byron Boots and Geoff Gordon",
booktitle = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics},
    Title = "A Reduction from Reinforcement Learning to No-Regret Online Learning",
    year = {2020}
    }
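The building block of the algorithm the paper devises is mirror descent. With the negative-entropy mirror map, a mirror descent step becomes the exponentiated-gradient (multiplicative weights) update, which keeps a policy on the probability simplex by construction. A generic sketch, with a made-up per-action cost vector:

```python
import math

def mirror_descent_step(p, grad, lr=0.1):
    """Mirror descent with the negative-entropy mirror map: multiply each
    probability by exp(-lr * gradient) and renormalize. The iterate stays
    on the probability simplex without any explicit projection."""
    w = [pi * math.exp(-lr * g) for pi, g in zip(p, grad)]
    z = sum(w)
    return [wi / z for wi in w]
```

Iterating against a fixed cost gradient concentrates the distribution on the lowest-cost action, the simplex analogue of gradient descent converging to a minimizer.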
    S. Adhikary, S. Srinivasan, G. Gordon, & B. Boots. Expressiveness and Learning of Hidden Quantum Markov Models. 2020 Proceedings of the 23rd Conference on Artificial Intelligence and Statistics (AISTATS-2020)
    Abstract: Extending classical probabilistic reasoning using the quantum mechanical view of probability has been of recent interest, particularly in the development of hidden quantum Markov models (HQMMs) to model stochastic processes. However, there has been little progress in characterizing the expressiveness of such models and learning them from data. We tackle these problems by showing that HQMMs are a special subclass of the general class of observable operator models (OOMs) that do not suffer from the negative probability problem by design. We also provide a feasible retraction-based learning algorithm for HQMMs using constrained gradient descent on the Stiefel manifold of model parameters. We demonstrate that this approach is faster and scales to larger models than previous learning algorithms for HQMMs.
    BibTeX:
@inproceedings{Adhikary20A,
  author    = {Sandesh Adhikary and Siddarth Srinivasan and Geoff Gordon and Byron Boots},
  title     = {Expressiveness and Learning of Hidden Quantum Markov Models},
  booktitle = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics},
  year      = {2020}
}
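The learning algorithm above performs constrained gradient descent with retractions onto the Stiefel manifold. A generic sketch of that family of updates (not the paper's specific Riemannian gradient or retraction) takes a Euclidean gradient step and retracts back onto the manifold of matrices with orthonormal columns via a QR decomposition:

```python
import numpy as np

def qr_retraction_step(W, grad, step_size):
    """Retraction-based gradient step on the Stiefel manifold.

    Takes a Euclidean step, then retracts onto the set of matrices
    with orthonormal columns using the QR decomposition.
    """
    Y = W - step_size * grad
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the retraction is continuous in Y.
    Q = Q * np.sign(np.diag(R))
    return Q

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((5, 3)))  # start on the manifold
W_next = qr_retraction_step(W, rng.standard_normal((5, 3)), step_size=0.1)
```

The result `W_next` again has orthonormal columns, so the constraint is maintained exactly at every step.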
    C. Cheng, J. Lee, K. Goldberg, & B. Boots. Online Learning with Continuous Variations: Dynamic Regret and Reductions. 2020 Proceedings of the 23rd Conference on Artificial Intelligence and Statistics (AISTATS-2020)
    Abstract: We study the dynamic regret of a new class of online learning problems, in which the gradient of the loss function changes continuously across rounds with respect to the learner's decisions. This setup is motivated by the use of online learning as a tool to analyze the performance of iterative algorithms. Our goal is to identify interpretable dynamic regret rates that explicitly consider the loss variations as consequences of the learner's decisions as opposed to external constraints. We show that achieving sublinear dynamic regret in general is equivalent to solving certain variational inequalities, equilibrium problems, and fixed-point problems. Leveraging this identification, we present necessary and sufficient conditions for the existence of efficient algorithms that achieve sublinear dynamic regret. Furthermore, we show a reduction from dynamic regret to both static regret and convergence rate to equilibriums in the aforementioned problems, which allows us to analyze the dynamic regret of many existing learning algorithms in few steps.
    BibTeX:
@inproceedings{Cheng20B,
  author    = {Ching-An Cheng and Jonathan Lee and Ken Goldberg and Byron Boots},
  title     = {Online Learning with Continuous Variations: Dynamic Regret and Reductions},
  booktitle = {Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics},
  year      = {2020}
}
    A. H. Qureshi, J. J. Johnson, Y. Qin, T. Henderson, B. Boots, & M. C. Yip. Composing Task-Agnostic Policies with Deep Reinforcement Learning. (26% Acceptance Rate) 2020 Proceedings of the 8th International Conference on Learning Representations (ICLR-2020)
    Abstract: The composition of elementary behaviors to solve challenging transfer learning problems is one of the key elements in building intelligent machines. To date, there has been plenty of work on learning task-specific policies or skills but almost no focus on composing necessary, task-agnostic skills to find a solution to new problems. In this paper, we propose a novel deep reinforcement learning-based skill transfer and composition method that takes the agent's primitive policies to solve unseen tasks. We evaluate our method in difficult cases where training policy through standard reinforcement learning (RL) or even hierarchical RL is either not feasible or exhibits high sample complexity. We show that our method not only transfers skills to new problem settings but also solves the challenging environments requiring both task planning and motion control with high data efficiency.
    BibTeX:
@inproceedings{Qureshi20A,
  author    = {Ahmed H. Qureshi and Jacob J. Johnson and Yuzhe Qin and Taylor Henderson and Byron Boots and Michael C. Yip},
  title     = {Composing Task-Agnostic Policies with Deep Reinforcement Learning},
  booktitle = {Proceedings of the 8th International Conference on Learning Representations (ICLR)},
  year      = {2020}
}
M. A. Rana, A. Li, H. Ravichandar, M. Mukadam, S. Chernova, B. Boots, N. Ratliff, & D. Fox. Learning Reactive Motion Policies in Multiple Task Spaces from Human Demonstrations. (27% Acceptance Rate) 2019 Proceedings of the 3rd Annual Conference on Robot Learning (CoRL-2019)
    Abstract: Complex manipulation skills in constrained environments often involve several subtasks, requiring non-trivial and coordinated movements of different parts of the robot. In this work, we address the challenges associated with learning and reproducing such complex skills. We contribute a learning-based framework that simultaneously captures desired behaviors in relevant subtask spaces in the form of inherently stable reactive policies. Our approach to motion generation involves geometrically consistent combinations of all the subtask policies, resulting in a stable global policy in the configuration space. To this end, our framework explicitly considers the underlying geometry of each subtask space for policy resolution. Further, our framework allows for combinations of learned and user-specified policies. We demonstrate the necessity and efficacy of the proposed approach in the context of multiple constrained manipulation tasks performed by a Franka robot.
    BibTeX:
@inproceedings{Rana19A,
  author    = {Muhammad Asif Rana and Anqi Li and Harish Ravichandar and Mustafa Mukadam and Sonia Chernova and Dieter Fox and Byron Boots and Nathan Ratliff},
  title     = {Learning Reactive Motion Policies in Multiple Task Spaces from Human Demonstrations},
  booktitle = {Proceedings of the 2019 Conference on Robot Learning (CoRL)},
  year      = {2019}
}
M. Mukadam, C. Cheng, D. Fox, B. Boots, & N. Ratliff. Riemannian Motion Policy Fusion through Learnable Lyapunov Function Reshaping. (27% Acceptance Rate) 2019 Proceedings of the 3rd Annual Conference on Robot Learning (CoRL-2019)
    Abstract: RMPflow is a recently proposed policy-fusion framework based on differential geometry. While RMPflow has demonstrated promising performance, it requires the user to provide sensible subtask policies as Riemannian motion policies (RMPs: a motion policy and an importance matrix function), which can be a difficult design problem in its own right. We propose RMPfusion, a variation of RMPflow, to address this issue. RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment. This extra flexibility can remedy imperfect subtask RMPs provided by the user, improving the combined policy's performance. These weight functions can be learned by back-propagation. Moreover, we prove that, under mild restrictions on the weight functions, RMPfusion always yields a globally Lyapunov-stable motion policy. This implies that we can treat RMPfusion as a structured policy class in policy optimization that is guaranteed to generate stable policies, even during the immature phase of learning. We demonstrate these properties of RMPfusion in imitation learning experiments both in simulation and on a real-world robot.
    BibTeX:
@inproceedings{Mukadam19A,
  author    = {Mustafa Mukadam and Ching-An Cheng and Dieter Fox and Byron Boots and Nathan Ratliff},
  title     = {Riemannian Motion Policy Fusion through Learnable Lyapunov Function Reshaping},
  booktitle = {Proceedings of the 2019 Conference on Robot Learning (CoRL)},
  year      = {2019}
}
C. Cheng, X. Yan, & B. Boots. Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods. (27% Acceptance Rate) 2019 Proceedings of the 3rd Annual Conference on Robot Learning (CoRL-2019)
    Abstract: Policy gradient methods have demonstrated success in reinforcement learning tasks that have high-dimensional continuous state and action spaces. However, policy gradient methods are also notoriously sample inefficient. This can be attributed, at least in part, to the high variance in estimating the gradient of the task objective with Monte Carlo methods. Previous research has endeavored to contend with this problem by studying control variates (CVs) that can reduce the variance of estimates without introducing bias, including the early use of baselines, state dependent CVs, and the more recent state-action dependent CVs. In this work, we analyze the properties and drawbacks of previous CV techniques and, surprisingly, we find that these works have overlooked an important fact that Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation across the trajectories can result in suboptimal variance reduction, and we propose a simple fix: a class of "trajectory-wise" CVs, that can further drive down the variance. We show that constructing trajectory-wise CVs can be done recursively and requires only learning state-action value functions like the previous CVs for policy gradient. We further prove that the proposed trajectory-wise CVs are optimal for variance reduction under reasonable assumptions.
    BibTeX:
@inproceedings{Cheng19C,
  author    = {Ching-An Cheng and Xinyan Yan and Byron Boots},
  title     = {Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods},
  booktitle = {Proceedings of the 2019 Conference on Robot Learning (CoRL)},
  year      = {2019}
}
A. Li, M. Mukadam, M. Egerstedt, & B. Boots. Multi-Objective Policy Generation for Multi-Robot Systems Using Riemannian Motion Policies. 2019 Proceedings of the 19th International Symposium on Robotics Research (ISRR-19)
    Abstract: In the multi-robot systems literature, control policies are typically obtained through descent rules for a potential function which encodes a single team-level objective. However, for multi-objective tasks, it can be hard to design a single control policy that fulfills all the objectives. In this paper, we exploit the idea of decomposing the multi-objective task into a set of simple subtasks. We associate each subtask with a potentially lower-dimensional manifold, and design Riemannian Motion Policies (RMPs) on these manifolds. Centralized and decentralized algorithms are proposed to combine these policies into a final control policy on the configuration space that the robots can execute. We propose a collection of RMPs for simple multi-robot tasks that can be used for building controllers for more complicated tasks. In particular, we prove that many existing multi-robot controllers can be closely approximated by combining the proposed RMPs. Theoretical analysis shows that the multi-robot system under the generated control policy is stable. The proposed framework is validated through both simulated tasks and robotic implementations.
    BibTeX:
@inproceedings{Li19B,
  author    = {Anqi Li and Mustafa Mukadam and Magnus Egerstedt and Byron Boots},
  title     = {Multi-Objective Policy Generation for Multi-Robot Systems Using Riemannian Motion Policies},
  booktitle = {Proceedings of the International Symposium on Robotics Research (ISRR)},
  year      = {2019}
}
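The RMP-combination idea in the abstract above has a compact core: each subtask contributes a desired acceleration and an importance (metric) matrix, and the combined policy is their metric-weighted average. The sketch below illustrates that combination rule for policies already expressed in a shared space; it omits the pullback from subtask manifolds that the full framework performs, and the function name is invented for the example.

```python
import numpy as np

def combine_rmps(accelerations, metrics):
    """Metric-weighted combination of subtask motion policies.

    Given desired accelerations a_i and importance matrices M_i,
    returns a = (sum_i M_i)^+ (sum_i M_i a_i), so each subtask
    dominates along the directions its metric weights heavily.
    """
    M_sum = sum(metrics)
    f_sum = sum(M @ a for M, a in zip(metrics, accelerations))
    return np.linalg.pinv(M_sum) @ f_sum

# Two subtasks: one cares only about x, the other only about y.
a1, M1 = np.array([1.0, 0.0]), np.diag([1.0, 0.0])
a2, M2 = np.array([0.0, -2.0]), np.diag([0.0, 1.0])
a = combine_rmps([a1, a2], [M1, M2])  # -> [1.0, -2.0]
```

Because the two metrics weight orthogonal directions, each subtask's acceleration passes through unchanged along its own axis.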
    
    Y. Pan, C. Cheng, K. Saigol, K. Lee, X. Yan, E. Theodorou, & B. Boots. Imitation Learning for Agile Autonomous Driving. 2019 The International Journal of Robotics Research (IJRR)
    Abstract: We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance.
    BibTeX:
@article{Pan-IJRR-19,
  author  = {Yunpeng Pan and Ching-An Cheng and Kamil Saigol and Keuntaek Lee and Xinyan Yan and Evangelos Theodorou and Byron Boots},
  title   = {Imitation Learning for Agile Autonomous Driving},
  journal = {The International Journal of Robotics Research (IJRR)},
  year    = {2019}
}
    A. Shaban, A. Rahimi, S. Gould, B. Boots, & R. Hartley. Learning to Find Common Objects Across Image Collections. (25% Acceptance Rate) 2019 Proceedings of the International Conference on Computer Vision (ICCV-2019)
    Abstract: We address the problem of finding a set of images containing a common, but unknown, object category from a collection of image proposals. Our formulation assumes that we are given a collection of bags where each bag is a set of image proposals. Our goal is to select one image from each bag such that the selected images are of the same object category. We model the selection as an energy minimization problem with unary and pairwise potential functions. Inspired by recent few-shot learning algorithms, we propose an approach to learn the potential functions directly from the data. Furthermore, we propose a fast and simple greedy inference algorithm for energy minimization. We evaluate our approach on few-shot common object recognition and object co-localization tasks. Our experiments show that learning the pairwise and unary terms greatly improves the performance of the model over several well-known methods for these tasks. The proposed greedy optimization algorithm achieves performance comparable to state-of-the-art structured inference algorithms while being ~10 times faster.
    BibTeX:
@inproceedings{Shaban19A,
  author    = {Amirreza Shaban and Amir Rahimi and Stephen Gould and Byron Boots and Richard Hartley},
  title     = {Learning to Find Common Objects Across Image Collections},
  booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
  year      = {2019}
}
    
    A. Li, C. Cheng, B. Boots, & M. Egerstedt. Stable, Concurrent Controller Composition for Multi-Objective Robotic Tasks. (52% Acceptance Rate) 2019 Proceedings of the 58th Conference on Decision and Control (CDC-2019)
    Abstract: Robotic systems often need to consider multiple tasks concurrently. This challenge calls for control synthesis algorithms that are capable of fulfilling multiple control specifications simultaneously while maintaining the stability of the overall system. In this paper, we decompose complex, multi-objective tasks into subtasks, where individual subtask controllers are designed independently and then combined to generate the overall control policy. In particular, we adopt Riemannian Motion Policies (RMPs), a recently proposed controller structure in robotics, and, RMPflow, its associated computational framework for combining RMP controllers. We re-establish and extend the stability results of RMPflow through a rigorous Control Lyapunov Function (CLF) treatment. We then show that RMPflow can stably combine individually designed subtask controllers that satisfy certain CLF constraints. This new insight leads to an efficient CLF-based computational framework to generate stable controllers that consider all the subtasks simultaneously. Compared with the original usage of RMPflow, our framework provides users the flexibility to incorporate design heuristics through nominal controllers for the subtasks. We validate the proposed computational framework through numerical simulation and robotic implementation.
    BibTeX:
@inproceedings{Li19A,
  author    = {Anqi Li and Ching-An Cheng and Byron Boots and Magnus Egerstedt},
  title     = {Stable, Concurrent Controller Composition for Multi-Objective Robotic Tasks},
  booktitle = {Proceedings of the Conference on Decision and Control (CDC)},
  year      = {2019}
}
    
    K. Kolur, S. Chintalapudi, B. Boots, & M. Mukadam. Online Motion Planning Over Multiple Homotopy Classes with Gaussian Process Inference. (45% Acceptance Rate) 2019 Proceedings of the International Conference on Intelligent Robots and Systems (IROS-2019)
    Abstract: Efficient planning in dynamic and uncertain environments is a fundamental challenge in robotics. In the context of trajectory optimization, the feasibility of paths can change as the environment evolves. Therefore, it can be beneficial to reason about multiple possible paths simultaneously. We build on prior work that considers graph-based trajectories to find solutions in multiple homotopy classes concurrently. Specifically, we extend this previous work to an online setting where the unreachable (in time) part of the graph is pruned and the remaining graph is reoptimized at every time step. As the robot moves within the graph on the path that is most promising, the pruning and reoptimization allows us to retain candidate paths that may become more viable in the future as the environment changes, essentially enabling the robot to dynamically switch between numerous homotopy classes. We compare our approach against the prior work without the homotopy switching capability and show improved performance across several metrics in simulation with a 2D robot in multiple dynamic environments under noisy measurements.
    BibTeX:
@inproceedings{Kolur19a,
  author    = {Keshav Kolur and Sahit Chintalapudi and Byron Boots and Mustafa Mukadam},
  title     = {Online Motion Planning Over Multiple Homotopy Classes with Gaussian Process Inference},
  booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2019}
}
    
N. Wagener, C. Cheng, J. Sacks, & B. Boots. An Online Learning Approach to Model Predictive Control. (Winner of Best Student Paper & Finalist for Best Systems Paper) 2019 Proceedings of Robotics: Science and Systems XV (RSS-2019)
Abstract: Model predictive control (MPC) is a powerful technique for solving dynamic control tasks. In this paper, we show that there exists a close connection between MPC and online learning, an abstract theoretical framework for analyzing online decision making in the optimization literature. This new perspective provides a foundation for leveraging powerful online learning algorithms to design MPC algorithms. Specifically, we propose a new algorithm based on dynamic mirror descent (DMD), an online learning algorithm that is designed for non-stationary setups. Our algorithm, Dynamic Mirror Descent Model Predictive Control (DMD-MPC), represents a general family of MPC algorithms that includes many existing techniques as special instances. DMD-MPC also provides a fresh perspective on previous heuristics used in MPC and suggests a principled way to design new MPC algorithms. In the experimental section of this paper, we demonstrate the flexibility of DMD-MPC, presenting a set of new MPC algorithms on a simple simulated cartpole and a simulated and real-world aggressive driving task.
    BibTeX:
@inproceedings{Wagener19a,
  author    = {Nolan Wagener and Ching-An Cheng and Jacob Sacks and Byron Boots},
  title     = {An Online Learning Approach to Model Predictive Control},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year      = {2019}
}
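The online-learning view of MPC described above treats the planned control sequence as the learner's decision, updated each round and then shifted forward in time. The sketch below is an illustrative instance of that template with the Euclidean mirror map (plain gradient descent), not the exact algorithm from the paper; the function names and the toy rollout cost are invented for the example.

```python
import numpy as np

def shift(controls):
    """Advance the plan one step: drop the executed first control,
    pad the tail with zeros."""
    return np.vstack([controls[1:], np.zeros((1, controls.shape[1]))])

def dmd_mpc_step(controls, rollout_grad, step_size):
    """One round of an online-learning-style MPC update.

    First a gradient step on the current rollout cost (dynamic mirror
    descent with the Euclidean mirror map reduces to this), then the
    shift operator recenters the plan as the horizon recedes.
    """
    updated = controls - step_size * rollout_grad(controls)
    return shift(updated)

# Toy quadratic rollout cost whose gradient is the controls themselves.
controls = np.ones((5, 2))  # horizon 5, 2-D controls
next_plan = dmd_mpc_step(controls, lambda u: u, step_size=0.5)
```

After one round the plan has been damped toward zero and shifted, with a fresh zero control appended at the end of the horizon.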
    
M. Bhardwaj, S. Choudhury, B. Boots, & S. Srinivasa. Leveraging Experience in Lazy Search. (31% Acceptance Rate) 2019 Proceedings of Robotics: Science and Systems XV (RSS-2019)
    Abstract: Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminates future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that if the latent edge status are known, then we can compute oracular selectors that can solve the MDP during training. With access to such oracles we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2-D and 7-D problems, and show that the learned selector outperforms baseline commonly used heuristics.
    BibTeX:
@inproceedings{Bhardwaj19a,
  author    = {Mohak Bhardwaj and Sanjiban Choudhury and Byron Boots and Siddhartha Srinivasa},
  title     = {Leveraging Experience in Lazy Search},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year      = {2019}
}
    
W. Sun, A. Vemula, B. Boots, & J. A. Bagnell. Provably Efficient Imitation Learning from Observation Alone. (Selected for Long Talk: 5% Acceptance Rate) 2019 Proceedings of the 36th International Conference on Machine Learning (ICML-2019)
    Abstract: We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in ILFO setting, which learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks. Our implementation of FAIL can be found in supplementary materials with scripts to reproduce all experimental results.
    BibTeX:
@inproceedings{Sun-ICML-19,
  author    = {Wen Sun and Anirudh Vemula and Byron Boots and J. Andrew Bagnell},
  title     = {Provably Efficient Imitation Learning from Observation Alone},
  booktitle = {Proceedings of the 2019 International Conference on Machine Learning (ICML)},
  year      = {2019}
}
    
C. Cheng, X. Yan, N. Ratliff, & B. Boots. Predictor-Corrector Policy Optimization. (Selected for Long Talk: 5% Acceptance Rate) 2019 Proceedings of the 36th International Conference on Machine Learning (ICML-2019)
    Abstract: We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information. We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PicCoLO.
    BibTeX:
@inproceedings{Cheng-ICML-19,
  author    = {Ching-An Cheng and Xinyan Yan and Nathan Ratliff and Byron Boots},
  title     = {Predictor-Corrector Policy Optimization},
  booktitle = {Proceedings of the 2019 International Conference on Machine Learning (ICML)},
  year      = {2019}
}
    
    A. Lambert, M. Mukadam, B. Sundaralingam, N. Ratliff, B. Boots, & D. Fox. Joint Inference of Kinematic and Force Trajectories with Visuo-Tactile Sensing. (44% Acceptance Rate) 2019 Proceedings of the 2019 IEEE Conference on Robotics and Automation (ICRA-2019)
Abstract: To perform complex tasks, robots must be able to interact with and manipulate their surroundings. One of the key challenges in accomplishing this is robust state estimation during physical interactions, where the state involves not only the robot and the object being manipulated, but also the state of the contact itself. In this work, within the context of planar pushing, we extend previous inference-based approaches to state estimation in several ways. We estimate the robot, object, and the contact state on multiple manipulation platforms configured with a vision-based articulated model tracker, and either a biomimetic tactile sensor or a force-torque sensor. We show how to fuse raw measurements from the tracker and tactile sensors to jointly estimate the trajectory of the kinematic states and the forces in the system via probabilistic inference on factor graphs, in both batch and incremental settings. We perform several benchmarks with our framework and show how performance is affected by incorporating various geometric and physics based constraints, occluding vision sensors, or injecting noise in tactile sensors. We also compare with prior work on multiple datasets and demonstrate that our approach can effectively optimize over multi-modal sensor data and reduce uncertainty to find better state estimates.
    BibTeX:
@inproceedings{Lambert-ICRA-19,
  author    = {Alexander Lambert and Mustafa Mukadam and Balakumar Sundaralingam and Nathan Ratliff and Byron Boots and Dieter Fox},
  title     = {Joint Inference of Kinematic and Force Trajectories with Visuo-Tactile Sensing},
  booktitle = {Proceedings of the IEEE Conference on Robotics and Automation (ICRA)},
  year      = {2019}
}
B. Sundaralingam, A. Lambert, A. Handa, B. Boots, T. Hermans, S. Birchfield, N. Ratliff, & D. Fox. Robust Learning of Tactile Force Estimation through Robot Interaction. (Finalist for Best Manipulation Paper) 2019 Proceedings of the 2019 IEEE Conference on Robotics and Automation (ICRA-2019)
    Abstract: Current methods for estimating force from tactile sensor signals are either inaccurate analytic models or task-specific learned models. In this paper, we explore learning a robust model that maps tactile sensor signals to force. We specifically explore learning a mapping for the SynTouch BioTac sensor via neural networks. We propose a voxelized input feature layer for spatial signals and leverage information about the sensor surface to regularize the loss function. To learn a robust tactile force model that transfers across tasks, we generate ground truth data from three different sources: (1) the BioTac rigidly mounted to a force torque (FT) sensor, (2) a robot interacting with a ball rigidly attached to the same FT sensor, and (3) through force inference on a planar pushing task by formalizing the mechanics as a system of particles and optimizing over the object motion. A total of 140k samples were collected from the three sources. We achieve a median angular accuracy of 3.5 degrees in predicting force direction (66% improvement over the current state of the art) and a median magnitude accuracy of 0.06 N (93% improvement) on a test dataset. Additionally, we evaluate the learned force model in a force feedback grasp controller performing object lifting and gentle placement.
    BibTeX:
@inproceedings{Sundaralingam-ICRA-19,
  author    = {Balakumar Sundaralingam and Alexander Lambert and Ankur Handa and Byron Boots and Tucker Hermans and Stan Birchfield and Nathan Ratliff and Dieter Fox},
  title     = {Robust Learning of Tactile Force Estimation through Robot Interaction},
  booktitle = {Proceedings of the IEEE Conference on Robotics and Automation (ICRA)},
  year      = {2019}
}
C. Cheng, X. Yan, E. Theodorou, & B. Boots. Accelerating Imitation Learning with Predictive Models. (33% Acceptance Rate) 2019 Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS-2019)
Abstract: Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, which interleaves policy evaluation and policy optimization, is a particularly effective technique with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. These two methods leverage a model to predict future gradients to speed up policy learning. When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox (Juditsky et al., 2011), and admit a simple constructive FTL-style analysis of performance.
    BibTeX:
@inproceedings{Cheng-AISTATS-19,
  author    = {Ching-An Cheng and Xinyan Yan and Evangelos Theodorou and Byron Boots},
  title     = {Accelerating Imitation Learning with Predictive Models},
  booktitle = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics},
  year      = {2019}
}
A. Shaban, C. Cheng, N. Hatch, & B. Boots. Truncated Backpropagation for Bilevel Optimization. (33% Acceptance Rate) 2019 Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS-2019)
    Abstract: Bilevel optimization has been recently revisited for designing and analyzing algorithms in hyperparameter tuning and meta learning tasks. However, due to its nested structure, evaluating exact gradients for high-dimensional problems is computationally challenging. One heuristic to circumvent this difficulty is to use the approximate gradient given by performing truncated back-propagation through the iterative optimization procedure that solves the lower-level problem. Although promising empirical performance has been reported, its theoretical properties are still unclear. In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. We validate this on several hyperparameter tuning and meta learning tasks. We find that optimization with the approximate gradient computed using few-step back-propagation often performs comparably to optimization with the exact gradient, while requiring far less memory and half the computation time.
    BibTeX:
@inproceedings{Shaban-AISTATS-19,
  author    = {Amirreza Shaban and Ching-An Cheng and Nathan Hatch and Byron Boots},
  title     = {Truncated Backpropagation for Bilevel Optimization},
  booktitle = {Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics},
  year      = {2019}
}
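The truncated-backpropagation idea in the abstract above can be made concrete on a toy bilevel problem. In the sketch below, the inner problem minimizes g(w, lam) = 0.5*(w - lam)^2 by K gradient steps, the outer loss is f(w_K) = 0.5*w_K^2, and the hypergradient with respect to lam is approximated by backpropagating through only the last T inner steps. This scalar instance just makes the recursion explicit; it is not the paper's experimental setup, and all names are invented for the example.

```python
def truncated_hypergrad(lam, w0, alpha, K, T):
    """Approximate hypergradient via truncated backpropagation.

    Inner update: w_t = w_{t-1} - alpha * (w_{t-1} - lam).
    Outer loss:   f(w_K) = 0.5 * w_K**2.
    Backpropagates through only the last T of the K inner steps.
    """
    # Forward pass: K inner gradient steps.
    ws = [w0]
    for _ in range(K):
        ws.append(ws[-1] - alpha * (ws[-1] - lam))  # grad_w g = w - lam
    # Backward pass over the last T steps only.
    p = ws[-1]        # df/dw evaluated at w_K
    hypergrad = 0.0
    for _ in range(T):
        # For each step: d w_t / d lam = alpha, d w_t / d w_{t-1} = 1 - alpha.
        hypergrad += p * alpha
        p *= (1.0 - alpha)
    return hypergrad, ws[-1]

hg, w_final = truncated_hypergrad(lam=2.0, w0=0.0, alpha=0.5, K=10, T=10)
```

For this quadratic the recursion has a closed form, hypergrad = w_K * (1 - (1 - alpha)**T), so smaller T gives a cheaper but more biased estimate, which is exactly the trade-off the paper analyzes.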
A. H. Qureshi, B. Boots, & M. Yip. Adversarial Imitation via Variational Inverse Reinforcement Learning. (31% Acceptance Rate) 2019 Proceedings of the 7th International Conference on Learning Representations (ICLR-2019)
    Abstract: We consider the problem of learning the reward and policy from expert examples under unknown dynamics in high-dimensional scenarios. Our proposed method builds on the framework of generative adversarial networks and introduces empowerment-regularized maximum-entropy inverse reinforcement learning to learn near-optimal rewards and policies. Empowerment-based regularization prevents the policy from overfitting to expert demonstrations, thus leading to generalized behavior, which results in learning near-optimal rewards. Our method simultaneously learns empowerment through variational information maximization along with the reward and policy under the adversarial learning formulation. We evaluate our approach on various high-dimensional complex control tasks. We also test our learned rewards in challenging transfer learning problems where training and testing environments are made to be different from each other in terms of dynamics or structure. The results show that our proposed method not only learns near-optimal rewards and policies that match expert behavior but also performs better than other state-of-the-art inverse reinforcement learning algorithms.
    BibTeX:
    @inproceedings{Qureshi-ICLR-19,
    Author = "Ahmed Qureshi and Byron Boots and Michael C. Yip", booktitle = {Proceedings of the Seventh International Conference on Learning Representations (ICLR)},
    Title = "Adversarial Imitation via Variational Inverse Reinforcement Learning.",
    year = {2019}
    }
    C. Cheng, M. Mukadam, J. Issac, S. Birchfield, D. Fox, B. Boots, & N. Ratliff. RMPflow: A Computational Graph for Automatic Motion Policy Generation. (52% Acceptance Rate) 2018 Proceedings of the 13th International Workshop on the Algorithmic Foundations of Robotics (WAFR-2018)
    Abstract: We develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies designed to parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can consistently combine these local policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMPflow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems.
    BibTeX:
    @inproceedings{Cheng-WAFR-18,
    Author = "Ching-An Cheng and Mustafa Mukadam and Jan Issac and Stan Birchfield and Dieter Fox and Byron Boots and Nathan Ratliff", booktitle = {Proceedings of the 13th International Workshop on the Algorithmic Foundations of Robotics (WAFR)},
    Title = "RMPflow: A Computational Graph for Automatic Motion Policy Generation.",
    year = {2018}
    }
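The core combination step of RMPflow can be sketched as a metric-weighted pullback-and-resolve. The toy below omits the curvature and velocity-dependent terms of the full algorithm, so it is a simplified sketch under that assumption rather than the paper's complete method; the example tasks and numbers are hypothetical.

```python
import numpy as np

def combine_rmps(jacobians, accels, metrics):
    """Metric-weighted combination of task-space policies (a_i, M_i) through
    task maps with Jacobians J_i:  a = (sum_i J_i^T M_i J_i)^+ (sum_i J_i^T M_i a_i)."""
    d = jacobians[0].shape[1]
    A, b = np.zeros((d, d)), np.zeros(d)
    for J, a, M in zip(jacobians, accels, metrics):
        A += J.T @ M @ J   # pulled-back metric
        b += J.T @ M @ a   # metric-weighted desired acceleration
    return np.linalg.pinv(A) @ b

# Two 1-D tasks on a 2-D configuration space; each task map reads one coordinate.
a = combine_rmps(
    jacobians=[np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])],
    accels=[np.array([2.0]), np.array([-1.0])],
    metrics=[np.array([[1.0]]), np.array([[4.0]])],
)
```

Because the two tasks act on disjoint coordinates, the combined policy simply stacks the task accelerations; with overlapping tasks the metrics decide the trade-off.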
    B. Amos, I. D. Jimenez, J. Sacks, B. Boots, & Z. Kolter. Differentiable MPC for End-to-end Planning and Control.
    (21% Acceptance Rate)
    2018 Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS-2018)
    Abstract: We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
    BibTeX:
    @inproceedings{Amos-NIPS-18,
    Author = "Brandon Amos and Ivan Dario Jimenez Rodriguez and Jacob Sacks and Byron Boots and Zico Kolter", booktitle = {Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS)},
    Title = "Differentiable MPC for End-to-end Planning and Control.",
    year = {2018}
    }
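The KKT-based differentiation described above can be illustrated on a minimal stand-in: an unconstrained quadratic "controller" whose optimum satisfies a stationarity condition, differentiated implicitly. This is an assumption-laden sketch (a single quadratic solve in place of the paper's full LQR machinery), with hypothetical cost values.

```python
import numpy as np

H = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed positive-definite cost curvature

def controller(theta):
    """Stand-in for one MPC solve: u*(theta) = argmin_u 0.5 u^T H u + theta^T u."""
    return np.linalg.solve(H, -theta)

def controller_jacobian():
    """Implicit differentiation of the stationarity (KKT) condition
    H u + theta = 0:  H du + dtheta = 0  =>  du/dtheta = -H^{-1}."""
    return -np.linalg.inv(H)

# Check the analytic Jacobian against central finite differences
theta = np.array([1.0, -2.0])
eps = 1e-6
fd = np.stack([(controller(theta + eps * e) - controller(theta - eps * e)) / (2 * eps)
               for e in np.eye(2)], axis=1)
```

The same implicit-function argument is what lets gradients flow through an MPC solver without unrolling its iterations.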
    W. Sun, G. J. Gordon, B. Boots, & J. A. Bagnell. Dual Policy Iteration.
    (21% Acceptance Rate)
    2018 Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS-2018)
    Abstract: A novel class of Approximate Policy Iteration (API) algorithms has recently demonstrated impressive practical performance (e.g., ExIt, AlphaGo-Zero). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved via guidance from the reactive policy. In this work we study this class of Dual Policy Iteration (DPI) strategies in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes.
    BibTeX:
    @inproceedings{Sun-NIPS-18,
    Author = "Wen Sun and Geoffrey J. Gordon and Byron Boots and J. Andrew Bagnell", booktitle = {Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS)},
    Title = "Dual Policy Iteration.",
    year = {2018}
    }
    S. Srinivasan, C. Downey, & B. Boots. Learning and Inference in Hilbert Space with Quantum Graphical Models.
    (21% Acceptance Rate)
    2018 Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS-2018)
    Abstract: Quantum Graphical Models (QGMs) generalize classical graphical models by adopting the formalism for reasoning about uncertainty from quantum mechanics. Unlike classical graphical models, QGMs represent uncertainty with density matrices in complex Hilbert spaces. Hilbert space embeddings (HSEs) also generalize Bayesian inference in Hilbert spaces. We investigate the link between QGMs and HSEs and show that the sum rule and Bayes rule for QGMs are equivalent to the kernel sum rule in HSEs and a special case of Nadaraya-Watson kernel regression, respectively. We show that these operations can be kernelized, and use these insights to propose a Hilbert Space Embedding of Hidden Quantum Markov Models (HSE-HQMM) to model dynamics. We present experimental results showing that HSE-HQMMs are competitive with state-of-the-art models like LSTMs and PSRNNs on several datasets, while also providing a nonparametric method for maintaining a probability distribution over continuous-valued features.
    BibTeX:
    @inproceedings{Srinivasan-NIPS-18,
    Author = "Siddarth Srinivasan and Carlton Downey and Byron Boots", booktitle = {Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS)},
    Title = "Learning and Inference in Hilbert Space with Quantum Graphical Models.",
    year = {2018}
    }
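The density-matrix sum rule mentioned above has a compact linear-algebra form: apply a set of Kraus operators and sum. The sketch below uses a standard bit-flip channel as a hypothetical example (the channel and probability are illustrative choices, not from the paper) and checks that the update preserves trace and Hermiticity.

```python
import numpy as np

def kraus_update(rho, kraus_ops):
    """Quantum analogue of the sum rule: rho' = sum_i K_i rho K_i^dagger.
    The map is trace-preserving when sum_i K_i^dagger K_i = I."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)    # a valid density matrix
p = 0.25                                                   # hypothetical flip probability
kraus = [np.sqrt(1 - p) * np.eye(2),
         np.sqrt(p) * np.array([[0.0, 1.0], [1.0, 0.0]])]  # bit-flip channel
rho2 = kraus_update(rho, kraus)
```

A classical marginalization step is the special case where the density matrices are diagonal.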
    H. Salimbeni, C. Cheng, B. Boots, & M. Deisenroth. Orthogonally Decoupled Variational Gaussian Processes.
    (21% Acceptance Rate)
    2018 Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS-2018)
    Abstract: Gaussian processes (GPs) provide a powerful non-parametric framework for reasoning over functions. Despite appealing theory, its superlinear computational and memory complexities have presented a long-standing challenge. State-of-the-art sparse variational inference methods trade modeling accuracy against complexity. However, the complexities of these methods still scale superlinearly in the number of basis functions, implying that sparse GP methods are able to learn from large datasets only when a small model is used. Recently, a decoupled approach was proposed that removes the unnecessary coupling between the complexities of modeling the mean and the covariance functions of a GP. It achieves a linear complexity in the number of mean parameters, so an expressive posterior mean function can be modeled. While promising, this approach suffers from optimization difficulties due to ill-conditioning and non-convexity. In this work, we propose an alternative decoupled parametrization. It adopts an orthogonal basis in the mean function to model the residues that cannot be learned by the standard coupled approach. Therefore, our method extends, rather than replaces, the coupled approach to achieve strictly better performance. This construction admits a straightforward natural gradient update rule, so the structure of the information manifold that is lost during decoupling can be leveraged to speed up learning. Empirically, our algorithm demonstrates significantly faster convergence in multiple experiments.
    BibTeX:
    @inproceedings{Salimbeni-NIPS-18,
    Author = "Hugh Salimbeni and Ching-An Cheng and Byron Boots and Marc Deisenroth", booktitle = {Proceedings of Advances in Neural Information Processing Systems 31 (NeurIPS)},
    Title = "Orthogonally Decoupled Variational Gaussian Processes.",
    year = {2018}
    }
    J. Dong, B. Boots, F. Dellaert, R. Chandra, & S. Sinha. Learning to Align Images using Weak Geometric Supervision. 2018 Proceedings of the International Conference on 3D Vision (3DV-2018)
    Abstract: Image alignment tasks require accurate pixel correspondences, which are usually recovered by matching local feature descriptors. Such descriptors are often derived using supervised learning on existing datasets with ground truth correspondences. However, the cost of creating such datasets is usually prohibitive. In this paper, we propose a new approach to align two images related by an unknown 2D homography where the local descriptor is learned from scratch from the images and the homography is estimated simultaneously. Our key insight is that a siamese convolutional neural network can be trained jointly while iteratively updating the homography parameters by optimizing a single loss function. Our method is currently weakly supervised because the input images need to be roughly aligned. We have used this method to align images of different modalities such as RGB and near-infrared (NIR) without using any prior labeled data. Images automatically aligned by our method were then used to train descriptors that generalize to new images. We also evaluated our method on RGB images. On the HPatches benchmark, our method achieves comparable accuracy to deep local descriptors that were trained offline in a supervised setting.
    BibTeX:
    @inproceedings{Dong-3DV-18,
    Author = "Jing Dong and Byron Boots and Frank Dellaert and Ranveer Chandra and Sudipta Sinha", booktitle = {Proceedings of the International Conference on 3D Vision (3DV)},
    Title = "Learning to Align Images using Weak Geometric Supervision.",
    year = {2018}
    }
    M. Mukadam, J. Dong, X. Yan, F. Dellaert, & B. Boots. Continuous-time Gaussian Process Motion Planning via Probabilistic Inference. (IJRR Paper of the Year) 2018 The International Journal of Robotics Research (IJRR)
    Abstract: We introduce a novel formulation of motion planning, for continuous-time trajectories, as probabilistic inference. We first show how smooth continuous-time trajectories can be represented by a small number of states using sparse Gaussian process (GP) models. We next develop an efficient gradient-based optimization algorithm that exploits this sparsity along with GP interpolation. We call this algorithm the Gaussian Process Motion Planner (GPMP). We then detail how motion planning problems can be formulated as probabilistic inference on a factor graph. This forms the basis for GPMP2, a very efficient algorithm that combines GP representations of trajectories with fast, structure-exploiting inference via numerical optimization. Finally, we extend GPMP2 to an incremental algorithm, iGPMP2, that can efficiently replan when conditions change. We benchmark our algorithms against several sampling-based and trajectory optimization-based motion planning algorithms on planning problems in multiple environments. Our evaluation reveals that GPMP2 is several times faster than previous algorithms while retaining robustness. We also benchmark iGPMP2 on replanning problems, and show that it can find successful solutions in a fraction of the time required by GPMP2 to replan from scratch.
    BibTeX:
    @article{Mukadam-IJRR-18,
    Author = "Mustafa Mukadam and Jing Dong and Xinyan Yan and Frank Dellaert and Byron Boots", journal = {The International Journal of Robotics Research (IJRR)},
    Title = "Continuous-time Gaussian Process Motion Planning via Probabilistic Inference.",
    year = {2018}
    }
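The structure exploited by GPMP, a smoothness prior over the trajectory plus obstacle costs, optimized by gradient-based methods, can be sketched in miniature. The toy below substitutes squared second differences for the paper's constant-velocity GP prior and a hinge cost around a disc for its obstacle factors; all constants, the obstacle, and the optimizer are hypothetical simplifications.

```python
import numpy as np

def plan(n=30, iters=2000, lr=0.02, w_obs=10.0,
         center=np.array([0.5, 0.0]), radius=0.2):
    """Toy 2-D trajectory optimizer: squared second differences stand in for a
    constant-velocity GP prior, and a hinge cost pushes states out of a disc."""
    x = np.linspace([0.0, 0.0], [1.0, 0.0], n)           # straight-line init
    x[:, 1] += 1e-3 * np.sin(np.linspace(0.0, np.pi, n))  # break symmetry
    for _ in range(iters):
        g = np.zeros_like(x)
        acc = x[:-2] - 2 * x[1:-1] + x[2:]               # discrete acceleration
        g[:-2] += acc                                    # gradient of 0.5*sum||acc||^2
        g[1:-1] -= 2 * acc
        g[2:] += acc
        d = np.linalg.norm(x - center, axis=1)
        inside = d < radius                              # hinge obstacle cost
        g[inside] -= w_obs * ((radius - d[inside]) / d[inside])[:, None] * (x[inside] - center)
        x[1:-1] -= lr * g[1:-1]                          # endpoints stay clamped
    return x

traj = plan()
```

GPMP2's advantage comes from replacing this dense gradient descent with sparse factor-graph inference, but the cost being minimized has the same prior-plus-obstacle shape.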
    M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, & B. Boots. Learning Generalizable Robot Skills from Demonstrations in Cluttered Environments. (46% Acceptance Rate) 2018 Proceedings of the International Conference on Intelligent Robots and Systems (IROS-2018)
    Abstract: Learning from Demonstration (LfD) is a popular approach to endowing robots with skills without having to program them by hand. Typically, LfD relies on human demonstrations in clutter-free environments. This prevents the demonstrations from being affected by irrelevant objects, whose influence can obfuscate the true intention of the human or the constraints of the desired skill. However, it is unrealistic to assume that the robot's environment can always be restructured to remove clutter when capturing human demonstrations. To contend with this problem, we develop an importance weighted batch and incremental skill learning approach, building on a recent inference-based technique for skill representation and reproduction. Our approach reduces unwanted environmental influences on the learned skill, while still capturing the salient human behavior. We provide both batch and incremental versions of our approach and validate our algorithms on a 7-DOF JACO2 manipulator with reaching and placing skills.
    BibTeX:
    @inproceedings{Rana-IROS-18,
    Author = "M. Asif Rana and Mustafa Mukadam and S. Reza Ahmadzadeh and Sonia Chernova and Byron Boots", booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
    Title = "Learning Generalizable Robot Skills from Demonstrations in Cluttered Environments.",
    year = {2018}
    }
    J. Guerin, O. Gibaru, E. Nyiri, S. Thiery, & B. Boots. Semantically Meaningful View Selection.
    (46% Acceptance Rate)
    2018 Proceedings of the International Conference on Intelligent Robots and Systems (IROS-2018)
    Abstract: An understanding of the nature of objects could help robots to solve both high-level abstract tasks and improve performance at lower-level concrete tasks. Although deep learning has facilitated progress in image understanding, a robot's performance in problems like object recognition often depends on the angle from which the object is observed. Traditionally, robot sorting tasks rely on a fixed top-down view of an object. By changing its viewing angle, a robot can select a more semantically informative view, leading to better performance for object recognition. In this paper, we introduce the problem of semantic view selection, which seeks to find good camera poses to gain semantic knowledge about an observed object. We propose a conceptual formulation of the problem, together with a solvable relaxation based on clustering. We then present a new image dataset consisting of around 10k images representing various views of 144 objects under different poses. Finally, we use this dataset to propose a first solution to the problem by training a neural network to predict a ``semantic score'' from a top view image and camera pose. The views predicted to have higher scores are then shown to provide better clustering results than fixed top-down views.
    BibTeX:
    @inproceedings{Guerin-IROS-18,
    Author = "Joris Guerin and Olivier Gibaru and Eric Nyiri and Stephane Thiery and Byron Boots", booktitle = {Proceedings of the International Conference on Intelligent Robots and Systems (IROS)},
    Title = "Semantically Meaningful View Selection.",
    year = {2018}
    }
    J. Guerin & B. Boots. Improving Image Clustering With Multiple Pretrained CNN Feature Extractors. (29% Acceptance Rate) 2018 Proceedings of the 29th British Machine Vision Conference (BMVC-2018)
    Abstract: For many image clustering problems, replacing raw image data with features extracted by a pretrained convolutional neural network (CNN) leads to better clustering performance. However, the specific features extracted, and, by extension, the selected CNN architecture, can have a major impact on the clustering results. In practice, this crucial design choice is often decided arbitrarily due to the impossibility of using cross-validation with unsupervised learning problems. However, information contained in the different pretrained CNN architectures may be complementary, even when pretrained on the same data. To improve clustering performance, we rephrase the image clustering problem as a multi-view clustering (MVC) problem that considers multiple different pretrained feature extractors as different "views" of the same data. We then propose a multi-input neural network architecture that is trained end-to-end to solve the MVC problem effectively. Our experimental results, conducted on three different natural image datasets, show that: 1. using multiple pretrained CNNs jointly as feature extractors improves image clustering; 2. an end-to-end approach improves MVC; and 3. combining both produces state-of-the-art results for the problem of image clustering.
    BibTeX:
    @inproceedings{Guerin-BMVC-18,
    Author = "Joris Guerin and Byron Boots", booktitle = {Proceedings of the 29th British Machine Vision Conference (BMVC)},
    Title = "Improving Image Clustering With Multiple Pretrained CNN Feature Extractors.",
    year = {2018}
    }
    M. Zafar, A. Mehmood, M. Murtaza, S. Zhang, E. Theodorou, S. Hutchinson, & B. Boots. Semi-parametric Approaches to Learning in Model-Based Hierarchical Control of Complex Systems. 2018 Proceedings of the International Symposium on Experimental Robotics (ISER-2018)
    Abstract: For systems with complex and unstable dynamics, such as humanoids, the use of model-based control within a hierarchical framework remains the tool of choice. This is due to the challenges associated with applying model-free reinforcement learning on such problems, such as sample inefficiency and limits on exploration of the state space in the absence of safety or stability guarantees. However, relying purely on physics-based models comes with its own set of problems. For instance, committing to fixed basis functions necessarily limits expressiveness and, consequently, the ability to learn from data gathered online. This gap between theoretical models and real-world dynamics gives rise to a need to incorporate a learning component at some level within the model-based control framework. In this work, we present a highly redundant wheeled inverted-pendulum humanoid as a testbed for experimental validation of some recent approaches proposed to deal with these fundamental issues in the field of robotics, such as: 1. Semi-parametric Gaussian Process-based approaches to computed-torque control of serial robots. 2. Probabilistic Differential Dynamic Programming framework for trajectory planning by high-level controllers. 3. Barrier Certificate based safe-learning approaches for data collection to learn the dynamics of inherently unstable systems. We discuss how a typical model-based hierarchical control framework can be extended to incorporate approaches for learning at various stages of control design and hierarchy, based on the aforementioned tools.
    BibTeX:
    @inproceedings{Zafar-ISER-18,
    Author = "Munzir Zafar and Areeb Mehmood and Mouhyemen Khan and Shimin Zhang and Muhammad Murtaza and Victor Aladele and Evangelos A. Theodorou and Seth Hutchinson and Byron Boots", booktitle = {Proceedings of the International Symposium on Experimental Robotics (ISER)},
    Title = "Semi-parametric Approaches to Learning in Model-Based Hierarchical Control of Complex Systems.",
    year = {2018}
    }
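Item 1 above, semi-parametric GP-based computed-torque control, amounts to a parametric physics prior plus a non-parametric correction fit to its residuals. The sketch below uses kernel ridge regression as the GP stand-in; the "physics" model, the unmodeled nonlinearity, and every constant are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def parametric_torque(q):
    """Idealized physics prior (assumed): a linear stiffness term."""
    return 2.0 * q

def true_torque(q):
    """'Real' dynamics with an unmodeled nonlinearity the GP must absorb."""
    return 2.0 * q + 0.5 * np.sin(3.0 * q)

q_train = rng.uniform(-2.0, 2.0, 40)
resid = true_torque(q_train) - parametric_torque(q_train)

def k(a, b, ell=0.5):
    """Squared-exponential kernel."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

# Kernel ridge regression on the residuals: the non-parametric half of the model
alpha = np.linalg.solve(k(q_train, q_train) + 1e-4 * np.eye(40), resid)

def semiparametric_torque(q):
    return parametric_torque(q) + k(q, q_train) @ alpha

q_test = np.linspace(-1.5, 1.5, 50)
err_param = np.abs(true_torque(q_test) - parametric_torque(q_test)).max()
err_semi = np.abs(true_torque(q_test) - semiparametric_torque(q_test)).max()
```

The physics prior keeps extrapolation sane where data is scarce, while the learned residual corrects the model where data exists, the trade-off motivating the semi-parametric design.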
    C. Cheng, X. Yan, N. Wagener, & B. Boots. Fast Policy Learning through Imitation and Reinforcement.
    (Selected for Plenary Presentation: 8% Acceptance Rate)
    2018 Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence
    (UAI-2018)
    Abstract: Imitation learning (IL) consists of a set of tools that leverage expert demonstrations to speed up the process of training policies. While these strategies provide fast convergence, the performance of IL usually varies with the quality of the expert policy. If the expert policy is suboptimal, IL can yield inferior performance compared with policies learned with policy gradient methods. In this paper, we address this problem in a mirror descent framework and propose an elegant randomized algorithm, LOKI. LOKI first runs an IL algorithm for a small but random number of iterations, and then switches to a policy gradient method. We show that if the switching time is properly randomized, LOKI can learn to outperform a suboptimal expert and converge faster than running policy gradient methods from scratch. Finally, we evaluate the performance of LOKI experimentally in several simulated environments.
    BibTeX:
    @inproceedings{Cheng-UAI-18,
    Author = "Ching-An Cheng and Xinyan Yan and Nolan Wagener and Byron Boots",
    booktitle = {Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence},
    Title = "Fast Policy Learning through Imitation and Reinforcement",
    year = {2018}
    }
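LOKI's schedule, imitate for a small random number of iterations, then switch to policy gradient, can be sketched with scalar stand-ins for the two update rules. The toy "expert" and "optimum" values and step sizes below are hypothetical, chosen only to show a suboptimal expert being overtaken after the switch.

```python
import random

def loki(il_step, pg_step, policy, iters=100, mean_switch=10):
    """Sketch of LOKI's schedule: imitate for a random number of iterations
    (the randomization is what the paper's analysis relies on), then switch
    to policy-gradient updates for the remainder."""
    switch = random.randint(1, 2 * mean_switch)
    for t in range(iters):
        policy = il_step(policy) if t < switch else pg_step(policy)
    return policy

expert, optimum = 0.8, 1.0                   # suboptimal expert, true optimum
il_step = lambda p: p + 0.5 * (expert - p)   # fast progress toward the expert
pg_step = lambda p: p + 0.1 * (optimum - p)  # slower progress toward the optimum
final = loki(il_step, pg_step, policy=0.0)
```

The imitation phase closes most of the gap quickly; the policy-gradient phase then escapes the expert's suboptimality.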
    Y. Pan, C. Cheng, K. Saigol, K. Lee, X. Yan, E. Theodorou, & B. Boots. Agile Autonomous Driving using End-to-End Deep Imitation Learning. (Finalist for Best Systems Paper) 2018 Proceedings of Robotics: Science and Systems XIV (RSS-2018)
    Abstract: We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Built on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching the state-of-the-art performance.
    BibTeX:
    @inproceedings{Pan-RSS-18,
    Author = "Yunpeng Pan and Ching-An Cheng and Kamil Saigol and Keuntak Lee and Xinyan Yan and Evangelos Theodorou and Byron Boots", booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
    Title = "Agile Autonomous Driving using End-to-End Deep Imitation Learning.",
    year = {2018}
    }
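The online-imitation advantage described above comes from labeling the states the learner itself visits, which is what counters covariate shift. A minimal DAgger-style loop with an assumed linear expert and toy dynamics (all names and constants hypothetical) looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
expert_w = np.array([0.5, -0.5])  # hypothetical linear expert controller

def rollout(w, n=50):
    """Collect the states visited under the learner's own policy, i.e. the
    distribution that batch imitation never sees (the covariate-shift source)."""
    s, states = np.zeros(2), []
    for _ in range(n):
        states.append(s)
        a = w @ s
        s = 0.9 * s + 0.1 * np.array([a, 1.0]) + rng.normal(0.0, 0.01, 2)
    return np.array(states)

# DAgger-style loop: roll out the learner, query expert labels on those
# states, aggregate everything seen so far, and refit by least squares.
w = np.zeros(2)
X, Y = np.zeros((0, 2)), np.zeros(0)
for _ in range(5):
    S = rollout(w)
    X = np.vstack([X, S])
    Y = np.concatenate([Y, S @ expert_w])  # expert relabels the learner's states
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Each round trains on the learner's own state distribution, so the recovered policy matches the expert where it actually drives, the property the abstract credits for generalization.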
    W. Sun, J. A. Bagnell, & B. Boots. Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning.
    (34% Acceptance Rate)
    2018 Proceedings of the 6th International Conference on Learning Representations (ICLR-2018)
    Abstract: In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner’s planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading to a one-step greedy Markov Decision Process which is much easier to optimize, while an oracle that is far from optimal requires planning over a longer horizon to achieve near-optimal performance. Hence our new insight bridges the gap and interpolates between imitation learning and reinforcement learning. Motivated by the above-mentioned insights, we propose Truncated HORizon Policy Search (THOR), a method that focuses on searching for policies that maximize the total reshaped reward over a finite planning horizon when the oracle is sub-optimal. We experimentally demonstrate that a gradient-based implementation of THOR can achieve superior performance compared to RL baselines and IL baselines even when the oracle is sub-optimal.
    BibTeX:
    @inproceedings{Sun-ICLR-18,
    Author = "Wen Sun and James Andrew Bagnell and Byron Boots", booktitle = {Proceedings of the Sixth International Conference on Learning Representations (ICLR)},
    Title = "Truncated Horizon Policy Search: Combining Reinforcement Learning and Imitation Learning.",
    year = {2018}
    }
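The horizon-shortening claim above, with an exact cost-to-go oracle the shaped problem becomes one-step greedy, can be checked on a hypothetical 5-state chain MDP (the chain, rewards, and oracle values below are illustrative constructions, not from the paper):

```python
import numpy as np

gamma = 0.9
# States 0..4 on a chain; reaching state 4 pays reward 1 and ends the episode.
V = np.array([gamma ** 3, gamma ** 2, gamma, 1.0, 0.0])  # exact cost-to-go oracle

def step(s, a):  # a = +1 (right) or -1 (left)
    s2 = min(max(s + a, 0), 4)
    return s2, 1.0 if s2 == 4 else 0.0

def shaped(s, a):
    """Oracle-shaped reward r + gamma * V(s') - V(s). With an exact oracle,
    acting greedily on this one-step quantity is already optimal."""
    s2, r = step(s, a)
    return r + gamma * V[s2] - V[s]

# One-step greedy policy on the shaped reward, no lookahead needed
policy = {s: max((+1, -1), key=lambda a: shaped(s, a)) for s in range(4)}
```

With a suboptimal oracle the shaped rewards no longer make one step sufficient, which is exactly when THOR's longer truncated horizon pays off.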
    K. Choromanski, C. Downey, & B. Boots. Initialization Matters: Orthogonal Predictive State Recurrent Neural Networks. (34% Acceptance Rate) 2018 Proceedings of the 6th International Conference on Learning Representations (ICLR-2018)
    Abstract: Learning to predict complex time-series data is a fundamental challenge in a range of disciplines including Machine Learning, Robotics, and Natural Language Processing. Predictive State Recurrent Neural Networks (PSRNNs) (Downey et al.) are a state-of-the-art approach for modeling time-series data which combine the benefits of probabilistic filters and Recurrent Neural Networks into a single model. PSRNNs leverage the concept of Hilbert Space Embeddings of distributions (Smola et al.) to embed predictive states into a Reproducing Kernel Hilbert Space, then estimate, predict, and update these embedded states using Kernel Bayes Rule. Practical implementations of PSRNNs are made possible by the machinery of Random Features, where input features are mapped into a new space where dot products approximate the kernel well. Unfortunately, PSRNNs often require a large number of RFs to obtain good results, resulting in large models which are slow to execute and slow to train. Orthogonal Random Features (ORFs) (Choromanski et al.) is an improvement on RFs which has been shown to decrease the number of RFs required for pointwise kernel approximation. Unfortunately, it is not clear that ORFs can be applied to PSRNNs, as PSRNNs rely on Kernel Ridge Regression as a core component of their learning algorithm, and the theoretical guarantees of ORF do not apply in this setting. In this paper, we extend the theory of ORFs to Kernel Ridge Regression and show that ORFs can be used to obtain Orthogonal PSRNNs (OPSRNNs), which are smaller and faster than PSRNNs. In particular, we show that OPSRNN models clearly outperform LSTMs and, furthermore, can achieve accuracy similar to PSRNNs with an order of magnitude fewer features.
    BibTeX:
    @inproceedings{Choromanski-ICLR-18,
    Author = "Krzysztof Choromanski and Carlton Downey and Byron Boots", booktitle = {Proceedings of the Sixth International Conference on Learning Representations (ICLR)},
    Title = "Initialization Matters: Orthogonal Predictive State Recurrent Neural Networks.",
    year = {2018}
    }
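The ORF construction the abstract builds on can be sketched directly: orthogonalize Gaussian blocks by QR, rescale rows by chi-distributed norms, and use the result in a random Fourier feature map. This is a generic ORF sketch with assumed dimensions and a unit-bandwidth Gaussian kernel, not the paper's PSRNN pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def orf_matrix(d, m, rng):
    """Orthogonal random features: QR-orthogonalize Gaussian blocks, then
    rescale rows by chi-distributed norms so row marginals match the i.i.d. case."""
    blocks = []
    for _ in range(m // d):
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        S = np.sqrt(rng.chisquare(d, size=d))  # row-norm correction
        blocks.append(S[:, None] * Q)
    return np.vstack(blocks)

def features(X, W):
    """Random Fourier features with phi(x).phi(y) approximating exp(-|x-y|^2/2)."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

d, m = 8, 256
X = 0.5 * rng.standard_normal((5, d))
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
Phi = features(X, orf_matrix(d, m, rng))
K_approx = Phi @ Phi.T

# Within a block the rows are exactly orthogonal
W8 = orf_matrix(d, d, rng)
G = W8 @ W8.T
```

The paper's contribution is showing that this variance reduction survives downstream Kernel Ridge Regression, which plain pointwise guarantees do not cover.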
    K. Choromanski, V. Sindhwani, B. Jones, D. Jourdan, M. Chociej, & B. Boots. A Learning-based Air Data System for Safe and Efficient Control of Fixed-wing Aerial Vehicles. 2018 Proceedings of the IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR-2018)
    Abstract: We develop an air data system for aerial robots executing high-speed outdoor missions subject to significant aerodynamic forces on their bodies. The system is based on a combination of Extended Kalman Filtering (EKF) and autoregressive feedforward Neural Networks, relying only on IMU sensors and GPS. This eliminates the need to instrument the vehicle with Pitot tubes and mechanical vanes, reducing associated cost, weight, maintenance requirements and likelihood of catastrophic mechanical failures. The system is trained to clone the behaviour of Pitot-tube measurements on thousands of instrumented simulated and real flights, and does not require a vehicle aerodynamics model. We demonstrate that safe guidance and navigation is possible in executing complex maneuvers in the presence of wind gusts without relying on airspeed sensors. We also demonstrate accuracy enhancements from successful “simulation-to-reality” transfer and dataset aggregation techniques to correct for training-test distribution mismatches when the air-data system and the control stack operate in closed loop.
    BibTeX:
    @inproceedings{Choromanski-SSRR-18,
    Author = "Krzysztof Choromanski and Vikas Sindhwani and Brandon Jones and Damien Jourdan and Maciej Chociej and Byron Boots", booktitle = {Proceedings of IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)},
    Title = "A Learning-based Air Data System for Safe and Efficient Control of Fixed-wing Aerial Vehicles.",
    year = {2018}
    }
    M. Mukadam, J. Dong, F. Dellaert, & B. Boots. STEAP: Simultaneous Trajectory Estimation And Planning. 2018 Autonomous Robots (AURO)
    Abstract: We present a unified probabilistic framework for simultaneous trajectory estimation and planning (STEAP). Estimation and planning problems are usually considered separately, however, within our framework we show that solving them simultaneously can be more accurate and efficient. The key idea is to compute the full continuous-time trajectory from start to goal at each time-step. While the robot traverses the trajectory, the history portion of the trajectory signifies the solution to the estimation problem, and the future portion of the trajectory signifies a solution to the planning problem. Building on recent probabilistic inference approaches to continuous-time localization and mapping and continuous-time motion planning, we solve the joint problem by iteratively recomputing the maximum a posteriori trajectory conditioned on all available sensor data and cost information. Our approach can contend with high-degree-of-freedom (DOF) trajectory spaces, uncertainty due to limited sensing capabilities, model inaccuracy, the stochastic effect of executing actions, and can find a solution in real-time. We evaluate our framework empirically in both simulation and on a mobile manipulator.
    BibTeX:
    @article{Mukadam-AURO-18,
    Author = "Mustafa Mukadam and Jing Dong and Frank Dellaert and Byron Boots", journal = {Autonomous Robots (AURO)},
    Title = "STEAP: Simultaneous Trajectory Estimation and Planning.",
    year = {2018}
    }
    J. Dong, M. Mukadam, B. Boots, & F. Dellaert. Sparse Gaussian Processes on Matrix Lie Groups: A Unified Framework for Optimizing Continuous-Time Trajectories. (41% Acceptance Rate) 2018 Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA-2018)
    Abstract: Continuous-time trajectories are useful for reasoning about robot motion in a wide range of tasks. Sparse Gaussian processes (GPs) can be used as non-parametric representations for trajectory distributions that enable fast trajectory optimization by sparse GP regression. However, most previous approaches that utilize sparse GPs for trajectory optimization are limited by the fact that the robot state is represented in vector space. In this paper, we first extend previous work to consider the state on general matrix Lie groups, by applying a constant-velocity prior and defining locally linear GPs. Then, we discuss how sparse GPs on Lie groups provide a unified continuous-time framework for trajectory optimization for solving a number of robotics problems including state estimation and motion planning. Finally, we demonstrate and evaluate our approach on several different estimation and motion planning tasks with both synthetic and real-world experiments.
    BibTeX:
    @inproceedings{Dong-ICRA-18,
    Author = "Jing Dong and Mustafa Mukadam and Byron Boots and Frank Dellaert", booktitle = {Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Sparse Gaussian Processes on Matrix Lie Groups: A Unified Framework for Optimizing Continuous-Time Trajectories.",
    year = {2018}
    }
    A. Lambert, A. Shaban, A. Raj, Z. Liu, & B. Boots. Deep Forward and Inverse Perceptual Models for Tracking and Prediction. (41% Acceptance Rate) 2018 Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA-2018)
    Abstract: We consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare the result to an Extended Kalman Filter to estimate robot trajectories. We validate all models on a real robotic system.
    BibTeX:
    @inproceedings{Lambert-ICRA-18,
    Author = "Alexander Lambert and Amirreza Shaban and Amit Raj and Zhen Liu and Byron Boots", booktitle = {Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Deep Forward and Inverse Perceptual Models for Tracking and Prediction.",
    year = {2018}
    }
    J. Molnar, C. Cheng, L. Tiziani, B. Boots, & F. Hammond III. Optical Sensing and Control Methods for Soft Pneumatically Actuated Robotic Manipulators.
    (41% Acceptance Rate)
    2018 Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA-2018)
    Abstract: A low-cost optical sensing method for improved measurement and control of soft pneumatic manipulator motion is presented. The core of a soft continuum robot is embedded with several optically-diffuse elastomer sensors which attenuate light depending on their strain mode and degree. The optical sensors measure local strains at the robot’s axial center, and these strain data are combined with measured actuator chamber pressures to determine the pose of the robot under various gravitational and tip loading conditions. Regression analyses using neural networks (NNs) demonstrate that when the soft continuum robot’s base orientation is fixed, the position of its end-effector can be estimated with 3.42 times more accuracy (71% smaller root mean squared error) when using both optical sensor and pressure data (~2.44mm) than when using only pressure data (~8.3mm). When the robot’s base orientation was varied, the combined optical sensor and pressure data provide position estimates which are as much as 37.8 times more accurate (~2.76mm) than pressure data alone (~104mm).
    BibTeX:
    @inproceedings{Molnar-ICRA-18,
    Author = "Jennifer L. Molnar and Ching-An Cheng and Lucas O. Tiziani and Byron Boots and Frank L. Hammond III", booktitle = {Proceedings of the 2018 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Optical Sensing and Control Methods for Soft Pneumatically Actuated Robotic Manipulators.",
    year = {2018}
    }
    C. Cheng & B. Boots. Convergence of Value Aggregation for Imitation Learning.
    (Winner of Best Overall Paper)
    2018 Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS-2018)
    Abstract: Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
    BibTeX:
    @inproceedings{Cheng-AISTATS-18,
    Author = "Ching-An Cheng and Byron Boots", booktitle = {Proceedings of the 21st International Conference on Artificial Intelligence and Statistics},
    Title = "Convergence of Value Aggregation for Imitation Learning",
    year = {2018}
    }
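The data-aggregation loop analyzed in this paper can be illustrated with a toy sketch (this is not the paper's implementation; the expert rule, toy rollout, and least-squares policy class below are hypothetical stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hypothetical expert: act 1 iff the first state feature is positive.
    return 1.0 if state[0] > 0 else 0.0

def fit_policy(states, actions):
    # Policy optimization step: least-squares fit on the aggregate dataset.
    w, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return w

def rollout(w, n=50):
    # Policy evaluation step: states visited under the current policy
    # (a toy stand-in for executing the policy in an environment).
    return rng.normal(size=(n, 2))

# Value aggregation interleaves optimization and evaluation in an online
# fashion, querying the expert on states the learner itself visits.
states = rng.normal(size=(50, 2))
actions = np.array([expert_action(s) for s in states])
for _ in range(5):
    w = fit_policy(states, actions)
    new_states = rollout(w)
    new_actions = np.array([expert_action(s) for s in new_states])
    states = np.vstack([states, new_states])
    actions = np.concatenate([actions, new_actions])
```

The paper's stability analysis concerns whether this sequence of fitted policies converges, and regularization of the per-round objective is what stabilizes it.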
    S. Srinivasan, G. J. Gordon, & B. Boots. Learning Hidden Quantum Markov Models.
    (31% Acceptance Rate)
    2018 Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS-2018)
    Abstract: Hidden Quantum Markov Models (HQMMs) can be thought of as quantum probabilistic graphical models that can model sequential data. We extend previous work on HQMMs with three contributions: (1) we show how classical hidden Markov models (HMMs) can be simulated on a quantum circuit, (2) we reformulate HQMMs by relaxing the constraints for modeling HMMs on quantum circuits, and (3) we present a learning algorithm to estimate the parameters of an HQMM from data. While our algorithm requires further optimization to handle larger datasets, we are able to evaluate our algorithm using several synthetic datasets. We show that on HQMM generated data, our algorithm learns HQMMs with the same number of hidden states and predictive accuracy as the true HQMMs, while HMMs learned with the Baum-Welch algorithm require more states to match the predictive accuracy.
    BibTeX:
    @inproceedings{Srinivasan-AISTATS-18,
    Author = "Siddarth Srinivasan and Geoffrey J. Gordon and Byron Boots", booktitle = {Proceedings of the 21st International Conference on Artificial Intelligence and Statistics},
    Title = "Learning Hidden Quantum Markov Models",
    year = {2018}
    }
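For reference, the classical Bayes-filter update that an HMM implements, and which HQMMs generalize with quantum operations, takes only a few lines (toy two-state model; the transition and observation matrices here are illustrative):

```python
import numpy as np

# Toy 2-state HMM: T[i, j] = P(next state i | current state j),
# O[k, i] = P(observation k | state i). Columns sum to one.
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
O = np.array([[0.8, 0.3],
              [0.2, 0.7]])

def filter_update(belief, obs):
    # Propagate the belief through the dynamics, condition on the
    # observation, and renormalize -- the classical HMM filter.
    b = O[obs] * (T @ belief)
    return b / b.sum()

belief = np.array([0.5, 0.5])
for obs in [0, 0, 1]:
    belief = filter_update(belief, obs)
```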
    C. Downey, A. Hefny, B. Li, B. Boots, & G. J. Gordon. Predictive State Recurrent Neural Networks.
    (21% Acceptance Rate)
    2017 Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2017)
    Abstract: We present a new model, Predictive State Recurrent Neural Networks (PSRNNs), for filtering and prediction in dynamical systems. PSRNNs draw on insights from both Recurrent Neural Networks (RNNs) and Predictive State Representations (PSRs), and inherit advantages from both types of models. Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources. We show that such bilinear functions arise naturally from state updates in Bayes filters like PSRs, in which observations can be viewed as gating belief states. We also show that PSRNNs can be learned effectively by combining Backpropagation Through Time (BPTT) with an initialization derived from a statistically consistent learning algorithm for PSRs called two-stage regression (2SR). Finally, we show that PSRNNs can be factorized using tensor decomposition, reducing model size and suggesting interesting connections to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets, and showed that we outperform several popular alternative approaches to modeling dynamical systems in all cases.
    BibTeX:
    @inproceedings{Downey-NIPS-17,
    Author = "Carlton Downey and Ahmed Hefny and Boyue Li and Byron Boots and Geoffrey J. Gordon", booktitle = {Proceedings of Advances in Neural Information Processing Systems (NIPS)},
    Title = "Predictive State Recurrent Neural Networks",
    year = {2017}
    }
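The bilinear state update at the core of a PSRNN can be sketched as follows (random weights for illustration only; in the paper the weight tensor would be initialized by two-stage regression and refined with BPTT):

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_o = 4, 3  # state and observation dimensions

# Three-mode weight tensor for the bilinear transfer function: the
# observation gates the belief state, as in a Bayes-filter update.
W = rng.normal(size=(d_s, d_s, d_o))

def psrnn_update(state, obs):
    # Contract W against the (state, observation) pair, then
    # normalize the result to obtain the next predictive state.
    s = np.einsum('ijk,j,k->i', W, state, obs)
    return s / np.linalg.norm(s)

state = np.ones(d_s) / np.sqrt(d_s)
obs = rng.normal(size=d_o)
state = psrnn_update(state, obs)
```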
    A. Venkatraman, N. Rhinehart, W. Sun, L. Pinto, B. Boots, K. Kitani, & J. A. Bagnell. Predictive State Decoders: Encoding the Future into Recurrent Networks. (21% Acceptance Rate) 2017 Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2017)
    Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that relies on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. Our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
    BibTeX:
    @inproceedings{Venkatraman-NIPS-17,
    Author = "Arun Venkatraman and Nicholas Rhinehart and Wen Sun and Lerrel Pinto and Byron Boots and Kris Kitani and James Andrew Bagnell", booktitle = {Proceedings of Advances in Neural Information Processing Systems (NIPS)},
    Title = "Predictive State Decoders: Encoding the Future into Recurrent Networks",
    year = {2017}
    }
    C. Cheng & B. Boots. Variational Inference for Gaussian Process Models with Linear Complexity. (21% Acceptance Rate) 2017 Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2017)
    Abstract: Large-scale Gaussian process inference has long faced practical challenges due to time and space complexity that is superlinear in dataset size. While sparse variational Gaussian process models are capable of learning from large-scale data, standard strategies for sparsifying the model can prevent the approximation of complex functions. In this work, we propose a novel variational Gaussian process model that decouples the representation of mean and covariance functions in reproducing kernel Hilbert space. We show that this new parametrization generalizes previous models and yields a variational inference problem that can be solved by stochastic gradient ascent with time and space complexity that is only linear in the number of mean function parameters. This strategy makes the adoption of large-scale expressive Gaussian process models possible. We run several experiments on regression tasks and show that this decoupled approach greatly outperforms previous sparse variational Gaussian process inference procedures.
    BibTeX:
    @inproceedings{Cheng-NIPS-17,
    Author = "Ching-An Cheng and Byron Boots", booktitle = {Proceedings of Advances in Neural Information Processing Systems (NIPS)},
    Title = "Variational Inference for Gaussian Process Models with Linear Complexity",
    year = {2017}
    }
    M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, & B. Boots. Towards Robust Skill Generalization: Unifying Learning from Demonstration and Motion Planning.
    (Selected for Plenary Presentation: 8% Acceptance Rate)
    2017 Proceedings of the 1st Annual Conference on Robot Learning
    (CoRL-2017)
    Abstract: In this paper, we develop an efficient and generalizable approach to skill learning and reproduction by combining the strengths of motion planning and trajectory-based learning from demonstration (LfD). Our approach unifies conventional LfD and motion planning using probabilistic inference for generalizable skill reproduction. We find trajectories which are optimal with respect to a given skill and also feasible in different scenarios. To speed up inference, we use factor graphs and numerical optimization. As a part of our approach, we also provide a new probabilistic skill model that requires minimal parameter tuning and is more suited for encoding skill constraints and performing inference in an efficient manner. Preliminary experimental results showing skill generalization over initial robot state and unforeseen obstacles are presented.
    BibTeX:
    @inproceedings{Rana-CoRL-17,
    Author = "M. Asif Rana and Mustafa Mukadam and S. Reza Ahmadzadeh and Sonia Chernova and Byron Boots", booktitle = {Proceedings of the 2017 Conference on Robot Learning (CoRL)},
    Title = "Towards Robust Skill Generalization: Unifying Learning from Demonstration and Motion Planning",
    year = {2017}
    }
    A. Shaban, S. Bansal, Z. Liu, I. Essa, & B. Boots. One-Shot Learning for Semantic Segmentation.
    (30% Acceptance Rate)
    2017 Proceedings of the 28th British Machine Vision Conference (BMVC-2017)
    Abstract: Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3X faster.
    BibTeX:
    @inproceedings{Shaban-BMVC-17,
    Author = "Amirreza Shaban and Shray Bansal and Zhen Liu and Irfan Essa and Byron Boots", booktitle = {Proceedings of the 28th British Machine Vision Conference (BMVC)},
    Title = "One-Shot Learning for Semantic Segmentation.",
    year = {2017}
    }
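The meanIoU metric reported above can be computed with a minimal sketch (assuming integer class-label masks; classes absent from both prediction and target are skipped):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # Intersection-over-union per class, averaged over the classes
    # that appear in either the prediction or the ground truth.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Tiny 2x2 example with two classes.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```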
    W. Sun, A. Venkatraman, G. J. Gordon, B. Boots, & J. A. Bagnell. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction. (25% Acceptance Rate) 2017 Proceedings of the 34th International Conference on Machine Learning
    (ICML-2017)
    Abstract: Researchers have demonstrated state-of-the-art performance in sequential decision making problems (e.g., robotics control, sequential prediction) with deep neural network models. One often has access to near-optimal oracles that achieve good performance on the task during training. We demonstrate that AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL) approach of Ross and Bagnell, 2014 --- can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique. Using both feedforward and recurrent neural predictors, we present stochastic gradient procedures on a sequential prediction task, dependency-parsing from raw image data, as well as on various high dimensional robotics control problems. We also provide a comprehensive theoretical study of IL that demonstrates we can expect up to exponentially lower sample complexity for learning with AggreVaTeD than with RL algorithms, which backs our empirical findings. Our results and theory indicate that the proposed approach can achieve superior performance with respect to the oracle when the demonstrator is sub-optimal.
    BibTeX:
    @inproceedings{Sun-ICML-17,
    Author = "Wen Sun and Arun Venkatraman and Geoffrey J. Gordon and Byron Boots and J. Andrew Bagnell", booktitle = {Proceedings of the 2017 International Conference on Machine Learning (ICML)},
    Title = "Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction",
    year = {2017}
    }
    Y. Pan, X. Yan, E. Theodorou, & B. Boots. Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control.
    (25% Acceptance Rate)
    2017 Proceedings of the 34th International Conference on Machine Learning
    (ICML-2017)
    Abstract: Sparse Spectrum Gaussian Processes (SSGPs) are a powerful tool for scaling Gaussian processes (GPs) to large datasets. Existing SSGP algorithms for regression assume deterministic inputs, precluding their use in many real-world robotics and engineering applications where accounting for input uncertainty is crucial. We address this problem by proposing two analytic moment-based approaches with closed-form expressions for SSGP regression with uncertain inputs. Our methods are more general and scalable than their standard GP counterparts, and are naturally applicable to multi-step prediction or uncertainty propagation. We show that efficient algorithms for Bayesian filtering and stochastic model predictive control can use these methods, and we evaluate our algorithms with comparative analyses and real-world experiments.
    BibTeX:
    @inproceedings{Pan-ICML-17,
    Author = "Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots", booktitle = {Proceedings of the 2017 International Conference on Machine Learning (ICML)},
    Title = "Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control",
    year = {2017}
    }
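The sparse spectrum construction underlying this work approximates a GP with trigonometric features at randomly sampled spectral frequencies. A minimal regression sketch (with deterministic inputs; the paper's contribution is the extension to uncertain inputs, which is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

def ssgp_features(x, freqs):
    # Sparse-spectrum basis: cosine/sine features at sampled spectral
    # frequencies give a finite-dimensional approximation of an RBF GP.
    proj = x[:, None] * freqs[None, :]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(len(freqs))

# Toy 1-D regression: noisy observations of sin(x).
x = np.linspace(0, 2 * np.pi, 40)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)
freqs = rng.normal(scale=1.0, size=50)  # samples from the RBF spectrum

Phi = ssgp_features(x, freqs)
noise_var, prior_var = 0.01, 1.0
# Bayesian linear regression in feature space (posterior mean weights).
A = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
w = np.linalg.solve(A, Phi.T @ y / noise_var)
pred = Phi @ w
```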
    M. Mukadam, J. Dong, F. Dellaert, & B. Boots. Simultaneous Trajectory Estimation and Planning via Probabilistic Inference. (39% Acceptance Rate) 2017 Proceedings of Robotics: Science and Systems XIII (RSS-2017)
    Abstract: We provide a unified probabilistic framework for trajectory estimation and planning. The key idea is to view these two problems, usually considered separately, as a single problem. At each time-step the robot is tasked with finding the complete continuous-time trajectory from start to goal. This can be quite difficult; the robot must contend with a potentially high-degree-of-freedom (DOF) trajectory space, uncertainty due to limited sensing capabilities, model inaccuracy, and the stochastic effect of executing actions, and the robot must find the solution in (faster than) real time. To overcome these challenges, we build on recent probabilistic inference approaches to continuous-time localization and mapping and continuous-time motion planning. We solve the joint problem by iteratively recomputing the maximum a posteriori trajectory conditioned on all available sensor data and cost information. Finally, we evaluate our framework empirically in both simulation and on a mobile manipulator.
    BibTeX:
    @inproceedings{Mukadam-RSS-17,
    Author = "Mustafa Mukadam and Jing Dong and Frank Dellaert and Byron Boots", booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
    Title = "Simultaneous Trajectory Estimation and Planning via Probabilistic Inference.",
    year = {2017}
    }
    E. Huang, A. Bhatia, B. Boots, & M. Mason. Exact Bounds on the Contact-Driven Motion of a Sliding Object, With Applications to Robotic Pulling.
    (39% Acceptance Rate)
    2017 Proceedings of Robotics: Science and Systems XIII (RSS-2017)
    Abstract: This paper explores the quasi-static motion of a planar slider being pushed or pulled through a single contact point assumed not to slip. The main contribution is to derive a method for computing exact bounds on the object's motion for classes of pressure distributions where the center of pressure is known but the distribution of support forces is unknown. The second contribution is to show that the exact motion bounds can be used to plan robotic pulling trajectories that guarantee convergence to the final pose. The planner was tested on the task of pulling an acrylic rectangle to random locations within the robot workspace. The generated plans were accurate to 4.00mm +/- 3.02mm of the target position and 4.35 degrees +/- 3.14 degrees of the target orientation.
    BibTeX:
    @inproceedings{Huang-RSS-17,
    Author = "Eric Huang and Ankit Bhatia and Byron Boots and Matthew T. Mason", booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
    Title = "Exact Bounds on the Contact-Driven Motion of a Sliding Object, With Applications to Robotic Pulling.",
    year = {2017}
    }
    B. Dai, N. He, Y. Pan, B. Boots, & L. Song. Learning from Conditional Distributions via Dual Kernel Embeddings. (31% Acceptance Rate) 2017 Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS-2017)
    Abstract: Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample x itself is associated with a conditional distribution p(z|x) represented by samples, and the goal is to learn a function f that links these conditional distributions to target values y. These problems become very challenging when we only have limited samples or, in the extreme case, only one sample from each conditional distribution. Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large set of samples from each conditional distribution. To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distributions problem. With such new reformulation, we only need to deal with the joint distribution p(z, x). We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments, on both synthetic and real-world datasets, show that the proposed approach shows significant improvement over existing algorithms.
    BibTeX:
    @inproceedings{Dai-AISTATS-17,
    Author = "Bo Dai and Niao He and Yunpeng Pan and Byron Boots and Le Song", booktitle = {Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)},
    Title = "Learning from Conditional Distributions via Dual Kernel Embeddings.",
    year = {2017}
    }
    M. Mukadam, C. Cheng, X. Yan, & B. Boots. Approximately Optimal Continuous-Time Motion Planning and Control via Probabilistic Inference.
    (41% Acceptance Rate)
    2017 Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017)
    Abstract: The problem of optimal motion planning and control is fundamental in robotics. However, this problem is intractable for continuous-time stochastic systems in general and the solution is difficult to approximate if non-instantaneous nonlinear performance indices are present. In this work, we provide an efficient algorithm, PIPC (Probabilistic Inference for Planning and Control), that yields approximately optimal policies with arbitrary higher-order nonlinear performance indices. Using probabilistic inference and a Gaussian process representation of trajectories, PIPC exploits the underlying sparsity of the problem such that its complexity scales linearly in the number of nonlinear factors. We demonstrate the capabilities of our algorithm in a receding horizon setting with multiple systems in simulation.
    BibTeX:
    @inproceedings{Mukadam-ICRA-17,
    Author = "Mustafa Mukadam and Ching-An Cheng and Xinyan Yan and Byron Boots", booktitle = {Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Approximately Optimal Continuous-Time Motion Planning and Control via Probabilistic Inference.",
    year = {2017}
    }
    G. Williams, N. Wagener, B. Goldfain, P. Drews, J. Rehg, B. Boots, & E. Theodorou. Information Theoretic MPC for Model-Based Reinforcement Learning.
    (Finalist for Best Overall Paper)
    2017 Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017)
    Abstract: We introduce an information theoretic model predictive control (MPC) algorithm capable of handling complex cost criteria and general nonlinear dynamics. The generality of the approach makes it possible to use multi-layer neural networks as dynamics models, which we incorporate into our MPC algorithm in order to solve model-based reinforcement learning tasks. We test the algorithm in simulation on a cart-pole swing up and quadrotor navigation task, as well as on actual hardware in an aggressive driving task. Empirical results demonstrate that the algorithm is capable of achieving a high level of performance and does so only utilizing data collected from the system.
    BibTeX:
    @inproceedings{Williams-ICRA-17,
    Author = "Grady Williams and Nolan Wagener and Brian Goldfain and Paul Drews and James Rehg and Byron Boots and Evangelos Theodorou", booktitle = {Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Information Theoretic {MPC} for Model-Based Reinforcement Learning.",
    year = {2017}
    }
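The core of the information-theoretic MPC update is a softmax-weighted average of sampled control perturbations. A minimal sketch (a toy quadratic cost stands in for rolling out the learned dynamics model):

```python
import numpy as np

rng = np.random.default_rng(0)

def mppi_update(u, costs, noise, lam=1.0):
    # Information-theoretic weighting: exponentiate negative cost
    # (shifted by the minimum for numerical stability), normalize,
    # and average the sampled perturbations by those weights.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return u + w @ noise

# Toy setup: K sampled perturbation sequences over a horizon of T steps.
K, T = 100, 10
u = np.zeros(T)                            # nominal control sequence
noise = rng.normal(size=(K, T))            # sampled perturbations
costs = np.sum((u + noise) ** 2, axis=1)   # stand-in rollout costs
u_new = mppi_update(u, costs, noise)
```

Low-cost rollouts dominate the average, so the updated sequence moves toward the best-performing samples at each replanning step.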
    E. Huang, M. Mukadam, Z. Liu, & B. Boots. Motion Planning with Graph-Based Trajectories and Gaussian Process Inference. (41% Acceptance Rate) 2017 Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017)
    Abstract: Motion planning as trajectory optimization requires generating trajectories that minimize a desired objective function or performance metric. Finding a globally optimal solution is often intractable in practice: despite the existence of fast motion planning algorithms, most are prone to local minima, which may require re-solving the problem multiple times with different initializations. In this work, we provide a novel motion planning algorithm, GPMP-GRAPH, that considers a graph-based initialization that simultaneously explores multiple homotopy classes, helping to contend with the local minima problem. Drawing on previous work to represent continuous-time trajectories as samples from a Gaussian process (GP) and formulating the motion planning problem as inference on a factor graph, we construct a graph of interconnected states such that each path through the graph is a valid trajectory and efficient inference can be performed on the collective factor graph. We perform a variety of benchmarks and show that our approach allows the evaluation of an exponential number of trajectories within a fraction of the computational time required to evaluate them one at a time, yielding a more thorough exploration of the solution space and a higher success rate.
    BibTeX:
    @inproceedings{Huang-ICRA-17,
    Author = "Eric Huang and Mustafa Mukadam and Zhen Liu and Byron Boots", booktitle = {Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "Motion Planning with Graph-Based Trajectories and {G}aussian Process Inference.",
    year = {2017}
    }
    J. Dong, J. Burnham, B. Boots, G. Rains, & F. Dellaert. 4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture. (41% Acceptance Rate) 2017 Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017)
    Abstract: Autonomous crop monitoring at high spatial and temporal resolution is a critical problem in precision agriculture. While Structure from Motion and Multi-View Stereo algorithms can finely reconstruct the 3D structure of a field with low-cost image sensors, these algorithms fail to capture the dynamic nature of continuously growing crops. In this paper we propose a 4D reconstruction approach to crop monitoring, which employs a spatio-temporal model of dynamic scenes that is useful for precision agriculture applications. Additionally, we provide a robust data association algorithm to address the problem of large appearance changes due to scenes being viewed from different angles at different points in time, which is critical to achieving 4D reconstruction. Finally, we collected a high-quality dataset with ground-truth statistics to evaluate the performance of our method. We demonstrate that our 4D reconstruction approach provides models that are qualitatively correct with respect to visual appearance and quantitatively accurate when measured against the ground truth geometric properties of the monitored crops.
    BibTeX:
    @inproceedings{Dong-ICRA-17,
    Author = "Jing Dong and John Burnham and Byron Boots and Glen Rains and Frank Dellaert", booktitle = {Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA)},
    Title = "4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture.",
    year = {2017}
    }
    B. Hrolenok, B. Boots, & T. Balch. Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems.
    (24% Acceptance Rate)
    2017 Proceedings of the 31st Conference on Artificial Intelligence (AAAI-2017)
    Abstract: Modeling stochastic multiagent behavior such as fish schooling is challenging for fixed-estimate prediction techniques because they fail to reliably reproduce the stochastic aspects of the agents' behavior. We show how standard fixed-estimate predictors fit within a probabilistic framework, and suggest the reason they work for certain classes of behaviors and not others. We quantify the degree of mismatch and offer alternative sampling-based modeling techniques. We are specifically interested in building executable models (as opposed to statistical or descriptive models) because we want to reproduce and study multiagent behavior in simulation. Such models can be used by biologists, sociologists, and economists to explain and predict individual and group behavior in novel scenarios, and to test hypotheses regarding group behavior. Developing models from observation of real systems is an obvious application of machine learning. Learning directly from data eliminates expensive hand processing and tuning, but introduces unique challenges that violate certain assumptions common in standard machine learning approaches. Our framework suggests a new class of sampling-based methods, which we implement and apply to simulated deterministic and stochastic schooling behaviors, as well as the observed schooling behavior of real fish. Experimental results show that our implementation performs comparably with standard learning techniques for deterministic behaviors, and better on stochastic behaviors.
    BibTeX:
    @inproceedings{Hrolenok-AAAI-17,
    Author = "Brian Hrolenok and Byron Boots and Tucker Balch", booktitle = {Proceedings of the Conference on Artificial Intelligence (AAAI)},
    Title = "Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems",
    year = {2017}
    }
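The mismatch between fixed-estimate and sampling-based predictors is easy to see on a toy bimodal behavior (illustrative data only, not the fish-schooling models from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed stochastic behavior: the agent turns -1 or +1, never 0.
data = rng.choice([-1.0, 1.0], size=1000)

# Fixed-estimate predictor: the conditional mean, which is near 0 --
# an action the real agents never actually take.
fixed_estimate = data.mean()

# Sampling-based predictor: draw from the empirical distribution,
# reproducing the stochastic (bimodal) structure of the behavior.
samples = rng.choice(data, size=1000)
```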
    X. Yan, V. Indelman, & B. Boots. Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping.
    2017 Journal of Robotics and Autonomous Systems (RAS)
    Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has used Gaussian processes (GPs) to efficiently represent the robot’s trajectory through its environment. GPs have several advantages over discrete-time trajectory representations: they can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of the GP approach to STEAM is that it is formulated as a batch trajectory estimation problem. In this paper we provide the critical extensions necessary to transform the existing GP-based batch algorithm for STEAM into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets.
    BibTeX:
    @article{Yan17ras,
      Author = "Xinyan Yan and Vadim Indelman and Byron Boots",
      Journal = "Robotics and Autonomous Systems",
      Title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation and Mapping",
      Year = {2017},
      Pages = "120-132",
      Volume = {87}
    }
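    The continuous-time property at the heart of this line of work can be illustrated with ordinary GP regression: given sparse, asynchronous timestamped measurements, the posterior trajectory can be queried at any time of interest. The sketch below is a minimal batch illustration in Python, not the paper's incremental STEAM solver; the kernel choice, noise level, and data are illustrative assumptions.

    ```python
    import numpy as np

    def rbf_kernel(t1, t2, lengthscale=1.0, variance=1.0):
        """Squared-exponential kernel between two vectors of timestamps."""
        d = t1[:, None] - t2[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

    def gp_trajectory_posterior(t_obs, y_obs, t_query, noise=0.05):
        """Posterior mean/variance of a 1-D trajectory at arbitrary query times.

        Measurements may be sparse and asynchronous, yet the continuous-time
        trajectory can be queried anywhere -- the property the paper exploits.
        """
        K = rbf_kernel(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
        Ks = rbf_kernel(t_query, t_obs)
        Kss = rbf_kernel(t_query, t_query)
        alpha = np.linalg.solve(K, y_obs)
        mean = Ks @ alpha
        cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
        return mean, np.diag(cov)

    # Sparse, asynchronous position measurements along one axis.
    t_obs = np.array([0.0, 0.7, 1.9, 3.2, 4.0])
    y_obs = np.sin(t_obs)
    mean, var = gp_trajectory_posterior(t_obs, y_obs, np.linspace(0, 4, 9))
    ```

    The incremental algorithm in the paper avoids re-solving the full linear system as new measurements arrive; this batch sketch recomputes it for clarity.
    
    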
    C. Cheng & B. Boots. Incremental Variational Sparse Gaussian Process Regression. (22% Acceptance Rate) 2016 Proceedings of Advances in Neural Information Processing Systems 29 (NIPS-2016)
    Abstract: Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to sparse GPR.
    BibTeX:
    @inproceedings{Cheng16,
      Author = "Ching-An Cheng and Byron Boots",
      Booktitle = "Proceedings of Advances in Neural Information Processing Systems 29 (NIPS)",
      Title = "Incremental Variational Sparse {G}aussian Process Regression",
      Year = {2016}
    }
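    To see how a small set of basis functions can stand in for the full Gaussian process, here is a minimal batch subset-of-regressors sketch with inducing inputs. This is not the paper's incremental mirror-ascent algorithm; the kernel and the inducing-point placement are illustrative assumptions.

    ```python
    import numpy as np

    def rbf(a, b, ls=1.0):
        """Squared-exponential kernel matrix between 1-D input vectors."""
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / ls) ** 2)

    def sparse_gp_predict(x, y, z, xq, noise=0.1):
        """Subset-of-regressors sparse GP predictive mean with inducing inputs z.

        The n training points are summarized through m inducing points,
        reducing inference cost from O(n^3) to O(n m^2).
        """
        Kzz = rbf(z, z) + 1e-8 * np.eye(len(z))   # jitter for stability
        Kzx = rbf(z, x)
        Kqz = rbf(xq, z)
        A = noise**2 * Kzz + Kzx @ Kzx.T
        return Kqz @ np.linalg.solve(A, Kzx @ y)

    x = np.linspace(0, 6, 200)   # 200 training inputs
    y = np.sin(x)
    z = np.linspace(0, 6, 15)    # 15 inducing inputs stand in for all 200
    mean = sparse_gp_predict(x, y, z, np.array([2.0]))
    ```

    The paper's contribution is to update both the posterior and the basis (the inducing inputs) online from streaming data, which this fixed-basis batch sketch does not attempt.
    
    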
    J. Tan, Z. Xie, B. Boots, & K. Liu. Simulation-Based Design of Dynamic Controllers for Humanoid Balancing. (Selected for Oral Presentation: 48% Acceptance Rate)
    2016 Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2016)
    Abstract: Model-based trajectory optimization often fails to find a reference trajectory for under-actuated bipedal robots performing highly-dynamic, contact-rich tasks in the real world due to inaccurate physical models. In this paper, we propose a complete system that automatically designs a reference trajectory that succeeds on tasks in the real world with a very small number of real world experiments. We adopt existing system identification techniques and show that, with appropriate model parameterization and control optimization, an iterative system identification framework can be effective for designing reference trajectories. We focus on a set of tasks that leverage the momentum transfer strategy to rapidly change the whole body from an initial configuration to a target configuration by generating large accelerations at the center of mass and switching contacts.
    BibTeX:
    @inproceedings{Tan-IROS-16,
      Author = "Jie Tan and Zhaoming Xie and Byron Boots and C. Karen Liu",
      Booktitle = "Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)",
      Title = "Simulation-Based Design of Dynamic Controllers for Humanoid Balancing",
      Year = {2016}
    }
    W. Sun, R. Capobianco, G. J. Gordon, J. A. Bagnell, & B. Boots. Learning to Smooth with Bidirectional Predictive State Inference Machines. (31% Acceptance Rate) 2016 Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI-2016)
    Abstract: We present the Smoothing Machine (SMACH, pronounced "smash"), a dynamical system learning algorithm based on chain Conditional Random Fields (CRFs) with latent states. Unlike previous methods, SMACH is designed to optimize prediction performance when we have information from both past and future observations. By leveraging Predictive State Representations (PSRs), we model beliefs about latent states through predictive states--an alternative but equivalent representation that depends directly on observable quantities. Predictive states enable the use of well-developed supervised learning approaches in place of local-optimum-prone methods like EM: we learn regressors or classifiers that can approximate message passing and marginalization in the space of predictive states. We provide theoretical guarantees on smoothing performance and we empirically verify the efficacy of SMACH on several dynamical system benchmarks.
    BibTeX:
    @inproceedings{Sun-UAI-16,
      Author = "Wen Sun and Roberto Capobianco and Geoffrey J. Gordon and J. Andrew Bagnell and Byron Boots",
      Booktitle = "Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)",
      Title = "Learning to Smooth with Bidirectional Predictive State Inference Machines",
      Year = {2016}
    }
    J. Dong, M. Mukadam, F. Dellaert, & B. Boots. Motion Planning as Probabilistic Inference using Gaussian Processes and Factor Graphs.
    (20% Acceptance Rate)
    2016 Proceedings of Robotics: Science and Systems XII (RSS-2016)
    Abstract: With the increased use of high degree-of-freedom robots that must perform tasks in real-time, there is a need for fast algorithms for motion planning. In this work, we view motion planning from a probabilistic perspective. We consider smooth continuous-time trajectories as samples from a Gaussian process (GP) and formulate the planning problem as probabilistic inference. We use factor graphs and numerical optimization to perform inference quickly, and we show how GP interpolation can further increase the speed of the algorithm. Our framework also allows us to incrementally update the solution of the planning problem to contend with changing conditions. We benchmark our algorithm against several recent trajectory optimization algorithms on planning problems in multiple environments. Our evaluation reveals that our approach is several times faster than previous algorithms while retaining robustness. Finally, we demonstrate the incremental version of our algorithm on replanning problems, and show that it often can find successful solutions in a fraction of the time required to replan from scratch.
    BibTeX:
    @inproceedings{Dong-RSS-16,
      Author = "Jing Dong and Mustafa Mukadam and Frank Dellaert and Byron Boots",
      Booktitle = "Proceedings of Robotics: Science and Systems (RSS)",
      Title = "Motion Planning as Probabilistic Inference using Gaussian Processes and Factor Graphs",
      Year = {2016}
    }
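    The cost structure shared by these trajectory optimizers (a smoothness prior plus an obstacle penalty) can be sketched with plain gradient descent over waypoints. This only illustrates the objective, not the paper's GP/factor-graph inference; the point obstacle, safety radius, and step size are invented for the example.

    ```python
    import numpy as np

    def optimize_trajectory(start, goal, obstacle, n=20, iters=2000, step=0.01):
        """Minimize a smoothness + obstacle cost over a 2-D waypoint trajectory.

        Smoothness: squared differences between consecutive waypoints.
        Obstacle: hinge penalty inside a safety radius around a point obstacle.
        """
        traj = np.linspace(start, goal, n)   # straight-line initialization
        radius = 0.5
        for _ in range(iters):
            grad = np.zeros_like(traj)
            # Smoothness gradient (discrete Laplacian on interior points).
            grad[1:-1] += 2 * traj[1:-1] - traj[:-2] - traj[2:]
            # Obstacle gradient: push waypoints out of the safety radius.
            diff = traj - obstacle
            dist = np.linalg.norm(diff, axis=1, keepdims=True)
            inside = (dist < radius).astype(float)
            grad += -inside * diff / np.maximum(dist, 1e-9)
            grad[0] = grad[-1] = 0.0         # keep endpoints fixed
            traj -= step * grad
        return traj

    start, goal = np.array([0.0, 0.0]), np.array([2.0, 0.0])
    obstacle = np.array([1.0, 0.05])         # just off the straight line
    traj = optimize_trajectory(start, goal, obstacle)
    ```

    The papers above replace this dense waypoint iteration with a sparse GP trajectory representation and fast inference on a factor graph, which is where the reported speedups come from.
    
    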
    Z. Marinho, B. Boots, A. Dragan, A. Byravan, G. J. Gordon, & S. Srinivasa. Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces. (20% Acceptance Rate) 2016 Proceedings of Robotics: Science and Systems XII (RSS-2016)
    Abstract: We introduce a functional gradient descent trajectory optimization algorithm for robot motion planning in Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex many-degree-of-freedom robots, since they (in theory) work by directly optimizing within a space of continuous trajectories to avoid obstacles while maintaining geometric properties such as smoothness. However, in practice, implementations such as CHOMP and TrajOpt typically commit to a fixed, finite parametrization of trajectories, often as a sequence of waypoints. Such a parameterization can lose much of the benefit of reasoning in a continuous trajectory space: e.g., it can require taking an inconveniently small step size and large number of iterations to maintain smoothness. Our work generalizes functional gradient trajectory optimization by formulating it as minimization of a cost functional in an RKHS. This generalization lets us represent trajectories as linear combinations of kernel functions, without any need for waypoints. As a result, we are able to take larger steps and achieve a locally optimal trajectory in just a few iterations. Depending on the selection of kernel, we can directly optimize in spaces of trajectories that are inherently smooth in velocity, jerk, curvature, etc., and that have a low-dimensional, adaptively chosen parameterization. Our experiments illustrate the effectiveness of the planner for different kernels, including Gaussian RBFs, Laplacian RBFs, and B-splines, as compared to the standard discretized waypoint representation.
    BibTeX:
    @inproceedings{Marinho-RSS-16,
      Author = "Zita Marinho and Anca Dragan and Arunkumar Byravan and Byron Boots and Geoffrey J. Gordon and Siddhartha Srinivasa",
      Booktitle = "Proceedings of Robotics: Science and Systems (RSS)",
      Title = "Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces",
      Year = {2016}
    }
    A. Venkatraman, W. Sun, M. Hebert, B. Boots, & J. A. Bagnell. Inference Machines for Nonparametric Filter Learning.
    (24% Acceptance Rate)
    2016 Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-2016)
    Abstract: Data-driven approaches for learning dynamic models for Bayesian filtering often try to maximize the data likelihood given parametric forms for the transition and observation models. However, this objective is usually nonconvex in the parametrization and can only be locally optimized. Furthermore, learning algorithms typically do not provide performance guarantees on the desired Bayesian filtering task. In this work, we propose using inference machines to directly optimize the filtering performance. Our procedure is capable of learning partially-observable systems when the state space is either unknown or known in advance. To accomplish this, we adapt Predictive State Inference Machines (PSIMS) by introducing the concept of hints, which incorporate prior knowledge of the state space to accompany the predictive state representation. This allows PSIM to be applied to the larger class of filtering problems which require prediction of a specific parameter or partial component of state. Our PSIM+HINTS adaptation enjoys theoretical advantages similar to the original PSIM algorithm, and we showcase its performance on a variety of robotics filtering problems.
    BibTeX:
    @inproceedings{Venkatraman-IJCAI-16,
      Author = "Arun Venkatraman and Wen Sun and Martial Hebert and Byron Boots and J. Andrew Bagnell",
      Booktitle = "Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)",
      Title = "Inference Machines for Nonparametric Filter Learning",
      Year = {2016}
    }
    W. Sun, A. Venkatraman, B. Boots, & J. A. Bagnell. Learning to Filter With Predictive State Inference Machines. (24% Acceptance Rate) 2016 Proceedings of the 33rd International Conference on Machine Learning
    (ICML-2016)
    Abstract: Latent state space models are a fundamental and widely used tool for modeling dynamical systems. However, they are difficult to learn from data and learned models often lack performance guarantees on inference tasks such as filtering and prediction. In this work, we present the Predictive State Inference Machine (PSIM), a data-driven method that considers the inference procedure on a dynamical system as a composition of predictors. The key idea is that rather than first learning a latent state space model, and then using the learned model for inference, PSIM directly learns predictors for inference in predictive state space. We provide theoretical guarantees for inference, in both realizable and agnostic settings, and showcase practical performance on a variety of simulated and real world robotics benchmarks.
    BibTeX:
    @inproceedings{Sun-ICML-16,
      Author = "Wen Sun and Arun Venkatraman and Byron Boots and J. Andrew Bagnell",
      Booktitle = "Proceedings of the International Conference on Machine Learning (ICML)",
      Title = "Learning to Filter with Predictive State Inference Machines",
      Year = {2016}
    }
    M. Mukadam, X. Yan, & B. Boots. Gaussian Process Motion Planning.
    (34% Acceptance Rate)
    2016 Proceedings of the 2016 IEEE Conference on Robotics and Automation (ICRA-2016)
    Abstract: Motion planning is a fundamental tool in robotics, used to generate collision-free, smooth trajectories while satisfying task-dependent constraints. In this paper, we present a novel approach to motion planning using Gaussian processes. In contrast to most existing trajectory optimization algorithms, which rely on a discrete waypoint parameterization in practice, we represent the continuous-time trajectory as a sample from a Gaussian process (GP) generated by a linear time varying stochastic differential equation. We then provide a gradient-based optimization technique that optimizes continuous-time trajectories with respect to a cost functional. By exploiting GP interpolation, we develop the Gaussian Process Motion Planner (GPMP), which finds optimal trajectories parameterized by a small number of waypoints. We benchmark our algorithm against recent trajectory optimization algorithms by solving 7-DOF robotic arm planning problems in simulation and validate our approach on a real 7-DOF WAM arm.
    BibTeX:
    @inproceedings{Mukadam-ICRA-16,
      Author = "Mustafa Mukadam and Xinyan Yan and Byron Boots",
      Booktitle = "Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)",
      Title = "Gaussian Process Motion Planning",
      Year = {2016}
    }
    Y. Nishiyama, A. Afsharinejad, S. Naruse, B. Boots, & L. Song. The Nonparametric Kernel Bayes Smoother.
    (30% Acceptance Rate)
    2016 Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS-2016)
    Abstract: Recently, significant progress has been made on developing kernel mean expressions for Bayesian inference. An important success in this domain is the nonparametric kernel Bayes filter (nKB-filter), which can be used for sequential inference in state space models. We expand upon this work by introducing a smoothing algorithm, the nonparametric kernel Bayes smoother (nKB-smoother), which relies on kernel Bayesian inference through the kernel sum rule and kernel Bayes rule. We derive the smoothing equations, analyze the computational cost, and show smoothing consistency. We summarize the algorithm, which is simple to implement, requiring only matrix multiplications and the output of nKB-filter. Finally, we report experimental results that compare the nKB-smoother to previous parametric and nonparametric approaches to Bayesian filtering and smoothing. In the supplementary materials, we show that the combination of nKB-filter and nKB-smoother allows marginal kernel mean computation, which gives an alternative to the kernel belief propagation.
    BibTeX:
    @inproceedings{Nishiyama-AISTATS-16,
      Author = "Yu Nishiyama and Amir Hossein Afsharinejad and Shunsuke Naruse and Byron Boots and Le Song",
      Booktitle = "Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)",
      Title = "The Nonparametric Kernel {B}ayes Smoother",
      Year = {2016}
    }
    A. Venkatraman, W. Sun, M. Hebert, J. A. Bagnell, & B. Boots. Online Instrumental Variable Regression with Applications to Online Linear System Identification.
    (26% Acceptance Rate)
    2016 Proceedings of the 30th Conference on Artificial Intelligence (AAAI-2016)
    Abstract: Instrumental variable regression (IVR) is a statistical technique utilized for recovering unbiased estimators when there are errors in the independent variables. Estimator bias in learned time series models can yield poor performance in applications such as long-term prediction and filtering where the recursive use of the model results in the accumulation of propagated error. However, prior work addressed the IVR objective in the batch setting, where it is necessary to store the entire dataset in memory - an infeasible requirement in large dataset scenarios. In this work, we develop Online Instrumental Variable Regression (OIVR), an algorithm that is capable of updating the learned estimator with streaming data. We show that the online adaptation of IVR enjoys a no-regret performance guarantee with respect to the original batch setting by taking advantage of any no-regret online learning algorithm inside OIVR for the underlying update steps. We experimentally demonstrate the efficacy of our algorithm in combination with popular no-regret online algorithms for the task of learning predictive dynamical system models and on a prototypical econometrics instrumental variable regression problem.
    BibTeX:
    @inproceedings{Venkatraman-AAAI-16,
      Author = "Arun Venkatraman and Wen Sun and Martial Hebert and J. Andrew Bagnell and Byron Boots",
      Booktitle = "Proceedings of the Conference on Artificial Intelligence (AAAI)",
      Title = "Online Instrumental Variable Regression with Applications to Online Linear System Identification",
      Year = {2016}
    }
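    Batch instrumental variable regression, the starting point that the paper makes online, reduces to two-stage least squares. Below is a minimal sketch; the data-generating process is a toy assumption constructed so that the instrument is valid while ordinary least squares is biased.

    ```python
    import numpy as np

    def iv_regression_2sls(x, y, z):
        """Instrumental-variable regression via two-stage least squares.

        Stage 1: regress the noisy regressors x on the instruments z.
        Stage 2: regress y on the fitted (denoised) regressors.
        This is the batch counterpart of the online updates studied in OIVR.
        """
        B, *_ = np.linalg.lstsq(z, x, rcond=None)   # stage 1 projection
        x_hat = z @ B
        w, *_ = np.linalg.lstsq(x_hat, y, rcond=None)  # stage 2 fit
        return w

    rng = np.random.default_rng(0)
    n = 20000
    z = rng.normal(size=(n, 1))   # instrument, independent of the noise
    u = rng.normal(size=(n, 1))   # confounding noise in the regressor
    x = z + u                     # regressor correlated with the noise
    y = 2.0 * x - 3.0 * u         # true coefficient on x is 2.0
    w = iv_regression_2sls(x, y, z)
    ```

    Here the IV estimate recovers the true coefficient, whereas plain least squares of y on x is pulled toward a biased value by the shared noise u.
    
    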
    X. Yan, V. Indelman, & B. Boots. Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping.
    2015 Proceedings of the 17th International Symposium on Robotics Research
    (ISRR-2015)
    Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has found success by representing the trajectory as a Gaussian process. Gaussian processes can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of this approach is that STEAM is formulated as a batch estimation problem. In this paper we provide the critical extensions necessary to transform the existing batch algorithm into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets.
    BibTeX:
    @inproceedings{Yan-ISRR-15,
      Author = "Xinyan Yan and Vadim Indelman and Byron Boots",
      Booktitle = "Proceedings of the International Symposium on Robotics Research (ISRR)",
      Title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation \& Mapping",
      Year = {2015}
    }
    A. Shaban, M. Farajtabar, B. Xie, L. Song, & B. Boots. Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods.
    (34% Acceptance Rate)
    2015 Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI-2015) 
    Abstract: Probabilistic latent-variable models are a fundamental tool in statistics and machine learning. Despite their widespread use, identifying the parameters of basic latent variable models continues to be an extremely challenging problem. Traditional maximum likelihood-based learning algorithms find valid parameters, but suffer from high computational cost, slow convergence, and local optima. In contrast, recently developed method of moments-based algorithms are computationally efficient and provide strong statistical guarantees, but are not guaranteed to find valid parameters. In this work, we introduce a two-stage learning algorithm for latent variable models. We first use method of moments to find a solution that is close to the optimal solution but not necessarily in the valid set of model parameters. We then incrementally refine the solution via exterior point optimization until a local optimum that is arbitrarily near the valid set of parameters is found. We perform several experiments on synthetic and real-world data and show that our approach is more accurate than previous work, especially when training data is limited.
    BibTeX:
    @inproceedings{Shaban-UAI-15,
      Author = "Amirreza Shaban and Mehrdad Farajtabar and Bo Xie and Le Song and Byron Boots",
      Booktitle = "Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)",
      Title = "Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods",
      Year = {2015}
    }
    A. Byravan, M. Monfort, B. Ziebart, B. Boots, & D. Fox. Graph-based Inverse Optimal Control for Robot Manipulation. (28% Acceptance Rate) 2015 Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI-2015)
    Abstract: Inverse optimal control (IOC) is a powerful approach for learning robotic controllers from demonstration that estimates a cost function which rationalizes demonstrated control trajectories. Unfortunately, it is difficult to apply in settings where optimal control can only be solved approximately. While local IOC approaches have been shown to successfully learn cost functions in such settings, they rely on the availability of good reference trajectories, which might not be available at test time. We address the problem of using IOC in these computationally challenging control tasks by using a graph-based discretization of the trajectory space. Our approach projects continuous demonstrations onto this discrete graph, where a cost function can be tractably learned via IOC. Discrete control trajectories from the graph are then projected back to the original space and locally optimized using the learned cost function. We demonstrate the effectiveness of the approach with experiments conducted on two 7-degree of freedom robotic arms.
    BibTeX:
    @inproceedings{Byravan-IJCAI-15,
      Author = "Arunkumar Byravan and Matthew Monfort and Brian Ziebart and Byron Boots and Dieter Fox",
      Booktitle = "Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)",
      Title = "Graph-based Inverse Optimal Control for Robot Manipulation",
      Year = {2015}
    }
    B. Boots, A. Byravan, & D. Fox. Learning Predictive Models of a Depth Camera & Manipulator from Raw Execution Traces.
    (48% Acceptance Rate)
    2014 Proceedings of the 2014 IEEE Conference on Robotics and Automation (ICRA-2014) 
    Abstract: We attack the problem of learning a predictive model of a depth camera and manipulator directly from raw execution traces. While the problem of learning manipulator models from visual and proprioceptive data has been addressed before, existing techniques often rely on assumptions about the structure of the robot or tracked features in observation space. We make no such assumptions. Instead, we formulate the problem as that of learning a high-dimensional controlled stochastic process. We leverage recent work on nonparametric predictive state representations to learn a generative model of the depth camera and robotic arm from sequences of uninterpreted actions and observations. We perform several experiments in which we demonstrate that our learned model can accurately predict future observations in response to sequences of motor commands.
    BibTeX:
    @inproceedings{Boots-Depthcam-Manipulator,
      Author = "Byron Boots and Arunkumar Byravan and Dieter Fox",
      Booktitle = "Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)",
      Title = "Learning Predictive Models of a Depth Camera \& Manipulator from Raw Execution Traces",
      Year = {2014}
    }
    A. Byravan, B. Boots, S. Srinivasa, & D. Fox. Space-Time Functional Gradient Optimization for Motion Planning. (48% Acceptance Rate) 2014 Proceedings of the 2014 IEEE Conference on Robotics and Automation (ICRA-2014) 
    Abstract: Functional gradient algorithms (e.g. CHOMP) have recently shown great promise for producing optimal motion for complex many degree-of-freedom robots. A key limitation of such algorithms is the difficulty in incorporating constraints and cost functions that explicitly depend on time. We present T-CHOMP, a functional gradient algorithm that overcomes this limitation by directly optimizing in space-time. We outline a framework for joint space-time optimization, derive an efficient trajectory-wide update for maintaining time monotonicity, and demonstrate the significance of T-CHOMP over CHOMP in several scenarios. By manipulating time, T-CHOMP produces lower-cost trajectories leading to behavior that is meaningfully different from CHOMP.
    BibTeX:
    @inproceedings{Byravan-Space-Time,
      Author = "Arunkumar Byravan and Byron Boots and Siddhartha Srinivasa and Dieter Fox",
      Booktitle = "Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)",
      Title = "Space-Time Functional Gradient Optimization for Motion Planning",
      Year = {2014}
    }
    B. Boots, A. Gretton, & G. J. Gordon. Hilbert Space Embeddings of Predictive State Representations.
    (Selected for Plenary Presentation: 11% Acceptance Rate)
    2013 Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence
    (UAI-2013) 
    Abstract: Predictive State Representations (PSRs) are an expressive class of models for controlled stochastic processes. PSRs represent state as a set of predictions of future observable events. Because PSRs are defined entirely in terms of observable data, statistically consistent estimates of PSR parameters can be learned efficiently by manipulating moments of observed training data. Most learning algorithms for PSRs have assumed that actions and observations are finite with low cardinality. In this paper, we generalize PSRs to infinite sets of observations and actions, using the recent concept of Hilbert space embeddings of distributions. The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation. We show that these Hilbert space embeddings of PSRs are able to gracefully handle continuous actions and observations, and that our learned models outperform competing system identification algorithms on several prediction benchmarks.
    BibTeX:
    @inproceedings{Boots-HSE-PSRs,
      Author = "Byron Boots and Arthur Gretton and Geoffrey J. Gordon",
      Booktitle = "Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI)",
      Title = "Hilbert Space Embeddings of Predictive State Representations",
      Year = {2013}
    }
    B. Boots & G. J. Gordon. A Spectral Learning Approach to Range-Only SLAM.
    (24% Acceptance Rate)
    2013 Proceedings of the 30th International Conference on Machine Learning
    (ICML-2013) 
    Abstract: We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking (MHT) methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance. Compared with popular extended Kalman filter (EKF) or extended information filter (EIF) approaches, and many MHT ones, our approach does not need to linearize a transition or measurement model; such linearizations can cause severe errors in EKFs and EIFs, and to a lesser extent MHT, particularly for the highly non-Gaussian posteriors encountered in range-only SLAM. We provide a theoretical analysis of our method, including finite-sample error bounds. Finally, we demonstrate on a real-world robotic SLAM problem that our algorithm is not only theoretically justified, but works well in practice: in a comparison of multiple methods, the lowest errors come from a combination of our algorithm with batch optimization, but our method alone produces nearly as good a result at far lower computational cost.
    BibTeX:
    @inproceedings{Boots-spectralROSLAM,
      Author = "Byron Boots and Geoffrey J. Gordon",
      Booktitle = "Proceedings of the 30th International Conference on Machine Learning (ICML)",
      Title = "A Spectral Learning Approach to Range-Only {SLAM}",
      Year = {2013}
    }
    B. Boots & G. J. Gordon. Two-Manifold Problems with Applications to Nonlinear System Identification. (27% Acceptance Rate) 2012 Proceedings of the 29th International Conference on Machine Learning
    (ICML-2012) 
    Abstract: Recently, there has been much interest in spectral approaches to learning manifolds---so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space, and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data.
    BibTeX:
    @inproceedings{Boots-2-Manifold,
      Author = "Byron Boots and Geoffrey J. Gordon",
      Booktitle = "Proceedings of the 29th International Conference on Machine Learning (ICML)",
      Title = "Two-Manifold Problems with Applications to Nonlinear System Identification",
      Year = {2012}
    }
    B. Boots & G. J. Gordon. An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems.
    (25% Acceptance Rate)
    2011 Proceedings of the 25th Conference on Artificial Intelligence
    (AAAI-2011)
    Abstract: Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems---for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental SVD updates and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on a high-bandwidth video mapping task, and illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place.
    BibTeX:
    @inproceedings{Boots-online-psr,
      Author = "Byron Boots and Geoffrey J. Gordon",
      Booktitle = "Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI)",
      Title = "An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems",
      Year = "2011"
    }
    B. Boots, S. M. Siddiqi & G. J. Gordon. Closing the Learning-Planning Loop with Predictive State Representations. (Invited Journal Paper) 2011 The International Journal of Robotics Research (IJRR)
    Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learned model, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned PSR. This experiment shows that the algorithm learns a state space which captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.
    BibTeX:
    @article{Boots-2011b,
      Author = "Byron Boots and Sajid Siddiqi and Geoffrey Gordon",
      Title = "Closing the Learning-Planning Loop with Predictive State Representations",
      Journal = "International Journal of Robotics Research (IJRR)",
      Volume = "30", 
      Year = "2011",
      Pages = "954-956"
    }
    B. Boots & G. J. Gordon. Predictive State Temporal Difference Learning.
    (24% Acceptance Rate)
    2011 Proceedings of Advances in Neural Information Processing Systems 24 (NIPS-2010)    
    Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.
    BibTeX:
    @inproceedings{Boots11a,	
      Author = "Byron Boots and Geoffrey J. Gordon ",
      Booktitle = "Proceedings of Advances in Neural Information Processing Systems 24 (NIPS)",
      Title = "Predictive State Temporal Difference Learning",
      Year = {2011}
    }
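The two-stage recipe in the abstract — compress a large feature set with an SVD of an empirical cross-covariance, then run a Bellman (LSTD-style) solve in the compressed space — is simple enough to sketch in NumPy. The sketch below is an illustrative simplification, not the paper's exact estimator; the function name and the plain-SVD and ridge choices are ours.

```python
import numpy as np

def pstd_sketch(phi, phi_next, psi, rewards, k, gamma=0.9, reg=1e-6):
    """Compress history features, then solve LSTD in the compressed space.

    phi      : (T, n) features of histories at each step
    phi_next : (T, n) the same features one step later
    psi      : (T, m) features of future observations ("tests")
    rewards  : (T,)   observed immediate rewards
    k        : compressed state dimension
    Returns (Pi, w): value(phi_t) is approximated by w @ (Pi @ phi_t).
    """
    T = phi.shape[0]
    # Empirical cross-covariance between future and history features;
    # its top-k right singular subspace preserves predictive information.
    C = psi.T @ phi / T
    Pi = np.linalg.svd(C, full_matrices=False)[2][:k]   # (k, n) compression
    s, s_next = phi @ Pi.T, phi_next @ Pi.T
    # Regularized LSTD normal equations in the compressed state space.
    A = s.T @ (s - gamma * s_next) / T + reg * np.eye(k)
    b = s.T @ rewards / T
    return Pi, np.linalg.solve(A, b)
```

The key point the code makes concrete is that the compression operator is learned from prediction (future-given-past) structure, not from the value function itself.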
    L. Song, B. Boots, S. M. Siddiqi, G. J. Gordon & A. J. Smola. Hilbert Space Embeddings of Hidden Markov Models. 2010 Proceedings of the 27th International Conference on Machine Learning
    (ICML-2010) 
    Abstract: Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. And, learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a Hilbert space embedding of HMMs that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a local-minimum-free kernel spectral algorithm for learning these HMMs. We apply our method to robot vision data, slot car inertial sensor data and audio event classification data, and show that in these applications, embedded HMMs exceed the previous state-of-the-art performance.
    BibTeX:
    @inproceedings{Song:2010fk,	
      Author = "L. Song and B. Boots and S. M. Siddiqi and G. J. Gordon and A. J. Smola",
      Booktitle = "Proceedings of the 27th International Conference on Machine Learning (ICML)",
      Title = "Hilbert Space Embeddings of Hidden {M}arkov Models",
      Year = {2010}
    }
    B. Boots, S. M. Siddiqi & G. J. Gordon. Closing the Learning-Planning Loop with Predictive State Representations.
    (Selected for Plenary Presentation: 8% Acceptance Rate)
    2010 Proceedings of Robotics: Science and Systems VI
    (RSS-2010)
    Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learned model, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned PSR. This experiment shows that the algorithm learns a state space which captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.
    BibTeX:
     @inproceedings{Boots-RSS-10,
       Author = "B. Boots and S. Siddiqi and G. Gordon",
       Title = "Closing the Learning-Planning Loop with Predictive State Representations",
       Booktitle = "Proceedings of Robotics: Science and Systems VI (RSS)",
       Year = "2010",
       Address = "Zaragoza, Spain",
       Month = "June"
    }
                  
    S. M. Siddiqi, B. Boots & G. J. Gordon. Reduced-Rank Hidden Markov Models.
    (Selected for Plenary Presentation: 8% Acceptance Rate)
    2010 Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS-2010)  
    Abstract: Hsu et al. (2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-k RR-HMMs are a larger class of models than k-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm's scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy.
    BibTeX:
    @inproceedings{Siddiqi10a,
      author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon",
      title = "Reduced-Rank Hidden {Markov} Models",
      booktitle = "Proceedings of the Thirteenth International Conference 
      on Artificial Intelligence and Statistics (AISTATS)",
      year = "2010"
    }
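For context, the Hsu et al. (2009) spectral algorithm that this paper builds on is compact enough to state directly: take an SVD of the pairwise observation probabilities and recover observable operators from the triple probabilities. The sketch below works from exact low-order moments of a toy HMM; variable names and the toy setup are ours, and in practice the moments would be replaced by empirical estimates.

```python
import numpy as np

def spectral_hmm(P1, P21, P321, k):
    """Spectral learning of an HMM from low-order moments (Hsu et al. style).

    P1[i]         = Pr(x1 = i)
    P21[i, j]     = Pr(x2 = i, x1 = j)
    P321[i, x, j] = Pr(x3 = i, x2 = x, x1 = j)
    Returns observable operators (b1, binf, B) with
    Pr(x1..xt) = binf @ B[xt] @ ... @ B[x1] @ b1.
    """
    U = np.linalg.svd(P21)[0][:, :k]            # top-k left singular vectors
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    P21U_pinv = np.linalg.pinv(U.T @ P21)
    B = [U.T @ P321[:, x, :] @ P21U_pinv for x in range(P21.shape[0])]
    return b1, binf, B
```

On exact moments of a rank-k model the recovered operators reproduce sequence probabilities exactly; the RR-HMM paper relaxes the k-state assumption to low-rank transitions and bounds the error when the moments are estimated from samples.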
    B. Boots, S. M. Siddiqi & G. J. Gordon. Closing the Learning-Planning Loop with Predictive State Representations. (Short paper: for longer version see paper accepted at RSS above) 2010 Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems
    (AAMAS-2010)
    Abstract: A central problem in artificial intelligence is to plan to maximize future reward under uncertainty in a partially observable environment. Models of such environments include Partially Observable Markov Decision Processes (POMDPs) as well as their generalizations, Predictive State Representations (PSRs) and Observable Operator Models (OOMs). POMDPs model the state of the world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of future events (called tests or characteristic events) conditioned on past events (called histories or indicative events). Unfortunately, exact planning algorithms such as value iteration are intractable for most realistic POMDPs due to the curse of history and the curse of dimensionality. However, PSRs and OOMs hold the promise of mitigating both of these curses: first, many successful approximate planning techniques designed to address these problems in POMDPs can easily be adapted to PSRs and OOMs. Second, PSRs and OOMs are often more compact than their corresponding POMDPs (i.e., need fewer state dimensions), mitigating the curse of dimensionality. Finally, since tests and histories are observable quantities, it has been suggested that PSRs and OOMs should be easier to learn than POMDPs; with a successful learning algorithm, we can look for a model which ignores all but the most important components of state, reducing dimensionality still further.
    BibTeX:
    @inproceedings{Boots10a,
      author = "Byron Boots and Sajid Siddiqi and Geoffrey J. Gordon",
      title = "Closing the Learning-Planning Loop with Predictive State Representations",
      booktitle = "Proceedings of the 9th International Conference on Autonomous 
      Agents and Multiagent Systems (AAMAS)",
      year = "2010"
    }
    S. M. Siddiqi, B. Boots & G. J. Gordon. A Constraint Generation Approach to Learning Stable Linear Dynamical Systems.
    (Selected for Plenary Presentation: 2.5% Acceptance Rate)
    2008 Proceedings of Advances in Neural Information Processing Systems 20 (NIPS-2007)  
    Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. The constraint generation approach leads to noticeable improvement in the quality of simulated sequences. We compare our method to those of Lacy and Bernstein, with positive results in terms of accuracy, quality of simulated sequences, and efficiency.
    BibTeX:
    @inproceedings{Siddiqi07b,
      author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon",
      title = "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems",
      booktitle = "Proceedings of Advances in Neural Information Processing Systems 20 (NIPS)",
      year = "2007"
    }
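The constraint-generation loop described in the abstract can be sketched concretely: fit A by least squares, and while the estimate is unstable, add a linear constraint on its top singular pair and re-solve. To stay dependency-free, the sketch below uses equality constraints solved through a KKT system where the paper solves a quadratic program with inequality constraints; names, the stopping test, and the margin are our choices.

```python
import numpy as np

def fit_stable_lds(X0, X1, max_iter=100, margin=1e-6):
    """Constraint-generation sketch for a stable linear model X1 ~= A @ X0.

    X0, X1 : (n, T) state snapshots at times t and t+1.
    While the fitted A has spectral radius > 1, constrain
    u.T @ A @ v = 1 - margin for its top singular pair (u, v) and
    re-solve the equality-constrained least-squares problem.
    """
    n = X0.shape[0]
    M = np.kron(X0.T, np.eye(n))        # M @ vec(A) = vec(A @ X0), Fortran order
    y = X1.reshape(-1, order="F")
    C = np.zeros((0, n * n))            # generated constraint rows
    A = None
    for _ in range(max_iter):
        if C.shape[0] == 0:
            a = np.linalg.lstsq(M, y, rcond=None)[0]
        else:
            m = C.shape[0]              # KKT system for equality-constrained LS
            K = np.block([[2 * M.T @ M, C.T], [C, np.zeros((m, m))]])
            rhs = np.concatenate([2 * M.T @ y, np.full(m, 1.0 - margin)])
            a = np.linalg.lstsq(K, rhs, rcond=None)[0][: n * n]
        A = a.reshape(n, n, order="F")
        if np.abs(np.linalg.eigvals(A)).max() <= 1.0:
            return A                    # stable: stop early
        U, s, Vt = np.linalg.svd(A)     # else constrain the top singular pair
        C = np.vstack([C, np.kron(Vt[0], U[:, 0])])
    return A
```

Constraining the top singular value is conservative (it implies, but is stronger than, spectral radius at most one), which is why the loop tests the eigenvalues directly and stops as soon as the iterate is stable.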
    S. M. Siddiqi, B. Boots, G. J. Gordon, & A. W. Dubrawski. Learning Stable Multivariate Baseline Models for Outbreak Detection. 2007 Advances in Disease Surveillance
    Abstract: We propose a novel technique for building generative models of real-valued multivariate time series data streams. Such models are of considerable utility as baseline simulators in anomaly detection systems. The proposed algorithm, based on Linear Dynamical Systems (LDS), learns stable parameters efficiently while yielding more accurate results than previously known methods. The resulting model can be used to generate infinitely long sequences of realistic baselines using small samples of training data.
    BibTeX:
    @article{Siddiqi07c,
      author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon and Artur W. Dubrawski",
      title = "Learning Stable Multivariate Baseline Models for Outbreak Detection",
      journal = "Advances in Disease Surveillance",
      year = "2007",
      volume = {4},
      pages = {266}
    }
    
    B. Boots, S. Nundy & D. Purves. Evolution of Visually Guided Behavior in Artificial Agents. 2007 Network: Computation in Neural Systems  
    Abstract: Recent work on brightness, color, and form has suggested that human visual percepts represent the probable sources of retinal images rather than stimulus features as such. Here we investigate the plausibility of this empirical concept of vision by allowing autonomous agents to evolve in virtual environments based solely on the relative success of their behavior. The responses of evolved agents to visual stimuli indicate that fitness improves as the neural network control systems gradually incorporate the statistical relationship between projected images and behavior appropriate to the sources of the inherently ambiguous images. These results: (1) demonstrate the merits of a wholly empirical strategy of animal vision as a means of contending with the inverse optics problem; (2) argue that the information incorporated into biological visual processing circuitry is the relationship between images and their probable sources; and (3) suggest why human percepts do not map neatly onto physical reality.
    BibTeX:
    @article{Boots2007a,
      author = "Byron Boots and Surajit Nundy and Dale Purves",
      title = "Evolution of Visually Guided Behavior in Artificial Agents",
      journal = "Network: Computation in Neural Systems",
      year = "2007",
      volume = {18},
      number = {1},
      pages = {11--34}
    }
    S. Majercik & B. Boots. DC-SSAT: A Divide-and-Conquer Approach to Solving Stochastic Satisfiability Problems Efficiently.
    (18% acceptance rate)
    2005 Proceedings of the 20th National Conference on Artificial Intelligence
    (AAAI-2005)
    Abstract: We present DC-SSAT, a sound and complete divide-and-conquer algorithm for solving stochastic satisfiability (SSAT) problems that outperforms the best existing algorithm for solving such problems (ZANDER) by several orders of magnitude with respect to both time and space. DC-SSAT achieves this performance by dividing the SSAT problem into subproblems based on the structure of the original instance, caching the viable partial assignments (VPAs) generated by solving these subproblems, and using these VPAs to construct the solution to the original problem. DC-SSAT does not save redundant VPAs and each VPA saved is necessary to construct the solution. Furthermore, DC-SSAT builds a solution that is already human-comprehensible, allowing it to avoid the costly solution rebuilding phase in ZANDER. As a result, DC-SSAT is able to solve problems using, typically, 1-2 orders of magnitude less space than ZANDER, allowing DC-SSAT to solve problems ZANDER cannot solve due to space constraints. And, in spite of its more parsimonious use of space, DC-SSAT is typically 1-2 orders of magnitude faster than ZANDER. We describe the DC-SSAT algorithm and present empirical results comparing its performance to that of ZANDER on a set of SSAT problems.
    BibTeX:
    @inproceedings{Majercik2005,
      author = "Stephen M. Majercik and Byron Boots",
      title = "DC-SSAT: A Divide-and-Conquer Approach to Solving 
      Stochastic Satisfiability Problems Efficiently",
      booktitle = "Proceedings of the Twentieth National Conference 
      on Artificial Intelligence (AAAI)",
      year = "2005",
      note = "AAAI-05"
    }

    Refereed Abstracts, Short Papers, & Workshop Publications
    Authors Title Year Venue
    Y. Pan, C. Cheng, K. Saigol, K. Lee, X. Yan, E. Theodorou, & B. Boots. Learning Deep Neural Network Control Policies for Agile Off-Road Autonomous Driving. 2017 The NIPS Deep Reinforcement Learning Symposium
    Abstract: We present an end-to-end imitation learning framework for agile, off-road autonomous driving. Given expert demonstrations, we train a deep neural network to map on-board raw observations to continuous control commands. Compared with recent approaches to similar tasks, our method is data efficient and requires neither state estimation nor online planning to navigate the vehicle. By imitating an optimal controller or a human expert, the neural network policy is capable of learning steering and throttle controls, the latter of which is required to successfully drive on varied terrain at high speeds. Both simulated and real-world experimental results demonstrate successful autonomous off-road driving, matching the state-of-the-art speed.
    BibTeX:
    @inproceedings{Pan-NIPSWS-17,
      Author = "Yunpeng Pan and Ching-An Cheng and Kamil Saigol and Keuntaek Lee and Xinyan Yan and Evangelos Theodorou and Byron Boots",
      booktitle = {The NIPS Deep Reinforcement Learning Symposium},
      Title = "Learning Deep Neural Network Control Policies for Agile Off-Road Autonomous Driving",
      year = {2017}
    }
    C. Cheng & B. Boots. Convergence of Value Aggregation for Imitation Learning. 2017 The NIPS Deep Reinforcement Learning Symposium
    Abstract: Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
    BibTeX:
    @inproceedings{Cheng-NIPSWS-17,
      Author = "Ching-An Cheng and Byron Boots",
      booktitle = {The NIPS Deep Reinforcement Learning Symposium},
      Title = "Convergence of Value Aggregation for Imitation Learning",
      year = {2017}
    }
    A. Venkatraman, N. Rhinehart, W. Sun, L. Pinto, B. Boots, K. Kitani, & J. A. Bagnell. Predictive State Decoders: Encoding the Future into Recurrent Networks. 2017 The NIPS Deep Reinforcement Learning Symposium
    Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding its analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. Our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
    BibTeX:
    @inproceedings{Venkatraman-NIPSWS-17,
      Author = "Arun Venkatraman and Nicholas Rhinehart and Wen Sun and Lerrel Pinto and Byron Boots and Kris Kitani and James Andrew Bagnell",
      booktitle = {The NIPS Deep Reinforcement Learning Symposium},
      Title = "Predictive State Decoders: Encoding the Future into Recurrent Networks",
      year = {2017}
    }
    B. Dai, N. He, Y. Pan, B. Boots, & L. Song. Learning from Conditional Distributions via Dual Embeddings. 2017 The NIPS Workshop on Learning on Distributions, Functions, Graphs and Groups
    Abstract: Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample x itself is associated with a conditional distribution p(z|x) represented by samples, and the goal is to learn a function f that links these conditional distributions to target values y. These problems become very challenging when we only have limited samples or, in the extreme case, only one sample from each conditional distribution. Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large set of samples from each conditional distribution. To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distributions problem. With such new reformulation, we only need to deal with the joint distribution p(z, x). We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments, on both synthetic and real-world datasets, show that the proposed approach shows significant improvement over existing algorithms.
    BibTeX:
    @inproceedings{Dai-NIPSWS-17,
      Author = "Bo Dai and Niao He and Yunpeng Pan and Byron Boots and Le Song",
      booktitle = {The NIPS Workshop on Learning on Distributions, Functions, Graphs and Groups},
      Title = "Learning from Conditional Distributions via Dual Embeddings",
      year = {2017}
    }
    X. Yan, K. Choromanski, B. Boots, & V. Sindhwani. Manifold Regularization for Kernelized LSTD. 2017 Short Paper at the Conference on Robot Learning (CoRL-2017)
    Abstract: Policy evaluation or value function or Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore, its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality.
    BibTeX:
    @inproceedings{Yan-CoRL-17,
      Author = "Xinyan Yan and Krzysztof Choromanski and Byron Boots and Vikas Sindhwani",
      booktitle = {Short Paper at the Conference on Robot Learning (CoRL)},
      Title = "Manifold Regularization for Kernelized LSTD",
      year = {2017}
    }
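The core idea — regularize policy evaluation with the intrinsic geometry of the state space — can be illustrated in a plain linear-feature setting (the paper works with kernels): build a k-NN graph over the visited states, form its Laplacian, and add a smoothness penalty to the LSTD normal equations. Everything below (names, the k-NN construction, default parameters) is our own simplified sketch, not the paper's kernelized estimator.

```python
import numpy as np

def manifold_lstd(phi, phi_next, rewards, gamma=0.95, lam=0.1, knn=5, reg=1e-6):
    """LSTD with a graph-Laplacian smoothness term (linear-feature sketch).

    phi, phi_next : (T, n) state features at times t and t+1
    rewards       : (T,)   immediate rewards
    Returns w minimizing the regularized LSTD system, where the penalty
    lam * w' Phi' L Phi w encourages values that vary smoothly along the
    data manifold.
    """
    T, n = phi.shape
    # Symmetric k-NN adjacency over the observed states.
    d2 = ((phi[:, None, :] - phi[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-edges
    W = np.zeros((T, T))
    nbrs = np.argsort(d2, axis=1)[:, :knn]
    W[np.repeat(np.arange(T), knn), nbrs.reshape(-1)] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(1)) - W                    # unnormalized graph Laplacian
    # LSTD normal equations plus the manifold penalty.
    A = phi.T @ (phi - gamma * phi_next) + lam * phi.T @ L @ phi + reg * np.eye(n)
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
```

Setting `lam=0` recovers ordinary (regularized) LSTD, which makes the effect of the geometric term easy to ablate.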
    C. Downey, A. Hefny, B. Li, B. Boots, & G. J. Gordon. Predictive State Recurrent Neural Networks. 2017 Short Paper at the Conference on Robot Learning (CoRL-2017)
    Abstract: We present a new model, Predictive State Recurrent Neural Networks (PSRNNs), for filtering and prediction in dynamical systems. PSRNNs draw on insights from both Recurrent Neural Networks (RNNs) and Predictive State Representations (PSRs), and inherit advantages from both types of models. Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources. We show that such bilinear functions arise naturally from state updates in Bayes filters like PSRs, in which observations can be viewed as gating belief states. We also show that PSRNNs can be learned effectively by combining Backpropagation Through Time (BPTT) with an initialization derived from a statistically consistent learning algorithm for PSRs called two-stage regression (2SR). Finally, we show that PSRNNs can be factorized using tensor decomposition, reducing model size and suggesting interesting connections to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets, and showed that we outperform several popular alternative approaches to modeling dynamical systems in all cases.
    BibTeX:
    @inproceedings{Downey-CoRL-17,
      Author = "Carlton Downey and Ahmed Hefny and Boyue Li and Byron Boots and Geoffrey J. Gordon",
      booktitle = {Short Paper at the Conference on Robot Learning (CoRL)},
      Title = "Predictive State Recurrent Neural Networks",
      year = {2017}
    }
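The bilinear transfer function at the heart of PSRNNs is compact: the next state is a tensor contraction of the parameters with the current state and the observation features, followed by normalization, mirroring the renormalization step of a Bayes-filter update. A minimal single-step sketch, with shapes and names of our own choosing:

```python
import numpy as np

def psrnn_step(W, state, obs):
    """One bilinear PSRNN-style state update (illustrative sketch).

    W     : (k, d, k) parameter tensor
    state : (k,) current predictive state
    obs   : (d,) observation features
    The update is bilinear in (obs, state) and normalized, so the
    observation effectively gates the belief state.
    """
    s = np.einsum("idj,d,j->i", W, obs, state)   # contract over obs and state
    return s / np.linalg.norm(s)                 # renormalize, Bayes-filter style
```

In the paper, W would be initialized with two-stage regression and then refined with BPTT; stacking several such layers gives the deep variant.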
    A. Venkatraman, N. Rhinehart, W. Sun, L. Pinto, B. Boots, K. Kitani, & J. A. Bagnell. Predictive State Decoders: Encoding the Future into Recurrent Networks. 2017 Short Paper at the Conference on Robot Learning (CoRL-2017)
    Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding its analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. Our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.
    BibTeX:
    @inproceedings{Venkatraman-CoRL-17,
      Author = "Arun Venkatraman and Nicholas Rhinehart and Wen Sun and Lerrel Pinto and Byron Boots and Kris Kitani and James Andrew Bagnell",
      booktitle = {Short Paper at the Conference on Robot Learning (CoRL)},
      Title = "Predictive State Decoders: Encoding the Future into Recurrent Networks",
      year = {2017}
    }
    M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, & B. Boots. Probabilistic Skill Learning with Skill Reproduction as Inference Based Planning. 2017 The RSS Workshop on (Empirically) Data-Driven Manipulation
    Abstract: We present a novel unifying approach to conventional learning from demonstration (LfD) and motion planning using probabilistic inference for skill reproduction. We also provide a new probabilistic skill model that requires minimal parameter tuning, and is more suited for encoding skill constraints and performing inference in an efficient manner. Preliminary experimental results using real-world data are presented.
    BibTeX:
    @inproceedings{Rana-RSS-DDM-17,
      Author = "Muhammad Asif Rana and Mustafa Mukadam and S. Reza Ahmadzadeh and Sonia Chernova and Byron Boots",
      booktitle = {The RSS Workshop on (Empirically) Data-Driven Manipulation},
      Title = "Probabilistic Skill Learning with Skill Reproduction as Inference Based Planning",
      year = {2017}
    }
    A. S. Lambert, A. Shaban, Z. Liu, & B. Boots. Deep Forward and Inverse Perceptual Models for Tracking and Prediction. 2017 The RSS Workshop on New Frontiers for Deep Learning in Robotics
    Abstract: We present a non-parametric perceptual model for generating video frames with deep networks, and provide a framework for its use in tracking and prediction tasks on a real robotic system. This is shown to greatly outperform standard deconvolutional methods for image generation, producing clear photo-realistic images. For tracking, we incorporate the sensor model into an Extended Kalman Filter and estimate robot trajectories. As a comparison, we introduce a secondary framework consisting of a discriminative, inverse model for state estimation, and compare this approach to that using a generative model.
    BibTeX:
    @inproceedings{Lambert-RSS-NFDLR-17,
      Author = "Alexander Lambert and Amirreza Shaban and Zhen Liu and Byron Boots",
      booktitle = {The RSS Workshop on New Frontiers for Deep Learning in Robotics},
      Title = "Deep Forward and Inverse Perceptual Models for Tracking and Prediction",
      year = {2017}
    }
    M. Mukadam, J. Dong, F. Dellaert, & B. Boots. Simultaneous Trajectory Estimation and Planning via Probabilistic Inference. 2017 The RSS Workshop on POMDPs in Robotics
    Abstract: We provide a unified probabilistic framework for trajectory estimation and planning. The key idea is to view these two problems, usually considered separately, as a single problem. At each time-step the robot is tasked with finding the complete continuous-time trajectory from start to goal. This can be quite difficult; the robot must contend with a potentially high-degree-of-freedom (DOF) trajectory space, uncertainty due to limited sensing capabilities, model inaccuracy, and the stochastic effect of executing actions, and the robot must find the solution in (faster than) real time. To overcome these challenges, we build on recent probabilistic inference approaches to continuous-time localization and mapping and continuous-time motion planning. We solve the joint problem by iteratively recomputing the maximum a posteriori trajectory conditioned on all available sensor data and cost information. Finally, we evaluate our framework empirically in both simulation and on a mobile manipulator.
    BibTeX:
    @inproceedings{Mukadam-RSS-POMDP-17,
      Author = "Mustafa Mukadam and Jing Dong and Frank Dellaert and Byron Boots",
      booktitle = {The RSS Workshop on POMDPs in Robotics},
      Title = "Simultaneous Trajectory Estimation and Planning via Probabilistic Inference",
      year = {2017}
    }
    Y. Pan, C. Cheng, K. Saigol, K. Lee, X. Yan, E. Theodorou, & B. Boots. Deep AutoRally: End-to-End Imitation Learning for Agile Autonomous Driving. 2017 The RSS Workshop on Learning From Demonstrations in High-Dimensional Feature Spaces
    Abstract: In this paper, we aim to learn control policies for aggressive, agile, autonomous driving on rough terrain. Our setup consists of a robot that is required to perform precise maneuvers in a physically-complex environment by making high-frequency decisions. We adopt an end-to-end imitation learning framework with the goal of finding a policy that minimizes expected accumulated cost. This task is challenging for human drivers to demonstrate, so during the training phase, we assume an oracle, or expert, which uses additional sensors, model knowledge, and computation unavailable during the testing phase to generate demonstrations. Given these expert demonstrations, we train a deep neural network that maps raw observations to control commands. We present real-world experimental results showing successful application of our method.
    BibTeX:
    @inproceedings{Pan-RSS-LfD-17,
      Author = "Yunpeng Pan and Ching-An Cheng and Kamil Saigol and Keuntaek Lee and Xinyan Yan and Evangelos A. Theodorou and Byron Boots",
      booktitle = {The RSS Workshop on Learning From Demonstrations in High-Dimensional Feature Spaces},
      Title = "Deep AutoRally: End-to-End Imitation Learning for Agile Autonomous Driving",
      year = {2017}
    }
    A. S. Lambert, A. Shaban, Z. Liu, & B. Boots. Deep Forward and Inverse Perceptual Models for Tracking and Prediction. 2017 The RSS Workshop on Articulated Object Tracking
    Abstract: We present a non-parametric perceptual model for generating video frames with deep networks, and provide a framework for its use in tracking and prediction tasks on a real robotic system. This is shown to greatly outperform standard deconvolutional methods for image generation, producing clear photo-realistic images. For tracking, we incorporate the sensor model into an Extended Kalman Filter and estimate robot trajectories. As a comparison, we introduce a secondary framework consisting of a discriminative, inverse model for state estimation, and compare this approach to that using a generative model.
    BibTeX:
@inproceedings{Lambert-RSS-AOT-17,
  author = "Alexander Lambert and Amirreza Shaban and Zhen Liu and Byron Boots",
  booktitle = "The RSS Workshop on Articulated Object Tracking",
  title = "Deep Forward and Inverse Perceptual Models for Tracking and Prediction",
  year = "2017"
}
    M. A. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, & B. Boots. Probabilistic Skill Learning with Skill Reproduction as Inference Based Planning. 2017 The RSS Workshop on Mathematical Models, Algorithms, and Human-Robot Interaction
    Abstract: We present a novel unifying approach to conventional learning from demonstration (LfD) and motion planning using probabilistic inference for skill reproduction. We also provide a new probabilistic skill model that requires minimal parameter tuning, and is more suited for encoding skill constraints and performing inference in an efficient manner. Preliminary experimental results using real-world data are presented.
    BibTeX:
@inproceedings{Rana-RSS-AHRI-17,
  author = "Muhammad Asif Rana and Mustafa Mukadam and S. Reza Ahmadzadeh and Sonia Chernova and Byron Boots",
  booktitle = "The RSS Workshop on Mathematical Models, Algorithms, and Human-Robot Interaction",
  title = "Probabilistic Skill Learning with Skill Reproduction as Inference Based Planning",
  year = "2017"
}
    E. Huang, A. Bhatia, B. Boots, & M. Mason. Exact Bounds on the Contact-Driven Motion of a Sliding Object, With Applications to Robotic Pulling. 2017 The RSS Workshop on 'Revisiting Contact - Turning a Problem into a Solution'
    Abstract: This paper explores the quasi-static motion of a planar slider being pushed or pulled through a single contact point assumed not to slip. The main contribution is to derive a method for computing exact bounds on the object's motion for classes of pressure distributions where the center of pressure is known but the distribution of support forces is unknown. The second contribution is to show that the exact motion bounds can be used to plan robotic pulling trajectories that guarantee convergence to the final pose. The planner was tested on the task of pulling an acrylic rectangle to random locations within the robot workspace. The generated plans were accurate to 4.00mm +/- 3.02mm of the target position and 4.35 degrees +/- 3.14 degrees of the target orientation.
    BibTeX:
@inproceedings{Huang-RSSWS-17,
  author = "Eric Huang and Ankit Bhatia and Byron Boots and Matthew T. Mason",
  booktitle = "The RSS Workshop on 'Revisiting Contact - Turning a Problem into a Solution'",
  title = "Exact Bounds on the Contact-Driven Motion of a Sliding Object, With Applications to Robotic Pulling",
  year = "2017"
}
    W. Sun, A. Venkatraman, G. J. Gordon, B. Boots, & J. A. Bagnell. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction.
    (Selected for Oral Presentation: 10% Acceptance Rate)
    2017 The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM-2017)
    Abstract: Researchers have demonstrated state-of-the-art performance in sequential decision making problems (e.g., robotics control, sequential prediction) with deep neural network models. One often has access to near-optimal oracles that achieve good performance on the task during training. We demonstrate that AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL) approach of Ross et al. --- can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique. Using both feedforward and recurrent neural predictors, we present stochastic gradient procedures on a sequential prediction task, dependency-parsing from raw image data, as well as on various high dimensional robotics control problems. We also provide a comprehensive theoretical study of IL that demonstrates we can expect up to exponentially lower sample complexity for learning with AggreVaTeD than with RL algorithms, which backs our empirical findings. Our results and theory indicate that the proposed approach can achieve superior performance with respect to the oracle when the demonstrator is sub-optimal.
    BibTeX:
    @article{Sun_RLDM_17,
      author = "Wen Sun and Arun Venkatraman and Geoffrey J. Gordon and Byron Boots and J. Andrew Bagnell",
      title = "Deeply {A}ggre{V}a{T}e{D}: Differentiable Imitation Learning for Sequential Prediction",
      journal = "The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making",
      year = "2017"
    }
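The interactive imitation loop that AggreVaTeD builds on can be illustrated with a toy one-dimensional system. This is a hedged sketch of my own, not the paper's method: AggreVaTeD takes policy gradients on the expert's cost-to-go, whereas this sketch keeps the simpler DAgger-style regression step (roll out the current learner, query the oracle at the states the learner itself visits, aggregate, refit); all names and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def expert(s):
    # Oracle policy, available only at training time; drives the state to zero.
    return -0.5 * s

def rollout(policy, s0=2.0, T=20):
    """Roll out a policy on the toy system s' = s + a + noise, recording states."""
    states, s = [], s0
    for _ in range(T):
        states.append(s)
        s = s + policy(s) + 0.01 * rng.standard_normal()
    return states

# Interactive imitation: query the expert at the LEARNER's visited states,
# aggregate all (state, expert action) pairs, and refit the policy.
S, A = [], []
w = 0.0                                   # linear policy a = w * s, initially inert
for _ in range(10):
    visited = rollout(lambda s: w * s)    # roll out the current learner
    S += visited
    A += [expert(s) for s in visited]     # oracle labels at those states
    Sa, Aa = np.array(S), np.array(A)
    w = (Sa @ Aa) / (Sa @ Sa)             # least-squares refit of the policy
```

Because the expert here is exactly linear, the aggregated regression recovers the expert gain almost immediately; the point of the sketch is the interaction pattern, not the toy numbers.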
    
Y. Pan, X. Yan, E. Theodorou, & B. Boots. Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control. 2017 The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM-2017)
Abstract: In many sequential prediction and decision-making problems such as Bayesian filtering and probabilistic model-based planning and control, we need to cope with the challenge of prediction under uncertainty, where the goal is to compute the predictive distribution p(y) given an input distribution p(x) and a probabilistic model p(y|x). Computing the exact predictive distribution is generally intractable. In this work, we consider a special class of problems in which the input distribution p(x) is a multivariate Gaussian, and the probabilistic model p(y|x) is learned from data and specified by a sparse spectral representation of Gaussian processes (SSGPs). SSGPs are a powerful tool for scaling Gaussian processes (GPs) to large datasets by approximating the covariance function using finite-dimensional random Fourier features. Existing SSGP algorithms for regression assume deterministic inputs, precluding their use in many sequential prediction and decision-making applications where accounting for input uncertainty is crucial. To address this prediction under uncertainty problem, we propose an exact moment-matching approach with closed-form expressions for predictive distributions. Our method is more general and scalable than its standard GP counterpart, and is naturally applicable to multi-step prediction or uncertainty propagation. We show that our method can be used to develop new algorithms for Bayesian filtering and stochastic model predictive control, and we evaluate the applicability of our method with both simulated and real-world experiments.
    BibTeX:
    @article{Pan_RLDM_17,
  author = "Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots",
      title = "Prediction under Uncertainty in Sparse Spectrum Gaussian Processes with Applications to Filtering and Control",
      journal = "The 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making",
      year = "2017"
    }
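The random Fourier feature construction that underlies SSGPs can be sketched in a few lines. This is a hedged illustration rather than the paper's code; the input dimension, feature count, lengthscale, and seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 3, 2000            # input dimension, number of random features (arbitrary)
lengthscale = 1.0

# Bochner's theorem: the RBF kernel exp(-||x - y||^2 / (2 l^2)) is the Fourier
# transform of a Gaussian spectral density with standard deviation 1 / l.
W = rng.normal(0.0, 1.0 / lengthscale, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    """Finite-dimensional feature map; phi(x) @ phi(y) approximates k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
k_exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * lengthscale ** 2))
k_approx = phi(x) @ phi(y)
```

Because the features are finite-dimensional, downstream regression reduces to ordinary linear algebra in D dimensions, which is what makes the closed-form moment matching in the paper tractable.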
    
    G. Williams, N. Wagener, B. Goldfain, P. Drews, J. Rehg, B. Boots, & E. Theodorou. Information Theoretic MPC Using Neural Network Dynamics.
    2016 The NIPS Deep Reinforcement Learning Workshop
    Abstract: We introduce an information theoretic model predictive control (MPC) algorithm that is capable of controlling systems with dynamics represented by multi-layer neural networks and subject to complex cost criteria. The proposed approach is validated in two difficult simulation scenarios, a cart-pole swing up and quadrotor navigation task, and on real hardware on a 1/5th scale vehicle in an aggressive driving task.
    BibTeX:
    @article{WilliamsNIPSWS16,
  author = "Grady Williams and Nolan Wagener and Brian Goldfain and Paul Drews and James Rehg and Byron Boots and Evangelos Theodorou",
      title = "Information Theoretic {MPC} Using Neural Network Dynamics",
      journal = "The NIPS Deep Reinforcement Learning Workshop",
      year = "2016"
    }
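The information theoretic MPC update can be illustrated on a toy system. This is a minimal sketch under my own toy setup: the point-mass dynamics, cost, and hyperparameters are stand-ins, where the paper would use a learned multi-layer neural network model and a real vehicle.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics(x, u):
    # Toy point mass (position, velocity); the paper uses a learned
    # neural network dynamics model here instead.
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])

def rollout_cost(u_seq):
    """Total cost of executing a control sequence from a fixed start state."""
    x, c = np.array([1.0, 0.0]), 0.0
    for u in u_seq:
        x = dynamics(x, u)
        c += x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2
    return c

T, K, lam, sigma = 20, 256, 1.0, 0.5   # horizon, samples, temperature, noise scale
u_nom = np.zeros(T)                    # nominal control sequence
baseline = rollout_cost(u_nom)

for _ in range(30):
    eps = sigma * rng.standard_normal((K, T))              # sampled perturbations
    costs = np.array([rollout_cost(u_nom + eps[k]) for k in range(K)])
    w = np.exp(-(costs - costs.min()) / lam)               # exponentiated costs
    w /= w.sum()
    u_nom = u_nom + w @ eps                                # cost-weighted update
```

The update is derivative-free: low-cost sampled rollouts receive exponentially larger weight, and the nominal controls move toward their perturbations.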
    
    Y. Pan, X. Yan, E. Theodorou, & B. Boots. Solving the Linear Bellman Equation via Kernel Embeddings and Stochastic Gradient Descent.
    (Selected for Oral Presentation)
    2016 NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning
Abstract: We introduce a data-efficient approach for solving the linear Bellman equation, which corresponds to a class of Markov decision processes (MDPs) and stochastic optimal control (SOC) problems. We show that this class of control problems can be cast as a stochastic composition optimization problem, which can be further reformulated as a saddle point problem and solved via dual kernel embeddings. Our method is model-free and uses only one sample per state transition from stochastic dynamical systems. Unlike related work such as Z-learning, which is based on temporal-difference learning, our method is an online algorithm that follows the true stochastic gradient. Numerical results are provided, showing that our method outperforms the Z-learning algorithm.
    BibTeX:
@article{PanNIPSWS16,
      author = "Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots",
      title = "Solving the Linear {B}ellman Equation via Kernel Embeddings and Stochastic Gradient Descent",
      journal = "NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning",
      year = "2016"
    }
    
    C. Cheng & B. Boots. Incremental Variational Sparse Gaussian Process Regression. 2016 NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning
    Abstract: Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to variational sparse GPR.
    BibTeX:
    @article{ChengNIPSWS16,
  author = "Ching-An Cheng and Byron Boots",
      title = "Incremental Variational Sparse {G}aussian Process Regression",
      journal = "NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning",
      year = "2016"
    }
    
    Y. Pan, X. Yan, E. Theodorou, & B. Boots. Scalable Reinforcement Learning via Trajectory Optimization and Approximate Gaussian Process Regression. 2015 NIPS Workshop on Advances in Approximate Bayesian Inference
    Abstract: In order to design an efficient RL algorithm, we combine the attractive characteristics of two approaches: local trajectory optimization and random feature approximations. Local trajectory optimization methods, such as Differential Dynamic Programming (DDP), are a class of approaches for solving nonlinear optimal control problems. Compared to global approaches, DDP shows superior computational efficiency and scalability to high-dimensional problems. The principal limitation of DDP is that it relies on accurate and explicit representation of the dynamics. In this work we take a nonparametric approach to learn the dynamics based on Gaussian processes (GPs). GPs have demonstrated encouraging performance in modeling dynamical systems, but are also computationally expensive and do not scale to moderate or large datasets. While a number of approximation methods exist, sparse spectrum Gaussian process regression (SSGPR) stands out with a superior combination of efficiency and accuracy. By combining the benefits of both DDP and SSGPR, we show that our approach is able to scale to high-dimensional dynamical systems and large datasets.
    BibTeX:
    @article{PanNIPSWS15,
      author = "Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots",
      title = "Scalable Reinforcement Learning via Trajectory Optimization and Approximate {G}aussian Process Regression",
      journal = "NIPS Workshop on Advances in Approximate Bayesian Inference",
      year = "2015"
    }
    
    A. Shaban, M. Farajtabar, B. Xie, L. Song, & B. Boots. Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods. 2015 NIPS Workshop on Non-convex Optimization for Machine Learning: Theory and Practice
Abstract: Probabilistic latent-variable models are a fundamental tool in statistics and machine learning. Despite their widespread use, identifying the parameters of basic latent variable models continues to be an extremely challenging problem. Traditional maximum likelihood-based learning algorithms find valid parameters, but suffer from high computational cost, slow convergence, and local optima. In contrast, recently developed method of moments-based algorithms are computationally efficient and provide strong statistical guarantees, but are not guaranteed to find valid parameters. In this work, we introduce a two-stage learning algorithm for latent variable models. We first use method of moments to find a solution that is close to the optimal solution but not necessarily in the valid set of model parameters. We then incrementally refine the solution via exterior point optimization until a local optimum that is arbitrarily near the valid set of parameters is found. We perform several experiments on synthetic and real-world data and show that our approach is more accurate than previous work, especially when training data is limited.
    BibTeX:
    @article{ShabanNIPSWS15,
  author = "Amirreza Shaban and Mehrdad Farajtabar and Bo Xie and Le Song and Byron Boots",
      title = "Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods",
      journal = "NIPS Workshop on Non-convex Optimization for Machine Learning: Theory and Practice",
      year = "2015"
    }
    
    X. Yan, V. Indelman, & B. Boots. Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping. 2015 RSS Workshop on the Problem of Mobile Sensors: Setting future goals and indicators of progress for SLAM.
    Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has found success by representing the trajectory as a Gaussian process. Gaussian processes can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of this approach is that STEAM is formulated as a batch estimation problem. In this paper we provide the critical extensions necessary to transform the existing batch algorithm into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets.
    BibTeX:
    @article{YanRSSWS15,
      author = "Xinyan Yan and Vadim Indelman and Byron Boots",
      title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation \& Mapping",
      journal = "The Problem of Mobile Sensors: Setting future goals and indicators of progress for SLAM",
      year = "2015"
    }
    
    X. Yan, B. Xie, L. Song, & B. Boots. Large-Scale Gaussian Process Regression via Doubly Stochastic Gradient Descent. 2015 The ICML Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities
Abstract: Gaussian process regression (GPR) is a popular tool for nonlinear function approximation. Unfortunately, GPR can be difficult to use in practice due to the O(n^2) memory and O(n^3) processing requirements for n training data points. We propose a novel approach to scaling up GPR to handle large datasets using the recent concept of doubly stochastic functional gradients. Our approach relies on the fact that GPR can be expressed as a convex optimization problem that can be solved by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random features, and then descending using this noisy functional gradient. The effectiveness of the resulting algorithm is evaluated on the well-known problem of learning the inverse dynamics of a robot manipulator.
    BibTeX:
    @article{YanICMLWS15,
      author = "Xinyan Yan and Bo Xie and Le Song and Byron Boots",
      title = "Large-Scale Gaussian Process Regression via Doubly Stochastic Gradient Descent",
      journal = "The ICML Workshop on Large-Scale Kernel Learning",
      year = "2015"
    }
    
    X. Yan, V. Indelman, & B. Boots. Incremental Sparse Gaussian Process Regression for Continuous-time Trajectory Estimation & Mapping. 2014 NIPS Workshop on Autonomously Learning Robots
Abstract: Recent work has investigated the problem of continuous-time trajectory estimation and mapping for mobile robots by formulating the problem as sparse Gaussian process regression. Gaussian processes provide a continuous-time representation of the robot trajectory, which elegantly handles asynchronous and sparse measurements, and allows the robot to query the trajectory to recover its estimated position at any time of interest. One of the major drawbacks of this approach is that Gaussian process regression formulates continuous-time trajectory estimation as a batch estimation problem. In this work, we provide the critical extensions necessary to transform this existing batch approach into an extremely efficient incremental approach. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets.
    BibTeX:
    @article{BootsNIPSWS14,
      author = "Xinyan Yan and Vadim Indelman and Byron Boots",
  title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation \& Mapping",
      journal = "NIPS Workshop on Autonomously Learning Robots",
      year = "2014"
    }
    
    Z. Marinho, A. Dragan, A. Byravan, S. Srinivasa, G. Gordon, & B. Boots. Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces. 2014 NIPS Workshop on Autonomously Learning Robots
Abstract: We introduce a functional gradient descent based trajectory optimization algorithm for robot motion planning in arbitrary Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex robots with many degrees of freedom. In theory, these algorithms work by directly optimizing continuous trajectories to avoid obstacles while maintaining smoothness. However, in practice, functional gradient algorithms commit to a finite parametrization of the trajectories, often as a finite set of waypoints. Such a parametrization limits expressiveness, and can fail to produce smooth trajectories despite the inclusion of smoothness in the objective. As a result, we often observe practical problems such as slow convergence and the requirement to choose an inconveniently small step size. Our work generalizes the waypoint parametrization to arbitrary RKHSs by formulating trajectory optimization as minimization of a cost functional. We derive a gradient update method that is able to take larger steps and reach a locally optimal trajectory in just a few iterations. Depending on the selection of a kernel, we can directly optimize in spaces of continuous trajectories that are inherently smooth, and that have a low-dimensional, adaptively chosen parametrization. Our experiments illustrate the effectiveness of the planner for two different kernels, RBFs and B-splines, as compared to the standard discretized waypoint representation.
    BibTeX:
    @article{Marinho14,
      author = "Zita Marinho and Anca Dragan and Arun Byravan and
      Siddhartha Srinivasa and Geoffrey J. Gordon and Byron Boots",
      title = "Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces",
      journal = "NIPS Workshop on Autonomously Learning Robots",
      year = "2014"
    }
    
A. Venkatraman, B. Boots, M. Hebert, & J. A. Bagnell. Data as Demonstrator with Applications to System Identification. 2014 NIPS Workshop on Autonomously Learning Robots
    Abstract: Machine learning techniques for system identification and time series modeling often phrase the problem as the optimization of a loss function over a single timestep prediction. However, in many applications, the learned model is recursively applied in order to make a multiple-step prediction, resulting in compounding prediction errors. We present DATA AS DEMONSTRATOR, an approach that reuses training data to make a no-regret learner robust to errors made during multistep prediction. We present results on the task of linear system identification applied to a simulated system and to a real-world dataset.
    BibTeX:
    @article{Venkatraman14,
      author = "Arun Venkatraman and Byron Boots and Martial Hebert and James Bagnell",
      title = "Data as Demonstrator with Applications to System Identification",
      journal = "NIPS Workshop on Autonomously Learning Robots",
      year = "2014"
    }
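The compounding-error problem and the data-aggregation fix can be sketched on a toy linear system. This is an illustrative sketch under my own toy setup (a noiseless spiral observed with noise), not the paper's experiments: fit a one-step model by least squares, measure how its error grows over a multi-step rollout, then aggregate (model prediction, true successor) pairs and refit.

```python
import numpy as np

rng = np.random.default_rng(3)

# True system: a slowly contracting rotation, observed with noise.
theta = 0.3
A_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])

def simulate(x0, T):
    xs = [x0]
    for _ in range(T):
        xs.append(A_true @ xs[-1])
    return np.array(xs)

# Training trajectory with observation noise; fit A by single-step least squares.
traj = simulate(np.array([1.0, 0.0]), 200) + 0.05 * rng.standard_normal((201, 2))
X, Y = traj[:-1], traj[1:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_ls = B.T                                     # model: x' = A_ls @ x

def multistep_error(A_hat, x0, H):
    """Average open-loop prediction error over an H-step rollout."""
    true, x, err = simulate(x0, H), x0.copy(), 0.0
    for t in range(1, H + 1):
        x = A_hat @ x                          # recursively feed predictions back in
        err += np.linalg.norm(x - true[t])
    return err / H

x0 = np.array([1.0, 0.0])
one_step_err = np.linalg.norm(A_ls @ x0 - A_true @ x0)
multi_err = multistep_error(A_ls, x0, 20)      # compounding: grows past one_step_err

# DATA AS DEMONSTRATOR: roll out the learned model, pair each of its own
# predictions with the ground-truth successor, aggregate, and refit.
preds = [x0]
for _ in range(20):
    preds.append(A_ls @ preds[-1])
true20 = simulate(x0, 20)
X_aug = np.vstack([X, np.array(preds[:-1])])
Y_aug = np.vstack([Y, true20[1:]])
B2, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
A_dad = B2.T
dad_err = multistep_error(A_dad, x0, 20)
```

The aggregated pairs teach the model how to behave on its own predicted states, which is exactly the distribution it faces during multi-step prediction.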
    
A. Byravan, M. Montfort, B. Ziebart, B. Boots, & D. Fox. Layered Hybrid Inverse Optimal Control for Learning Robot Manipulation from Demonstration. 2014 NIPS Workshop on Autonomously Learning Robots
Abstract: Inverse optimal control (IOC) is a powerful approach for learning robotic controllers from demonstration that estimates a cost function which rationalizes demonstrated control trajectories. Unfortunately, it is difficult to apply in settings where optimal control can only be solved approximately. Local IOC approaches rationalize demonstrated trajectories based on a linear-quadratic approximation around a good reference trajectory (i.e., the demonstrated trajectory itself). Without this same reference trajectory, however, the resulting control can differ substantially. We address the complementary problem of using IOC to find appropriate reference trajectories in these computationally challenging control tasks. After discussing the inherent difficulties of learning within this setting, we present a projection technique from the original trajectory space to a discrete and tractable trajectory space and perform IOC on this space. Control trajectories are projected back to the original space and locally optimized. We demonstrate the effectiveness of the approach with experiments conducted on a 7-degree of freedom robotic arm.
    BibTeX:
    @article{Byravan14,
      author = "Arunkumar Byravan and Matthew Montfort and Brian Ziebart and Byron Boots and Dieter Fox",
      title = "Layered Hybrid Inverse Optimal Control for Learning Robot Manipulation from Demonstration",
      journal = "NIPS Workshop on Autonomously Learning Robots",
      year = "2014"
    }
    
    B. Boots, A. Byravan, & D. Fox. Learning Predictive Models of a Depth Camera & Manipulator from Raw Execution Traces. 2013 NIPS Workshop on Advances in Machine Learning for Sensorimotor Control
    Abstract: We attack the problem of learning a predictive model of a depth camera and manipulator directly from raw execution traces. While the problem of learning manipulator models from visual and proprioceptive data has been addressed before, existing techniques often rely on assumptions about the structure of the robot or tracked features in observation space. We make no such assumptions. Instead, we formulate the problem as that of learning a high-dimensional controlled stochastic process. We leverage recent work on nonparametric predictive state representations to learn a generative model of the depth camera and robotic arm from sequences of uninterpreted actions and observations. We perform several experiments in which we demonstrate that our learned model can accurately predict future observations in response to sequences of motor commands.
    BibTeX:
    @article{Boots13NIPSa,
      author = "Byron Boots and Arunkumar Byravan and Dieter Fox",
  title = "Learning Predictive Models of a Depth Camera \& Manipulator from Raw Execution Traces",
      journal = "NIPS Workshop on Advances in Machine Learning for Sensorimotor Control",
      year = "2013"
    }
    
    B. Boots & D. Fox. Learning Dynamic Policies from Demonstration. 2013 NIPS Workshop on Advances in Machine Learning for Sensorimotor Control
Abstract: We address the problem of learning a policy directly from expert demonstrations. Typically, this problem is solved with a supervised learning approach such as regression or classification resulting in a reactive policy. Unfortunately, reactive policies cannot model long-range dependencies, and this omission can result in suboptimal performance. We take a different approach. We observe that policies and dynamical systems are mathematical duals, and we use this fact to leverage the rich literature on system identification to learn policies from demonstration. System identification algorithms often have desirable properties like the ability to model long-range dependencies, statistical consistency, and efficient off-the-shelf implementations. We show that by applying system identification to learning from demonstration problems, all of these properties can be carried over to the learning from demonstration domain, resulting in improved practical performance.
    BibTeX:
    @article{Boots13NIPSb,
      author = "Byron Boots and Dieter Fox",
      title = "Learning Dynamic Policies from Demonstration",
      journal = "NIPS Workshop on Advances in Machine Learning for Sensorimotor Control",
      year = "2013"
    }
    
B. Boots, A. Gretton, & G. J. Gordon. Hilbert Space Embeddings of PSRs. 2013 ICML Workshop on Machine Learning and System Identification
    Abstract: We fully generalize PSRs to continuous observations and actions using a recent concept called Hilbert space embeddings of distributions. The essence of our method is to represent distributions of tests, histories, observations, and actions, as points in (possibly) infinite-dimensional Reproducing Kernel Hilbert Spaces (RKHSs). During filtering we update these distributions using a kernel version of Bayes' rule. To improve computational tractability, we develop a spectral system identification method to learn a succinct parameterization of the target system.
    BibTeX:
    @article{Boots13ICML,
  author = "Byron Boots and Arthur Gretton and Geoffrey J. Gordon",
      title = "Hilbert Space Embeddings of PSRs",
      journal = "ICML workshop on Machine Learning and System Identification (MLSYSID)",
      year = "2013"
    }
    
B. Boots, A. Gretton, & G. J. Gordon. Hilbert Space Embeddings of PSRs. 2012 NIPS Workshop on Spectral Algorithms for Latent Variable Models
    Abstract: We fully generalize PSRs to continuous observations and actions using a recent concept called Hilbert space embeddings of distributions. The essence of our method is to represent distributions of tests, histories, observations, and actions, as points in (possibly) infinite-dimensional Reproducing Kernel Hilbert Spaces (RKHSs). During filtering we update these distributions using a kernel version of Bayes' rule. To improve computational tractability, we develop a spectral system identification method to learn a succinct parameterization of the target system.
    BibTeX:
    @article{Boots12NIPSb,
  author = "Byron Boots and Arthur Gretton and Geoffrey J. Gordon",
      title = "Hilbert Space Embeddings of PSRs",
      journal = "NIPS Workshop on Spectral Algorithms for Latent Variable Models",
      year = "2012"
    }
    
B. Boots & G. J. Gordon. A Spectral Learning Approach to Range-Only SLAM. 2012 NIPS Workshop on Spectral Algorithms for Latent Variable Models
    Abstract: We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance.
    BibTeX:
    @article{Boots12NIPSa,
      author = "Byron Boots and Geoffrey J. Gordon",
      title = "A Spectral Learning Approach to Range-Only SLAM",
      journal = "NIPS Workshop on Spectral Algorithms for Latent Variable Models",
      year = "2012"
    }
    
    B. Boots & G. J. Gordon. Online Spectral Identification of Dynamical Systems. 2011 NIPS Workshop on Sparse Representation and Low-rank Approximation
Abstract: Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems---for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental SVD updates and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on a high-bandwidth video mapping task, and illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place.
    BibTeX:
    @inproceedings{Boots11nips,
      author = "Byron Boots and Geoffrey J. Gordon",
      title = "Online Spectral Identification of Dynamical Systems",
      booktitle = "NIPS Workshop on Sparse Representation and Low-rank Approximation",
      year = "2011"
    }
    B. Boots, S. M. Siddiqi & G. J. Gordon. Closing the Learning-Planning Loop with Predictive State Representations. 2009 NIPS Workshop on Probabilistic Approaches for Robotics and Control
    Abstract: We propose a principled and provably statistically consistent model-learning algorithm, and demonstrate positive results on a challenging high-dimensional problem with continuous observations. In particular, we propose a novel, consistent spectral algorithm for learning a variant of PSRs called Transformed PSRs (TPSRs) directly from execution traces.
    BibTeX:
    @inproceedings{Boots09nips,
      author = "Byron Boots and Sajid M. Siddiqi and Geoffrey J. Gordon",
      title = "Closing the Learning-Planning Loop with Predictive State Representations",
      booktitle = "NIPS Workshop on Probabilistic Approaches for Robotics and Control",
      year = "2009"
    }
    
    D. Purves & B. Boots. Evolution of Visually Guided Behavior in Artificial Agents. 2006 Vision Science Society Annual Meeting/Journal of Vision
    Abstract: Recent work on brightness, color and form has suggested that human visual percepts represent the probable sources of retinal images rather than stimulus features as such. We have investigated this empirical concept of vision by asking whether agents using neural network control systems evolve successful visually guided behavior based solely on the statistical relationship of images on their sensor arrays and the probable sources of the images in a simulated environment. A virtual environment was created with OpenGL consisting of an arena with a central obstacle, similar to arenas used in evolutionary robotics experiments. The neural control system for each agent comprised a single-layer, feed-forward network that connected all 256 inputs from a sensor array to two output nodes that encoded rotation and translation responses. Each agent's behavioral actions in the environment were evaluated, and the fittest individuals selected to produce a new population according to a standard genetic algorithm. This process was repeated until the average fitness of subsequent generations reached a plateau. Analysis of the actions of evolved agents in response to visual input showed their neural network control systems had incorporated the statistical relationship between projected images and their possible sources, and that this information was used to produce increasingly successful visually guided behavior. The simplicity of this paradigm notwithstanding, these results support the idea that biological vision has evolved to solve the inverse problem on a wholly empirical basis, and provide a novel way of exploring visual processing.
    BibTeX:
    @article{Purves2006,
      author = "Dale Purves and Byron Boots",
      title = "Evolution of Visually Guided Behavior in Artificial Agents",
      journal = "Journal of Vision",
      year = {2006},
      volume = {6(6)},
      pages = {356a}
    }
    

    Non-Refereed Preprints, Technical Reports, & Book Chapters
    Authors Title Year Tech. Report/Book
    J. Lee, C. Cheng, K. Goldberg, & B. Boots. Continuous Online Learning and New Insights into Online Imitation Learning. 2019 Technical Report arXiv:1912.01261
    Abstract: Online learning is a powerful tool for analyzing iterative algorithms. However, the classic adversarial setup sometimes fails to capture certain regularity in online problems in practice. Motivated by this, we establish a new setup, called Continuous Online Learning (COL), where the gradient of online loss function changes continuously across rounds with respect to the learner's decisions. We show that COL covers and more appropriately describes many interesting applications, from general equilibrium problems (EPs) to optimization in episodic MDPs. Using this new setup, we revisit the difficulty of achieving sublinear dynamic regret. We prove that there is a fundamental equivalence between achieving sublinear dynamic regret in COL and solving certain EPs, and we present a reduction from dynamic regret to both static regret and convergence rate of the associated EP. At the end, we specialize these new insights into online imitation learning and show improved understanding of its learning stability.
    BibTeX:
    @article{Lee19arxiv,
      author    = {Jonathan Lee and Ching-An Cheng and Ken Goldberg and Byron Boots},
      title     = {Continuous Online Learning and New Insights to Online Imitation Learning},
      journal   = {CoRR},
      volume    = {abs/1912.01261},
      year      = {2019},
      url       = {http://arxiv.org/abs/1912.01261}
    }
    
    S. Adhikary, S. Srinivasan, & B. Boots. Learning Quantum Graphical Models using Constrained Gradient Descent on the Stiefel Manifold. 2019 Technical Report arXiv:1903.03730
    Abstract: Quantum graphical models (QGMs) extend the classical framework for reasoning about uncertainty by incorporating the quantum mechanical view of probability. Prior work on QGMs has focused on hidden quantum Markov models (HQMMs), which can be formulated using quantum analogues of the sum rule and Bayes rule used in classical graphical models. Despite the focus on developing the QGM framework, there has been little progress in learning these models from data. The existing state-of-the-art approach randomly initializes parameters and iteratively finds unitary transformations that increase the likelihood of the data. While this algorithm demonstrated theoretical strengths of HQMMs over HMMs, it is slow and can only handle a small number of hidden states. In this paper, we tackle the learning problem by solving a constrained optimization problem on the Stiefel manifold using a well-known retraction-based algorithm. We demonstrate that this approach is not only faster and yields better solutions on several datasets, but also scales to larger models that were prohibitively slow to train via the earlier method.
    BibTeX:
    @article{Adhikary19arxiv,
      author    = {Sandesh Adhikary and Siddarth Srinivasan and Byron Boots},
      title     = {Learning Quantum Graphical Models using Constrained Gradient Descent on the Stiefel Manifold},
      journal   = {CoRR},
      volume    = {abs/1903.03730},
      year      = {2019},
      url       = {http://arxiv.org/abs/1903.03730}
    }
    
    K. Ahlin, B. Bazemore, B. Boots, J. Burnham, F. Dellaert, J. Dong, A. Hu, B. Joffe, G. McMurray, G. Rains, & N. Sadegh. Robotics for Spatially and Temporally Unstructured Agricultural Environments. 2017 Robotics and Mechatronics for Agriculture
    Abstract: The farming industry faces many problems that threaten its sustainability. Among the most important are the detection, prevention, and control of devastating plant pests and diseases. Management of pests and diseases (in addition to water and nutrients) is based on scouting a field weekly. Farmers can spend between 10 and 40 US dollars per acre to have their fields inspected by a human field scout. Depending on the crop, detection of even a single insect can trigger an intensive pesticide spraying program. On the other hand, non-comprehensive scouting may miss populations of pests that sometimes congregate in localized areas or a disease that is asymptomatic until it has spread to many areas of the field. As a consequence of this inefficient and sometimes inaccurate method, farmers spray preventatively for many plant pathogens. If more extensive and efficient quantification of pest control, water stress, and nutrient needs were possible, a tremendous cost savings could be achieved by a decrease of unnecessary spraying. Currently, the only solution to potential pest problems is to spray at the first sight of pests and treat the entire field. This leads to an overuse of pesticide, which is costly and environmentally unfriendly. However, the risk of losing crops to pests is too high not to take necessary precautions. The development of an automated field scout (AFS) would make it possible to determine the spatial and temporal distribution of pests and diseases, nutrient deficiencies, and water stress in the field. The AFS would help the farmer assess management strategies after they are implemented and help determine the best management practices.
This work describes a collaborative project among the Georgia Tech Research Institute, the Georgia Institute of Technology, and the University of Georgia in developing and fielding an AFS system composed of four main components: an autonomous ground vehicle, a vehicle-mounted 4-dimensional (4D) mapping system, a vehicle-mounted robot arm used for leaf and soil sampling, and a farmer/consultant who will interact with the AFS system to meet the needs of each particular farm. The project focuses on peanuts, though the developed AFS could be adapted for any crop that requires intensive management.
    BibTeX:
    @incollection{Ahlin2017,
      author = "Konrad Ahlin and Brad Bazemore and Byron Boots
      and John Burnham and Frank Dellaert and Jing Dong
      and Ai-Ping Hu and Benjamin Joffe and Gary McMurray
      and Glen Rains and Nader Sadegh",
      title = "Robotics for Spatially and Temporally Unstructured Agricultural Environments",
      booktitle = "Robotics and Mechatronics for Agriculture",
      editor = "Dan Zhang and Bin Wei",
      publisher = {CRC Press},
      year = {2017},
      chapter = {3}
    }

    J. Dong, B. Boots, & F. Dellaert. Sparse Gaussian Processes for Continuous-Time Trajectory Estimation on Matrix Lie Groups. 2017 Technical Report arXiv:1705.06020
    Abstract: Continuous-time trajectory representations are a powerful tool that can be used to address several issues in practical simultaneous localization and mapping (SLAM) scenarios, including continuously collected measurements distorted by robot motion and asynchronous sensor measurements. Sparse Gaussian processes (GP) allow for a probabilistic non-parametric trajectory representation that enables fast trajectory estimation by sparse GP regression. However, previous approaches are limited to dealing with vector space representations of state only. In this technical report we extend the work by Barfoot et al. to general matrix Lie groups, by applying constant-velocity prior, and defining locally linear GP. This enables using a sparse GP approach in a large space of practical SLAM settings. In this report we provide only the theory and leave experimental evaluation to future publications.
    BibTeX:
    @article{Dong17arxiv,
      author    = {Jing Dong and Byron Boots and Frank Dellaert},
      title     = {Sparse Gaussian Processes for Continuous-Time Trajectory Estimation on Matrix Lie Groups},
      journal   = {CoRR},
      volume    = {abs/1705.06020},
      year      = {2017},
      url       = {http://arxiv.org/abs/1705.06020}
    }
    
    Y. Pan, X. Yan, E. Theodorou, & B. Boots. Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference. 2016 Technical Report arXiv:1608.06235
    Abstract: Robotic systems must be able to quickly and robustly make decisions when operating in uncertain and dynamic environments. While Reinforcement Learning (RL) can be used to compute optimal policies with little prior knowledge about the environment, it suffers from slow convergence. An alternative approach is Model Predictive Control (MPC), which optimizes policies quickly, but also requires accurate models of the system dynamics and environment. In this paper we propose a new approach, adaptive probabilistic trajectory optimization, that combines the benefits of RL and MPC. Our method uses scalable approximate inference to learn and update probabilistic models in an online incremental fashion while also computing optimal control policies via successive local approximations. We present two variations of our algorithm based on the Sparse Spectrum Gaussian Process (SSGP) model, and we test our algorithm on three learning tasks, demonstrating the effectiveness and efficiency of our approach.
    BibTeX:
    @article{Pan16arxiv,
      author    = {Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots},
      title     = {Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference},
      journal   = {CoRR},
      volume    = {abs/1608.06235},
      year      = {2016},
      url       = {http://arxiv.org/abs/1608.06235}
    }
    
    S. M. Siddiqi, B. Boots & G. J. Gordon. A Constraint Generation Approach to Learning Stable Linear Dynamical Systems. 2008 Technical Report CMU-ML-08-101
    Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. The constraint generation approach leads to noticeable improvement in the quality of simulated sequences. We compare our method to those of Lacy and Bernstein, with positive results in terms of accuracy, quality of simulated sequences, and efficiency.
    BibTeX:
    @techreport{Siddiqi08,
      author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon",
      title = "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems",
      institution = {Carnegie Mellon University},
      number = {CMU-ML-08-101},
      year = {2008}
    }
    
    E. Chown & B. Boots. Learning in Cognitive Maps: Finding Useful Structure in an Uncertain World. 2008 Robot and Cognitive Approaches to Spatial Mapping
    Abstract: In this chapter we will describe the central mechanisms that influence how people learn about large-scale space. We will focus particularly on how these mechanisms enable people to effectively cope with both the uncertainty inherent in a constantly changing world and also with the high information content of natural environments. The major lessons are that humans get by with a “less is more” approach to building structure, and that they are able to quickly adapt to environmental changes thanks to a range of general purpose mechanisms. By looking at abstract principles, instead of concrete implementation details, it is shown that the study of human learning can provide valuable lessons for robotics. Finally, these issues are discussed in the context of an implementation on a mobile robot.
    BibTeX:
    @incollection{Chown2008,
      author = "Eric Chown and Byron Boots",
      title = "Learning Cognitive Maps: Finding Useful Structure in an Uncertain World",
      booktitle = "Robot and Cognitive Approaches to Spatial Mapping",
      editor = "Margaret E. Jefferies and Wai-Kiang Yeap",
      publisher = {Springer Verlag},
      year = {2008},
      chapter = {10},  
      pages ={215--236}
    }
    

    Theses
    Author Title Year Degree & Institution
    B. Boots Spectral Approaches to Learning Predictive Representations. 2012 Doctoral Thesis
    Carnegie Mellon University  
    Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: spectral methods for classical models like Kalman filters and hidden Markov models, batch algorithms and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share a common framework: the model's belief space is represented as predictions of observable quantities and spectral algorithms are applied to learn the model parameters. Unlike the popular EM algorithm, spectral learning algorithms are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. We evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems.
    BibTeX:
    @phdthesis{BootsThesis2012,
      author = "Byron Boots",
      title = "Spectral Approaches to Learning Predictive Representations",
      school = "Carnegie Mellon University",
      year = {2012},
      month = {December},
    }
    
    B. Boots Spectral Approaches to Learning Predictive Representations: Thesis Proposal. 2011 Thesis Proposal
    Carnegie Mellon University  
    Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency.

    To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several
    variations of this general approach: batch algorithms and online algorithms, kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces, and manifold-based identification algorithms. All of these approaches share a common framework: they are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. Additionally, we show that our framework generalizes a variety of successful spectral learning algorithms in diverse areas, including the identification of Hidden Markov Models, recovering structure from motion, and discovering manifold embeddings. We will evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems.

    We anticipate several difficulties while moving from smaller problems and synthetic problems to larger practical applications. The first is the challenge of scaling learning algorithms up to the higher-dimensional state spaces that more complex tasks require. The second is the problem of integrating expert knowledge into the learning procedure. The third is the problem of properly accounting for actions and exploration in controlled systems. We believe that overcoming these remaining difficulties will allow our models to capture the essential features of an environment, predict future observations well, and enable successful planning.
    B. Boots Learning Stable Linear Dynamical Systems. 2009 M.S. Machine Learning
    Carnegie Mellon University  
    Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply both maximum likelihood and subspace ID methods to the problem of learning dynamical systems with exogenous inputs directly from data. Our algorithm is applied to a variety of problems including the tasks of learning dynamic textures from image sequences, learning a model of laser and vision sensor data from a mobile robot, learning stable baseline models for drug-sales data in the biosurveillance domain, and learning a model to predict sunspot data over time. We compare the constraint generation approach to learning stable dynamical systems to the best previous stable algorithms (Lacy and Bernstein, 2002, 2003), with positive results in terms of prediction accuracy, quality of simulated sequences, and computational efficiency.
    BibTeX:
    @MastersThesis{Boots:Thesis:2009,
      author = "Byron Boots",
      title = "Learning Stable Linear Dynamical Systems",
      school = "Carnegie Mellon University",
      year = {2009},
      month = {May},
    }
    
    B. Boots Robot Localization and Abstract Mapping in Dynamic Environments. 2003 B.A. Computer Science
    Bowdoin College  
    BibTeX:
    @techreport{Boots2003b,
      author = "Byron Boots",
      title = "Robot Localization and Abstract Mapping in Dynamic Environments",
      institution = {Bowdoin College},
      year = {2003}
    }
    
    B. Boots Chunking: A Modified Dynamic Programming Approach to Solving Stochastic Satisfiability Problems Efficiently. 2003 B.A. Computer Science
    Bowdoin College  
    Abstract: The best general stochastic satisfiability solvers systematically evaluate all possible variable assignments, using heuristics to prune assignments whenever possible. The idea of chunking differs in that it breaks the solution to stochastic satisfiability problems into pieces, amounting to a modified dynamic programming approach. The benefit of this approach, as compared with the straightforward application of dynamic programming, is that the saved solutions to the problem pieces are partial solutions and thus reusable in multiple situations.
    BibTeX:
    @techreport{Boots2003a,
      author = "Byron Boots",
      title = "Chunking: A Modified Dynamic Programming Approach to Solving Stochastic Satisfiability Problems Efficiently",
      institution = {Bowdoin College},
      year = {2003}
    }
    

    Academic Honors & Awards
     
    • 2022 Amazon Professor of Machine Learning, University of Washington
    • 2022 DARPA Young Faculty Award
    • 2021 Best Systems Paper Award, Finalist, Conference on Robot Learning (CoRL)
    • 2021 Best Paper Award, RSS Workshop on Geometry and Topology in Robotics
    • 2020 Early Career Award, Robotics: Science and Systems (RSS)
    • 2019 Best Paper Award, NeurIPS Workshop on Optimization Foundations for Reinforcement Learning
    • 2019 Outstanding Junior Faculty Research Award, College of Computing, Georgia Tech
    • 2019 Paper of the Year Award (for 2018), International Journal of Robotics Research (IJRR)
    • 2019 Best Student Paper Award, Robotics: Science and Systems (RSS)
    • 2019 Best Systems Paper Award, Finalist, Robotics: Science and Systems (RSS)
    • 2019 Best Manipulation Paper Award, Finalist, International Conference on Robotics and Automation (ICRA)
    • 2019 Amazon Research Award
    • 2018 NSF CAREER Award
    • 2018 Best Systems Paper Award, Finalist, Robotics: Science and Systems (RSS)
    • 2018 Best Paper Award, International Conference on Artificial Intelligence and Statistics (AISTATS)
    • 2017 Best Paper Award, Finalist, International Conference on Robotics and Automation (ICRA)
    • 2015 Best Paper Award, RSS Workshop on the Problem of Mobile Sensors
    • 2015 "Thank-a-Teacher" Award, Center for the Enhancement of Teaching & Learning, Georgia Tech
    • 2015 NSF CRII Award
    • 2010 Best Paper Award, International Conference on Machine Learning (ICML)
    • 2006-2008 NSF Graduate Research Fellowship, Honorable Mention (three consecutive years)
    • 2003 Computer Science Senior Year Prize, Bowdoin College
    • 2003 High Honors, Bowdoin College
    • 2002 Undergraduate Research Fellowship, Surdna Foundation