Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience

Haresh Karnan1, Elvin Yang1, Daniel Farkash2, Garrett Warnell1,3, Joydeep Biswas1, Peter Stone1,4
1University of Texas at Austin, 2Cornell University, 3Army Research Laboratory, 4Sony AI

sterling is a self-supervised terrain representation learning algorithm that learns from unconstrained multi-modal robot experience

Abstract

During off-road navigation, the ability to identify and distinguish terrains, known as terrain awareness, is crucial for autonomous mobile robots. Current approaches that provide robots with this awareness rely either on labeled data, which is expensive to collect; on engineered features and cost functions, which may not generalize; or on expert human demonstrations, which may not be available. Toward endowing robots with terrain awareness without these limitations, we introduce sterling, a novel approach for learning terrain representations that relies solely on easy-to-collect, unconstrained (e.g., non-expert), and unlabeled robot experience, with no additional constraints on data collection.

Learning from Unconstrained Robot Experience

sterling learns terrain representations from unconstrained, unlabeled robot experience collected using any navigation policy. Compared to requiring a human expert to provide teleoperated demonstrations and labels, collecting this type of robot experience is cheap and easy, providing a scalable pathway to data collection and system improvement.

Offline Pre-Processing

sterling pre-processes robot experience offline, obtaining both visual and non-visual observations of traversed terrains. Visual patches of the terrain at a particular location are extracted from multiple past viewpoints and are paired with inertial, proprioceptive, and tactile observations at the same location.
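As a rough illustration of this pairing step, the sketch below (our own, with hypothetical helper names such as extract_patch_at and ipt_window_at, not the released code) shows how visual patches of a traversed location could be gathered from earlier camera frames and paired with the inertial-proprioceptive-tactile (IPT) window recorded at that location:

# Minimal sketch (ours) of the offline pairing step; helper names are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class TerrainSample:
    patches: list           # the same ground location viewed from multiple past camera frames
    ipt_signal: np.ndarray  # inertial/proprioceptive/tactile window recorded while traversing it

def build_samples(poses, images, ipt_stream, extract_patch_at, ipt_window_at, horizon=20):
    """Pair terrain patches with the IPT observations recorded at the same location."""
    samples = []
    for t, pose in enumerate(poses):
        # Re-project the ground location under the robot at time t into the previous
        # `horizon` camera frames, yielding patches of that location from multiple viewpoints.
        patches = [extract_patch_at(images[k], poses[k], pose) for k in range(max(0, t - horizon), t)]
        patches = [p for p in patches if p is not None]
        if patches:
            samples.append(TerrainSample(patches, ipt_window_at(ipt_stream, t)))
    return samples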

Non-Contrastive Self-Supervised Representation Learning

sterling performs non-contrastive representation learning based on the VICReg framework and employs two novel self-supervision objectives, viewpoint-invariance and multimodal-correlation, to learn relevant terrain representations with no expert human annotations or demonstrations required.
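The sketch below illustrates the general shape of such an objective: VICReg-style variance-invariance-covariance terms applied both between two viewpoints of the same terrain patch (viewpoint-invariance) and between visual and IPT embeddings of the same traversal (multimodal-correlation). Encoder architectures, loss weights, and function names here are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def vicreg_terms(z_a, z_b, eps=1e-4):
    """Invariance, variance, and covariance terms in the style of VICReg (Bardes et al., 2022)."""
    inv = F.mse_loss(z_a, z_b)
    var = (torch.relu(1.0 - torch.sqrt(z_a.var(dim=0) + eps)).mean()
           + torch.relu(1.0 - torch.sqrt(z_b.var(dim=0) + eps)).mean())
    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        n, d = z.shape
        cov = (z.T @ z) / (n - 1)
        return (cov - torch.diag(torch.diag(cov))).pow(2).sum() / d
    return inv, var, off_diag_cov(z_a) + off_diag_cov(z_b)

def sterling_style_loss(visual_encoder, ipt_encoder, patch_view_1, patch_view_2, ipt, w=(25.0, 25.0, 1.0)):
    """Viewpoint-invariance: two viewpoints of the same terrain patch should embed similarly.
    Multimodal-correlation: the visual embedding should correlate with the IPT embedding."""
    v1, v2 = visual_encoder(patch_view_1), visual_encoder(patch_view_2)
    u = ipt_encoder(ipt)
    vp = vicreg_terms(v1, v2)   # viewpoint-invariance objective
    mm = vicreg_terms(v1, u)    # multimodal-correlation objective
    return sum(w_i * (a + b) for w_i, a, b in zip(w, vp, mm))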

Qualitative Study - Trail Deployment

We deployed sterling on a 3-mile off-road trail in Austin, TX to qualitatively evaluate it on the task of preference-aligned navigation. Trained on only a few minutes of robot experience, the learned sterling features enable the robot to complete the trail successfully with minimal human intervention.

Quantitative Study

We evaluate sterling against other state-of-the-art (SOTA) baseline methods in 6 different outdoor environments on the UT Austin campus spanning 8 different terrains. We use the success-rate metric to quantify how well each method aligns with the operator's terrain preferences.
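For reference, the success-rate metric simply reports the fraction of trials in which the operator judges the robot's traversal to be preference-aligned; a minimal sketch (ours, not the evaluation code):

def success_rate(trial_outcomes):
    """Fraction of navigation trials judged preference-aligned by the operator.
    `trial_outcomes` is a list of booleans, one per trial (hypothetical bookkeeping)."""
    return sum(trial_outcomes) / len(trial_outcomes)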

Trajectories traced by different approaches in 5 environments containing 8 different terrains; the operator preferences are shown above. sterling navigates in an operator-preference-aligned manner, preferring cement sidewalk, red bricks, pebble sidewalk, and yellow bricks over mulch, grass, marble rocks, and bush, outperforming the other baselines and performing on par with the fully-supervised approach.

Segmentation-based Baseline

Segmentation-based approaches do not scale well to all terrains seen in the real world and require additional human annotations to be collected for each new terrain.

STERLING

sterling learns representations in a self-supervised manner and hence adapts easily to new terrains in the real world, enabling preference-aligned navigation.

More Robot Videos from Quantitative Experiments

Below are additional robot videos from our quantitative comparison of sterling against SOTA baseline methods on the preference-aligned navigation task, across the 6 outdoor environments and 8 terrains on the UT Austin campus described above.

Related Links

Also check out our recent work, PATERN, available as a preprint, on extrapolating operator preferences to visually novel terrains.

Acknowledgement

This work has taken place in the Learning Agents Research Group (LARG) and Autonomous Mobile Robotics Laboratory (AMRL) at UT Austin. LARG research is supported in part by NSF (CPS-1739964, IIS-1724157, NRI-1925082), ONR (N00014-18-2243), FLI (RFP2-000), ARO (W911NF-19-2-0333), DARPA, Lockheed Martin, GM, and Bosch. AMRL research is supported in part by NSF (CAREER-2046955, IIS-1954778, SHF-2006404), ARO (W911NF-19-2-0333, W911NF-21-20217), DARPA (HR001120C0031), Amazon, JP Morgan, and Northrop Grumman Mission Systems. Peter Stone serves as the Executive Director of Sony AI America and receives financial compensation for this work. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.

BibTeX

@inproceedings{karnan2023sterling,
      title={STERLING: Self-Supervised Terrain Representation Learning from Unconstrained Robot Experience}, 
      author={Haresh Karnan and Elvin Yang and Daniel Farkash and Garrett Warnell and Joydeep Biswas and Peter Stone},
      year={2023},
      booktitle={7th Annual Conference on Robot Learning},
      url={https://openreview.net/forum?id=VLihM67Wdi6}
}