Self-Supervision and Play
Robotics at Google (http://g.co/robotics)

Pierre Sermanet @ OpenAI Symposium 2019
In collaboration with Corey Lynch, Debidatta Dwibedi, Soeren Pirk, Jonathan Tompson, Mohi Khansari, Yusuf Aytar, Yevgen Chebotar, Yunfei Bai, Jasmine Hsu, Eric Jang, Vikash Kumar, Ted Xiao, Stefan Schaal, Andrew Zisserman, Sergey Levine

Transcript of "Self-Supervision and Play", Robotics at Google [ Lynch, Khansari, Xiao, Kumar, Tompson, Levine, Sermanet @ under review ]

Page 1: Self-Supervision and Play. Robotics at Google. [ Lynch, Khansari, Xiao, Kumar, Tompson, Levine, Sermanet @ under review ] [ learning-from-play.github.io ] No tasks. No rewards.

Sermanet @ OpenAI Symposium 2019

Pierre Sermanet

In collaboration with Corey Lynch, Debidatta Dwibedi, Soeren Pirk, Jonathan Tompson, Mohi Khansari, Yusuf Aytar, Yevgen Chebotar, Yunfei Bai, Jasmine Hsu, Eric Jang, Vikash Kumar, Ted Xiao, Stefan Schaal, Andrew Zisserman, Sergey Levine

Self-Supervision and Play

Robotics at Google, http://g.co/robotics

Page 2:

Main Message

● Real-world robotics cannot rely on labels and rewards

● Instead, mostly

○ Self-supervise on unlabeled data

○ Use play data

● We present ways to do this for vision and control

Page 3:

Our mission:Self-Supervised Robots

i.e.: autonomously extract learning signals from the world, from play, and from others

because: “Give a robot a label and you feed it for a second; Teach a robot to label and you feed it for a lifetime.”

Page 4:

Supervision Types

● Supervised. Signal: 1. hand pose, 2. cup pose, 3. liquid amounts, 4. liquid flowing, 5. liquid color. Embedding: discrete and human-defined.

● Self-Supervised. Signal: depth, time, multi-view, tactile feedback. Embedding: continuous.

● Unsupervised. Signal: auto-encoding, sparsity prior, Gaussian prior. Embedding: continuous.

Page 5:

Supervision Costs

Type of Supervision           | Description                                              | Cost
Playing (Intrinsic Motivation)| Alone or with others                                     | Free
Play data (Tele-op)           | "Playing" for hours, not segmented, not labeled          | Very cheap
Imitating other agents        |                                                          | Cheap (but not unlimited)
Demonstrations                | Staged, segmented and labeled                            | Expensive
Labeled Frames                | e.g. action and object classes / attributes              | Very Expensive

Proportions in which each type of supervision is acceptable to use: ideally rely mostly on the cheap end of the table (playing and play data).

Page 6:

Why Self-Supervise?

● Be versatile and robust to different hardware & environments:

○ Robot-agnostic and self-calibrating

○ Agnostic to sim or real: train the same way

● Scale up in the real world

● Can't afford human supervision given the high dimensionality of the problem

○ Labeling is not easy to define, even for humans

● Rich representations can be discovered through self-supervision, leading to higher sample efficiency in RL

Page 7:

Why play?

● Self-Supervision enables using play data
● Cheap
● General
● Rich

Page 8:

Label-free

Page 9:

Self-Supervision and Play for Vision

Page 10:

Self-Supervised Visual Representations

● Time-Contrastive Networks (TCN)
● Temporal Cycle-Consistency (TCC)
● Object-Contrastive Networks (OCN)

Disentangled / invariant states and attributes

Imitation

Progression of task

Page 11:

Imitation

Progression of task

Play

● Detect errors
● Correct
● Retry
● Transition
● General skills
● Data augmentation

Page 12:

Time-Contrastive Networks (TCN)[ Sermanet*, Lynch*, Chebotar*, Hsu, Jang, Schaal, Levine @ ICRA 2018 ] [ sermanet.github.io/imitate ]

Page 13:

Single-view TCN

Page 14:

Semantic Alignment with TCN

Observation multi-view TCN
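The multi-view TCN objective can be sketched as a triplet loss: frames captured at the same moment from two different views are pulled together (anchor and positive), while a temporally distant frame is pushed away (negative). A minimal NumPy sketch, with illustrative names and margin value rather than the paper's exact implementation:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Multi-view time-contrastive triplet loss (illustrative sketch).

    anchor/positive: embeddings of the SAME moment seen from two views;
    negative: embedding of a temporally distant frame.
    Minimizing this pulls co-occurring frames together and pushes
    temporally distant frames at least `margin` further away.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

If the anchor already matches its positive and the negative is far away, the loss is zero; a mixed-up pairing is penalized.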

Page 15:

Robotic Imitation: Step 1. Self-Supervise on Play data

TCN

embedding

Page 16:

Robotic Imitation: Step 2. Follow the abstract trajectory
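One way to follow the abstract trajectory is to reward the robot for keeping its own TCN embedding close to the demonstration's embedding at each timestep, in the spirit of the TCN paper's imitation reward. A minimal sketch; the constants `alpha` and `beta` are illustrative:

```python
import numpy as np

def tcn_imitation_reward(robot_emb, demo_emb, alpha=1.0, beta=1.0):
    """Reward for tracking a demonstration in TCN embedding space
    (illustrative sketch). The squared term dominates far from the
    demonstration; the square-root term keeps a gradient near it."""
    d2 = np.sum((robot_emb - demo_emb) ** 2)
    return -alpha * d2 - beta * np.sqrt(d2 + 1e-6)
```

A reinforcement-learning policy maximizing this reward is driven to reproduce the demonstrated trajectory as seen through the embedding.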

Page 17:

Actionable Representations[ Dwibedi, Tompson, Lynch, Sermanet @ IROS 2018 ] [ sites.google.com/view/actionablerepresentations ]

Page 18:

Cheetah Environment

Agent observes another agent demonstrating an action

View 1 View 2

Page 19:

Qualitative Results: Cheetah

PPO on true state vs. PPO on learned visual representations:

Input to PPO | Cumulative Reward (avg. of 100 runs)
Random State | 28.31
True State   | 390.16
Raw Pixels   | 146.14
mfTCN        | 360.50

Page 20:

Temporal Cycle-Consistency (TCC)[ Dwibedi, Aytar, Tompson, Sermanet, Zisserman @ CVPR 2019 ] [ temporal-cycle-consistency.github.io ]

Page 21:

Temporal Cycle-Consistency (TCC)[ Dwibedi, Aytar, Tompson, Sermanet, Zisserman @ CVPR 2019 ] [ temporal-cycle-consistency.github.io ]

Page 22:

Temporal Cycle-Consistency

[Diagram: frames of video 1 and video 2 are mapped into a shared embedding space and matched by nearest neighbors. A match is cycle-consistent if going video 1 -> video 2 -> video 1 returns to the starting frame; otherwise it incurs a cycle-consistency error.]
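The cycle-consistency idea can be sketched as a hard nearest-neighbor check (the method itself trains with a differentiable soft version of this check); function and variable names here are illustrative:

```python
import numpy as np

def cycle_consistent_fraction(emb1, emb2):
    """Fraction of cycle-consistent frames (illustrative sketch).

    For each frame i of video 1, find its nearest neighbor j in video 2,
    then j's nearest neighbor back in video 1; the cycle is consistent
    if it returns to i. Good embeddings of two videos of the same
    action make most cycles consistent.
    """
    def nearest(query, keys):
        d = np.sum((keys - query) ** 2, axis=-1)
        return int(np.argmin(d))

    consistent = 0
    for i, e in enumerate(emb1):
        j = nearest(e, emb2)            # hop to the other video
        i_back = nearest(emb2[j], emb1)  # hop back
        consistent += (i_back == i)
    return consistent / len(emb1)
```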

Page 23:

Page 24:

Action Phase Classification

Phase classification results when fine-tuning an ImageNet pre-trained ResNet-50.

Page 25:

Self-Supervised Alignment

Page 26:

Object-Contrastive Networks (OCN)[ Pirk, Khansari, Bai, Lynch, Sermanet @ under review ]

Self-teaching about any object leads to high robustness, allowing deployment.

Page 27:

Robotic Data Collection

Page 28:

Play data for Objects

Page 29:

Object-Contrastive Networks (OCN)[ Pirk, Khansari, Bai, Lynch, Sermanet @ under review ]

[Diagram: OCN training. Object crops from robotic data collection are embedded by a deep network into the OCN embedding space. A metric loss attracts an anchor crop to a positive crop of the same object and repels it from negative crops (the repulsion can be noisy), yielding a learned notion of objectness.]
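The attraction/repulsion metric loss can be sketched in an n-pairs style, where each anchor is attracted to its own positive crop and repelled from every other positive in the batch. This is an illustrative variant, not necessarily the exact loss used in the paper:

```python
import numpy as np

def npairs_attraction_repulsion(anchors, positives):
    """Attraction/repulsion metric loss (illustrative OCN-style sketch).

    anchors[i] and positives[i] are embeddings of crops of the same
    object; every other positive in the batch acts as a negative.
    Implemented as softmax cross-entropy over pairwise similarities,
    so matched pairs (the diagonal) are pulled together and
    mismatched pairs pushed apart.
    """
    logits = anchors @ positives.T                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```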

Page 30:

Recovering Continuous Attributes

[Figure: query images and their nearest neighbors in the OCN embedding (same instance removed).]

Page 31:

Online Object Understanding

● Offline average error: 54%
● Online average error: 17% -> 3%
● No need to define states and attributes

Offline ResNet vs. online self-supervised OCN.

Page 32:

Online Adaptation

Page 33:

Online Adaptation

Page 34:

Self-Supervision and Play for Control

Page 35:

Pose Imitation with TCN[ Sermanet*, Lynch*, Chebotar*, Hsu, Jang, Schaal, Levine @ ICRA 2018 ] [ sermanet.github.io/imitate ]

Page 36:

Play Data in Pose Space

Human Play | Robot Scripting | Human imitating Robot

Page 37:

Pose Imitation with Play Data

Human Play | Robot Script | Robot Script

Page 38:

Pose Imitation

● Self-Supervision + Play recipe
● No explicit task definition

Page 39:

Learning from Play (LfP)[ Lynch, Khansari, Xiao, Kumar, Tompson, Levine, Sermanet @ under review ] [ learning-from-play.github.io ]

● No tasks

● No rewards or RL

● Multiple tasks in zero-shot

● 85% on 18 tasks

● Self-Supervision + Play recipe

Page 40:

Continuum of skills

Page 41:

How can we cover the continuum?

Scripted collection + RL

[Figure: exploration via scripted collection requires reward engineering (e.g. a dedicated reward sensor for door opening), distributed training, and resets.]

Page 42:

Tasks are not discrete

“Grasp fast?” “Nudge slow?” “Nudge + grasp?”

Slide “full”? Slide “partial”? Boundaries between multiple tasks?

Page 43:

How can we cover the continuum?

Learning from Demonstration (LfD)

Kinesthetic [Kober and Peters, 2011] Tele-op, segmented expert demonstrations

Page 44:

Learning from Play (LfP)

How can we cover the continuum?

Learning from Demonstration (LfD)

Scripted collection + RL

Page 45:

Play data for training, collected from human tele-operation

(2.5x speedup)

Page 46:

How do we learn control from play?

[Diagram: a goal-conditioned policy maps the current observation and a goal to actions.]

Page 47:

Play covers the continuum

Page 48:

Goal relabeling

[Diagram: goal relabeling over a play stream of 600 unique sequences (3.65 hours total); a window (e.g. 60 seconds) spans frames t=0, t=1, t=2, t=3, ..., t=N, and later frames serve as goals for earlier ones.]
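The relabeling scheme can be sketched as slicing the long, unsegmented play log into windows and treating each window's final state as a free "goal" label; no human annotation is needed. A minimal sketch with illustrative function and data layout:

```python
def relabel_play_windows(states, actions, window):
    """Hindsight goal relabeling of a play stream (illustrative sketch).

    Slices one continuous play log into overlapping windows and treats
    the final state of each window as the goal that the enclosed
    actions achieved. Returns (current_state, goal_state, action_seq)
    training examples.
    """
    examples = []
    for start in range(0, len(states) - window):
        end = start + window
        examples.append((states[start], states[end], actions[start:end]))
    return examples
```

Every moment of play thus yields many goal-conditioned training examples, which is what makes unlabeled play data so cheap to exploit.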

Page 49:

Multimodality issue

Page 50:

Training Play-LMP

1. Given unlabeled play data: a replay buffer of unlabeled & unsegmented play videos and actions αᵗ, αᵗ⁺ᵐ, ..., αⁿ.

2. Learn latent plans using self-supervision: a plan-recognition network encodes the entire sequence, a plan-proposal network encodes only the current state and the goal, and the two resulting distributions in latent plan space are tied together by KL divergence minimization.

3. Decode the plan to reconstruct actions: given the current state, the goal, and a sampled latent plan, an action decoder outputs the action likelihood for each action αᵗ.
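The self-supervised plan-learning step hinges on minimizing the KL divergence between the plan-proposal distribution (conditioned on current state and goal) and the plan-recognition distribution (conditioned on the whole sequence), and on sampling latent plans with the reparameterization trick. A minimal sketch of those two pieces, assuming diagonal Gaussians; the encoder/decoder networks themselves are omitted:

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( q || p ) between diagonal Gaussians: the term that pulls the
    plan-proposal distribution toward the plan-recognition distribution
    in latent plan space (illustrative sketch of the objective)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum()

def sample_plan(mu, logvar, rng):
    """Reparameterized sample of a latent plan: mu + sigma * noise."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
```

During training, the sampled plan is fed to the action decoder and the total loss combines action reconstruction with this KL term, as in a conditional VAE.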

Page 51:

Play-LMP: Test time

1. Given a goal state and the current observation (@ 1 Hz).

2. Generate a latent plan (@ 1 Hz) by sampling from the plan-proposal distribution.

3. Generate actions (@ 30 Hz): the action decoder takes the current observation, the goal, and the latent plan, and runs closed-loop at 30 Hz.
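The two-rate loop, replanning at 1 Hz while acting closed-loop at 30 Hz, can be sketched as follows; `propose_plan`, `decode_action`, and `get_obs` are hypothetical stand-ins for the learned networks and the robot interface:

```python
def run_play_lmp(propose_plan, decode_action, get_obs, goal,
                 steps, replan_every=30):
    """Play-LMP test-time loop (illustrative sketch).

    Samples a fresh latent plan every `replan_every` control steps
    (1 Hz if control runs at 30 Hz) and decodes an action from the
    current observation, goal, and plan at every step (30 Hz).
    """
    actions, plan = [], None
    for t in range(steps):
        obs = get_obs()
        if t % replan_every == 0:           # replan at 1 Hz
            plan = propose_plan(obs, goal)   # sample from plan proposal
        actions.append(decode_action(obs, goal, plan))  # act at 30 Hz
    return actions
```

Because the observation is re-read at every step, the policy is closed-loop, which is what allows the retrying behavior shown in the later slides.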

Page 52:

18 tasks (for evaluation only)

Page 53:

Quantitative Accuracy

We obtain a single task-agnostic policy and evaluate it on 18 zero-shot tasks.

● Play-LMP: a single policy trained on cheap unlabeled data: 85% zero-shot success

● Baseline: 18 policies trained on expensive labeled data: 65%

● When the start position is perturbed, success is:
○ baseline: 23%
○ Play-LMP: 79%

Page 54:

Examples of success runs for Play-LMP

Goal Play-LMP policy

1x

(task: sliding)

Page 55:

Examples of success runs for Play-LMP

Goal Play-LMP policy

1x

(task: sweep)

Page 56:

Examples of success runs for Play-LMP

Goal Play-LMP policy

1x

(task: pull out of shelf)

Page 57:

Examples of success runs for Play-LMP

Goal Play-LMP policy

1x

(task: rotate left)

Page 58:

Some failure cases for Play-LMP

Goal Play-LMP policy

1x

(task: sliding)

Page 59:

Some failure cases for Play-LMP

Goal Play-LMP policy

1x

(task: pull out of shelf)

Page 60:

Retrying behavior emerging from Play-LMP

Goal Play-LMP policy

1x

(task: sliding)

Page 61:

Retrying behavior emerging from Play-LMP

Goal Play-LMP policy

1x

(task: pull out of shelf)

Page 62:

Retrying behavior emerging from Play-LMP

Goal Play-LMP policy

1x

(task: sweep right)

Page 63:

Composing 2 skills: grasp + close drawer

Goals Play-LMP policy

1x

=+

Page 64:

Composing 2 skills: put in shelf + close sliding

Goals Play-LMP policy

1x

=+

Page 65:

Composing 2 skills: open sliding + push green

Goals Play-LMP policy

1x

=+

Page 66:

Composing 2 skills: sweep + close drawer

Goals Play-LMP policy

1x

=+

Page 67:

Composing 2 skills: drawer open + sweep

Goals Play-LMP policy

1x

=+

Page 68:

8 skills in a row

Goal Play-LMP policy

1.5x

Page 69:

Latent plan space (t-SNE)

Page 70:

[Chart: richness vs. scalability of data. Learning from demonstrations (LfD), scripted collection + RL, and learning from play (LfP) are compared along the "Rich" and "Scalable" axes; LfP scores well on both.]

Page 71:

Recipe: Self-Supervision + Play

Page 72:

Takeaways

● Self-Supervision + Play recipe:

○ Self-supervise on lots of unlabeled data

○ Use play data

● Delay definitions of tasks, states, or attributes; let self-supervision organize continuous spaces:

○ Continuum of states and attributes

○ Continuum of skills

Page 73:

Questions? g.co/robotics

[email protected]

Corey Lynch Debidatta Dwibedi Soeren Pirk Jonathan Tompson Mohi Khansari Yusuf Aytar Yevgen Chebotar Yunfei Bai

Jasmine Hsu Eric Jang Vikash Kumar Ted Xiao Stefan Schaal Andrew Zisserman Sergey Levine Pierre Sermanet