Abstract
We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors
distributed across the fingertips, phalanges, and palm of a dexterous robot
hand. Magnetic tactile skins offer a flexible form factor for hand-wide
coverage with fast response times, in contrast to vision-based tactile sensors
that are restricted to the fingertips and limited by bandwidth. Full-hand
tactile perception is crucial for robot dexterity. However, a lack of
general-purpose models and challenges with interpreting magnetic flux and
calibration have limited the adoption of these sensors. Sparsh-skin, given a
history of kinematic and tactile sensing across a hand, outputs a latent
tactile embedding that can be used in any downstream task. The encoder is
self-supervised via self-distillation on a variety of unlabeled hand-object
interactions using an Allegro hand sensorized with Xela uSkin. In experiments
across several benchmark tasks, from state estimation to policy learning, we
find that pre-trained Sparsh-skin representations are sample efficient in
learning downstream tasks and improve task performance by over 41% compared to
prior work and over 56% compared to end-to-end learning.
Sparsh-skin overview
We collect a pre-training dataset of the robot hand performing various atomic manipulation actions
with 14 household objects and toys, including squeezing, sliding, rotating, pick-and-drop, circumduction,
pressing, wiping, and articulation. Using a VR-based teleoperation system with Meta Quest 3, we
record approximately 4 hours of varied interactions.
Sparsh-skin then uses a self-distillation approach to learn sensor-level
representations over small windows of tactile data. Specifically, the student encoder network is
given a corrupted tactile signal and trained to match the representations that the teacher
network predicts from the complete tactile signal.
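The student-teacher objective described above can be sketched in a few lines of NumPy. This is a toy illustration, not the Sparsh-skin implementation: the linear `tanh` encoder, the sensor-masking corruption, the mean-squared-error loss, the exponential-moving-average teacher update, and all shapes and hyperparameters are assumptions chosen to keep the example small.

```python
import numpy as np


def encode(weights, window):
    """Toy encoder (assumption): flatten a (time, sensors, 3) window and project it."""
    return np.tanh(window.reshape(-1) @ weights)


def mask_sensors(window, mask_ratio, rng):
    """Corrupt a window by zeroing the readings of randomly chosen sensors."""
    num_sensors = window.shape[1]
    num_masked = int(round(mask_ratio * num_sensors))
    masked = rng.choice(num_sensors, size=num_masked, replace=False)
    corrupted = window.copy()
    corrupted[:, masked, :] = 0.0
    return corrupted, masked


def distillation_loss(student_out, teacher_out):
    """Mean squared error between student and teacher embeddings (assumed loss)."""
    return float(np.mean((student_out - teacher_out) ** 2))


def student_step(student_w, teacher_w, window, corrupted, lr):
    """One gradient step on the student; the teacher target is held fixed."""
    target = encode(teacher_w, window)      # teacher sees the complete signal
    pred = encode(student_w, corrupted)     # student sees the corrupted signal
    # Backpropagate the MSE loss through tanh and the linear projection.
    grad_pred = 2.0 * (pred - target) / pred.size
    grad_pre = grad_pred * (1.0 - pred ** 2)
    grad_w = np.outer(corrupted.reshape(-1), grad_pre)
    return student_w - lr * grad_w


def ema_update(teacher_w, student_w, momentum):
    """Move the teacher toward the student by exponential moving average (assumption)."""
    return momentum * teacher_w + (1.0 - momentum) * student_w


rng = np.random.default_rng(0)
window = rng.normal(size=(4, 6, 3))         # hypothetical: 4 time steps, 6 sensors, 3 axes
student_w = 0.1 * rng.normal(size=(window.size, 8))
teacher_w = student_w.copy()

corrupted, masked = mask_sensors(window, mask_ratio=0.5, rng=rng)
loss_before = distillation_loss(encode(student_w, corrupted), encode(teacher_w, window))
student_w = student_step(student_w, teacher_w, window, corrupted, lr=0.01)
loss_after = distillation_loss(encode(student_w, corrupted), encode(teacher_w, window))
teacher_w = ema_update(teacher_w, student_w, momentum=0.99)
```

The key point is the asymmetry: only the student receives the corrupted window and gradients, while the teacher provides targets from the complete window and changes slowly.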
Once the representations are pre-trained, they can be used for downstream tasks such as force
estimation, pose estimation, and policy learning.
Sparsh-skin downstream tasks

A snapshot of our robot system for downstream tasks.
Signal auto-reconstructions from Sparsh-skin features
We visualize the auto-reconstruction of tactile signals from the latent features of Sparsh-skin.
Red dots indicate the tactile sensor locations on the robot hand. The normal force applied to a
sensor correlates directly with the radius of the green blob over that sensor.
The offset of the center of the green blob, highlighted by a red vector from the canonical
sensor position to the offset position, indicates the shear force applied to the sensor.
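The visual encoding described above can be sketched as a small mapping from one sensor reading to a blob. This is a hypothetical illustration: the function name `blob_for_sensor`, the linear gains, the clamping of negative normal force, and the `(shear_x, shear_y, normal)` ordering are assumptions, not the plotting code used for the figures.

```python
def blob_for_sensor(position, force, radius_gain=1.0, shear_gain=1.0):
    """Map one sensor's 3-axis reading to a blob (center, radius).

    position: canonical (x, y) location of the sensor (the red dot).
    force: (shear_x, shear_y, normal) reading; ordering is an assumption.
    """
    shear_x, shear_y, normal = force
    # Shear force shifts the blob center away from the canonical position.
    center = (position[0] + shear_gain * shear_x,
              position[1] + shear_gain * shear_y)
    # Normal force sets the blob radius; negative values are clamped to zero.
    radius = radius_gain * max(normal, 0.0)
    return center, radius


center, radius = blob_for_sensor((1.0, 2.0), (0.5, -0.5, 2.0))
```

The red vector in the visualization then corresponds to the segment from `position` to `center`.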
Pose estimation
We also show that Sparsh-skin representations capture relative object pose
information (see comparisons in the paper).
Plug insertion task
Sparsh-skin representations can enable planning for manipulation.
Here, we show the results of a plug insertion task where the goal is to insert a pre-grasped plug
into the first socket of an extension power strip.
Specifically, we train a transformer decoder to predict action sequences. The policy takes as
input images from three third-person cameras and a wrist camera,
as well as tactile observations in the form of Sparsh-skin representations.
The robot setup is shown in the setup figure in the overview section.
Sparsh-skin frozen
Vision only
BibTeX
If you find our work useful, please consider citing our paper:
@article{sharma2025sparshskin,
title = {Self-supervised perception for tactile skin covered dexterous hands},
author = {Akash Sharma and Carolina Higuera and Chaithanya Krishna Bodduluri and Zixi Liu and Taosha Fan and Tess Hellebrekers and Mike Lambeta and Byron Boots and Michael Kaess and Tingfan Wu and Francois R. Hogan and Mustafa Mukadam},
year = {2025},
eprint={2505.11420},
archivePrefix = {arXiv},
primaryClass = {cs.RO},
url = {https://arxiv.org/abs/2505.11420}
}