SSFold: Learning to Fold Arbitrary Crumpled
Cloth Using Graph Dynamics from Human Demonstration

Haichuan Xu
Jiarui Hu
Feng Luan
Zhipeng Wang
Yanchao Dong
Yanmin Zhou
Bin He

Abstract

Robotic cloth manipulation faces challenges due to complex dynamics and the high dimensionality of configuration spaces. Previous methods have largely focused on isolated smoothing or folding tasks and heavily relied on simulations, often failing to bridge the significant sim-to-real gap in deformable object manipulation.

To overcome these challenges, we propose a two-stream architecture with sequential and spatial pathways, unifying smoothing and folding tasks into a single adaptable policy model that accommodates various cloth types and states. The sequential stream determines cloth pick and place positions, while the spatial stream, leveraging a visible connectivity dynamics model, constructs a visibility connectivity graph from partial point cloud data of self-occluded cloth, thus improving the robot’s perception of the cloth’s current state. To bridge the sim-to-real gap, we utilize a hand tracking detection algorithm to gather and integrate human demonstration data into our novel end-to-end neural network, improving real-world adaptability. Our method, validated on a UR5 robot across four distinct cloth folding tasks, reliably achieves folded states from crumpled initial configurations with success rates of 99%, 99%, 83%, and 67%. It outperforms existing state-of-the-art cloth manipulation techniques and demonstrates strong generalization to unseen cloth with diverse colors, shapes, and stiffness in real-world experiments.


Approach Overview


Fig. 1. Method overview. (a) In a workspace equipped with a UR5 robotic arm and a piece of cloth in an arbitrary crumpled configuration, a top-down RGB image is captured by the camera. (b) The pick point, identified using a YOLOv10-based hand tracking algorithm, is concatenated with the captured RGB image. This combined input is then fed into the U-net network within the Sequential Stream. (c) In the Spatial Stream, the infrared and depth images captured by the camera are first used to extract a mask of the cloth region and generate the corresponding point cloud. The point cloud is voxelized to reduce complexity, followed by inferring nearby edges and mesh edges to predict the cloth’s graph data. (d) Finally, the features from both streams are fused and processed to produce an output action map, which guides the robotic arm to execute the corresponding actions using parameterized action primitives.
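The Spatial Stream step above (voxelizing the cloth point cloud and inferring nearby edges to form graph data) can be sketched as follows. This is a minimal illustration with NumPy only; the function name `build_cloth_graph` and the voxel size and edge radius values are hypothetical, and the paper's actual visible connectivity graph additionally infers mesh edges with a learned model.

```python
import numpy as np

def build_cloth_graph(points, voxel_size=0.0125, edge_radius=0.025):
    """Voxel-downsample a partial cloth point cloud and connect nearby
    points into an undirected graph (a simplified stand-in for the
    paper's visible connectivity graph)."""
    # Voxelize: keep one representative point per occupied voxel
    # to reduce the complexity of the raw point cloud.
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    nodes = points[np.sort(idx)]

    # "Nearby edges": connect every pair of nodes closer than edge_radius.
    diff = nodes[:, None, :] - nodes[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    src, dst = np.where((dist > 0) & (dist < edge_radius))
    edges = np.stack([src, dst], axis=1)
    return nodes, edges
```

In practice a KD-tree (e.g. `scipy.spatial.cKDTree`) would replace the quadratic distance matrix for larger clouds.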


Sequential demonstration of folding arbitrary crumpled cloth across three distinct tasks


Fig. 2. Each task consists of two rows: the first row presents the top-view operation sequence captured by the overhead camera, while the second row displays the side-view operation sequence captured by the side camera. Qpick represents the predicted pick heatmap, Qplace represents the predicted place heatmap, PA represents the predicted action map, and at represents the action pair.
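A minimal sketch of how an action pair at could be read off the predicted heatmaps Qpick and Qplace: take the argmax pixel of each map. The function name `action_from_heatmaps` is hypothetical, and the paper's fused action map PA may combine the two streams differently before action selection.

```python
import numpy as np

def action_from_heatmaps(q_pick, q_place):
    """Select a pick-and-place action pair from the pick heatmap (Qpick)
    and place heatmap (Qplace) by taking the highest-scoring pixel of each.
    Returns ((pick_row, pick_col), (place_row, place_col))."""
    pick = np.unravel_index(np.argmax(q_pick), q_pick.shape)
    place = np.unravel_index(np.argmax(q_place), q_place.shape)
    return pick, place
```

The returned pixel coordinates would then be mapped into the robot's workspace frame and executed with the parameterized pick-and-place action primitive.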


Real World Experiments

1. Performance on Double Inward Fold (DIF) tasks across towels of varying difficulty

easy task

medium task

hard task

2. Performance on Double Triangle Fold (DTF) tasks across towels of varying difficulty

easy task

medium task

hard task

3. Performance on Four Corners Inward Fold (FCIF) tasks across towels of varying difficulty

easy task

medium task

hard task

Generalization to Unseen Cloth

1. Performance on Double Inward Fold (DIF) tasks across unseen towels

Towel1

Towel2

Towel3

Towel4

2. Performance on Double Triangle Fold (DTF) tasks across unseen towels

Towel1

Towel2

Towel3

Towel4

3. Performance on Four Corners Inward Fold (FCIF) tasks across unseen towels

Towel1

Towel2

Towel3

Towel4


BibTeX

@article{zhou2024ssfold,
  title = {SSFold: Learning to Fold Arbitrary Crumpled Cloth Using Graph Dynamics from Human Demonstration},
  author = {Changshi Zhou and Haichuan Xu and Zhipeng Wang and Yanchao Dong and Yanmin Zhou and Bin He},
  month = {dec},
  year = {2024},
  articleno = {238},
}