Download link: (~55G).
The dataset is intended mainly for object rearrangement research, but it can also be used for other tasks such as object segmentation, depth estimation, robotic grasping, scene graphs, and language-guided tasks. It was collected by placing available meshes from Google Scanned Objects and HouseCat6D in tabletop scenes, which were physically validated with PyBullet and rendered with NViSII.
|--📁 raw
    |--📁 table_id
        |--📄 scene_name.obj # scene mesh
        |--📄 scene_name{none/_mid/_goal}_view-{1/2}.json # scene rendering params
        |--📄 scene_name{none/_mid/_goal}_scene_graph.json # scene graph labels
        |--🖼 scene_name{none/_mid/_goal}_scene_graph.png # scene graph visualization
        |--🖼 scene_name{none/_mid/_goal}_view-{1/2}.png # rendered RGB image
        |--🖼 scene_name{none/_mid/_goal}_view-{1/2}_camp_depth.png # rendered depth image
        |--📄 scene_name{none/_mid/_goal}_view-{1/2}_depth.pkl # raw depth data
        |--🖼 scene_name{none/_mid/_goal}_view-{1/2}_seg.exr # object masks
|--📁 models
    |--📁 bottle # original object meshes
    |--📁 bottle_simplified # watertight and simplified meshes
    |--📁 collision # decomposed meshes for PyBullet simulation
        |--📁 bottle
|--📄 relationships_{train/validation}.json
    Scene graph labels for all scenes: a list of triplets, each consisting of semantic classes (nodes) and a semantic relationship (edge).
|--📄 obj_boxes_{train/validation}.json
    Bounding boxes of each object in all scenes.
|--📄 classes.txt
    List of the semantic classes.
|--📄 relationships.txt
    List of all relationship types.
|--📄 train_scenes.txt
    Training split.
|--📄 description.json
    Scene descriptions in sentences (for LLM usage).
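The naming pattern in the tree above can be turned into a small path helper. The following is a minimal sketch, assuming that the per-scene files sit directly under `raw/table_id/` and that the initial scene uses an empty suffix (the `none` option); the function name `scene_files` and the dictionary keys are hypothetical and not part of the dataset tooling.

```python
from pathlib import Path

# Suffixes taken from the {none/_mid/_goal} pattern; "" stands for "none".
VARIANTS = ("", "_mid", "_goal")
VIEWS = (1, 2)


def scene_files(raw_dir, table_id, scene_name, variant="", view=1):
    """Return the expected file paths for one scene variant and one view.

    Assumption: all files of a scene live in raw_dir/table_id/.
    """
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    if view not in VIEWS:
        raise ValueError(f"view must be one of {VIEWS}, got {view!r}")

    folder = Path(raw_dir) / table_id
    stem = f"{scene_name}{variant}"      # shared by the scene graph files
    view_stem = f"{stem}_view-{view}"    # shared by the per-view files
    return {
        "mesh": folder / f"{scene_name}.obj",
        "params": folder / f"{view_stem}.json",
        "scene_graph": folder / f"{stem}_scene_graph.json",
        "scene_graph_vis": folder / f"{stem}_scene_graph.png",
        "rgb": folder / f"{view_stem}.png",
        "depth_vis": folder / f"{view_stem}_camp_depth.png",
        "depth_raw": folder / f"{view_stem}_depth.pkl",
        "masks": folder / f"{view_stem}_seg.exr",
    }
```

The helper only builds paths; it does not check that the files exist, so a caller may want to filter the result with `Path.exists()`.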
By construction, a scene containing a bowl may also contain a tablespoon, a scene containing a cup may also contain a teaspoon, and a scene containing a plate may also contain a fork and a knife.
`table_id` is the name of the current table used to collect scenes.
`scene_name` is the name of the current scene.
It has three types based on a common PREFIX--"table_id
- Initial scene (objects in random positions and rotations): just PREFIX.
- Middle scene (objects in random positions and canonical rotations): PREFIX + `_mid`.
- Goal scene (objects in goal positions and canonical rotations): PREFIX + `_goal`.
: The number of objects in this scene.
`scene_type`: It may contain one, two, or all three categories from {bowl, cup, plate}. `X` means that the category is absent. For example, a scene that contains only a bowl is named `X-bowl-X`, while a scene containing a bowl and a plate is named `bowl-plate-X`.
The part after `type-` gives the relation between the cutlery and the object in the scene. For example, a scene containing a bowl (without a plate or cup) with a tablespoon inside is named `X-in-X`. Together with `scene_type`, the complete name is `scene-X-bowl-X_type-X-in-X`.
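The naming convention above can be decoded programmatically. The following is a minimal sketch, assuming that a name contains both a `scene-...` and a `_type-...` part, that neither part contains an underscore, and that the middle and goal variants end in `_mid` and `_goal`; the function `parse_scene_name` and its return keys are hypothetical.

```python
import re

# Matches "scene-<scene_type>_type-<relation>" anywhere in the name.
_NAME_RE = re.compile(r"scene-(?P<scene_type>[^_]+)_type-(?P<relation>[^_]+)")
_SUFFIXES = {"_mid": "middle", "_goal": "goal"}


def parse_scene_name(name):
    """Split a scene name into its variant, categories, and relations."""
    variant = "initial"
    for suffix, label in _SUFFIXES.items():
        if name.endswith(suffix):
            variant = label
            name = name[: -len(suffix)]
            break

    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized scene name: {name!r}")

    # "X" marks an absent category or relation, so it is dropped here.
    return {
        "variant": variant,
        "categories": [c for c in match["scene_type"].split("-") if c != "X"],
        "relations": [r for r in match["relation"].split("-") if r != "X"],
    }
```

For instance, `parse_scene_name("scene-X-bowl-X_type-X-in-X")` reports an initial scene with the single category `bowl` and the single relation `in`. Names that lack the `_type-` part raise a `ValueError` rather than being guessed at.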
Each scene contains `scene_name{none/_mid/_goal}_view-{1/2}.json`, which describes the rendering settings and object status.
camera_data: { # camera location and intrinsics
    "width": 640,
    "height": 480,
    "segmentation_id": 0,
    "camera_look_at": {"at":...,"eye":...,"up":...},
    "location_world": ...,
    "quaternion_world_xyzw": ...,
    "intrinsics": ...
}
objects: {
    "class": "teapot", # class name
    "name": "teapot_1", # object name {class}_{id}
    "global_id": 39, # global id in the dataset
    "class_id": 11,
    "segmentation_id": 4, # instance id
    "color": "#80a261", # color hex id
    "scale": 0.13, # the scale applied to the original mesh
    local_to_world_matrix: ... # local-to-world transformation matrix (world frame)
    8points: ... # axis-aligned bounding box corners
    param6: ... # size and location, [l, w, h, cx, cy, cz]
    location: ... # position in the camera frame
    quaternion_xyzw: ... # rotation as a quaternion
    visibility_image: ... # 1 means visible, 0 means invisible
    bounding_box: ... # 2D bbox
}
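The `param6` and `8points` fields describe the same axis-aligned box in two ways. The following is a minimal sketch of how the eight corners could be derived from `param6`, assuming that `l`, `w`, `h` are the full extents along the x, y, and z axes and that `(cx, cy, cz)` is the box center; the function name `corners_from_param6` is hypothetical, and the corner ordering produced here is not guaranteed to match the ordering stored in `8points`.

```python
from itertools import product


def corners_from_param6(param6):
    """Return the 8 corners of an axis-aligned box given [l, w, h, cx, cy, cz].

    Assumption: l, w, h are full extents along x, y, z; (cx, cy, cz) is the center.
    """
    l, w, h, cx, cy, cz = param6
    half = (l / 2.0, w / 2.0, h / 2.0)
    center = (cx, cy, cz)
    # Every combination of -1/+1 per axis selects one corner.
    return [
        tuple(c + sign * extent for c, sign, extent in zip(center, signs, half))
        for signs in product((-1.0, 1.0), repeat=3)
    ]
```

Comparing the result against the stored `8points` of a few objects is a quick way to check whether the axis assumption holds for the data.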
`relationships_{train/validation}.json` contain the scene graph labels for all scenes in the dataset. The structure is similar to that of 3DSSG and SG-FRONT, except that the ids are globally consistent across the whole dataset.
"scene_id": "faa5d5ba2a002922511e5b9dc733c75c_0124_5_scene-X-bowl-cup_mid", # scene name
"relationships": [
    [
        18, # obj_A global_id
        113, # obj_B global_id
        6, # relationship_id
        "standing on" # relationship
    ],
    ...
],
"objects": {
    "76": "box", # global_id: object name
    ...
}
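The relationship entries can be turned into readable triplets by looking up each global id in the `objects` map. The following is a minimal sketch, assuming that each relationship is a four-element list `[obj_A, obj_B, relationship_id, relationship]` and that `objects` is keyed by the global id as a string; the function `readable_triplets` and the sample record (including its class names) are hypothetical.

```python
def readable_triplets(scene):
    """Convert id-based relationships into (subject, relation, object) names."""
    names = scene["objects"]
    triplets = []
    for subj_id, obj_id, _rel_id, rel_name in scene["relationships"]:
        # JSON object keys are strings, so the integer ids are converted first.
        subj = names.get(str(subj_id), f"<unknown {subj_id}>")
        obj = names.get(str(obj_id), f"<unknown {obj_id}>")
        triplets.append((subj, rel_name, obj))
    return triplets


# Hypothetical record mirroring the structure shown above.
example_scene = {
    "scene_id": "example_scene",
    "relationships": [[18, 113, 6, "standing on"]],
    "objects": {"18": "cup", "113": "table"},
}

for subj, rel, obj in readable_triplets(example_scene):
    print(f"{subj} {rel} {obj}")
```

Ids that are missing from `objects` are kept as placeholders instead of raising, which makes incomplete records easier to spot.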
`obj_boxes_{train/validation}.json` follow the same conventions as above.
"faa5d5ba2a002922511e5b9dc733c75c_0124_5_scene-X-bowl-cup_mid": {
    "76": { # object global id
        "param6": [0.07584431767463684,...], # bbox params (l,w,h,x,y,z)
        "8points": [ # corner points of each bbox
            ...
        ]
    }
}
The dataset is under MIT license. For any queries, feel free to drop an email to [email protected]