Merge branch 'main' into aloha_hd5_to_dataset_v2
commit 0d9a0cdb6f
@@ -50,7 +50,7 @@ jobs:
       uses: actions/checkout@v3

     - name: Install poetry
-      run: pipx install poetry
+      run: pipx install "poetry<2.0.0"

     - name: Poetry check
       run: poetry check
@@ -64,7 +64,7 @@ jobs:
       uses: actions/checkout@v3

     - name: Install poetry
-      run: pipx install poetry
+      run: pipx install "poetry<2.0.0"

     - name: Install poetry-relax
       run: poetry self add poetry-relax
@@ -68,7 +68,7 @@
 ### Acknowledgment

-- Thanks to Tony Zaho, Zipeng Fu and colleagues for open sourcing ACT policy, ALOHA environments and datasets. Ours are adapted from [ALOHA](https://tonyzhaozh.github.io/aloha) and [Mobile ALOHA](https://mobile-aloha.github.io).
+- Thanks to Tony Zhao, Zipeng Fu and colleagues for open sourcing ACT policy, ALOHA environments and datasets. Ours are adapted from [ALOHA](https://tonyzhaozh.github.io/aloha) and [Mobile ALOHA](https://mobile-aloha.github.io).
 - Thanks to Cheng Chi, Zhenjia Xu and colleagues for open sourcing Diffusion policy, Pusht environment and datasets, as well as UMI datasets. Ours are adapted from [Diffusion Policy](https://diffusion-policy.cs.columbia.edu) and [UMI Gripper](https://umi-gripper.github.io).
 - Thanks to Nicklas Hansen, Yunhai Feng and colleagues for open sourcing TDMPC policy, Simxarm environments and datasets. Ours are adapted from [TDMPC](https://github.com/nicklashansen/tdmpc) and [FOWM](https://www.yunhaifeng.com/FOWM).
 - Thanks to Antonio Loquercio and Ashish Kumar for their early support.
@@ -21,7 +21,7 @@ How to decode videos?

 ## Variables
 **Image content & size**
-We don't expect the same optimal settings for a dataset of images from a simulation, or from real-world in an appartment, or in a factory, or outdoor, or with lots of moving objects in the scene, etc. Similarly, loading times might not vary linearly with the image size (resolution).
+We don't expect the same optimal settings for a dataset of images from a simulation, or from real-world in an apartment, or in a factory, or outdoor, or with lots of moving objects in the scene, etc. Similarly, loading times might not vary linearly with the image size (resolution).
 For these reasons, we run this benchmark on four representative datasets:
 - `lerobot/pusht_image`: (96 x 96 pixels) simulation with simple geometric shapes, fixed camera.
 - `aliberts/aloha_mobile_shrimp_image`: (480 x 640 pixels) real-world indoor, moving camera.
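For orientation, any of these reference datasets can be pulled and inspected with the lerobot API. A minimal sketch — the import path and attribute names below match lerobot around the time of this change and may differ across versions:

```python
# Minimal sketch: load one of the benchmark datasets for a quick look.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht_image")
print(dataset.fps)           # frames per second of the recordings
print(dataset.num_episodes)  # number of episodes in the dataset
print(dataset[0].keys())     # feature keys of a single frame
```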
@@ -63,7 +63,7 @@ This of course is affected by the `-g` parameter during encoding, which specifie

 Note that this differs significantly from a typical use case like watching a movie, in which every frame is loaded sequentially from the beginning to the end and it's acceptable to have big values for `-g`.

-Additionally, because some policies might request single timestamps that are a few frames appart, we also have the following scenario:
+Additionally, because some policies might request single timestamps that are a few frames apart, we also have the following scenario:
 - `2_frames_4_space`: 2 frames with 4 consecutive frames of spacing in between (e.g `[t, t + 5 / fps]`),

 However, due to how video decoding is implemented with `pyav`, we don't have access to an accurate seek so in practice this scenario is essentially the same as `6_frames` since all 6 frames between `t` and `t + 5 / fps` will be decoded.
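To make the spacing scenarios concrete, here is a small sketch of the timestamps a policy would request in each case, assuming recordings at `fps` frames per second. The helper is hypothetical (not part of the benchmark code), and the `1_frame`/`2_frames` scenario names are assumed from context; only `2_frames_4_space` and `6_frames` appear in the hunk above:

```python
# Hypothetical helper illustrating the sampling scenarios described above.
fps = 30

def scenario_timestamps(t: float, scenario: str) -> list[float]:
    """Timestamps requested from the decoder for a given scenario."""
    if scenario == "1_frame":
        return [t]
    if scenario == "2_frames":
        return [t, t + 1 / fps]  # two consecutive frames
    if scenario == "2_frames_4_space":
        return [t, t + 5 / fps]  # 4 consecutive frames of spacing in between
    if scenario == "6_frames":
        return [t + i / fps for i in range(6)]
    raise ValueError(f"unknown scenario: {scenario}")

# Without accurate seek, decoding [t, t + 5 / fps] with pyav still walks
# through every intermediate frame, so 2_frames_4_space behaves like 6_frames.
print(scenario_timestamps(0.5, "2_frames_4_space"))  # [0.5, 0.666...]
```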
@@ -85,8 +85,8 @@ However, due to how video decoding is implemented with `pyav`, we don't have acc
 **Average Structural Similarity Index Measure (higher is better)**
 `avg_ssim` evaluates the perceived quality of images by comparing luminance, contrast, and structure. SSIM values range from -1 to 1, where 1 indicates perfect similarity.

-One aspect that can't be measured here with those metrics is the compatibility of the encoding accross platforms, in particular on web browser, for visualization purposes.
-h264, h265 and AV1 are all commonly used codecs and should not be pose an issue. However, the chroma subsampling (`pix_fmt`) format might affect compatibility:
+One aspect that can't be measured here with those metrics is the compatibility of the encoding across platforms, in particular on web browser, for visualization purposes.
+h264, h265 and AV1 are all commonly used codecs and should not pose an issue. However, the chroma subsampling (`pix_fmt`) format might affect compatibility:
 - `yuv420p` is more widely supported across various platforms, including web browsers.
 - `yuv444p` offers higher color fidelity but might not be supported as broadly.

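For reference, an `avg_ssim`-style metric can be computed with scikit-image. This is a hedged sketch, not necessarily the benchmark's exact implementation:

```python
# Sketch of an avg_ssim computation over pairs of reference/decoded frames.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def avg_ssim(reference: list[np.ndarray], decoded: list[np.ndarray]) -> float:
    """Mean SSIM over pairs of (H, W, C) uint8 frames; 1.0 means identical."""
    scores = [
        ssim(ref, dec, channel_axis=2, data_range=255)
        for ref, dec in zip(reference, decoded, strict=True)
    ]
    return float(np.mean(scores))
```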
@@ -116,7 +116,7 @@ Additional encoding parameters exist that are not included in this benchmark. In
 - `-preset` which allows for selecting encoding presets. This represents a collection of options that will provide a certain encoding speed to compression ratio. By leaving this parameter unspecified, it is considered to be `medium` for libx264 and libx265 and `8` for libsvtav1.
 - `-tune` which allows to optimize the encoding for certains aspects (e.g. film quality, fast decoding, etc.).

-See the documentation mentioned above for more detailled info on these settings and for a more comprehensive list of other parameters.
+See the documentation mentioned above for more detailed info on these settings and for a more comprehensive list of other parameters.

 Similarly on the decoding side, other decoders exist but are not implemented in our current benchmark. To name a few:
 - `torchaudio`
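As an illustration of how these flags combine with the `-g` and `-pix_fmt` parameters discussed earlier, a single encode could be launched as below. Paths and values are placeholders, not the benchmark's chosen settings:

```python
# Illustrative libx264 encode wiring together the parameters discussed above.
import subprocess

subprocess.run(
    [
        "ffmpeg", "-y",
        "-framerate", "30",
        "-i", "frames/frame_%06d.png",  # hypothetical input image sequence
        "-c:v", "libx264",
        "-g", "2",                       # small keyframe interval for fast random access
        "-preset", "medium",             # libx264's default preset
        "-tune", "fastdecode",           # optimize for decoding speed
        "-pix_fmt", "yuv420p",           # broadest playback compatibility
        "video.mp4",
    ],
    check=True,
)
```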
@@ -159,11 +159,11 @@ DATASETS = {
         **ALOHA_STATIC_INFO,
     },
     "aloha_static_vinh_cup": {
-        "single_task": "Pick up the platic cup with the right arm, then pop its lid open with the left arm.",
+        "single_task": "Pick up the plastic cup with the right arm, then pop its lid open with the left arm.",
         **ALOHA_STATIC_INFO,
     },
     "aloha_static_vinh_cup_left": {
-        "single_task": "Pick up the platic cup with the left arm, then pop its lid open with the right arm.",
+        "single_task": "Pick up the plastic cup with the left arm, then pop its lid open with the right arm.",
         **ALOHA_STATIC_INFO,
     },
     "aloha_static_ziploc_slide": {"single_task": "Slide open the ziploc bag.", **ALOHA_STATIC_INFO},
@@ -177,7 +177,7 @@ def run_server(
             {"url": url_for("static", filename=video_path), "filename": video_path.parent.name}
             for video_path in video_paths
         ]
-        tasks = dataset.meta.episodes[0]["tasks"]
+        tasks = dataset.meta.episodes[episode_id]["tasks"]
     else:
         video_keys = [key for key, ft in dataset.features.items() if ft["dtype"] == "video"]
         videos_info = [
@@ -232,69 +232,54 @@ def get_episode_data(dataset: LeRobotDataset | IterableNamespace, episode_index)
     """Get a csv str containing timeseries data of an episode (e.g. state and action).
     This file will be loaded by Dygraph javascript to plot data in real time."""
     columns = []
-    has_state = "observation.state" in dataset.features
-    has_action = "action" in dataset.features
+    selected_columns = [col for col, ft in dataset.features.items() if ft["dtype"] == "float32"]
+    selected_columns.remove("timestamp")

     # init header of csv with state and action names
     header = ["timestamp"]
-    if has_state:
+    for column_name in selected_columns:
         dim_state = (
-            dataset.meta.shapes["observation.state"][0]
+            dataset.meta.shapes[column_name][0]
             if isinstance(dataset, LeRobotDataset)
-            else dataset.features["observation.state"].shape[0]
+            else dataset.features[column_name].shape[0]
         )
-        header += [f"state_{i}" for i in range(dim_state)]
-        column_names = dataset.features["observation.state"]["names"]
-        while not isinstance(column_names, list):
-            column_names = list(column_names.values())[0]
-        columns.append({"key": "state", "value": column_names})
-    if has_action:
-        dim_action = (
-            dataset.meta.shapes["action"][0]
-            if isinstance(dataset, LeRobotDataset)
-            else dataset.features.action.shape[0]
-        )
-        header += [f"action_{i}" for i in range(dim_action)]
-        column_names = dataset.features["action"]["names"]
-        while not isinstance(column_names, list):
-            column_names = list(column_names.values())[0]
-        columns.append({"key": "action", "value": column_names})
+        header += [f"{column_name}_{i}" for i in range(dim_state)]
+        if "names" in dataset.features[column_name] and dataset.features[column_name]["names"]:
+            column_names = dataset.features[column_name]["names"]
+            while not isinstance(column_names, list):
+                column_names = list(column_names.values())[0]
+        else:
+            column_names = [f"motor_{i}" for i in range(dim_state)]
+        columns.append({"key": column_name, "value": column_names})
+
+    selected_columns.insert(0, "timestamp")

     if isinstance(dataset, LeRobotDataset):
         from_idx = dataset.episode_data_index["from"][episode_index]
         to_idx = dataset.episode_data_index["to"][episode_index]
-        selected_columns = ["timestamp"]
-        if has_state:
-            selected_columns += ["observation.state"]
-        if has_action:
-            selected_columns += ["action"]
         data = (
             dataset.hf_dataset.select(range(from_idx, to_idx))
             .select_columns(selected_columns)
-            .with_format("numpy")
+            .with_format("pandas")
         )
-        rows = np.hstack(
-            (np.expand_dims(data["timestamp"], axis=1), *[data[col] for col in selected_columns[1:]])
-        ).tolist()
     else:
         repo_id = dataset.repo_id
-        selected_columns = ["timestamp"]
-        if "observation.state" in dataset.features:
-            selected_columns.append("observation.state")
-        if "action" in dataset.features:
-            selected_columns.append("action")

         url = f"https://huggingface.co/datasets/{repo_id}/resolve/main/" + dataset.data_path.format(
             episode_chunk=int(episode_index) // dataset.chunks_size, episode_index=episode_index
         )
         df = pd.read_parquet(url)
         data = df[selected_columns]  # Select specific columns
-        rows = np.hstack(
-            (
-                np.expand_dims(data["timestamp"], axis=1),
-                *[np.vstack(data[col]) for col in selected_columns[1:]],
-            )
-        ).tolist()
+
+    rows = np.hstack(
+        (
+            np.expand_dims(data["timestamp"], axis=1),
+            *[np.vstack(data[col]) for col in selected_columns[1:]],
+        )
+    ).tolist()

     # Convert data to CSV string
     csv_buffer = StringIO()
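Condensed to its core, the new column selection in `get_episode_data` works like this toy reproduction. The `features` mapping below is made up for illustration, and the dict-unwrapping `while` loop from the real code is elided:

```python
# Toy reproduction of the refactored selection: every float32 feature except
# "timestamp" becomes a plotted column; "names" is used when present, with a
# motor_{i} fallback otherwise.
features = {
    "timestamp": {"dtype": "float32", "shape": (1,), "names": None},
    "observation.state": {"dtype": "float32", "shape": (2,), "names": ["x", "y"]},
    "action": {"dtype": "float32", "shape": (2,), "names": None},
    "observation.images.cam": {"dtype": "video", "shape": (3, 480, 640), "names": None},
}

selected_columns = [col for col, ft in features.items() if ft["dtype"] == "float32"]
selected_columns.remove("timestamp")

header = ["timestamp"]
columns = []
for column_name in selected_columns:
    dim = features[column_name]["shape"][0]
    header += [f"{column_name}_{i}" for i in range(dim)]
    names = features[column_name]["names"] or [f"motor_{i}" for i in range(dim)]
    columns.append({"key": column_name, "value": names})

selected_columns.insert(0, "timestamp")
print(header)   # ['timestamp', 'observation.state_0', 'observation.state_1', 'action_0', 'action_1']
print(columns)  # [{'key': 'observation.state', 'value': ['x', 'y']}, {'key': 'action', 'value': ['motor_0', 'motor_1']}]
```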
@@ -379,10 +364,6 @@ def visualize_dataset_html(
             template_folder=template_dir,
         )
     else:
-        image_keys = dataset.meta.image_keys if isinstance(dataset, LeRobotDataset) else []
-        if len(image_keys) > 0:
-            raise NotImplementedError(f"Image keys ({image_keys=}) are currently not supported.")
-
         # Create a simlink from the dataset video folder containg mp4 files to the output directory
         # so that the http server can get access to the mp4 files.
         if isinstance(dataset, LeRobotDataset):
@@ -98,9 +98,34 @@
         </div>

         <!-- Videos -->
+        <div class="max-w-32 relative text-sm mb-4 select-none"
+            @click.outside="isVideosDropdownOpen = false">
+            <div
+                @click="isVideosDropdownOpen = !isVideosDropdownOpen"
+                class="p-2 border border-slate-500 rounded flex justify-between items-center cursor-pointer"
+            >
+                <span class="truncate">filter videos</span>
+                <div class="transition-transform" :class="{ 'rotate-180': isVideosDropdownOpen }">🔽</div>
+            </div>
+
+            <div x-show="isVideosDropdownOpen"
+                class="absolute mt-1 border border-slate-500 rounded shadow-lg z-10">
+                <div>
+                    <template x-for="option in videosKeys" :key="option">
+                        <div
+                            @click="videosKeysSelected = videosKeysSelected.includes(option) ? videosKeysSelected.filter(v => v !== option) : [...videosKeysSelected, option]"
+                            class="p-2 cursor-pointer bg-slate-900"
+                            :class="{ 'bg-slate-700': videosKeysSelected.includes(option) }"
+                            x-text="option"
+                        ></div>
+                    </template>
+                </div>
+            </div>
+        </div>
+
         <div class="flex flex-wrap gap-x-2 gap-y-6">
             {% for video_info in videos_info %}
-            <div x-show="!videoCodecError" class="max-w-96 relative">
+            <div x-show="!videoCodecError && videosKeysSelected.includes('{{ video_info.filename }}')" class="max-w-96 relative">
                 <p class="absolute inset-x-0 -top-4 text-sm text-gray-300 bg-gray-800 px-2 rounded-t-xl truncate">{{ video_info.filename }}</p>
                 <video muted loop type="video/mp4" class="object-contain w-full h-full" @canplaythrough="videoCanPlay" @timeupdate="() => {
                     if (video.duration) {
@@ -250,6 +275,9 @@
             nVideos: {{ videos_info | length }},
             nVideoReadyToPlay: 0,
             videoCodecError: false,
+            isVideosDropdownOpen: false,
+            videosKeys: {{ videos_info | map(attribute='filename') | list | tojson }},
+            videosKeysSelected: [],
             columns: {{ columns | tojson }},
             rowLabels: {{ columns | tojson }}.reduce((colA, colB) => colA.value.length > colB.value.length ? colA : colB).value,

@@ -261,6 +289,7 @@
             if(!canPlayVideos){
                 this.videoCodecError = true;
             }
+            this.videosKeysSelected = this.videosKeys.map(opt => opt)

             // process CSV data
             const csvDataStr = {{ episode_data_csv_str|tojson|safe }};