Use -g 2, Fix delta_timestamps, Redo benchmark

Cadene 2024-05-01 20:00:58 +00:00
parent a00102b643
commit 370fb5348e
7 changed files with 480 additions and 261 deletions


@ -19,10 +19,10 @@ How to decode videos?
## Metrics
**Compression factor (higher is better)**
`pc_compression` is the ratio of the memory space on disk taken by the original images to encode, to the memory space taken by the encoded video. For instance, `pc_compression=400%` means that the video takes 4 times less memory space on disk compared to the original images.
`compression_factor` is the ratio of the memory space on disk taken by the original images to encode, to the memory space taken by the encoded video. For instance, `compression_factor=4` means that the video takes 4 times less memory space on disk compared to the original images.
**Percentage of loading time (lower is better)**
`pc_load_time` is the ratio of the time it takes to load original images at given timestamps, to the time it takes to decode the exact same frames from the video. Lower is better. For instance, `pc_load_time=120%` means that decoding from video is a bit slower than loading the original images.
**Loading time factor (higher is better)**
`load_time_factor` is the ratio of the time it takes to load original images at given timestamps, to the time it takes to decode the exact same frames from the video. Higher is better. For instance, `load_time_factor=0.5` means that decoding from video is 2 times slower than loading the original images.
**Average L2 error per pixel (lower is better)**
`avg_per_pixel_l2_error` is the L2 error between each decoded frame and its corresponding original image, averaged over all requested timestamps and divided by the number of pixels in the image so that it remains comparable across image sizes.
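To make these definitions concrete, here is a minimal sketch of how the three metrics could be computed from one benchmark run. The variable names mirror the benchmark script further down; the frame tensors and byte counts are placeholders:

```python
import torch

def compute_metrics(
    sum_original_frames_size_bytes: int,
    video_size_bytes: int,
    avg_load_time_from_images: float,
    avg_load_time: float,
    original_frames: torch.Tensor,  # (num_frames, c, h, w), float values in [0, 1]
    decoded_frames: torch.Tensor,   # same shape as original_frames
) -> dict:
    # Higher is better: how many times smaller the video is than the original images on disk.
    compression_factor = sum_original_frames_size_bytes / video_size_bytes
    # Higher is better: > 1 means decoding from video is faster than loading individual images.
    load_time_factor = avg_load_time_from_images / avg_load_time
    # Lower is better: per-frame L2 error normalized by the number of elements (per-pixel proxy),
    # then averaged over the requested timestamps.
    errors = [
        torch.norm(dec - orig, p=2).item() / dec.numel()
        for dec, orig in zip(decoded_frames, original_frames)
    ]
    avg_per_pixel_l2_error = sum(errors) / len(errors)
    return {
        "compression_factor": compression_factor,
        "load_time_factor": load_time_factor,
        "avg_per_pixel_l2_error": avg_per_pixel_l2_error,
    }
```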
@ -40,7 +40,12 @@ How to decode videos?
We don't expect the same optimal settings for a dataset of images from a simulation, or from the real world in an apartment, in a factory, outdoors, etc. Hence, we run this benchmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).
**Requested timestamps**
In this benchmark, we focus on the loading time of random access, so we are not interested about sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `pc_load_time`. In fact, it is expected to get faster loading time by decoding a large number of consecutive frames from a video, than to load the same data from individual images. To reflect our robotics use case, we consider a setting where we load 2 consecutive frames with 4 frames of spacing.
In this benchmark, we focus on the loading time of random access, so we are not interested in sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, decoding a large number of consecutive frames from a video is expected to be faster than loading the same data from individual images. To reflect our robotics use case, we consider a few settings (sketched in code after this list):
- `1_frame`: 1 frame,
- `2_frames`: 2 consecutive frames (e.g. `[t, t + 1 / fps]`),
- `2_frames_4_space`: 2 frames with 4 frames of spacing (e.g. `[t, t + 4 / fps]`),
- `6_frames`: 6 consecutive frames (e.g. `[t, t + 1 / fps, ..., t + 5 / fps]`).
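A minimal sketch of how each setting translates into requested timestamps around a reference time `ts`, mirroring the `timestamps_mode` handling in the benchmark script (the `fps` and `ts` values are illustrative):

```python
def make_timestamps(timestamps_mode: str, ts: float, fps: int) -> list[float]:
    if timestamps_mode == "1_frame":
        return [ts]
    if timestamps_mode == "2_frames":
        return [ts - 1 / fps, ts]
    if timestamps_mode == "2_frames_4_space":
        return [ts - 4 / fps, ts]
    if timestamps_mode == "6_frames":
        return [ts - i / fps for i in range(6)][::-1]
    raise ValueError(timestamps_mode)

print(make_timestamps("2_frames", ts=2.0, fps=10))          # [1.9, 2.0]
print(make_timestamps("2_frames_4_space", ts=2.0, fps=10))  # [1.6, 2.0]
```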
**Data augmentations**
We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robust (e.g. robust to color changes, compression, etc.).
@ -48,10 +53,8 @@ We might revisit this benchmark and find better settings if we train our policie
## Results
### Loading 2 consecutive frames with 4 frames spacing (Diffusion Policy setting)
**`decoder`**
| repo_id | decoder | pc_load_time | avg_per_pixel_l2_error |
| repo_id | decoder | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- |
| lerobot/pusht | <span style="color: #32CD32;">torchvision</span> | 0.166 | 0.0000119 |
| lerobot/pusht | ffmpegio | 0.009 | 0.0001182 |
@ -60,127 +63,274 @@ We might revisit this benchmark and find better settings if we train our policie
| lerobot/umi_cup_in_the_wild | ffmpegio | 0.010 | 0.0000735 |
| lerobot/umi_cup_in_the_wild | torchaudio | 0.154 | 0.0000340 |
### `1_frame`
**`pix_fmt`**
| repo_id | pix_fmt | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | yuv420p | 3.602 | 0.202 | 0.0000661 |
| lerobot/pusht | <span style="color: #32CD32;">yuv444p</span> | 3.213 | 0.153 | 0.0000110 |
| lerobot/umi_cup_in_the_wild | yuv420p | 8.879 | 0.202 | 0.0000332 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">yuv444p</span> | 8.517 | 0.165 | 0.0000175 |
| lerobot/pusht | yuv420p | 3.788 | 0.224 | 0.0000760 |
| lerobot/pusht | yuv444p | 3.646 | 0.185 | 0.0000443 |
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.388 | 0.0000469 |
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.329 | 0.0000397 |
**`g`**
| repo_id | g | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 1 | 1.308 | 0.190 | 0.0000151 |
| lerobot/pusht | 5 | 2.739 | 0.184 | 0.0000123 |
| lerobot/pusht | 10 | 3.213 | 0.144 | 0.0000116 |
| lerobot/pusht | 15 | 3.460 | 0.137 | 0.0000112 |
| lerobot/pusht | 20 | 3.559 | 0.118 | 0.0000109 |
| lerobot/pusht | 30 | 3.697 | 0.104 | 0.0000117 |
| lerobot/pusht | 40 | 3.763 | 0.092 | 0.0000116 |
| lerobot/pusht | 60 | 3.925 | 0.068 | 0.0000117 |
| lerobot/pusht | 100 | 4.010 | 0.054 | 0.0000117 |
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 4.058 | 0.043 | 0.0000117 |
| lerobot/umi_cup_in_the_wild | 1 | 4.790 | 0.236 | 0.0000221 |
| lerobot/umi_cup_in_the_wild | 5 | 7.707 | 0.201 | 0.0000185 |
| lerobot/umi_cup_in_the_wild | 10 | 8.517 | 0.172 | 0.0000177 |
| lerobot/umi_cup_in_the_wild | 15 | 8.830 | 0.152 | 0.0000170 |
| lerobot/umi_cup_in_the_wild | 20 | 8.961 | 0.133 | 0.0000167 |
| lerobot/umi_cup_in_the_wild | 30 | 8.850 | 0.113 | 0.0000167 |
| lerobot/umi_cup_in_the_wild | 40 | 8.996 | 0.109 | 0.0000174 |
| lerobot/umi_cup_in_the_wild | 60 | 9.113 | 0.081 | 0.0000163 |
| lerobot/umi_cup_in_the_wild | 100 | 9.278 | 0.051 | 0.0000173 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 9.396 | 0.030 | 0.0000165 |
| lerobot/pusht | 1 | 2.543 | 0.204 | 0.0000556 |
| lerobot/pusht | 2 | 3.646 | 0.182 | 0.0000443 |
| lerobot/pusht | 3 | 4.431 | 0.174 | 0.0000450 |
| lerobot/pusht | 4 | 5.103 | 0.163 | 0.0000448 |
| lerobot/pusht | 5 | 5.625 | 0.163 | 0.0000436 |
| lerobot/pusht | 6 | 5.974 | 0.155 | 0.0000427 |
| lerobot/pusht | 10 | 6.814 | 0.130 | 0.0000410 |
| lerobot/pusht | 15 | 7.431 | 0.105 | 0.0000406 |
| lerobot/pusht | 20 | 7.662 | 0.097 | 0.0000400 |
| lerobot/pusht | 40 | 8.163 | 0.061 | 0.0000405 |
| lerobot/pusht | 100 | 8.761 | 0.039 | 0.0000422 |
| lerobot/pusht | None | 8.909 | 0.024 | 0.0000431 |
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.444 | 0.0000601 |
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.345 | 0.0000397 |
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.282 | 0.0000416 |
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.271 | 0.0000415 |
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.260 | 0.0000415 |
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.249 | 0.0000415 |
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.195 | 0.0000399 |
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.169 | 0.0000394 |
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.140 | 0.0000390 |
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.096 | 0.0000384 |
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.046 | 0.0000390 |
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.022 | 0.0000400 |
**`crf`**
| repo_id | crf | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 0 | 4.529 | 0.041 | 0.0000035 |
| lerobot/pusht | 5 | 3.138 | 0.040 | 0.0000077 |
| lerobot/pusht | <span style="color: #32CD32;">10</span> | 4.058 | 0.038 | 0.0000121 |
| lerobot/pusht | <span style="color: #32CD32;">15</span> | 5.407 | 0.039 | 0.0000195 |
| lerobot/pusht | <span style="color: #32CD32;">20</span> | 7.335 | 0.039 | 0.0000319 |
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 8.909 | 0.046 | 0.0000425 |
| lerobot/pusht | 25 | 10.213 | 0.039 | 0.0000519 |
| lerobot/pusht | 30 | 14.516 | 0.041 | 0.0000795 |
| lerobot/pusht | 40 | 23.546 | 0.041 | 0.0001557 |
| lerobot/pusht | 50 | 28.460 | 0.042 | 0.0002723 |
| lerobot/umi_cup_in_the_wild | 0 | 2.318 | 0.012 | 0.0000056 |
| lerobot/umi_cup_in_the_wild | 5 | 4.899 | 0.019 | 0.0000132 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">10</span> | 9.396 | 0.026 | 0.0000183 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">15</span> | 19.161 | 0.034 | 0.0000241 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">20</span> | 39.311 | 0.039 | 0.0000329 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 60.530 | 0.043 | 0.0000401 |
| lerobot/umi_cup_in_the_wild | 25 | 81.048 | 0.046 | 0.0000454 |
| lerobot/umi_cup_in_the_wild | 30 | 165.189 | 0.051 | 0.0000609 |
| lerobot/umi_cup_in_the_wild | 40 | 544.478 | 0.056 | 0.0001095 |
| lerobot/umi_cup_in_the_wild | 50 | 1109.556 | 0.072 | 0.0001815 |
| lerobot/pusht | 0 | 1.699 | 0.175 | 0.0000035 |
| lerobot/pusht | 5 | 1.409 | 0.181 | 0.0000080 |
| lerobot/pusht | 10 | 1.842 | 0.172 | 0.0000123 |
| lerobot/pusht | 15 | 2.322 | 0.187 | 0.0000211 |
| lerobot/pusht | 20 | 3.050 | 0.181 | 0.0000346 |
| lerobot/pusht | None | 3.646 | 0.189 | 0.0000443 |
| lerobot/pusht | 25 | 3.969 | 0.186 | 0.0000521 |
| lerobot/pusht | 30 | 5.687 | 0.184 | 0.0000850 |
| lerobot/pusht | 40 | 10.818 | 0.193 | 0.0001726 |
| lerobot/pusht | 50 | 18.185 | 0.183 | 0.0002606 |
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.165 | 0.0000056 |
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.171 | 0.0000111 |
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.212 | 0.0000153 |
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.261 | 0.0000218 |
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.312 | 0.0000317 |
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.339 | 0.0000397 |
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.297 | 0.0000452 |
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.406 | 0.0000629 |
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.468 | 0.0001184 |
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.515 | 0.0001879 |
### Loading 6 consecutive frames with no spacing (TDMPC setting)
**`decoder`**
| repo_id | decoder | pc_load_time | avg_per_pixel_l2_error |
**best**
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- |
| lerobot/pusht | <span style="color: #32CD32;">torchvision</span> | 0.386 | 0.0000117 |
| lerobot/pusht | ffmpegio | 0.008 | 0.0000117 |
| lerobot/pusht | torchaudio | 0.184 | 0.0000356 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">torchvision</span> | 0.448 | 0.0000178 |
| lerobot/umi_cup_in_the_wild | ffmpegio | 0.009 | 0.0000178 |
| lerobot/umi_cup_in_the_wild | torchaudio | 0.149 | 0.0000349 |
| lerobot/pusht | 3.646 | 0.188 | 0.0000443 |
| lerobot/umi_cup_in_the_wild | 14.932 | 0.339 | 0.0000397 |
### `2_frames`
**`pix_fmt`**
| repo_id | pix_fmt | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | yuv420p | 3.602 | 0.518 | 0.0000651 |
| lerobot/pusht | <span style="color: #32CD32;">yuv444p</span> | 3.213 | 0.401 | 0.0000117 |
| lerobot/umi_cup_in_the_wild | yuv420p | 8.879 | 0.578 | 0.0000334 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">yuv444p</span> | 8.517 | 0.479 | 0.0000178 |
| lerobot/pusht | yuv420p | 3.788 | 0.314 | 0.0000799 |
| lerobot/pusht | yuv444p | 3.646 | 0.303 | 0.0000496 |
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.642 | 0.0000503 |
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.529 | 0.0000436 |
**`g`**
| repo_id | g | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 1 | 1.308 | 0.528 | 0.0000152 |
| lerobot/pusht | 5 | 2.739 | 0.483 | 0.0000124 |
| lerobot/pusht | 10 | 3.213 | 0.396 | 0.0000117 |
| lerobot/pusht | 15 | 3.460 | 0.379 | 0.0000118 |
| lerobot/pusht | 20 | 3.559 | 0.319 | 0.0000114 |
| lerobot/pusht | 30 | 3.697 | 0.278 | 0.0000116 |
| lerobot/pusht | 40 | 3.763 | 0.243 | 0.0000115 |
| lerobot/pusht | 60 | 3.925 | 0.186 | 0.0000118 |
| lerobot/pusht | 100 | 4.010 | 0.156 | 0.0000119 |
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 4.058 | 0.105 | 0.0000121 |
| lerobot/umi_cup_in_the_wild | 1 | 4.790 | 0.605 | 0.0000221 |
| lerobot/umi_cup_in_the_wild | 5 | 7.707 | 0.533 | 0.0000183 |
| lerobot/umi_cup_in_the_wild | 10 | 8.517 | 0.469 | 0.0000178 |
| lerobot/umi_cup_in_the_wild | 15 | 8.830 | 0.399 | 0.0000174 |
| lerobot/umi_cup_in_the_wild | 20 | 8.961 | 0.382 | 0.0000175 |
| lerobot/umi_cup_in_the_wild | 30 | 8.850 | 0.326 | 0.0000172 |
| lerobot/umi_cup_in_the_wild | 40 | 8.996 | 0.279 | 0.0000173 |
| lerobot/umi_cup_in_the_wild | 60 | 9.113 | 0.226 | 0.0000174 |
| lerobot/umi_cup_in_the_wild | 100 | 9.278 | 0.150 | 0.0000175 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 9.396 | 0.076 | 0.0000176 |
| lerobot/pusht | 1 | 2.543 | 0.308 | 0.0000599 |
| lerobot/pusht | 2 | 3.646 | 0.279 | 0.0000496 |
| lerobot/pusht | 3 | 4.431 | 0.259 | 0.0000498 |
| lerobot/pusht | 4 | 5.103 | 0.243 | 0.0000501 |
| lerobot/pusht | 5 | 5.625 | 0.235 | 0.0000492 |
| lerobot/pusht | 6 | 5.974 | 0.230 | 0.0000481 |
| lerobot/pusht | 10 | 6.814 | 0.194 | 0.0000468 |
| lerobot/pusht | 15 | 7.431 | 0.152 | 0.0000460 |
| lerobot/pusht | 20 | 7.662 | 0.151 | 0.0000455 |
| lerobot/pusht | 40 | 8.163 | 0.095 | 0.0000454 |
| lerobot/pusht | 100 | 8.761 | 0.062 | 0.0000472 |
| lerobot/pusht | None | 8.909 | 0.037 | 0.0000479 |
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.638 | 0.0000625 |
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.537 | 0.0000436 |
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.493 | 0.0000437 |
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.458 | 0.0000446 |
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.438 | 0.0000445 |
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.424 | 0.0000444 |
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.345 | 0.0000435 |
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.313 | 0.0000417 |
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.264 | 0.0000421 |
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.185 | 0.0000414 |
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.090 | 0.0000420 |
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.042 | 0.0000424 |
**`crf`**
| repo_id | crf | pc_compression | pc_load_time | avg_per_pixel_l2_error |
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 0 | 4.529 | 0.108 | 0.0000035 |
| lerobot/pusht | 5 | 3.138 | 0.099 | 0.0000077 |
| lerobot/pusht | 10 | 4.058 | 0.091 | 0.0000121 |
| lerobot/pusht | 15 | 5.407 | 0.095 | 0.0000195 |
| lerobot/pusht | 20 | 7.335 | 0.100 | 0.0000318 |
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 8.909 | 0.102 | 0.0000422 |
| lerobot/pusht | 25 | 10.213 | 0.102 | 0.0000517 |
| lerobot/pusht | 30 | 14.516 | 0.104 | 0.0000795 |
| lerobot/pusht | 40 | 23.546 | 0.106 | 0.0001555 |
| lerobot/pusht | 50 | 28.460 | 0.110 | 0.0002723 |
| lerobot/umi_cup_in_the_wild | 0 | 2.318 | 0.032 | 0.0000056 |
| lerobot/umi_cup_in_the_wild | 5 | 4.899 | 0.052 | 0.0000127 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">10</span> | 9.396 | 0.073 | 0.0000176 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">15</span> | 19.161 | 0.097 | 0.0000234 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">20</span> | 39.311 | 0.110 | 0.0000321 |
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 60.530 | 0.117 | 0.0000393 |
| lerobot/umi_cup_in_the_wild | 25 | 81.048 | 0.126 | 0.0000446 |
| lerobot/umi_cup_in_the_wild | 30 | 165.189 | 0.138 | 0.0000603 |
| lerobot/umi_cup_in_the_wild | 40 | 544.478 | 0.151 | 0.0001095 |
| lerobot/umi_cup_in_the_wild | 50 | 1109.556 | 0.167 | 0.0001817 |
| lerobot/pusht | 0 | 1.699 | 0.302 | 0.0000097 |
| lerobot/pusht | 5 | 1.409 | 0.287 | 0.0000142 |
| lerobot/pusht | 10 | 1.842 | 0.283 | 0.0000184 |
| lerobot/pusht | 15 | 2.322 | 0.305 | 0.0000268 |
| lerobot/pusht | 20 | 3.050 | 0.285 | 0.0000402 |
| lerobot/pusht | None | 3.646 | 0.285 | 0.0000496 |
| lerobot/pusht | 25 | 3.969 | 0.293 | 0.0000572 |
| lerobot/pusht | 30 | 5.687 | 0.293 | 0.0000893 |
| lerobot/pusht | 40 | 10.818 | 0.319 | 0.0001762 |
| lerobot/pusht | 50 | 18.185 | 0.304 | 0.0002626 |
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.235 | 0.0000112 |
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.261 | 0.0000166 |
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.333 | 0.0000207 |
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.406 | 0.0000267 |
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.489 | 0.0000361 |
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.537 | 0.0000436 |
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.578 | 0.0000487 |
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.453 | 0.0000655 |
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.767 | 0.0001192 |
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.816 | 0.0001881 |
**best**
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- |
| lerobot/pusht | 3.646 | 0.283 | 0.0000496 |
| lerobot/umi_cup_in_the_wild | 14.932 | 0.543 | 0.0000436 |
### `2_frames_4_space`
**`pix_fmt`**
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | yuv420p | 3.788 | 0.257 | 0.0000855 |
| lerobot/pusht | yuv444p | 3.646 | 0.261 | 0.0000556 |
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.493 | 0.0000476 |
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.371 | 0.0000404 |
**`g`**
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 1 | 2.543 | 0.226 | 0.0000670 |
| lerobot/pusht | 2 | 3.646 | 0.222 | 0.0000556 |
| lerobot/pusht | 3 | 4.431 | 0.217 | 0.0000567 |
| lerobot/pusht | 4 | 5.103 | 0.204 | 0.0000555 |
| lerobot/pusht | 5 | 5.625 | 0.179 | 0.0000556 |
| lerobot/pusht | 6 | 5.974 | 0.188 | 0.0000544 |
| lerobot/pusht | 10 | 6.814 | 0.160 | 0.0000531 |
| lerobot/pusht | 15 | 7.431 | 0.150 | 0.0000521 |
| lerobot/pusht | 20 | 7.662 | 0.123 | 0.0000519 |
| lerobot/pusht | 40 | 8.163 | 0.092 | 0.0000519 |
| lerobot/pusht | 100 | 8.761 | 0.053 | 0.0000533 |
| lerobot/pusht | None | 8.909 | 0.034 | 0.0000541 |
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.409 | 0.0000607 |
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.381 | 0.0000404 |
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.355 | 0.0000418 |
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.346 | 0.0000425 |
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.354 | 0.0000419 |
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.336 | 0.0000419 |
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.314 | 0.0000402 |
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.269 | 0.0000397 |
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.246 | 0.0000395 |
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.171 | 0.0000390 |
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.091 | 0.0000399 |
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.043 | 0.0000409 |
**`crf`**
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 0 | 1.699 | 0.212 | 0.0000193 |
| lerobot/pusht | 5 | 1.409 | 0.211 | 0.0000232 |
| lerobot/pusht | 10 | 1.842 | 0.199 | 0.0000270 |
| lerobot/pusht | 15 | 2.322 | 0.198 | 0.0000347 |
| lerobot/pusht | 20 | 3.050 | 0.211 | 0.0000469 |
| lerobot/pusht | None | 3.646 | 0.206 | 0.0000556 |
| lerobot/pusht | 25 | 3.969 | 0.210 | 0.0000626 |
| lerobot/pusht | 30 | 5.687 | 0.223 | 0.0000927 |
| lerobot/pusht | 40 | 10.818 | 0.227 | 0.0001763 |
| lerobot/pusht | 50 | 18.185 | 0.223 | 0.0002625 |
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.147 | 0.0000071 |
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.182 | 0.0000125 |
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.222 | 0.0000166 |
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.270 | 0.0000229 |
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.325 | 0.0000326 |
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.362 | 0.0000404 |
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.390 | 0.0000459 |
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.437 | 0.0000633 |
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.499 | 0.0001186 |
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.564 | 0.0001879 |
**best**
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- |
| lerobot/pusht | 3.646 | 0.224 | 0.0000556 |
| lerobot/umi_cup_in_the_wild | 14.932 | 0.368 | 0.0000404 |
### `6_frames`
**`pix_fmt`**
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | yuv420p | 3.788 | 0.660 | 0.0000839 |
| lerobot/pusht | yuv444p | 3.646 | 0.546 | 0.0000542 |
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 1.225 | 0.0000497 |
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.908 | 0.0000428 |
**`g`**
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 1 | 2.543 | 0.552 | 0.0000646 |
| lerobot/pusht | 2 | 3.646 | 0.534 | 0.0000542 |
| lerobot/pusht | 3 | 4.431 | 0.563 | 0.0000546 |
| lerobot/pusht | 4 | 5.103 | 0.537 | 0.0000545 |
| lerobot/pusht | 5 | 5.625 | 0.477 | 0.0000532 |
| lerobot/pusht | 6 | 5.974 | 0.515 | 0.0000530 |
| lerobot/pusht | 10 | 6.814 | 0.410 | 0.0000512 |
| lerobot/pusht | 15 | 7.431 | 0.405 | 0.0000503 |
| lerobot/pusht | 20 | 7.662 | 0.345 | 0.0000500 |
| lerobot/pusht | 40 | 8.163 | 0.247 | 0.0000496 |
| lerobot/pusht | 100 | 8.761 | 0.147 | 0.0000510 |
| lerobot/pusht | None | 8.909 | 0.100 | 0.0000519 |
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.997 | 0.0000620 |
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.911 | 0.0000428 |
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.869 | 0.0000433 |
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.874 | 0.0000438 |
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.864 | 0.0000439 |
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.834 | 0.0000440 |
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.781 | 0.0000421 |
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.679 | 0.0000411 |
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.652 | 0.0000410 |
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.465 | 0.0000404 |
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.245 | 0.0000413 |
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.116 | 0.0000417 |
**`crf`**
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- | --- |
| lerobot/pusht | 0 | 1.699 | 0.534 | 0.0000163 |
| lerobot/pusht | 5 | 1.409 | 0.524 | 0.0000205 |
| lerobot/pusht | 10 | 1.842 | 0.510 | 0.0000245 |
| lerobot/pusht | 15 | 2.322 | 0.512 | 0.0000324 |
| lerobot/pusht | 20 | 3.050 | 0.508 | 0.0000452 |
| lerobot/pusht | None | 3.646 | 0.518 | 0.0000542 |
| lerobot/pusht | 25 | 3.969 | 0.534 | 0.0000616 |
| lerobot/pusht | 30 | 5.687 | 0.530 | 0.0000927 |
| lerobot/pusht | 40 | 10.818 | 0.552 | 0.0001777 |
| lerobot/pusht | 50 | 18.185 | 0.564 | 0.0002644 |
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.401 | 0.0000101 |
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.499 | 0.0000156 |
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.599 | 0.0000197 |
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.704 | 0.0000258 |
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.834 | 0.0000352 |
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.925 | 0.0000428 |
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.978 | 0.0000480 |
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 1.088 | 0.0000648 |
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 1.324 | 0.0001190 |
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 1.436 | 0.0001880 |
**best**
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
| --- | --- | --- | --- |
| lerobot/pusht | 3.646 | 0.546 | 0.0000542 |
| lerobot/umi_cup_in_the_wild | 14.932 | 0.934 | 0.0000428 |
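Based on these results, the settings retained in `encode_video_frames` (see its diff below) are a GOP size of 2 (`-g 2`), the default `crf`, and `yuv444p`. A minimal sketch of the corresponding encoding call, with illustrative paths and fps:

```python
import subprocess
from pathlib import Path

def encode_video_frames_sketch(imgs_dir: Path, video_path: Path, fps: int) -> None:
    """Encode a directory of frame_%06d.png images into an mp4 with the benchmarked settings."""
    video_path.parent.mkdir(parents=True, exist_ok=True)
    cmd = (
        f"ffmpeg -r {fps} "
        "-f image2 "
        "-loglevel error "
        f"-i {str(imgs_dir / 'frame_%06d.png')} "
        "-vcodec libx264 "
        "-g 2 "
        "-pix_fmt yuv444p "
        f"{str(video_path)}"
    )
    subprocess.run(cmd.split(" "), check=True)

# encode_video_frames_sketch(Path("tmp/imgs"), Path("tmp/episode_0.mp4"), fps=30)
```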


@ -31,7 +31,12 @@ def get_directory_size(directory):
return total_size
def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion"):
def run_video_benchmark(
output_dir,
cfg,
timestamps_mode,
seed=1337,
):
output_dir = Path(output_dir)
if output_dir.exists():
shutil.rmtree(output_dir)
@ -73,19 +78,20 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
crf = cfg.get("crf")
pix_fmt = cfg["pix_fmt"]
ffmpeg_cmd = ""
ffmpeg_cmd += f"ffmpeg -r {fps} -f image2 "
ffmpeg_cmd += f"-i {str(imgs_dir / 'frame_%06d.png')} "
ffmpeg_cmd += "-vcodec libx264 "
cmd = f"ffmpeg -r {fps} "
cmd += "-f image2 "
cmd += "-loglevel error "
cmd += f"-i {str(imgs_dir / 'frame_%06d.png')} "
cmd += "-vcodec libx264 "
if g is not None:
ffmpeg_cmd += f"-g {g} " # ensures at least 1 keyframe every 10 frames
# ffmpeg_cmd += "-keyint_min 10 " set a minimum of 10 frames between 2 key frames
# ffmpeg_cmd += "-sc_threshold 0 " disable scene change detection to lower the number of key frames
cmd += f"-g {g} " # ensures at least 1 keyframe every 10 frames
# cmd += "-keyint_min 10 " set a minimum of 10 frames between 2 key frames
# cmd += "-sc_threshold 0 " disable scene change detection to lower the number of key frames
if crf is not None:
ffmpeg_cmd += f"-crf {crf} "
ffmpeg_cmd += f"-pix_fmt {pix_fmt} "
ffmpeg_cmd += f"{str(video_path)}"
subprocess.run(ffmpeg_cmd.split(" "), check=True)
cmd += f"-crf {crf} "
cmd += f"-pix_fmt {pix_fmt} "
cmd += f"{str(video_path)}"
subprocess.run(cmd.split(" "), check=True)
video_size_bytes = video_path.stat().st_size
@ -127,18 +133,23 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
# test loading 2 frames that are 4 frames apart, which might be a common setting
ts = random.randint(fps, ep_num_images - fps) / fps
if timestamps_mode == "diffusion":
prev_ts = round(ts - 4 / fps, 4)
timestamps = [prev_ts, ts]
elif timestamps_mode == "tdmpc":
timestamps = [round(ts - i / fps, 4) for i in range(6)][::-1]
if timestamps_mode == "1_frame":
timestamps = [ts]
elif timestamps_mode == "2_frames":
timestamps = [ts - 1 / fps, ts]
elif timestamps_mode == "2_frames_4_space":
timestamps = [ts - 4 / fps, ts]
elif timestamps_mode == "6_frames":
timestamps = [ts - i / fps for i in range(6)][::-1]
else:
raise ValueError(timestamps_mode)
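# e.g. with fps=10 and ts=2.0 (illustrative values), "2_frames_4_space" yields timestamps=[1.6, 2.0]
# and "6_frames" yields [1.5, 1.6, 1.7, 1.8, 1.9, 2.0]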
num_frames = len(timestamps)
start_time_s = time.monotonic()
frames = decode_frames_fn(video_path, timestamps=timestamps, device=device, **decoder_kwgs)
frames = decode_frames_fn(
video_path, timestamps=timestamps, tolerance_s=1e-4, device=device, **decoder_kwgs
)
avg_load_time = (time.monotonic() - start_time_s) / num_frames
list_avg_load_time.append(avg_load_time)
@ -177,160 +188,200 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
"video_size_bytes": video_size_bytes,
"avg_load_time_from_images": avg_load_time_from_images,
"avg_load_time": avg_load_time,
"pc_compression": sum_original_frames_size_bytes / video_size_bytes,
"pc_load_time": avg_load_time_from_images / avg_load_time,
"compression_factor": sum_original_frames_size_bytes / video_size_bytes,
"load_time_factor": avg_load_time_from_images / avg_load_time,
"avg_per_pixel_l2_error": avg_per_pixel_l2_error,
}
for key in info:
print(key, info[key])
with open(output_dir / "info.json", "w") as f:
json.dump(info, f)
return info
def display_markdown_table(headers, rows):
for i, row in enumerate(rows):
new_row = []
for col in row:
if col is None:
new_col = "None"
elif isinstance(col, float):
new_col = f"{col:.3f}"
if new_col == "0.000":
new_col = f"{col:.7f}"
elif isinstance(col, int):
new_col = f"{col}"
else:
new_col = col
new_row.append(new_col)
rows[i] = new_row
header_line = "| " + " | ".join(headers) + " |"
separator_line = "| " + " | ".join(["---" for _ in headers]) + " |"
body_lines = ["| " + " | ".join(row) + " |" for row in rows]
markdown_table = "\n".join([header_line, separator_line] + body_lines)
print(markdown_table)
print()
def load_info(out_dir):
with open(out_dir / "info.json") as f:
info = json.load(f)
return info
def main():
dry_run = True
dry_run = False
repo_ids = ["lerobot/pusht", "lerobot/umi_cup_in_the_wild"]
timestamps_modes = [
"1_frame",
"2_frames",
"2_frames_4_space",
"6_frames",
]
for timestamps_mode in timestamps_modes:
bench_dir = Path(f"tmp/2024_05_01_{timestamps_mode}")
bench_dir = Path("tmp/2024_04_29_1049_6_timestamps")
def display_markdown_table(headers, rows):
for i, row in enumerate(rows):
new_row = []
for col in row:
if col is None:
new_col = "None"
elif isinstance(col, float):
new_col = f"{col:.3f}"
if new_col == "0.000":
new_col = f"{col:.7f}"
elif isinstance(col, int):
new_col = f"{col}"
else:
new_col = col
new_row.append(new_col)
rows[i] = new_row
header_line = "| " + " | ".join(headers) + " |"
separator_line = "| " + " | ".join(["---" for _ in headers]) + " |"
body_lines = ["| " + " | ".join(row) + " |" for row in rows]
markdown_table = "\n".join([header_line, separator_line] + body_lines)
print(markdown_table)
print(f"### `{timestamps_mode}`")
print()
def load_info(out_dir):
with open(out_dir / "info.json") as f:
info = json.load(f)
return info
# print("**`decoder`**")
# headers = ["repo_id", "decoder", "load_time_factor", "avg_per_pixel_l2_error"]
# rows = []
# for repo_id in repo_ids:
# for decoder in ["torchvision", "ffmpegio", "torchaudio"]:
# cfg = {
# "repo_id": repo_id,
# # video encoding
# "pix_fmt": "yuv444p",
# # video decoding
# "device": "cpu",
# "decoder": decoder,
# "decoder_kwgs": {},
# }
repo_ids = ["lerobot/pusht", "lerobot/umi_cup_in_the_wild"]
# if not dry_run:
# run_video_benchmark(bench_dir / repo_id / decoder, cfg, timestamps_mode)
# info = load_info(bench_dir / repo_id / decoder)
# rows.append([repo_id, decoder, info["load_time_factor"], info["avg_per_pixel_l2_error"]])
# display_markdown_table(headers, rows)
# torchvision vs ffmpegio vs torchaudio
print("**`pix_fmt`**")
headers = ["repo_id", "pix_fmt", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for pix_fmt in ["yuv420p", "yuv444p"]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": 2,
"crf": None,
"pix_fmt": pix_fmt,
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_{pix_fmt}", cfg, timestamps_mode)
info = load_info(bench_dir / repo_id / f"torchvision_{pix_fmt}")
rows.append(
[
repo_id,
pix_fmt,
info["compression_factor"],
info["load_time_factor"],
info["avg_per_pixel_l2_error"],
]
)
display_markdown_table(headers, rows)
headers = ["repo_id", "decoder", "pc_load_time", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for decoder in ["torchvision", "ffmpegio", "torchaudio"]:
print("**`g`**")
headers = ["repo_id", "g", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for g in [1, 2, 3, 4, 5, 6, 10, 15, 20, 40, 100, None]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": g,
"pix_fmt": "yuv444p",
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_g_{g}", cfg, timestamps_mode)
info = load_info(bench_dir / repo_id / f"torchvision_g_{g}")
rows.append(
[
repo_id,
g,
info["compression_factor"],
info["load_time_factor"],
info["avg_per_pixel_l2_error"],
]
)
display_markdown_table(headers, rows)
print("**`crf`**")
headers = ["repo_id", "crf", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for crf in [0, 5, 10, 15, 20, None, 25, 30, 40, 50]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": 2,
"crf": crf,
"pix_fmt": "yuv444p",
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_crf_{crf}", cfg, timestamps_mode)
info = load_info(bench_dir / repo_id / f"torchvision_crf_{crf}")
rows.append(
[
repo_id,
crf,
info["compression_factor"],
info["load_time_factor"],
info["avg_per_pixel_l2_error"],
]
)
display_markdown_table(headers, rows)
print("**best**")
headers = ["repo_id", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
cfg = {
"repo_id": repo_id,
# video encoding
"g": 10,
"crf": 10,
"g": 2,
"crf": None,
"pix_fmt": "yuv444p",
# video decoding
"device": "cpu",
"decoder": decoder,
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / decoder, cfg=cfg)
info = load_info(bench_dir / repo_id / decoder)
rows.append([repo_id, decoder, info["pc_load_time"], info["avg_per_pixel_l2_error"]])
display_markdown_table(headers, rows)
# yuv444p vs yuv420p
headers = ["repo_id", "pix_fmt", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for pix_fmt in ["yuv420p", "yuv444p"]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": 10,
"crf": 10,
"pix_fmt": pix_fmt,
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_{pix_fmt}", cfg=cfg)
info = load_info(bench_dir / repo_id / f"torchvision_{pix_fmt}")
run_video_benchmark(bench_dir / repo_id / "torchvision_best", cfg, timestamps_mode)
info = load_info(bench_dir / repo_id / "torchvision_best")
rows.append(
[
repo_id,
pix_fmt,
info["pc_compression"],
info["pc_load_time"],
info["compression_factor"],
info["load_time_factor"],
info["avg_per_pixel_l2_error"],
]
)
display_markdown_table(headers, rows)
# g
headers = ["repo_id", "g", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for g in [1, 5, 10, 15, 20, 30, 40, 60, 100, None]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": g,
"crf": 10,
"pix_fmt": "yuv444p",
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_g_{g}", cfg=cfg)
info = load_info(bench_dir / repo_id / f"torchvision_g_{g}")
rows.append(
[repo_id, g, info["pc_compression"], info["pc_load_time"], info["avg_per_pixel_l2_error"]]
)
display_markdown_table(headers, rows)
# crf
headers = ["repo_id", "crf", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
rows = []
for repo_id in repo_ids:
for crf in [0, 5, 10, 15, 20, None, 25, 30, 40, 50]:
cfg = {
"repo_id": repo_id,
# video encoding
"g": None,
"crf": crf,
"pix_fmt": "yuv444p",
# video decoding
"device": "cpu",
"decoder": "torchvision",
"decoder_kwgs": {},
}
if not dry_run:
run_video_benchmark(bench_dir / repo_id / f"torchvision_crf_{crf}", cfg=cfg)
info = load_info(bench_dir / repo_id / f"torchvision_crf_{crf}")
rows.append(
[repo_id, crf, info["pc_compression"], info["pc_load_time"], info["avg_per_pixel_l2_error"]]
)
display_markdown_table(headers, rows)
display_markdown_table(headers, rows)
if __name__ == "__main__":


@ -70,7 +70,7 @@ def compute_stats(dataset: LeRobotDataset | datasets.Dataset, batch_size=32, max
generator.manual_seed(seed)
dataloader = torch.utils.data.DataLoader(
dataset,
num_workers=4,
num_workers=16,
batch_size=batch_size,
shuffle=True,
drop_last=False,


@ -216,9 +216,14 @@ def load_previous_and_future_frames(
# load frames modality
item[key] = hf_dataset.select_columns(key)[data_ids][key]
item[key] = torch.stack(item[key])
if isinstance(item[key][0], dict) and "path" in item[key][0]:
# video mode where frames are expressed as a dict of path and timestamp
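# e.g. item[key][0] = {"path": "videos/observation.image_episode_000000.mp4", "timestamp": 0.0}
# (illustrative path and timestamp values)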
item[key] = item[key]
else:
item[key] = torch.stack(item[key])
item[f"{key}_is_pad"] = is_pad
item[f"{key}_timestamp"] = query_ts
return item


@ -1,5 +1,6 @@
import logging
import subprocess
import warnings
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, ClassVar
@ -26,7 +27,7 @@ def load_from_videos(
# load multiple frames at once (expected when delta_timestamps is not None)
timestamps = [frame["timestamp"] for frame in item[key]]
paths = [frame["path"] for frame in item[key]]
if len(set(paths)) == 1:
if len(set(paths)) > 1:
raise NotImplementedError("All video paths are expected to be the same for now.")
video_path = data_dir / paths[0]
@ -61,9 +62,11 @@ def decode_video_frames_torchvision(
video_path = str(video_path)
# set backend
keyframes_only = False
if device == "cpu":
# explicitly use pyav
torchvision.set_video_backend("pyav")
keyframes_only = True # pyav doesn't support accurate seek
elif device == "cuda":
# TODO(rcadene, aliberts): implement video decoding with GPU
# torchvision.set_video_backend("cuda")
@ -86,7 +89,7 @@ def decode_video_frames_torchvision(
# access closest key frame of the first requested frame
# Note: closest key frame timestamp is usually smaller than `first_ts` (e.g. key frame can be the first frame of the video)
# for details on what `seek` is doing see: https://pyav.basswood-io.com/docs/stable/api/container.html?highlight=inputcontainer#av.container.InputContainer.seek
reader.seek(first_ts)
reader.seek(first_ts, keyframes_only=keyframes_only)
# load all frames until last requested frame
loaded_frames = []
@ -130,7 +133,7 @@ def decode_video_frames_torchvision(
def encode_video_frames(imgs_dir: Path, video_path: Path, fps: int):
# For more info this setting, see: `lerobot/common/datasets/_video_benchmark/README.md`
"""More info on ffmpeg arguments tuning on `lerobot/common/datasets/_video_benchmark/README.md`"""
video_path = Path(video_path)
video_path.parent.mkdir(parents=True, exist_ok=True)
@ -140,6 +143,7 @@ def encode_video_frames(imgs_dir: Path, video_path: Path, fps: int):
"-loglevel error "
f"-i {str(imgs_dir / 'frame_%06d.png')} "
"-vcodec libx264 "
"-g 2 "
"-pix_fmt yuv444p "
f"{str(video_path)}"
)
@ -168,5 +172,11 @@ class VideoFrame:
return self.pa_type
# to make it available in HuggingFace `datasets`
register_feature(VideoFrame, "VideoFrame")
with warnings.catch_warnings():
warnings.filterwarnings(
"ignore",
"'register_feature' is experimental and might be subject to breaking changes in the future.",
category=UserWarning,
)
# to make VideoFrame available in HuggingFace `datasets`
register_feature(VideoFrame, "VideoFrame")


@ -1,3 +1,6 @@
# TODO(rcadene, alexander-soare): clean this file
"""Borrowed from https://github.com/fyhMer/fowm/blob/main/src/logger.py"""
import logging
import os
from pathlib import Path


@ -350,7 +350,7 @@ def train(cfg: dict, out_dir=None, job_name=None):
# create dataloader for offline training
dataloader = torch.utils.data.DataLoader(
offline_dataset,
num_workers=4,
num_workers=8,
batch_size=cfg.policy.batch_size,
shuffle=True,
pin_memory=cfg.device != "cpu",