Use -g 2, Fix delta_timestamps, Redo benchmark
parent a00102b643
commit 370fb5348e
|
@ -19,10 +19,10 @@ How to decode videos?
|
||||||
## Metrics
|
## Metrics
|
||||||
|
|
||||||
**Percentage of data compression (higher is better)**
|
**Percentage of data compression (higher is better)**
|
||||||
`pc_compression` is the ratio of the memory space on disk taken by the original images to encode, to the memory space taken by the encoded video. For instance, `pc_compression=400%` means that the video takes 4 times less memory space on disk compared to the original images.
|
`compression_factor` is the ratio of the memory space on disk taken by the original images to encode, to the memory space taken by the encoded video. For instance, `compression_factor=4` means that the video takes 4 times less memory space on disk compared to the original images.
|
||||||
|
|
||||||
**Percentage of loading time (lower is better)**
|
**Percentage of loading time (higher is better)**
|
||||||
`pc_load_time` is the ratio of the time it takes to load original images at given timestamps, to the time it takes to decode the exact same frames from the video. Lower is better. For instance, `pc_load_time=120%` means that decoding from video is a bit slower than loading the original images.
|
`load_time_factor` is the ratio of the time it takes to load original images at given timestamps, to the time it takes to decode the exact same frames from the video. Higher is better. For instance, `load_time_factor=0.5` means that decoding from video is 2 times slower than loading the original images.
|
||||||
|
|
||||||
**Average L2 error per pixel (lower is better)**
|
**Average L2 error per pixel (lower is better)**
|
||||||
`avg_per_pixel_l2_error` is the average L2 error between each decoded frame and its corresponding original image over all requested timestamps, and also divided by the number of pixels in the image to be comparable when switching to different image sizes.
|
`avg_per_pixel_l2_error` is the average L2 error between each decoded frame and its corresponding original image over all requested timestamps, and also divided by the number of pixels in the image to be comparable when switching to different image sizes.
|
||||||
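The three metrics defined above can be sketched in a few lines. This is a minimal illustration and not the benchmark's actual implementation: the function names, the `images_size_bytes` and `avg_load_time_from_video` parameters, and the flat-list frame representation are assumptions made for the example.

```python
import math


def compression_factor(images_size_bytes, video_size_bytes):
    # Size of the original images divided by the size of the encoded video.
    # A value of 4 means the video is 4 times smaller on disk.
    return images_size_bytes / video_size_bytes


def load_time_factor(avg_load_time_from_images, avg_load_time_from_video):
    # Image loading time divided by video decoding time.
    # A value of 0.5 means decoding from video is 2 times slower.
    return avg_load_time_from_images / avg_load_time_from_video


def avg_per_pixel_l2_error(decoded_frames, original_frames, num_pixels):
    # Each frame is assumed to be a flat sequence of pixel values.
    # The L2 error of each frame is divided by the number of pixels,
    # then averaged over all requested frames.
    errors = []
    for decoded, original in zip(decoded_frames, original_frames):
        l2 = math.sqrt(sum((d - o) ** 2 for d, o in zip(decoded, original)))
        errors.append(l2 / num_pixels)
    return sum(errors) / len(errors)
```

With these definitions, higher is better for both factors and lower is better for the error.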
|
@ -40,7 +40,12 @@ How to decode videos?
|
||||||
We don't expect the same optimal settings for a dataset of images from a simulation, or from the real world in an apartment, or in a factory, or outdoors, etc. Hence, we run this benchmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).
|
We don't expect the same optimal settings for a dataset of images from a simulation, or from the real world in an apartment, or in a factory, or outdoors, etc. Hence, we run this benchmark on two datasets: `pusht` (simulation) and `umi` (real-world outdoor).
|
||||||
|
|
||||||
**Requested timestamps**
|
**Requested timestamps**
|
||||||
In this benchmark, we focus on the loading time of random access, so we are not interested in sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `pc_load_time`. In fact, decoding a large number of consecutive frames from a video is expected to be faster than loading the same data from individual images. To reflect our robotics use case, we consider a setting where we load 2 consecutive frames with 4 frames of spacing.
|
In this benchmark, we focus on the loading time of random access, so we are not interested in sequentially loading all frames of a video like in a movie. However, the number of consecutive timestamps requested and their spacing can greatly affect the `load_time_factor`. In fact, decoding a large number of consecutive frames from a video is expected to be faster than loading the same data from individual images. To reflect our robotics use case, we consider a few settings:
|
||||||
|
- `1_frame`: 1 frame,
|
||||||
|
- `2_frames`: 2 consecutive frames (e.g. `[t, t + 1 / fps]`),
|
||||||
|
- `2_frames_4_space`: 2 consecutive frames with 4 frames of spacing (e.g. `[t, t + 4 / fps]`),
|
||||||
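The settings listed above differ only in how many frames are requested and how far apart they are. As a rough sketch, assuming a hypothetical helper `request_timestamps` (not part of the benchmark script) and the `[t, t + k / fps]` convention used in the examples above:

```python
def request_timestamps(t, fps, num_frames, spacing=1):
    # Returns `num_frames` timestamps in seconds, starting at `t`,
    # with `spacing` frames between two consecutive requested frames.
    return [t + i * spacing / fps for i in range(num_frames)]


# 1 frame
print(request_timestamps(1.0, fps=4, num_frames=1))
# 2 consecutive frames: [t, t + 1 / fps]
print(request_timestamps(1.0, fps=4, num_frames=2))
# 2 frames with 4 frames of spacing: [t, t + 4 / fps]
print(request_timestamps(1.0, fps=4, num_frames=2, spacing=4))
```

The larger the spacing, the more likely the two requested frames fall in different groups of pictures, which is one reason the spacing can affect the decoding time.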
|
|
||||||
|
|
||||||
|
|
||||||
**Data augmentations**
|
**Data augmentations**
|
||||||
We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robust (e.g. robust to color changes, compression, etc.).
|
We might revisit this benchmark and find better settings if we train our policies with various data augmentations to make them more robust (e.g. robust to color changes, compression, etc.).
|
||||||
|
@ -48,10 +53,8 @@ We might revisit this benchmark and find better settings if we train our policie
|
||||||
|
|
||||||
## Results
|
## Results
|
||||||
|
|
||||||
### Loading 2 consecutive frames with 4 frames spacing (Diffusion Policy setting)
|
|
||||||
|
|
||||||
**`decoder`**
|
**`decoder`**
|
||||||
| repo_id | decoder | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | decoder | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- |
|
| --- | --- | --- | --- |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">torchvision</span> | 0.166 | 0.0000119 |
|
| lerobot/pusht | <span style="color: #32CD32;">torchvision</span> | 0.166 | 0.0000119 |
|
||||||
| lerobot/pusht | ffmpegio | 0.009 | 0.0001182 |
|
| lerobot/pusht | ffmpegio | 0.009 | 0.0001182 |
|
||||||
|
@ -60,127 +63,274 @@ We might revisit this benchmark and find better settings if we train our policie
|
||||||
| lerobot/umi_cup_in_the_wild | ffmpegio | 0.010 | 0.0000735 |
|
| lerobot/umi_cup_in_the_wild | ffmpegio | 0.010 | 0.0000735 |
|
||||||
| lerobot/umi_cup_in_the_wild | torchaudio | 0.154 | 0.0000340 |
|
| lerobot/umi_cup_in_the_wild | torchaudio | 0.154 | 0.0000340 |
|
||||||
|
|
||||||
|
### `1_frame`
|
||||||
|
|
||||||
**`pix_fmt`**
|
**`pix_fmt`**
|
||||||
| repo_id | pix_fmt | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | yuv420p | 3.602 | 0.202 | 0.0000661 |
|
| lerobot/pusht | yuv420p | 3.788 | 0.224 | 0.0000760 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">yuv444p</span> | 3.213 | 0.153 | 0.0000110 |
|
| lerobot/pusht | yuv444p | 3.646 | 0.185 | 0.0000443 |
|
||||||
| lerobot/umi_cup_in_the_wild | yuv420p | 8.879 | 0.202 | 0.0000332 |
|
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.388 | 0.0000469 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">yuv444p</span> | 8.517 | 0.165 | 0.0000175 |
|
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.329 | 0.0000397 |
|
||||||
|
|
||||||
**`g`**
|
**`g`**
|
||||||
| repo_id | g | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | 1 | 1.308 | 0.190 | 0.0000151 |
|
| lerobot/pusht | 1 | 2.543 | 0.204 | 0.0000556 |
|
||||||
| lerobot/pusht | 5 | 2.739 | 0.184 | 0.0000123 |
|
| lerobot/pusht | 2 | 3.646 | 0.182 | 0.0000443 |
|
||||||
| lerobot/pusht | 10 | 3.213 | 0.144 | 0.0000116 |
|
| lerobot/pusht | 3 | 4.431 | 0.174 | 0.0000450 |
|
||||||
| lerobot/pusht | 15 | 3.460 | 0.137 | 0.0000112 |
|
| lerobot/pusht | 4 | 5.103 | 0.163 | 0.0000448 |
|
||||||
| lerobot/pusht | 20 | 3.559 | 0.118 | 0.0000109 |
|
| lerobot/pusht | 5 | 5.625 | 0.163 | 0.0000436 |
|
||||||
| lerobot/pusht | 30 | 3.697 | 0.104 | 0.0000117 |
|
| lerobot/pusht | 6 | 5.974 | 0.155 | 0.0000427 |
|
||||||
| lerobot/pusht | 40 | 3.763 | 0.092 | 0.0000116 |
|
| lerobot/pusht | 10 | 6.814 | 0.130 | 0.0000410 |
|
||||||
| lerobot/pusht | 60 | 3.925 | 0.068 | 0.0000117 |
|
| lerobot/pusht | 15 | 7.431 | 0.105 | 0.0000406 |
|
||||||
| lerobot/pusht | 100 | 4.010 | 0.054 | 0.0000117 |
|
| lerobot/pusht | 20 | 7.662 | 0.097 | 0.0000400 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 4.058 | 0.043 | 0.0000117 |
|
| lerobot/pusht | 40 | 8.163 | 0.061 | 0.0000405 |
|
||||||
| lerobot/umi_cup_in_the_wild | 1 | 4.790 | 0.236 | 0.0000221 |
|
| lerobot/pusht | 100 | 8.761 | 0.039 | 0.0000422 |
|
||||||
| lerobot/umi_cup_in_the_wild | 5 | 7.707 | 0.201 | 0.0000185 |
|
| lerobot/pusht | None | 8.909 | 0.024 | 0.0000431 |
|
||||||
| lerobot/umi_cup_in_the_wild | 10 | 8.517 | 0.172 | 0.0000177 |
|
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.444 | 0.0000601 |
|
||||||
| lerobot/umi_cup_in_the_wild | 15 | 8.830 | 0.152 | 0.0000170 |
|
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.345 | 0.0000397 |
|
||||||
| lerobot/umi_cup_in_the_wild | 20 | 8.961 | 0.133 | 0.0000167 |
|
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.282 | 0.0000416 |
|
||||||
| lerobot/umi_cup_in_the_wild | 30 | 8.850 | 0.113 | 0.0000167 |
|
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.271 | 0.0000415 |
|
||||||
| lerobot/umi_cup_in_the_wild | 40 | 8.996 | 0.109 | 0.0000174 |
|
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.260 | 0.0000415 |
|
||||||
| lerobot/umi_cup_in_the_wild | 60 | 9.113 | 0.081 | 0.0000163 |
|
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.249 | 0.0000415 |
|
||||||
| lerobot/umi_cup_in_the_wild | 100 | 9.278 | 0.051 | 0.0000173 |
|
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.195 | 0.0000399 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 9.396 | 0.030 | 0.0000165 |
|
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.169 | 0.0000394 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.140 | 0.0000390 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.096 | 0.0000384 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.046 | 0.0000390 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.022 | 0.0000400 |
|
||||||
|
|
||||||
**`crf`**
|
**`crf`**
|
||||||
| repo_id | crf | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | 0 | 4.529 | 0.041 | 0.0000035 |
|
| lerobot/pusht | 0 | 1.699 | 0.175 | 0.0000035 |
|
||||||
| lerobot/pusht | 5 | 3.138 | 0.040 | 0.0000077 |
|
| lerobot/pusht | 5 | 1.409 | 0.181 | 0.0000080 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">10</span> | 4.058 | 0.038 | 0.0000121 |
|
| lerobot/pusht | 10 | 1.842 | 0.172 | 0.0000123 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">15</span> | 5.407 | 0.039 | 0.0000195 |
|
| lerobot/pusht | 15 | 2.322 | 0.187 | 0.0000211 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">20</span> | 7.335 | 0.039 | 0.0000319 |
|
| lerobot/pusht | 20 | 3.050 | 0.181 | 0.0000346 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 8.909 | 0.046 | 0.0000425 |
|
| lerobot/pusht | None | 3.646 | 0.189 | 0.0000443 |
|
||||||
| lerobot/pusht | 25 | 10.213 | 0.039 | 0.0000519 |
|
| lerobot/pusht | 25 | 3.969 | 0.186 | 0.0000521 |
|
||||||
| lerobot/pusht | 30 | 14.516 | 0.041 | 0.0000795 |
|
| lerobot/pusht | 30 | 5.687 | 0.184 | 0.0000850 |
|
||||||
| lerobot/pusht | 40 | 23.546 | 0.041 | 0.0001557 |
|
| lerobot/pusht | 40 | 10.818 | 0.193 | 0.0001726 |
|
||||||
| lerobot/pusht | 50 | 28.460 | 0.042 | 0.0002723 |
|
| lerobot/pusht | 50 | 18.185 | 0.183 | 0.0002606 |
|
||||||
| lerobot/umi_cup_in_the_wild | 0 | 2.318 | 0.012 | 0.0000056 |
|
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.165 | 0.0000056 |
|
||||||
| lerobot/umi_cup_in_the_wild | 5 | 4.899 | 0.019 | 0.0000132 |
|
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.171 | 0.0000111 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">10</span> | 9.396 | 0.026 | 0.0000183 |
|
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.212 | 0.0000153 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">15</span> | 19.161 | 0.034 | 0.0000241 |
|
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.261 | 0.0000218 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">20</span> | 39.311 | 0.039 | 0.0000329 |
|
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.312 | 0.0000317 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 60.530 | 0.043 | 0.0000401 |
|
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.339 | 0.0000397 |
|
||||||
| lerobot/umi_cup_in_the_wild | 25 | 81.048 | 0.046 | 0.0000454 |
|
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.297 | 0.0000452 |
|
||||||
| lerobot/umi_cup_in_the_wild | 30 | 165.189 | 0.051 | 0.0000609 |
|
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.406 | 0.0000629 |
|
||||||
| lerobot/umi_cup_in_the_wild | 40 | 544.478 | 0.056 | 0.0001095 |
|
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.468 | 0.0001184 |
|
||||||
| lerobot/umi_cup_in_the_wild | 50 | 1109.556 | 0.072 | 0.0001815 |
|
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.515 | 0.0001879 |
|
||||||
|
|
||||||
|
**best**
|
||||||
### Loading 6 consecutive frames with no spacing (TDMPC setting)
|
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
|
||||||
**`decoder`**
|
|
||||||
| repo_id | decoder | pc_load_time | avg_per_pixel_l2_error |
|
|
||||||
| --- | --- | --- | --- |
|
| --- | --- | --- | --- |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">torchvision</span> | 0.386 | 0.0000117 |
|
| lerobot/pusht | 3.646 | 0.188 | 0.0000443 |
|
||||||
| lerobot/pusht | ffmpegio | 0.008 | 0.0000117 |
|
| lerobot/umi_cup_in_the_wild | 14.932 | 0.339 | 0.0000397 |
|
||||||
| lerobot/pusht | torchaudio | 0.184 | 0.0000356 |
|
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">torchvision</span> | 0.448 | 0.0000178 |
|
### `2_frames`
|
||||||
| lerobot/umi_cup_in_the_wild | ffmpegio | 0.009 | 0.0000178 |
|
|
||||||
| lerobot/umi_cup_in_the_wild | torchaudio | 0.149 | 0.0000349 |
|
|
||||||
|
|
||||||
**`pix_fmt`**
|
**`pix_fmt`**
|
||||||
| repo_id | pix_fmt | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | yuv420p | 3.602 | 0.518 | 0.0000651 |
|
| lerobot/pusht | yuv420p | 3.788 | 0.314 | 0.0000799 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">yuv444p</span> | 3.213 | 0.401 | 0.0000117 |
|
| lerobot/pusht | yuv444p | 3.646 | 0.303 | 0.0000496 |
|
||||||
| lerobot/umi_cup_in_the_wild | yuv420p | 8.879 | 0.578 | 0.0000334 |
|
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.642 | 0.0000503 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">yuv444p</span> | 8.517 | 0.479 | 0.0000178 |
|
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.529 | 0.0000436 |
|
||||||
|
|
||||||
**`g`**
|
**`g`**
|
||||||
| repo_id | g | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | 1 | 1.308 | 0.528 | 0.0000152 |
|
| lerobot/pusht | 1 | 2.543 | 0.308 | 0.0000599 |
|
||||||
| lerobot/pusht | 5 | 2.739 | 0.483 | 0.0000124 |
|
| lerobot/pusht | 2 | 3.646 | 0.279 | 0.0000496 |
|
||||||
| lerobot/pusht | 10 | 3.213 | 0.396 | 0.0000117 |
|
| lerobot/pusht | 3 | 4.431 | 0.259 | 0.0000498 |
|
||||||
| lerobot/pusht | 15 | 3.460 | 0.379 | 0.0000118 |
|
| lerobot/pusht | 4 | 5.103 | 0.243 | 0.0000501 |
|
||||||
| lerobot/pusht | 20 | 3.559 | 0.319 | 0.0000114 |
|
| lerobot/pusht | 5 | 5.625 | 0.235 | 0.0000492 |
|
||||||
| lerobot/pusht | 30 | 3.697 | 0.278 | 0.0000116 |
|
| lerobot/pusht | 6 | 5.974 | 0.230 | 0.0000481 |
|
||||||
| lerobot/pusht | 40 | 3.763 | 0.243 | 0.0000115 |
|
| lerobot/pusht | 10 | 6.814 | 0.194 | 0.0000468 |
|
||||||
| lerobot/pusht | 60 | 3.925 | 0.186 | 0.0000118 |
|
| lerobot/pusht | 15 | 7.431 | 0.152 | 0.0000460 |
|
||||||
| lerobot/pusht | 100 | 4.010 | 0.156 | 0.0000119 |
|
| lerobot/pusht | 20 | 7.662 | 0.151 | 0.0000455 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 4.058 | 0.105 | 0.0000121 |
|
| lerobot/pusht | 40 | 8.163 | 0.095 | 0.0000454 |
|
||||||
| lerobot/umi_cup_in_the_wild | 1 | 4.790 | 0.605 | 0.0000221 |
|
| lerobot/pusht | 100 | 8.761 | 0.062 | 0.0000472 |
|
||||||
| lerobot/umi_cup_in_the_wild | 5 | 7.707 | 0.533 | 0.0000183 |
|
| lerobot/pusht | None | 8.909 | 0.037 | 0.0000479 |
|
||||||
| lerobot/umi_cup_in_the_wild | 10 | 8.517 | 0.469 | 0.0000178 |
|
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.638 | 0.0000625 |
|
||||||
| lerobot/umi_cup_in_the_wild | 15 | 8.830 | 0.399 | 0.0000174 |
|
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.537 | 0.0000436 |
|
||||||
| lerobot/umi_cup_in_the_wild | 20 | 8.961 | 0.382 | 0.0000175 |
|
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.493 | 0.0000437 |
|
||||||
| lerobot/umi_cup_in_the_wild | 30 | 8.850 | 0.326 | 0.0000172 |
|
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.458 | 0.0000446 |
|
||||||
| lerobot/umi_cup_in_the_wild | 40 | 8.996 | 0.279 | 0.0000173 |
|
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.438 | 0.0000445 |
|
||||||
| lerobot/umi_cup_in_the_wild | 60 | 9.113 | 0.226 | 0.0000174 |
|
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.424 | 0.0000444 |
|
||||||
| lerobot/umi_cup_in_the_wild | 100 | 9.278 | 0.150 | 0.0000175 |
|
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.345 | 0.0000435 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 9.396 | 0.076 | 0.0000176 |
|
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.313 | 0.0000417 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.264 | 0.0000421 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.185 | 0.0000414 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.090 | 0.0000420 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.042 | 0.0000424 |
|
||||||
|
|
||||||
**`crf`**
|
**`crf`**
|
||||||
| repo_id | crf | pc_compression | pc_load_time | avg_per_pixel_l2_error |
|
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
| --- | --- | --- | --- | --- |
|
| --- | --- | --- | --- | --- |
|
||||||
| lerobot/pusht | 0 | 4.529 | 0.108 | 0.0000035 |
|
| lerobot/pusht | 0 | 1.699 | 0.302 | 0.0000097 |
|
||||||
| lerobot/pusht | 5 | 3.138 | 0.099 | 0.0000077 |
|
| lerobot/pusht | 5 | 1.409 | 0.287 | 0.0000142 |
|
||||||
| lerobot/pusht | 10 | 4.058 | 0.091 | 0.0000121 |
|
| lerobot/pusht | 10 | 1.842 | 0.283 | 0.0000184 |
|
||||||
| lerobot/pusht | 15 | 5.407 | 0.095 | 0.0000195 |
|
| lerobot/pusht | 15 | 2.322 | 0.305 | 0.0000268 |
|
||||||
| lerobot/pusht | 20 | 7.335 | 0.100 | 0.0000318 |
|
| lerobot/pusht | 20 | 3.050 | 0.285 | 0.0000402 |
|
||||||
| lerobot/pusht | <span style="color: #32CD32;">None</span> | 8.909 | 0.102 | 0.0000422 |
|
| lerobot/pusht | None | 3.646 | 0.285 | 0.0000496 |
|
||||||
| lerobot/pusht | 25 | 10.213 | 0.102 | 0.0000517 |
|
| lerobot/pusht | 25 | 3.969 | 0.293 | 0.0000572 |
|
||||||
| lerobot/pusht | 30 | 14.516 | 0.104 | 0.0000795 |
|
| lerobot/pusht | 30 | 5.687 | 0.293 | 0.0000893 |
|
||||||
| lerobot/pusht | 40 | 23.546 | 0.106 | 0.0001555 |
|
| lerobot/pusht | 40 | 10.818 | 0.319 | 0.0001762 |
|
||||||
| lerobot/pusht | 50 | 28.460 | 0.110 | 0.0002723 |
|
| lerobot/pusht | 50 | 18.185 | 0.304 | 0.0002626 |
|
||||||
| lerobot/umi_cup_in_the_wild | 0 | 2.318 | 0.032 | 0.0000056 |
|
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.235 | 0.0000112 |
|
||||||
| lerobot/umi_cup_in_the_wild | 5 | 4.899 | 0.052 | 0.0000127 |
|
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.261 | 0.0000166 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">10</span> | 9.396 | 0.073 | 0.0000176 |
|
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.333 | 0.0000207 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">15</span> | 19.161 | 0.097 | 0.0000234 |
|
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.406 | 0.0000267 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">20</span> | 39.311 | 0.110 | 0.0000321 |
|
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.489 | 0.0000361 |
|
||||||
| lerobot/umi_cup_in_the_wild | <span style="color: #32CD32;">None</span> | 60.530 | 0.117 | 0.0000393 |
|
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.537 | 0.0000436 |
|
||||||
| lerobot/umi_cup_in_the_wild | 25 | 81.048 | 0.126 | 0.0000446 |
|
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.578 | 0.0000487 |
|
||||||
| lerobot/umi_cup_in_the_wild | 30 | 165.189 | 0.138 | 0.0000603 |
|
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.453 | 0.0000655 |
|
||||||
| lerobot/umi_cup_in_the_wild | 40 | 544.478 | 0.151 | 0.0001095 |
|
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.767 | 0.0001192 |
|
||||||
| lerobot/umi_cup_in_the_wild | 50 | 1109.556 | 0.167 | 0.0001817 |
|
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.816 | 0.0001881 |
|
||||||
|
|
||||||
|
**best**
|
||||||
|
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 3.646 | 0.283 | 0.0000496 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 14.932 | 0.543 | 0.0000436 |
|
||||||
|
|
||||||
|
### `2_frames_4_space`
|
||||||
|
|
||||||
|
**`pix_fmt`**
|
||||||
|
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | yuv420p | 3.788 | 0.257 | 0.0000855 |
|
||||||
|
| lerobot/pusht | yuv444p | 3.646 | 0.261 | 0.0000556 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 0.493 | 0.0000476 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.371 | 0.0000404 |
|
||||||
|
|
||||||
|
**`g`**
|
||||||
|
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 1 | 2.543 | 0.226 | 0.0000670 |
|
||||||
|
| lerobot/pusht | 2 | 3.646 | 0.222 | 0.0000556 |
|
||||||
|
| lerobot/pusht | 3 | 4.431 | 0.217 | 0.0000567 |
|
||||||
|
| lerobot/pusht | 4 | 5.103 | 0.204 | 0.0000555 |
|
||||||
|
| lerobot/pusht | 5 | 5.625 | 0.179 | 0.0000556 |
|
||||||
|
| lerobot/pusht | 6 | 5.974 | 0.188 | 0.0000544 |
|
||||||
|
| lerobot/pusht | 10 | 6.814 | 0.160 | 0.0000531 |
|
||||||
|
| lerobot/pusht | 15 | 7.431 | 0.150 | 0.0000521 |
|
||||||
|
| lerobot/pusht | 20 | 7.662 | 0.123 | 0.0000519 |
|
||||||
|
| lerobot/pusht | 40 | 8.163 | 0.092 | 0.0000519 |
|
||||||
|
| lerobot/pusht | 100 | 8.761 | 0.053 | 0.0000533 |
|
||||||
|
| lerobot/pusht | None | 8.909 | 0.034 | 0.0000541 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.409 | 0.0000607 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.381 | 0.0000404 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.355 | 0.0000418 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.346 | 0.0000425 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.354 | 0.0000419 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.336 | 0.0000419 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.314 | 0.0000402 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.269 | 0.0000397 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.246 | 0.0000395 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.171 | 0.0000390 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.091 | 0.0000399 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.043 | 0.0000409 |
|
||||||
|
|
||||||
|
**`crf`**
|
||||||
|
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 0 | 1.699 | 0.212 | 0.0000193 |
|
||||||
|
| lerobot/pusht | 5 | 1.409 | 0.211 | 0.0000232 |
|
||||||
|
| lerobot/pusht | 10 | 1.842 | 0.199 | 0.0000270 |
|
||||||
|
| lerobot/pusht | 15 | 2.322 | 0.198 | 0.0000347 |
|
||||||
|
| lerobot/pusht | 20 | 3.050 | 0.211 | 0.0000469 |
|
||||||
|
| lerobot/pusht | None | 3.646 | 0.206 | 0.0000556 |
|
||||||
|
| lerobot/pusht | 25 | 3.969 | 0.210 | 0.0000626 |
|
||||||
|
| lerobot/pusht | 30 | 5.687 | 0.223 | 0.0000927 |
|
||||||
|
| lerobot/pusht | 40 | 10.818 | 0.227 | 0.0001763 |
|
||||||
|
| lerobot/pusht | 50 | 18.185 | 0.223 | 0.0002625 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.147 | 0.0000071 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.182 | 0.0000125 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.222 | 0.0000166 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.270 | 0.0000229 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.325 | 0.0000326 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.362 | 0.0000404 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.390 | 0.0000459 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 0.437 | 0.0000633 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 0.499 | 0.0001186 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 0.564 | 0.0001879 |
|
||||||
|
|
||||||
|
**best**
|
||||||
|
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 3.646 | 0.224 | 0.0000556 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 14.932 | 0.368 | 0.0000404 |
|
||||||
|
|
||||||
|
### `6_frames`
|
||||||
|
|
||||||
|
**`pix_fmt`**
|
||||||
|
| repo_id | pix_fmt | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | yuv420p | 3.788 | 0.660 | 0.0000839 |
|
||||||
|
| lerobot/pusht | yuv444p | 3.646 | 0.546 | 0.0000542 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | yuv420p | 14.391 | 1.225 | 0.0000497 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | yuv444p | 14.932 | 0.908 | 0.0000428 |
|
||||||
|
|
||||||
|
**`g`**
|
||||||
|
| repo_id | g | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 1 | 2.543 | 0.552 | 0.0000646 |
|
||||||
|
| lerobot/pusht | 2 | 3.646 | 0.534 | 0.0000542 |
|
||||||
|
| lerobot/pusht | 3 | 4.431 | 0.563 | 0.0000546 |
|
||||||
|
| lerobot/pusht | 4 | 5.103 | 0.537 | 0.0000545 |
|
||||||
|
| lerobot/pusht | 5 | 5.625 | 0.477 | 0.0000532 |
|
||||||
|
| lerobot/pusht | 6 | 5.974 | 0.515 | 0.0000530 |
|
||||||
|
| lerobot/pusht | 10 | 6.814 | 0.410 | 0.0000512 |
|
||||||
|
| lerobot/pusht | 15 | 7.431 | 0.405 | 0.0000503 |
|
||||||
|
| lerobot/pusht | 20 | 7.662 | 0.345 | 0.0000500 |
|
||||||
|
| lerobot/pusht | 40 | 8.163 | 0.247 | 0.0000496 |
|
||||||
|
| lerobot/pusht | 100 | 8.761 | 0.147 | 0.0000510 |
|
||||||
|
| lerobot/pusht | None | 8.909 | 0.100 | 0.0000519 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 1 | 14.411 | 0.997 | 0.0000620 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 2 | 14.932 | 0.911 | 0.0000428 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 3 | 20.174 | 0.869 | 0.0000433 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 4 | 24.889 | 0.874 | 0.0000438 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 5 | 28.825 | 0.864 | 0.0000439 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 6 | 31.635 | 0.834 | 0.0000440 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 10 | 39.418 | 0.781 | 0.0000421 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 15 | 44.577 | 0.679 | 0.0000411 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 47.907 | 0.652 | 0.0000410 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 52.554 | 0.465 | 0.0000404 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 100 | 58.241 | 0.245 | 0.0000413 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 60.530 | 0.116 | 0.0000417 |
|
||||||
|
|
||||||
|
**`crf`**
|
||||||
|
| repo_id | crf | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 0 | 1.699 | 0.534 | 0.0000163 |
|
||||||
|
| lerobot/pusht | 5 | 1.409 | 0.524 | 0.0000205 |
|
||||||
|
| lerobot/pusht | 10 | 1.842 | 0.510 | 0.0000245 |
|
||||||
|
| lerobot/pusht | 15 | 2.322 | 0.512 | 0.0000324 |
|
||||||
|
| lerobot/pusht | 20 | 3.050 | 0.508 | 0.0000452 |
|
||||||
|
| lerobot/pusht | None | 3.646 | 0.518 | 0.0000542 |
|
||||||
|
| lerobot/pusht | 25 | 3.969 | 0.534 | 0.0000616 |
|
||||||
|
| lerobot/pusht | 30 | 5.687 | 0.530 | 0.0000927 |
|
||||||
|
| lerobot/pusht | 40 | 10.818 | 0.552 | 0.0001777 |
|
||||||
|
| lerobot/pusht | 50 | 18.185 | 0.564 | 0.0002644 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 0 | 1.918 | 0.401 | 0.0000101 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 5 | 3.207 | 0.499 | 0.0000156 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 10 | 4.818 | 0.599 | 0.0000197 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 15 | 7.329 | 0.704 | 0.0000258 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 20 | 11.361 | 0.834 | 0.0000352 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | None | 14.932 | 0.925 | 0.0000428 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 25 | 17.741 | 0.978 | 0.0000480 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 30 | 27.983 | 1.088 | 0.0000648 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 40 | 82.449 | 1.324 | 0.0001190 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 50 | 186.145 | 1.436 | 0.0001880 |
|
||||||
|
|
||||||
|
**best**
|
||||||
|
| repo_id | compression_factor | load_time_factor | avg_per_pixel_l2_error |
|
||||||
|
| --- | --- | --- | --- |
|
||||||
|
| lerobot/pusht | 3.646 | 0.546 | 0.0000542 |
|
||||||
|
| lerobot/umi_cup_in_the_wild | 14.932 | 0.934 | 0.0000428 |
|
||||||
|
|
|
@ -31,7 +31,12 @@ def get_directory_size(directory):
|
||||||
return total_size
|
return total_size
|
||||||
|
|
||||||
|
|
||||||
def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion"):
|
def run_video_benchmark(
|
||||||
|
output_dir,
|
||||||
|
cfg,
|
||||||
|
timestamps_mode,
|
||||||
|
seed=1337,
|
||||||
|
):
|
||||||
output_dir = Path(output_dir)
|
output_dir = Path(output_dir)
|
||||||
if output_dir.exists():
|
if output_dir.exists():
|
||||||
shutil.rmtree(output_dir)
|
shutil.rmtree(output_dir)
|
||||||
|
@ -73,19 +78,20 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
|
||||||
crf = cfg.get("crf")
|
crf = cfg.get("crf")
|
||||||
pix_fmt = cfg["pix_fmt"]
|
pix_fmt = cfg["pix_fmt"]
|
||||||
|
|
||||||
ffmpeg_cmd = ""
|
cmd = f"ffmpeg -r {fps} "
|
||||||
ffmpeg_cmd += f"ffmpeg -r {fps} -f image2 "
|
cmd += "-f image2 "
|
||||||
ffmpeg_cmd += f"-i {str(imgs_dir / 'frame_%06d.png')} "
|
cmd += "-loglevel error "
|
||||||
ffmpeg_cmd += "-vcodec libx264 "
|
cmd += f"-i {str(imgs_dir / 'frame_%06d.png')} "
|
||||||
|
cmd += "-vcodec libx264 "
|
||||||
if g is not None:
|
if g is not None:
|
||||||
ffmpeg_cmd += f"-g {g} " # ensures at least 1 keyframe every g frames
|
cmd += f"-g {g} " # ensures at least 1 keyframe every g frames
|
||||||
# ffmpeg_cmd += "-keyint_min 10 " set a minimum of 10 frames between 2 key frames
|
# cmd += "-keyint_min 10 " set a minimum of 10 frames between 2 key frames
|
||||||
# ffmpeg_cmd += "-sc_threshold 0 " disable scene change detection to lower the number of key frames
|
# cmd += "-sc_threshold 0 " disable scene change detection to lower the number of key frames
|
||||||
if crf is not None:
|
if crf is not None:
|
||||||
ffmpeg_cmd += f"-crf {crf} "
|
cmd += f"-crf {crf} "
|
||||||
ffmpeg_cmd += f"-pix_fmt {pix_fmt} "
|
cmd += f"-pix_fmt {pix_fmt} "
|
||||||
ffmpeg_cmd += f"{str(video_path)}"
|
cmd += f"{str(video_path)}"
|
||||||
subprocess.run(ffmpeg_cmd.split(" "), check=True)
|
subprocess.run(cmd.split(" "), check=True)
|
||||||
|
|
||||||
video_size_bytes = video_path.stat().st_size
|
video_size_bytes = video_path.stat().st_size
|
||||||
|
|
||||||
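The hunk above assembles the ffmpeg command as one string and splits it on spaces, which would mis-tokenize any path that contains a space. A minimal sketch of the same command built as an argument list follows; `build_ffmpeg_cmd` is a hypothetical helper name, not part of the commit, and the flags are only those already visible in the diff.

```python
from pathlib import Path


def build_ffmpeg_cmd(imgs_dir, video_path, fps, g=None, crf=None, pix_fmt="yuv444p"):
    """Return the ffmpeg argument list; options left as None are omitted."""
    cmd = [
        "ffmpeg",
        "-r", str(fps),
        "-f", "image2",
        "-loglevel", "error",
        "-i", str(Path(imgs_dir) / "frame_%06d.png"),
        "-vcodec", "libx264",
    ]
    if g is not None:
        # GOP size: at least one keyframe every `g` frames
        cmd += ["-g", str(g)]
    if crf is not None:
        cmd += ["-crf", str(crf)]
    cmd += ["-pix_fmt", pix_fmt, str(video_path)]
    return cmd


# Hypothetical paths, chosen to show that a space in a directory name is preserved.
cmd = build_ffmpeg_cmd("my imgs", "out/video.mp4", fps=30, g=2)
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, with no call to `split`.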
|
@ -127,18 +133,23 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
|
||||||
# test loading 2 frames that are 4 frames apart, which might be a common setting
|
# test loading 2 frames that are 4 frames apart, which might be a common setting
|
||||||
ts = random.randint(fps, ep_num_images - fps) / fps
|
ts = random.randint(fps, ep_num_images - fps) / fps
|
||||||
|
|
||||||
if timestamps_mode == "diffusion":
|
if timestamps_mode == "1_frame":
|
||||||
prev_ts = round(ts - 4 / fps, 4)
|
timestamps = [ts]
|
||||||
timestamps = [prev_ts, ts]
|
elif timestamps_mode == "2_frames":
|
||||||
elif timestamps_mode == "tdmpc":
|
timestamps = [ts - 1 / fps, ts]
|
||||||
timestamps = [round(ts - i / fps, 4) for i in range(6)][::-1]
|
elif timestamps_mode == "2_frames_4_space":
|
||||||
|
timestamps = [ts - 4 / fps, ts]
|
||||||
|
elif timestamps_mode == "6_frames":
|
||||||
|
timestamps = [ts - i / fps for i in range(6)][::-1]
|
||||||
else:
|
else:
|
||||||
raise ValueError(timestamps_mode)
|
raise ValueError(timestamps_mode)
|
||||||
|
|
||||||
num_frames = len(timestamps)
|
num_frames = len(timestamps)
|
||||||
|
|
||||||
start_time_s = time.monotonic()
|
start_time_s = time.monotonic()
|
||||||
frames = decode_frames_fn(video_path, timestamps=timestamps, device=device, **decoder_kwgs)
|
frames = decode_frames_fn(
|
||||||
|
video_path, timestamps=timestamps, tolerance_s=1e-4, device=device, **decoder_kwgs
|
||||||
|
)
|
||||||
avg_load_time = (time.monotonic() - start_time_s) / num_frames
|
avg_load_time = (time.monotonic() - start_time_s) / num_frames
|
||||||
list_avg_load_time.append(avg_load_time)
|
list_avg_load_time.append(avg_load_time)
|
||||||
|
|
||||||
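The four `timestamps_mode` branches introduced above can be read as a single pure function. The sketch below mirrors the new side of the diff; the name `make_timestamps` is an assumption made for illustration, since the commit keeps this logic inline in `run_video_benchmark`.

```python
def make_timestamps(ts, fps, timestamps_mode):
    """Return the list of timestamps (in seconds) to decode, in ascending order."""
    if timestamps_mode == "1_frame":
        return [ts]
    if timestamps_mode == "2_frames":
        # two consecutive frames
        return [ts - 1 / fps, ts]
    if timestamps_mode == "2_frames_4_space":
        # two frames that are 4 frames apart
        return [ts - 4 / fps, ts]
    if timestamps_mode == "6_frames":
        # six consecutive frames ending at `ts`
        return [ts - i / fps for i in range(6)][::-1]
    raise ValueError(timestamps_mode)


six = make_timestamps(2.0, fps=10, timestamps_mode="6_frames")
```

Note that the new code no longer rounds the timestamps to 4 decimals; instead it passes `tolerance_s=1e-4` to the decoding function.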
|
@ -177,25 +188,17 @@ def run_video_benchmark(output_dir, cfg, seed=1337, timestamps_mode="diffusion")
|
||||||
"video_size_bytes": video_size_bytes,
|
"video_size_bytes": video_size_bytes,
|
||||||
"avg_load_time_from_images": avg_load_time_from_images,
|
"avg_load_time_from_images": avg_load_time_from_images,
|
||||||
"avg_load_time": avg_load_time,
|
"avg_load_time": avg_load_time,
|
||||||
"pc_compression": sum_original_frames_size_bytes / video_size_bytes,
|
"compression_factor": sum_original_frames_size_bytes / video_size_bytes,
|
||||||
"pc_load_time": avg_load_time_from_images / avg_load_time,
|
"load_time_factor": avg_load_time_from_images / avg_load_time,
|
||||||
"avg_per_pixel_l2_error": avg_per_pixel_l2_error,
|
"avg_per_pixel_l2_error": avg_per_pixel_l2_error,
|
||||||
}
|
}
|
||||||
|
|
||||||
for key in info:
|
|
||||||
print(key, info[key])
|
|
||||||
|
|
||||||
with open(output_dir / "info.json", "w") as f:
|
with open(output_dir / "info.json", "w") as f:
|
||||||
json.dump(info, f)
|
json.dump(info, f)
|
||||||
|
|
||||||
return info
|
return info
|
||||||
|
|
||||||
|
|
||||||
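As a worked example of the two renamed metrics, the sketch below computes `compression_factor` and `load_time_factor` from made-up numbers. `compute_factors` is a hypothetical helper, and the sizes and timings are illustrative values, not benchmark results.

```python
def compute_factors(
    sum_original_frames_size_bytes,
    video_size_bytes,
    avg_load_time_from_images,
    avg_load_time,
):
    """Both factors are ratios of 'images' over 'video'; higher is better for both."""
    return {
        "compression_factor": sum_original_frames_size_bytes / video_size_bytes,
        "load_time_factor": avg_load_time_from_images / avg_load_time,
    }


# Illustrative inputs: 4 MB of images vs a 1 MB video,
# 0.25 s per frame from images vs 0.5 s per frame from video.
factors = compute_factors(4_000_000, 1_000_000, 0.25, 0.5)
```

With these inputs the video is 4 times smaller than the images (`compression_factor` of 4.0) and decoding from video is 2 times slower than loading images (`load_time_factor` of 0.5).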
def main():
|
|
||||||
dry_run = True
|
|
||||||
|
|
||||||
bench_dir = Path("tmp/2024_04_29_1049_6_timestamps")
|
|
||||||
|
|
||||||
def display_markdown_table(headers, rows):
|
def display_markdown_table(headers, rows):
|
||||||
for i, row in enumerate(rows):
|
for i, row in enumerate(rows):
|
||||||
new_row = []
|
new_row = []
|
||||||
|
@ -220,48 +223,59 @@ def main():
|
||||||
print(markdown_table)
|
print(markdown_table)
|
||||||
print()
|
print()
|
||||||
|
|
||||||
|
|
||||||
def load_info(out_dir):
|
def load_info(out_dir):
|
||||||
with open(out_dir / "info.json") as f:
|
with open(out_dir / "info.json") as f:
|
||||||
info = json.load(f)
|
info = json.load(f)
|
||||||
return info
|
return info
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
dry_run = False
|
||||||
repo_ids = ["lerobot/pusht", "lerobot/umi_cup_in_the_wild"]
|
repo_ids = ["lerobot/pusht", "lerobot/umi_cup_in_the_wild"]
|
||||||
|
timestamps_modes = [
|
||||||
|
"1_frame",
|
||||||
|
"2_frames",
|
||||||
|
"2_frames_4_space",
|
||||||
|
"6_frames",
|
||||||
|
]
|
||||||
|
for timestamps_mode in timestamps_modes:
|
||||||
|
bench_dir = Path(f"tmp/2024_05_01_{timestamps_mode}")
|
||||||
|
|
||||||
# torchvision vs ffmpegio vs torchaudio
|
print(f"### `{timestamps_mode}`")
|
||||||
|
print()
|
||||||
|
|
||||||
headers = ["repo_id", "decoder", "pc_load_time", "avg_per_pixel_l2_error"]
|
# print("**`decoder`**")
|
||||||
rows = []
|
# headers = ["repo_id", "decoder", "load_time_factor", "avg_per_pixel_l2_error"]
|
||||||
for repo_id in repo_ids:
|
# rows = []
|
||||||
for decoder in ["torchvision", "ffmpegio", "torchaudio"]:
|
# for repo_id in repo_ids:
|
||||||
cfg = {
|
# for decoder in ["torchvision", "ffmpegio", "torchaudio"]:
|
||||||
"repo_id": repo_id,
|
# cfg = {
|
||||||
# video encoding
|
# "repo_id": repo_id,
|
||||||
"g": 10,
|
# # video encoding
|
||||||
"crf": 10,
|
# "pix_fmt": "yuv444p",
|
||||||
"pix_fmt": "yuv444p",
|
# # video decoding
|
||||||
# video decoding
|
# "device": "cpu",
|
||||||
"device": "cpu",
|
# "decoder": decoder,
|
||||||
"decoder": decoder,
|
# "decoder_kwgs": {},
|
||||||
"decoder_kwgs": {},
|
# }
|
||||||
}
|
|
||||||
|
|
||||||
if not dry_run:
|
# if not dry_run:
|
||||||
run_video_benchmark(bench_dir / repo_id / decoder, cfg=cfg)
|
# run_video_benchmark(bench_dir / repo_id / decoder, cfg, timestamps_mode)
|
||||||
info = load_info(bench_dir / repo_id / decoder)
|
# info = load_info(bench_dir / repo_id / decoder)
|
||||||
rows.append([repo_id, decoder, info["pc_load_time"], info["avg_per_pixel_l2_error"]])
|
# rows.append([repo_id, decoder, info["load_time_factor"], info["avg_per_pixel_l2_error"]])
|
||||||
display_markdown_table(headers, rows)
|
# display_markdown_table(headers, rows)
|
||||||
|
|
||||||
# yuv444p vs yuv420p
|
print("**`pix_fmt`**")
|
||||||
|
headers = ["repo_id", "pix_fmt", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
|
||||||
headers = ["repo_id", "pix_fmt", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
|
|
||||||
rows = []
|
rows = []
|
||||||
for repo_id in repo_ids:
|
for repo_id in repo_ids:
|
||||||
for pix_fmt in ["yuv420p", "yuv444p"]:
|
for pix_fmt in ["yuv420p", "yuv444p"]:
|
||||||
cfg = {
|
cfg = {
|
||||||
"repo_id": repo_id,
|
"repo_id": repo_id,
|
||||||
# video encoding
|
# video encoding
|
||||||
"g": 10,
|
"g": 2,
|
||||||
"crf": 10,
|
"crf": None,
|
||||||
"pix_fmt": pix_fmt,
|
"pix_fmt": pix_fmt,
|
||||||
# video decoding
|
# video decoding
|
||||||
"device": "cpu",
|
"device": "cpu",
|
||||||
|
@ -269,30 +283,28 @@ def main():
|
||||||
"decoder_kwgs": {},
|
"decoder_kwgs": {},
|
||||||
}
|
}
|
||||||
if not dry_run:
|
if not dry_run:
|
||||||
run_video_benchmark(bench_dir / repo_id / f"torchvision_{pix_fmt}", cfg=cfg)
|
run_video_benchmark(bench_dir / repo_id / f"torchvision_{pix_fmt}", cfg, timestamps_mode)
|
||||||
info = load_info(bench_dir / repo_id / f"torchvision_{pix_fmt}")
|
info = load_info(bench_dir / repo_id / f"torchvision_{pix_fmt}")
|
||||||
rows.append(
|
rows.append(
|
||||||
[
|
[
|
||||||
repo_id,
|
repo_id,
|
||||||
pix_fmt,
|
pix_fmt,
|
||||||
info["pc_compression"],
|
info["compression_factor"],
|
||||||
info["pc_load_time"],
|
info["load_time_factor"],
|
||||||
info["avg_per_pixel_l2_error"],
|
info["avg_per_pixel_l2_error"],
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
display_markdown_table(headers, rows)
|
display_markdown_table(headers, rows)
|
||||||
|
|
||||||
# g
|
print("**`g`**")
|
||||||
|
headers = ["repo_id", "g", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
|
||||||
headers = ["repo_id", "g", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
|
|
||||||
rows = []
|
rows = []
|
||||||
for repo_id in repo_ids:
|
for repo_id in repo_ids:
|
||||||
for g in [1, 5, 10, 15, 20, 30, 40, 60, 100, None]:
|
for g in [1, 2, 3, 4, 5, 6, 10, 15, 20, 40, 100, None]:
|
||||||
cfg = {
|
cfg = {
|
||||||
"repo_id": repo_id,
|
"repo_id": repo_id,
|
||||||
# video encoding
|
# video encoding
|
||||||
"g": g,
|
"g": g,
|
||||||
"crf": 10,
|
|
||||||
"pix_fmt": "yuv444p",
|
"pix_fmt": "yuv444p",
|
||||||
# video decoding
|
# video decoding
|
||||||
"device": "cpu",
|
"device": "cpu",
|
||||||
|
@ -300,23 +312,28 @@ def main():
|
||||||
"decoder_kwgs": {},
|
"decoder_kwgs": {},
|
||||||
}
|
}
|
||||||
if not dry_run:
|
if not dry_run:
|
||||||
run_video_benchmark(bench_dir / repo_id / f"torchvision_g_{g}", cfg=cfg)
|
run_video_benchmark(bench_dir / repo_id / f"torchvision_g_{g}", cfg, timestamps_mode)
|
||||||
info = load_info(bench_dir / repo_id / f"torchvision_g_{g}")
|
info = load_info(bench_dir / repo_id / f"torchvision_g_{g}")
|
||||||
rows.append(
|
rows.append(
|
||||||
[repo_id, g, info["pc_compression"], info["pc_load_time"], info["avg_per_pixel_l2_error"]]
|
[
|
||||||
|
repo_id,
|
||||||
|
g,
|
||||||
|
info["compression_factor"],
|
||||||
|
info["load_time_factor"],
|
||||||
|
info["avg_per_pixel_l2_error"],
|
||||||
|
]
|
||||||
)
|
)
|
||||||
display_markdown_table(headers, rows)
|
display_markdown_table(headers, rows)
|
||||||
|
|
||||||
# crf
|
print("**`crf`**")
|
||||||
|
headers = ["repo_id", "crf", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
|
||||||
headers = ["repo_id", "crf", "pc_compression", "pc_load_time", "avg_per_pixel_l2_error"]
|
|
||||||
rows = []
|
rows = []
|
||||||
for repo_id in repo_ids:
|
for repo_id in repo_ids:
|
||||||
for crf in [0, 5, 10, 15, 20, None, 25, 30, 40, 50]:
|
for crf in [0, 5, 10, 15, 20, None, 25, 30, 40, 50]:
|
||||||
cfg = {
|
cfg = {
|
||||||
"repo_id": repo_id,
|
"repo_id": repo_id,
|
||||||
# video encoding
|
# video encoding
|
||||||
"g": None,
|
"g": 2,
|
||||||
"crf": crf,
|
"crf": crf,
|
||||||
"pix_fmt": "yuv444p",
|
"pix_fmt": "yuv444p",
|
||||||
# video decoding
|
# video decoding
|
||||||
|
@ -325,10 +342,44 @@ def main():
|
||||||
"decoder_kwgs": {},
|
"decoder_kwgs": {},
|
||||||
}
|
}
|
||||||
if not dry_run:
|
if not dry_run:
|
||||||
run_video_benchmark(bench_dir / repo_id / f"torchvision_crf_{crf}", cfg=cfg)
|
run_video_benchmark(bench_dir / repo_id / f"torchvision_crf_{crf}", cfg, timestamps_mode)
|
||||||
info = load_info(bench_dir / repo_id / f"torchvision_crf_{crf}")
|
info = load_info(bench_dir / repo_id / f"torchvision_crf_{crf}")
|
||||||
rows.append(
|
rows.append(
|
||||||
[repo_id, crf, info["pc_compression"], info["pc_load_time"], info["avg_per_pixel_l2_error"]]
|
[
|
||||||
|
repo_id,
|
||||||
|
crf,
|
||||||
|
info["compression_factor"],
|
||||||
|
info["load_time_factor"],
|
||||||
|
info["avg_per_pixel_l2_error"],
|
||||||
|
]
|
||||||
|
)
|
||||||
|
display_markdown_table(headers, rows)
|
||||||
|
|
||||||
|
print("**best**")
|
||||||
|
headers = ["repo_id", "compression_factor", "load_time_factor", "avg_per_pixel_l2_error"]
|
||||||
|
rows = []
|
||||||
|
for repo_id in repo_ids:
|
||||||
|
cfg = {
|
||||||
|
"repo_id": repo_id,
|
||||||
|
# video encoding
|
||||||
|
"g": 2,
|
||||||
|
"crf": None,
|
||||||
|
"pix_fmt": "yuv444p",
|
||||||
|
# video decoding
|
||||||
|
"device": "cpu",
|
||||||
|
"decoder": "torchvision",
|
||||||
|
"decoder_kwgs": {},
|
||||||
|
}
|
||||||
|
if not dry_run:
|
||||||
|
run_video_benchmark(bench_dir / repo_id / "torchvision_best", cfg, timestamps_mode)
|
||||||
|
info = load_info(bench_dir / repo_id / "torchvision_best")
|
||||||
|
rows.append(
|
||||||
|
[
|
||||||
|
repo_id,
|
||||||
|
info["compression_factor"],
|
||||||
|
info["load_time_factor"],
|
||||||
|
info["avg_per_pixel_l2_error"],
|
||||||
|
]
|
||||||
)
|
)
|
||||||
display_markdown_table(headers, rows)
|
display_markdown_table(headers, rows)
|
||||||
|
|
||||||
|
|
|
@ -70,7 +70,7 @@ def compute_stats(dataset: LeRobotDataset | datasets.Dataset, batch_size=32, max
|
||||||
generator.manual_seed(seed)
|
generator.manual_seed(seed)
|
||||||
dataloader = torch.utils.data.DataLoader(
|
dataloader = torch.utils.data.DataLoader(
|
||||||
dataset,
|
dataset,
|
||||||
num_workers=4,
|
num_workers=16,
|
||||||
batch_size=batch_size,
|
batch_size=batch_size,
|
||||||
shuffle=True,
|
shuffle=True,
|
||||||
drop_last=False,
|
drop_last=False,
|
||||||
|
|
|
@ -216,9 +216,14 @@ def load_previous_and_future_frames(
|
||||||
|
|
||||||
# load frames modality
|
# load frames modality
|
||||||
item[key] = hf_dataset.select_columns(key)[data_ids][key]
|
item[key] = hf_dataset.select_columns(key)[data_ids][key]
|
||||||
|
|
||||||
|
if isinstance(item[key][0], dict) and "path" in item[key][0]:
|
||||||
|
# video mode, where frames are expressed as dicts of path and timestamp
|
||||||
|
item[key] = item[key]
|
||||||
|
else:
|
||||||
item[key] = torch.stack(item[key])
|
item[key] = torch.stack(item[key])
|
||||||
|
|
||||||
item[f"{key}_is_pad"] = is_pad
|
item[f"{key}_is_pad"] = is_pad
|
||||||
item[f"{key}_timestamp"] = query_ts
|
|
||||||
|
|
||||||
return item
|
return item
|
||||||
|
|
||||||
|
|
|
@ -1,5 +1,6 @@
|
||||||
import logging
|
import logging
|
||||||
import subprocess
|
import subprocess
|
||||||
|
import warnings
|
||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Any, ClassVar
|
from typing import Any, ClassVar
|
||||||
|
@ -26,7 +27,7 @@ def load_from_videos(
|
||||||
# load multiple frames at once (expected when delta_timestamps is not None)
|
# load multiple frames at once (expected when delta_timestamps is not None)
|
||||||
timestamps = [frame["timestamp"] for frame in item[key]]
|
timestamps = [frame["timestamp"] for frame in item[key]]
|
||||||
paths = [frame["path"] for frame in item[key]]
|
paths = [frame["path"] for frame in item[key]]
|
||||||
if len(set(paths)) == 1:
|
if len(set(paths)) > 1:
|
||||||
raise NotImplementedError("All video paths are expected to be the same for now.")
|
raise NotImplementedError("All video paths are expected to be the same for now.")
|
||||||
video_path = data_dir / paths[0]
|
video_path = data_dir / paths[0]
|
||||||
|
|
||||||
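The condition fix above (`== 1` changed to `> 1`) is easy to check in isolation. A minimal sketch, assuming frames are dicts with `path` and `timestamp` keys as in the hunk; `resolve_single_video_path` is a hypothetical name used only for this example.

```python
from pathlib import Path


def resolve_single_video_path(frames, data_dir):
    """Return the one video path shared by all frames, or raise if they differ."""
    paths = [frame["path"] for frame in frames]
    if len(set(paths)) > 1:
        raise NotImplementedError("All video paths are expected to be the same for now.")
    return Path(data_dir) / paths[0]


# Hypothetical frames that all point at the same video file.
frames = [
    {"path": "videos/episode_0.mp4", "timestamp": 0.0},
    {"path": "videos/episode_0.mp4", "timestamp": 0.1},
]
video_path = resolve_single_video_path(frames, "data")
```

With the old `== 1` condition, this valid input would have raised, while frames from two different videos would have passed silently.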
|
@ -61,9 +62,11 @@ def decode_video_frames_torchvision(
|
||||||
video_path = str(video_path)
|
video_path = str(video_path)
|
||||||
|
|
||||||
# set backend
|
# set backend
|
||||||
|
keyframes_only = False
|
||||||
if device == "cpu":
|
if device == "cpu":
|
||||||
# explicitly use pyav
|
# explicitly use pyav
|
||||||
torchvision.set_video_backend("pyav")
|
torchvision.set_video_backend("pyav")
|
||||||
|
keyframes_only = True # pyav doesn't support accurate seek
|
||||||
elif device == "cuda":
|
elif device == "cuda":
|
||||||
# TODO(rcadene, aliberts): implement video decoding with GPU
|
# TODO(rcadene, aliberts): implement video decoding with GPU
|
||||||
# torchvision.set_video_backend("cuda")
|
# torchvision.set_video_backend("cuda")
|
||||||
|
@ -86,7 +89,7 @@ def decode_video_frames_torchvision(
|
||||||
# access closest key frame of the first requested frame
|
# access closest key frame of the first requested frame
|
||||||
# Note: closest key frame timestamp is usually smaller than `first_ts` (e.g. key frame can be the first frame of the video)
|
# Note: closest key frame timestamp is usually smaller than `first_ts` (e.g. key frame can be the first frame of the video)
|
||||||
# for details on what `seek` is doing see: https://pyav.basswood-io.com/docs/stable/api/container.html?highlight=inputcontainer#av.container.InputContainer.seek
|
# for details on what `seek` is doing see: https://pyav.basswood-io.com/docs/stable/api/container.html?highlight=inputcontainer#av.container.InputContainer.seek
|
||||||
reader.seek(first_ts)
|
reader.seek(first_ts, keyframes_only=keyframes_only)
|
||||||
|
|
||||||
# load all frames until last requested frame
|
# load all frames until last requested frame
|
||||||
loaded_frames = []
|
loaded_frames = []
|
||||||
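The rest of this function is elided in the diff, so the following is only a sketch of how a `tolerance_s` argument (the benchmark passes `tolerance_s=1e-4`) could be used once frames have been loaded from the keyframe onward. `match_timestamps` and its pure-Python nearest-neighbour search are assumptions for illustration, not the code in the commit.

```python
def match_timestamps(query_ts, loaded_ts, tolerance_s=1e-4):
    """For each requested timestamp, return the index of the closest loaded frame.

    Raises ValueError if the closest loaded frame is further away than `tolerance_s`.
    """
    indices = []
    for q in query_ts:
        idx = min(range(len(loaded_ts)), key=lambda i: abs(loaded_ts[i] - q))
        if abs(loaded_ts[idx] - q) > tolerance_s:
            raise ValueError(
                f"No loaded frame within {tolerance_s}s of requested timestamp {q}"
            )
        indices.append(idx)
    return indices


# Frames decoded sequentially from the keyframe up to the last requested frame.
loaded = [0.0, 0.1, 0.2, 0.3]
indices = match_timestamps([0.1, 0.3], loaded)
```

A strict tolerance makes a seek that lands on the wrong frame fail loudly rather than return a neighbouring frame.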
|
@ -130,7 +133,7 @@ def decode_video_frames_torchvision(
|
||||||
|
|
||||||
|
|
||||||
def encode_video_frames(imgs_dir: Path, video_path: Path, fps: int):
|
def encode_video_frames(imgs_dir: Path, video_path: Path, fps: int):
|
||||||
# For more info this setting, see: `lerobot/common/datasets/_video_benchmark/README.md`
|
"""More info on ffmpeg argument tuning in `lerobot/common/datasets/_video_benchmark/README.md`"""
|
||||||
video_path = Path(video_path)
|
video_path = Path(video_path)
|
||||||
video_path.parent.mkdir(parents=True, exist_ok=True)
|
video_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
@ -140,6 +143,7 @@ def encode_video_frames(imgs_dir: Path, video_path: Path, fps: int):
|
||||||
"-loglevel error "
|
"-loglevel error "
|
||||||
f"-i {str(imgs_dir / 'frame_%06d.png')} "
|
f"-i {str(imgs_dir / 'frame_%06d.png')} "
|
||||||
"-vcodec libx264 "
|
"-vcodec libx264 "
|
||||||
|
"-g 2 "
|
||||||
"-pix_fmt yuv444p "
|
"-pix_fmt yuv444p "
|
||||||
f"{str(video_path)}"
|
f"{str(video_path)}"
|
||||||
)
|
)
|
||||||
|
@ -168,5 +172,11 @@ class VideoFrame:
|
||||||
return self.pa_type
|
return self.pa_type
|
||||||
|
|
||||||
|
|
||||||
# to make it available in HuggingFace `datasets`
|
with warnings.catch_warnings():
|
||||||
|
warnings.filterwarnings(
|
||||||
|
"ignore",
|
||||||
|
"'register_feature' is experimental and might be subject to breaking changes in the future.",
|
||||||
|
category=UserWarning,
|
||||||
|
)
|
||||||
|
# to make VideoFrame available in HuggingFace `datasets`
|
||||||
register_feature(VideoFrame, "VideoFrame")
|
register_feature(VideoFrame, "VideoFrame")
|
||||||
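The `warnings.catch_warnings()` block above silences one specific `UserWarning` without changing the global filters. The sketch below reproduces the pattern with a stand-in function, since `register_feature` itself is not imported here; `experimental_register` is hypothetical and only emits the same message.

```python
import warnings


def experimental_register():
    """Stand-in for a call that emits an 'experimental' UserWarning."""
    warnings.warn(
        "'register_feature' is experimental and might be subject to breaking changes in the future.",
        UserWarning,
    )
    return "registered"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        "'register_feature' is experimental",
        category=UserWarning,
    )
    result = experimental_register()
    warnings.warn("unrelated", UserWarning)  # still reported
```

Here `record=True` is used only so the example can inspect which warnings got through; the commit uses the plain context manager, which restores the previous filters on exit.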
|
|
|
@ -1,3 +1,6 @@
|
||||||
|
# TODO(rcadene, alexander-soare): clean this file
|
||||||
|
"""Borrowed from https://github.com/fyhMer/fowm/blob/main/src/logger.py"""
|
||||||
|
|
||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
|
@ -350,7 +350,7 @@ def train(cfg: dict, out_dir=None, job_name=None):
|
||||||
# create dataloader for offline training
|
# create dataloader for offline training
|
||||||
dataloader = torch.utils.data.DataLoader(
|
dataloader = torch.utils.data.DataLoader(
|
||||||
offline_dataset,
|
offline_dataset,
|
||||||
num_workers=4,
|
num_workers=8,
|
||||||
batch_size=cfg.policy.batch_size,
|
batch_size=cfg.policy.batch_size,
|
||||||
shuffle=True,
|
shuffle=True,
|
||||||
pin_memory=cfg.device != "cpu",
|
pin_memory=cfg.device != "cpu",
|
||||||
|
|