# Port DROID 1.0.1 dataset to LeRobotDataset

## Download

TODO

It will take 2 TB of local disk space.

## Port on a single computer

First, install the tensorflow dataset utilities to read from the raw files:
```bash
pip install tensorflow
pip install tensorflow_datasets
```

Then run this script to start porting the dataset:
```bash
python examples/port_datasets/droid_rlds/port_droid.py \
    --raw-dir /your/data/droid/1.0.1 \
    --repo-id your_id/droid_1.0.1 \
    --push-to-hub
```

The ported dataset will take 400 GB of local disk space. As usual, your LeRobotDataset will be stored in your `huggingface/lerobot` cache folder.

WARNING: porting the dataset locally takes about 7 days and uploading takes another 3 days, so we will need to parallelize over multiple nodes on a SLURM cluster.

NOTE: For development, run this script to port a single shard:
```bash
python examples/port_datasets/droid_rlds/port_droid.py \
    --raw-dir /your/data/droid/1.0.1 \
    --repo-id your_id/droid_1.0.1 \
    --num-shards 2048 \
    --shard-index 0
```

## Port over SLURM

Install the SLURM utilities from Hugging Face:
```bash
pip install datatrove
```

### 1. Port one shard per job

Run this script to start porting shards of the dataset:
```bash
python examples/port_datasets/droid_rlds/slurm_port_shards.py \
    --raw-dir /your/data/droid/1.0.1 \
    --repo-id your_id/droid_1.0.1 \
    --logs-dir /your/logs \
    --job-name port_droid \
    --partition your_partition \
    --workers 2048 \
    --cpus-per-task 8 \
    --mem-per-cpu 1950M
```

**Note on how to set your command line arguments**

Regarding `--partition`, find yours by running:
```bash
sinfo --format="%R"
```
and select the CPU partition if you have one. No GPU is needed.

Regarding `--workers`, it is the number of SLURM jobs you will launch in parallel. 2048 is the maximum, since there are 2048 shards in DROID. This big number will certainly max out your cluster.

Regarding `--cpus-per-task` and `--mem-per-cpu`, the defaults give each job ~16 GB of RAM (8 * 1950M), which is recommended to load the raw frames, and 8 CPUs, which is useful to parallelize the encoding of the frames. Find the number of CPUs and the memory of the nodes of your partition by running:
```bash
sinfo -N -p your_partition -h -o "%N cpus=%c mem=%m"
```

**Useful commands to check progress and debug**

Check if your jobs are running:
```bash
squeue -u $USER
```

You should see a list with job indices like `15125385_155`, where `15125385` is the index of the run and `155` is the worker index. The output of this worker is written in real time to `/your/logs/job_name/slurm_jobs/15125385_155.out`. For instance, you can inspect the content of this file by running `less /your/logs/job_name/slurm_jobs/15125385_155.out`.

Check the progression of your jobs by running:
```bash
jobs_status /your/logs
```

If it's not at 100% and no SLURM job is running anymore, it means that some of them failed. Inspect the logs by running:
```bash
failed_logs /your/logs/job_name
```

If there is an issue in the code, you can fix it in debug mode with `--slurm 0`, which allows you to set breakpoints:
```bash
python examples/port_datasets/droid_rlds/slurm_port_shards.py --slurm 0 ...
```

Then you can relaunch the same command, which will skip the completed jobs:
```bash
python examples/port_datasets/droid_rlds/slurm_port_shards.py --slurm 1 ...
```

Once all jobs are completed, you will have one dataset per shard (e.g. `droid_1.0.1_world_2048_rank_1594`) saved on disk in your `/lerobot/home/dir/your_id` directory.
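Before aggregating, you can sanity-check that every shard was written. A minimal sketch, assuming the per-shard datasets follow the `world_2048_rank_*` naming shown above (see the next command for how to find `/lerobot/home/dir`):

```bash
# Count the per-shard datasets on disk; this should print 2048 once all jobs have succeeded.
ls /lerobot/home/dir/your_id | grep -c "droid_1.0.1_world_2048_rank_"
```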
You can find your `/lerobot/home/dir` by running:
```bash
python -c "from lerobot.common.constants import HF_LEROBOT_HOME; print(HF_LEROBOT_HOME)"
```

### 2. Aggregate all shards

Run this script to start the aggregation:
```bash
python examples/port_datasets/droid_rlds/slurm_aggregate_shards.py \
    --repo-id your_id/droid_1.0.1 \
    --logs-dir /your/logs \
    --job-name aggr_droid \
    --partition your_partition \
    --workers 2048 \
    --cpus-per-task 8 \
    --mem-per-cpu 1950M
```

Once all jobs are completed, you will have one dataset in your `/lerobot/home/dir/your_id/droid_1.0.1` directory.

### 3. Upload dataset

Run this script to start uploading:
```bash
python examples/port_datasets/droid_rlds/slurm_upload.py \
    --repo-id your_id/droid_1.0.1 \
    --logs-dir /your/logs \
    --job-name upload_droid \
    --partition your_partition \
    --workers 50 \
    --cpus-per-task 4 \
    --mem-per-cpu 1950M
```
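Once the upload jobs are completed, you can verify that the dataset loads from the Hub. A minimal sketch, assuming the `LeRobotDataset` import path below matches your installed lerobot version:

```bash
python -c "
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
# Fetches the dataset metadata from the Hub and prints basic info such as episode and frame counts.
dataset = LeRobotDataset('your_id/droid_1.0.1')
print(dataset)
"
```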