Modifications
Introduced the argparse module to parse command-line arguments
Converted previously hardcoded source folder paths, output folder path, max_dim, and FPS values to configurable command-line parameters
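For illustration, a minimal sketch of what such a parser could look like. The flag names (--sources, --output, --max_dim, --fps) and the helper build_parser are hypothetical and may not match the ones in merge.py. The max_dim default of 18 comes from the description below, while the FPS default of 20 is only an assumption borrowed from the usage example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser. All flag names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Merge several datasets into one.")
    parser.add_argument("--sources", nargs="+", required=True,
                        help="Source dataset folders to merge")
    parser.add_argument("--output", required=True,
                        help="Folder that receives the merged dataset")
    parser.add_argument("--max_dim", type=int, default=18,
                        help="Maximum dimension for observation.state and action")
    parser.add_argument("--fps", type=int, default=20,
                        help="FPS to assume when a dataset does not specify one")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--sources", "/path/to/dataset1/", "/path/to/dataset2/",
     "--output", "/path/to/merged_dataset/", "--max_dim", "32"]
)
print(args.sources, args.output, args.max_dim, args.fps)
```

The parsed values could then be passed straight to merge_datasets in place of the former hardcoded constants.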
Important Note
When merging datasets, the FPS of every collected task must be identical; otherwise the merged data cannot be kept synchronized and consistent.
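One way to enforce this requirement before merging is a small guard like the one below. It is a sketch only: the function name check_fps and the idea of passing a folder-to-FPS mapping are assumptions, and how merge.py actually reads the FPS from each dataset is not shown here.

```python
def check_fps(fps_by_dataset):
    """Return the shared FPS, or raise if the datasets disagree.

    fps_by_dataset maps a dataset folder to its FPS (hypothetical input shape).
    """
    if not fps_by_dataset:
        raise ValueError("No datasets given")
    distinct = set(fps_by_dataset.values())
    if len(distinct) > 1:
        details = ", ".join(f"{name}: {fps}" for name, fps in fps_by_dataset.items())
        raise ValueError(f"Datasets have different FPS values ({details})")
    return distinct.pop()


# All datasets agree, so the shared value is returned.
shared_fps = check_fps({"/path/to/dataset1/": 20, "/path/to/dataset2/": 20})
print(shared_fps)
```

Failing early with the per-dataset values in the message makes it clear which source has to be re-recorded or resampled.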
This commit fixes 8 linter warnings in the merge.py file, including:
1. Added the contextlib import and used contextlib.suppress instead of the try-except-pass pattern
2. Removed unnecessary .keys() calls, iterating over dictionaries directly
3. Renamed unused loop variables with an underscore prefix (idx → _idx, dirs → _dirs, folder → _folder)
4. Combined nested if statements to improve code conciseness
These changes maintain the same functionality while improving code quality and readability to conform to the project's coding standards.
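The four patterns can be illustrated with a short, self-contained sketch. The functions below are hypothetical examples written for this description, not excerpts from merge.py.

```python
import contextlib
import os


def remove_if_present(path):
    """Delete a file, ignoring the case where it does not exist."""
    # Replaces the pattern: try: os.remove(path) / except FileNotFoundError: pass
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def count_matching_files(root, suffix):
    """Count non-hidden files under root whose names end with suffix."""
    count = 0
    # The directory path and subdirectory list are unused, hence the underscores.
    for _dirpath, _dirs, files in os.walk(root):
        for name in files:
            # One combined condition instead of two nested if statements.
            if name.endswith(suffix) and not name.startswith("."):
                count += 1
    return count


def long_keys(mapping, min_len):
    """Return keys of at least min_len characters, iterating the dict directly."""
    # Iterating a dict yields its keys, so .keys() is unnecessary.
    return [key for key in mapping if len(key) >= min_len]
```

None of these rewrites changes behavior; each only expresses the same logic in the form the linter expects.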
This PR addresses the issue regarding merging, converting, and editing datasets. The improved merge.py script provides robust functionality for combining multiple datasets with different dimensions, tasks, and indices.
Key Improvements:
1. Multi-dataset Merging: Fixed the logic for merging datasets from different sources while preserving data integrity and continuity.
2. Dimension Handling: Added dynamic dimension detection and padding to ensure all observation and action vectors are consistently sized. The script now supports a configurable maximum dimension (default is 18, but can be overridden).
3. Index Consistency: Implemented continuous global frame indexing to avoid overlaps or gaps in indices after merging.
4. Task Mapping: Fixed task_index updates to ensure proper mapping across merged datasets with different task descriptions.
5. FPS Consistency: Added checks to ensure consistent FPS across datasets, with configurable default values.
6. Directory Structure: Improved output directory organization using a chunk-based structure for better scalability.
7. Error Logging: Enhanced error reporting for failed files to aid debugging.
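To make items 2 to 4 more concrete, here is a minimal sketch of the three ideas in plain Python. The function names are hypothetical, zero is assumed as the padding value, and the real merge.py may work on different data structures.

```python
def pad_vector(vec, max_dim=18):
    """Pad a state or action vector with zeros up to max_dim (zero fill is an assumption)."""
    if len(vec) > max_dim:
        raise ValueError(f"Vector of length {len(vec)} exceeds max_dim={max_dim}")
    return list(vec) + [0.0] * (max_dim - len(vec))


def frame_offsets(frame_counts):
    """Return the global start index of each dataset so indices stay continuous."""
    offsets = []
    next_index = 0
    for count in frame_counts:
        offsets.append(next_index)
        next_index += count
    return offsets


def merge_tasks(task_lists):
    """Build one global task table plus a local-to-global index map per dataset."""
    global_index = {}
    remaps = []
    for tasks in task_lists:
        remap = {}
        for local_idx, task in enumerate(tasks):
            # Reuse the index of an already known task description, else append it.
            remap[local_idx] = global_index.setdefault(task, len(global_index))
        remaps.append(remap)
    return global_index, remaps


print(pad_vector([1.0, 2.0], max_dim=4))
print(frame_offsets([100, 50, 25]))
print(merge_tasks([["pick", "place"], ["place", "push"]]))
```

With offsets computed this way, a frame's global index is its dataset offset plus its local index, so no two datasets overlap and no gap is left between them. Identical task descriptions from different datasets collapse to a single task_index.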
Usage Example:
# Assumes merge.py is importable from the current working directory
from merge import merge_datasets

# Define source folders and output folder
source_folders = [
    "/path/to/dataset1/",
    "/path/to/dataset2/",
    "/path/to/dataset3/",
]
output_folder = "/path/to/merged_dataset/"

# Merge the datasets with custom parameters
merge_datasets(
    source_folders,
    output_folder,
    max_dim=32,      # Maximum dimension for observation.state and action
    default_fps=20,  # Default FPS if not specified in the datasets
)