A collection of video processing scripts for curating pre-training and fine-tuning data. It includes tools to split video files into frames and to deduplicate those frames by hash similarity, so that near-identical frames are removed before the data is used to train machine learning models.
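As a rough illustration of deduplication by hash similarity, the sketch below computes a simple average hash per frame and drops any frame whose hash is within a few bits of one already kept. It is not taken from the scripts in this repository: the names `average_hash`, `hamming`, `dedupe`, and `max_distance` are hypothetical, and frames are assumed to be small, equally sized grayscale grids (in practice a frame would usually be downscaled first, for example to 8x8).

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale grid of 0-255 values, all frames the same size.
Frame = Sequence[Sequence[int]]


def average_hash(frame: Frame) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(frames: List[Frame], max_distance: int = 2) -> List[int]:
    """Return indices of frames to keep.

    A frame is dropped when its hash is within max_distance bits
    of the hash of a frame that was already kept.
    """
    kept_hashes: List[int] = []
    kept_indices: List[int] = []
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept_hashes.append(h)
            kept_indices.append(i)
    return kept_indices


# Tiny 2x2 "frames": b is a slightly noisy copy of a, c is visually different.
a = [[10, 200], [10, 200]]
b = [[12, 198], [11, 201]]
c = [[200, 10], [200, 10]]
print(dedupe([a, b, c]))  # → [0, 2]
```

Each new frame is compared against every kept hash, which is quadratic in the number of kept frames; that is acceptable for a sketch but a large dataset would likely need an index or bucketing scheme. The threshold `max_distance` trades recall for precision: a larger value removes more near-duplicates at the risk of discarding distinct frames.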