
deadbits
Owner

I didn't realize how awful and broken the dataset loading process was. Hopefully this PR is a big improvement.

The utils directory has been removed, and all dataset loading is now handled by loader.py.
Users pass a Hugging Face dataset repo and a Vigil config file, and the loader handles everything else. No more cloning the dataset repos and running them through the separate parquet loader.
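The core of a loader like this is turning dataset rows into batches for the vector store. The sketch below illustrates that step under stated assumptions; it is not the code in loader.py.

- Each row is assumed to carry a `text` string and a precomputed `embeddings` vector. The real column names may differ.
- `add_batch` is a hypothetical stand-in for whatever call Vigil uses to insert into its vector database.
- Fetching rows from the Hub would typically go through the `datasets` library's `load_dataset`. That step is left out here so the sketch runs offline.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: {"text": str, "embeddings": sequence of floats}.
Row = Dict[str, object]


def iter_batches(
    rows: Iterable[Row], batch_size: int
) -> Iterator[Tuple[List[str], List[List[float]]]]:
    """Group rows into (texts, embeddings) batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    texts: List[str] = []
    vectors: List[List[float]] = []
    for row in rows:
        texts.append(str(row["text"]))
        vectors.append([float(x) for x in row["embeddings"]])  # type: ignore[union-attr]
        if len(texts) == batch_size:
            yield texts, vectors
            texts, vectors = [], []
    if texts:  # flush the final, possibly short, batch
        yield texts, vectors


def load_rows(
    rows: Iterable[Row],
    add_batch: Callable[[List[str], List[List[float]]], None],
    batch_size: int = 100,
) -> int:
    """Send every batch to add_batch (hypothetical DB insert) and return the row count."""
    total = 0
    for texts, vectors in iter_batches(rows, batch_size):
        add_batch(texts, vectors)
        total += len(texts)
    return total
```

Batching keeps each insert call bounded in size regardless of how large the dataset is. With five rows and `batch_size=2`, `add_batch` is called three times: twice with two rows and once with the single leftover row.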

(venv) adam:vigil-llm/ (dataloader✗) $ python loader.py --help
usage: loader.py [-h] -d DATASET -c CONFIG

Load text embedding data into Vigil

options:
  -h, --help            show this help message and exit
  -d DATASET, --dataset DATASET
                        dataset repo name
  -c CONFIG, --config CONFIG
                        config file
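The help text above corresponds to a standard `argparse` setup with two required options. The sketch below reconstructs that interface from the help output alone. How the real loader.py wires the parsed arguments into the loading step may differ.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the usage line and options shown in the --help output.
    parser = argparse.ArgumentParser(
        prog="loader.py",
        description="Load text embedding data into Vigil",
    )
    # Required options appear without brackets in the usage line:
    # "usage: loader.py [-h] -d DATASET -c CONFIG"
    parser.add_argument("-d", "--dataset", required=True, help="dataset repo name")
    parser.add_argument("-c", "--config", required=True, help="config file")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:].
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"dataset={args.dataset} config={args.config}")
```

Invocation then takes the form `python loader.py -d <hf-org>/<dataset-name> -c <path-to-vigil-config>`. Omitting either option makes argparse print an error and exit with status 2.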

deadbits added the labels dataset (Detection datasets) and loader (Dataset loader) on Nov 18, 2023
deadbits self-assigned this on Nov 18, 2023
deadbits merged commit f91375a into main on Nov 18, 2023
deadbits deleted the dataloader branch on November 18, 2023, 05:16
deadbits mentioned this pull request on Nov 18, 2023