LLavaTagger
LLavaTagger is a Python script that tags images based on a given prompt using the LLaVA multimodal LLM. LLavaTagger supports using any number of GPUs in DDP (DistributedDataParallel) for this task.
How to use
First, create a Python venv and install the required packages into it:
$ python -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
Then run LLavaTagger, for instance like so:
$ python LLavaTagger.py --common_description "an image of a cat, " --prompt "describe the cat in 10 to 20 words" --batch 8 --quantize --image_dir ~/cat_images
By default LLavaTagger runs in parallel on all available GPUs. If this is undesirable, set the ROCR_VISIBLE_DEVICES or CUDA_VISIBLE_DEVICES environment variable to hide the unwanted GPUs.
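For example, to restrict LLavaTagger to the first two CUDA devices (device ids chosen here purely for illustration):
$ CUDA_VISIBLE_DEVICES=0,1 python LLavaTagger.py --prompt "describe the cat in 10 to 20 words" --batch 8 --image_dir ~/cat_images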
LLavaTagger will then create a meta.jsonl in the image directory suitable for use with the diffusers training scripts for Stable Diffusion (XL). If other formats are desired, ../utils contains scripts to transform the metadata into other formats, for instance for use with kohya.
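Each line of meta.jsonl is a standalone JSON object pairing an image with its generated tags. A sketch of what one entry might look like (the field names and values here are assumptions for illustration, check the generated file for the actual layout):

{"file_name": "cat_001.jpg", "text": "an image of a cat, a grey tabby curled up asleep on a windowsill"}

If you want to post-process the metadata yourself, a minimal reading sketch in Python, under the same assumed field names:

import json
from pathlib import Path

# Directory that was passed to LLavaTagger via --image_dir
image_dir = Path("~/cat_images").expanduser()

# meta.jsonl holds one JSON object per line
with open(image_dir / "meta.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        # "file_name" and "text" are assumed key names, not confirmed by this README
        print(entry["file_name"], "->", entry["text"])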
If you want to edit the created tags, QImageTagger can be used for this purpose.