Basic | prodigy | recipe | dataset | source | --label |
---|---|---|---|---|---|
Text Classification | prodigy | textcat.manual | news_topics | ./**.jsonl | --label A,B,C |
Image Segmentation | prodigy | image.manual | photo_objects | ./imfolder | |
Audio and Video | prodigy | audio.manual | speaker_data | ./recordings | |
The dataset name is chosen by the user.
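The columns of the table above line up with the pieces of a single command line. As a minimal sketch, the rows can be assembled as follows. The helper `build_prodigy_command` and the file name `./news.jsonl` are hypothetical and not part of Prodigy.

```python
import shlex


def build_prodigy_command(recipe, dataset, source, label=None, recipe_file=None):
    """Assemble the argument list for a prodigy call (hypothetical helper)."""
    args = ["prodigy", recipe, dataset, source]
    if label:
        # Labels are passed as one comma-separated value after --label.
        args += ["--label", ",".join(label)]
    if recipe_file:
        # -F points prodigy at the Python file that holds a custom recipe.
        args += ["-F", recipe_file]
    return args


# "./news.jsonl" is a made-up source path used only for illustration.
cmd = build_prodigy_command(
    "textcat.manual", "news_topics", "./news.jsonl", label=["A", "B", "C"]
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with paths that contain spaces.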
Custom | prodigy | recipe | dataset | source | -F |
---|---|---|---|---|---|
Image | prodigy | my-custom-recipe | my_dataset | ./imfolder | -F recipe.py |
The recipe name is defined in the Python script by the decorator, `@prodigy.recipe("my-custom-recipe")`. The command line passes the dataset name to the `dataset` argument of the decorated function, the source to `images_path`, and so on. If the function has no second argument, the source argument can be omitted from the command line. The script should have the following basic format:
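
    import prodigy
    from prodigy.components.loaders import Images
    from prodigy.components.preprocess import fetch_media
    from prodigy.util import b64_uri_to_bytes


    @prodigy.recipe("image-bbox")  # the recipe name used on the command line
    def image_label(dataset, images_path):
        def get_stream(images_path):
            # Read the source and yield one task dictionary per item.
            # The Images loader is used here as one possible way to do that.
            for data in Images(images_path):
                yield data

        return {
            "dataset": dataset,
            "stream": get_stream(images_path),
            "view_id": "",  # set to an interface name such as "image" or "image_manual"
            "config": {},  # optional: "labels", "host", etc.
        }
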
You can choose one of Prodigy's built-in interfaces by changing `view_id`, for example `image` or `image_manual`. You can also combine several of them via `blocks`. The labels are defined in `config`, as is the host IP address.
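As a minimal sketch of combining interfaces, the dictionary a recipe returns can set `view_id` to `"blocks"` and list the individual interfaces under `config`. The helper name `make_components`, the label names, and the choice of the `image_manual` and `text_input` blocks are illustrative assumptions. No Prodigy import is needed to build the dictionary itself.

```python
def make_components(dataset, stream, labels):
    """Build the dict a recipe function would return (hypothetical helper)."""
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",  # render several interfaces together
        "config": {
            "labels": labels,  # labels offered in the manual interface
            "blocks": [
                {"view_id": "image_manual"},
                {"view_id": "text_input"},
            ],
        },
    }


# An empty iterator stands in for a real stream of tasks.
components = make_components("my_dataset", iter([]), ["PERSON", "CAR"])
print(components["view_id"], [b["view_id"] for b in components["config"]["blocks"]])
```

Inside a real recipe, the decorated function would return this dictionary directly, with the stream coming from a loader.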
To export the annotations as a transcript, run `prodigy db-out dataset > **.jsonl`.
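The exported file is plain JSONL, with one annotated task per line. Below is a minimal sketch for summarizing it, assuming each record carries an `answer` field. The helper `count_answers` and the sample records are hypothetical.

```python
import json
from collections import Counter


def count_answers(lines):
    """Count the "answer" values in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Records without an "answer" key are grouped under "missing".
        counts[record.get("answer", "missing")] += 1
    return counts


# Made-up records standing in for the lines of an exported file.
sample = [
    '{"text": "a", "answer": "accept"}',
    '{"text": "b", "answer": "reject"}',
    '{"text": "c", "answer": "accept"}',
]
print(count_answers(sample))
```

With a real export, an open file object can be passed in place of `sample`, since iterating over a file yields its lines.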
To drop the dataset, run `prodigy drop dataset`.
To list the datasets in the database, run `prodigy stats -l`.
The stream can simply be a list of dictionaries loaded from a JSONL file. If those dictionaries hold image paths, the images must be converted to data URIs with `fetch_media` before they can be displayed: `stream = fetch_media(stream, ["image"], skip=True)`
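To make the conversion step concrete, here is a standard-library sketch of what turning an image path into a data URI involves. It is not Prodigy's implementation of `fetch_media`. The helper names `path_to_data_uri` and `with_image_uris` are hypothetical, and tasks whose file cannot be read are skipped, roughly mirroring `skip=True`.

```python
import base64
import mimetypes
import os
import tempfile


def path_to_data_uri(path):
    """Encode a file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def with_image_uris(stream, key="image"):
    """Yield tasks with task[key] replaced by a data URI; skip unreadable files."""
    for task in stream:
        try:
            uri = path_to_data_uri(task[key])
        except (KeyError, OSError):
            continue  # missing key or unreadable file: drop the task
        yield {**task, key: uri}


# Demo with a tiny temporary file standing in for an image.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "a.png")
    with open(good, "wb") as f:
        f.write(b"\x89PNG")
    tasks = [{"image": good}, {"image": os.path.join(tmp, "missing.png")}]
    result = list(with_image_uris(tasks))

print(len(result), result[0]["image"][:22])
```

In an actual recipe, `fetch_media` would be the call to use. The sketch only shows why a stream of bare paths needs this extra step before the images can be rendered.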
Basic | Image | Audio |
---|---|---|