Annotation


Prodigy

Syntax for launching the annotation UI
Basic                 prodigy recipe dataset source --label
Text classification   prodigy textcat.manual news_topics ./**.jsonl --label A,B,C
Image segmentation    prodigy image.manual photo_objects ./imfolder
Audio and video       prodigy audio.manual speaker_data ./recordings

The dataset name is chosen by the user; annotations are saved to the database under this name.
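
For the text classification recipe, the source is a JSONL file with one task per line. A minimal sketch of writing and reading such a file, assuming each task only needs a "text" field; the helper names, the file name news.jsonl, and the example sentences are made up for illustration:

```python
import json

def write_jsonl(path, texts):
    """Write one JSON object per line, each with a "text" field."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}) + "\n")

def read_jsonl(path):
    """Read a JSONL file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    # Hypothetical file name and texts, for illustration only.
    write_jsonl("news.jsonl", ["Stocks fell on Monday.", "The team won the final."])
    print(read_jsonl("news.jsonl"))
```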

Custom Recipe
Custom   prodigy recipe dataset source -F
Image    prodigy my-custom-recipe my_dataset ./imfolder -F recipe.py

The recipe name is defined in the Python script by the decorator, for example @prodigy.recipe("my-custom-recipe"). The command line passes the dataset name to the first argument of the decorated function (dataset), the source to the second argument (images_path), and so on. If the function does not take a second argument, the source can be omitted from the command line. The script should have the following basic format:


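import prodigy
from prodigy.components.loaders import Images
from prodigy.components.preprocess import fetch_media  # converts image paths/URLs to data URIs
from prodigy.util import b64_uri_to_bytes  # decodes a data URI back to raw bytes

@prodigy.recipe("my-custom-recipe")
def image_label(dataset, images_path):
    def get_stream(images_path):
        # Images yields one task dict per file in the folder,
        # with the picture stored under "image" as a base64 data URI.
        for task in Images(images_path):
            yield task
    return {"dataset": dataset,                      # where annotations are saved
            "stream":  get_stream(images_path),      # iterable of task dicts
            "view_id": "image_manual",               # annotation interface to use
            "config":  {"labels": ["A", "B", "C"]}}  # optional: placeholder labels, host, etc.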
    

You can choose one of Prodigy's built-in interfaces by setting view_id, for example image or image_manual, or combine several of them with blocks. The labels are defined in config, as is the host IP address.
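
As a sketch of combining interfaces, the dictionary a recipe returns could look like the following, assuming the blocks interface with an image_manual block followed by a text_input block. The helper name make_components, the label names, and the host and port values are placeholders chosen for illustration:

```python
def make_components(dataset, stream, labels):
    """Build the dictionary a recipe function would return."""
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",  # combine several interfaces on one page
        "config": {
            "labels": labels,  # labels offered in the UI
            "blocks": [
                {"view_id": "image_manual"},  # draw shapes on the image
                {"view_id": "text_input"},    # free-text field below it
            ],
            "host": "0.0.0.0",  # placeholder: address the web server binds to
            "port": 8080,       # placeholder port
        },
    }

if __name__ == "__main__":
    components = make_components("my_dataset", [], ["A", "B", "C"])
    print(components["view_id"])
    print([block["view_id"] for block in components["config"]["blocks"]])
```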

To export the annotations of a dataset as JSONL, run prodigy db-out dataset > **.jsonl.
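
The file written by db-out has one JSON record per line. A minimal sketch of summarizing it, assuming each record carries an "answer" field with the value accept, reject, or ignore; the helper name count_answers and the sample records are made up for illustration:

```python
import json
from collections import Counter

def count_answers(lines):
    """Count the "answer" value of each exported record."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get("answer", "missing")] += 1
    return counts

if __name__ == "__main__":
    # Made-up records standing in for the lines of an exported file.
    sample = [
        '{"text": "first", "answer": "accept"}',
        '{"text": "second", "answer": "reject"}',
        '{"text": "third", "answer": "accept"}',
    ]
    print(count_answers(sample))
```

With a real export, the same function can be applied to an open file handle, since iterating over a file yields its lines.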

To drop (delete) a dataset, run prodigy drop dataset.

To list the datasets in the database, run prodigy stats -l.

The stream can simply be a list of dictionaries loaded from a JSONL file. If the "image" field of each dictionary holds a file path rather than the image itself, the image must first be converted to a data URI with fetch_media: stream = fetch_media(stream, ["image"], skip=True)
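
To make the conversion concrete, here is a standard-library sketch of turning an image path into a base64 data URI. It only illustrates the idea and is not Prodigy's implementation of fetch_media; the helper names path_to_data_uri and convert_stream are hypothetical:

```python
import base64
import mimetypes

def path_to_data_uri(path):
    """Encode a file as a base64 data URI, e.g. data:image/png;base64,..."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the type is unknown
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def convert_stream(stream, key="image"):
    """Yield copies of the tasks with the path under `key` replaced by a data URI."""
    for task in stream:
        task = dict(task)  # copy so the input task is not modified
        task[key] = path_to_data_uri(task[key])
        yield task
```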

Key bindings
Basic
  • accept: A
  • reject: X
  • ignore: SPACE
  • undo: BACKSPACE/DEL
  • option 1-9: 1-9
Image
  • rect: R
  • polygon: T, P
  • freehand: F
  • delete: D
  • reset: Q
Audio
  • play/pause: ENTER
  • zoom: Z
  • loop: L
