GenerativeAI Vision: Counting sheep

picture by https://www.pexels.com/@maca-naparstek-456152/
picture by https://www.pexels.com/@maca-naparstek-456152/

We are trying to figure out how good GenerativeAI vision in counting things for example "sheep". So we wrote a simple program. (https://gist.github.com/dennisseah/b60e153931579e0c01362a1ab700a0d0)

Dependencies

python = "^3.12"
openai = "^1.57.2"
azure-core = "^1.32.0"
azure-identity = "^1.19.0"
python-dotenv = "^1.0.1"

.env file content

AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
AZURE_OPENAI_API_VERSION=2024-06-01
AZURE_OPENAI_DEPLOYED_MODEL_NAME=

I have Azure GPT 4o deployed

Test Cases

test_sets = [
    TestSet(
        "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg"
        description="Simple image with 2 sheep",
        expected_sheep_count=2,
    ),
    TestSet(
        "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg"
        description="Image with 8 sheep. Sorry, this is a complicated one"
        expected_sheep_count=8
    ),
    TestSet(
        "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg"
        description="silhouette"
        expected_sheep_count=1
    ),
    TestSet(
        "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg"
        description="a cow, no sheep"
        expected_sheep_count=0
    ),]

And this is one of the output.
{
  "image_url": "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg",
  "description": "silhouette",
  "expected_sheep_count": 1,
  "predicted_sheep_count": 1
}
{
  "image_url": "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg",
  "description": "Image with 8 sheep. Sorry, this is a complicated one",
  "expected_sheep_count": 8,
  "predicted_sheep_count": 8
}
{
  "image_url": "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg",
  "description": "Simple image with 2 sheep",
  "expected_sheep_count": 2,
  "predicted_sheep_count": 2
}
{
  "image_url": "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg",
  "description": "a cow, no sheep",
  "expected_sheep_count": 0,
  "predicted_sheep_count": 0
}

It does not always get the complicated one right. Sometimes, it returns 9 or 10.

Honestly, it is pretty good. :-)





Comments