Picture by https://www.pexels.com/@maca-naparstek-456152/
We wanted to find out how good generative AI vision is at counting things, for example sheep, so we wrote a simple program (https://gist.github.com/dennisseah/b60e153931579e0c01362a1ab700a0d0).
Dependencies
python = "^3.12"
openai = "^1.57.2"
azure-core = "^1.32.0"
azure-identity = "^1.19.0"
python-dotenv = "^1.0.1"
.env file content
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
AZURE_OPENAI_API_VERSION=2024-06-01
AZURE_OPENAI_DEPLOYED_MODEL_NAME=
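Three of the four values in the .env file are left blank for you to fill in, so it can help to check them before making any calls. The following is a minimal sketch, not part of the original program; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names chosen for illustration.

```python
import os
from typing import Mapping

# Setting names copied from the .env listing above.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYED_MODEL_NAME",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are absent or blank in env."""
    # Hypothetical helper: a value made only of whitespace counts as missing.
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `load_dotenv()` from python-dotenv beforehand copies the .env values into `os.environ`, so `missing_settings()` with no argument would then report whatever is still unset.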
I have GPT-4o deployed on Azure OpenAI.
Test Cases
test_sets = [
    TestSet(
        "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg",
        description="Simple image with 2 sheep",
        expected_sheep_count=2,
    ),
    TestSet(
        "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg",
        description="Image with 8 sheep. Sorry, this is a complicated one",
        expected_sheep_count=8,
    ),
    TestSet(
        "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg",
        description="silhouette",
        expected_sheep_count=1,
    ),
    TestSet(
        "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg",
        description="a cow, no sheep",
        expected_sheep_count=0,
    ),
]
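The definition of `TestSet` lives in the gist and is not shown in this post. As a minimal sketch, assuming its fields match the keys that appear in the output (`image_url`, `description`, `expected_sheep_count`), it could be a small dataclass. The `to_result` helper is a hypothetical addition that shows how a prediction might be merged into one record.

```python
from dataclasses import asdict, dataclass


@dataclass
class TestSet:
    # Field names are assumed from the keys in the program's output.
    image_url: str
    description: str
    expected_sheep_count: int


def to_result(test_set: TestSet, predicted_sheep_count: int) -> dict:
    """Combine a test case with the model's answer into one record."""
    return {**asdict(test_set), "predicted_sheep_count": predicted_sheep_count}
```

Passing the record to `json.dumps(..., indent=2)` would print it in the same shape as the output in this post.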
And this is the output from one run.
{
"image_url": "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg",
"description": "silhouette",
"expected_sheep_count": 1,
"predicted_sheep_count": 1
}
{
"image_url": "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg",
"description": "Image with 8 sheep. Sorry, this is a complicated one",
"expected_sheep_count": 8,
"predicted_sheep_count": 8
}
{
"image_url": "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg",
"description": "Simple image with 2 sheep",
"expected_sheep_count": 2,
"predicted_sheep_count": 2
}
{
"image_url": "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg",
"description": "a cow, no sheep",
"expected_sheep_count": 0,
"predicted_sheep_count": 0
}
It does not always get the complicated one right. Sometimes it returns 9 or 10.
Honestly, it is pretty good. :-)
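Because the answer for the complicated image varies between runs, a single run says little about how reliable the count is. One way to look at this, sketched here with a hypothetical `summarize` helper that is not part of the original program, is to collect the predictions from several runs on the same image and tally them.

```python
from collections import Counter


def summarize(predictions: list[int], expected: int) -> dict:
    """Tally repeated predictions for one image against the expected count."""
    tally = Counter(predictions)
    return {
        "runs": len(predictions),
        # Counter returns 0 for a value that never occurred.
        "exact_matches": tally[expected],
        "distribution": dict(sorted(tally.items())),
    }
```

Feeding it the answers collected for the 8-sheep image with `expected=8` would show how often 9 or 10 comes back instead.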
