picture by https://www.pexels.com/@maca-naparstek-456152/ |
We are trying to figure out how good GenerativeAI vision in counting things for example "sheep". So we wrote a simple program. (https://gist.github.com/dennisseah/b60e153931579e0c01362a1ab700a0d0)
Dependencies
python = "^3.12" openai = "^1.57.2" azure-core = "^1.32.0" azure-identity = "^1.19.0" python-dotenv = "^1.0.1"
.env file content
AZURE_OPENAI_ENDPOINT= AZURE_OPENAI_KEY= AZURE_OPENAI_API_VERSION=2024-06-01 AZURE_OPENAI_DEPLOYED_MODEL_NAME=
I have Azure GPT 4o deployed
Test Cases
test_sets = [ TestSet( "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg" description="Simple image with 2 sheep", expected_sheep_count=2, ), TestSet( "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg" description="Image with 8 sheep. Sorry, this is a complicated one" expected_sheep_count=8 ), TestSet( "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg" description="silhouette" expected_sheep_count=1 ), TestSet( "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg" description="a cow, no sheep" expected_sheep_count=0 ),]
And this is one of the output.
{ "image_url": "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg", "description": "silhouette", "expected_sheep_count": 1, "predicted_sheep_count": 1 } { "image_url": "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg", "description": "Image with 8 sheep. Sorry, this is a complicated one", "expected_sheep_count": 8, "predicted_sheep_count": 8 } { "image_url": "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg", "description": "Simple image with 2 sheep", "expected_sheep_count": 2, "predicted_sheep_count": 2 } { "image_url": "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg", "description": "a cow, no sheep", "expected_sheep_count": 0, "predicted_sheep_count": 0 }
It does not always get the complicated one right. Sometimes, it returns 9 or 10.
Honestly, it is pretty good. :-)
Comments
Post a Comment