|  | 
| picture by https://www.pexels.com/@maca-naparstek-456152/ | 
We are trying to figure out how good GenerativeAI vision in counting things for example "sheep". So we wrote a simple program. (https://gist.github.com/dennisseah/b60e153931579e0c01362a1ab700a0d0)
Dependencies
python = "^3.12" openai = "^1.57.2" azure-core = "^1.32.0" azure-identity = "^1.19.0" python-dotenv = "^1.0.1"
.env file content
AZURE_OPENAI_ENDPOINT= AZURE_OPENAI_KEY= AZURE_OPENAI_API_VERSION=2024-06-01 AZURE_OPENAI_DEPLOYED_MODEL_NAME=
I have Azure GPT 4o deployed
Test Cases
test_sets = [
    TestSet(
        "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg"
        description="Simple image with 2 sheep",
        expected_sheep_count=2,
    ),
    TestSet(
        "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg"
        description="Image with 8 sheep. Sorry, this is a complicated one"
        expected_sheep_count=8
    ),
    TestSet(
        "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg"
        description="silhouette"
        expected_sheep_count=1
    ),
    TestSet(
        "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg"
        description="a cow, no sheep"
        expected_sheep_count=0
    ),]
  And this is one of the output.
{
  "image_url": "https://images.pexels.com/photos/69466/sunset-sheep-dike-nordfriesland-69466.jpeg",
  "description": "silhouette",
  "expected_sheep_count": 1,
  "predicted_sheep_count": 1
}
{
  "image_url": "https://images.pexels.com/photos/1153756/pexels-photo-1153756.jpeg",
  "description": "Image with 8 sheep. Sorry, this is a complicated one",
  "expected_sheep_count": 8,
  "predicted_sheep_count": 8
}
{
  "image_url": "https://images.pexels.com/photos/2157028/pexels-photo-2157028.jpeg",
  "description": "Simple image with 2 sheep",
  "expected_sheep_count": 2,
  "predicted_sheep_count": 2
}
{
  "image_url": "https://images.pexels.com/photos/14191871/pexels-photo-14191871.jpeg",
  "description": "a cow, no sheep",
  "expected_sheep_count": 0,
  "predicted_sheep_count": 0
}It does not always get the complicated one right. Sometimes, it returns 9 or 10.
Honestly, it is pretty good. :-)
Comments
Post a Comment