Building an AI Provider System: The Strategy Pattern for Vision APIs (Part 1 of 3)
TL;DR
Built a swappable AI provider system for automatic image tagging using the Strategy pattern. Switch between OpenAI Vision, Google Vision, or mock providers with a single environment variable - zero code changes. This 3-part series covers: (1) Provider architecture, (2) Manual→Automatic evolution, (3) Deployment debugging.
The Problem: Vendor Lock-In
When adding AI vision to Chitram, I faced a critical decision:
What if OpenAI raises prices tomorrow? What if Google Vision releases a better model next month? How do we test without burning money on API calls?
Most tutorials hardcode the provider:
```python
# ❌ BAD: Locked into OpenAI
def tag_image(image_bytes: bytes) -> list[str]:
    openai_client = OpenAI(api_key=API_KEY)
    response = openai_client.vision.analyze(image_bytes)
    return response.tags

# Problem: Switching providers = rewriting this function everywhere
```
The pain:
- Provider code scattered across the codebase
- Changing providers requires code changes in multiple files
- Testing requires real API calls (expensive)
- No way to A/B test providers
The Solution: Strategy Pattern
Abstract the “what” (analyze image → get tags) from the “how” (OpenAI vs Google vs Mock).
Core idea: Define a common interface, implement it multiple ways, switch via configuration.
```
┌─────────────────────────────────────────────────────┐
│             Application Code (Caller)               │
│      tags = await provider.analyze_image(bytes)     │
└────────────────────┬────────────────────────────────┘
                     │ depends on abstraction
                     ▼
          ┌─────────────────────────┐
          │   AITaggingProvider     │  ← Interface
          │   - analyze_image()     │
          └─────────────────────────┘
                     △
        ┌────────────┼────────────┐
        │            │            │
   ┌────▼────┐  ┌────▼───┐  ┌────▼────┐
   │  Mock   │  │ OpenAI │  │ Google  │  ← Implementations
   │Provider │  │Provider│  │Provider │
   └─────────┘  └────────┘  └─────────┘
```
The Interface
```python
# app/services/ai/base.py
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AITag:
    """AI-generated tag with confidence score."""

    name: str               # Tag name (lowercase, normalized)
    confidence: int         # Confidence 0-100
    category: str | None    # Optional category (object, scene, color)


class AITaggingProvider(ABC):
    """Abstract base for AI vision providers."""

    @abstractmethod
    async def analyze_image(self, image_bytes: bytes) -> list[AITag]:
        """
        Analyze image and return tag suggestions.

        Args:
            image_bytes: Raw image data (JPEG/PNG)

        Returns:
            List of AI-generated tags with confidence scores

        Raises:
            AIProviderError: If provider fails
        """
```
Why this works:
- ✅ Single responsibility: Each provider implements one interface
- ✅ Open/closed: Add new providers without changing existing code
- ✅ Dependency inversion: Code depends on abstraction, not concrete providers
Three Implementations
Provider comparison:
| Provider | Use Case | Cost/Image | Speed | Production Ready |
|---|---|---|---|---|
| Mock | Local dev, CI/CD, unit tests | $0 | Instant | ❌ (Testing only) |
| OpenAI Vision | Production tagging | ~$0.004 | 2-3 sec | ✅ |
| Google Vision | Future (cost optimization) | ~$0.0015 | 1-2 sec | 🚧 (Planned) |
1. MockAIProvider (Free, Testing)
```python
# app/services/ai/mock_provider.py
class MockAIProvider(AITaggingProvider):
    """Returns predictable fake tags for testing."""

    async def analyze_image(self, image_bytes: bytes) -> list[AITag]:
        """Return mock tags without calling any API."""
        return [
            AITag(name="mock-object", confidence=99, category="object"),
            AITag(name="mock-scene", confidence=85, category="scene"),
            AITag(name="mock-color", confidence=75, category="color"),
        ]
```
Use case: local dev, CI/CD, unit tests. Cost: $0. Speed: instant.
2. OpenAIVisionProvider (Production)
```python
# app/services/ai/openai_vision.py
import base64

from openai import AsyncOpenAI


class OpenAIVisionProvider(AITaggingProvider):
    """OpenAI Vision API provider using gpt-4o-mini."""

    def __init__(self, api_key: str, model: str = "gpt-4o-mini", max_tags: int = 5):
        self.client = AsyncOpenAI(api_key=api_key)
        self.model = model
        self.max_tags = max_tags
        self.prompt = (
            f"Analyze this image and provide {max_tags} descriptive tags. "
            "Return only tag names separated by commas, no explanations."
        )

    async def analyze_image(self, image_bytes: bytes) -> list[AITag]:
        """Call OpenAI Vision API."""
        # Encode to base64 so the image can be sent inline as a data URL
        image_base64 = base64.b64encode(image_bytes).decode("utf-8")

        # Call API
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": self.prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_base64}",
                            "detail": "low",  # Cost optimization
                        },
                    },
                ],
            }],
            max_tokens=150,  # Limit response length
        )

        # Parse response (guard against an empty completion)
        tags_text = response.choices[0].message.content or ""
        tag_names = [tag.strip().lower() for tag in tags_text.split(",")]

        # Convert to AITag objects, dropping empty names
        return [
            AITag(name=name, confidence=90, category=None)
            for name in tag_names[:self.max_tags]
            if name
        ]
```
Use case: production tagging. Cost: ~$0.004/image. Speed: ~2-3 seconds.
3. GoogleVisionProvider (Future)
```python
# app/services/ai/google_vision.py
class GoogleVisionProvider(AITaggingProvider):
    """Google Cloud Vision API provider (Phase 7)."""

    async def analyze_image(self, image_bytes: bytes) -> list[AITag]:
        # TODO: 62% cheaper ($0.0015/image vs $0.004)
        # Raise instead of `pass`: a bare `pass` would return None and
        # silently break the list[AITag] contract.
        raise NotImplementedError("GoogleVisionProvider is planned for Phase 7")
```
The Factory: Configuration-Driven Switching
How configuration controls implementation:

| Environment variable | Factory decision |
|---|---|
| `AI_PROVIDER=mock` | `MockAIProvider()` |
| `AI_PROVIDER=openai` | `OpenAIVisionProvider(api_key, model, max_tags)` |
| `AI_PROVIDER=google` | `GoogleVisionProvider(api_key, max_tags)` |
```python
# app/services/ai/__init__.py
def create_ai_provider(settings: Settings) -> AITaggingProvider:
    """
    Create AI provider based on configuration.

    Environment variable AI_PROVIDER controls which implementation:
    - "mock"   → MockAIProvider (free, testing)
    - "openai" → OpenAIVisionProvider (production)
    - "google" → GoogleVisionProvider (future)
    """
    if settings.ai_provider == "mock":
        return MockAIProvider()

    if settings.ai_provider == "openai":
        if not settings.openai_api_key:
            raise AIProviderError("OPENAI_API_KEY not configured")
        return OpenAIVisionProvider(
            api_key=settings.openai_api_key,
            model=settings.openai_vision_model,
            max_tags=settings.ai_max_tags_per_image,
        )

    if settings.ai_provider == "google":
        if not settings.google_vision_api_key:
            raise AIProviderError("GOOGLE_VISION_API_KEY not configured")
        return GoogleVisionProvider(
            api_key=settings.google_vision_api_key,
            max_tags=settings.ai_max_tags_per_image,
        )

    raise ValueError(f"Unknown AI provider: {settings.ai_provider}")
```
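The factory's routing logic is itself easy to unit-test. In the sketch below, `SimpleNamespace` stands in for the real `Settings` object and the provider classes are stubs, since only the dispatch and fail-fast behavior matter here.

```python
# Sketch: exercising the factory's dispatch logic without real settings.
# SimpleNamespace stands in for Settings; provider classes are stubs; the
# factory body is a condensed copy (mock/openai branches only).
from types import SimpleNamespace


class AIProviderError(Exception):
    pass


class MockAIProvider:
    pass


class OpenAIVisionProvider:
    def __init__(self, api_key: str, model: str, max_tags: int):
        self.api_key, self.model, self.max_tags = api_key, model, max_tags


def create_ai_provider(settings) -> object:
    if settings.ai_provider == "mock":
        return MockAIProvider()
    if settings.ai_provider == "openai":
        if not settings.openai_api_key:
            raise AIProviderError("OPENAI_API_KEY not configured")
        return OpenAIVisionProvider(
            api_key=settings.openai_api_key,
            model=settings.openai_vision_model,
            max_tags=settings.ai_max_tags_per_image,
        )
    raise ValueError(f"Unknown AI provider: {settings.ai_provider}")


dev = SimpleNamespace(ai_provider="mock")
assert isinstance(create_ai_provider(dev), MockAIProvider)

# A missing key fails fast at startup instead of on the first user request
bad = SimpleNamespace(ai_provider="openai", openai_api_key=None)
try:
    create_ai_provider(bad)
except AIProviderError as exc:
    print(exc)  # OPENAI_API_KEY not configured
```

Validating credentials inside the factory means a misconfigured deployment crashes immediately with a clear message, rather than failing on the first image upload.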
Usage: One Line Everywhere
Configuration (.env):
```bash
# Development (free, instant)
AI_PROVIDER=mock

# Production (real tags, costs money)
AI_PROVIDER=openai
OPENAI_API_KEY=sk-proj-abc123...
AI_MAX_TAGS_PER_IMAGE=5
OPENAI_VISION_MODEL=gpt-4o-mini

# Future (cost optimization)
AI_PROVIDER=google
GOOGLE_VISION_API_KEY=...
```
Application code (same everywhere):
```python
# Factory handles the decision
provider = create_ai_provider(settings)
tags = await provider.analyze_image(image_bytes)

# Works with mock, OpenAI, Google - zero code changes
```
Benefits:
- ✅ Change provider = change env var (no deploy needed)
- ✅ Test with mock provider (free, no API costs)
- ✅ Switch to cheaper provider later (easy cost optimization)
- ✅ A/B test providers (run both, compare results)
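Because every provider satisfies the same interface, A/B comparison is just running two of them on the same image. The sketch below uses a hypothetical `compare_providers` helper and two inline fake providers; in practice the real `MockAIProvider`/`OpenAIVisionProvider` instances would slot in unchanged.

```python
# Sketch: A/B-comparing two providers on the same image.
# compare_providers and the Fake* providers are hypothetical, for illustration.
import asyncio
from dataclasses import dataclass


@dataclass
class AITag:
    name: str
    confidence: int


class FakeProviderA:
    async def analyze_image(self, image: bytes) -> list[AITag]:
        return [AITag("beach", 90), AITag("sunset", 80)]


class FakeProviderB:
    async def analyze_image(self, image: bytes) -> list[AITag]:
        return [AITag("beach", 95), AITag("ocean", 70)]


async def compare_providers(a, b, image: bytes) -> dict:
    """Run both providers concurrently and report tag overlap."""
    tags_a, tags_b = await asyncio.gather(
        a.analyze_image(image),
        b.analyze_image(image),
    )
    names_a = {t.name for t in tags_a}
    names_b = {t.name for t in tags_b}
    return {
        "agreement": names_a & names_b,
        "only_a": names_a - names_b,
        "only_b": names_b - names_a,
    }


result = asyncio.run(compare_providers(FakeProviderA(), FakeProviderB(), b""))
print(result["agreement"])  # {'beach'}
```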
Key Takeaway
Abstract early to avoid vendor lock-in.
The Strategy pattern costs 30 minutes upfront to design the interface, but saves hours later:
- Adding Google Vision = implement interface, update factory (20 minutes)
- Testing without API calls = use MockAIProvider (instant)
- Switching providers in production = change environment variable (2 seconds)
Next: Part 2 - Manual→Automatic Evolution covers why we started with a manual /ai-tag endpoint (Phase 5) before going automatic with Celery (Phase 6).
Related Resources
This Series:
External:
Live Demo: https://chitram.io
Source Code: https://github.com/abhi10/chitram
License: CC BY 4.0