AI Module

The AI module provides a unified interface for interacting with multiple large language model providers. Switch between OpenAI, Anthropic, Google, Groq, and local models without changing application code.

Configuration

example.py

python

from vorte import Vorte

app = Vorte(
    auto_load=True,
    config={
        "ai": {
            "default_provider": "openai",
            "default_model": "gpt-4o",
            "providers": {
                "openai": {
                    "api_key": "${OPENAI_API_KEY}",
                    "models": ["gpt-4o", "gpt-4o-mini", "o1", "o3-mini"],
                    "rate_limit": 60,
                },
                "anthropic": {
                    "api_key": "${ANTHROPIC_API_KEY}",
                    "models": ["claude-sonnet-4-20250514", "claude-3-5-haiku-20241022"],
                    "rate_limit": 50,
                },
                "google": {
                    "api_key": "${GOOGLE_API_KEY}",
                    "models": ["gemini-2.0-flash", "gemini-2.5-pro"],
                    "rate_limit": 30,
                },
                "groq": {
                    "api_key": "${GROQ_API_KEY}",
                    "models": ["llama-3.3-70b-versatile", "mixtral-8x7b-32768"],
                    "rate_limit": 30,
                },
                "ollama": {
                    "base_url": "http://localhost:11434",
                    "models": ["llama3", "mistral"],
                    "rate_limit": 100,
                },
            },
        },
    },
)
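The API keys in the configuration above are written as environment-variable placeholders rather than literal secrets. This page does not say how Vorte resolves them, so the following is only a minimal sketch of one way such placeholders could be expanded before the config is used. `expand_env` is a hypothetical helper and not part of Vorte.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists.

    Hypothetical helper for illustration; not a Vorte API.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Leave the placeholder untouched when the variable is not set.
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value


config = {
    "ai": {
        "providers": {
            "openai": {"api_key": "${OPENAI_API_KEY}", "rate_limit": 60},
        },
    },
}

# A fake environment is passed in so the example does not depend on real secrets.
resolved = expand_env(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["ai"]["providers"]["openai"]["api_key"])
```

Leaving unknown placeholders untouched makes a missing variable easy to spot in logs; raising an error instead would be an equally reasonable choice.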

Basic Usage

example.py

python

import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    response = await ai.complete(
        prompt="Explain quantum computing in two sentences.",
        model="gpt-4o",
    )

    print(response.content)             # generated text
    print(response.usage.total_tokens)  # tokens used by the request
    print(response.latency_ms)          # request latency in milliseconds


asyncio.run(main())

Multi-Provider Routing

The AI module supports multiple routing strategies to distribute requests across providers. Configure the strategy at the application level or override per-request.

| Strategy | Description |
|---|---|
| `STATIC` | Always use the configured default provider |
| `ROUND_ROBIN` | Cycle through providers in order |
| `COST_OPTIMIZED` | Select the cheapest provider that supports the model |
| `LATENCY_OPTIMIZED` | Select the fastest provider based on recent response times |
| `QUALITY_OPTIMIZED` | Select the highest quality provider for the task |
| `FAILOVER` | Try primary, then fall back to alternatives on error |
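To make two of the strategies in the table concrete, here is a minimal, framework-independent sketch of round-robin selection and failover. It is not Vorte's implementation: `RoundRobin`, `call_with_failover`, and `fake_call` are hypothetical names used only for illustration.

```python
import itertools


class RoundRobin:
    """Cycle through providers in a fixed order."""

    def __init__(self, providers):
        self._cycle = itertools.cycle(providers)

    def next_provider(self):
        return next(self._cycle)


def call_with_failover(providers, call):
    """Try each provider in order and return the first successful result."""
    errors = {}
    for name in providers:
        try:
            return name, call(name)
        except Exception as exc:  # a real router would catch narrower errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def fake_call(name):
    # Stand-in for a provider request; "openai" is made to fail here.
    if name == "openai":
        raise ConnectionError("simulated outage")
    return f"reply from {name}"


rr = RoundRobin(["openai", "anthropic", "google"])
print([rr.next_provider() for _ in range(4)])
# ['openai', 'anthropic', 'google', 'openai']

print(call_with_failover(["openai", "anthropic"], fake_call))
# ('anthropic', 'reply from anthropic')
```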

Configuring a Routing Strategy

example.py

python

from vorte import Vorte

app = Vorte(
    auto_load=True,
    config={
        "ai": {
            "routing_strategy": "COST_OPTIMIZED",
            "routing_config": {
                "cost_weights": {
                    "gpt-4o": 1.0,
                    "gpt-4o-mini": 0.15,
                    "claude-sonnet-4-20250514": 0.80,
                    "gemini-2.0-flash": 0.075,
                },
                "quality_scores": {
                    "gpt-4o": 9.2,
                    "claude-sonnet-4-20250514": 9.0,
                    "gemini-2.0-flash": 8.5,
                    "gpt-4o-mini": 7.8,
                },
            },
        },
    },
)
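The sketch below shows how weights and scores like those in the configuration above could drive model selection. It assumes that a lower cost weight means cheaper and a higher quality score means better; the page does not spell out how Vorte interprets these numbers, and the `pick_*` functions are hypothetical.

```python
# Values copied from the routing_config example.
cost_weights = {
    "gpt-4o": 1.0,
    "gpt-4o-mini": 0.15,
    "claude-sonnet-4-20250514": 0.80,
    "gemini-2.0-flash": 0.075,
}
quality_scores = {
    "gpt-4o": 9.2,
    "claude-sonnet-4-20250514": 9.0,
    "gemini-2.0-flash": 8.5,
    "gpt-4o-mini": 7.8,
}


def pick_cheapest(models, weights, scores=None, min_quality=0.0):
    """Return the lowest-weight model whose quality score meets the floor."""
    eligible = [
        m for m in models
        if scores is None or scores.get(m, 0.0) >= min_quality
    ]
    if not eligible:
        raise ValueError("no model satisfies the quality floor")
    return min(eligible, key=lambda m: weights[m])


def pick_highest_quality(models, scores):
    """Return the model with the highest quality score."""
    return max(models, key=lambda m: scores[m])


models = list(cost_weights)
print(pick_cheapest(models, cost_weights))
# gemini-2.0-flash
print(pick_cheapest(models, cost_weights, quality_scores, min_quality=9.0))
# claude-sonnet-4-20250514
print(pick_highest_quality(models, quality_scores))
# gpt-4o
```

Combining a quality floor with cost minimization is one common compromise between the two strategies; it is shown here only as an illustration.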

Per-Request Strategy Override

example.py

python

import asyncio

from vorte.ai import AI, RoutingStrategy

ai = AI()


async def main():
    # Short, latency-sensitive request: prefer the fastest provider.
    response = await ai.complete(
        prompt="Translate to French: Hello, world",
        strategy=RoutingStrategy.LATENCY_OPTIMIZED,
    )

    # Longer analytical request: prefer quality, with an explicit model.
    response = await ai.complete(
        prompt="Write a detailed analysis of...",
        strategy=RoutingStrategy.QUALITY_OPTIMIZED,
        model="gpt-4o",
    )


asyncio.run(main())

Streaming Completions

chat_stream.py

python

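from vorte.ai import AI

# `router` and `VorteSSEResponse` are assumed to be defined or imported
# elsewhere in the application; their import paths are not shown here.

ai = AI()


@router.post("/chat/stream")
async def chat_stream(prompt: str):
    async def generate():
        # Yield one event per streamed chunk until the provider signals completion.
        async for chunk in ai.stream(prompt=prompt, model="gpt-4o"):
            yield {"token": chunk.content, "done": chunk.done}

    return VorteSSEResponse(generate())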
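The endpoint above hands a stream of dictionaries to an SSE response. As an illustration of what reaches the client, the sketch below frames such dictionaries as Server-Sent Events messages (a `data:` line followed by a blank line). `format_sse` and `fake_stream` are hypothetical stand-ins; how `VorteSSEResponse` actually serializes events is not documented on this page.

```python
import asyncio
import json


def format_sse(payload):
    """Encode one JSON-serializable payload as a Server-Sent Events message."""
    # json.dumps without indent produces a single line, as SSE data lines require.
    return f"data: {json.dumps(payload)}\n\n"


async def fake_stream(text):
    """Stand-in for a token stream: one word per chunk, then a final chunk."""
    for word in text.split():
        yield {"token": word, "done": False}
    yield {"token": "", "done": True}


async def collect(text):
    """Gather every framed message so the result can be inspected."""
    return [format_sse(chunk) async for chunk in fake_stream(text)]


frames = asyncio.run(collect("hello world"))
print("".join(frames), end="")
```

On the client side, each `data:` payload can be parsed as JSON, and the stream can be closed once `done` is true.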

Embeddings

example.py

python

import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    embeddings = await ai.embed(
        texts=["Hello world", "Goodbye world"],
        model="text-embedding-3-small",
    )

    print(embeddings.vectors[0][:5])      # first five dimensions of the first vector
    print(embeddings.usage.total_tokens)  # tokens used by the request


asyncio.run(main())
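A common next step with embedding vectors is to compare them. The following sketch computes cosine similarity in plain Python; the vectors are made-up stand-ins for `embeddings.vectors`, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for zero vectors; treat as unrelated
    return dot / (norm_a * norm_b)


# Made-up three-dimensional vectors for illustration only.
vectors = [
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

similar = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"{similar:.2f} {unrelated:.2f}")
```

Values close to 1 indicate vectors pointing in nearly the same direction, while values near 0 indicate little similarity.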

Structured Output

sentiment.py

python

import asyncio

from pydantic import BaseModel
from vorte.ai import AI


class Sentiment(BaseModel):
    label: str
    score: float
    confidence: float


ai = AI()


async def main():
    result = await ai.complete(
        prompt="Analyze the sentiment: This product is amazing!",
        model="gpt-4o",
        response_schema=Sentiment,
    )

    # e.g. Sentiment(label='positive', score=0.95, confidence=0.92)
    print(result.parsed)


asyncio.run(main())

Provider Health and Metrics

example.py

python

import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    # Probe each configured provider and report its latency and status.
    health = await ai.health_check()
    for provider, status in health.items():
        print(f"{provider}: {status.latency_ms}ms, {status.status}")

    # Aggregate usage counters collected by the module.
    metrics = ai.get_metrics()
    print(f"Total requests: {metrics.total_requests}")
    print(f"Total tokens: {metrics.total_tokens}")
    print(f"Total cost: ${metrics.total_cost_usd:.4f}")


asyncio.run(main())
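For intuition about what a health check of this shape involves, here is a framework-independent sketch that probes several providers concurrently and records latency and status for each. It is an assumption about the general technique, not Vorte's implementation: `ProviderStatus`, `probe`, `fake_ping`, and the `"healthy"` / `"unhealthy"` labels are all hypothetical.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class ProviderStatus:
    status: str        # "healthy" or "unhealthy" (labels chosen for this sketch)
    latency_ms: float  # time spent on the probe, in milliseconds


async def probe(name, ping):
    """Time a single ping and convert any exception into an unhealthy status."""
    start = time.perf_counter()
    try:
        await ping(name)
        status = "healthy"
    except Exception:
        status = "unhealthy"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return name, ProviderStatus(status, elapsed_ms)


async def check_all(providers, ping):
    """Probe all providers concurrently and return a name-to-status mapping."""
    results = await asyncio.gather(*(probe(name, ping) for name in providers))
    return dict(results)


async def fake_ping(name):
    # Stand-in for a real network call; "groq" is made to fail here.
    await asyncio.sleep(0.01)
    if name == "groq":
        raise ConnectionError("simulated outage")


health = asyncio.run(check_all(["openai", "groq"], fake_ping))
for provider, status in health.items():
    print(f"{provider}: {status.latency_ms:.0f}ms, {status.status}")
```

Running the probes with `asyncio.gather` keeps the total check time close to that of the slowest provider rather than the sum of all of them.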
