AI Module

The AI module provides a unified interface for interacting with multiple large language model providers. Switch between OpenAI, Anthropic, Google, Groq, and local models without changing application code.

Configuration

example.py
from vorte import Vorte

app = Vorte(
    auto_load=True,
    config={
        "ai": {
            "default_provider": "openai",
            "default_model": "gpt-4o",
            "providers": {
                "openai": {
                    "api_key": "${OPENAI_API_KEY}",
                    "models": ["gpt-4o", "gpt-4o-mini", "o1", "o3-mini"],
                    "rate_limit": 60,
                },
                "anthropic": {
                    "api_key": "${ANTHROPIC_API_KEY}",
                    "models": ["claude-sonnet-4-20250514", "claude-3-5-haiku-20241022"],
                    "rate_limit": 50,
                },
                "google": {
                    "api_key": "${GOOGLE_API_KEY}",
                    "models": ["gemini-2.0-flash", "gemini-2.5-pro"],
                    "rate_limit": 30,
                },
                "groq": {
                    "api_key": "${GROQ_API_KEY}",
                    "models": ["llama-3.3-70b-versatile", "mixtral-8x7b-32768"],
                    "rate_limit": 30,
                },
                "ollama": {
                    "base_url": "http://localhost:11434",
                    "models": ["llama3", "mistral"],
                    "rate_limit": 100,
                },
            },
        },
    },
)
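
The configuration above refers to API keys through placeholders such as ${OPENAI_API_KEY}. How Vorte resolves them is not shown here, so the following is only a minimal sketch of one plausible approach: walk the nested config and substitute each placeholder from the environment. The helper name resolve_env and its behavior on missing variables are assumptions made for illustration, not part of the Vorte API.

```python
import os
import re

# Matches ${NAME} placeholders such as "${OPENAI_API_KEY}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value, env=None):
    """Recursively replace ${NAME} placeholders in a nested config.

    Hypothetical helper for illustration; not a Vorte function.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    return value  # numbers, booleans and None pass through unchanged


config = {"ai": {"providers": {"openai": {"api_key": "${OPENAI_API_KEY}", "rate_limit": 60}}}}
# An explicit mapping is passed here so the demo does not depend on the real environment.
resolved = resolve_env(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["ai"]["providers"]["openai"]["api_key"])  # sk-example
```

Failing loudly on an unset variable is a deliberate choice in this sketch: a missing key surfaces at startup rather than as an authentication error on the first request.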

Basic Usage

example.py
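import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    response = await ai.complete(
        prompt="Explain quantum computing in two sentences.",
        model="gpt-4o",
    )

    print(response.content)             # generated text
    print(response.usage.total_tokens)  # token count reported for the request
    print(response.latency_ms)          # request latency in milliseconds


# `await` is only valid inside a coroutine, so the example runs through asyncio.
asyncio.run(main())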

Multi-Provider Routing

The AI module supports multiple routing strategies to distribute requests across providers. Configure the strategy at the application level or override per-request.

Strategy            Description
STATIC              Always use the configured default provider
ROUND_ROBIN         Cycle through providers in order
COST_OPTIMIZED      Select the cheapest provider that supports the model
LATENCY_OPTIMIZED   Select the fastest provider based on recent response times
QUALITY_OPTIMIZED   Select the highest quality provider for the task
FAILOVER            Try primary, then fall back to alternatives on error
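
To make two of these strategies concrete, here is a minimal, library-independent sketch of ROUND_ROBIN and FAILOVER as described in the table. It does not show how Vorte implements them; RoundRobinRouter, call_with_failover, and flaky_call are hypothetical names introduced for illustration only.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hands out providers in a fixed, repeating order."""

    def __init__(self, providers):
        self._providers = cycle(providers)

    def next_provider(self):
        return next(self._providers)


def call_with_failover(providers, call):
    """Try each provider in order and return the first successful result."""
    errors = {}
    for provider in providers:
        try:
            return provider, call(provider)
        except Exception as exc:  # a real router would catch narrower error types
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def flaky_call(provider):
    # Stand-in for a real request: the primary provider is down in this demo.
    if provider == "openai":
        raise ConnectionError("simulated outage")
    return f"response from {provider}"


router = RoundRobinRouter(["openai", "anthropic", "google"])
order = [router.next_provider() for _ in range(4)]
print(order)  # ['openai', 'anthropic', 'google', 'openai']

used, result = call_with_failover(["openai", "anthropic"], flaky_call)
print(used, result)  # anthropic response from anthropic
```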

Configuring a Routing Strategy

example.py
from vorte import Vorte

app = Vorte(
    auto_load=True,
    config={
        "ai": {
            "routing_strategy": "COST_OPTIMIZED",
            "routing_config": {
                "cost_weights": {
                    "gpt-4o": 1.0,
                    "gpt-4o-mini": 0.15,
                    "claude-sonnet-4-20250514": 0.80,
                    "gemini-2.0-flash": 0.075,
                },
                "quality_scores": {
                    "gpt-4o": 9.2,
                    "claude-sonnet-4-20250514": 9.0,
                    "gemini-2.0-flash": 8.5,
                    "gpt-4o-mini": 7.8,
                },
            },
        },
    },
)
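
How Vorte consumes cost_weights and quality_scores is not documented in this section. As a rough sketch only, assuming a lower weight means a cheaper model and a higher score means better quality, the selection could look like the following. pick_cheapest and pick_best_quality are hypothetical helpers, not Vorte functions.

```python
# Values copied from the routing_config example above.
cost_weights = {
    "gpt-4o": 1.0,
    "gpt-4o-mini": 0.15,
    "claude-sonnet-4-20250514": 0.80,
    "gemini-2.0-flash": 0.075,
}
quality_scores = {
    "gpt-4o": 9.2,
    "claude-sonnet-4-20250514": 9.0,
    "gemini-2.0-flash": 8.5,
    "gpt-4o-mini": 7.8,
}


def pick_cheapest(candidates, weights):
    """Return the candidate with the lowest cost weight (assumed: lower is cheaper)."""
    known = [model for model in candidates if model in weights]
    if not known:
        raise ValueError("no candidate has a cost weight")
    return min(known, key=lambda model: weights[model])


def pick_best_quality(candidates, scores):
    """Return the candidate with the highest quality score (assumed: higher is better)."""
    known = [model for model in candidates if model in scores]
    if not known:
        raise ValueError("no candidate has a quality score")
    return max(known, key=lambda model: scores[model])


all_models = list(cost_weights)
print(pick_cheapest(all_models, cost_weights))        # gemini-2.0-flash
print(pick_best_quality(all_models, quality_scores))  # gpt-4o
print(pick_cheapest(["gpt-4o", "claude-sonnet-4-20250514"], cost_weights))  # claude-sonnet-4-20250514
```

Models without an entry are skipped rather than treated as free or worthless, which keeps an incomplete table from silently skewing the choice.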

Per-Request Strategy Override

example.py
import asyncio

from vorte.ai import AI, RoutingStrategy

ai = AI()


async def main():
    # Short, latency-sensitive request: prefer the fastest provider.
    response = await ai.complete(
        prompt="Translate to French: Hello, world",
        strategy=RoutingStrategy.LATENCY_OPTIMIZED,
    )

    # Long-form request: prefer quality, with the model set explicitly.
    response = await ai.complete(
        prompt="Write a detailed analysis of...",
        strategy=RoutingStrategy.QUALITY_OPTIMIZED,
        model="gpt-4o",
    )


asyncio.run(main())
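
LATENCY_OPTIMIZED is described as selecting the fastest provider based on recent response times. One way to keep "recent" bounded is a fixed-size window of samples per provider. The sketch below illustrates that idea only; LatencyTracker, its window size, and its fallback behavior are assumptions, not Vorte's implementation.

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Keeps the last `window` latency samples per provider (hypothetical helper)."""

    def __init__(self, window=20):
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider, latency_ms):
        self._samples[provider].append(latency_ms)

    def average(self, provider):
        samples = self._samples.get(provider)
        if not samples:
            return None  # nothing measured yet
        return sum(samples) / len(samples)

    def fastest(self, providers):
        measured = []
        for provider in providers:
            avg = self.average(provider)
            if avg is not None:
                measured.append((avg, provider))
        if not measured:
            return providers[0]  # no data yet: fall back to the first provider
        return min(measured)[1]


tracker = LatencyTracker(window=3)
for ms in (400, 420, 380, 900):  # the oldest sample (400) falls out of the window
    tracker.record("openai", ms)
for ms in (120, 150):
    tracker.record("groq", ms)
first_choice = tracker.fastest(["openai", "groq"])
print(first_choice)  # groq

for ms in (2000, 2000, 2000):  # groq slows down; its old fast samples are evicted
    tracker.record("groq", ms)
second_choice = tracker.fastest(["openai", "groq"])
print(second_choice)  # openai
```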

Streaming Completions

chat_stream.py
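from vorte.ai import AI

# `router` and `VorteSSEResponse` are used as in the original example;
# their imports are not shown in this section.

ai = AI()


@router.post("/chat/stream")
async def chat_stream(prompt: str):
    async def generate():
        # Yield one event per streamed chunk; `done` marks the final chunk.
        async for chunk in ai.stream(prompt=prompt, model="gpt-4o"):
            yield {"token": chunk.content, "done": chunk.done}

    return VorteSSEResponse(generate())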
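
The handler above yields dictionaries and leaves the wire format to VorteSSEResponse, whose serialization is not documented here. For orientation, the Server-Sent Events format itself is simple: each event is one or more "field: value" lines followed by a blank line. The sketch below shows that framing with a hypothetical format_sse helper; it is not Vorte code.

```python
import json


def format_sse(payload, event=None):
    """Serialize one payload as a Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(payload)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


chunks = [
    {"token": "Hel", "done": False},
    {"token": "lo", "done": True},
]
frames = [format_sse(chunk) for chunk in chunks]
print(repr(frames[0]))  # 'data: {"token": "Hel", "done": false}\n\n'
```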

Embeddings

example.py
import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    embeddings = await ai.embed(
        texts=["Hello world", "Goodbye world"],
        model="text-embedding-3-small",
    )

    print(embeddings.vectors[0][:5])     # first five dimensions of the first vector
    print(embeddings.usage.total_tokens)


asyncio.run(main())
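
A common next step with embedding vectors is comparing them by cosine similarity. The sketch below uses small hand-written vectors rather than real model output, and cosine_similarity is a plain helper written for this example, not a Vorte function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings.vectors[0] and embeddings.vectors[1].
similar = cosine_similarity([3.0, 4.0], [4.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(similar)     # 0.96
print(orthogonal)  # 0.0
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.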

Structured Output

sentiment.py
import asyncio

from pydantic import BaseModel
from vorte.ai import AI


class Sentiment(BaseModel):
    label: str
    score: float
    confidence: float


ai = AI()


async def main():
    result = await ai.complete(
        prompt="Analyze the sentiment: This product is amazing!",
        model="gpt-4o",
        response_schema=Sentiment,
    )

    # Example output: Sentiment(label="positive", score=0.95, confidence=0.92)
    print(result.parsed)


asyncio.run(main())
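
Passing response_schema implies that the model's reply is checked against the schema before it is exposed as result.parsed. The section does not say how Vorte performs that check, so the following is only a standard-library sketch of the idea, using a dataclass in place of the Pydantic model; parse_reply is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Sentiment:
    label: str
    score: float
    confidence: float


def parse_reply(raw, schema):
    """Parse a JSON reply and check it against a flat dataclass schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    expected = {field.name: field.type for field in fields(schema)}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    values = {}
    for name, expected_type in expected.items():
        value = data[name]
        # JSON has a single number type, so accept integers where floats are expected.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
        values[name] = value
    return schema(**values)


reply = '{"label": "positive", "score": 0.95, "confidence": 0.92}'
parsed = parse_reply(reply, Sentiment)
print(parsed)  # Sentiment(label='positive', score=0.95, confidence=0.92)
```

A reply that omits a field or uses the wrong type raises ValueError here, which is the point at which a caller could retry or fall back.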

Provider Health and Metrics

example.py
import asyncio

from vorte.ai import AI

ai = AI()


async def main():
    # health_check() is awaited and yields one status entry per provider.
    health = await ai.health_check()
    for provider, status in health.items():
        print(f"{provider}: {status.latency_ms}ms, {status.status}")

    # get_metrics() is a plain synchronous call.
    metrics = ai.get_metrics()
    print(f"Total requests: {metrics.total_requests}")
    print(f"Total tokens: {metrics.total_tokens}")
    print(f"Total cost: ${metrics.total_cost_usd:.4f}")


asyncio.run(main())