Supported Models
- gpt-5
- gpt-5-mini
- gpt-4.1
- gpt-4o
- gemini-2.5-pro
- gemini-2.5-flash
- deepseek-r1-0528
- deepseek-v3.1
- deepseek-v3.2-exp
How to Set Models
You don’t need to specify a model name in your API calls. The model for your agent is managed on the backend, so switching models won’t require any code changes.
To change the model for your agent, go to the Agent Settings page in the Vivgrid Console.
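Because the model is resolved on the backend, a request body carries only the conversation itself. The sketch below illustrates this, assuming an OpenAI-style chat-completions payload; the endpoint URL shown is illustrative, not a documented value.

```python
import json

# Illustrative endpoint (assumption, not a documented value).
VIVGRID_ENDPOINT = "https://api.vivgrid.com/v1/chat/completions"

def build_request(messages: list) -> dict:
    # Note: no "model" field. The agent's model is selected in the
    # Vivgrid Console, so the request body stays the same when you
    # switch models.
    return {"messages": messages}

payload = build_request([{"role": "user", "content": "Hello"}])
print(json.dumps(payload))
```

Switching from, say, gpt-4o to gpt-5-mini in the console changes nothing about this payload.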
Pricing
Pricing is calculated in USD per 1 million tokens. The table below details the cost for input, cached, and output tokens for each model.

| Model | Input Token | Cached Token | Output Token | 
|---|---|---|---|
| gpt-5 | $1.25 | $0.125 | $10.00 | 
| gpt-5-mini | $0.25 | $0.03 | $2.00 | 
| gpt-4.1 | $2.00 | $0.50 | $8.00 | 
| gpt-4o | $2.50 | $1.25 | $10.00 | 
| gemini-2.5-pro | $1.25 | $0.31 | $10.00 | 
| gemini-2.5-flash | $0.30 | $0.08 | $2.50 | 
| deepseek-r1 | $1.35 | - | $5.40 | 
| deepseek-v3.1 | $1.14 | - | $4.56 | 
| deepseek-v3.2-exp | $0.28 | $0.03 | $0.42 | 
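As a rough worked example, the per-request cost can be estimated from the table above. This sketch assumes cached input tokens are a subset of input tokens and are billed at the cached rate; confirm the exact billing semantics in the console.

```python
# Prices from the table above: (input, cached, output) in USD per 1M tokens.
PRICES = {
    "gpt-5": (1.25, 0.125, 10.00),
    "gpt-5-mini": (0.25, 0.03, 2.00),
    "gpt-4o": (2.50, 1.25, 10.00),
}

def estimate_cost(model, input_tokens, cached_tokens, output_tokens):
    # Assumption: cached tokens are deducted from the input-rate portion.
    inp, cached, out = PRICES[model]
    return (
        (input_tokens - cached_tokens) * inp
        + cached_tokens * cached
        + output_tokens * out
    ) / 1_000_000

# 10K input tokens (2K of them cached) plus 1K output on gpt-5-mini.
cost = estimate_cost("gpt-5-mini", 10_000, 2_000, 1_000)
print(f"${cost:.6f}")  # $0.004060
```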
Capabilities
| Model | Context Window | Max Output Tokens | Function Calling Support | 
|---|---|---|---|
| gpt-5 | 1M tokens | 32K | Yes | 
| gpt-5-mini | 1M tokens | 32K | Yes | 
| gpt-4.1 | 1M tokens | 32K | Yes | 
| gpt-4o | 128K | 16K | Yes | 
| gemini-2.5-pro | 1M | 64K | Yes | 
| gemini-2.5-flash | 1M | 64K | Yes | 
| deepseek-r1 | 64K | 8K | No | 
| deepseek-v3.1 | 128K | 64K | No | 
| deepseek-v3.2-exp | 64K | 64K | No | 
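The context window bounds the prompt plus the requested output. A minimal pre-flight check, using the limits from the table and a crude 4-characters-per-token heuristic (an assumption, not an exact tokenizer), might look like:

```python
# Context windows from the capabilities table, in tokens.
CONTEXT_WINDOW = {
    "gpt-5": 1_000_000,
    "gpt-4o": 128_000,
    "deepseek-r1": 64_000,
}

def fits_context(model: str, prompt: str, max_output_tokens: int) -> bool:
    # Rough estimate: ~4 characters per token (heuristic, not a tokenizer).
    approx_prompt_tokens = len(prompt) // 4
    return approx_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW[model]

print(fits_context("deepseek-r1", "x" * 400_000, 8_000))  # False: over 64K
print(fits_context("gpt-4o", "x" * 400_000, 16_000))      # True: under 128K
```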
Service Regions & Geo-Distributed Acceleration
Vivgrid intelligently accelerates model inference by automatically routing API requests to the nearest available compute region, minimizing latency and maximizing throughput, all while maintaining data-residency compliance for enterprise workloads. Unlike conventional global deployments on public clouds, which rely on a single centralized endpoint, Vivgrid’s geo-distributed architecture continuously synchronizes AI Tools and model states across multiple data zones, ensuring each region delivers optimized, low-latency performance.

| Model | Acceleration Mode | Accelerated Regions | 
|---|---|---|
| gpt-5 | ⚡ Geo-Distributed | Accelerated in AMER, EMEA, APAC | 
| gpt-5-mini | ⚡ Geo-Distributed | Accelerated in AMER, EMEA, APAC | 
| gpt-4.1 | ⚡ Geo-Distributed | Accelerated in AMER, EMEA, APAC | 
| gpt-4o | ⚡ Geo-Distributed | Accelerated in AMER, EMEA, APAC | 
| gemini-2.5-pro | 🌐 Global (Centralized) | — | 
| gemini-2.5-flash | 🌐 Global (Centralized) | — | 
| deepseek-r1 | 🌐 Global (Regional Host) | Accelerated in APAC | 
| deepseek-v3.1 | 🌐 Global (Regional Host) | Accelerated in APAC | 
| deepseek-v3.2-exp | 🌐 Global (Regional Host) | Accelerated in APAC | 
Key Highlights
- Dynamic Routing — Vivgrid automatically detects the user’s region and routes requests to the nearest accelerated node, minimizing cross-continent latency.
- Tool Synchronization — Function-calling tools and context caches are replicated across all accelerated regions for consistent behavior.
- Adaptive Caching — Frequently accessed prompts and embeddings are regionally cached to reduce cold-start delays.
- Seamless Fallback — Traffic automatically re-routes to neighboring accelerated zones during high load or outages.
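Dynamic routing with fallback can be pictured as choosing the lowest-latency healthy region and falling through to the next best when a zone is degraded. This is a conceptual sketch only; the region names mirror the table above, the latency figures are invented, and the real routing happens inside Vivgrid's orchestration layer, not in client code.

```python
# Conceptual model of nearest-region routing with seamless fallback.
REGIONS = ["AMER", "EMEA", "APAC"]

def route(latency_ms: dict, healthy: set) -> str:
    # Sort healthy regions by measured latency; the head of the list is
    # the nearest accelerated node, and the rest serve as fallbacks.
    candidates = sorted(
        (r for r in REGIONS if r in healthy),
        key=lambda r: latency_ms[r],
    )
    if not candidates:
        raise RuntimeError("no healthy region available")
    return candidates[0]

latency = {"AMER": 120, "EMEA": 35, "APAC": 210}
print(route(latency, healthy={"AMER", "EMEA", "APAC"}))  # EMEA (nearest)
print(route(latency, healthy={"AMER", "APAC"}))          # AMER (fallback)
```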
 
Notes on Global Models
For Global-only models (e.g., gemini-2.5-pro), the model host remains centralized under the provider’s Global Standard endpoint, which limits Vivgrid’s ability to perform regional acceleration. In contrast, Geo-Distributed models (e.g., gpt-4o, gpt-5-mini) leverage Vivgrid’s orchestration layer, delivering sub-50 ms latency worldwide through intelligent regional acceleration.