Vivgrid provides access to a range of powerful AI models for building enterprise-grade AI agents. We select models based on their performance, cost-effectiveness, and suitability for various tasks. Pricing is transparent and matches the rates of the original providers.

Supported Models

  • gpt-5
  • gpt-5-mini
  • gpt-4.1
  • gpt-4o
  • gemini-2.5-pro
  • gemini-2.5-flash
  • deepseek-r1-0528
  • deepseek-v3.1
  • deepseek-v3.2-exp

How to Set Models

You don’t need to specify a model name in your API calls. The model for your agent is managed on the backend, so switching models doesn’t require any code changes. To change the model for your agent, go to the Agent Settings page in the Vivgrid Console.
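The pattern above can be sketched in code. This is a minimal illustration assuming an OpenAI-compatible `/chat/completions` endpoint; the base URL, header names, and API key here are placeholders, so consult your Vivgrid Console for the actual values. The key point is that the request payload carries no `model` field:

```python
import json
import urllib.request

# Placeholder values -- take the real base URL and key from your Vivgrid Console.
BASE_URL = "https://api.vivgrid.com/v1"
API_KEY = "YOUR_VIVGRID_API_KEY"


def build_chat_request(messages):
    """Build a chat-completions payload. Note there is no "model" field:
    the model is selected in the Vivgrid Console, not in code."""
    return {"messages": messages}


def chat(messages):
    """Send the payload to the (assumed) OpenAI-compatible endpoint."""
    payload = build_chat_request(messages)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (requires a valid key):
# reply = chat([{"role": "user", "content": "Hello"}])
```

Because the payload never names a model, changing your agent’s model in the Console takes effect without redeploying this code.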

Pricing

Pricing is calculated in USD per 1 million tokens. The table below details the cost for input, cached, and output tokens for each model.
| Model | Input Tokens | Cached Tokens | Output Tokens |
| --- | --- | --- | --- |
| gpt-5 | $1.25 | $0.125 | $10.00 |
| gpt-5-mini | $0.25 | $0.03 | $2.00 |
| gpt-4.1 | $2.00 | $0.50 | $8.00 |
| gpt-4o | $2.50 | $1.25 | $10.00 |
| gemini-2.5-pro | $1.25 | $0.31 | $10.00 |
| gemini-2.5-flash | $0.30 | $0.08 | $2.50 |
| deepseek-r1 | $1.35 | - | $5.40 |
| deepseek-v3.1 | $1.14 | - | $4.56 |
| deepseek-v3.2-exp | $0.28 | $0.03 | $0.42 |

Capabilities

| Model | Context Window | Max Output Tokens | Function Calling Support |
| --- | --- | --- | --- |
| gpt-5 | 1M | 32K | Yes |
| gpt-5-mini | 1M | 32K | Yes |
| gpt-4.1 | 1M | 32K | Yes |
| gpt-4o | 128K | 16K | Yes |
| gemini-2.5-pro | 1M | 64K | Yes |
| gemini-2.5-flash | 1M | 64K | Yes |
| deepseek-r1 | 64K | 8K | No |
| deepseek-v3.1 | 128K | 64K | No |
| deepseek-v3.2-exp | 64K | 64K | No |
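Since not every model supports function calling, an agent that registers tools should check the target model’s capabilities first. This is a minimal sketch with the capability data hand-copied from the table above; the dictionary and helper function are illustrative, not a Vivgrid API:

```python
# Capability data transcribed from the table above (subset).
CAPABILITIES = {
    "gpt-4o": {"context_window": 128_000, "max_output": 16_000, "function_calling": True},
    "deepseek-r1": {"context_window": 64_000, "max_output": 8_000, "function_calling": False},
    "deepseek-v3.1": {"context_window": 128_000, "max_output": 64_000, "function_calling": False},
}


def supports_tools(model):
    """Return True only if the model is known to support function calling."""
    return CAPABILITIES.get(model, {}).get("function_calling", False)


# Example: only attach tool definitions when the model can use them.
# if supports_tools(current_model):
#     payload["tools"] = my_tool_definitions
```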

Service Regions & Geo-Distributed Acceleration

Vivgrid accelerates model inference by automatically routing API requests to the nearest available compute region, minimizing latency and maximizing throughput while maintaining data-residency compliance for enterprise workloads. Unlike conventional global deployments on public clouds, which rely on a single centralized endpoint, Vivgrid’s geo-distributed architecture continuously synchronizes AI Tools and model state across multiple data zones, so each region delivers optimized, low-latency performance.
| Model | Acceleration Mode | Accelerated Regions |
| --- | --- | --- |
| gpt-5 | ⚡ Geo-Distributed | AMER, EMEA, APAC |
| gpt-5-mini | ⚡ Geo-Distributed | AMER, EMEA, APAC |
| gpt-4.1 | ⚡ Geo-Distributed | AMER, EMEA, APAC |
| gpt-4o | ⚡ Geo-Distributed | AMER, EMEA, APAC |
| gemini-2.5-pro | 🌐 Global (Centralized) | - |
| gemini-2.5-flash | 🌐 Global (Centralized) | - |
| deepseek-r1 | 🌐 Global (Regional Host) | APAC |
| deepseek-v3.1 | 🌐 Global (Regional Host) | APAC |
| deepseek-v3.2-exp | 🌐 Global (Regional Host) | APAC |

Key Highlights

  • Dynamic Routing — Vivgrid automatically detects the user’s region and routes requests to the nearest accelerated node, minimizing cross-continent latency.
  • Tool Synchronization — Function-calling tools and context caches are replicated across all accelerated regions for consistent behavior.
  • Adaptive Caching — Frequently accessed prompts and embeddings are regionally cached to reduce cold-start delays.
  • Seamless Fallback — Traffic automatically re-routes to neighboring accelerated zones during high load or outages.

Notes on Global Models

For Global-only models (e.g., gemini-2.5-pro), the model host remains centralized under the provider’s Global Standard endpoint, which limits Vivgrid’s ability to perform regional acceleration.
In contrast, Geo-Distributed models (e.g., gpt-4o, gpt-5-mini) leverage Vivgrid’s orchestration layer, delivering sub-50 ms latency worldwide through intelligent regional acceleration.