Google Gemini API Introduces Implicit Caching for Cost Savings
Google has launched "implicit caching" for its Gemini API, aiming to reduce costs for developers using its latest AI models, Gemini 2.5 Pro and 2.5 Flash. This feature promises up to 75% savings on repetitive context passed through the API.
The rising cost of using advanced AI models has been a concern for developers. Google's implicit caching addresses this by reusing computation for context the model has recently processed, cutting the computational workload and the associated expense.
"We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache. We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!"

— Logan Kilpatrick (@OfficialLoganK), May 8, 2025
Caching is a common practice in AI, but Google's previous "explicit caching" required developers to manually define frequently used prompts. This new implicit caching is automatic, simplifying the process and offering cost savings by default.
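For contrast, explicit caching requires the developer to create a cache object up front, manage its lifetime, and reference it on every request. A minimal sketch using the google-genai Python SDK (the model name, TTL, file name, and prompt text are illustrative assumptions, not taken from Google's announcement):

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key is set in the environment

# Explicit caching: create a cache by hand with a fixed lifetime (here, one
# hour), then reference it on each subsequent request.
long_document = open("manual.txt").read()  # hypothetical large, reused context
cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(
        contents=[long_document],
        system_instruction="Answer questions using only the provided manual.",
        ttl="3600s",
    ),
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How do I reset the device?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```

With implicit caching, none of this bookkeeping is required; eligible prefix reuse is detected and discounted automatically.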
Google explains that when a request to a Gemini 2.5 model shares a common prefix with previous requests, it's eligible for a cache hit, resulting in cost savings passed back to the developer. The minimum token count for caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, a relatively low threshold.
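Developers can verify whether a request actually hit the implicit cache by inspecting the response's usage metadata, which reports how many prompt tokens were served from cache. A rough sketch with the google-genai Python SDK (the file name and questions are illustrative assumptions):

```python
from google import genai

client = genai.Client()  # assumes an API key is set in the environment

# A shared prefix of at least 1,024 tokens (2.5 Flash) or 2,048 tokens
# (2.5 Pro) is required before a request becomes eligible for a cache hit.
shared_prefix = open("manual.txt").read()  # hypothetical large, stable context

for question in ["How do I reset the device?", "What does error E42 mean?"]:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=shared_prefix + "\n\nQuestion: " + question,
    )
    usage = response.usage_metadata
    # cached_content_token_count reports prompt tokens billed at the
    # discounted cached rate; it is None or 0 when nothing came from cache.
    print(question, "->", usage.cached_content_token_count, "cached tokens")
```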
In light of earlier complaints that explicit caching made costs hard to predict, Google recommends that developers place repetitive context at the beginning of requests to maximize the chance of a cache hit, appending dynamic, request-specific content at the end.
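In practice, that guidance maps to a simple request layout: keep the large, stable context as an identical prefix across calls and append only the variable part. A sketch (the names and separator are illustrative assumptions):

```python
STABLE_CONTEXT = open("manual.txt").read()  # hypothetical reused document

def build_prompt(user_query: str) -> str:
    # Repetitive context goes first, so consecutive requests share a common
    # prefix, which is what the implicit cache matches on.
    # Dynamic, per-request content goes last so it never breaks that prefix.
    return f"{STABLE_CONTEXT}\n\nQuestion: {user_query}"
```

Reversing the order, for example by putting the user's question first, would make every request's prefix unique and forfeit the discount.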
While Google hasn't provided third-party verification of the promised savings, the automatic nature of implicit caching offers a potentially significant cost reduction for developers using the Gemini API. Early adopter feedback will be crucial in assessing its true effectiveness.
More information on implicit caching can be found in Google's blog post and developer documentation.