#047 / 2026-05-16 TECH · AI / Semi / SaaS

Google I/O 2026: Gemini Ultra 3 Unveiled
— Halved Inference Costs and Multimodal Leap Redefine AI Infrastructure

🗓 2026-05-16 Auto-generated 06:30 JST ／ 🧠 HumanAI (COOL) ／ ~7104 chars

At Google I/O 2026, Google officially unveiled Gemini Ultra 3, claiming roughly a 50% reduction in inference cost compared to its predecessor while delivering dramatically improved multimodal capabilities across audio, video, and code. The announcement has immediate implications for AI developers and SaaS companies relying on foundation model APIs.

1. Gemini Ultra 3 Specs — What Has Actually Changed

According to benchmarks published by Google (MMLU-Pro, MATH-500, HumanEval), Gemini Ultra 3 outperforms its predecessor, Gemini Ultra 2.0, by a stated average of 8–14 points. The figure Google highlighted is code generation accuracy (HumanEval), which improved from 87.4% to 93.1%, a gain of 5.7 percentage points. By Google's account, that places it slightly ahead of the estimated 91.5% for OpenAI's GPT-5. [Source: Google I/O 2026 Keynote / https://io.google/2026/]
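To put the HumanEval figures in context, the short calculation below restates the two reported scores as failure rates. It is a minimal sketch that uses only the 87.4% and 93.1% figures quoted above; the function name and output keys are illustrative, not part of any published methodology.

```python
def failure_rate_change(old_pass_pct: float, new_pass_pct: float) -> dict:
    """Compare two benchmark pass rates, both given in percent."""
    old_fail = 100.0 - old_pass_pct  # share of problems failed before
    new_fail = 100.0 - new_pass_pct  # share of problems failed after
    return {
        "point_gain": round(new_pass_pct - old_pass_pct, 1),
        "old_fail_pct": round(old_fail, 1),
        "new_fail_pct": round(new_fail, 1),
        # Relative drop in failures, which is often more telling than the
        # raw point gain when scores are already close to 100%.
        "relative_fail_reduction_pct": round(
            (old_fail - new_fail) / old_fail * 100.0, 1
        ),
    }


result = failure_rate_change(87.4, 93.1)
print(result)
```

On the reported numbers, the share of failed problems falls from 12.6% to 6.9%, a relative reduction of about 45%, even though the headline gain is 5.7 points.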

Google attributed the inference cost reduction to a revamped model architecture and full utilization of Trillium, its 6th-generation TPU. Google has been ramping Trillium production since late 2025, and Gemini Ultra 3 is positioned as the first flagship model fully optimized for these TPU clusters. The 50% cost reduction is expected to carry through to API pricing: Google indicated it plans to cut per-token API prices by approximately 45% versus current rates, with exact per-token pricing to be announced at launch. [Source: Google DeepMind Blog / https://deepmind.google/discover/blog/]

2. Multimodal Enters Practical Phase — Video Understanding and Native Audio Integration

The most notable technical differentiator of Gemini Ultra 3 is a capability Google calls 'Native Audio-Visual Reasoning.' Unlike previous multimodal models that extracted image frames and concatenated them with text, Ultra 3 reportedly processes video streams in real time, handling audio, video, and text within a single embedding space. A live demo showed the model pinpointing the exact timestamp at which a specific concept first appeared in a 30-minute lecture video. Google cited 88.7% accuracy on this task, above the 85.3% it reported for human annotators. [Source: Google I/O 2026 Demo Session / https://io.google/2026/sessions/]
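The timestamp task in the demo can be illustrated with a toy search over a shared embedding space. The sketch below is not Google's method and calls no Gemini API. It assumes that some model has already produced one embedding vector per video segment and one for the query concept (the vectors, segment times, and threshold are made up for illustration), and it returns the start time of the earliest segment that is similar enough to the query.

```python
import math
from typing import Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_appearance(
    segments: Sequence[Tuple[float, Sequence[float]]],
    query: Sequence[float],
    threshold: float = 0.9,
) -> Optional[float]:
    """Return the start time (seconds) of the earliest matching segment.

    `segments` holds (start_seconds, embedding) pairs; None means no
    segment reached the similarity threshold.
    """
    for start, embedding in sorted(segments, key=lambda s: s[0]):
        if cosine(embedding, query) >= threshold:
            return start
    return None


# Hypothetical 3-dimensional embeddings for four 30-second segments.
segments = [
    (0.0, [1.0, 0.0, 0.0]),
    (30.0, [0.6, 0.8, 0.0]),
    (60.0, [0.0, 1.0, 0.0]),
    (90.0, [0.0, 0.9, 0.1]),
]
query = [0.0, 1.0, 0.0]

print(first_appearance(segments, query))
```

A per-segment threshold is the simplest possible design. A real system would presumably need finer time resolution and some smoothing across neighboring segments, which this sketch leaves out.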

Google also previewed 'Astra 2.0,' the successor to Project Astra, demonstrating a seamlessly integrated always-on multimodal agent architecture working across smart glasses and Pixel devices. Implications extend to the gaming sector (PLAY axis): real-time video recognition combined with voice response could enable AI gaming coaches and instant tagging for streaming content, suggesting Astra 2.0 may become a platform play beyond traditional productivity use cases. [Source: Google Astra 2.0 Preview / https://deepmind.google/technologies/gemini/astra/]

3. Impact on SaaS and Cloud — Does This Accelerate API Price Competition?

Google's planned API price cut of approximately 45% could put direct pressure on OpenAI and Anthropic. As of May 2026, the OpenAI GPT-5 API is priced at approximately $15.00 per 1M input tokens, while Anthropic Claude 4.5 runs around $13.00. If Gemini Ultra 3 enters at $8–9 per 1M tokens, SaaS vendors sensitive to foundation model procurement costs, including Salesforce, ServiceNow, and Snowflake, will face immediate make-vs-buy decisions at the API layer. [Source: OpenAI Pricing Page / https://openai.com/api/pricing]
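The pricing scenario above can be made concrete with a small calculation. This is a sketch under stated assumptions: the per-1M-token prices are the approximate figures quoted in this section, the $8–9 entry price is the article's hypothetical, and the monthly volume of 2 billion input tokens is an invented workload used only for illustration.

```python
# Approximate input prices in USD per 1M tokens, as quoted in this section.
PRICE_PER_1M_INPUT_USD = {
    "OpenAI GPT-5": 15.00,
    "Anthropic Claude 4.5": 13.00,
    "Gemini Ultra 3 (scenario, low)": 8.00,
    "Gemini Ultra 3 (scenario, high)": 9.00,
}


def monthly_cost(tokens_per_month: int, price_per_1m: float) -> float:
    """Monthly input-token spend in USD at a given per-1M-token price."""
    return tokens_per_month / 1_000_000 * price_per_1m


def implied_pre_cut_price(post_cut_price: float, cut_fraction: float) -> float:
    """Price before a cut, given the price after it and the cut fraction."""
    return post_cut_price / (1.0 - cut_fraction)


# Hypothetical workload: 2 billion input tokens per month.
TOKENS_PER_MONTH = 2_000_000_000

costs = {
    name: monthly_cost(TOKENS_PER_MONTH, price)
    for name, price in PRICE_PER_1M_INPUT_USD.items()
}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f} per month")

# What current price would a ~45% cut to $8-9 imply?
low = implied_pre_cut_price(8.00, 0.45)
high = implied_pre_cut_price(9.00, 0.45)
print(f"Implied current price: ${low:.2f} to ${high:.2f} per 1M tokens")
```

At this assumed volume the gap between $15.00 and $8.00 comes to $14,000 per month. The second calculation shows that an $8–9 entry price is consistent with a 45% cut only if the current rate sits at roughly $14.5–16.4 per 1M tokens.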

Meanwhile, Salesforce was reportedly in talks ahead of I/O to deepen its strategic partnership with Google, with negotiations ongoing to further integrate Gemini into the Einstein AI platform. As the competitive structure between Azure (Microsoft/OpenAI alliance) and GCP (Google) sharpens, AI model procurement strategies among cloud providers could see significant reshuffling over the coming quarters. [Source: The Information / https://www.theinformation.com/]

4. Semiconductor Infrastructure Impact — The TPU vs. GPU Competition Shifts

Gemini Ultra 3's optimization for Trillium (TPU v6) challenges NVIDIA's GPU cluster dominance. While NVIDIA's Blackwell (GB200) maintains overwhelming share in AI training workloads, Google's custom silicon is increasingly competitive in the inference phase. Google is estimated to currently operate over 500,000 Trillium chips globally, manufactured on TSMC's 3nm (N3B) process node. [Source: SemiAnalysis / https://www.semianalysis.com/]

This development could also ripple into the HBM (High Bandwidth Memory) market. While NVIDIA's Blackwell packs large volumes of HBM3E, Google's TPU architecture uses a different memory bus, potentially altering demand dynamics for SK Hynix and Samsung Semiconductor. From a data center power perspective, halving inference costs would mean roughly double the throughput per unit of power only if the savings come proportionally from energy use. To the extent they do, the calculus for liquid cooling infrastructure investment shifts as well. [Source: Nikkei Asia Semiconductor Report / https://asia.nikkei.com/]
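How far a 50% cost cut translates into throughput per unit of power depends on where the savings come from. The sketch below splits the cost per token into an energy part and a non-energy part and derives the implied tokens-per-joule multiplier. The 40% energy share and the non-energy cut values are assumptions chosen for illustration, not figures from Google or from the cited reports.

```python
def energy_efficiency_multiplier(
    total_cost_cut: float,
    energy_share: float,
    non_energy_cut: float,
) -> float:
    """Tokens-per-joule multiplier implied by a total cost-per-token cut.

    All arguments are fractions in [0, 1]. Costs are normalized so the
    old cost per token is 1.0, split into `energy_share` (energy) and
    `1 - energy_share` (hardware amortization and everything else).
    Energy price is assumed unchanged, so energy cost per token scales
    with joules per token.
    """
    new_total = 1.0 - total_cost_cut
    new_non_energy = (1.0 - energy_share) * (1.0 - non_energy_cut)
    new_energy = new_total - new_non_energy
    if new_energy <= 0.0:
        raise ValueError("Non-energy costs alone exceed the new total cost")
    return energy_share / new_energy


# Hypothetical: energy is 40% of cost per token, total cost falls 50%.
proportional = energy_efficiency_multiplier(0.5, 0.4, 0.5)
energy_led = energy_efficiency_multiplier(0.5, 0.4, 0.3)
hardware_led = energy_efficiency_multiplier(0.5, 0.4, 0.6)

print(f"Proportional savings: {proportional:.2f}x tokens per joule")
print(f"Energy-led savings:   {energy_led:.2f}x tokens per joule")
print(f"Hardware-led savings: {hardware_led:.2f}x tokens per joule")
```

Under these assumptions, throughput per joule exactly doubles only in the proportional case. It rises to about 5x if non-energy costs fall by just 30%, and to only about 1.5x if they fall by 60%, so the power demand implications depend heavily on a cost breakdown that has not been disclosed.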

5. Investor Context — Macro Environment and Tech Valuations

U.S. industrial production for April (released May 15) came in at +0.7% month-over-month, significantly beating the +0.1% consensus estimate and reinforcing soft-landing expectations. AI-related capex remains sensitive to interest rate levels, but at current 10-year Treasury yields of approximately 4.4%, GAFAM-scale custom silicon investments — backed by improving cash flows — face relatively modest valuation discount pressure. On the FX front, USD/JPY holding around the 152 level continues to support the relative attractiveness of Japanese AI/semiconductor names (Shin-Etsu Chemical, Tokyo Electron, etc.) for overseas investors.

📊 Nyaws Portfolio View

Nyaws's internal cross-risk index, NYW-X, currently stands at 33.19 (NORMAL range), suggesting that the market has not yet priced in abrupt risk shifts from the Gemini Ultra 3 announcement. That said, if API price competition accelerates, a scenario of downward pressure on gross margins for SaaS companies warrants monitoring.

Looking at Nyaws 100's 63-day returns by axis: Power (energy infrastructure) leads at +33.53%, followed by AI at +23.44%, BTC at +14.83%, and Gold at -9.80%. If Gemini Ultra 3's halved inference cost genuinely improves data center power efficiency, the demand outlook for the Power axis may face subtle revision — potentially marking an inflection point in Power vs. AI relative performance.

From the AI axis perspective (+23.44% over 63 days), deepening foundation model price competition could pressure model provider margins while improving profitability for the value-add layer — agentic SaaS and vertical AI applications that procure base models at lower cost. With NYW-X in NORMAL territory, this is not a moment for urgent rebalancing; rather, it is a period of watching for the official Gemini Ultra 3 API pricing and competitive responses from OpenAI and Anthropic.

Today's Data: Gemini Ultra 3 vs. Competing Models

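Item	Value
Gemini Ultra 3 HumanEval	93.1% (+5.7 pt vs. previous generation)
Gemini Ultra 2.0 HumanEval	87.4%
OpenAI GPT-5 HumanEval (estimated)	~91.5%
Inference cost reduction (Ultra 3 vs. Ultra 2.0)	~50%
Planned API price cut	~45% (details TBA)
Trillium (TPU v6) chips in operation (estimated)	Over 500,000
Trillium manufacturing process	TSMC 3nm (N3B)
U.S. industrial production (April, month-over-month)	+0.7% (consensus +0.1%)
USD/JPY (reference)	Around 152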

📊 HumanAI's interpretation (COOL)

The most analytically significant figure from the Gemini Ultra 3 announcement is not the 50% inference cost reduction but the HumanEval score jump from 87.4% to 93.1%. At this accuracy level, delegating primary code generation to the model becomes viable, shifting the developer's role from 'using AI as an assistant' to 'supervising AI as a primary coder.' Paired with API price cuts of about 45%, the second half of 2026 could fundamentally reshape the competitive landscape for Copilot-style products. Standard caveats apply: benchmark numbers are always presented under conditions favorable to the announcing party. But the directional signal, that commercial-phase AI code generation is accelerating, appears clear.

🔗 3-Axis Crossover — Related Today

This article focuses on TECH, but its figures connect to today's articles on our other axes and to our proprietary indices.

▸ MARKETS · #045

U.S. Retail Sales and Industrial Output Beat Forecasts — Strong Consumption Reinforces Fed's 'Wait-and-See' Stance

U.S. April retail sales (+0.5%) and industrial production (+0.7%) both beat forecasts, reinforcing the Fed's hold stance

▸ PLAY · #046

Nintendo Direct Summer 2026 Ignites New Title Surge — Switch 2 Ecosystem Enters Its 'Second Chapter'

Nintendo's Summer 2026 Direct unveiled 12 Switch 2 titles, targeting an improvement in the platform's relatively low sof

Sources:

Google I/O 2026 Official

Google DeepMind Blog

Google Astra 2.0 Preview

OpenAI API Pricing

SemiAnalysis

The Information

Nikkei Asia Semiconductor Report
