Latest

Fresh from the feed

Filter by timeframe and category to zero in on the moves that matter.

French agency Pajemploi reports data breach affecting 1.2M people
news
BleepingComputer · 1 day ago

Pajemploi, the French social security service for parents and home-based childcare providers, has suffered a data breach that may have exposed personal information of 1.2 million individuals. [...]

#open_source
Score · 2.89
Microsoft and Nvidia to invest up to $15bn in OpenAI rival Anthropic
news
Financial Times · 1 day ago

AI start-up commits to buying $30bn in computing capacity from Microsoft in data centres powered by Nvidia chips

#ai
Score · 2.89
Shop the best early Kindle deals for Black Friday 2025
news
ZDNet - Security · 1 day ago

We're keeping a close eye on the best early Black Friday Kindle deals, including bundles on base models and accessories, ahead of the holiday season.

Score · 2.83
Google CEO: If an AI bubble pops, no one is getting out clean
news
Ars Technica · 1 day ago

Sundar Pichai says no company is immune if AI bubble bursts, echoing dotcom fears.

#ai
Score · 2.88
The 5 FREE Must-Read Books for Every Data Scientist
news
KDnuggets · 1 day ago

Want to level up your data skills? Check out these 5 free books that explain data science clearly and practically.

#ai
Score · 2.68
Fedora vs. Ubuntu: How to choose your next Linux distro (and which one I use)
news
ZDNet - Security · 1 day ago

If you're looking for a new operating system or want to do a bit of distro hopping, you might be considering either Fedora or Ubuntu. Let me help you choose which to try first.

Score · 2.83
Microsoft's new AI agents create your Word, Excel, and PowerPoint projects now
news
ZDNet - Security · 1 day ago

They can generate documents, spreadsheets, and presentations from simple text prompts. Here's how to make the most of it.

#ai
Score · 2.82
Google unveils Gemini 3 AI model and AI-first IDE called Antigravity
news
Ars Technica · 1 day ago

Google's flagship AI model is getting its second major upgrade this year.

#ai
Score · 2.87
IRS Accessed Massive Database of Americans' Flights Without a Warrant
news
404 Media · 1 day ago

A bipartisan letter reveals the IRS searched a database of hundreds of millions of travel records without first conducting a legal review. Airlines like Delta, United, American, and Southwest are selling these records to the government through a co-owned data broker.

#ai
Score · 2.72
Agents built into your workflow: Get Security Copilot with Microsoft 365 E5
news
Microsoft Security · 1 day ago

At Microsoft Ignite 2025, we are not just announcing new features—we are redefining what’s possible, empowering security teams to shift from reactive responses to proactive strategies.

Score · 2.92
Microsoft is packing more AI into Windows, ready or not - here's what's new - ZDNET
news
ZDNET (Google News) · 1 day ago

#ai
Score · 2.72
Google Launches New Gemini AI Model With Interactive Answers - Bloomberg.com
news
Bloomberg (Google News) · 1 day ago

#ai
#product
Score · 2.72
Gemini 3 Is Here—and Google Says It Will Make Search Smarter
news
WIRED · 1 day ago

Gemini 3 is skilled at reasoning, generating video, and writing code. Amid talk of an AI bubble, Google notes the new model could help increase search revenue too.

#ai
Score · 2.82
Generative UI: A rich, custom, visual interactive user experience for any prompt
news
Google Research Blog · 1 day ago

Generative AI

#ai
Score · 2.97
Writer's AI agents can actually do your work—not just chat about it
news
VentureBeat – AI · 1 day ago

Writer, a San Francisco-based artificial intelligence startup, is launching a unified AI agent platform designed to let any employee automate complex business workflows without writing code — a capability the company says distinguishes it from consumer-oriented tools like Microsoft Copilot and ChatGPT.

The platform, called Writer Agent, combines chat-based assistance with autonomous task execution in a single interface. Starting Tuesday, enterprise customers can use natural language to instruct the AI to create presentations, analyze financial data, generate marketing campaigns, or coordinate across multiple business systems like Salesforce, Slack, and Google Workspace — then save those workflows as reusable "Playbooks" that run automatically on schedules.

The announcement comes as enterprises struggle to move AI initiatives beyond pilot programs into production at scale. Writer CEO May Habib has been outspoken about this challenge, recently revealing that 42% of Fortune 500 executives surveyed by her company said AI is "tearing their company apart" due to coordination failures between departments.

"We're delivering an agent interface that is both incredibly powerful and radically simple to transform individual productivity into organizational impact," Habib said in a statement. "Writer Agent is the difference between a single sales rep asking a chatbot to write an outreach email and an enterprise ensuring that 1,000 reps are all sending on-brand, compliant, and contextually-aware messages to target accounts."

How Writer is putting workflow automation in the hands of non-technical workers

The platform's core innovation centers on making workflow automation accessible to non-technical employees — what Writer executives call "democratizing who gets to be a builder."
In an exclusive interview with VentureBeat, Doris Jwo, Writer's director of product management, demonstrated how the system works: A user types a request in plain English — for example, "Create a two-page partnership proposal between [Company A] and [Company B], make it a branded deck, include impact metrics and partnership tiers." The AI agent then breaks down that request into discrete steps, conducts web research, generates graphics and charts on the fly, creates individual slides with sourced information, and assembles a complete presentation. The entire process, which might take an employee hours or days, can be completed in 10-12 minutes.

"The agent basically looks at the request, breaks it down, does research, understands what pieces it needs, creates a detailed plan at a step-by-step level," Jwo explained during a product demonstration. "It might say, 'I need to do web research,' or 'This user needs information from Gong or Slack,' and it reaches out to those connectors, grabs the data, and executes the plan."

Crucially, users can save these multi-step processes as Playbooks — reusable templates that colleagues can deploy with a single click. Routines allow those Playbooks to run automatically at scheduled intervals, essentially putting knowledge work "on autopilot."

Security and compliance controls: Writer's answer to enterprise IT concerns

Writer positions these enterprise-focused controls as a key differentiator from competitors. While Microsoft, OpenAI, and Anthropic offer powerful AI capabilities, Writer's executives argue those tools weren't designed from the ground up for the security, compliance, and governance requirements of large regulated organizations.
"All of the products you mentioned are great products, but even Copilot is very much focused on personal productivity — summarizing email, for example, which is important, but that's not the component we're focusing on," said Matan-Paul Shetrit, Writer's director of product management, in an exclusive interview with VentureBeat.

Shetrit emphasized Writer's "trust, security, and interoperability" approach. IT administrators can granularly control what the AI can access — for instance, preventing market research agents from mentioning competitors, or restricting which employees can use web search capabilities. All activity is logged with detailed audit trails showing exactly what data the agent touched and what actions it took.

"These fine-grained controls are what make products enterprise-ready," Shetrit said. "We can deploy to tens of thousands or hundreds of thousands of employees while maintaining the security and guardrails you need for that scale."

This architecture reflects Writer's origin story. Unlike OpenAI or Anthropic, which started as research labs and later added enterprise offerings, Writer has targeted Fortune 500 companies since its 2020 founding. "We're not a research lab that went to consumer and is dabbling in enterprise," Shetrit said. "We are first and foremost targeting the Global 2000 and Fortune 500, and our research is in service of these customers' needs."

Inside Writer's strategy to connect AI agents across enterprise software systems

A critical technical component is Writer's approach to system integrations. The platform includes pre-built connectors to more than a dozen enterprise applications — Google Workspace, Microsoft 365, Snowflake, Asana, Slack, Gong, HubSpot, Atlassian, Databricks, PitchBook, and FactSet — allowing the AI to retrieve information and take actions across those systems.
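The connector pattern described here — an agent reaching named enterprise systems through one uniform lookup-and-invoke interface instead of hard-coded integrations — can be sketched roughly as follows. This is an illustrative sketch only; the class names, methods, and the stubbed Slack action are hypothetical, not Writer's actual API:

```python
# Illustrative connector-registry sketch. All names are hypothetical,
# not Writer's real API; the "slack" connector is a stub.

class Connector:
    """A named integration the agent can call (e.g. Slack, Salesforce)."""

    def __init__(self, name, actions):
        self.name = name
        self.actions = actions  # maps action name -> callable

    def invoke(self, action, **kwargs):
        if action not in self.actions:
            raise ValueError(f"{self.name} does not support {action!r}")
        return self.actions[action](**kwargs)


class ConnectorRegistry:
    """Agents look up connectors here rather than hard-coding integrations."""

    def __init__(self):
        self._connectors = {}

    def register(self, connector):
        self._connectors[connector.name] = connector

    def invoke(self, name, action, **kwargs):
        return self._connectors[name].invoke(action, **kwargs)


# Example: register a stubbed Slack-style connector and invoke it.
registry = ConnectorRegistry()
registry.register(Connector(
    "slack",
    {"post": lambda channel, text: f"posted to {channel}: {text}"},
))
print(registry.invoke("slack", "post", channel="#sales", text="Daily summary ready"))
```

The point of the indirection is the one the article describes: the agent plans in terms of connector names and actions, and the registry decides what code actually runs.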
Writer built these connectors using the Model Context Protocol (MCP), an emerging standard for AI system integrations, but added what Shetrit described as an "enterprise-ready" layer on top. "We took a first-principle approach of: You have this MCP connector infrastructure — how do you build it in a way that's enterprise-ready?" Shetrit explained. "What we have today in the industry is definitely not it."

The system can write and execute code on the fly to handle unexpected scenarios. If a user uploads an unfamiliar file format, for instance, the agent will generate code to extract and process the text without requiring a human to intervene.

Jwo demonstrated this capability with a daily workflow she runs: Every morning at 10 a.m., a Routine automatically summarizes her Google Calendar meetings, identifies external participants, finds their LinkedIn profiles, and sends the summary to her via Slack — all without her involvement.

"This was pretty simple, but you can imagine for a salesperson it might say, 'At the end of the day, wrap up a summary of all the calls I had, send me action items, post it to the account-specific Slack channel, and tag these folks so they can accomplish those workflows,'" Jwo said. "That can run continuously each day, each week, or on demand."

From mortgage lenders to CPG brands: Real-world AI agent use cases across industries

The platform is attracting customers across multiple industries. New American Funding, a mortgage lender, uses Writer Agent to automate marketing workflows. Senior Content Marketing Manager Karen Rodriguez uploads Asana project tickets with creative briefs, and the AI executes tasks like updating email campaigns or transforming articles into social media carousels, video scripts, and captions.
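A Routine as described earlier — a saved sequence of steps that fires once per day at a set time, like Jwo's 10 a.m. calendar summary — amounts to a step list plus a due-time check. The sketch below is an illustration of that idea under stated assumptions, not Writer's implementation; the step stubs are hypothetical:

```python
from datetime import datetime, time

# Illustrative "routine" sketch: a saved sequence of steps that runs once
# per day at a given trigger time. Not Writer's implementation.

class Routine:
    def __init__(self, run_at: time, steps):
        self.run_at = run_at          # daily trigger time
        self.steps = steps            # ordered list of callables
        self.last_run_date = None     # guards against running twice a day

    def due(self, now: datetime) -> bool:
        return now.time() >= self.run_at and self.last_run_date != now.date()

    def run(self, now: datetime):
        self.last_run_date = now.date()
        return [step() for step in self.steps]


# Stub of the daily calendar-summary routine from the article.
routine = Routine(time(10, 0), [
    lambda: "summarized calendar",
    lambda: "looked up external participants",
    lambda: "sent summary to Slack",
])

now = datetime(2025, 11, 19, 10, 5)
if routine.due(now):
    print(routine.run(now))
```

A real scheduler would sleep between checks or register with the OS; the guard on `last_run_date` is what turns a one-off playbook into a recurring routine.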
Other use cases span financial services teams creating investment dashboards with PitchBook and FactSet data, consumer packaged goods companies brainstorming new product lines based on social media trends, and marketing teams generating partnership presentations with branded assets.

Writer has added customers including TikTok, Comcast, Keurig Dr Pepper, CAA, and Aptitude Health, joining an existing base that includes Accenture, Qualcomm, Uber, Vanguard, and Marriott. The company now serves more than 300 enterprises and has secured over $50 million in signed contracts, with projections to double that to $100 million this year. The startup's net retention rate — a measure of how much existing customers expand their usage — stands at 160%, meaning customers on average increase their spending by 60% after initial contracts. Twenty customers who started with $200,000-$300,000 contracts now spend about $1 million annually, according to company data.

'Vibe working': Writer's vision for AI-powered productivity beyond coding

Writer executives frame the platform as enabling what they call "vibe working" — a playful reference to the popular term "vibe coding," which describes AI tools like Cursor that dramatically accelerate software development.

"We used to call it transformation when we took 12 steps and made them nine. That's optimizing the world as it is," Habib said at Writer's AI Leaders Forum earlier this month, according to Forbes. "We can now create a new world. That is the greenfield mindset."

Shetrit echoed this framing: "Vibe coding is the theme of 2025. Our view is that 'vibe working' is the theme of 2026. How do you bring the same productivity gains you've seen with coding agents into the workspace in a way that non-technical users can maximize them?"

The platform is powered by Palmyra X5, Writer's proprietary large language model featuring a one-million-token context window — among the largest commercially available.
Writer trained the model for approximately $700,000, a fraction of the estimated $100 million OpenAI spent on GPT-4, by using synthetic data and techniques that halt training when returns diminish. The model can process one million tokens in about 22 seconds and costs 60 cents per million input tokens and $6 per million output tokens — significantly cheaper than comparable offerings, according to company specifications.

Making AI Decisions Visible: Writer's Approach to Trust and Transparency

A distinctive aspect of Writer's approach is transparency into the AI's decision-making process. The interface displays the agent's step-by-step reasoning, showing which data sources it accessed, what code it generated, and how it arrived at outputs.

"There's a very clear exhibition of how the agent is thinking, what it's doing, what it's touching," Shetrit said. "This is important for the end user to trust it, but also important for the IT person or security professional to see what's going on."

This "supervision" model goes beyond simple observability of API calls to encompass what Shetrit described as "a superset of observability" — giving organizations the ability to not just monitor but control AI behavior through policies and permissions. Session logs capture all agent activity when enabled by administrators, and users can submit feedback on every output to help improve system performance.

The platform also emphasizes providing sources and citations for generated content, allowing users to verify information. "With any sort of chat assistant, agentic or not, trust but verify is really important," Jwo said. "That's part of the pillars of us building this and making it enterprise-grade."

What Writer Agent Costs—and Why It's Included in the Base Platform

Writer is including all the new capabilities — Playbooks, Routines, Connectors, and Personality customization — as part of its core platform without additional charges, according to Jwo.
"This is fully included as part of the Writer platform," she said. "We're not charging additional for using Writer Agent."

The "Personality" feature allows individual users, teams, or entire organizations to customize the AI's communication style, ensuring generated content matches brand voice and tone guidelines. This works alongside company-level controls that enforce terminology and style requirements. For highly structured, repetitive tasks, Writer also offers a library of more than 100 pre-built agents and an AI Studio for building custom multi-agent systems aligned with specific business use cases.

The Race to Define Enterprise AI: Can Purpose-Built Platforms Beat Tech Giants?

The launch crystallizes a fundamental tension in how enterprises will adopt AI at scale. While consumer-facing AI tools emphasize individual productivity gains, companies need systems that work reliably across thousands of employees, integrate with existing software infrastructure, maintain regulatory compliance, and deliver measurable business impact.

Writer's wager is that these requirements demand purpose-built enterprise platforms rather than consumer tools adapted for business use. The company's $1.9 billion valuation — achieved in a November 2024 funding round that raised $200 million — suggests investors see merit in this thesis. Backers include Premji Invest, Radical Ventures, ICONIQ Growth, Salesforce Ventures, and Adobe Ventures.

Yet the competitive landscape remains formidable. Microsoft and Google command enormous distribution advantages through their existing enterprise software relationships. OpenAI and Anthropic possess research capabilities that have produced breakthrough models. Whether Writer can maintain its differentiation as these giants expand their enterprise offerings will test the startup's core premise: that serving Fortune 500 companies from day one creates advantages that research labs turned enterprise vendors cannot easily replicate.
"We're entering an era where if you can describe a better way to work, you can build it," Jwo said. "The new Writer Agent democratizes who gets to be a builder, empowering the operational experts and creative problem-solvers in every department to become the architects of their own transformation. That's how you unlock innovation that competitors can't replicate."

The promise is alluring — AI capabilities powerful enough to transform how work gets done, accessible enough for any employee to use, and controlled enough for enterprises to deploy safely at scale. Whether Writer can deliver on that promise at the speed and scale required will determine if its vision of "vibe working" becomes the 2026 theme Shetrit predicts, or just another ambitious attempt to solve enterprise AI's execution problem.

But one thing is certain: In a market where 85% of AI initiatives fail to escape pilot purgatory, Writer is betting that the winners won't be the companies with the most powerful models — they'll be the ones that make those models actually work inside the enterprise.
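Taking the Palmyra X5 figures quoted above at face value ($0.60 per million input tokens, $6 per million output tokens, one million tokens processed in about 22 seconds), the per-request cost and implied throughput work out as a quick back-of-envelope calculation. This is a sketch of the arithmetic only, not official pricing math:

```python
# Back-of-envelope math on the Palmyra X5 figures quoted in the article.
PRICE_IN_PER_M = 0.60    # USD per 1M input tokens (quoted)
PRICE_OUT_PER_M = 6.00   # USD per 1M output tokens (quoted)
TOKENS_PER_RUN = 1_000_000
SECONDS_PER_RUN = 22     # "one million tokens in about 22 seconds"


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the quoted token prices."""
    return (input_tokens * PRICE_IN_PER_M
            + output_tokens * PRICE_OUT_PER_M) / 1_000_000


# Example: a 10,000-token prompt producing a 2,000-token answer.
print(f"cost: ${request_cost(10_000, 2_000):.4f}")
print(f"throughput: ~{TOKENS_PER_RUN / SECONDS_PER_RUN:,.0f} tokens/sec")
```

The example request costs under two cents, which is the scale the article's "significantly cheaper than comparable offerings" claim is about.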

#ai
#llm
#research
#product
Score · 2.77
Microsoft remakes Windows for an era of autonomous AI agents
news
VentureBeat – AI · 1 day ago

Microsoft is fundamentally restructuring its Windows operating system to become what executives call the first "agentic OS," embedding the infrastructure needed for autonomous AI agents to operate securely at enterprise scale — a watershed moment in the evolution of personal computing that positions the 40-year-old platform as the foundation for a new era of human-machine collaboration.

The company announced Tuesday at its Ignite conference that it is introducing native agent infrastructure directly into Windows 11, allowing AI agents — autonomous software programs that can perform complex, multi-step tasks on behalf of users — to discover tools, execute workflows, and interact with applications through standardized protocols while operating in secure, policy-controlled environments separate from user sessions.

The shift is Microsoft's most significant architectural evolution of Windows since the introduction of the modern security model, transforming the operating system from a platform where users manually orchestrate applications into one where they can "simply express your desired outcome, and agents handle the complexity," according to Pavan Davuluri, President of Windows & Devices at Microsoft.

"Windows 11 starts with this notion of secure by design, secure by default," Davuluri said in an exclusive interview with VentureBeat. "And a lot of the work that we're doing today, when we think about the engagement we have with our customers, the expectations they have with us is making sure we are building upon the fact that Windows is the most secure platform for them and is the most resilient platform as well."

The announcements arrive as enterprises are experimenting with AI agents but struggling with fragmented tooling, security concerns, and lack of centralized management — challenges that Microsoft believes only operating system-level integration can solve.
The stakes are enormous: with Windows running on an estimated 1.4 billion devices globally, Microsoft's architectural choices will likely shape how organizations deploy autonomous AI systems for years to come.

New platform primitives create foundation for agent computing

At the core of Microsoft's vision are three new platform capabilities entering preview that fundamentally change how agents operate on Windows.

Agent Connectors provide native support for the Model Context Protocol (MCP), an open standard introduced by Anthropic that allows AI agents to connect with external tools and data sources. Microsoft has built what it calls an "on-device registry" — a secure, manageable repository where developers can register their applications' capabilities as agent connectors, making them discoverable to any compatible agent on the system.

"These are platform capabilities that then become available to all of our customers," Davuluri explained, describing how the Windows file system, for example, becomes an agent connector that any MCP-compatible agent can access with user consent. "We're able to do this in a fashion that can scale for one but it also allows others to participate in the Windows registry for MCP."

The architecture introduces an MCP proxy layer that handles authentication, authorization, and auditing for all communication between agents and connectors. Microsoft is launching with two built-in agent connectors for File Explorer and System Settings, allowing agents to manage files or adjust system configurations like switching between light and dark mode — all with explicit user permission.

Agent Workspace, entering private preview, represents perhaps the most significant security innovation.
It creates what Microsoft describes as "a contained, policy-controlled, and auditable environment where agents can interact with software" — essentially a parallel desktop session where agents operate with their own distinct identity, completely separate from the user's primary session.

"We want to be able to have clarity in the identity of the agent that is operating in the local operating system," Davuluri said, addressing security concerns about agents accessing sensitive data. "We want that session to be a session that is secure, that is policy control, that is manageable, that has transparency and auditability."

Each agent workspace runs with minimal privileges by default, accessing only explicitly granted resources. The system maintains detailed audit logs distinguishing agent actions from user actions — critical for enterprises that need to prove compliance and track all changes to systems and data.

Windows 365 for Agents extends this infrastructure to the cloud, turning Microsoft's Cloud PC offering into execution environments for agents. Instead of running on local devices, agents can operate in secure, policy-controlled virtual machines in Azure, enabling what Microsoft calls "computer-using agents" to interact with legacy applications and perform automation tasks at scale without consuming local compute resources.

Taskbar becomes command center for monitoring AI agents at work

The infrastructure enables significant user interface changes designed to make agents as commonplace as applications. Microsoft is introducing "Ask Copilot on the taskbar," a unified entry point in preview that combines Microsoft 365 Copilot, agent invocation, and traditional search in a single interface. Users will be able to invoke agents using "@" mentions directly from the taskbar, then monitor their progress through familiar UI patterns like hover cards, progress badges, and notifications — all while continuing other work.
When an agent completes a task or needs input, it surfaces updates through the taskbar without disrupting the user's primary workflow. "We've evolved and created new UX in the taskbar to reflect the unique needs of agents performing background tasks on your behalf," said Navjot Virk, Corporate Vice President of Windows Experiences, describing features like progress bars and status badges that indicate when agents are working, need approval, or have completed tasks.

The design philosophy, Virk emphasized, centers on user control. "These experiences are designed to be opt in. We want to give customers full control over when and how they engage with copilots and agents."

For commercial Microsoft 365 Copilot users, the integration goes deeper. Microsoft is embedding Copilot directly into File Explorer, allowing users to ask questions, generate summaries, or draft emails based on document contents without leaving the file management interface. On Copilot+ PCs — devices with neural processing units capable of 40 trillion operations per second — new capabilities include converting any on-screen table into an Excel spreadsheet through the Click to Do feature.

Microsoft bets on open standards against Apple and Google's proprietary approaches

Microsoft's embrace of the open Model Context Protocol, created by Anthropic, marks a strategic bet on openness as enterprises evaluate competing AI platforms from Apple and Google that use proprietary frameworks.

"Windows is an open platform, and by virtue [of being] an open platform, we certainly have the ability to take existing technologies, evolve, harden, adapt those, but we also allow customers to bring their own capabilities to the platform as well," Davuluri said when asked about competing with Apple Intelligence and Google's Android AI for Enterprise.
The company demonstrated this openness with Claude, Anthropic's AI assistant, accessing the Windows file system through agent connectors with user consent — one of numerous partnerships Microsoft has secured. Dynamics 365 is using the File Explorer connector to streamline expense reporting, reducing what was previously a 30-minute, dozen-step process to "one sentence with high accuracy," according to Microsoft's blog post. Other early partners include Manus AI, Dropbox Dash, Roboflow, and Infosys.

"Windows is the platform in which they build upon," Davuluri said of enterprise customers. "And so our ability to take those existing bodies of work they have, and extend them is the, I think, the least friction way for them to go, learn, adopt, experiment and find ways to [scale]."

Security model enforces strict containment and mandatory user consent

Microsoft's security model for agents adheres to what it calls "secure by default" policies aligned with the company's broader Secure Future Initiative. All agent connectors registered in the on-device registry must meet strict requirements around packaging and identity, with applications properly packaged and signed by trusted sources. Developers must explicitly declare the minimum capabilities their agent connectors require, and agents and connectors run in isolated environments with dedicated agent user accounts, separate from human user accounts. Windows requires explicit user approval when agents first access sensitive resources like files or system settings.

"We give Windows the ability to go deliver on the security expectations, and then it is auditable at the end of the day," Davuluri said. "You still want an auditability log that looks similar to perhaps what you use in the cloud. And so all three pieces are built into the design and architecture of Agent Workspace."
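The security model described here combines three things: a distinct agent identity, consent-gated first access to sensitive resources, and an audit trail separating agent actions from user actions. A minimal abstract sketch of that flow might look like the following; every name and field is hypothetical and illustrative, not the actual Windows API:

```python
# Abstract sketch of the consent + audit model described above.
# All names and structures are hypothetical, not the Windows API.

audit_log = []  # (agent_id, resource, outcome) tuples


class AgentWorkspace:
    """An agent session with its own identity and explicit resource grants."""

    def __init__(self, agent_id, granted=None):
        self.agent_id = agent_id              # distinct agent identity
        self.granted = set(granted or [])     # explicitly granted resources

    def request(self, resource, user_approves: bool):
        """First access to a sensitive resource requires explicit approval."""
        if resource not in self.granted:
            if not user_approves:
                audit_log.append((self.agent_id, resource, "denied"))
                raise PermissionError(
                    f"{resource} not approved for {self.agent_id}")
            self.granted.add(resource)        # consent recorded once
        audit_log.append((self.agent_id, resource, "accessed"))
        return f"{self.agent_id} accessed {resource}"


ws = AgentWorkspace("agent://expense-helper")
ws.request("file_explorer", user_approves=True)    # consent granted, logged
try:
    ws.request("system_settings", user_approves=False)
except PermissionError:
    pass                                           # denial is logged too
print(audit_log)
```

Note that denials land in the log as well as successes, which is the property that makes the trail useful for the compliance scenarios the article mentions.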
For IT administrators, Microsoft is introducing management policies through Intune and Group Policy that allow organizations to enable or disable agent features at device and account levels, set minimum security policy levels, and access event logs enumerating all agent connector invocations and errors. The company emphasized that agents operate with restricted privileges, with minimal permissions by default and access granted only to explicitly approved resources that users can revoke at any time.

Post-quantum cryptography and recovery tools address emerging and persistent threats

Beyond agent infrastructure, Microsoft announced significant security and resilience updates addressing both emerging and persistent enterprise challenges. Post-Quantum Cryptography APIs are now generally available in Windows, allowing organizations to begin migrating to encryption algorithms designed to withstand future quantum computing attacks that could break today's cryptographic standards. Microsoft worked closely with the National Institute of Standards and Technology to implement these algorithms.

"We are introducing post quantum cryptography APIs in Windows," Davuluri said. "For customers who want to be able to do cryptographic encryption in their workloads, they can start taking advantage of these APIs in Windows for the first time. That is a huge step forward for us when we think about the future of Windows."

Hardware-accelerated BitLocker will arrive on new devices starting spring 2026, offloading disk encryption to dedicated silicon for faster performance while providing hardware-level key protection. Sysmon functionality is becoming generally available as part of Windows in early 2026, bringing advanced forensics and threat detection capabilities previously available only as a separate download directly into the operating system's event logging system.
The company also detailed progress on its Windows Resiliency Initiative, launched a year ago following the CrowdStrike incident that disrupted 8.5 million Windows devices globally. New recovery capabilities include Quick Machine Recovery with expanded networking support and Autopatch management, allowing IT to remotely fix devices stuck in the Windows Recovery Environment. Point-in-time restore, entering preview, rolls back devices to earlier states to resolve update conflicts or configuration errors, while Cloud rebuild, in preview, allows IT to remotely rebuild malfunctioning devices by downloading fresh installation media and using Autopilot for zero-touch provisioning.

Microsoft is also raising security requirements for third-party drivers across the Windows ecosystem. Following updated requirements for antivirus drivers effective April 1, 2025, the company is expanding this approach to other driver classes including networking, cameras, USB, printers, and storage — requiring higher certification standards, adding compiler safeguards, and providing more Windows in-box drivers to reduce reliance on third-party kernel-mode code.

Measured rollout reflects enterprise caution around autonomous software

Microsoft is positioning these updates as essential infrastructure for what it calls "Frontier Firms" — organizations that "blend human ingenuity with intelligent systems to deliver real outcomes." However, the company emphasized a cautious, opt-in approach that reflects enterprise concerns about autonomous software agents.

"The principles we're using in designing these new platform capabilities accounts for the reality that we have a very, very broad user base," Davuluri said. "A lot of the features and capabilities we're building are opt in capabilities. And so it is our goal to be able to have users find value in the workflow and meet them."
Virk emphasized the measured approach: "This is more about meeting customers where they are and then taking them on this journey when they are ready. So there's the optionality, but also having support for it. And really important thing is that they should feel comfortable. They should feel secure."

Microsoft's bet is that only operating system-level integration can provide the security, governance, and user experience required for mainstream AI agent adoption. Whether that vision materializes will depend on developer adoption, enterprise comfort with autonomous software, and Microsoft's ability to balance innovation with the stability that 40 years of Windows customers expect.

After four decades of putting users in control of their computers, Windows is now asking them to share that control with machines.

#ai
#product
#open_source
Score · 2.77
Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks
news
VentureBeat – AI1 day ago

After more than a month of rumors and feverish speculation — including Polymarket wagering on the release date — Google today unveiled Gemini 3, its newest proprietary frontier model family and the company's most comprehensive AI release since the Gemini line debuted in 2023. The models are proprietary (closed-source), available exclusively through Google products, developer platforms, and paid APIs, including Google AI Studio, Vertex AI, the Gemini command line interface (CLI) for developers, and third-party integrations across the broader integrated developer environment (IDE) ecosystem.

Gemini 3 arrives as a full portfolio, including:

- Gemini 3 Pro: the flagship frontier model
- Gemini 3 Deep Think: an enhanced reasoning mode
- Generative interface models powering Visual Layout and Dynamic View
- Gemini Agent for multi-step task execution
- The Gemini 3 engine embedded in Google Antigravity, the company's new agent-first development environment

"This is the best model in the world, by a crazy wide margin!" wrote Google DeepMind Research Scientist Yi Tay on X.

Indeed, independent AI benchmarking and analysis organization Artificial Analysis has already crowned Gemini 3 Pro the "new leader in AI" globally, with a top score of 73 on the organization's index. That vaults Google up from its former placement of 9th overall, where the preceding Gemini 2.5 Pro model's score of 60 trailed models from OpenAI, Moonshot AI, xAI, Anthropic, and MiniMax. As Artificial Analysis wrote on X: "For the first time, Google has the most intelligent model."

Another independent leaderboard site, LMArena, reported that Gemini 3 Pro ranked first in the world across all of its major evaluation tracks, including text reasoning, vision, coding, and web development.
In a public post, the @arena account on X said the model surpassed even the newly released (hours-old) Grok-4.1, as well as Claude 4.5 and GPT-5-class systems, in categories such as math, long-form queries, creative writing, and several occupational benchmarks. The post also highlighted the scale of gains over Gemini 2.5 Pro, including a 50-point jump in text Elo, a 70-point increase in vision, and a 280-point rise in web-development tasks. While these results reflect live community voting and remain preliminary, they signal unusually broad performance improvements across domains where previous Gemini models trailed competitors.

What It Means For Google In the Hotly Competitive AI Race

The launch represents one of Google's largest, most tightly coordinated model releases. Gemini 3 is shipping simultaneously across Google Search, the Gemini app, Google AI Studio, Vertex AI, and a range of developer tools. Executives emphasized that this integration reflects Google's control of tensor processing unit (TPU — its homegrown rival to Nvidia's GPUs) hardware, data center infrastructure, and consumer products. According to the company, the Gemini app now has more than 650 million monthly active users, more than 13 million developers build with Google's AI tools, and more than 2 billion monthly users engage with Gemini-powered AI Overviews in Search.

At the center of the release is a shift toward agentic AI — systems that plan, act, navigate interfaces, and coordinate tools, rather than just generating text. Gemini 3 is designed to translate high-level instructions into multi-step workflows across devices and applications, with the ability to generate functional interfaces, run tools, and manage complex tasks.

Major Performance Gains Over Gemini 2.5 Pro

Gemini 3 Pro introduces large gains over Gemini 2.5 Pro across reasoning, mathematics, multimodality, tool use, coding, and long-horizon planning.
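For context on what the Elo figures above mean in practice, here is a quick sketch using the standard Elo expected-score formula. This illustrates the rating math only, not LMArena's exact methodology (which adds style controls and confidence intervals):

```python
# Standard Elo expected-score formula: what a rating gap implies for the
# probability-like expected score of one model beating another in pairwise
# community votes. Illustration only; not LMArena's full methodology.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# The reported 50-point text-Elo jump (e.g. 1501 vs. 1451) corresponds to
# roughly a 57% expected win rate in head-to-head comparisons:
p = expected_score(1501, 1451)  # ~0.571
```

A 50-point gap is thus meaningful but far from dominant at the level of individual comparisons; the 280-point web-development gap implies closer to an 83% expected win rate.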
Google's benchmark disclosures show substantial improvements in many categories. Gemini 3 Pro debuted at the top of the LMArena text-reasoning leaderboard, posting a preliminary Elo score of 1501 based on pre-release community voting — the first LLM ever to cross the 1500 threshold. That places it above xAI's newly announced Grok-4.1-thinking model (1484) and Grok-4.1 (1465), both of which were unveiled just hours earlier, as well as above Gemini 2.5 Pro (1451) and recent Claude Sonnet and Opus releases. While LMArena covers only text-reasoning performance and the results are labeled preliminary, this ranking positions Gemini 3 Pro as the strongest publicly evaluated model on that benchmark as of its launch day — though not necessarily the top performer in the world across all modalities, tasks, or evaluation suites.

In mathematical and scientific reasoning, Gemini 3 Pro scored 95 percent on AIME 2025 without tools and 100 percent with code execution, compared to 88 percent for its predecessor. On GPQA Diamond, it reached 91.9 percent, up from 86.4 percent. The model also recorded a major jump on MathArena Apex, reaching 23.4 percent versus 0.5 percent for Gemini 2.5 Pro, and delivered 31.1 percent on ARC-AGI-2 compared to 4.9 percent previously.

ARC-AGI-2 is the second-generation version of the Abstraction and Reasoning Corpus (ARC), a benchmark introduced by AI researcher François Chollet to measure generalization, not memorization. Unlike typical multiple-choice or dataset-based evaluations, ARC-AGI-2 presents models with tiny grid-based puzzles that require discovering and applying abstract rules. Each task provides a few input–output examples, and the model must infer the underlying transformation and apply it to a new test case. The problems span visual pattern recognition, symbolic manipulation, object transformations, spatial reasoning, and rule induction — all designed to test reasoning capabilities that do not depend on training-set familiarity.
The new ARC-AGI-2 variant is deliberately constructed to be out-of-distribution and resistant to memorization, making it one of the most difficult benchmarks for large language models. Its tasks are engineered to stress-test whether a model can infer a previously unseen rule purely from examples, a proxy for early forms of generalized problem-solving. Astonishingly, the "Deep Think" version of Gemini 3, designed to take longer to solve problems and use more reasoning, scored 45.1%, a substantial jump over prior frontier models, which typically score in the mid-teens to low twenties. It also far exceeds Gemini 3 Pro's 31.1% and is an order-of-magnitude improvement over older Gemini releases. These results suggest that Deep Think's architecture is particularly effective at multi-step hypothesis generation, checking, and revision — the specific capabilities ARC-AGI-2 is designed to measure.

Multimodal performance increased across the board. Gemini 3 Pro scored 81 percent on MMMU-Pro, up from 68 percent, and 87.6 percent on Video-MMMU, compared to 83.6 percent. Its result on ScreenSpot-Pro, a key benchmark for agentic computer use, rose from 11.4 percent to 72.7 percent. Document understanding and chart reasoning also improved.

Coding and tool-use performance showed equally significant gains. The model's LiveCodeBench Pro score reached 2,439, up from 1,775. On Terminal-Bench 2.0 it achieved 54.2 percent versus 32.6 percent previously. SWE-Bench Verified, which measures agentic coding through structured fixes, increased from 59.6 percent to 76.2 percent. The model also posted 85.4 percent on t2-bench, up from 54.9 percent.

Long-context and planning benchmarks indicate more stable multi-step behavior. Gemini 3 achieved 77 percent on MRCR v2 at 128k context (versus 58 percent) and 26.3 percent at 1 million tokens (versus 16.4 percent).
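To make the ARC task format concrete, here is a toy sketch (not Google's or Chollet's benchmark code): each task supplies a few input-to-output grid pairs, and a solver must find the transformation consistent with all of them and apply it to a new input. Real ARC-AGI-2 rules are far richer than the tiny hand-picked candidate set used here:

```python
# Toy illustration of the ARC task format: infer a hidden grid transformation
# from a few example pairs, then apply it to a held-out test input. The
# candidate-rule space here is deliberately tiny; real ARC tasks require
# open-ended rule induction, which is what makes the benchmark hard.

def transpose(g): return [list(r) for r in zip(*g)]
def flip_h(g): return [row[::-1] for row in g]
def flip_v(g): return g[::-1]
def rotate_cw(g): return [list(r) for r in zip(*g[::-1])]

CANDIDATES = {
    "identity": lambda g: [row[:] for row in g],
    "transpose": transpose,
    "flip_horizontal": flip_h,
    "flip_vertical": flip_v,
    "rotate_cw": rotate_cw,
}

def solve(train_pairs, test_input):
    """Return (rule_name, prediction) for the first candidate rule that is
    consistent with every training pair, or (None, None) if none fits."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train_pairs):
            return name, fn(test_input)
    return None, None

# A task whose hidden rule is "mirror each row":
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
         ([[5, 0, 7]], [[7, 0, 5]])]
rule, prediction = solve(train, [[9, 8]])  # -> "flip_horizontal", [[8, 9]]
```

The brute-force search above works only because the rule space is enumerable; ARC-AGI-2's difficulty comes precisely from rules that cannot be pre-listed, forcing genuine hypothesis generation and revision.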
Its Vending-Bench 2 score reached $5,478.16, compared to $573.64 for Gemini 2.5 Pro, reflecting stronger consistency during long-running decision processes. Language understanding scores improved on SimpleQA Verified (72.1 percent versus 54.5 percent), MMLU (91.8 percent versus 89.5 percent), and the FACTS Benchmark Suite (70.5 percent versus 63.4 percent), supporting more reliable fact-based work in regulated sectors.

Generative Interfaces Move Gemini Beyond Text

Gemini 3 introduces a new class of generative interface capabilities in the consumer-facing Google Search AI Mode and for developers through Google AI Studio. Visual Layout produces structured, magazine-style pages with images, diagrams, and modules tailored to the query. Dynamic View generates functional interface components such as calculators, simulations, galleries, and interactive graphs. These experiences will be available starting today globally in Google Search's AI Mode, enabling models to surface information in visual, interactive formats beyond static text.

Developers can reproduce similar UI elements through Google AI Studio and the Gemini API, but the full consumer-facing interface types are not available as direct API outputs; instead, developers receive the underlying code or schema to render these components themselves. The branded Visual Layout and Dynamic View formats are therefore specific to Search and not exposed as standalone API features.

Google says the model analyzes user intent to construct the layout best suited to a task. In practice, this includes everything from automatically building diagrams for scientific concepts to generating custom UI components that respond to user input.

Google held a press call the day before the Gemini 3 announcement to brief reporters on the model family, its intended use cases, and how it differed from earlier Gemini releases.
The call was led by multiple Google and DeepMind executives, who walked through the model's capabilities and framed Gemini 3 as a step toward more reliable, multi-step agentic systems that can operate across Google's ecosystem. During the briefing, speakers emphasized that Gemini 3 was engineered to support more consistent long-horizon reasoning, better tool use, and smoother planning loops than Gemini 2.5 Pro. One presenter said the model benefits from an architecture that allows it to generate and evaluate multiple hypotheses in parallel, improving reliability on mathematically hard questions and complex procedural tasks. Another speaker explained that Gemini 3's improved spatial reasoning enables more robust interaction with interface elements, which supports agentic workflows across screens and applications.

Presenters highlighted growing enterprise adoption, noting strong demand for multimodal analysis, structured document reasoning, and agentic coding tools. They said Gemini 3's performance on multimodal and scientific benchmarks reflected Google's focus on grounded, verifiable reasoning. They also discussed Gemini 3's safety processes and improvements, including reduced sycophancy, stronger prompt-injection resistance, and a more structured evaluation pipeline guided by Google's Frontier Safety Framework, introduced back in 2024.

A portion of the call was dedicated to developer experience. Google described updates to its AI Studio and API that allow developers to control thinking depth, adjust model "resolution," and combine new grounding tools with URL context and Search. Demos showed Gemini 3 generating application interfaces, managing tool sequences, and debugging code in Antigravity, illustrating the model's shift toward agentic operation rather than single-step generation.
The call positioned Gemini 3 as an upgrade across reasoning, planning, multimodal understanding, and developer workflows, with Google framing these advances as the foundation for its next generation of agent-driven products and enterprise services.

Gemini Agent Introduces Multi-Step Workflow Automation

Gemini Agent marks Google's effort to move beyond conversational assistance toward operational AI. The system coordinates multi-step tasks across tools like Gmail, Calendar, Canvas, and live browsing. It reviews inboxes, drafts replies, prepares plans, triages information, and reasons through complex workflows, while requiring user approval before performing sensitive actions. On the press call the day before the release, Google said the agent is designed to handle multi-turn planning and tool-use sequences with a consistency that was not feasible in earlier generations. It is rolling out first to Google AI Ultra subscribers in the Gemini app.

Google Antigravity and Developer Toolchain Integration

Antigravity is Google's new agent-first development environment designed around Gemini 3. Developers collaborate with agents across an editor, terminal, and browser. The system orchestrates full-stack tasks, including code generation, UI prototyping, debugging, live execution, and report generation.

Across the broader developer ecosystem, Google AI Studio now includes a Build mode that automatically wires the right models and APIs to speed up AI-native app creation. Annotations support allows developers to attach prompts to UI elements for faster iteration. Spatial reasoning improvements enable agents to interpret mouse movements, screen annotations, and multi-window layouts to operate computer interfaces more effectively. Developers also gain new reasoning controls through "thinking level" and "model resolution" parameters in the Gemini API, along with stricter validation of thought signatures for multi-turn consistency.
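As a rough sketch of how the "thinking level" control just described might appear in a request, the snippet below builds a payload dict. The parameter concept comes from Google's announcement, but the exact JSON field names and the model identifier here are assumptions for illustration; consult the official Gemini API reference before relying on them:

```python
# Hedged sketch: a Gemini API-style request payload exposing the new
# "thinking level" control described in the article. Field names
# ("thinkingConfig", "thinkingLevel") and the model id are assumptions
# for illustration only -- verify against Google's API documentation.

def build_request(prompt: str, thinking_level: str = "high") -> dict:
    return {
        "model": "gemini-3-pro-preview",  # hypothetical model id
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Article: developers can "control thinking depth" via a
            # "thinking level" parameter; the wire format is assumed here.
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

req = build_request("Summarize this log file.", thinking_level="low")
```

The idea is that lowering the thinking level trades reasoning depth for latency and cost on simple tasks, while "high" buys the longer hypothesis-generation loops Google described on the press call.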
A hosted server-side bash tool supports secure, multi-language code generation and prototyping. Grounding with Google Search and URL context can now be combined to extract structured information for downstream tasks.

Enterprise Impact and Adoption

Enterprise teams gain the multimodal understanding, agentic coding, and long-horizon planning needed for production use cases. The new model unifies analysis of documents, audio, video, workflows, and logs. Improvements in spatial and visual reasoning support robotics, autonomous systems, and scenarios requiring navigation of screens and applications. High-frame-rate video understanding helps developers detect events in fast-moving environments.

Gemini 3's structured document understanding capabilities support legal review, complex form processing, and regulated workflows. Its ability to generate functional interfaces and prototypes with minimal prompting reduces engineering cycles. In addition, the gains in system reliability, tool-calling stability, and context retention make multi-step planning viable for operations like financial forecasting, customer support automation, supply chain modeling, and predictive maintenance.

Developer and API Pricing

Google has disclosed initial API pricing for Gemini 3 Pro. In preview, the model is priced at $2 per million input tokens and $12 per million output tokens for prompts up to 200,000 tokens in Google AI Studio and Vertex AI. For prompts that exceed 200,000 tokens, input pricing doubles to $4 per million tokens, while output pricing rises to $18 per million tokens.

Compared to API pricing for other frontier models from rival labs, Gemini 3 sits in the mid-to-high range, which may affect adoption as cheaper, open-source (permissively licensed) Chinese models are increasingly adopted by U.S. startups.
Here's how it stacks up:

Model                     Input (/1M tokens)   Output (/1M tokens)   Total Cost   Source
ERNIE 4.5 Turbo           $0.11                $0.45                 $0.56        Qianfan
ERNIE 5.0                 $0.85                $3.40                 $4.25        Qianfan
Qwen3 (Coder ex.)         $0.85                $3.40                 $4.25        Qianfan
GPT-5.1                   $1.25                $10.00                $11.25       OpenAI
Gemini 2.5 Pro (≤200K)    $1.25                $10.00                $11.25       Google
Gemini 3 Pro (≤200K)      $2.00                $12.00                $14.00       Google
Gemini 2.5 Pro (>200K)    $2.50                $15.00                $17.50       Google
Gemini 3 Pro (>200K)      $4.00                $18.00                $22.00       Google
Grok 4 (0709)             $3.00                $15.00                $18.00       xAI API
Claude Opus 4.1           $15.00               $75.00                $90.00       Anthropic

Gemini 3 Pro is also available at no charge, with rate limits, in Google AI Studio for experimentation. The company has not yet announced pricing for Gemini 3 Deep Think, extended context windows, generative interfaces, or tool invocation. Enterprises planning deployment at scale will require these details to estimate operational costs.

Multimodal, Visual, and Spatial Reasoning Enhancements

Gemini 3's improvements in embodied and spatial reasoning support pointing and trajectory prediction, task progression, and complex screen parsing. These capabilities extend to desktop and mobile environments, enabling agents to interpret screen elements, respond to on-screen context, and unlock new forms of computer-use automation. The model also delivers improved video reasoning, with high-frame-rate understanding for analyzing fast-moving scenes and long-context video recall for synthesizing narratives across hours of footage. Google's examples show the model generating full interactive demo apps directly from prompts, illustrating the depth of multimodal and agentic integration.

Vibe Coding and Agentic Code Generation

Gemini 3 advances Google's concept of "vibe coding," where natural language acts as the primary syntax. The model can translate high-level ideas into full applications with a single prompt, handling multi-step planning, code generation, and visual design.
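The tiered Gemini 3 Pro preview pricing quoted above ($2/$12 per million input/output tokens up to 200K-token prompts, $4/$18 beyond) can be turned into a quick per-request cost estimator. A minimal sketch; actual billing may add caching discounts, tool-invocation charges, and rounding not modeled here:

```python
# Cost estimator for Gemini 3 Pro's tiered preview pricing as reported in
# the article. Sketch only: real invoices may include context caching,
# tool charges, and batch discounts; verify against Google's price list.

def gemini3_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the quoted preview rates."""
    if input_tokens <= 200_000:
        in_rate, out_rate = 2.00, 12.00   # $ per 1M tokens
    else:
        in_rate, out_rate = 4.00, 18.00   # long-prompt tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a 100K-token prompt producing a 5K-token answer:
cost = gemini3_pro_cost(100_000, 5_000)  # -> 0.26
```

Note how the tier boundary matters: the same 10K-token answer costs more than four times as much attached to a 250K-token prompt as to a 100K-token one, which is why enterprises will want the still-unannounced extended-context pricing details.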
Enterprise partners like Figma, JetBrains, Cursor, Replit, and Cline report stronger instruction following, more stable agentic operation, and better long-context code manipulation compared to prior models.

Rumors and Rumblings

In the weeks leading up to the announcement, X became a hub of speculation about Gemini 3. Well-known accounts such as @slow_developer suggested internal builds were significantly ahead of Gemini 2.5 Pro and likely exceeded competitor performance in reasoning and tool use. Others, including @synthwavedd and @VraserX, noted mixed behavior in early checkpoints but acknowledged Google's advantage in TPU hardware and training data. Viral clips from users like @lepadphone and @StijnSmits showed the model generating websites, animations, and UI layouts from single prompts, adding to the momentum.

Prediction markets on Polymarket amplified the speculation. Whale accounts drove the odds of a mid-November release sharply upward, prompting widespread debate about insider activity. A temporary dip during a global Cloudflare outage became a moment of humor and conspiracy before odds surged again.

The key moment came when users including @cheatyyyy shared what appeared to be an internal model-card benchmark table for Gemini 3 Pro. The image circulated rapidly, with commentary from figures like @deedydas and @kimmonismus arguing the numbers suggested a significant lead. When Google published the official benchmarks, they matched the leaked table exactly, confirming the document's authenticity.

By launch day, enthusiasm reached a peak. A brief "Geminiii" post from Sundar Pichai triggered widespread attention, and early testers quickly shared real examples of Gemini 3 generating interfaces, full apps, and complex visual designs. While some concerns about pricing and efficiency appeared, the dominant sentiment framed the launch as a turning point for Google and a display of its full-stack AI capabilities.
Safety and Evaluation

Google says Gemini 3 is its most secure model yet, with reduced sycophancy, stronger prompt-injection resistance, and better protection against misuse. The company partnered with external groups, including Apollo and Vaultis, and conducted evaluations using its Frontier Safety Framework.

Deployment Across Google Products

Gemini 3 is available across Google Search AI Mode, the Gemini app, Google AI Studio, Vertex AI, the Gemini CLI, and Google's new agentic development platform, Antigravity. Google says additional Gemini 3 variants will arrive later.

Conclusion

Gemini 3 represents Google's largest step forward in reasoning, multimodality, enterprise reliability, and agentic capabilities. The model's performance gains over Gemini 2.5 Pro are substantial across mathematical reasoning, vision, coding, and planning. Generative interfaces, Gemini Agent, and Antigravity demonstrate a shift toward systems that not only respond to prompts but plan tasks, construct interfaces, and coordinate tools. Combined with an unusually intense hype and leak cycle, the launch marks a significant moment in the AI landscape as Google moves aggressively to expand its presence across both consumer-facing and enterprise-facing AI workflows.

#ai
#llm
#research
#product
#open_source
Score · 2.77
Microsoft’s Agent 365 Tries to Be the AI Bot Boss
news
WIRED1 day ago

Microsoft still sees AI agents as the future of work, and the enterprise software giant wants companies to be able to manage those agents just like human employees.

#ai
Score · 2.82
Ambient and autonomous security for the agentic era
news
Microsoft Security1 day ago

In the agentic era, security must be ambient and autonomous, like the AI it protects. This is our vision for security, where security becomes the core primitive.

#ai
Score · 2.92