Tags

Browse by signal

Fast keyword tagging derived from titles and summaries. Expect more nuance as we add model-assisted tagging.

Top results tagged #ai

Becoming an AI-detective is a job I never wanted and wish I could quit | Samantha Floreani
news
The Guardianabout 2 months ago

My social media feed is now a hellish stream of puerile AI slop. Am I stubborn to want to hang on to reality? Recently, a friend sent me a video of a man dressed as a pickle. Following a high-octane car chase, the pickle flung himself out of the car and flailed down the highway. It was stupid and we laughed. But it also wasn’t real. When I pointed out to my friend that the video was AI-generated, she was taken by surprise, noting she’s usually pretty good at spotting them. She was also frustrated: “I hate having to be on the constant lookout for AI trash,” she lamented in the chat. And I feel that. Becoming an AI detective is a job I never wanted and wish I could quit. Continue reading...

#ai
Score · 3.00
Former MrBeast content strategist is building an AI tool for creator ideation and analytics
news
TechCrunch – AIabout 2 months ago

Palo is tool for short video creators to gain insights into what is working for them.

#ai
Score · 2.99
Amazon’s AI chatbot Rufus drove sales on Black Friday
news
TechCrunch – AIabout 1 month ago

On Black Friday, Amazon sessions that resulted in a sale were up 100% in the U.S. when the AI chatbot Rufus was used. They only increased by 20% when Rufus wasn't used.

#ai
Score · 2.98
Sneaky 2FA Phishing Kit Adds BitB Pop-ups Designed to Mimic the Browser Address Bar
news
The Hacker Newsabout 2 months ago

The malware authors associated with a Phishing-as-a-Service (PhaaS) kit known as Sneaky 2FA have incorporated Browser-in-the-Browser (BitB) functionality into their arsenal, underscoring the continued evolution of such offerings and further making it easier for less-skilled threat actors to mount attacks at scale. Push Security, in a report shared with The Hacker News, said it observed the use

#ai
#open_source
Score · 2.93
Can AI Look at Your Retina and Diagnose Alzheimer’s? Eric Topol Hopes So
news
WIREDabout 1 month ago

The author of Super Agers believes AI could bring big changes to the world of medicine.

#ai
Score · 2.92
You should still learn to code, says top Google AI exec - here's why
news
ZDNet - Securityabout 2 months ago

ZDNET sat down with Andrew Ng at AI Dev 25 in New York to talk about developer futures, responsible AI, and why AGI is overhyped.

#ai
Score · 2.92
BP’s Geospatial AI Engine: Transforming Safety and Operations with Databricks
news
Databricks Blogabout 1 month ago

The integration of DATABRICKS capabilities with geospatial technology marks a significant...

#ai
Score · 2.92
​​Ambient and autonomous security for the agentic era​​
news
Microsoft Securityabout 2 months ago

In the agentic era, security must be ambient and autonomous, like the AI it protects. This is our vision for security, where security becomes the core primitive. The post ​​Ambient and autonomous security for the agentic era​​ appeared first on Microsoft Security Blog .

#ai
Score · 2.92
MCP-Powered Financial AI Workflows on Databricks
news
Databricks Blogabout 1 month ago

To understand the foundations of Model Context Protocol (MCP) and Agent Bricks, see...

#ai
Score · 2.92
A
paper
arXiv cs.LGabout 2 months ago

We investigate the scaling behaviour of multimodal transformers across data regimes.

#ai
#research
Score · 2.92
Google's Antigravity puts coding productivity before AI hype - and the result is astonishing
news
ZDNet - Securityabout 2 months ago

This new AI coding environment looks like a real winner. Here's why.

#ai
#product
Score · 2.92
Best early Black Friday laptop deals 2025: 21+ sales out early
news
ZDNet - Securityabout 2 months ago

Black Friday is next week, but retailers are gearing up with early sales now. Here are the best we've found so far.

#ai
Score · 2.90
Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans
news
VentureBeat – AIabout 2 months ago

Anthropic released its most capable artificial intelligence model yet on Monday, slashing prices by roughly two-thirds while claiming state-of-the-art performance on software engineering tasks — a strategic move that intensifies the AI startup's competition with deep-pocketed rivals OpenAI and Google. The new model, Claude Opus 4.5 , scored higher on Anthropic's most challenging internal engineering assessment than any human job candidate in the company's history, according to materials reviewed by VentureBeat. The result underscores both the rapidly advancing capabilities of AI systems and growing questions about how the technology will reshape white-collar professions. The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens — a dramatic reduction from the $15 and $75 rates for its predecessor, Claude Opus 4.1 , released earlier this year. The move makes frontier AI capabilities accessible to a broader swath of developers and enterprises while putting pressure on competitors to match both performance and pricing. "We want to make sure this really works for people who want to work with these models," said Alex Albert, Anthropic's head of developer relations, in an exclusive interview with VentureBeat. "That is really our focus: How can we enable Claude to be better at helping you do the things that you don't necessarily want to do in your job?" The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max that can work autonomously for extended periods. Google unveiled Gemini 3 just last week, prompting concerns even from OpenAI about the search giant's progress, according to a recent report from The Information. Opus 4.5 demonstrates improved judgment on real-world tasks, developers say Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified , a benchmark measuring real-world software engineering tasks, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%), Anthropic's own Sonnet 4.5 (77.2%), and Google's Gemini 3 Pro (76.2%), according to the company's data. The result marks a notable advance over OpenAI's current state-of-the-art model, which was released just five days earlier. But the technical benchmarks tell only part of the story. Albert said employee testers consistently reported that the model demonstrates improved judgment and intuition across diverse tasks — a shift he described as the model developing a sense of what matters in real-world contexts. "The model just kind of gets it," Albert said. "It just has developed this sort of intuition and judgment on a lot of real world things that feels qualitatively like a big jump up from past models." He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but hesitated to trust their synthesis or prioritization. With Opus 4.5, he's delegating more complete tasks, connecting it to Slack and internal documents to produce coherent summaries that match his priorities. Opus 4.5 outscores all human candidates on company's toughest engineering test The model's performance on Anthropic's internal engineering assessment marks a notable milestone. The take-home exam, designed for prospective performance engineering candidates, is meant to evaluate technical ability and judgment under time pressure within a prescribed two-hour limit. Using a technique called parallel test-time compute — which aggregates multiple attempts from the model and selects the best result — Opus 4.5 scored higher than any human candidate who has taken the test, according to company. Without a time limit, the model matched the performance of the best-ever human candidate when used within Claude Code, Anthropic's coding environment. The company acknowledged that the test doesn't measure other crucial professional skills such as collaboration, communication, or the instincts that develop over years of experience. Still, Anthropic said the result "raises questions about how AI will change engineering as a profession." Albert emphasized the significance of the finding. "I think this is kind of a sign, maybe, of what's to come around how useful these models can actually be in a work context and for our jobs," he said. "Of course, this was an engineering task, and I would say models are relatively ahead in engineering compared to other fields, but I think it's a really important signal to pay attention to." Dramatic efficiency improvements cut token usage by up to 76% on key benchmarks Beyond raw performance, Anthropic is betting that efficiency improvements will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens — the units of text that AI systems process — to achieve similar or better outcomes compared to predecessors. At a medium effort level, Opus 4.5 matches the previous Sonnet 4.5 model's best score on SWE-bench Verified while using 76% fewer output tokens, according to Anthropic. At the highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points while still using 48% fewer tokens. To give developers more control, Anthropic introduced an "effort parameter" that allows users to adjust how much computational work the model applies to each task — balancing performance against latency and cost. Enterprise customers provided early validation of the efficiency claims. "Opus 4.5 beats Sonnet 4.5 and competition on our internal benchmarks, using fewer tokens to solve the same problems," said Michele Catasta, president of Replit, a cloud-based coding platform, in a statement to VentureBeat. "At scale, that efficiency compounds." GitHub's chief product officer, Mario Rodriguez, said early testing shows Opus 4.5 "surpasses internal coding benchmarks while cutting token usage in half, and is especially well-suited for tasks like code migration and code refactoring." Early customers report AI agents that learn from experience and refine their own skills One of the most striking capabilities demonstrated by early customers involves what Anthropic calls "self-improving agents" — AI systems that can refine their own performance through iterative learning. Rakuten , the Japanese e-commerce and internet company, tested Claude Opus 4.5 on automation of office tasks. "Our agents were able to autonomously refine their own capabilities — achieving peak performance in 4 iterations while other models couldn't match that quality after 10," said Yusuke Kaji, Rakuten's general manager of AI for business. Albert explained that the model isn't updating its own weights — the fundamental parameters that define an AI system's behavior — but rather iteratively improving the tools and approaches it uses to solve problems. "It was iteratively refining a skill for a task and seeing that it's trying to optimize the skill to get better performance so it could accomplish this task," he said. The capability extends beyond coding. Albert said Anthropic has observed significant improvements in creating professional documents, spreadsheets, and presentations. "They're saying that this has been the biggest jump they've seen between model generations," Albert said. "So going even from Sonnet 4.5 to Opus 4.5, bigger jump than any two models back to back in the past." Fundamental Research Labs , a financial modeling firm, reported that "accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie. New features target Excel users, Chrome workflows and eliminate chat length limits Alongside the model release, Anthropic rolled out a suite of product updates aimed at enterprise users. Claude for Excel became generally available for Max, Team, and Enterprise users with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users. Perhaps most significantly, Anthropic introduced " infinite chats " — a feature that eliminates context window limitations by automatically summarizing earlier parts of conversations as they grow longer. "Within Claude AI, within the product itself, you effectively get this kind of infinite context window due to the compaction, plus some memory things that we're doing," Albert explained. For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that invokes functions directly. Claude Code gained an updated "Plan Mode" and became available on desktop in research preview, enabling developers to run multiple AI agent sessions in parallel. Market heats up as OpenAI, Google race to match performance and pricing Anthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually jumped eightfold year-over-year. The rapid release of Opus 4.5 — just weeks after Haiku 4.5 in October and Sonnet 4.5 in September — reflects broader industry dynamics. OpenAI released multiple GPT-5 variants throughout 2025, including a specialized Codex Max model in November that can work autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after months of development. Albert attributed Anthropic's accelerated pace partly to using Claude to speed its own development. "We're seeing a lot of assistance and speed-up by Claude itself, whether it's on the actual product building side or on the model research side," he said. The pricing reduction for Opus 4.5 could pressure margins while potentially expanding the addressable market. "I'm expecting to see a lot of startups start to incorporate this into their products much more and feature it prominently," Albert said. Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. The AI market is projected to top $1 trillion in revenue within a decade, but no single provider has established dominant market position—even as models reach a threshold where they can meaningfully automate complex knowledge work. Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 "a notable improvement over the prior Claude models inside Cursor, with improved pricing and intelligence on difficult coding tasks." Scott Wu, CEO of Cognition, an AI coding startup, said the model delivers "stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions." For enterprises and developers, the competition translates to rapidly improving capabilities at falling prices. But as AI performance on technical tasks approaches—and sometimes exceeds—human expert levels, the technology's impact on professional work becomes less theoretical. When asked about the engineering exam results and what they signal about AI's trajectory, Albert was direct: "I think it's a really important signal to pay attention to."

#ai
#research
#product
#open_source
Score · 2.90
Microsoft's new AI agents won't just help us code, now they'll decide what to code
news
ZDNet - Securityabout 2 months ago

From Agent 365 to Foundry's MCP tool catalog and new IQ services, Microsoft is moving beyond copilots and toward a future where software development becomes an automated assembly process.

#ai
Score · 2.90
Google’s Gemini 3 Pro turns sparse MoE and 1M token context into a practical engine for multimodal agentic workloads
news
MarkTechPostabout 2 months ago

How do we move from language models that only answer prompts to systems that can reason over million token contexts, understand real world signals, and reliably act as agents on our behalf? Google just released Gemini 3 family with Gemini 3 Pro as the centerpiece that positions as a major step toward more general AI […] The post Google’s Gemini 3 Pro turns sparse MoE and 1M token context into a practical engine for multimodal agentic workloads appeared first on MarkTechPost .

#ai
#product
Score · 2.90