- 01 Anthropic released Claude Opus 4.6, expanding its context window from 200,000 to 1 million tokens — a five-fold increase enabling roughly 3,000 pages of text in a single conversation
- 02 On GDPval-AA, Opus 4.6 outperformed OpenAI’s GPT-5.2 by approximately 144 Elo points, translating to a ~70% win rate in head-to-head comparisons
- 03 Opus 4.6 scores 76% on MRCR v2 (retrieval in massive document sets) compared to just 18.5% for Sonnet 4.5
- 04 New agent teams feature allows multiple AI agents to work simultaneously — one early user reported Opus 4.6 autonomously closed 13 issues and assigned 12 across a ~50-person organization in a single day
- 05 A $285 billion selloff in software and services stocks swept through markets over two days following Anthropic’s release of legal plugins — Thomson Reuters fell 15.83%, LegalZoom nearly 20%

Anthropic released Claude Opus 4.6 on Thursday, its most capable model yet, as the enterprise AI race accelerates and financial services companies grapple with the concrete threat that artificial intelligence can now automate knowledge work at scale. The launch arrives at peak tension: a $285 billion selloff in software and services stocks swept through markets over two days following Anthropic’s release of legal plugins, and now the company is proving why investors believe AI will reshape professional services by introducing a model engineered specifically for financial research and autonomous multi-agent workflows.
The Benchmark Inflection
Opus 4.6 represents a qualitative shift in what large language models can accomplish. The model expands its context window from 200,000 tokens to 1 million tokens—a five-fold increase that enables Claude to process roughly 3,000 pages of text in a single conversation. On GDPval-AA, an independent benchmark measuring knowledge work tasks in finance, legal and technical domains, Opus 4.6 outperformed OpenAI’s GPT-5.2 by approximately 144 Elo points, translating to a win rate of roughly 70 percent in head-to-head comparisons.
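The Elo-to-win-rate conversion above follows the standard logistic Elo formula, not anything Anthropic-specific; a quick back-of-envelope check of the article's figures:

```python
# Convert an Elo rating gap into an expected head-to-head win rate
# using the standard logistic Elo formula: E = 1 / (1 + 10^(-diff/400)).

def elo_win_rate(elo_diff: float) -> float:
    """Expected win probability for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

# The ~144-point gap reported on GDPval-AA:
print(round(elo_win_rate(144), 3))  # → 0.696, i.e. a ~70% win rate
```

This confirms that a 144-point Elo advantage corresponds to winning roughly seven of every ten head-to-head comparisons.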
Anthropic also claims Opus 4.6 scores 76 percent on MRCR v2, a retrieval test measuring a model’s ability to find information buried in massive document sets, compared to just 18.5 percent for Sonnet 4.5. For financial analysts, the implications are immediate: a model that can ingest 500-page SEC filings, proxy statements, and earnings transcripts while reliably recalling details across the full set fundamentally changes the economics of financial research.
The model also achieved the highest score to date on Terminal-Bench 2.0, an evaluation of agentic coding, and topped Humanity’s Last Exam, a multidisciplinary reasoning test covering complex tasks across multiple domains.
Agent Teams and Autonomous Execution
Beyond raw capability, Opus 4.6 introduces agent teams in Claude Code—a feature that allows multiple AI agents to work simultaneously on different aspects of a single task, coordinating autonomously. One early user, Invariant Labs, reported that Opus 4.6 “autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories.”
This represents a conceptual leap. Earlier AI models functioned as assistants to human workers. Opus 4.6’s agent teams execute work independently across complex, multi-step processes without human intervention. For enterprises, the implication is obvious: roles organized around sequential task completion—research, analysis, documentation, coordination—now face structural redundancy.
The Market Reaction
The timing of this launch reveals the stakes at play. On February 3, Anthropic’s rollout of legal plugins for its Cowork tool triggered what Bloomberg called a trillion-dollar market rout in tech stocks. Thomson Reuters fell 15.83% in a single day. LegalZoom fell nearly 20%. Goldman Sachs’ basket of U.S. software stocks declined 6 percent, its worst day since April’s tariff-fueled selloff.
This response reflects investors’ sudden realization that Anthropic is not building a better chatbot; it is building a platform capable of replacing entire functional workflows. Yesterday’s abstract concern about AI disruption became a concrete threat when Anthropic demonstrated automated legal contract review. Today’s Opus 4.6 launch confirms the same threat applies to financial analysis, coding, and any knowledge-intensive domain.
Valuation and Enterprise Adoption
The valuation math reflects this belief. Anthropic signed a term sheet for a $10 billion funding round at a $350 billion valuation earlier this month, nearly doubling the $183 billion valuation it carried just three months prior. Anthropic’s flagship product, Claude Code, reached a $1 billion revenue run rate just six months after launch, with major enterprises like Uber, Salesforce, Accenture, Spotify, Novo Nordisk and Ramp deploying it for production use.
For comparison, OpenAI is in talks to raise capital at a valuation between $500 billion and $830 billion, but Opus 4.6’s benchmark wins suggest Anthropic’s technical execution, at least on knowledge work, may be advancing faster.
The Enterprise Reality
For enterprise adopters, Opus 4.6 marks a practical inflection. According to Reuters, several major law firms—including Allen & Overy and Clifford Chance—have already integrated Anthropic’s Cowork into their document review and due diligence pipelines. Financial institutions, previously cautious about AI adoption due to hallucination risks, are now re-evaluating deployment timelines given the documented accuracy improvements.
The million-token context window addresses a historical limitation: the inability of models to maintain coherence across long documents. Analysts processing earnings calls, regulatory filings, and research notes can now work with complete document sets rather than fragmented excerpts, reducing the risk of missing critical context.
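In practice, working with complete document sets means budgeting tokens rather than splitting files. A minimal sketch, assuming the common rough heuristic of ~4 characters per token for English text (the heuristic and helper names here are illustrative assumptions, not part of Anthropic's API):

```python
# Rough token budgeting for long-document workflows. Uses the common
# ~4-characters-per-token heuristic for English text; both the heuristic
# and these helper names are illustrative assumptions.

CONTEXT_WINDOW = 1_000_000  # tokens, per the Opus 4.6 announcement

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str], reply_budget: int = 8_000) -> bool:
    """True if the combined documents plus a reply budget fit in the window."""
    total = sum(estimate_tokens(d) for d in documents)
    return total + reply_budget <= CONTEXT_WINDOW

# e.g. a 500-page filing at roughly 3,000 characters per page:
filing = "x" * (500 * 3_000)  # ~375,000 estimated tokens
print(fits_in_context([filing]))  # → True
```

Under these assumptions, a 500-page filing consumes well under half the window, leaving room for transcripts and research notes in the same conversation; a 200,000-token window, by contrast, could not hold even the filing alone alongside a working margin.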
Looking Ahead
Anthropic’s release strategy appears calculated. By launching legal tools first and following with Opus 4.6, the company demonstrated production capability before releasing its most advanced model. This sequencing builds credibility with enterprise buyers who require evidence of practical deployment before committing resources.
The competitive implications extend beyond Anthropic and OpenAI. Companies whose business models depend on human knowledge work—legal services, financial analysis, consulting, software development—now face concrete timelines for disruption. The $285 billion selloff was not panic; it was repricing based on demonstrated capability.
For investors, the question is no longer whether AI will automate knowledge work, but which companies will capture the value created and which will lose their competitive moats.