The ChatGPT Moment for Legal Agents
The week legal AI stopped being theoretical
Last Tuesday I sat in a demo call with a law firm’s resourcing team. We were showing them how a contracts agent works: an email address that picks up inbound requests, reviews documents against a playbook, drafts responses, and routes finished work to a supervision queue. The agent had processed an NDA in about four minutes. A complex MSA review took closer to an hour. The resourcing lead, whose job is making sure the right lawyers are matched to the right work across the firm, watched the supervision inbox fill up with completed first-pass reviews and asked a question I’ve been thinking about since: “So what do I actually resource for, if this handles the volume?”
It was not a rhetorical question. He was trying to figure out what his job becomes.
Six weeks earlier, Anthropic had shipped a legal plugin for Claude in Cowork mode. Contract review, NDA triage, playbook configuration, compliance checks. On February 3rd, RELX, the parent company of LexisNexis, had its steepest single-day share price fall since 1988. Wolters Kluwer dropped 15 per cent. Thomson Reuters fell 18 per cent. Within weeks, LexisNexis announced a new product built on top of the same Anthropic plugin. Thomson Reuters crossed a million users on CoCounsel. Harvey reportedly hit $195 million in annual recurring revenue and is raising at an $11 billion valuation.
I think the period between early February and now is when the market crossed a line. Not because any single product launch was transformative on its own, but because the collective weight of announcements made something tangible that had previously been speculative. Foundation model companies are now packaging legal workflows directly. Incumbents are integrating them as fast as they can. Startups are scaling at a pace that, even eighteen months ago, would have seemed implausible. And six weeks after the market shock, a resourcing lead at a law firm is sitting in our demo not asking whether the technology works, but wondering what his operating model looks like in two years.
What I think changed
For the past three years, the conversation about AI in legal services has been dominated by a specific kind of tool: the copilot. A lawyer opens a browser tab or a Word plugin, pastes in a contract, gets suggestions, applies them manually, and sends the result. Harvey built a very successful business on this model. Spellbook, GC.ai, CoCounsel, and a dozen others occupy variations of the same space. The lawyer is the operator. The AI is the assistant. The throughput of the team is still bounded by how many lawyers are available to pick work up.
The Anthropic announcement matters, I think, because it represents the first time a foundation model company has moved up the stack into legal-specific product. Not a wrapper around an API. A configured, deployable legal capability that organisations can point at their own playbooks and let run. The fact that it arrived as a plugin, something you install and configure rather than something you hire a team to build, is the part that caught the market’s attention. It is also the part that I find most interesting to think about carefully, because the gap between “installable plugin” and “production-grade legal agent” is wider than a single product launch can close.
But the signal is real. When a model provider packages legal workflows and a $50 billion incumbent integrates them within weeks, the question is no longer whether autonomous legal AI will exist. The question is what form it takes and who controls it.
Why we started here
Flank has been building agentic legal AI since April 2023, when we pivoted from a no-code legal automation platform to an agent architecture. At the time, explaining the distinction between an agent and an assistant required a whiteboard and twenty minutes of patience. The category did not exist in the way buyers understood it. “Agentic” was not a word that appeared in analyst reports or procurement checklists.
I mention this not as a priority claim, which would be tedious, but because I think the timing is relevant to understanding how the market is shifting. We spent the first eighteen months of that pivot doing something that looked, from the outside, like an extraordinarily slow way to build a product. We sat with enterprise legal teams and mapped their workflows in granular detail: how requests arrive, how they get triaged, which decisions are genuinely judgment calls and which are pattern-matching against implicit rules that nobody had written down. We built playbooks clause by clause. We ran dry runs where the agent processed real contracts and a lawyer reviewed every output, and we documented every misalignment between what the agent did and what the lawyer would have done. The early deployments took longer to supervise than the manual process they were replacing.
That felt, at the time, like a problem. I now think it was the product being built.
The reason is that a legal agent that operates on generic training data and general legal knowledge is a fundamentally different thing from one that operates on a specific organisation’s templates, preferred terms, fallback positions, and escalation rules. The former can produce plausible legal output. The latter can produce output that a supervising lawyer recognises as consistent with how their team actually works. The gap between those two things is enormous, and it is filled not with better models but with configuration work that is painstaking, domain-specific, and slow.
The scenario that clarifies the difference
Here is what the law firm demo actually looked like in practice. We had configured the agent with a set of the firm’s standard templates and review playbooks. A request arrives at contracts@firm.com. The agent reads the email, identifies the document type, selects the appropriate playbook, and begins work. If it is a third-party NDA, the agent reviews each clause against the firm’s standard positions, identifies deviations, applies the playbook’s fallback logic, and produces a marked-up document with tracked changes and comments explaining each redline. The completed review lands in the supervision queue. A lawyer opens it, sees the flagged items, checks the reasoning, and either approves or edits. The response goes out.
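The intake-to-supervision flow described above can be sketched in simplified form. This is a minimal illustration, not Flank's implementation: every name, playbook entry, and classification rule here is hypothetical, and the real clause-by-clause review is stood in for by a loop that merely records which playbook positions apply.

```python
from dataclasses import dataclass, field

# Hypothetical playbook registry mapping document type to standard positions.
PLAYBOOKS = {
    "NDA": {"term": "3 years maximum", "governing_law": "England and Wales"},
    "MSA": {"liability_cap": "12 months of fees"},
}

@dataclass
class Review:
    doc_type: str
    flagged: list = field(default_factory=list)
    status: str = "pending_supervision"

def identify_doc_type(text: str) -> str:
    # Stand-in for model-based classification of the inbound document.
    return "NDA" if "non-disclosure" in text.lower() else "MSA"

def handle_request(email_body: str) -> Review:
    """Intake -> classify -> select playbook -> review -> supervision queue."""
    doc_type = identify_doc_type(email_body)
    playbook = PLAYBOOKS[doc_type]
    review = Review(doc_type=doc_type)
    # A real system would compare each clause against the playbook and emit
    # tracked changes with comments; here we only record the positions checked.
    for clause, position in playbook.items():
        review.flagged.append((clause, position))
    return review

# Completed reviews land in the supervision queue, not in the requester's inbox.
supervision_queue = [handle_request("Please review the attached non-disclosure agreement.")]
```

The essential point the sketch captures is structural: output defaults to a supervision queue rather than going straight back to the requester.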
The whole process took four minutes for a standard NDA. The lawyer's review of the supervision output took perhaps five. Compare that to the previous workflow, where the same lawyer would have spent sixty to ninety minutes on the full review, most of it on operational setup rather than legal judgment: identifying the document type, finding the right template, checking which playbook applies, formatting the response.
The resourcing lead’s question was not about whether the output was good enough. He could see it was. His question was about what happens to the staffing model when you remove the volume work from the human queue. That is a different category of question from “does the AI work?” and I think it is the one that matters more.
What the plugin model gets right, and what it misses
The Anthropic plugin, and the LexisNexis integration that followed it almost immediately, validates something important: the workflow layer for legal AI can be packaged. You do not need a dedicated engineering team to build a contract review pipeline from scratch. The foundational capability is there. The models are good enough for a wide range of legal reasoning tasks. The interface can be made accessible to legal teams without requiring them to learn prompt engineering.
What I think this model misses, and I want to be precise about why, is the operational depth that makes the difference between a demo and a deployment.
Consider what happens when a counterparty sends redlines on a client’s own paper. The markup is layered on top of a previous negotiating version where some tracked changes have been accepted and others haven’t. Comments and tracked changes are used inconsistently. A paragraph appears in a different font because it was lifted from another template. Before the legal analysis can even begin, someone or something has to establish the net state of the document: what is the counterparty actually proposing?
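To make the "net state" problem concrete, here is a deliberately simplified model of it. Assume (hypothetically) that the document has already been parsed into runs of text, each tagged as unchanged, pending insertion, or pending deletion; real Word files are far messier, with accepted changes, comments, and formatting noise layered on top.

```python
def net_state(runs):
    """Flatten layered markup into the counterparty's net proposed text.

    runs: list of (text, op) pairs, op in {"unchanged", "pending_insert",
    "pending_delete"}. Accepted changes are assumed already merged into
    "unchanged" runs; only pending ones still need resolving.
    """
    # Pending insertions survive, pending deletions drop out: what remains
    # is the text the counterparty is actually proposing.
    return "".join(text for text, op in runs if op != "pending_delete")

runs = [
    ("Liability is capped at ", "unchanged"),
    ("12 months", "pending_delete"),
    ("24 months", "pending_insert"),
    (" of fees.", "unchanged"),
]
# net_state(runs) -> "Liability is capped at 24 months of fees."
```

Even this toy version shows why the problem precedes legal analysis: until the runs are resolved, there is no single text to review.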
That operational layer, the messy reality of how legal work arrives rather than how it looks in a clean demo, is where the real difficulty sits. Models can reason about law. Reasoning about the state of a badly formatted Word document with three layers of conflicting tracked changes is a different problem entirely, and it is one that requires purpose-built engineering rather than general-purpose capability.
The same applies to supervision. A plugin that produces legal output and lets the user review it is an assistant with better packaging. A system where the agent’s confidence level determines whether the output goes directly to the business user or stops in a supervision queue, where the supervision corrections feed back into the system’s playbook logic, where the lawyer can see not just the output but the reasoning path that produced it, that is a different architecture. It is not one you get by configuring a plugin, however well-designed.
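The two architectural ingredients named above, confidence-gated routing and corrections that flow back into the playbook, can be sketched in a few lines. The names and the 0.9 threshold are illustrative assumptions, not a description of any shipping product.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    text: str
    confidence: float
    reasoning: list  # the reasoning path a supervising lawyer can inspect

def route(output: AgentOutput, threshold: float = 0.9) -> str:
    """Confidence gates whether output ships directly or waits for a lawyer."""
    return "direct_to_requester" if output.confidence >= threshold else "supervision_queue"

def apply_correction(playbook: dict, clause: str, corrected_position: str) -> None:
    """A supervision edit feeds back into the playbook for future reviews."""
    playbook[clause] = corrected_position

# Low-confidence output waits for review; a lawyer's correction updates the playbook.
draft = AgentOutput("redlined NDA", confidence=0.72,
                    reasoning=["clause 4 deviates from standard term"])
destination = route(draft)  # "supervision_queue"
```

The point of the feedback function is that supervision is not a dead end: each correction narrows the gap between what the agent does and what the team would have done.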
The market is splitting
A Fortune article published this month described legal AI as splitting into two categories: “authoritative AI,” grounded in proprietary legal data and research corpora, and “operational AI,” which executes workflows. The framing is imperfect but useful. Thomson Reuters and LexisNexis own the authoritative layer: decades of case law, legislation, and legal commentary that no startup can replicate. The operational layer, the part that actually does the work of reviewing contracts and drafting responses and handling negotiations, is where the competition is fiercest and the category boundaries are least settled.
The split matters because it clarifies where the value actually sits for in-house legal teams. A GC who needs to know whether a clause is enforceable under New York law needs the authoritative layer. A GC who needs 5,000 NDAs processed this quarter without hiring six more lawyers needs the operational layer. These are different problems with different economic profiles and different buyers, and the technology that serves one does not automatically serve the other.
The Anthropic plugin sits on the operational side. So does Flank. So do Harvey’s new Agent Builder workflows, and Legora’s enterprise platform, and the dozen other entrants raising hundreds of millions of dollars to compete in the same space. The question for buyers is not which of these tools exists, because they all do now, but which one can operate at enterprise scale on their specific legal logic with a supervision model they actually trust.
What I think is genuinely uncertain
I do not know how fast this transition plays out. Mustafa Suleyman suggested earlier this year that most legal tasks would be fully automated within twelve to eighteen months. Dario Amodei tracks Claude usage at 60% augmentation and 40% automation, with the automation share growing. Forrester predicts that 25% of planned AI spend will be deferred into 2027 because buyers are cautious about ROI.
These predictions are not easily reconciled. The technology is moving faster than the institutions that need to adopt it, which is a common enough pattern in enterprise software but feels qualitatively different here because the work being displaced is professional judgment, not data entry.
What I think is clearer than the timeline is the direction. The ACC/Everlaw survey found that 64% of in-house legal teams expect to reduce outside counsel reliance through AI. Only 7% have actually achieved a cost reduction. That gap will close, but it will close through operational deployment of agents that do the work, not through better copilots that help lawyers do the work slightly faster.
The resourcing lead’s question, “what do I resource for?”, is the right one. The answer, I think, is that you resource for judgment, supervision, and the work that genuinely requires a human to think about it. Everything else is moving, faster than most people in the profession have absorbed, toward a model where the work gets done by something that does not take holidays, does not have a maximum concurrent caseload, and does not need three months to get up to speed on a new client’s preferences.
The moment the market absorbed this was not tied to any single product shipping. It was the accumulation, over February and into March, of enough products from enough directions at enough scale that the structural shift became visible even to people who were not looking for it. I think this period will be remembered as when legal AI stopped being a feature and started being an operating model. Whether that happens in twelve months or thirty-six, the direction is set. The interesting questions are all about execution.