The Vibe Coder, the Machine, and the Cathedral: How a US Federal Court Saved Open Source Without Knowing It
Part One: A Licensing Dispute That Shook the Commons
In a quiet corner of the Python ecosystem, a licensing dispute erupted that may prove to be one of the most consequential controversies in the history of open-source software governance. The project at the center of the storm is chardet, a character encoding detection library so deeply embedded in the infrastructure of the modern web that most developers who depend on it have never given it a second thought. For years, chardet has operated as the invisible workhorse that ensures text renders correctly across systems, browsers, and languages. Its original author is Mark Pilgrim, a figure of near-legendary standing in the Python community and the mind behind Dive Into Python. Pilgrim released chardet under the GNU Lesser General Public License, a copyleft license whose defining characteristic is its viral condition: you may use, modify, and distribute the code, but any derivative work of the library must be distributed under the same LGPL terms. The license propagates by design. It is the legal mechanism through which the original author ensures that the freedoms granted to the community can never be revoked by a downstream actor.
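For readers who have never called the library directly, a minimal sketch conveys the job it performs. The detect function shown is chardet's long-standing public entry point; the sample bytes and printed results are illustrative rather than drawn from the project's documentation.

    import chardet

    # Bytes as they might arrive from an old web page or a legacy log file:
    # a single-byte encoding with no declared charset.
    raw = "Dieser Text enthält Umlaute: ä, ö, ü und ß.".encode("iso-8859-1")

    guess = chardet.detect(raw)  # a dict with 'encoding' and 'confidence' keys
    print(guess)
    if guess["encoding"]:
        # Decode using the library's best guess at the original encoding.
        print(raw.decode(guess["encoding"]))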
Recently, the current maintainers of chardet undertook what they described as a complete, ground-up rewrite of the library, driven substantially by large language models. From a purely technical perspective, the results were extraordinary. Version 7.0.0 delivered a forty-one-fold increase in speed alongside a suite of new features. Decades of accumulated technical debt, the inevitable residue of human developers working under human constraints, had been swept away by the tireless output of the machine.
But alongside this technical triumph, the maintainers made a decision that detonated a legal controversy across the open-source world. They announced that because the new code had been generated from the ground up by an AI, it was no longer bound by Mark Pilgrim's original LGPL license. They stripped the copyleft terms and republished chardet under the permissive MIT license.
Their reasoning possessed an intuitive plausibility that made it seductive: if the AI wrote entirely new lines of code, if no human hand copied a single expression from the original, how could the old license still attach? Had the machine not functioned as an automated clean room, severing the chain of copyright inheritance?
Mark Pilgrim thought otherwise. In a message on the project's public repository that was courteous in tone and devastating in substance, Pilgrim thanked the maintainers for years of dedicated stewardship and praised the project as a genuine free software success story. He then drew an unambiguous legal line. The maintainers, he informed them, possessed no right to relicense the project. Their claim of a ground-up rewrite was legally irrelevant because they, and the AI they deployed, had extensive exposure to the original LGPL-licensed source code. This was not a clean room implementation, which requires a rigorous informational firewall between the team that analyzes the existing code and the team that writes the replacement. Feeding the original source directly into a language model and instructing it to rewrite the codebase is the antithesis of a clean room. The informational quarantine is not merely breached; it never existed. Pilgrim observed with characteristic precision that inserting a sophisticated code generator into the workflow does not magically grant the maintainers any additional intellectual property rights. He requested that the project be reverted to its original license.
The resulting conflagration on GitHub, before the thread was locked, illuminated a question of enormous consequence. If a large language model can rewrite an entire codebase in an afternoon, what prevents any actor, individual or corporate, from using AI to systematically strip copyleft protections from free software and repackage the output under permissive or proprietary terms? What prevents, to name the threat plainly, automated license laundering on an industrial scale?
The chardet maintainers were not acting in bad faith. They were acting on a legal assumption that happened to be spectacularly, dangerously wrong.
The answer to their error, and to the existential threat of automated license laundering, lies in a federal appellate decision that never mentions a single line of source code.
Part Two: The Machine Cannot Be an Author — The D.C. Circuit's Definitive Ruling in Thaler v. Perlmutter
On March 18, 2025, the United States Court of Appeals for the District of Columbia Circuit issued its opinion in Thaler v. Perlmutter. The Supreme Court subsequently declined to hear the case, leaving the D.C. Circuit's ruling as the final word in the litigation and, for now, the leading appellate authority on the question it addressed. That question, stated with deceptive simplicity by Judge Millett in the opening lines of the opinion, was this: Can a non-human machine be an author under the Copyright Act of 1976?
The answer was no.
The facts were straightforward. Dr. Stephen Thaler, a computer scientist, created a generative artificial intelligence system he called the Creativity Machine. The Creativity Machine produced a visual artwork that Thaler titled A Recent Entrance to Paradise. Thaler submitted a copyright registration application to the United States Copyright Office, listing the Creativity Machine as the work's sole author and himself as the work's owner. The Copyright Office denied registration on the ground that a human being did not create the work. Thaler sought reconsideration twice within the agency. At each stage, he reaffirmed that the work was autonomously generated by an AI and argued that the human authorship requirement was unconstitutional and unsupported by statute or case law. The agency affirmed its denial each time. Thaler then sought judicial review in the United States District Court for the District of Columbia, which granted summary judgment to the Copyright Office. Thaler appealed.
The D.C. Circuit affirmed. Its reasoning warrants close examination, because the opinion is not a blunt policy pronouncement. It is a meticulously constructed statutory analysis, and its doctrinal architecture has profound implications that extend far beyond the visual arts and deep into the heart of the open-source software ecosystem.
The statutory foundation. The court began with the Copyright Act's text. The Act does not define the word "author." But Judge Millett demonstrated, through a comprehensive survey of the statute's interlocking provisions, that the word "author" as used throughout the Copyright Act refers exclusively to human beings. The opinion identifies seven distinct textual indicators, and their cumulative force is overwhelming.
First, the Act's ownership provision vests copyright "initially in the author." Because copyright is a property right, and because Congress specified that authors immediately own their copyrights upon creation, an entity that cannot own property cannot be an author under the statute. Second, the Act limits the duration of a copyright to "the life of the author and 70 years after the author's death." Machines do not have lives or deaths in any legally cognizable sense. Third, the Act's inheritance provision states that upon an author's death, the termination interest passes to the author's widow, widower, children, or grandchildren. Machines have no surviving spouses or heirs. Fourth, copyright transfers require a signature from the owner, and a machine has neither a signature nor the legal capacity to execute one. Fifth, authors of unpublished works are protected regardless of the author's nationality or domicile, concepts that are meaningless when applied to a machine. Sixth, joint works require that the authors act with the intention that their contributions be merged, and machines lack minds and do not form intentions. Seventh, and by way of contrast, every time the Copyright Act discusses machines, it treats them as tools used by authors rather than as authors themselves. A computer program is defined as a set of instructions to bring about a certain result. The word "machine" is given the same definition as "device" and "process," and these terms are used consistently throughout the statute as mechanisms that assist authors.
The court was careful to note that no single one of these provisions states a necessary condition for authorship. An author need not have children, a domicile, or a conventional signature. Even the ability to own property has not always been required: married women in the nineteenth century authored copyrightable works even though coverture laws forbade them from owning copyrights, and the Supreme Court recognized as much in Belford, Clarke and Co. v. Scribner. The point, instead, is holistic. The Copyright Act's text, taken as a whole, is best read as making humanity a necessary condition for authorship. That is the reading, as the court put it, to which the provisions of the whole law point.
The historical reinforcement. The court then turned to the administrative and legislative history surrounding the Copyright Act, not as a substitute for textual analysis but as reinforcement. The Copyright Office first addressed whether machines could be authors in 1966, a full decade before the 1976 Act was passed. The Register of Copyrights wrote that the crucial question was whether a work was basically one of human authorship, with the computer merely being an assisting instrument. The Office formally adopted the human authorship requirement in 1973, stating that works must owe their origin to a human agent. In 1974, Congress created the National Commission on New Technological Uses of Copyrighted Works, known as CONTU, to study how copyright law should accommodate automated creation. CONTU's final report, published in 1978 but reflecting conclusions formed during the drafting of the 1976 Act, was unequivocal: "There is no reasonable basis for considering that a computer in any way contributes authorship to a work produced through its use. The computer, like a camera or a typewriter, is an inert instrument, capable of functioning only when activated either directly or indirectly by a human."
The court applied the settled principle that when Congress reenacts a statute using a term whose meaning has been well established by the administering agency, Congress is presumed to adopt that interpretation. Because the interpretation of "author" as requiring human authorship was well settled by 1976, through both Copyright Office practice and CONTU's expert analysis, the 1976 Act carries that meaning.
The rejection of Thaler's arguments. Dr. Thaler advanced several counterarguments, each of which the court dispatched with precision.
Thaler first argued that a dictionary definition of "author" as "one that originates or creates something" encompasses non-human entities. The court rejected this as an exercise in reading words in isolation rather than in their statutory context. If "machine" were substituted for "author" throughout the Copyright Act, the statute would refer to a machine's children, a machine's widow, a machine's domicile, a machine's nationality, and a machine's mens rea. The Act would generate internal contradictions by treating "machine" as simultaneously meaning both an author and a tool used by authors.
Thaler then argued that the work-made-for-hire provision allows non-human entities such as corporations to be recognized as authors, and that this precedent should extend to machines. The court observed that the provision uses the word "considered" with deliberate care: the employer is "considered the author," not declared to be the author. The provision allows copyright protections attaching to a work originally created by a human to transfer instantaneously, as a matter of law, to the hiring party. Congress was careful to avoid using the word "author" by itself to cover non-human entities. If Congress had intended otherwise, the provision would say that those who hire creators "are the author," not that they are "considered the author."
Thaler further argued that the human authorship requirement would disincentivize innovation. The court responded that copyright law has never existed primarily for the benefit of authors. Quoting the Supreme Court, Judge Millett wrote that copyright law makes reward to the owner a secondary consideration, and that the primary object in conferring the monopoly lies in the general benefits derived by the public from the labors of authors. Machines, including the Creativity Machine, do not respond to economic incentives. The human authorship requirement still incentivizes humans to create and to pursue exclusive rights to works that they make with the assistance of artificial intelligence. The incentive structure is undisturbed.
Finally, and this is a point of great significance that is routinely overlooked in popular commentary, Thaler attempted to argue on appeal that he himself was the work's author by virtue of having made and used the Creativity Machine. The court refused to reach this argument. Thaler had waived it by failing to raise it before the agency and by failing to adequately challenge the district court's finding of waiver on appeal. His opening brief offered only a single, bare, conclusory sentence describing the district court's conclusion as based on a misunderstanding of the record below. The D.C. Circuit held this insufficient to preserve the argument for resolution on the merits.
The significance of this waiver cannot be overstated. It means the court never decided, and deliberately left open, the question of whether a human who creates, operates, and directs an AI system is the "author" of the system's output within the meaning of the Copyright Act. The opinion is clear that the human authorship requirement does not prohibit copyrighting work that was made by or with the assistance of artificial intelligence. The rule requires only that the author of that work be a human being, the person who created, operated, or used artificial intelligence, and not the machine itself. The court explicitly noted that the Copyright Office has in fact allowed the registration of works made by human authors who use artificial intelligence, citing the Office's 2023 guidance stating that whether a work made with artificial intelligence is registerable depends on the circumstances, particularly how the AI tool operates and how it was used to create the final work.
The opinion thus establishes a clear holding and a clearly unresolved frontier. The holding: a work whose sole listed author is a machine is not eligible for copyright registration. The unresolved frontier: where, precisely, the line falls between tool-assisted human authorship, which is copyrightable, and autonomous machine generation, which is not. The court wisely declined to draw that line, noting that Congress and the Copyright Office are the proper audiences for such questions. Judge Millett even acknowledged, with a literary flourish that one suspects was not entirely unintentional, that science fiction is replete with examples of creative machines that far exceed the capacities of current generative artificial intelligence, and that there will be time enough for Congress and the Copyright Office to tackle those issues when they arise.
What the court did not reach. One further aspect of the opinion demands attention. The Copyright Office had argued not only that the Copyright Act requires human authorship, but that the Constitution's Intellectual Property Clause itself mandates it. The D.C. Circuit expressly declined to address this constitutional argument, holding that the statutory ground was sufficient and invoking the cardinal principle of judicial restraint: if it is not necessary to decide more, it is necessary not to decide more. This restraint is doctrinally significant. It means the question of whether Congress could, if it chose, amend the Copyright Act to permit non-human authorship remains formally open. The constitutional ceiling has not been tested. But the statutory floor is now firmly established: under the current Copyright Act, as definitively construed by the D.C. Circuit, the algorithm cannot be an author.
Part Three: The Fortress That Holds — How Thaler Protects the Open-Source Commons
The Thaler opinion concerns a visual artwork and never mentions software. But its doctrinal logic, combined with the Copyright Act's derivative works provisions, constructs a legal fortress around the open-source ecosystem that is remarkably well-suited to repel the specific threat that the chardet controversy exposed.
The mechanism is elegant in its simplicity, and it operates through the interaction of two doctrines.
The first doctrine is the one the D.C. Circuit has now definitively established. To the extent that AI-generated code reflects the autonomous output of the machine rather than the direct creative expression of a guiding human author, that code is not eligible for copyright protection. It enters the world legally unowned.
The second doctrine is the law of derivative works, codified in 17 U.S.C. Sections 101, 103, and 106. A derivative work is one based upon a preexisting copyrighted work: a translation, adaptation, transformation, or other recasting. The critical principle is that a derivative work does not extinguish the copyright in the underlying work. It layers new expression on top of old. The copyright holder of the original retains all rights in their contribution. The creator of the derivative work holds rights only in the new material, and only if that new material is itself copyrightable. Section 103(b) states the rule expressly: copyright in a derivative work extends only to the material contributed by its author, as distinguished from the preexisting material, and implies no exclusive right in that preexisting material.
Now apply both doctrines simultaneously to the chardet scenario and observe the result.
The maintainers directed a large language model to ingest and rewrite Mark Pilgrim's LGPL-licensed code. The AI operated not in a vacuum but with direct, extensive exposure to the original expression: its structure, its logic, its flow, its architectural choices. The output was not independent creation. It was transformation of an existing copyrighted work. In the taxonomy of copyright law, this is the definition of a derivative work. It does not matter that no single line was copied verbatim. Copyright protects expression, not merely literal strings, and the organizational, structural, and logical expression of a codebase persists through reformulation just as the narrative architecture of a novel persists through translation into another language.
The AI-generated rewrite of chardet is therefore a derivative work of the original LGPL-licensed library. The LGPL requires that all derivative works be distributed under the same license terms. This is the copyleft's viral mechanism, and it activates with mechanical precision regardless of whether the derivative was produced by a human, a machine, or a human directing a machine.
The Thaler doctrine then closes the trap. Because the AI's contribution is not eligible for copyright under the Copyright Act as construed by the D.C. Circuit, the maintainers hold no copyright interest in the new expression. Because the underlying work is copyrighted by Mark Pilgrim under the LGPL, and because the LGPL mandates that derivative works carry the same license, the copyleft propagates with full and undiminished force into the rewritten code. The maintainers cannot relicense the derivative work to MIT because they lack the copyright ownership necessary to effectuate such a change. To relicense, you must hold the copyright. They do not. They are attempting to convey rights they do not possess while extinguishing rights that belong to another.
This is not a close question. It is a categorical legal error.
The maintainers confused syntactic novelty with legal independence. Their intuition, that entirely new lines of code mean an entirely new work, is humanly understandable but doctrinally illiterate. The derivative works doctrine has never required literal copying to apply. A German translation of a French novel creates entirely new sentences, but no one would suggest the translator may strip the original author's copyright and republish the translation under terms of the translator's unilateral choosing. An AI-mediated transformation of a codebase is no different in principle.
Nor does the deployment of a large language model constitute a clean room implementation. The clean room procedure, as practiced in the software industry since the IBM-compatible BIOS clones of the 1980s, demands that one team analyze the existing software and produce a purely functional specification describing what the software does, stripped entirely of how it does it. A separate team, with no access to the original source code, then writes a new implementation from that specification alone. The entire purpose of the procedure is to ensure that the new code is an independent creation and not a derivative work. Feeding the original source code directly into a language model and instructing it to rewrite the codebase obliterates this firewall. The machine has consumed the original expression in its entirety. The output is derivative by definition.
Consider now the implications for the broader ecosystem.
The Linux kernel is licensed under version 2 of the GNU General Public License. It represents one of the most extraordinary achievements of collaborative software development in human history: over thirty million lines of code contributed by thousands of developers who chose the GPL precisely because of its copyleft protections. Under the GPL, anyone may use, study, modify, and distribute the kernel, but any derivative work must be distributed under the same GPL terms and the corresponding source code must be made available. This is the social contract that has sustained the kernel's development for over three decades.
Without the Thaler doctrine, a sufficiently motivated corporation could point a fleet of language models at the kernel source tree. It could instruct the models to rewrite each subsystem, claim that the AI had performed an automated clean room reimplementation severing all prior licensing obligations, and enclose the resulting code behind proprietary walls. The GPL's copyleft would be reduced to a parchment barrier enforceable only against those too principled or too resource-constrained to deploy an LLM. The strategic asymmetry would be devastating. Individual contributors and small projects would continue to honor the copyleft out of conviction while corporations with industrial AI infrastructure systematically laundered the same code into proprietary products.
The interaction of Thaler and the derivative works doctrine prevents this. Because the AI's output is not copyrightable, no new copyright interest arises that could override the original GPL license. The kernel contributors' copyleft persists, impervious to algorithmic circumvention. The lock holds not because it was designed for this particular key, but because the key was never forged.
The analysis works differently, but no less coherently, for permissively licensed projects. Consider the Mesa 3D Graphics Library, distributed under the MIT license. Suppose an enthusiast directs an AI to rewrite and optimize a complex rendering pipeline within Mesa. Under the MIT license, anyone is free to modify and distribute the code, provided the original copyright and permission notices are preserved. The AI-generated portions of the rewrite carry no independent copyright under Thaler. They are unowned. But they are integrated into a project whose existing human-authored components remain under the MIT license. The result is a hybrid: a work containing original MIT-licensed expression and new, uncopyrightable AI-generated expression. The developer may compile, use, and distribute the improved software, but the developer cannot claim personal copyright over the AI-generated optimizations. The developer cannot pivot and release those optimizations under a proprietary license, because there is no copyright interest to license. Anyone else, from a solo hobbyist to a trillion-dollar technology firm, may freely use the AI-generated portions because they are, in the most precise legal sense, unowned.
This outcome may initially seem like a loss for the enthusiast who spent a weekend directing the AI. But it is, in fact, the feature that protects the enthusiast's own interests. If AI-generated code were copyrightable and if an automated rewrite could sever derivative works obligations, then permissive-license projects would be equally vulnerable to enclosure. A corporation could rewrite MIT-licensed code, claim copyright over the AI output, and distribute the result under restrictive proprietary terms, effectively privatizing what was freely given. The uncopyrightability of AI output ensures that what enters the commons as free remains free.
An important nuance must be acknowledged here, because the Thaler opinion itself insists upon it. The D.C. Circuit did not hold that all code produced with the involvement of an AI is uncopyrightable. It held that a work whose sole author is a machine is not eligible for copyright. The court explicitly recognized that works made by human authors who use artificial intelligence may be registerable, depending on how the AI tool operates and how it was used to create the final work. This leaves open a significant spectrum. A developer who uses an LLM as a glorified autocomplete, writing substantial original code and occasionally accepting a machine-generated suggestion, almost certainly retains authorship over the resulting work. A developer who types a single prompt and accepts the AI's output wholesale almost certainly does not. The vast middle ground, where the vibe coder provides detailed architectural direction, iterates through multiple refinements, selects among alternatives, and exercises substantial creative judgment in curating the output, remains doctrinally unresolved. The D.C. Circuit was candid about this, and its candor is to be respected rather than papered over.
But this unresolved spectrum does not affect the license laundering analysis. Even if a vibe coder's creative direction were sufficient to establish human authorship over the AI-assisted rewrite of a copyleft project, the result would still be a derivative work of the original. The copyleft would still propagate. The vibe coder would hold copyright only in whatever original expression the human contributed, not in the underlying work, and could not relicense the whole. The derivative works doctrine is agnostic to the tool used to create the derivative. Whether you rewrite a GPL library by hand, with a sophisticated IDE, or with a large language model, the result is a derivative work if it is based upon the original, and the GPL's terms attach.
A brief note on the continental perspective is warranted, though the American analysis does the decisive work. German copyright law, the Urheberrecht, anchors the right even more tightly to the human person. § 2 Abs. 2 of the Urheberrechtsgesetz requires a "persönliche geistige Schöpfung", a personal intellectual creation, and the word persönlich demands a reflection of human individuality that no statistical model can supply. The Court of Justice of the European Union, in its Infopaq and Painer decisions (C-5/08 and C-145/10), has established that copyright under the harmonized European framework requires the author's own intellectual creation reflecting free and creative choices. The transatlantic consensus is thus complete, though it rests on different doctrinal foundations: the American framework reasons from statutory text and structure, while the European framework reasons from the philosophical link between the work and the personality of its creator. Both arrive at the same destination. The algorithm is not an author. Its output, standing alone, bears no copyright. And a derivative work produced through its operation cannot escape the gravitational pull of the original human author's license.
Part Four: The Cathedral Endures
The legal landscape that emerges from the D.C. Circuit's opinion in Thaler is not hostile to the vibe coder. It is, if properly understood, the vibe coder's most important ally.
The existing license architecture of the open-source ecosystem, the GPL, the LGPL, the MIT license, the BSD licenses, was built on the foundation of human copyright. These licenses work because a human author holds a copyright and uses that copyright to impose conditions on downstream use. The copyleft licenses impose the condition that derivative works remain free. The permissive licenses impose minimal conditions, typically preservation of the copyright notice, and otherwise allow maximum freedom. Both categories depend on the existence of a valid, enforceable copyright held by a human author.
The Thaler doctrine reinforces this foundation in two ways. First, it prevents the creation of new, conflicting copyright interests over AI-generated code that could be used to override or circumvent the original license. Second, by leaving the original human author's copyright undisturbed, it ensures that the license conditions attached to that copyright propagate through derivative works exactly as the original author intended. The copyleft's viral mechanism is not disrupted by the interposition of a machine. The permissive license's conditions are not rendered superfluous by the absence of new copyright in the AI-generated portions. The system holds.
No new generation of open-source licenses is required to survive the generative AI era. The brilliance of the GPL and the MIT license lies precisely in their foundational reliance on human copyright, and the D.C. Circuit has now confirmed, as the final word of the federal judiciary, that copyright remains a human institution.
What must evolve is not the law but the governance culture of open-source projects. Repositories will need to establish clear contribution guidelines that acknowledge the distinction between human-authored code, which carries copyright and can be meaningfully licensed by its author, and AI-generated code, which does not. Contributor license agreements may need to be reconceived, not because the legal framework is broken, but because contributors and maintainers need shared expectations about what it means to submit code that was substantially generated by a machine. Projects should develop norms that recognize the vibe coder's genuine contribution, the architectural vision, the testing discipline, the integration judgment, without pretending that the machine's syntactic output carries an independent copyright that it does not possess.
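What such a norm might look like in practice is for each community to decide. As a purely illustrative sketch, a project could ask contributors to disclose machine involvement in the commit message itself, alongside the familiar sign-off trailer; the AI-Assisted trailer shown here is hypothetical, not an established convention, and the names are placeholders.

    Rewrite the single-byte charset prober for streaming input

    The state machine was drafted by a language model from my written
    specification; I reviewed, tested, and restructured the output before
    submitting it.

    Signed-off-by: Jane Developer <jane@example.org>
    AI-Assisted: yes (model-drafted; human-specified, reviewed, and tested)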
The D.C. Circuit itself pointed toward this future. The court noted that the Copyright Office is studying how copyright law should respond to artificial intelligence and is making recommendations based on its findings. Congress has completed a bipartisan report addressing artificial intelligence and intellectual property. The political branches are engaged. The question of where, precisely, tool-assisted human authorship ends and autonomous machine generation begins will be refined over the coming years through agency guidance, legislative action, and inevitably further litigation. Open-source communities would be wise to begin developing best practices well before that refinement is complete.
The existing legal architecture, constructed decades before anyone imagined a transformer model, turns out to be remarkably well suited to a challenge its drafters never anticipated. No emergency legislation was required. No novel regulatory apparatus had to be erected. The Copyright Act, interpreted faithfully according to its text, its structure, and the well-settled meaning of its terms, provided a complete and coherent answer to a question that seemed, at first glance, to belong to science fiction. This is what sound legal architecture looks like: principles general enough to absorb technological upheaval, yet specific enough to yield determinate outcomes in concrete disputes. It is also, not coincidentally, what sound software architecture looks like. The parallel is not accidental. Both law and code are systems of rules designed to govern behavior under conditions of uncertainty, and the best systems in both domains are those that remain stable under stress precisely because their foundational abstractions were chosen with care.
The open-source movement began as a radical proposition: that the most critical infrastructure of the digital age should be a shared human heritage, maintained not by corporate fiat but by a decentralized community of individuals acting out of conviction. The arrival of the large language model does not end this project. It accelerates its maintenance beyond anything the early pioneers could have imagined. The AI is the most powerful instrument the open-source volunteer has ever wielded, a tireless assistant that can clear decades of accumulated technical debt in hours, that can translate architectural intent into working code with extraordinary speed, that can shoulder the grinding syntactic labor that has burned out generations of maintainers.
But the instrument remains an instrument. Under the Copyright Act as definitively construed by the D.C. Circuit in Thaler v. Perlmutter, left undisturbed by the Supreme Court's refusal to hear the case, the machine cannot be an author. It cannot hold copyright. It cannot license. It cannot relicense. It cannot sign the social contract that binds contributor to contributor across borders, across decades, and across legal systems.
The vibe coder directs the machine. The law protects the commons. And the cathedral, as it always has, belongs to those who chose to build it for everyone.