Folks, we have an open source problem. And, no, it’s not the problem some think. You’ll hear people rail against corporations that falsely describe their code as open source. Sometimes they’re correct. You’ll hear others bemoan the influx of venture-backed companies that dilute the meaning of open source to fuel corporate gains. Sometimes they’re correct.
But the problem isn’t the companies. At least, that’s not the primary problem. Businesses piggybacking on open source branding in pursuit of commercial gains is nothing new. The difference is that, over the past few years, free and open source software has lost its way, leaving developers (and businesses) just one option: permissive, Apache-style licensing. The first kind of open source licensing was, as its sometimes prickly and pedantic adherents insist, not “open source” at all, but rather copyleft, free software licensing like the GPL. (“We want people to know we stand for freedom, so we do not accept being mislabeled as open source supporters,” said Richard Stallman.)
It’s easy to point fingers at the corporations, but the real issue is that, in the rush to make open source software ripe for corporate adoption, we lost all power to protect user freedom.
No give, all take
There is no such thing as “open source AI,” however much we like to pretend. Even the Open Source Initiative (OSI), which is the go-to source for the Open Source Definition, has spent more than a year tackling how to define “open source” in a world of weights, floating-point numbers, and training data. “Open source kind of missed the evolution of the way software is distributed and executed,” Stefano Maffulli, executive director of the OSI, has suggested. He’s trying to resolve this problem by October 2024, at least with regard to AI.
Small wonder that confusion runs rampant in the world of AI. Meta, for example, "open sourced" Llama, its large language model (LLM). This doesn't appear to adhere to standard definitions of open source because, although it is "available for free for research and commercial use," it comes with caveats. You can't use it to improve other LLMs, and you can't use it if you have more than 700 million monthly active users. Yet, by the OSI's definition, open source can't "restrict anyone from making use of the program in a specific field of endeavor."
Meta Vice President for AI Research Joelle Pineau says current open source licenses don't really fit a world in which training data plays a huge part, a mismatch that can expose users to significant liability. Maffulli concurs, noting, "We definitely have to rethink licenses in a way that addresses the real limitations of copyright and permissions in AI models while keeping many of the tenets of the open source community." AI has made this glaringly obvious, but the same holds true in cloud computing. "Open source" hasn't kept pace with shifting definitions of software distribution and what a downstream developer needs to actually use the software.
We have privileged the right of downstream developers to do whatever they want with the code over the right of upstream developers to insist that their software remain free. I've written before that these sorts of licensing minutiae are silly, but now I'm less sure.
Both sides now
I've gone back and forth on the topic for years. In 2005 I championed the GPL, insisting that no other open source license had done more than the GPL to make open source commercially viable. By 2009 I was on the Apache Software License train. By 2014 I was channeling RedMonk analyst James Governor in declaring, "We're living in a post-open source world," due to developers' apparent indifference to which licenses they used on GitHub. I'm sure I've been both right and wrong in every position I've taken because, as I've outlined, none of these issues are simple.
Over the years, I've trended toward permissive, Apache-style licensing, asserting that it's better for community development. But is that true? It's hard to argue against the broad community that develops the GPL-licensed Linux kernel, for example. Because freedom is baked into the software, it's harder (though not impossible) to fracture that community by forking the project. To me, this feels critical, and it's one reason I'm revisiting the importance of software freedom (GPL, copyleft), and not merely developer/user freedom (Apache).
If nothing else, as tedious as the internecine bickering was in the early debates between free software and open source (GPL versus Apache), that tension was good for software generally. It gave project maintainers a choice in a way they really don't have today, because copyleft options disappeared when cloud came along and never recovered. Even corporations, those "evil overlords," as some see them, tended to use free and open source licenses in the pre-cloud world because those licenses were useful. Today companies invent new licenses because the Free Software Foundation and OSI have been living in the past while software charged into the future. Individual and corporate developers lost choice along the way.
This doesn’t mean you need to pity the poor billion-dollar startups or enterprises that want to monetize the software they release. And it definitely doesn’t mean you need to shed a tear for the trillion-dollar cloud companies whose business models depend on a steady supply of software created by others.
Forget the companies. Think about the developer who wants to keep her software free. There’s no such thing as copyleft in cloud and AI-land, but there should be. Developers basically have one choice today in their open source software licensing (permissive, take-my-code-and-run licensing), and that’s not good for our long-term interest. Open source matters. Free software does, too. We need to bring it back.