The semi-automation of much software coding has been one of the bigger successes of LLMs so far. But there's no clear consensus on how far we've come toward the ultimate goal of fully automatic software development. Are we almost there, halfway there, or just barely getting started?

Some, like New York Times columnist Kevin Roose, have enthusiastically embraced the idea that large language models (LLMs) make software development accessible to non-coders, enabling anyone with an idea to create apps with simple prompts. Others, like AGI researcher and AI critic Gary Marcus, warn that such narratives dangerously overhype AI's capabilities, obscuring its limitations and potentially discouraging young programmers from learning the fundamentals of coding.

So, where does the truth lie? Are LLMs genuinely revolutionising coding, or are they merely regurgitating existing patterns without true problem-solving abilities? As someone who has been coding since 1980 and working in AI R&D since the mid-1980s, I've had firsthand experience with both the astonishing utility and frustrating shortcomings of current AI coding tools. Overall it feels like we're halfway through a revolution in AI automation of coding, but given the nature of exponential change, halfway through conceptually may mean we are quite close to the finish line in terms of clock time.

What LLMs do well in coding

Roose's enthusiasm for AI-assisted coding tools isn't entirely misplaced. There is something remarkable about describing a problem in plain language and watching an AI generate functional, even elegant, solutions. LLM-powered coding assistants like GitHub Copilot, Cursor, and Replit have already transformed software development in several ways:

Speeding up routine coding tasks. If a problem has been solved before and the relevant techniques are well-documented, LLMs can quickly generate working code, often saving hours of effort.

Lowering the barrier to entry. Beginners can now create useful software without years of formal training, at least for straightforward applications.

Boosting productivity in known domains. Developers working within established paradigms, whether building CRUD applications, automating workflows, or implementing standard algorithms, can significantly accelerate their work.

Enhancing creative coding. As I'll discuss later, LLMs shine in areas like music generation, where they can rapidly produce scripts for experimental transformations of sound.

For these reasons, it's no surprise that Roose found himself "vibecoding" small, personalised applications that solved everyday problems. AI models are great at repurposing existing software techniques for novel applications: taking an old tool and using it in a new way.


Where LLMs fall short

However, as Marcus rightly points out, LLM-based coding tools struggle when faced with deeper challenges. Their limitations are particularly apparent in areas like:

Generalisation beyond training data. LLMs excel at regurgitating and remixing existing solutions but struggle to reason about entirely new programming paradigms or novel problem spaces.

Debugging and long-term maintainability. Writing code is one thing; ensuring it works correctly, handles edge cases, and remains maintainable over time is another. AI-generated code often requires significant human oversight and refinement.

The last 20% of hard problems. Many AI applications, including self-driving cars and automated coding, can achieve 80% accuracy fairly easily. But the final 20% often involves complex reasoning, deep debugging, and optimisation that current AI models simply cannot handle.

Truly innovative software engineering. If we only built software in ways that AI models can assist with today, we'd be stuck in a loop, endlessly recycling past programming paradigms and system architectures instead of inventing new ones.

These weaknesses become painfully clear in the kind of software development required for AGI research, where conventional solutions don't cut it.

AGI development exposes LLMs' coding limitations

In my own AGI research, so far, I've found LLMs to be almost entirely useless.

Take, for example, our work on MeTTa, a new programming language designed for AGI development. Since LLMs are trained on existing codebases, they struggle with anything that deviates significantly from established programming paradigms. Even fine-tuning a model on a corpus of MeTTa code hasn't yielded much improvement. We've experimented with prompting LLMs to reason about MeTTa's operational semantics, hoping they could deduce how to write MeTTa code effectively, but the results have been disappointing.

It's not just about MeTTa, though. The deep technical challenges involved in building AGI, such as optimising the MeTTa Optimal Reduction Kernel (MORK) for scaling neural-symbolic-evolutionary AI, require sophisticated problem-solving and deep reasoning that LLMs simply do not possess. Even in widely used languages like Rust, the ability of coding LLMs to help with complex, memory-intensive optimisations is minimal.

There's also a broader, more troubling dynamic at play. Because LLMs are so helpful when working within existing programming paradigms, they exert an implicit pressure on developers to stick to well-trodden paths rather than pushing boundaries. If I weren't committed to unconventional AGI development, I might feel tempted to adjust my approach to something that AI tools could better assist with. That's a dangerous trap, one that could stifle the kinds of software innovation necessary for genuine breakthroughs.


LLMs rock at computer music coding

On the flip side, I've found LLMs to be incredibly useful for creative-arts coding, for example in graphic arts or music generation.

Let's say I have an idea: What if I take two musical riffs, decompose them using a wavelet transform, and recombine their coefficients to create an offspring riff? What if I emphasise long-range coefficients from one riff over another? Does the choice of wavelet basis functions matter?
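To make the idea concrete, here is a minimal sketch of that kind of coefficient-splicing, roughly the sort of script an LLM will happily generate from a plain-English prompt. It uses a hand-rolled Haar transform (the simplest wavelet basis) on toy sine-wave "riffs"; the function names and the particular mixing rule are illustrative assumptions, not the actual scripts described here.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform: pairwise averages
    carry the coarse structure, pairwise differences the fine detail."""
    avg = (x[0::2] + x[1::2]) / 2.0
    diff = (x[0::2] - x[1::2]) / 2.0
    return avg, diff

def haar_inverse(avg, diff):
    """Exact inverse of haar_step: interleave avg+diff and avg-diff."""
    out = np.empty(2 * len(avg))
    out[0::2] = avg + diff
    out[1::2] = avg - diff
    return out

def decompose(x, levels):
    """Multi-level decomposition: returns the coarse approximation
    plus a list of detail coefficients, one array per level."""
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)
    return x, details

def reconstruct(approx, details):
    """Invert decompose() by undoing each level in reverse order."""
    x = approx
    for d in reversed(details):
        x = haar_inverse(x, d)
    return x

# Toy "riffs": a slow and a fast sine wave (length must be 2**levels-divisible).
t = np.linspace(0, 1, 1024, endpoint=False)
riff_a = np.sin(2 * np.pi * 5 * t)
riff_b = np.sin(2 * np.pi * 40 * t)

# Offspring riff: long-range (approximation) coefficients from riff_a,
# fine-grained detail coefficients from riff_b.
approx_a, _ = decompose(riff_a, levels=4)
_, details_b = decompose(riff_b, levels=4)
offspring = reconstruct(approx_a, details_b)
```

Swapping which riff supplies the coarse versus the detail coefficients, or substituting a different wavelet basis, changes the character of the offspring; that is exactly the kind of experiment that becomes cheap once an LLM writes the boilerplate.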

With an LLM, I can describe these concepts in plain English and get functional Python scripts within minutes. Previously, it might have taken me a full day to write and debug such scripts manually. Now, I can explore an idea in a fraction of the time, making it feasible for me to experiment now and then with generative music despite my packed schedule. LLMs and associated deep neural nets allow my robot-led band Desdemona's Dream to do what it does, and enable software projects like Incantio and Jam Galaxy and so many others to create new capabilities and income streams for musicians.

The computer music example illustrates a key strength of LLMs: They are excellent at helping with novel applications of existing computational techniques. They might not innovate at the level of inventing a new music theory or algorithm, but they allow me to very rapidly and flexibly explore creative new ways to use existing signal-processing tools for music generation.

LLM coding is an AGI accelerator

Semi-automating computer music and other creative arts is wonderful, but it's not what will get us to Singularity and superintelligence and all that good stuff. Fundamental technological progress relies heavily on the kind of deep technical creativity that eludes LLMs entirely. But even so, I think it's clear the current state of LLM coding is already accelerating our progress toward the Kurzweilian endgame.

While LLMs can't help with the hard parts of coding AGI, it's not always the hard parts that eat up the most development time. We are clearly into the phase now where AI tools are concretely and palpably helping accelerate the development of new AI tools. Human expert ingenuity is still needed for core AGI architecture and algorithmics, but LLMs are hugely helpful for building test suites for AGI code, they speed up preprocessing of data to evaluate pieces of AGI code, and so on and so forth. This all smells to me personally not quite like "AGI is coming this year", but definitely like "endgame before Singularity".

Right now, LLM coders are a tool, not a replacement

The bottom line is, the skeptics and the enthusiasts are both right here, just in different ways. LLMs are transformative for certain types of coding, particularly when working within established paradigms. Yet they are far from the total revolution that Roose's enthusiasm suggests, and Marcus is correct to warn against overhyping their capabilities. But the current capabilities can not only revolutionise digital creative arts and other important areas, but also accelerate our progress toward AGI and Singularity.

If we falsely assume AI can already replace deep software engineering, we risk discouraging today's students from learning to code at a fundamental level. And the world can use these students to help with the deep stuff, like creating AGI. At the same time, if we downplay the strengths of current LLMs for coding, we fail to take advantage of an astoundingly useful tool for accelerating routine and creative coding tasks.

One way to frame things: AI-powered coding tools are most useful when working within existing computational frameworks but fall short when asked to extend or reinvent them. If your goal is to generate functional applications based on well-documented methods, LLMs can be an incredible asset. But if you're developing fundamentally new software architectures, like those required for AGI, don't expect much help from today's AI models.

As AI continues to evolve, we'll clearly need systems that go beyond LLMs, with capability for deeper reasoning and for grounding their code generation in a real understanding of their code's activity in the real world. That's exactly what we're working on in OpenCog Hyperon and the Artificial Superintelligence Alliance, exploring neural-symbolic-evolutionary cognitive architectures that can overcome LLM limitations.

For now, from a practical coding perspective, the best approach is to see LLMs for what they are: powerful assistants and accelerants in certain aspects of software development work, but still far from replacements for the deeper aspects of human software engineering expertise.

We need to clearly acknowledge the current strengths and weaknesses, while also manifesting proper respect for the amazing speed with which new capabilities are coming into play.
