p|datav0.1news190
p|data·synced:Polymarket—Kalshi—Manifold—Predict—Myriad—Opinion—Limitless—Gemini—

In 2029, will any AI be able to construct "reasonably" bug-free code of >= 10k LOC from a natural language specification? (Gary Marcus benchmark #4)

Manifold·Science and Technology
91% · 0.0pp (24h change)
vol cum: M26.8k
vol 24h: M0
spread: n/a
ends: Jan 1, 2030
YES 91¢ · NO 9¢

The fourth question from this post: https://garymarcus.substack.com/p/dear-elon-musk-here-are-five-things

The full text is: "In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]"

Judgment will be by me, not Gary Marcus. It is ambiguous whether this means the start or the end of 2029, so I have set it for the end.

For this question I am not using the exact text of the question, because I think "bug-free" is 1. silly 2. untestable. I will instead accept if it produces code of >=10k LOC with <= the number of bugs in an implementation by a human (weighing many small bugs against a few significant bugs will unfortunately come down to my subjective impression of which is "better").

I am also ignoring the "no gluing libraries together" requirement, because I don't know what he means. Does he want an AI that writes 10k LOC of assembly? I will accept code that calls/uses libraries at <= the rate that normal human programmers do.

Sep 16, 2:26pm: Some additional clarifications:

If there was a benchmark that, for instance, compared human to AI code, allowed both to ask follow-up questions about the initial natural language prompt, allowed tests, allowed multiple submissions, etc. (so roughly the workflow of "human consultant is hired to write a ~10k LOC project"), I will accept that.

If there's an agent that can pass this for some "typical" coding tasks but not for highly specialized tasks (e.g. it can write a website, a data structure library, or implement some standard ML workflows but can't write highly secure code or an efficient optimizing compiler), I will accept that.

To frame it another way: if it can write small-to-medium projects that a median FAANG coder can do, but not projects that FAANG coders who implement research-level code can do (and non-research-level coders can't), I will accept that.

(tbc I don't mean "research level quality", I mean "production/industry quality, research level difficulty/complexity")
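
The quantitative part of the resolution criteria above can be sketched as a simple predicate. This is only an illustration, not the market creator's actual procedure: the `Implementation` record, its field names, and the idea of measuring library use as calls per line are all assumptions made for the example, and the sketch deliberately ignores bug severity, which the description says will be judged subjectively.

```python
from dataclasses import dataclass


@dataclass
class Implementation:
    """Hypothetical summary of one implementation of a spec."""
    lines_of_code: int
    bug_count: int       # raw count; severity is NOT modelled here
    library_calls: int   # assumed proxy for "rate of library use"


def library_rate(impl: Implementation) -> float:
    """Library calls per line of code (an assumed metric)."""
    return impl.library_calls / impl.lines_of_code


def meets_criteria(ai: Implementation, human: Implementation,
                   min_loc: int = 10_000) -> bool:
    """Check the three stated thresholds: size, bugs, library use."""
    if human.lines_of_code <= 0:
        raise ValueError("human baseline must contain some code")
    if ai.lines_of_code < min_loc:
        return False  # fewer than 10k LOC never qualifies
    return (ai.bug_count <= human.bug_count
            and library_rate(ai) <= library_rate(human))


# Made-up numbers, purely for illustration.
human = Implementation(lines_of_code=11_000, bug_count=10, library_calls=700)
ai = Implementation(lines_of_code=12_000, bug_count=8, library_calls=600)
print(meets_criteria(ai, human))
```

A real judgment would also have to weigh the severity trade-off and the "typical task" scope from the clarifications, neither of which reduces to a count.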

timeline
opened · Sep 16, 2022
status · open
closes · Jan 1, 2030
prices over time (chart, −24h to now)
volume · 24h windows: no volume history

similar markets

suggested by pdata
Manifold
By 2029, will any AI be able to read a novel and reliably answer questions about it? (Gary Marcus benchmark #2)
yes 82¢ · 0.0pp
M0 24h · 1 market
Polymarket
Will any AI model reach ___ Coding Arena Score by June 30?
1550: 67¢ ▼ 16.0pp
1560: 36¢ ▲ 3.5pp
1570: 18¢ ▲ 7.5pp
$374 24h · 3 markets
Kalshi
Top Coding AI this week
ChatGPT: 2¢ ▼ 1.0pp
Claude: 97¢ ▼ 1.0pp
Dola: 4¢ · 0.0pp
+6 others
$2.8k 24h · 9 markets
Manifold
In 2029, will any AI be able to take an arbitrary proof in the mathematical literature and translate it into a form suitable for symbolic verification? (Gary Marcus benchmark #5)
yes 77¢ · 0.0pp
M0 24h · 1 market
Polymarket
Will any AI model reach ___ Coding Arena Score by December 31?
1560: 75¢ ▼ 3.5pp
1580: 57¢ ▼ 1.0pp
1600: 42¢ ▼ 1.2pp
$0 24h · 3 markets
Manifold
By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?
yes 10¢ · 0.0pp
M0 24h · 1 market