3 Comments
ToxSec

Really interesting angle thanks

Daniel Popescu / ⧉ Pluralisk

This article comes at the perfect time, thank you for articulating so clearly the insidious prompt-level tech debt we are only now starting to truly grapple with in our GenAI development.

Lakshya Agarwal

The difference arises from post-training regimes that the model companies follow. The earlier generations (4o-era) were primarily RLHF-tuned, while the current ones (5-era) are RLVR-tuned.

RLHF requires human feedback, while RLVR uses “verifiable rewards” (code/math-adjacent), meaning it can scale much faster. It’s way easier to generate 20 calculus questions that are slightly different from each other than it is to generate 20 conversational scenarios and collect “ideal-state” feedback. RLVR is also better suited for “agentic” trajectories, where the model can explore its environment autonomously and learn from its errors to eventually complete the objective. This is the key unlock delivering the current performance gains in coding. Cursor IDE is an example of such an environment for code: given an issue and a codebase, the model receives a positive reward if its actions convert a failing test to passing, or a negative reward if it errors out.
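The "verifiable reward" idea can be sketched in a few lines: the reward signal comes from a programmatic check (did the tests go green?) rather than a human preference label. This is an illustrative toy, not any lab's actual training code; all names are made up for the example.

```python
from typing import Callable

def verifiable_reward(run_tests: Callable[[], bool]) -> float:
    """Reward the agent +1 if its edits make the check pass, -1 otherwise.

    The check is any deterministic verifier (a unit test, a compiler,
    an exact-match grader), so no human rater is needed in the loop.
    """
    return 1.0 if run_tests() else -1.0

# Toy example: the "test" checks the behavior of a patched function.
def patched_add(a: int, b: int) -> int:
    return a + b

reward = verifiable_reward(lambda: patched_add(2, 3) == 5)   # passes -> 1.0
penalty = verifiable_reward(lambda: patched_add(2, 3) == 6)  # fails  -> -1.0
```

Because the verifier is cheap and automatic, thousands of such trajectories can be scored in parallel, which is what lets RLVR scale past human-labeled feedback.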

While the objective of both regimes is to move the base model's capabilities toward “chat-style” scenarios, RLVR does so with much higher efficiency. As a result, the current models are highly prompt-sensitive relative to their earlier counterparts.

A promising solution to this “tech debt” is GEPA [1], which combines genetic evolution and Pareto scoring to iteratively improve prompts based on task feedback. Similar to how traditional ML ended up with online learning and continuous optimization, we may see something similar play out.

[1] http://lakshyaag.com/blogs/gepa
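The Pareto-scoring idea mentioned above can be sketched as follows. Instead of keeping only the prompt with the best average score, keep every prompt that is best on at least one task, so diverse “specialists” survive to be mutated in the next generation. This is a loose toy in the spirit of GEPA, not its actual implementation; the scores and prompt names are illustrative.

```python
def pareto_candidates(scores: dict[str, list[float]]) -> set[str]:
    """scores maps prompt name -> per-task scores.

    Return every prompt that achieves the maximum score on at least
    one task; prompts that are never best anywhere are dropped.
    """
    n_tasks = len(next(iter(scores.values())))
    winners: set[str] = set()
    for t in range(n_tasks):
        best = max(s[t] for s in scores.values())
        winners |= {p for p, s in scores.items() if s[t] == best}
    return winners

scores = {
    "prompt_a": [0.9, 0.2, 0.5],
    "prompt_b": [0.4, 0.8, 0.5],
    "prompt_c": [0.3, 0.3, 0.3],  # dominated: never best on any task
}
survivors = pareto_candidates(scores)  # keeps prompt_a and prompt_b
```

Keeping per-task winners rather than a single average-score champion preserves complementary strengths across the candidate pool, which is what makes the iterative mutation step productive.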
