Friday, October 11, 2024

What "The Monkey's Paw" can teach us about AI prompt engineering

I decided to try out an AI app builder -- in this case, Together.AI's LlamaCoder -- to see if one could actually build something useful from just a few prompts. 

TL;DR, these tools are almost useful, but every prompt feels like making a wish with a Monkey's Paw: Unless you are ridiculously specific with your request, you'll end up with something other than what you wanted. (Usually nothing cursed, but also usually nothing truly correct.)

As a test model, I asked LlamaCoder to "Build me an app for generating characters for the Dungeons and Dragons TTRPG, using the third edition ruleset." Here's what happened.

For those of you who aren't tabletop roleplaying game (TTRPG) nerds, D&D 3rd Edition came out nearly 25 years ago (it's currently in its Fifth Edition), so lots of its source material has been on the web for a very, very long time. There's no reason an app-builder born of web-scraping wouldn't have plenty of examples to go on, both for the text and the app design.

Here's what my initial prompt produced:

It looks like an app. And, when I click Generate Character, here's what happens:


The app has clearly generated all six standard Ability Scores within typical ranges for a Level 1 character, and randomly chosen a Character Class, Race, Alignment, and Background. On its face, this looks like a barebones but reasonable app for pumping out the beginning of a basic D&D 3E character. Not super useful, but okay to save me some dice-rolling and decision-making.

Aside: Yes, manually calculating a full-fledged D&D character is not entirely dissimilar to filling out a tax return. We're only going over the equivalent of the 1040EZ in this example, but a "real" character generator has complexity similar to TurboTax, and for a lot of the same reasons. (In a future post, we can discuss the similarity between tax lawyers and munchkins.) My findings below are about getting an app of basic competency, not one intended for power users.

In our first output, we already have a problem: D&D didn't introduce Backgrounds to player-characters until 5th Edition. This app is already non-compliant with the rules I set forth. Moreover, a lot of vital character components are missing.

However, generative AI is probabilistic, not deterministic, so every time you enter a prompt, you'll get a slightly (or not-so-slightly) different result. Thus, I entered the exact same prompt again to see if the "Background problem" was just a one-time glitch.


Character Background is now gone, despite using word-for-word the same starting prompt. I now also have the option to choose a base Character Class rather than have one randomly assigned, and the system now appears to offer options to automatically calculate Armor Class and Hit Points.

However, Alignment and Race have disappeared, and those are crucial to every D&D character. Moreover, neither version of this app has included Saving Throws, Skills, or a Base Attack Bonus, which are needed to have a character fight or perform any actions in a game.

I also have a new dropdown to choose a method of generating character attributes: Roll 4d6 and drop lowest or Point Buy. This is missing two other classic methods: 3d6 down the line and the Standard Array (which are less popular, to be sure, but absolutely listed in the D&D Player's Handbook as approved methods).

And now we have a new problem: choosing the Point Buy option doesn't change anything. The app behaves identically, regardless of my choice on that dropdown. It simply performs a random number generation irrespective of that setting. This is a dummy setting that LlamaCoder threw in of its own accord.
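For contrast, here is a minimal Python sketch of what a working method selector could look like. It is not the code LlamaCoder produced (I'm not reproducing that here); the function names and dropdown labels are hypothetical. The point is simply that the selected option has to route to different logic, and an option with no logic behind it should fail loudly rather than silently fall back to a random roll. Point Buy is left out on purpose, since it's a player allocation rather than a roll.

```python
import random

ABILITIES = ["STR", "DEX", "CON", "INT", "WIS", "CHA"]


def roll_4d6_drop_lowest(rng):
    """Roll four six-sided dice and sum the highest three."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])  # dice[0] is the lowest die, so skip it


def roll_3d6(rng):
    """Roll three six-sided dice and sum them."""
    return sum(rng.randint(1, 6) for _ in range(3))


# Hypothetical dropdown labels, each mapped to the function that handles it.
METHODS = {
    "4d6 drop lowest": roll_4d6_drop_lowest,
    "3d6 down the line": roll_3d6,
}


def generate_scores(method, rng=None):
    """Generate one score per ability, in order, using the selected method."""
    if rng is None:
        rng = random.Random()
    if method not in METHODS:
        # An option with no implementation is an error, not a silent fallback.
        raise ValueError(f"Unsupported generation method: {method}")
    roll = METHODS[method]
    return {ability: roll(rng) for ability in ABILITIES}
```

With this shape, choosing "Point Buy" before it is implemented raises an error instead of quietly behaving like a dice roll.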

In contrast, choosing a Class does seem to affect the Armor Class and Hit Points of a character, which is to be expected, given there are Class Bonuses for these stats.

LlamaCoder lets you add additional prompts to refine the output, so let's start knocking down these issues.

I added the following secondary prompt to start: Start every character at Level 1, and generate their stats using the Standard Array. Be sure to choose a Race and Alignment for the character. Automatically calculate the character's Base Attack Bonus, Saving Throws, Hit Points, Armor Class, and Skill Bonuses.

This broke LlamaCoder.


Specifically, the system introduced errors into its own code. 


Now, this is a free tool with limited capacity, so I tried breaking up the follow-up prompts to see if fewer instructions would prevent the error. I started with just the first one: Start every character at Level 1, and generate their stats using the Standard Array.

This is what I got:


The generation method dropdown is gone, and when I select Calculate Ability Scores, it randomly places a score from the Standard Array into each Ability, with no duplication (as is correct). Also, Saving Throws and Base Attack Bonus are now included despite no specific prompt. I suspect LlamaCoder is playing around with prompt retention, so it decided to add those features based on my last, failed prompt. Skills, however, were not added.
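The "no duplication" behavior amounts to shuffling a fixed list and dealing it out. A minimal Python sketch, with loud assumptions: the array values below are the commonly cited 15/14/13/12/10/8 set, which I haven't listed anywhere in this post, and the function name is hypothetical. This is not the generated app's code.

```python
import random

ABILITIES = ["STR", "DEX", "CON", "INT", "WIS", "CHA"]

# Assumed values for illustration only; swap in whichever array your table uses.
STANDARD_ARRAY = [15, 14, 13, 12, 10, 8]


def assign_standard_array(rng=None, array=STANDARD_ARRAY):
    """Deal each array value to exactly one ability, in random order."""
    if rng is None:
        rng = random.Random()
    # sample() with k == len(array) returns a shuffled copy, so no value
    # is used twice and the original list is left untouched.
    shuffled = rng.sample(array, k=len(array))
    return dict(zip(ABILITIES, shuffled))
```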

I also tested all the individual buttons to generate Hit Points, Armor Class, Base Attack Bonus, and Saves. Running them before ability scores are distributed (they start at 0) created the correct negative bonuses, and changing Races and Classes appeared to alter these stats appropriately. Unfortunately, when I chose anything from a dropdown, I had to click four buttons to get all the derived stat blocks to regenerate. (LlamaCoder is not LlamaUXDesigner, clearly.) Let's see if we can fix that.
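For the curious, here is roughly the math I was checking, as a Python sketch rather than the app's actual code. It assumes a Level 1 character with no armor, no size modifier, and no racial adjustments, and it covers only two classes; the table and function names are hypothetical. The ability modifier is the score minus 10, halved and rounded down, which is why a score of 0 yields -5.

```python
def ability_modifier(score):
    """3E ability modifier: (score - 10) / 2, rounded down."""
    return (score - 10) // 2


# Level 1 values for two classes only; a real app needs the full class list.
CLASSES = {
    "Fighter": {"hit_die": 10, "bab": 1, "good_saves": {"fortitude"}},
    "Wizard": {"hit_die": 4, "bab": 0, "good_saves": {"will"}},
}

SAVE_ABILITY = {"fortitude": "CON", "reflex": "DEX", "will": "WIS"}


def derive_stats(class_name, scores):
    """Recalculate every derived stat from the class and ability scores."""
    cls = CLASSES[class_name]
    mods = {ability: ability_modifier(s) for ability, s in scores.items()}
    saves = {
        # Good saves start at +2 at Level 1, poor saves at +0.
        save: (2 if save in cls["good_saves"] else 0) + mods[ability]
        for save, ability in SAVE_ABILITY.items()
    }
    return {
        # Max hit die at Level 1 plus Con modifier, never below 1.
        "hit_points": max(1, cls["hit_die"] + mods["CON"]),
        # Unarmored: 10 plus Dex modifier.
        "armor_class": 10 + mods["DEX"],
        "base_attack_bonus": cls["bab"],
        "saves": saves,
    }
```

Because everything hangs off one function, a single call recomputes all the derived stats whenever a score or class changes.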

I added this sentence to the follow-up prompt: A single button should re-calculate Ability Scores, Hit Points, Armor Class, Base Attack Bonus, and Saves.

It worked, sort of:


I've got my One Button to Rule Them All, and all the math is calculated correctly when I click it, but once again, Class, Race, and Alignment have been removed. The first two of those can affect the derived stats, so we need them back. Let's see if we can get there without overloading the system.

I added a third sentence to the follow-up prompt: Automatically choose a Class, Race, and Alignment for the character.

Here's what we got:


This is a passably useful app for generating Level 1 D&D 3E characters. But it took a few iterations, a lot of domain knowledge, and some specificity to get there. In other words, you don't get a good Monkey's Paw wish until very late in the process, when you know exactly all the caveats you need to declare to avoid a harmful result. To get something commercial-grade would take a lot more work.

This speaks to me as a software product manager with 20+ years of experience: my initial prompt was a pretty typical user story, but no engineer -- AI or human -- would have likely produced a good result without more specific acceptance criteria.

Or, as someone more pithy than I (but equally cynical) put it:

To replace programmers with robots, clients will have to accurately describe what they want. We're safe.

I have two decades of product definition experience and I've spent the past five years directly developing AI tools, and it still took me several tries and a lot of tinkering to get an app I still probably won't use, because it's missing so many key features. Unless the task is simple, GenAI isn't going to build what you really want, because building good stuff is hard and defining what you want is sometimes even harder. (That's why product managers are necessary.)

Anyone who says otherwise is selling something.

Use the GenAI Monkey's Paw at your own risk.

---

PS - Want to hire a veteran product leader to actually deliver GenAI value? I'm presently available for hire. You can grab my resume and generic cover letter here.


Friday, October 04, 2024

AI isn't going to kill SaaS; AI is going to kill half-ass startups

There's a lot of bold bullshit prognostication about how new generative artificial intelligence is going to kill the Software-as-a-Service (SaaS) business model because now anybody can build custom enterprise apps with a simple ChatGPT prompt. Nothing could be further from the truth. 

AI is going to lead to more SaaS products that don't suck. But, along the way, AI is going to kill most SaaS startups -- because most early-stage SaaS startups suck.

I'll let SMB explain what I mean.


For those who don't get the joke, here's the definition of a Minimum Viable Product (MVP) from Wikipedia: "a version of a product with just enough features to be usable by early customers who can then provide feedback for future product development."

The cult of the Lean Startup that dominates Silicon Valley and most venture-backed SaaS companies produces almost nothing but MVP SaaS products that are, by design, half-assed on their best day. The idea is to get early customers to tolerate these half-baked offerings until the founding team learns enough to turn them into actual enterprise software. (Customers buy at the MVP stage because they assume they'll get permanently grandfathered into absurdly reduced early pricing in exchange for being guinea pigs. Also, sometimes MVPs sort of work.)

But today, thanks to AI, I can type a few sentences and get a half-assed prototype for free. I don't need to try -- let alone pay for -- an outside startup's SaaS MVP that's buggy, unreliable, and may not last long. I can get that kind of V1 crap in an afternoon of puttering, complete with mildly functional code I can hand off to a real engineer as a proof of concept.

No, these AI-generated prototypes won't be very good. Most SaaS startup MVPs aren't any good, either. The difference is I don't have to go to a SaaS startup to get a crappy MVP anymore. I can roll my own.

That doesn't mean startups are going away -- startups are doing great -- it means SaaS startups can no longer get away with barebones MVPs. The bar for an initial version of a product just got a lot higher, especially if you want someone to pay for it.

The absurd idea that ChatGPT can spit out a useful SaaS CRM anytime somebody prompts it with "build me a Hubspot clone" is just that: absurd. Anyone who has ever built commercial-grade SaaS software (and I've built a lot) knows that it's really hard and requires sweating a lot of complicated details that GenAI code-vomiters won't address (and that's before we discuss the issue of systems maintenance and required security compliance). As such, there will be plenty of market left for human-driven SaaS startups to claim.

And, because GenAI makes it easy to spin up prototypes, internal alpha releases are cheaper and faster than previously possible. It will be easier than ever to launch a SaaS startup thanks to GenAI. But it will be harder than ever to make a SaaS product that people will pay for.

GenAI will make it simple to create generic, barely useful SaaS apps. But hyper-focused SaaS apps that meet very specific needs at a high level of competency will become much more valuable precisely because GenAI can't deliver that level of quality, and because maintaining that quality over time requires significant investment. Moreover, with actual SaaS, the costs of maintenance get spread over all customers, not incurred by a single customer running a bespoke in-house GenAI product. (In other words, build vs. buy economics still largely apply.)

The result will be an absolute explosion of niche, highly mature SaaS startups looking to claim extremely specialized market areas. 

Which leads me to my final conclusion, one shared by my old colleague and current AI investor Rob May: venture-backed SaaS is probably dead.

GenAI makes rolling up custom migration tools super-cheap now, so SaaS switching costs are going to drop like crazy. Customer lock-in is going to be really hard to enforce. VCs like moats; GenAI is going to mass-produce bridges. As such, the only way to keep a customer on your product is for your product to actually be great. Being great is hard and expensive and may take a while. VCs aren't known for their patience, and they really hate customer churn.

Moreover, venture capital economics require that every company a VC invests in target a huge total addressable market (TAM), then set money on fire to try and claim a dominant position in the market before anyone else. The niche SaaS apps that AI can't create will have much smaller TAMs than VCs will tolerate. If you want to build a SaaS app in the future, be prepared to bootstrap.

In conclusion: The rise of code-generating artificial intelligence isn't going to destroy the SaaS business model -- just the SaaS business model as we've known it. The future of SaaS is a staggering variety of small, specialized, highly refined products that are ready for prime time at V1. 

MVPs -- and VCs -- need not apply.

---

PS - Want to build a killer SaaS product before the GenAI market shift changes everything? I'm presently available for hire. You can grab my resume and generic cover letter here.