It's been six months since I last wrote about my experience using GitHub Copilot to write this blog. If you haven't read that entry, my key takeaways were:
- Copilot functioned like a smarter autocomplete
- It was really useful for rote tasks
- I needed to be pretty specific with instructions
- Hallucination was pretty minimal
In that post I mentioned early signs that LLMs could upend the stranglehold that massive corporations hold over the software industry. Put another way: LLMs are a force multiplier in the hands of a seasoned engineer who knows how to leverage them. I compared them to the innovation of compilers and interpreters.
In those six months, a lot has happened. Cursor was released and I was exposed to it at work. The feature that struck me most was Cursor's agent capabilities. Something about the way that feature worked left me with a clear impression: "This is something you must learn to use well." When I got off work I canceled my JetBrains subscription and bought Cursor (they're about the same monthly price).
You may have noticed my blog has a new look. I liked my previous Svelte frontend a lot. I spent roughly a month and a half building out that UI in my spare time, and I felt really comfortable in that code, but it had some sharp corners too. The way Svelte works is fundamentally different from React and Vue, and the contextual leap from what I do at work to what I use at home was big enough that I made some pretty rookie mistakes. The point is, I wanted to go back to Vue, and I was impressed with the advancements in Vue 3. That led to my first practice run with Cursor.
Agents as workers
I didn't really have much of a plan, but I had worked in Vue before and I had already made the other technology choices on my own. One thing I realized early on with agents in Cursor is that they hold a lot of context, including cues from your IDE. I spent a couple of hours building out the scaffolding of the project and making sure linting was in a place I'd be happy with and that would return meaningful errors (a sketch of what I mean follows below). After that, it was off to the races. In less than a week, using my off-time, I was able to crank out a redesign and reimplementation of my blog in Vue. I was even able to add a much more robust moderation view for comments.
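For illustration, here's a minimal sketch of the kind of lint setup I mean, assuming a flat ESLint config with eslint-plugin-vue and typescript-eslint; the specific rules are placeholders, not my actual configuration.

```js
// eslint.config.js — a minimal sketch, not my exact configuration.
import pluginVue from 'eslint-plugin-vue'
import tseslint from 'typescript-eslint'

export default tseslint.config(
  ...tseslint.configs.recommended,
  ...pluginVue.configs['flat/recommended'],
  {
    // Vue SFCs need the TypeScript parser wired in for <script lang="ts">.
    files: ['**/*.vue'],
    languageOptions: { parserOptions: { parser: tseslint.parser } },
  },
  {
    rules: {
      // Strict, specific rules give the agent concrete errors to iterate on.
      '@typescript-eslint/no-explicit-any': 'error',
      'vue/multi-word-component-names': 'error',
    },
  },
)
```

The stricter and more descriptive the lint errors, the better the feedback signal the agent gets when it re-runs after a failure.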
I achieved this through Cursor's Agent prompt window, which they've recently merged into a single window. Cursor's agent uses a multi-step, multi-model RAG approach to problem solving, but it also integrates deeply with your editor to spot linting errors, test failures, command line output (and errors), and probably other cues I'm unaware of. If it spots a problem in its original generation, it simply runs the model again with the errors included. It's also incredibly good at mimicking patterns in existing code, which you can cue it to do by including similar files in its context. Speaking of the model context, you can also steer it toward more accurate results by dropping previously generated code into that context. I was able to generate entirely new pages, using my component framework, by pointing it at the SDK folder in my frontend. If those files come with types, it's even more accurate.
As long as you keep the task well scoped, I'd describe the Agent's hallucinations as mild. You can think of Cursor as a really fast SWE II pair programmer: give it a task with clear scope, examples, and the major design choices already made, and it runs off and does it for you. The experience is overall pretty incredible. It'll even attempt to adopt patterns it can discern from your codebase.
Rules are another feature I found really useful. You can give Cursor a set of Markdown rules that it passes to every prompt. For instance, I use rules to tell Cursor to use Tailwind CSS classes whenever possible, and when it doesn't, it should be for a good reason. At work I was able to patch a systemic issue with tool context window oversizing using rules. They're a powerful primitive for your workflows.
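To make that concrete, here's the shape of a rules file. The wording is hypothetical, not my actual rules:

```markdown
<!-- .cursorrules — hypothetical examples, not my actual rules file -->
- Prefer Tailwind CSS utility classes over custom CSS. If you must write
  custom CSS, leave a comment explaining why Tailwind couldn't cover it.
- Use the Composition API with <script setup lang="ts"> in new Vue components.
- When passing tool output back into a prompt, summarize anything large
  instead of inlining it wholesale.
```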
Testing is paramount
With increased productivity I became somewhat skeptical of the LLM's output at times; it sometimes takes liberties or makes assumptions that I otherwise would not. I was already at a happy place in my workflow, where I could iterate quickly with the agent on a single document or a small collection of documents, so I took some time to build some confidence into each of those iterations. My thinking was that it's easier to watch the tests and code evolve at the same time, as with Test Driven Development (TDD), than it is to try to catch the tests up to the code.
I extended my workflow by having the agent write assertive tests based on my prompts. This took the mechanical overhead of writing tests away from me and let me focus on what to test and how, which is much more valuable than sinking that energy into thoughtfully crafting testable interfaces. Because the LLM has to write both the code and the tests, it often writes the same volume of testable code that I would, and often from the same perspective.
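Here's a sketch of the kind of spec that falls out of this workflow; the component, its props, and the data-test selector are hypothetical stand-ins, not my actual code:

```ts
// CommentCard.spec.ts — a hypothetical Vitest spec in the shape the agent
// produces for me; CommentCard and its interface are stand-ins.
import { describe, it, expect } from 'vitest'
import { mount } from '@vue/test-utils'
import CommentCard from '@/components/CommentCard.vue'

describe('CommentCard', () => {
  it('renders the author and body it was given', () => {
    const wrapper = mount(CommentCard, {
      props: { author: 'alice', body: 'Nice post!' },
    })
    expect(wrapper.text()).toContain('alice')
    expect(wrapper.text()).toContain('Nice post!')
  })

  it('emits "flag" when the flag control is clicked', async () => {
    const wrapper = mount(CommentCard, {
      props: { author: 'alice', body: 'Nice post!' },
    })
    await wrapper.find('[data-test="flag"]').trigger('click')
    expect(wrapper.emitted('flag')).toHaveLength(1)
  })
})
```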
My API never actually had tests. Instead I focused a lot on how the code read to mitigate bugs and endless troubleshooting; frankly, I just didn't have the time to write 200+ tests for an API of that size. When I was done with the redesign, I turned Cursor loose on my API to write tests based on the docstrings I had. It did exceedingly well, helped me expand my RBAC permissions middleware, and even organized the tests into a matrix across RBAC roles. I now have 55 unique tests run against 3 different RBAC-enabled roles (that's 165 test cases).
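The matrix looks something like the sketch below. The role names, endpoint, and tokenFor helper are hypothetical; the real suite runs 55 such cases per role:

```ts
// A sketch of the role/test matrix; the roles, endpoint, and tokenFor()
// helper are hypothetical stand-ins for my actual suite.
import { describe, it, expect } from 'vitest'

// Hypothetical helper: resolves a pre-issued auth token for each role.
const tokenFor = (role: string): string =>
  process.env[`TEST_TOKEN_${role.toUpperCase()}`] ?? ''

const roles = [
  { role: 'admin', canModerate: true },
  { role: 'editor', canModerate: true },
  { role: 'reader', canModerate: false },
]

describe.each(roles)('RBAC role: $role', ({ role, canModerate }) => {
  it(`${canModerate ? 'allows' : 'denies'} comment moderation`, async () => {
    const res = await fetch('http://localhost:8000/api/comments/1/approve', {
      method: 'POST',
      headers: { Authorization: `Bearer ${tokenFor(role)}` },
    })
    expect(res.status).toBe(canModerate ? 200 : 403)
  })
})
```

One `describe.each` block per endpoint keeps every permission decision visible as its own test case, which is exactly what I want out of RBAC coverage.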
Doubling down
The call I made in my last post, with Copilot as the reference case, was pretty early, but what I saw then was, as I described it, the writing on the wall. I've really only just begun to figure out how to make Cursor's models work the way I would, but more efficiently. First, I'll cover the basis of my predictions:
Modern software engineers at the senior level in software-focused companies spend maybe 40% of their time writing code. The rest is spent on decision making, design, report writing, slide deck preparation, research, learning, alignment meetings, and the wheel greasing that gets the real work of an enterprise (or product company) done. As engineers become more familiar with LLMs and how to use them, LLMs will be a great productivity boost, especially on small teams, freeing engineers to focus on the tasks that actually move a business or software package forward.
I'm going to revisit my prediction from six months ago:
Prediction #1: In the next 1-2 years, a non-trivial number of non-VC-funded, multi-million dollar ARR products will come to market, built by small software teams using LLMs as a productivity enhancement.
I'm not talking about the kind of products coming to market now, where it smells like AI was just sprinkled on top. Rather, I'm talking about software, potentially non-AI software, built by LLM-assisted humans.
I'm also going to add another prediction:
Prediction #2: I don't think the time of the software engineer is coming to an end. I think it's only just begun.
In this prediction I'm really focusing on the role a software engineer plays in a business. We're not there strictly to code; in fact, coding is often a minority of what we do in any given day or week. We're the living logic of an organization that keeps things changing and adapting, keeping the business meeting the needs of its customers and adaptable to emergent threats.