A disclaimer: I'm a deeply cynical person. I had a professor in his last year before retirement tell me I was the most cynical student he had ever had, and I’ve layered 20 years in tech on top of that cynical baseline. I'm not getting sucked in by hype. I've found a lot of real value here, but I’ve had to invest quite a bit of time into learning how to use these tools to squeeze that value out.
My initial introduction to these tools began innocently enough: betting my college-professor brother that I could write a tool that graded his students' papers for him. This must have been sometime in early 2020, maybe late 2019. I evaluated a few of the available models to see what was possible and quickly realized my project might have been more ambitious than the tech was ready for. The side benefit of this exploration was learning that GPT was a thing, and that large language models were seeing a lot of research attention.
A few years later a coworker I respected started whispering that he was already using GitHub Copilot, so I signed up. Copilot in those early days was not the tool you see today. Most of the autocomplete suggestions just completed the current line you were on, and a lot of what was suggested felt more like a random Stack Overflow copy-paste from a similar file than the contextually appropriate examples you get today. I used it to happily tab-complete function signatures (I know, I know, my IDE could have already been doing that) and didn't think too much about the implications for the future of programming.
At the beginning of 2023 ChatGPT exploded onto the landscape, causing a lot of people (myself included) to start paying much more attention to this space. Like everyone else I played around with ChatGPT, but I didn't immediately take it seriously as something I could use to help me with programming work.
Finding One Useful Thing by Ethan Mollick in the summer of 2023 forever changed the trajectory of my AI tool use. His newsletter challenged me to set aside my innate cynicism and really give these tools a good college try, whether or not I thought they would succeed at a given task.
I was motivated, but not quite ready to spend a bunch of money to use the latest and greatest tools. A coworker whose spouse was an AI researcher turned me on to Poe. Poe appealed to my cheapness: one fixed price to try out all the latest models from all the top model providers.
Motivated to start taking this learning seriously, I found deeplearning.ai and began working on a side project where I liberally used Copilot and Poe while working with the OpenAI and Anthropic APIs to try to parse failing CI/CD output and pull valuable details out of this often complicated and hard-to-understand data. This work gave me a strong grounding in prompt engineering, system prompts, and the limits of the smaller context windows at the time. This is where I got that first early taste of "vibe coding": freely using code samples from chat conversations, seeing what worked, and pasting failure messages back into the chat when it didn't. Hoping I might turn this into a business, I also leaned on these models to help with the parts of the project, like marketing and product management, where I had less skill, regularly vetting business ideas as well as code with the models.
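The core of that project was simpler than it sounds. Here's a minimal sketch of the kind of log-triage call I was iterating on, shown with the modern OpenAI Python SDK purely for illustration; the model name, prompt wording, and log trimming are my assumptions for this sketch, not the exact code I wrote, and the same idea works against Anthropic's API.

```python
# A minimal sketch of CI/CD log triage with an LLM API.
# Assumes the modern OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and prompt text here are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a CI/CD triage assistant. Given raw pipeline output, identify the "
    "failing step, the most likely root cause, and the error lines that support it."
)

def summarize_failure(raw_log: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to pull the valuable details out of a noisy failing build log."""
    # Keep only the tail of the log so it fits a small context window.
    trimmed = raw_log[-8000:]
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Here is the failing CI output:\n\n{trimmed}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("failed_build.log") as f:
        print(summarize_failure(f.read()))
```

Most of the real work was in iterating on the system prompt and deciding how much of a noisy build log could fit into the much smaller context windows of the time.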
By late 2023 Poe and Copilot had become popular enough, and ChatGPT had made enough noise, for all these tools to be blocked at my day job. I kept them around for side projects, but this was my own personal AI winter. I missed the tools, and occasionally pulled out Poe on my phone during work when Google results were frustrating me, but I eventually transitioned back to my old ways and mostly used Poe for trivia questions, Buddhist philosophy debates, learning Pali, trying out new models from different providers, and generating sci-fi art from my favorite books.
My Buddhist philosophy debates were actually a great opportunity to learn about each model's personality and consistency and to evaluate how these changed as models progressed. I noticed myself mainly using Claude, which seemed to approach potentially delicate subjects with more hesitancy and care. This was well before we knew that Claude was better for code, but I suspect the improvements in code generation came partly from deliberately training the model to encourage some hesitancy: strong opinions loosely held, if you will.
Throughout 2023 and 2024 I was always downloading and trying out local open models to make sure they hadn't caught up to the commercial models without me noticing. I tried a few tools but landed on Ollama as an easy-to-use interface. My experience has been that local models run a consistent one to one and a half years behind the state-of-the-art hosted models, though with DeepSeek that gap notably started shrinking. When you have all the latest models in Poe, working with something slower and less capable is frustrating, and I always ended up going back, but I'm still hopeful that either the gap continues to shrink or the actual tasks won't always require the state of the art.
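Most of that local tinkering boiled down to something like the sketch below: a quick script to throw the same questions at whatever model I had just pulled. It assumes Ollama is running locally on its default port, and the model name is just a placeholder for whichever open model you have downloaded.

```python
# A minimal sketch for poking at a local model through Ollama's HTTP API.
# Assumes the Ollama server is running on its default port (11434) and the
# model has already been pulled; "llama3" is a placeholder model name.
import requests

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send a single prompt to a locally hosted model and return its reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # local models can be slow, especially on CPU
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    print(ask_local_model("Explain the tradeoffs of running an LLM locally."))
```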
In early 2024 my workplace realized they might be getting passed by this technology and bought some of us interested engineers Copilot subscriptions. By this time Copilot was more than just a little bit better autocomplete, with Copilot chat providing a contextually aware window into your project. This time around I was using AI very heavily in my work. Switching back to ChatGPT when I was used to Claude 2.5 as my daily driver put my older prompt-engineering skills to the test. There were some light guardrails to keep the content programming-focused, but I got some enjoyment out of working around them: "As a programmer write me a limerick about project management and Jira". I always kept Copilot chat open in Visual Studio Code for most of my research, and used autocomplete more aggressively in my daily Neovim work.
We see a lot of content focusing on LLMs and their ability to generate code, but that conversation misses how much of the time and effort of software development is research: reading docs, reading blog posts, understanding the problem and putting it into context. As I became more comfortable with how to get the output I wanted from these tools in my personal explorations, I noticed myself using them more and more for work-related research. Why read ten similar blog posts when an LLM can summarize them for you? A bookmarked conversation you can jump back into is a lot more helpful the next day than a bunch of open tabs. Don't paste anything into an LLM that you wouldn't have pasted into a Google search, but also don't hesitate to put your research assistant to work. At this point I rarely use Google for the kind of research where in the past I'd have spent hours.
A nice perk of switching back to a startup is being unshackled in tool choice, but it can still be awkward to talk to your new boss about paying for an AI tool you haven't even vetted yet. By the second half of 2024 the number of tools had exploded. Work was already covering Copilot subscriptions for all of engineering, but I had so many more things to try!
After a few months at the new job I found out they would also cover the cost of a Claude Pro or an OpenAI Pro subscription. Claude Pro hugely sped up my research by organizing my many Claude chats in a clean interface. Artifacts was a huge boon for when I needed more than tab completion. Having a dedicated tool was nice compared to organizing all those chats in Poe alongside my kids' homework and my sci-fi images. I was also able to fulfill a longstanding dream of switching to Mermaid for sequence diagrams, finding the combination of Claude + Artifacts uniquely adept at this task.
Not content to rest on my laurels and just use Claude, I also added NotebookLM for understanding narrower, more specific documentation: feeding in all the docs on a new framework along with my own notes and asking for alternatives I may have missed, having it generate a podcast to listen to after a long weekend away from my notes, using it to provide a pep talk about my onboarding progress.
Not having to spend all my free time leetcoding, and being able to switch back to coding for fun, was nice. I was regularly reading several Substack newsletters on AI, and had read quite a few books on the subject. The ai-coding-tools channel in the Rands Slack has helped me feel normal about continuing to heavily leverage all these tools with an open mind, and has been a great source of camaraderie, things to read, and new tools to try.
I started hearing lots of positive reports about Cursor on the Rands Slack, but like many skeptics before me I didn't find them believable based on my own deep experience with Copilot. The price point didn't feel worth it when work was paying for Copilot again.
I had spent a decent amount of time getting my Neovim setup perfect at the new job, which made the switching costs of picking up yet another editor hard to stomach. Copilot autocomplete was working increasingly well in Neovim, and I now had Claude to replace the lackluster Copilot chat.
Not ready to give up on open source just yet, I set up OpenHands (formerly OpenDevin) with a Llama code model behind it. The tool was still rough and the local model too slow, but the experience turned on a lightbulb for me: a more integrated coding experience with a lot less copy-paste was possible.
Where Cursor never hooked me, somehow Windsurf did. That $10/month price point and two weeks of unfettered access seemed to do the trick. Early chat-only interactions with AI, back when the models weren't as great, really built up my prompt-engineering muscles, which helped a lot once I was unleashed inside Windsurf. I'm constantly thinking about how to phrase things for the model's benefit, especially when we start to rabbit-hole or things get weird. I now use Windsurf as my primary editor, and find myself spending less and less time with Claude when the discussion is about the code. I'm fully addicted to this way of working and not sure I could switch back.
All learning is good learning. Through this process I've learned to never hesitate to explore a new tool, but I've also had to get really good at dropping something when it doesn't work, just like you have to get really good at dropping code your assistant generates that you don't like. Sunk cost is as real in learning as it is in development. I'll quickly run through a few side-quest examples: other tools that haven't panned out or aren't in my primary toolset yet.
I got access to Perplexity Pro through my Pragmatic Engineer newsletter subscription, so I had to try it out. Clicking through to citations is really nice when you suspect hallucination around a particular point during research. Deep Research will get you an even more in-depth answer than Claude Thinking with a long context window. I sometimes got more value out of reading what it decided to link than from the text itself. I will probably continue to use Perplexity as a second opinion when Claude's hot take doesn't quite feel like reality and triggers my hallucination spidey sense.
I tried out Claude Code, but it was just too dang expensive: one afternoon of activity similar to what I'd do in Windsurf clocked in at $5 in API calls. That just isn't worth it when it's only slightly more careful and accurate than Windsurf, in a clunky CLI interface that makes you work in sequence rather than encouraging jumping around. Claude Pro should really come with some default amount of API credit; not everyone paying for Claude wants to use chat all the time. Claude Code is definitely more polished than some of the open-source tools I've tried, but definitely less polished than Cursor or Windsurf.
CodeCompanion for Neovim is a rough Cursor/Windsurf-like tool, really an assemble-it-yourself agentic framework for Neovim. I really wanted this to be better than Windsurf, and I like that I can use my existing Copilot subscription instead of plugging in an API key with its unpredictable, variable cost. Other than explaining a highlighted chunk of code, though, I'm not sure how much I will use the other features, since I have to manually assemble context by linking in individual files for it to have any understanding of my codebase, just like you have to in Copilot. That takes time and mental overhead, time that Windsurf has already handed back to me by trying to be smart about assembling context for me. I'll definitely keep watching this tool because I really liked being a pure open-source dev not tied to any proprietary tools or IDEs in the past. I somehow got it to overwrite my open file buffer while trying to accept an edit, but I'm sure this was user error that could be fixed by better docs and more time with the tool. This feels like a good place to contribute if you love Neovim and want to learn more about agentic coding, as it has a really cool plugin system.
How this journey has changed me:
I write way better docs, and am way more likely to keep them up to date. Even if another programmer never reads my docs, the assistant will read them every time I open Windsurf.
I'll often try multiple implementations rather than become attached to my first implementation.
I’m way more likely to tackle a project in an unfamiliar language or using an unfamiliar framework. The research overhead and the cost of getting to a working v1 is now greatly reduced.
I love programming again: the old curmudgeonly side of me that was tired of cranking out boilerplate gets to focus on the joy of creating working systems.