Working with AI, State of the Union June 2024

June 21, 2024

This post explores what it is like to work with AI in a software development context. At the current rate of progress, that experience changes significantly every 3 to 6 months. In this installment, I chronicle an experiment: exploring a new concept by building a PoC with Claude 3.5 Sonnet and Anthropic’s new Artifacts feature.

This week, Anthropic released Claude 3.5 Sonnet along with a new feature they call Artifacts. According to Anthropic’s numbers, the new model outperforms even the recently released (and poorly named) GPT-4o. Buried in that same announcement, you will find a description of the Artifacts feature. As far as I can tell, an Artifact lets you focus on making and improving a particular thing (a program, an image, etc.) within the larger context of a conversation with the model. A conversation can contain many Artifacts, and programmatic ones show the code alongside a running preview and are versioned!

Below, I’ll briefly describe my experience with this new release. For additional context, you may want to try the result yourself. As a test case, I wanted to explore the process of creating alignment data for an audio Bible. Why? Much of my day job revolves around developing an alignment tool for text-based Bibles. If you’re not familiar, “alignment data” describes the relationships between texts at the level of words, groups of words, or parts of words. At Biblica, we’re focused on aligning translations to Hebrew and Greek sources so that we can project source annotations through translations written in modern languages. Anyway, I wanted to explore a user experience focused on aligning an audio Bible.
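To make “alignment data” a little more concrete, here’s a minimal sketch of what word-level alignment might look like. The field names and identifiers are purely illustrative, not an actual schema used at Biblica:

```typescript
// Hypothetical shape for word-level alignment data. All names here
// are my own invention for illustration, not a real schema.
interface AlignmentLink {
  sourceTokenIds: string[]; // tokens in the Hebrew/Greek source
  targetTokenIds: string[]; // tokens in the modern-language translation
}

// Example: one Greek word aligned to a two-word English phrase.
// The token IDs are made up for demonstration purposes.
const example: AlignmentLink = {
  sourceTokenIds: ["grc-jhn-1-1-7"],
  targetTokenIds: ["eng-jhn-1-1-9", "eng-jhn-1-1-10"],
};
```

The key idea is that a single link can join any number of tokens on either side, which is what lets alignment describe words, groups of words, or parts of words.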

So what was it like collaborating with AI in June 2024?

The new Artifacts feature is a big step in the right direction. It removes all kinds of annoyances that came with coding alongside GPT-4 in the past: it keeps the thing you are working on front and center, keeps the artifact properly in the conversation’s context, and lets you easily switch between versions without scrolling through a long conversation.

The coding abilities of Claude 3.5 Sonnet are also excellent. It used a variety of libraries during our prototyping, and appropriately so. It also had good first-draft styling and design ideas. The most impressive part was that, for a fairly complex domain (aligning an audio Bible, dealing with multiple languages, etc.), it rarely took more than a sentence or two to describe my intent. This last part is worth emphasizing: the model understood my ideas as fast as I could produce them. Not once did it seem to misunderstand what I wanted to accomplish.

There is still a lot of work to be done, though.

Most of the logic we built for this little concept demo lived in a single React component. In my local repository, I had a few other boilerplate files for loading and displaying that component; Claude was aware of them but mostly wrote code for the main component. If I wanted to do some refactoring and split the component up, there did not seem to be a good or easy way to accomplish that with what is available today. Working in a single component isn’t the end of the world, though, especially if Claude is writing all the code for me. Unfortunately, things got sketchy once the component grew to ~200 LOC. The model’s output began to contain basic syntax errors, like a missing opening brace on an if statement. It also stopped reproducing the entire component on each iteration, instead printing fragments that I had to carefully splice into my local repository. The fragments were usually functional, but where to put them wasn’t always obvious.

Overall, the prototyping experience started out speedy and productive – magically so. But once we hit that ~200 LOC threshold, a lot of the magic disappeared.

I had some more UI/UX ideas, but I decided to stop once the syntax errors started. Then it was time to deploy to Netlify so I could share the working PoC. At this point, I ran into trouble hosting the demo’s mp3 file on Netlify: the prototype relied on HTTP range requests for audio, which the Netlify server did not seem to support. I was asking Claude for advice on this when I hit my request limit. I’m a paying Claude Pro customer, mind you, and I still hit the limit before I was able to take this small PoC across the finish line.
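For anyone hitting a similar wall: a quick way to check whether a host honors range requests is to probe it with a `Range` header and inspect the response. This is a generic sketch (the URL would be your own deployed asset), not the exact check I ran:

```typescript
// Interpret a server's response to a `Range: bytes=0-1` probe.
// 206 Partial Content means the server honored the Range header;
// an `Accept-Ranges: bytes` header advertises support either way.
function rangeSupported(status: number, acceptRanges: string | null): boolean {
  return status === 206 || acceptRanges === "bytes";
}

// Probe a hosted file (requires a fetch-capable runtime, e.g. Node 18+).
async function probeRangeSupport(url: string): Promise<boolean> {
  const res = await fetch(url, { headers: { Range: "bytes=0-1" } });
  return rangeSupported(res.status, res.headers.get("accept-ranges"));
}
```

A plain 200 response to a ranged request means the server ignored the header and sent the whole file, which breaks seeking and snippet playback.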

I ended up finishing the deployment without Claude’s help, and one of the features we had programmed for the PoC was left broken: clicking a token should play its audio snippet, but it doesn’t.

All in all, the early prototyping experience was honestly magical. Claude was bringing my ideas to life, typing all the code for me. But even with my subscription, Claude couldn’t take me from idea to deployment – and this was a PoC whose core logic ran about 250 LOC. I don’t think I could depend on Claude for larger projects, not even for a whole half-day work session.

When it comes to Generative AI, there is so much to be excited about. We also have a long way to go. I’m looking forward to what comes next.


Try the result on Netlify
Look at the code on GitHub