Monday, April 6, 2026

A workflow for Agentic Engineering

"- Claude, build the entire Platform from scratch. Make no mistakes."

In the agentic engineering community, some think that 100% test coverage will guarantee the quality of generated code. I think that relying too much on automated tests is problematic, because testing is difficult. I'd rather review the actual source code and, more importantly, take an active part in its development by steering the agent(s) in real time.

I usually develop a feature step by step, to avoid focusing too much on upfront planning. I used to do this before the agents joined the development process, and I have found that it works well today too. When beginning a new task, I often have some clear ideas and some vague ideas about how to solve the problem. By starting with the things that are clear, I can postpone thinking about the implementation details of the vaguer parts of the overall feature. I can worry about that later, when I have learned more. During development, the how and the what to implement will likely change. This is natural, as you learn more along the way and understand more about the actual problem to solve. I guess this is the basic idea behind Embrace Change (from the Agile Manifesto).

I haven't yet felt any need to set up larger Agent Orchestrations for the kind of problems that I solve. It mostly seems like overdoing it, doesn't it? I don't know about you, but we are not building or reinventing an entire E-Commerce or Social Media platform every day. The things we do are on a much smaller and more human-friendly scale. I think the idea about 100% test coverage might be about scaling up fast, and the assumption that agents will produce so much code that it becomes impossible for humans to grasp.

An Example

I recently developed a new feature that involved several repos, a combination of Python backends and Next.js apps. I decided to do this in smaller steps and began with the most straightforward step, which in this case was one of the backend services that I already know well. I had a pretty clear idea of what needed to be changed in there and added a simple Solution Design to the ticket.

For context: I used the same ticket for all the development of this particular feature. Each plan an agent produced was submitted as a comment to the ticket. I have automated this with an MCP server connected to the issue system that we currently use at work. It seems like having the relevant data collected in the ticket made the upcoming tasks clearer for the agent(s). Each task began with a new context, but I instructed the agent to read the ticket, which contained the problem to solve, the solution design, previous plans and linked pull requests, before planning how to implement things.

It felt like things went pretty smoothly. However, the very first task needed several iterations. I realize now that even if the basic setup of that particular repo was straightforward, the service has evolved over time and its structure has diverged. I think this is quite common in long-lived repos. This confused the agent, so I practiced the stop-the-line principle by rejecting generated code, correcting and steering the agent in real time. I did this whenever I noticed that the implementation (the code) went in a direction I didn't like. My agentic tool of choice, Eca, recently added support for a new steer command that is very useful for this kind of workflow. You can change the direction, or steer, while the agents are working. It is not always necessary to halt all ongoing work; sometimes a friendly nudge in the right direction is enough.

Sometimes, unexpected things happen: the other day, the main model I use was unreachable for about half the day (again). Oh no. But I just switched to a different provider and continued the work! This is another area where an Open Source and provider-agnostic tool like Eca shines.

Another thing I noticed was that the short summary the agent produced for each Pull Request covered both the whole feature and the particular details pretty well. The data in the actual ticket was probably helpful for this kind of task too.

I've been advocating Test Driven Development (TDD) and its sibling, REPL Driven Development, for many years. The real-time steering, the stop-the-line principle with origins in The Toyota Way, could be an Agentic Engineering version of the Test Driven approach. It's about fast feedback loops. What do you think?

Top Photo: that's me pretending to do something important on the cell phone, taken at Åreskutan, Jämtland, Sweden.

Sunday, March 22, 2026

The tools of an Agentic Engineer

A lot of great things have origins in the 1970s: Hip Hop was redefining music and street culture, Bruce Lee was taking Martial Arts to the next level, and the initial development of something called editor macros (also known as Emacs) was underway. I was born in that decade, but that's purely a coincidence.

For a couple of years now, my primary development tool has been that editor from the seventies. It is my tool of choice for developing Python, JavaScript, TypeScript and Lisp dialects such as Clojure and elisp. And today, as an agentic engineer, it has turned out to be a great choice for this kind of software development too. With the rise of various CLI, TUI & Desktop based tools for AI development, it would be reasonable to think that this ancient code editor would become obsolete - right?

Not if you know about the innovative Emacs community. It is driven by passion, support from the community itself and Open Source. These components are usually more resilient and reliable long term than the VC-driven startup culture. Emacs is part of the greater Lisp community, where a lot of innovation in general takes place. The Clojure community is cutting edge in many aspects of software development, including AI.

More Agents

One thing that I have noticed lately is that the more I get into Agentic Engineering, the more I use Emacs. As the focus has shifted from typing code to instructing and reviewing, I have found uses for Emacs powers I haven't really needed until now: tools like Magit (for git), and I'm also learning more about the powerful Org Mode. I didn't care that much about Markdown before, but now it is an important part of the development itself. So I configured my Emacs to have a nice-looking, simple and readable markdown experience.

"More Agentic Engineering, More Emacs"

With Emacs, I use a great AI tool called Eca, and with it I am not limited to any specific vendor for agentic development. Vendor lock-in is something I really want to avoid. The combination of Eca and the power tools mentioned before makes a very nice Agentic Engineering toolset. Eca is actively developed and has a lot of useful features and a very nice developer experience. It supports standards like AGENTS.md, commands, skills, hooks and sub-agents, and uses a client-server setup in the same way as the Language Server Protocol. It is Open Source and not only for Emacs. Have a look at the website for support of your favorite editor or IDE. By the way, Eca is developed in Lisp (Clojure).

I have shared my Eca setup at GitHub, and have also made some contributions to the Eca plugins repository.

Human Driven Development

With this setup, the human reviewing can happen in real time, and doesn't have to wait until the end, when the amount of code is too often quite overwhelming. The human developer (that's me) can quickly act when noticing that things take a different route than expected, in a similar way as the stop-the-line principle from the Toyota Way. This is a lean way to reach the end goal quickly: deploying code that is good enough for production and adds value.

I have found that many Agile practices, in combination with developer-friendly tools, fit well with the ideas of Agentic Engineering. Even though I've seen worrying signs of a return of the Waterfall movement.

To summarize: the result of my new Agentic Engineering development style is that I haven't put my IDE to the side - it's at the very center of the agentic workflow.



Top Photo by me, taken at Åreskutan, Jämtland, Sweden.

Sunday, March 15, 2026

Agile & Agentic Engineering

"Don't fall into a waterfall-style of software development."

Our industry is quickly adapting to the new ways of working and we are redefining what it means to be a software developer: Agentic Engineering.

It's not the same thing as Vibe Coding, which is probably why I have had such a surprisingly smooth transition recently. As I see it, vibe coding is about treating code as a black box: it doesn't really matter how things are put together. The only thing important to a vibe coder is the output. As a passionate TDD and REPL driven programmer, this doesn't feel right to me today. Tomorrow, I don't know. Right now, I care about how things are constructed. More importantly, I need to have some understanding of the code itself to get ideas on how to change and improve the output.

This is where Agentic Engineering fits in: a structured way of developing software, where the human developer takes an active part in the process. It is not only about high level architecture or designing the solution, it's also about having the possibility to direct the agents into producing actual code that the human can understand and approve. For me, this is about keeping things simple and concise. Functional. Doing things in small steps, i.e. solve a problem by breaking it down into smaller parts. Experiment and learn along the way, step by step, rather than making big plans upfront. This is the core of the Agile movement.

It's well known that LLMs can produce verbose chunks of code and forget about important things. But it is possible to take control of that part as an agentic engineer. Similar to the stop-the-line principle (from the Toyota Way) where you can halt the production if you identify an issue. Take action (clarify or give new instructions) and then proceed. With these skills, agents can produce chunks of code that are just right. And functional.

From what I've picked up in the developer community lately, there's an increased need for structured work in the new AI landscape. This makes a lot of sense. What surprises me is the conclusion that we should start doing more planning upfront, writing detailed specifications before any code is written. This is what the Plan, Execute, Test workflow sounds like to me. Am I just misunderstanding, or is this a Modern Waterfall movement?

Plan, execute, test might be the correct workflow for an agent, but not for a human. Planning in the beginning is difficult, because in the beginning we have very little knowledge about the thing to develop. Instead, we could learn what and how to develop something along the way. We can also pick up bits of why along the way too, but it's good to have some understanding about that specific part early in the process.

If the plan was incorrect, we will still move at 10x speed, but produce 10x waste. Agents produce code fast, and throwing away the result and starting over might not be as big an issue as it was before. This is new. The difficult part is throwing away our plan, our design that we've invested in, and starting all over again. This drains human energy. Once a plan is set, it can be too big a mental effort to break free from it, because things have been decided already. Big planning upfront might sound right, but it is a trap. A vague Jira ticket description to begin with is not necessarily a bad thing.

The challenge as agentic engineers is to 10x the value, and not end up in 10x waste. Essentially, this makes us more product- and value-focused than before. In short: build the thing right, but also build the right thing.

How can we do that? Explore, learn and adapt are words I would like to see as part of the Agentic Engineering definition. Plan a little bit upfront, just enough to get started, but no more than that. Continue exploring, planning and adjusting as the work proceeds. Get things out fast so you can collect feedback (logs, errors, usage) and learn what to adjust. That's Agile & Agentic, a Lean and Agentic Engineering workflow.



Top Photo by me, taken at the top of Åreskutan, Jämtland, Sweden.

Saturday, October 25, 2025

Please don't break things

"Does this need to be a breaking change?"

Over the years as a Software Developer, I have been part of many teams making a lot of task-force-style efforts to upgrade third-party dependencies or tools. Far too often it is work that adds zero value for us. It adds significant cost, though. As a user of third-party tools, you don't have much choice. Even if you might feel productive as a developer when implementing these forced changes, think of all the other stuff you could do instead to improve your product or platform.

The great tools from the community, most of them Open Source, are something we should be thankful for, and we should appreciate the efforts made by the people out there. This is a plea to pay extra attention when making changes that will affect your users. I am also an Open Source maintainer, and try hard to avoid changes that don't add value for the users.

An example from Python: uv

warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead

The change itself makes a lot of sense. There's a new PEP standard for how to declare the dependencies only needed for development. Most Package & Dependency management tools out there already had their own implementation of this feature and it is probably a good thing to use the standard. From a user perspective, this only means that we need to make changes in all our Python projects. My suggestion is: why not support both options?
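For reference, supporting both options could mean accepting either of these two declarations in pyproject.toml (the package names here are just placeholders):

```toml
# The uv-specific form, now deprecated
[tool.uv]
dev-dependencies = ["pytest"]

# The standardized form (PEP 735 dependency groups)
[dependency-groups]
dev = ["pytest"]
```

The content is identical; only the section name differs, which is why the migration feels like pure churn from a user perspective.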

Maintainability vs Value

From a tooling developer perspective it is understandable that you don't want to maintain several ways of solving a problem. What about the Developer Experience of all the users of the tool out there? Imagine the teams maintaining many projects with 10, 40 or even 100 different Microservices and libraries. Each one in its own git repo.

Another Python example: Pydantic

The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0.

I like Pydantic, it is a very useful tool with great features. The 2.0 release also came with significant performance improvements. But this change doesn't make sense to me. I understand that it probably fits better within the domain of Pydantic itself.

Does this have to be a breaking change? I would suggest keeping both alternatives. Yes, it might mean a little more maintenance for you as a library developer. More importantly: your users can focus on adding value to their products instead of this mostly zero-value work.
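Keeping both alternatives doesn't have to be costly. A minimal sketch (with a hypothetical class, not Pydantic's actual implementation) is to keep the old name as a thin alias that emits a DeprecationWarning but still works:

```python
import warnings


class Model:
    """Hypothetical model class sketching a non-breaking rename."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def model_dump(self) -> dict:
        # The new, preferred name
        return dict(self.__dict__)

    def dict(self) -> dict:
        # The old name stays as an alias: it warns, but keeps working
        warnings.warn(
            "dict() is deprecated; use model_dump() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.model_dump()
```

Existing code calling `dict()` keeps running, while the warning nudges users toward `model_dump()` at their own pace.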

Example three: SQLAlchemy

MovedIn20Warning: The ``declarative_base()`` function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0)

This is probably also correct, from an internal SQLAlchemy domain perspective. From a user perspective, this only means that we need to change a lot of existing code. The cost of having the code overlapping in two namespaces is probably low.

I am also a maintainer of tools, and I've also made mistakes, or design choices that turned out not to be that great later on. But I have also actively made the decision not to force users to make the kind of changes described in this post. I don't want to break things for users of the tool because of design choices made in the past.

Should the users change their workflows?

The main thing I work on today is Python tooling for Polylith, which is a Monorepo architecture. The changes introduced by uv, Pydantic and SQLAlchemy actually aren't that much work for the developer teams using Polylith today. There's only one place in the source code where these changes are needed. This setup is robust, and ready for any unexpected breaking changes in the tools that are used. Sounds nice, doesn't it?
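The "one place" idea is essentially an adapter: a single module that touches the third-party API, while everything else imports from it. A minimal sketch of the pattern, using the standard library's TOML parser as a stand-in for any moving dependency (module and function names here are hypothetical):

```python
# compat.py -- a hypothetical single point of contact with a moving API.
# Everything else imports parse_config from here, so an upstream rename
# or namespace move becomes a one-line fix in this file only.
try:
    import tomllib as toml_parser  # standard library since Python 3.11
except ImportError:
    import tomli as toml_parser  # third-party backport for older versions


def parse_config(raw: str) -> dict:
    """Parse a TOML string into a plain dict."""
    return toml_parser.loads(raw)
```

The same idea applies to the SQLAlchemy example: if `declarative_base` is imported in exactly one module, moving it to a new namespace costs one edit instead of one per file.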



Top Photo generated by DALL·E, prompted and modified afterwards by me.

Sunday, April 20, 2025

Feedback loops in Python

How fast can we get useful feedback on the Python code we write?

This is a walk-through of some of the feedback loops we can get from our ways of working and developer tools. The goal of this post is to find a way, both developer-friendly and fast, to learn whether the Python code we write actually does what we want it to.

What's a Feedback Loop?

From my point of view, feedback loops for software is about running code with optional input and verifying the output. Basically, run a Python function and investigate the result from that function.

Feedback Loop 1: Ship it to Production

Write some code & deploy it. When the code is up & running, you (or your customers) can verify if it works as expected or not. This is usually a very slow feedback loop cycle.

You might have some Continuous Integration (CI) already set up, with rules that should pass before the deployment. If your code doesn't pass the rules, the CI tool will let you know. As a feedback loop, it's slow. By slow, I mean that it takes a long time before you will know the result. Especially when there are setup & teardown steps happening in the CI process. As a guard, just before releasing code, CI with deployment rules is valuable and sometimes a life saver.
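As a rough illustration, the "rules that should pass" are often expressed as a pipeline definition. A minimal sketch using GitHub Actions syntax (file paths and dependency files here are assumptions, not from the original post):

```yaml
# .github/workflows/ci.yml -- hypothetical minimal pre-deployment rules
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```

Even a small pipeline like this involves checkout, environment setup and dependency installation before the tests run, which is exactly why the feedback arrives slowly.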

Commit, Push & Merge

Pull Requests: just before hitting the merge button, you will get a chance to review the code changes. This type of visual review is a manual feedback loop. It's good, because you often take a step back and reflect on the written code. Will the code do the thing right? Does the code do the right thing? One drawback is that you review all changes. For large Pull Requests, it can be overwhelming. From a feedback loop perspective, it's not that fast.

Testing and debugging

Obviously, this is a very common way to get feedback on software, either manually or automated. Manual testing is usually a slower way to find out whether the code does what is expected than an automated test. There are the integration-style automated tests, and the unit tests targeting the different parts. Integration-style tests often require mocking and more setup than unit tests. Both run fast, but the unit tests are likely to be faster. You can have your development environment set up to automatically run the tests when something changes. Now we're getting close: this workflow can be fast.

I usually avoid the integration-type of tests, and rather write unit tests. I try to write small, focused and simple unit tests. The tests help me write small, focused and simple code too.

Test Driven Development

An even faster way to get feedback about the code is to write software in a test driven way (TDD): write a test that initially fails, write some code to make the test pass, refactor the test and refactor the code. For me, this workflow usually means jumping back-and-forth between the test and the code. Like a Ping Pong game.
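The ping-pong rhythm can be shown in a tiny round-trip (the function here is a hypothetical example, not from the post): the test is written first and fails, then just enough code is added to make it pass.

```python
# Step one: the test, written before the implementation exists.
# Running it at this point fails with a NameError: the red phase.
def test_word_count():
    assert word_count("tdd is a ping pong game") == 6


# Step two: the minimal implementation that makes the test pass (green).
# The refactor phase would then improve both sides without breaking the test.
def word_count(text: str) -> int:
    return len(text.split())
```

From here the game continues: add a failing assertion for the next behavior (empty strings, punctuation), then jump back to the code.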

TDD Deluxe

I'm not that strict about the TDD workflow. I don't always type the first lines of code in a test, or sometimes the test is halfway done when I begin to implement some of the code that should make the test pass. That's not pure TDD, I am aware. A few years ago, I found a new workflow that fits my sloppy approach very well. It's a thing called RDD (REPL Driven Development).

With RDD, you write code interactively. What does that even mean? For me, it's about writing small portions of code and evaluating them (i.e. running them) in the code editor. This gives me almost instant feedback on the code I just wrote. It's like the Ping Pong game with TDD, but even faster. Often, I also write inline code that later evolves into a unit test: adding some test data, evaluating a function with that test data, grabbing the response and asserting on it. The line between the code and the test is initially blurry, becoming clearer along the way. Should I keep the scratch-like code I wrote to evaluate a function? If yes, I have a unit test already. If not, I delete the code.
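The scratch-to-test evolution might look like this (the function is a hypothetical example): a small function is evaluated inline in the editor, and the scratch evaluation is then promoted to a unit test once it proves worth keeping.

```python
# A small function under development, evaluated directly in the editor
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")


# Scratch evaluation during development, run inline via the REPL:
slugify("Feedback Loops in Python")  # evaluates to "feedback-loops-in-python"


# If the scratch code is worth keeping, it becomes a unit test;
# otherwise it is simply deleted.
def test_slugify():
    assert slugify("Feedback Loops in Python") == "feedback-loops-in-python"
```

The assertion is just the scratch evaluation with the observed result written down, which is what makes the boundary between exploring and testing so blurry.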

Interactive Python for fast Feedback Loops

I have written about the basic flows of REPL Driven Development before:

REPL - the Read Eval Print Feedback Loop

When starting a REPL session from within a virtual environment, you will have access to all the app-specific code. You can incrementally add code to the REPL session by importing modules, adding variables and functions. You can also redefine variables and functions within the session.

With REPL Driven Development, you have a running shell within your code editor. You mostly use the REPL shell to evaluate the code, not for writing code. You write the code as usual in your code editor, with the REPL/shell running there in the background. IPython is an essential tool for RDD in Python. It's configurable to auto-reload changed submodules, so you don't have to restart your REPL. Otherwise, it would have been very annoying.
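The auto-reload behavior mentioned above can be switched on via IPython's autoreload extension. One way, assuming the default profile location, is to add it to the IPython configuration file so it is always active:

```python
# ~/.ipython/profile_default/ipython_config.py
# get_config() is provided by IPython when it loads this file.
c = get_config()

# Load the autoreload extension and re-import changed modules
# automatically before executing code, so no REPL restarts are needed.
c.InteractiveShellApp.exec_lines = [
    "%load_ext autoreload",
    "%autoreload 2",
]
```

The same two magic commands can also be run manually at the start of any IPython session.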

Even more Interactive Python feedback loops

We can take this setup even further: modifying and evaluating an externally running Python program from your code editor. You can change the behavior of the program, without any app restarts, and check the current state of the app from within your IDE. The Are we there yet? post describes the main idea with this kind of setup and how I’ve configured my favorite code editor for it.

Jupyter, the Kernel and IPython

You might have heard of, or already use, Jupyter notebooks. To simplify, there are two parts involved: a kernel and a client. The kernel is the Python environment. The client is the actual notebook. This type of setup can be used with REPL Driven Development too, having the code editor as the client, feeding the kernel and inspecting the current state by evaluating code. For this, we need a kernel specification, a running kernel, and a way to connect to the running kernel from the IDE.

Creating a kernel specification

You can do this in several ways, but I find it most straightforward to add ipykernel as a dev dependency to the project.

    # Add the dependency (example using Poetry)
    poetry add ipykernel --group dev

    # generate the kernel specification
    python -m ipykernel install --user --name=the-python-project-name
    

The above commands will generate a kernel specification and need to be run only once. Now you have a ready-to-go kernel spec.

Start the Kernel
    jupyter kernel --kernel=the-python-project-name
    

The above command will start a kernel, using the specification we have generated. Please note the output from the command, with instructions on how to connect to it. Use the kernel path from the output to connect your client.

As of this writing, the tooling support I have added is for Emacs. Have a look at this recording for a 13-minute demo on how to use this setup for a fast and developer-friendly Python feedback loop.



Top Photo by Timothy Dykes on Unsplash