Should features live in several independent Microservices, or is a Monolith the way to go? What kind of setup is the best one for the code in a repo? Organizing is difficult.
"... I'll just put it in a utils folder for now ..."
There's a thing called Polylith. I've written about different aspects of it before. Polylith is an architecture (and a tool) focusing on the Developer and Deployment experience. It is developed by Joakim Tengstrand, Furkan Bayraktar and James Trunk.
The Polylith Architecture is a fresh take on how to share code, and offers a nice and simple solution to the Monolith vs Microservice tradeoffs. In addition to that, it is a good fit for functional programming. Some time ago, I got an idea: how about bringing a couple of the good things from there to here? So I developed something that I believe could be useful for many Python teams out there.
I already released a preview in early 2022, it was missing some essential features and the great developer experience wasn't really there yet. Also, I had little knowledge about how to package Python apps & libraries, but have learned a lot since then.
Today, I believe the Python tools for the Polylith Architecture is ready to use in the Real World. You will find version 1 on PyPI.
A couple of Poetry plugins
I have developed it as two different Poetry plugins. One of them - the Multiproject plugin - adds support for workspaces to the popular Poetry tool, by adding a new command called build-project.
The second one - Polylith plugin - adds useful tooling support for the Polylith Architecture itself. With the tooling, you can add components & projects in a simple way and also keep track of what's happening in the workspace.
Polylith is using a components-first architecture. The components are building blocks, very much like LEGO. The code is separated from the infrastructure and the building of artifacts. This may sound complicated, but it isn't.
In a way, it is about:
thinking about code as LEGO bricks, that can be combined into features.
making it easy to reuse code across apps, tools, libraries, serverless functions and services.
Keeping it simple.
Have a look at these introductory videos, describing Polylith in Python and the tooling support:
REPL Driven Development is a workflow that makes coding both joyful and interactive. The feedback loop from the REPL is a great thing to have at your fingertips.
"If you can improve just one thing in your software development, make it getting faster feedback."
Dave Farley
Just like Test Driven Development (TDD), it will help you write testable code. I have also noticed a nice side effect from this workflow: REPL Driven Development encourages a functional programming style.
REPL Driven Development is an everyday thing among Clojure developers and doable in Python, but far less known here. I'm working on making it an everyday thing in Python development too.
But what is REPL Driven Development?
What is it?
You evaluate variables, code blocks, functions - or an entire module - and get instant feedback, just by a hitting a key combination in your favorite code editor. There's no reason to leave the IDE for a less featured shell to accomplish all of that. You already have autocomplete, syntax highlighting and the color theme set up in your editor. Why not use that, instead of a shell?
Ideally, the result of an evaluation pops up right next to the cursor, so you don't have to do any context switches or lose focus. It can also be printed out in a separate frame right next to the code. This means that testing the code you currently write is at your fingertips.
Easy setup
With some help from IPython, it is possible to write, modify & evaluate Python code in a REPL Driven way. I would recommend to install IPython globally, to make it accessible from anywhere on your machine.
pip install ipython
Configure IPython to make it ready for REPL Driven Development:
You will probably find the configuration file here: ~/.ipython/profile_default/ipython_config.py
You are almost all set.
Emacs setup
Emacs is my favorite editor. I'm using a couple of Python specific packages to make life as a Python developer in general better, such as elpy. The auto-virtualenv package will also help out making REPL Driven Developer easier. It will find local virtual environments automatically and you can start coding without any python-path quirks.
Most importantly, set IPython as the default shell in Emacs. Have a look at my Emacs setup for the details.
VS Code setup
I am not a VS Code user. But I wanted to learn how well supported REPL Driven Development is in VS Code, so I added these extensions:
You would probably want to add keyboard shortcuts to get the true interactive feel of it. Here, I'm just trying things out by selecting code, right clicking and running it in an interactive window. It seems to work pretty well! I haven't figured out if the interactive window is picking up the global IPython config yet, or if it already refreshes a submodule when updated.
Current limitations
In Clojure, you connect to & modify an actually running program by re-evaluating the source code. That is a wonderful thing for the developer experience in general. I haven't been able to do that with Python, and believe Python would need something equivalent to NRepl to get that kind of magic powers.
Better than TDD
I practice REPL Driven Development in my daily Python work. For me, it has become a way to quickly verify if the code I currently write is working as expected. I usually think of this REPL driven thing as Test Driven Development Deluxe. Besides just evaluating the code, I often write short-lived code snippets to test out some functionality. By doing that, I can write code and test it interactively. Sometimes, these code snippets are converted permanent unit tests.
For a live demo, have a look at my five minute lightning talk from PyCon Sweden about REPL Driven Development in Python.
Never too late to learn
I remember it took me almost a year learning & developing Clojure before I actually "got it". Before that, I sometimes copied some code and pasted it into a REPL and then ran it. But that didn't give me a nice developer experience at all. Copy-pasting code is cumbersome and will often fail because of missing variables, functions or imports. Don't do that.
I remember the feeling when figuring out the REPL Driven Development workflow, I finally had understood how software development should be done. It took me about 20 years to get there. It is never too late to learn new things. 😁
File & folder structures - there's almost as many different variations as there are code repositories.
One common thing though, is that you'll probably find the utils folder in many of the code repos out there, regardless of programming language. That's the one containing the files that don't fit anywhere in the current project structure. It is also known as the helpers folder.
Organizing, sorting and structuring things is difficult. There's framework specific CLIs and tools that will create a nice setup for you, specialized for the needs of the current framework.
"There should be one-- and preferably only one --obvious way to do it."
The Zen of Python
Is there one folder structure to rule them all? Probably not, but I have tried out a way to organize code that is very simple, framework agnostic and scalable as projects grow.
Structure for simplicity
A good folder structure is one that makes it simple to reuse existing code and makes it easy to add new code. You shouldn't have to worry about these things. The thing I've tried out with success is very much inspired by the Polylith architecture. Polylith is a monorepo thing, but don't worry. This post isn't about monorepos at all, however this one is if you are interested in Python specific ones.
It's all about the components
The main takeaway here is to view code as small, reusable components, that ideally does one thing only. A component is not the same thing as a library. So, what's the difference?
A library is a full blown feature. A component can be a single function, or a parser. It can also be a thin wrapper around a third party tool.
"Simple is better than complex."
The Zen of Python
I think the idea of writing components is about changing mindset. It is about how to approach a problem and how to organize the code that solves a problem.
It shouldn't be too difficult to grasp for Python developers, though. For us Python devs, it's an everyday thing to write functions and have them in modules. Another useful thing, probably more common in library development, is to group the modules into packages.
Modules, packages, namespaces ... and components?
In Python, a file is a module. One or more modules in a folder becomes a package. A good thing with this is that the code will be namespaced when importing it. Where does the idea of components fit in here? Well, a component is a package. Simple as that.
"Namespaces are one honking great idea -- let's do more of those!"
The Zen of Python
To make a package easier to understand, you can add an interface. Interfaces are well supported in Python. Specifying the interface of a package in an __init__.py file is a great way to make the intention of the code clearer and easier to grasp. Maybe there's only one function that makes sense to use from the "outside"? That's when to use an interface for your component.
Make code reuse easy
When organizing code into simple components, you will quickly discover how easy it is to reuse it. Code is no longer hidden in some utils folder and you no longer need to duplicate existing private helper functions (because the refactoring might break things), if they already are organized as reusable components with clear and simple APIs. I usually think of components as LEGO bricks to select from when building features. You will most likely produce new LEGO bricks of various shapes along the way.
Well suited for large apps
At work, we have a couple of Python projects using this kind of structure. One of them is a FastAPI app with an entry point (we named it app.py) containing the public endpoints. The entry point is importing a bunch of components that does the actual work.
The repo contains about 80 Python files. Most of them are grouped into components (in total about 30 components). This particular project is about 3K lines of Python code, but other repos are much smaller with only a handful of components.
Perfect for functional programming
Even though it is not a requirement, organizing code into components fits very well with functional programming. Separating code into data, calculations and actions are well suited for the component thing described here in this post.
Don't forget to keep the components simple, and try to view them as LEGO bricks to be used from anywhere in the app. You'll have fun while doing it too.
A Python dictionary has a simple & well-known API. It is possible to merge data using a nice & minimalistic syntax, without mutating or worrying about state. You're probably not gonna need classes.
Hey, what's wrong with classes? 🤔
From what I've seen in Python, classes often add unnecessary complexity to code. Remember, the Python language is all about keeping it simple.
My impression is that in general, class and instance-based code feels like the proper way of coding: encapsulating data, inheriting features, exposing public methods & writing smart objects. The result is very often a lot of code, weird APIs (each one with its own) and not smart-enough objects. That kind of code quickly tend to be an obstacle. I guess that's when workarounds & hacks usually are added to the app.
What about Dataclasses?
Python dataclasses might be a good tradeoff between a heavy object with methods and the simple dictionary. You get typings and autocomplete. You can also create immutable data classes, and that's great! But you might miss the flexibility: the simplicity of merging, picking or omitting data from a dictionary. Letting data flow smoothly through your app.
Hey, what about Pydantic?
That's a really good choice for things like defining FastAPI endpoints. You'll get the typed data as OpenAPI docs for free.
I would as early as possible convert the Pydantic model to a dictionary (using the model.dict() function), or just pick the individual keys and pass those on to the rest of the app. By doing that, the rest of the app is not required to be aware of a domain specific type or some base class, created as workaround for the new set of problems introduced with custom types.
Just data. Keeping it simple.
What about the basic types? 🤔
That is certainly a tradeoff when using dictionaries, the data can be of any type and you will potentially get runtime errors. On the other hand, is that a real problem when using basic data types like dict, bool, str or int? For me, I can't remember that ever has been an issue.
But shouldn't data be private?
Classes are often used to separate public and private functionality. From my experience, explicitly making data and functionality private rarely adds value to a product. I think Python agrees with me about this. By default, all things in a Python module is public. I remember when learning about it, and the authors saying that’s okay because we’re all adults here. I very much liked that perspective!
Do you like Interfaces? 🤩
Yes! Especially when structuring code in modules and packages (more about that in the next section). Using __init__.py is a great way to make the intention of a small package clearer and easier to grasp. Maybe there's only one function that makes sense to use from the outside? That's where the package interface (aka the __init__.py file) feature fits in well.
Python files, modules, packages?
In Python, a file is a module. One or more modules in a folder is a package. One or more packages can be combined into an app. Using a package interface makes sense when structuring code in this way.
Keeping it simple. 😊
I'm finishing off this post with a quote from the past:
“Data is formless, shapeless, like water.
If you put data in a Clojure map, it becomes the map.
You put data in a Python list and it becomes the list.
Now data in a program can flow or it can crash.
Be data, my friend.”
Bruce Lee, 1940 - 1973
ps.
If neither Bruce or I convinced you about the great thing with simple data structures, maybe Rich Hickey will? Don't miss his "just use maps" talk!
This feels like a great day to uninstall Docker Desktop on my Mac, and try to go all-in Podman.
What is Podman?
" ... a daemonless container engine for developing, managing, and running OCI Containers ... "
Podman is an open source project and you can run containers as root, or as a non-privileged user. It supports the Docker API and can be used with tools like docker-compose.
What's wrong with docker?
Nothing, docker is a great tool! Free to use for Open Source. The Desktop app for Mac and Windows will continue to be free for small businesses, but require a license for bigger companies. For some time now, I have wanted to find out what the alternatives are to using docker.
Uninstalling docker
I ran docker system prune to delete all existing containers and images.
Then I uninstalled the desktop client. By selecting the "Troubleshoot" menu option I found an uninstall button and pressed it.
It required some research (aka googling) to figure out how and where to find an uninstall option.
As you can see in the video, the docker app is still there even if the Desktop client was uninstalled. I just deleted that one, and then deleted the docker-related files and folders that were still in my file system:
Installing podman
I followed the official install guideline to install podman. In addition to that, also installed the Mac OS X specific helper tool to be able to run docker-compose. I learned that from the print-out displayed when the brew install command was completed.
I can now run containers with podman! Here I'm starting a ZooKeeper server running in a container:
podman run --rm -p 2181:2181 zookeeper
Where's my commands?
But wait, my 🧠 is already wired to use the docker command. Do I have to unlearn things? Podman is compatible with the docker commands already, but you can also create an alias:
alias docker=podman
It is now possible to use the docker command as before!
Installing docker-compose
By installing the podman-mac-helper tool as described before in this post, it is possible to use podman with docker-compose.
You might wonder: didn't I uninstall all docker related things? 🤔
Yes, but the docker-compose tool is available to install separately & you'll find it on Homebrew. 🍻
brew install docker-compose
Quirks & workarounds
When trying all of this out, the podman command worked well for building and running containers. However, I got an error when running the docker-compose up command. That's kind of a dealbreaker, isn't it?
The error:
listing workers for Build: failed to list workers: Unavailable: connection error: desc = "transport: Error while dialing unable to upgrade to h2c, received 404"
As I understand it (from this conversation on Github), there is something not working as expected when using the latest version(s) of docker-compose and podman.
Luckily, there is a simple workaround to use until the bug is fixed. Prefixing the docker-compose command to disable the buildkit API that is causing the bug. Doing this will make things work very well.
I recently found out about Should I deploy today? and laughed. A humorous way of guidning us developers when we hesitate about deploying code to production. This one is extra funny on a Friday or Saturday. Every now and then, memes about (not) releasing code to production on Fridays appear on my Twitter timeline. There's even T-shirts for sale, with No Deploy Fridays printed on them.
Funny because it's true?
I totally get the idea of minimizing the chance of ruining your or someone else's weekend. There's also a potential risk with postponing & stacking up features. The things we develop are there to add value, why make users wait for it?
Ship it!
Fridays shouldn't be different than any of the other working days, at least when it comes to deploying code to production. Just like on a Tuesday, you should probably think twice about deploying stuff just before you'll call it a day. When a fresh release has started causing errors for users, you might have already checked out from the office with loud music in the headphones, the mind focused on something else and unaware of the mess you've left to friends on call.
Deploy with some slack
It's a good idea to monitor the live environment for a while, making sure the deploy has gone well. Keep an eye on it, while working on the next thing. Prepare to act on alerts, errors appearing in your logs or messages in the company Slack.
Have a browser tab with graphs monitoring memory consumption, CPU usage and response times. For how long and what metrics to monitor is depending on the product. From my experience, being attentive to things like this is mostly good enough. When I call it a day a couple of hours later, I can go ahead and play that loud music in my headphones and relax.
When things go 💥
I haven't fact checked this - but unless a deploy is about database schema changes or to an offline machine, it should be relatively easy to roll back. Most deploys are probably easy to undo. If not, find out why and try making it easier.
I would suggest teams to practice: commit, push, merge, release to production and rollback. You'll probably find infrastructure related things to fix or tweak, maybe there's some automation missing? The team will know what to do when things go boom for real.
I've been fortunate working in environments with the basic automation already in place. We, the autonomous dev team, are the ones responsible for pushing features to production. We do that on a daily basis. Before, we hesitated about the deploy-on-a-Friday thing. Now we have better trust in our ways of working, and this Friday we deployed new features, dev experience improvements and bug fixes. Several small deployments went live, the last one about 3-4 hours before we called it a weekend 😀
Automate it
A good deployment process will stop and let the team know about when things break. The obvious code related issues will be caught this way. So, automation will get you far, but not all the way. That's why we'll need the monitoring. More importantly, releasing small portions of code changes is much safer than the ones with lot of code changes in it. Release small and release often.
What would slackbot do?
These days, when I hesitate if a new feature should be deployed right now or not, I'll just ask slackbot what to do.
For software development in general, it seems to be an ongoing trend towards using Monorepos. You know, organizing code in one big single repo. On the other hand, it seems to be a general trend in going the opposite way too. Developing features into many small repos.
I think that the JavaScript and Python communities in particular, are in favor of the latter approach. Developing features in isolation - in separate repos - and publish versioned libraries to package repositories (such as npm and pypi). That has potential to introduce some headache: versioning, backwards compatibility and keeping dependencies up to date. Probably also duplication of common source code and repeating the deployment infrastructure. You can solve those kind of issues, by using a Monorepo.
🔥 Monorepos
There are already solutions out there on how to develop software in Monorepos. It is my impression that they are mainly about solving the deployment experience. Which is good, because it is a huge problem in a lot of code bases out there. What about the Developer experience?
In the Clojure community, we have a thing called Polylith. It is an architecture, (including a tool) that is developed by Joakim Tengstrand, Furkan Bayraktar and James Trunk. The Polylith architecture is focusing on making both the Developer & Deployment experience great. There is also a very nice tool to help you get started with Polylith. Helping you create & keeping track on components, projects and much more.
"... Polylith is a software architecture that applies functional thinking at the system scale. It helps us build simple, maintainable, testable, and scalable backend systems. ..."
A Polylith code-base is structured in a components-first architecture. Similar to LEGO, components are building blocks. A component can be shared across apps, tools, libraries, serverless functions and services. The components live in the same repository; a Polylith monorepo. The Polylith architecture is becoming popular in the Clojure community.
🐍 What about Python?
Again, I find myself copying some existing Python code - a small module creating a "logger" - into a new repo. I think it’s the third time I do this. The logger code is too tiny to even bother packaging as a library, but still very useful in many code projects.
My mind is wandering. I'm daydreaming:
"What if we had Polylith in Python. I would just simply reuse an existing component."
✨We have Polylith in Python now✨
Porting Polylith to Python
A couple of weeks ago, I decided to give porting Polylith to Python a try. I quickly realized that there are some fundamental differences between Python and Clojure to be aware of when implementing an architecture (and developing a tool) like this.
Especially when it comes to namespacing and packaging. Python code is based on the concept of modules. Also, Python code is not compiled into a binary.
Short note on modules, packages, namespaces & libraries
In Python, a file is a module. One or more modules in a folder is a package. If you put a package in a folder, it is now namespaced. One or more packages can also be combined and built into a library to be used as a third party dependency. From my perspective, a Polylith component in Python should be a namespaced package, and not a library.
Almost like Poetry
When trying to figure out how to solve modules, paths & packaging problems I found Poetry. It is a tool that makes Python packaging and dependency management really easy. Out of the box, Poetry supports project based dependencies and references only. But the latest version of Poetry - currently in preview - has support for third party plugins. So I developed a plugin that enables the concept of workspaces, making it possible to reference dependencies outside of a project boundary.
This was the first step. A very important step. Now it became possible to build tooling support for a Python implementation of the Polylith architecture. Earlier this week I released a first version of such a tool. It is a new Poetry plugin, containing the very basic tooling support for the Polylith architecture.
poetry-polylith-plugin
This brand new Poetry plugin will help you create a Polylith workspace and add components, bases & projects to it. Poetry itself will handle the building & packaging. It is still in early development, but you should be able to develop apps, serverless functions and services with it.
As of today, I wouldn’t (at least not yet) recommend building libraries with it, mostly because of how the packaging is done in Poetry & the nature of Python modules. I’ll try to figure out a way to solve this in the future.
There’s also a lot missing & left to do compared to the original poly tool. But hey, you gotta start somewhere 😀
Check out the 10 minute video for a quick overview.
Learning about Polylith
I recommend to read the official docs to learn more about what Polylith is (and what it isn’t) - even though the code examples are about Clojure. You can easily bring the ideas into a Python context, now that we have tooling support for it.
My intentions with this is to introduce the Polylith architecture to the Python community, and I very much hope that it will be useful in your daily work as a developer.