The way you organise your teams' code seems like a trivial topic. But as I've discovered after many projects, simple decisions on how this is done have a big impact on how fast you can make changes, get them released and how well developers can communicate and collaborate. A big part of this is whether you go for polyrepo or monorepo.
- Polyrepo is when multiple source control repositories are used for each service and client. In a microservices project with multiple front ends (web, mobile, etc.), this can result in many repositories.
- A Monorepo is when all services and front end codebases are kept in a single repository.
Google, Facebook and Twitter are well known for using a monorepo at ridiculously large scale. Of course, it’s never a good idea to just go with an approach because Google/Facebook/Twitter said so. Instead, as with everything, measure how this impacts you and make the decision that works well for your organisation.
I used to be a big fan of keeping every module, service or library in a separate source repository (polyrepo). It made logical sense to me. Each module could be released separately and versioned separately. In larger companies, it seemed easier when teams had separate repositories for their separate areas of responsibility. In theory, they were less likely to conflict with each other and step on each others' toes.
In the last year, I've changed this stance, especially when it comes to getting started quickly on microservice/serverless projects. Previously, I spent too much time managing the coordination across multiple repositories. For microservice projects, the overhead isn't too bad at the start, but quickly gets out of hand as you add more services, libraries and dependencies. To be honest, I always felt like we were spending time creating and managing repos that could be saved by moving to a monorepo. The cost of polyrepo for smaller organisations was measurable. We often built scripts and tools to make this easier but that tooling just takes time away from building meaningful product. So what was holding me back?
Well, part of it was stubbornness. Developers often find ourselves standing our ground on positions we argued for long after our belief in them has faded. This affliction might be a way of coping with the huge number of choices available to us and the innate need to categorize everything into 'the right choice' or 'the wrong choice'. It's a habit I try to break by measuring how decisions cost us in time and money and making changes based on the data.
There was also one technical detail that had me stuck in the polyrepo camp too. The continuous integration and deployment (CI/CD) process is triggered by commits to the repo and I didn't want everything to be built on each commit from any module. It would be a waste of time and resources.
I asked around and read up to see what other people were doing. A few years into the popular wave of microservices, there were good posts on this. The Shippable post (link below) was one that helped me flick the mental switch the most. It detailed how a simple change detection script can allow a monorepo to work nicely with CI/CD. It detects what modules/microservices have changed and only builds what's necessary. It's that simple. Once you have that in place, everything else gets much simpler.
“A new developer should be able to start working on your product as quickly as possible. Avoid unnecessary ceremony and any learning curve that is unique to your team or company.”
When a bug fix or feature affects multiple modules/microservices, all changes are made in the same repository. There is just one branch on a single repository. No more tracking across multiple repositories. Each feature gets a single Pull Request! There is no risk that the feature is going to be partially merged.
A new developer should be able to start working on your product as quickly as possible. Avoid unnecessary ceremony and any learning curve that is unique to your team or company. Having all code for your product in one repository is way to help people get started, whether they are a new colleague or someone who needs to quickly get going again on a new laptop!
By sticking with single repository, your external tests (end-to-end or API tests) also belong with the code under test. The same goes for Infrastructure-as-Code. Any changes required in your infrastructure are captured together with the application code. If you have common code, utilities and libraries that are consumed by your microservices, keeping them in the same repository makes it quite easy to share. It avoids having to publish them to external repositories with the associated key management headaches. Of course, if you can open source those libraries in their own public repo, all the better!
Whether you choose monorepo or polyrepo is up to you. There are plenty of good arguments for polyrepo too, particularly if you’re dealing with very large codebases comprised of multiple, distinct, products. The point is to ensure your repo structure facilitates faster development and rarely gets in the way.
We have been working on putting all our best practices for building production Serverless microservice applications (including monorepo!) into an open source project called SLIC Starter. It's a complete application with examples on many of the fundamental but difficult-to-make decisions when you are building products with serverless. You can find it here.