A collection of coding conventions
In this post, I have collected a few conventions and best practices that I try to follow. Most of these are well known to developers, sometimes have additions that I found practical in tricky situations.
Configuration and architecture
Setting up your application to be scalable and robust in a modern containerized cloud environment is extremely important. One of the best collections of rules or advice on this topic is provided by "The Twelve-Factor App". Following this guide is an absolute must, which I'm sure many of you do already without even knowing about this particular collection. The 12 Factors to follow are:
- Store your code in version control, deploy to multiple environments from a single codebase
- Define and manage dependencies explicitly (like with you do Gradle)
- The configuration should be external to your code and bound to the environment it runs in (dev, prod, Kubernetes config maps for example or simple env vars)
- Treat backing services as attached resources, loosely coupled
- Separate build release and running stages of your app strictly, use CI
- Instances of your application should be run as independent and stateless processes
- Application should be completely self-contained and not rely on external environments. It should expose its services through port binding
- Use the process model for concurrency situations, scale horizontally
- Make the app robust with quick startup and graceful shutdown. Share nothing amongst instances. Super useful in a distributed containerized environment
- Keep your various environments (dev, staging, prod, etc) as similar as possible
- Treat logs as a stream of events
- Admin and management code and tasks should be handled like any other production part of your app
Most of these are familiar, right? For details on the above points check out the official website below, with descriptions providing further insights and reasoning.
Versioning your app
Versioning a single app
One can use many different approaches, and software version conventions to keep track of the application version, from random numbers to elaborate schemes. Sofware versioning can be a difficult topic, I saw two main ways that people often use.
Date based
Some choose to version their application based on the date of the release. This is often used when the releases happen on a schedule. For example, JetBrains has multiple major releases of their popular IDEs a year and they use versions like 2022.1, 2022.2, ...
This doesn't provide too much resolution so a patch version number is added as you can see on the screenshot above: 2021.2.3. This indicates that this is the 2nd release in the year 2021 and the 3rd patch for that. There is also a separate build number that is even more precise but separate from the version number.
Semantic versioning
Semver or semantic versioning uses a simple MAJOR.MINOR.PATCH pattern with clear rules when to increase each number. A version number following semver principles looks like this: 1.12.4
Each number can go from 0 to infinity basically. The rules on when to increase which part in short:
- If you add breaking API changes (by API we can mean every input/output data format) you must increase the MAJOR version number and zero out the others.
- Adding backward-compatible changes, new functions should trigger an increase in MINOR version number, and zero out PATCH.
- Bugfixes will increase the PATCH version
You can also postfix pre-release versions like 1.1.0-alpha
, 1.1.0-0.2
, etc or with a + sign add build information, like build date or git commit id like 1.1.0+6ca55f4
.
There is much more detail and rules that you can check out on the link below.
Versioning a complex system
It is all nice and easy if you only have a single backend or frontend that you have to version, but what if you have a complex system of microservices like many software products nowadays. You may want or need to communicate an overall system version to your customers and, surely, you can't give out a list of 30 version numbers for each of the services present in the system.
What I usually do in a situation like this is to introduce a single version number for the entire application, and keep track of the component versions for this. I can have these mappings in the background and I am free to represent the application as a single version to customers. If I get feedback on a version I can easily look up the service versions for that given release. If any service changes I change the main version number too. Keeping track of these mappings and main version changes can be done by hand but it is better to automate it with the CI/CD pipeline.
This method is limited you can do this with 10-30 services, but if you have hundreds of engineers working on hundreds of services making 20 or more releases a day this won't scale. In these situations, I think it is better not to provide version information to customers at all. In case of any issues, you will rely on your observability (Rollbar for example is a particularly useful tool for tracking releases and exceptions), your CI/CD pipelines, and git to know what versions were involved in the issue.
Using git
Commit messages
There are a few conventions out there but the best I know of, and that I like the most is written by @cbeams. It defines 7 main rules that I mostly try to follow:
- Separate subject from body with a blank line (note: it is a must)
- Limit the subject line to 50 characters (note: I tend to be a bit flexible with this, but try not to be a lot longer)
- Capitalize the subject line (note: it is a must)
- Do not end the subject line with a period (note: it is a must)
- Use the imperative mood in the subject line (note: it is a must)
- Wrap the body at 72 characters (note: see the next point, I rarely use the body)
- Use the body to explain what and why vs. how (note: I usually skip this, and summarize the changes in the pull request description. You can argue with this since PRs are only present in fancy git UIs like GitHub or GitLab. I have never seen anybody using the terminal only, especially if working as a team PRs and MRs are the way we collaborate, so I don't see this as a huge issue.)
I suggest reading the whole article to get a grasp on this. A good convention to follow.
Using branches, pull requests, merging
Everyone uses git differently, but it is a good thing to agree on a branching strategy. Lucky for us, Atlassian has some awesome descriptions of some of the most used methodologies.
GitFlow works great on bigger more complex projects, I think simple feature branches can go a long way too.
Some key things I usually do:
- The master branch is always protected, no direct commits
- Feature branch naming goes like
<ticket id>-<short description>
, for e.g:BIZREA-123-fix-3rdparty-connection
- Squash and merge to keep the
master
/develop
clean, this way for every pull request we will have a single commit - Use
pull
ormerge requests
depending on the platform, with details on what changes this feature brings.
CI/CD
Always use CI/CD. I have heard from an SRE engineer at Google that one of the sayings around there is "Automate yourself away" which I highly agree with. Automating as much work as possible makes us able to focus on what matters. Creating proper CI/CD pipelines or setting up automation in Jira for your tasks to be closed on PR merge is the easiest way to start on this path.
Major git platforms like GitHub and GitLab both have built-in CI/CD engines with free minutes available for your projects. Probably these are the easiest, to start with, and are also well integrated with their other tools. There are other solutions as well like CircleCI, Jenkins, TeamCity, and a thousand more.
CI/CD makes sure that the code you are about to publish:
- Can be built
- Passes the style checks (see later)
- Does okay on static code analysis (SonarQube for example)
- Passes the unit, integrations, and other tests
Furthermore, it can:
- Publish artifacts like JARs or Docker images to repositories
- Deploy to various environments
At this point also have to mention GitOps. GitOps is an advanced framework that combines the way we use git, our DevOps principles, and CI/CD capabilities. It also helps manage infrastructure and environments. If you are interested in these topics it is worth a read.
Changelog
Keeping track of what has been changed with each release is quite important. One way of doing this is to try to figure it out from the commits since the last release but this is not too user-friendly. It is better to keep a changelog. There are some options here.
The first one is that we can utilize GitHub releases or the relevant features in other platforms to keep track of releases and changes. These utilities can often generate a changelog from the pull requests since the last release. These look nice and are well integrated into the platform (at least the GitHub one, which I like). The downside is that this solution causes vendor lock-in since it is not portable between git platforms.
The second is to keep track of the changes in a dedicated changelog file. What should a good changelog contain? I think a very good answer is provided by Keep a Changelog. For every release, we should create a new section in our CHANGELOG.md
or similar file. These sections should contain the:
- version
- release date
- a subsection
Added
for new features. - a subsection
Changed
for changes in existing functionality. - a subsection
Deprecated
for soon-to-be removed features. - a subsection
Removed
for now removed features. - a subsection
Fixed
for any bug fixes. - a subsection
Security
in case of vulnerabilities.
You can read further details on the link below.
Style
Code style is a thing that is worth agreeing on with the team when we start a new project. There are many styles we can follow, or we can come up with our own. Two things however should be done in any case.
Choosing a style and making the IDE know it
The first is to make the IDE know about the formatting we want to use. This way auto code formatting will know what we want, and we can evade things like a single commit completely reformatting the whole codebase because of not paying attention. Most IDEs have their way to set up these rules but there is a better solution. Enter EditorConfig.
EditorConfig describes a file format that contains the rules we want to follow. We should have this config file in our project git repository. Most IDEs either have built-in support or have plugins to be able to read this file and act accordingly.
Choosing the correct style is up to us. Google has some style guides for quite a lot of languages that could be a good starting point.
The two rules I usually modify are line length and indentation.
In the Turbo Pascal days, line length was 80 characters, now around 120 seems to be a standard, but with our nice, wide, high-resolution monitors I don't take this too strictly.
Indentation is 4 spaces for me. 2 spaces is just not enough, it is hard to see the levels.
Enforcing the style
Great now we know what we want our code to look like, but we have to make sure everyone stays in line with our new rules. Checkstyle is a nice little tool that can help with this.
Similarly to EditorConfig, we can describe our rules (or download them from the internet) in an XML config file. After this, we can add Checkstyle to our builds with Maven or Gradle to check code style too. The good thing about this is that we can run this on CI too!
We should look at these two tools as EditorConfig is a handy one to configure the IDEs across platforms, and even across IDEs to use the same formatting rules. While Checkstyle is there to enforce all this. It has more options to configure and in the end, it will decide if our code complies or not.