DevOps - Smaller Builds

As stated in a previous article, one of the goals we are focusing on right now is deploying to pre-production every 45 minutes. At first blush, that seems a bit fast. When I broached it with my team, they (rightfully) pointed out that our current CI build takes 15 minutes to build, deploy to development, run automated tests, and then deploy to testing. Excellent point.

It got me thinking: what exactly are we building on every check-in? If a component didn't change, why go through the exercise of rebuilding and redeploying it? Could the current build be broken apart into smaller builds, each of which runs only when its component changes? Could that provide enough of a speed boost to get to production faster?

After some research we decided to do just that: break up the main CI build into smaller builds, each focused on building a specific component of the application. We didn't just do it willy-nilly; we tried some configurations, saw how it went for a few days, and adjusted. This article walks through the thought process behind each decision so that, hopefully, you can apply a similar idea yourself.

The Architecture

First up I wanted to examine what the build is currently doing. But to do that it is important to know the architecture of the system my team and I are working on. The architecture follows a standard N-Tier Architecture Model, with the User Interfaces communicating with the business logic via RESTful services.

  • Web UI - what all users interact with, written using Angular 1.x as a single page application or SPA. Initial rendering is handled via ASP.NET MVC, but that is really just a host for an "ass-load" of JavaScript files.
  • API - a RESTful service hosted using ASP.NET WebApi. The user interface makes multiple requests to it. Almost all business logic is housed here, and all database access and other service interaction occur on the API.
  • External API - another RESTful service with endpoints for other systems to interact with the loan origination system. Not much churn on this service.
  • Windows Service Quartz .NET Scheduler (BatchRunner) - performs several automated tasks on a schedule. This service simply hits the API, the only business logic is the scheduling and error handling. For all intents and purposes, it is a dumb service. It is completely isolated.
  • Notification Service (Subscriber) - sits on top of an MSMQ. Our customer system needs to notify all the systems when a change to a customer occurs. MSMQ was chosen because of its reliability and its reasonable guarantee that all customer changes will be processed in the order in which they are made. The service simply invokes a stored procedure to update the loans a customer is tied to. It too is completely isolated.
  • Core Layer - a C# project shared across all the applications. Contains all DTOs, logic to communicate with RESTful services, and interfaces to our configuration store and logging.
  • Database - pretty self-explanatory.

All of the source code was housed in a single folder called "source" (I am not very imaginative). At a high level, the project structure was

  • Root
    • Database
    • Fitnesse
    • Source
      • API
      • External API
      • Notification Service
      • Quartz Scheduler
      • Web UI
      • Core
      • .sln file

All the projects, including their respective test projects, were housed inside a single solution.

Octopus Deploy Setup

When we were collaborating with the web admins on how we should set up Octopus Deploy, we determined it would be best if each component got its own Octopus Deploy project. The reasoning was that in the event of an emergency we could deploy just the broken application and leave the rest alone. What that means is our application has six Octopus Deploy projects, one for each component.

  1. UI
  2. API
  3. External API
  4. Quartz .NET Windows Service
  5. Notification Service
  6. Database

Initial Build Setup

When code was checked into master, the VSTS build would take over, build the solution, then package and create releases for each of the six Octopus Deploy projects. Each build would run all the unit tests, both C# and JavaScript, as well as the full suite of service integration tests using Fitnesse. It is no surprise it took 15 minutes to build all that, deploy it, and test it.

Change Frequency

With all that in mind, I took a look at the change log for the past several weeks. It was rather interesting.

  • Web UI - changed 8 - 12 times a day
  • API - changed 10 - 20 times a day
  • External API - shares same core logic as the API, so it changed 10 - 20 times a day
  • Windows Service Quartz .NET Scheduler - changed once in the last three weeks
  • Notification Service - changed once in the last four weeks
  • Database - changes 1 - 8 times a day
  • Fitnesse Tests - changes once every three days

Time Spent on Each Component

Looking through the logs, it is easy to see how much time each component takes in the build process (this includes building and deploying).

  • Web UI - 69 seconds
  • API - 63 seconds
  • External API - 65 seconds
  • Windows Service Quartz .NET Scheduler - 85 seconds
  • Notification Service - 62 seconds
  • Database - 180 seconds
  • Fitnesse Tests - 360 seconds
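
As a quick sanity check, the timings above can be totaled to confirm they account for the roughly 15-minute build, and to see how much of it goes to the two components that almost never change. This is only a sketch using the numbers listed; the variable names are mine, for illustration.

```python
# Per-component build-and-deploy times in seconds, copied from the list above.
durations = {
    "Web UI": 69,
    "API": 63,
    "External API": 65,
    "Windows Service Quartz .NET Scheduler": 85,
    "Notification Service": 62,
    "Database": 180,
    "Fitnesse Tests": 360,
}

total_seconds = sum(durations.values())
total_minutes = total_seconds / 60

# The two components that changed only once in the last three to four weeks.
rarely_changed = ["Windows Service Quartz .NET Scheduler", "Notification Service"]
rare_seconds = sum(durations[name] for name in rarely_changed)

print(f"Total: {total_seconds} s ({total_minutes:.1f} min)")
print(f"Rarely changed components: {rare_seconds} s "
      f"({rare_seconds / total_seconds:.0%} of the build)")
```

The total comes to 884 seconds, just under 15 minutes, with 147 of those seconds spent on the scheduler and the notification service.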

Breaking apart the builds

As stated before, we are using VSTS as the build tool. One of the newer features added in the past six months or so is the ability to set the trigger on a CI build to only fire when a file in a certain folder is changed.

Knowing that certain components rarely changed, it made more sense to break those out into their own solutions. To keep things separated (and hopefully less confusing), those pieces would be moved into a new folder called "Components" and each component would have its own solution file.

  • Root
    • Components
      • Notification Service
        • .sln file for Notification Service
      • Quartz Scheduler
        • .sln file for Scheduler
    • Database
    • Fitnesse
    • Source
      • API
      • External API
      • Web UI
      • Core
      • .sln file

Instead of having one main CI build I ended up with three after this exercise.

  1. Main_CI Build
    • Looks for changes in /Source, /Fitnesse and /Database
    • Builds and deploys items just from those folders
  2. NotificationService_CI Build
    • Looks for changes in the /Components/Notification Service folder
    • Builds and deploys just that component
  3. QuartzScheduler_CI Build
    • Looks for changes in the /Components/Quartz Scheduler folder
    • Builds and deploys just that component
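
To make the folder-based triggers concrete, here is a minimal sketch of the routing idea: given the files changed in a check-in, work out which builds should fire. This is not how VSTS implements its triggers. The function name, the dictionary, and the example file names are all hypothetical; only the build names and folders come from the three builds listed above.

```python
# Folder prefixes each build watches, taken from the three builds listed above.
BUILD_TRIGGERS = {
    "Main_CI": ["Source/", "Fitnesse/", "Database/"],
    "NotificationService_CI": ["Components/Notification Service/"],
    "QuartzScheduler_CI": ["Components/Quartz Scheduler/"],
}


def builds_to_run(changed_files):
    """Return the names of builds whose watched folders contain a changed file."""
    triggered = []
    for build, prefixes in BUILD_TRIGGERS.items():
        if any(path.startswith(prefix)
               for path in changed_files
               for prefix in prefixes):
            triggered.append(build)
    return triggered


# Hypothetical check-in touching only the notification service.
print(builds_to_run(["Components/Notification Service/Subscriber.cs"]))

# Hypothetical check-in touching the API and the scheduler: two builds fire.
print(builds_to_run(["Source/API/LoanController.cs",
                     "Components/Quartz Scheduler/Jobs.cs"]))
```

A check-in that touches only a rarely changed component never wakes up the main build, which is where the time savings come from.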

Building the Fitnesse tests takes time. I'm not going to go into the ins and outs of Fitnesse, but know that building them took a while even though there wasn't a lot of churn. Looking at the logs, I knew it would take around 120 seconds to build and deploy Fitnesse.

But it would take over 3 minutes to get to that point in the main CI build because of all the steps required.

That enabled me to also get that chunk of work out of the Main_CI build. After this step, I had four builds.

  1. Main_CI Build
    • Looks for changes in /Source and /Database
    • Builds and deploys items just from those folders
    • Runs Fitnesse tests after deployment to the development environment
  2. NotificationService_CI Build
    • Looks for changes in the /Components/Notification Service folder
    • Builds and deploys just that component
  3. QuartzScheduler_CI Build
    • Looks for changes in the /Components/Quartz Scheduler folder
    • Builds and deploys just that component
  4. Fitnesse_CI Build
    • Looks for changes in the /Fitnesse folder
    • Builds and deploys just the Fitnesse tests

Fitnesse Test Runs

Our CI build runs Fitnesse tests as a service integration test in the development environment. If everything passes then it automatically promotes the project up to the test environment where QA and Business people can test the changes. One year ago, in December of 2015, we had no Fitnesse tests. At the time of this writing that has grown to over 300 unique tests and 15,000+ assertions. The runtime for all these tests has increased to 5 minutes, or about 1 test a second. Each test is responsible for setting up and tearing down the data needed for the test. This ensures we get consistent results.

The point of these tests is to ensure we didn't check in something that will stop testers from testing. We really didn't need to run the full suite to determine that. We selected 40 or so tests we consider good smoke tests to run as part of the CI build. We then set up a scheduled build that runs all the tests every two hours.

  1. Main_CI Build
    • Looks for changes in /Source and /Database
    • Builds and deploys items just from those folders
    • Runs a subset of Fitnesse tests after deployment to the development environment
  2. NotificationService_CI Build
    • Looks for changes in the /Components/Notification Service folder
    • Builds and deploys just that component
  3. QuartzScheduler_CI Build
    • Looks for changes in the /Components/Quartz Scheduler folder
    • Builds and deploys just that component
  4. Fitnesse_CI Build
    • Looks for changes in the /Fitnesse folder
    • Builds and deploys just the Fitnesse tests
  5. Fitnesse_Scheduled Build
    • Runs all Fitnesse tests
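
The split between the smoke subset and the full suite can be pictured as a simple filter. The sketch below is an assumption-heavy illustration, not our actual setup: the page names, the "smoke" tag, and the select_tests function are all made up, and it says nothing about how Fitnesse itself selects tests.

```python
# Hypothetical catalogue of Fitnesse test pages and their tags. The page
# names and the "smoke" tag are illustrative, not taken from the real suite.
TESTS = [
    {"page": "LoanSuite.CreateLoan", "tags": {"smoke"}},
    {"page": "LoanSuite.AmortizationEdgeCases", "tags": set()},
    {"page": "CustomerSuite.UpdateCustomer", "tags": {"smoke"}},
    {"page": "CustomerSuite.MergeDuplicates", "tags": set()},
]


def select_tests(tests, build):
    """Pick the pages a build runs: smoke only for CI, everything when scheduled."""
    if build == "Main_CI":
        return [t["page"] for t in tests if "smoke" in t["tags"]]
    if build == "Fitnesse_Scheduled":
        return [t["page"] for t in tests]
    return []  # in this sketch, the remaining builds run no Fitnesse tests


print(select_tests(TESTS, "Main_CI"))
print(select_tests(TESTS, "Fitnesse_Scheduled"))
```

The CI build gets fast feedback from a handful of pages, while the scheduled build still exercises everything every two hours.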

Successes

These changes have gotten the build time down from 15 minutes to 7 minutes. That might not seem like a whole lot, but consider this: the most builds we could do in an hour was 4. That has now doubled to 8. We have also managed to ensure quality is maintained, and we can respond to issues more quickly.
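
The doubling is simple arithmetic, assuming builds run back to back with no queueing overhead. The function name here is just for illustration.

```python
def builds_per_hour(build_minutes):
    """Whole builds that fit in 60 minutes when run one after another."""
    return 60 // build_minutes


before = builds_per_hour(15)
after = builds_per_hour(7)
print(before, after)  # prints "4 8"
```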

We have also reduced the number of items needing to be deployed to production. If something didn't change, don't deploy it. Before these changes were made, we would have to look through the changelog, and because we were afraid of missing something we just decided to deploy the whole stack. Now we can see in Octopus Deploy's dashboard that certain components haven't changed since the last deployment, so there is no need to deploy them again.

Pitfalls

It is not all rainbows and sunshine.

I got a little too separation happy and broke the UI and Database out from the main CI build. That made things very confusing for other people on my team. They were used to the full stack being built and deployed. I think of it like normalizing a database. I went all the way to 4th normal form when I should've stopped at 1st or 2nd.

Also, all the components in our application share a core library consisting of DTOs and logic for configuration, logging and connecting to other RESTful services. At the time of this writing, each component's solution references that core library which is currently stored in the "Source" folder. I'm still working through options to decouple the components from one another. But it is something to keep in mind if you go this route.

Conclusion

Overall, people on my team are happy this change was made. They can get bug fixes and new features up to the business to look at, so we can get faster feedback. I only wish I had known about the include-folder option in VSTS's build triggers a lot sooner, so this could have been implemented when we were in a time crunch.

If you do decide to do this I recommend discussing it with your team so they are aware of changes to the build process. It is very confusing and frustrating to have something you thought would build not build, or have the appearance of not building. The easiest way to get a win is to find a component that rarely changes and move it out of the main solution. That gave me the freedom to experiment and learn what needs to change in our architecture in a low-risk part of the code. If anything went south it was easy for me to roll back my changes.

This was just my first step in making our builds smaller. My next goal is to start slicing up the application into smaller pieces. Doing that will take collaboration between the business, operations (so they know what is about to happen), and QA to ensure I don't mess something up. As I am able to do that, I will provide further updates on this topic.