There has been a lot of talk about Microservice architectures recently, and I have read quite a bit about them, so I thought I would share some of my notes on the subject.

This article covers some basic concepts and aims to point out the important areas to consider before adopting Microservices.

What is a Microservice?

In computing, Microservices is a software architecture style in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs.

So says Wikipedia, which for me boils down to what we traditionally thought of as SOA, but with smaller services. Many of the challenges of implementing SOA are shared by Microservices, mainly fault-tolerance between services, organisational structure, and deployment complexity.

When to Microservice

So first things first: stop and think. The draw of Microservices is that we can ideally achieve a loosely coupled, scalable system. However, always remember that you are inevitably swapping old problems for new ones, and that to succeed I believe you need a supportive organisational structure as well as a technical one.

If you have scaling problems, a large team, an organisation that is ready, and the technical/operations support needed, Microservices will probably serve you well. I think you need ALL of these, though, to truly extract the benefit.

The rest of this article covers some of the things to consider when evaluating SOA/Microservice architectures.

Monoliths

All hail the Monolith! Martin Fowler wrote a great article on the Monolith First approach to Microservices, and I tend to agree with what is outlined there. For me it also raises an important point: if you cannot manage to split up your Monolith, you would probably fail at Microservices, so having a Monolith to start with is a good tool to evaluate your team/tech against.

Conway's Law

I don't think it's legal to publish an article about SOA/MS without mentioning Conway's Law, so here it is. The law, coined by Melvin Conway in 1967, states that:

Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.

For Microservices this is a very important factor. Services need to respect strict boundaries, which in turn means these boundaries need to run through the organisational structure. Without this (and following the law), we end up with systems that bleed design/information/components across these boundaries.

This brings us to the concept of the Bounded Context, which runs alongside Conway's Law. It is a concept from the Domain-Driven Design book/community that promotes designs with explicit boundaries between the various models in the system. Each boundary guarantees consistency for the things within it but not for anything outside it, and boundaries don't share data/logic with one another.

As you can see, this maps rather well to a "service", and I don't believe you can really succeed at Microservices without understanding and following the concepts of Bounded Context. For Microservices this means that each service should:

  • Enforce its own consistency
  • Have its own datastore
  • Have its own team and defined boundaries
  • Be isolated from other services
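To make the list above a little more concrete, here is a minimal Python sketch of two services that each own a private datastore and only talk through each other's public methods. The `CustomerService`/`OrderService` names and the in-memory dicts are hypothetical stand-ins; in a real system the call to `exists` would be a network request and each store would be a separate database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CustomerService:
    """Owns customer data; no other service reads this store directly."""
    _customers: dict[str, str] = field(default_factory=dict)  # private datastore

    def register(self, customer_id: str, name: str) -> None:
        self._customers[customer_id] = name

    def exists(self, customer_id: str) -> bool:
        # Public API: the only way other contexts learn about customers.
        return customer_id in self._customers


@dataclass
class OrderService:
    """Owns order data and enforces its own consistency rules."""
    customers: CustomerService  # stands in for a network client to another service
    _orders: dict[int, tuple[str, int]] = field(default_factory=dict)  # private datastore
    _next_id: int = 1

    def place_order(self, customer_id: str, quantity: int) -> int:
        # Consistency rule local to this bounded context.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        # Cross-boundary check goes through the other service's API, not its tables.
        if not self.customers.exists(customer_id):
            raise LookupError(f"unknown customer {customer_id!r}")
        order_id = self._next_id
        # Store only the customer ID, never a copy of the customer's data.
        self._orders[order_id] = (customer_id, quantity)
        self._next_id += 1
        return order_id


customers = CustomerService()
customers.register("c-1", "Ada")
orders = OrderService(customers=customers)
print(orders.place_order("c-1", 2))  # 1
```

Because `OrderService` keeps only the customer ID and never touches the customer store, each team can change its own schema without coordinating with the other.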

Depending on team/org size, a team per service may not be practical, but grouping teams and services together would be a pragmatic compromise. The important thing is that the groups reflect the organisational structure as well as the technical one, and reinforce the boundaries.

Monitoring

One of the major tools you need for successful Microservices is good monitoring; without good insight into the health of your services and the wider system, you are doomed to fail. This is a HUGE subject in itself, but some interesting solutions include Netflix Atlas, AppDynamics APM, and Twitter Zipkin. There are obviously many more, and custom solutions may also work, but I would certainly read up on monitoring heavily before switching to Microservices.
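As a tiny illustration of the kind of signal monitoring gives you, the Python sketch below records call outcomes per downstream service and flags those whose error rate crosses a threshold. `CallStats` and the 5% default threshold are assumptions for illustration only; the tools above do far more (time series, tracing, dashboards, alerting).

```python
from __future__ import annotations

from collections import defaultdict


class CallStats:
    """Hypothetical in-process recorder of call outcomes per downstream service."""

    def __init__(self) -> None:
        self.calls: dict[str, int] = defaultdict(int)
        self.errors: dict[str, int] = defaultdict(int)

    def record(self, service: str, ok: bool) -> None:
        """Count one call to `service`, and one error if it failed."""
        self.calls[service] += 1
        if not ok:
            self.errors[service] += 1

    def error_rate(self, service: str) -> float:
        """Fraction of calls to `service` that failed (0.0 if never called)."""
        total = self.calls.get(service, 0)
        return self.errors.get(service, 0) / total if total else 0.0

    def unhealthy(self, threshold: float = 0.05) -> list[str]:
        """Services whose error rate is above `threshold` (an assumed alert level)."""
        return sorted(s for s in self.calls if self.error_rate(s) > threshold)


stats = CallStats()
for ok in [True] * 9 + [False]:
    stats.record("billing", ok)
for _ in range(10):
    stats.record("search", True)
print(stats.unhealthy())  # ['billing']
```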

Also here are some good posts on why some metrics might not be what you expect:

Deployment

Deployment is hard™, and with Microservices you have a whole heap of deployment to do. In a system that could easily have hundreds of different services running, you need a way to validate, deploy, and roll back new/updated services. There are probably a million different ways to skin this cat, but I think a Continuous Delivery approach would probably be the most successful. There are tools like GO that look very promising in this area, and Jenkins also has features for pipelining. Again, this is a huge area to cover, but it is really important to consider before jumping in: without a good deployment system that the entire team can own, managing all your new services will drag you down.
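The validate/deploy/roll back loop can be sketched as a single function. This is a minimal Python sketch, assuming hypothetical `deploy` and `health_check` hooks supplied by your own tooling; it is not how GO or Jenkins expose pipelines, just the shape of the logic one pipeline stage might run for each service.

```python
from __future__ import annotations

from typing import Callable


def deploy_with_rollback(
    service: str,
    new_version: str,
    current_version: str,
    deploy: Callable[[str, str], None],   # hypothetical hook: push a version of a service
    health_check: Callable[[str], bool],  # hypothetical hook: post-deploy validation
) -> str:
    """Deploy `new_version`; if validation fails, restore `current_version`.

    Returns the version left running.
    """
    deploy(service, new_version)
    if health_check(service):
        return new_version
    # Validation failed: roll back to the last known-good version.
    deploy(service, current_version)
    return current_version


history: list[tuple[str, str]] = []
running = deploy_with_rollback(
    "orders", "v2", "v1",
    deploy=lambda svc, ver: history.append((svc, ver)),
    health_check=lambda svc: False,  # simulate a failing smoke test
)
print(running)  # v1
```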

Testing

I mentioned validation in the Deployment section. This is needed to make sure that when we deploy a service, all dependent services still work, and we do this via some sort of automated test.

A traditional approach might be to have an integration test suite that tests your service against each of its dependents. However, with that approach the team building the service needs to know how to bootstrap/deploy those services in a test environment, which becomes a problem when we are aiming for self-contained services and teams.

One really interesting concept I found on this was Consumer-Driven Contracts. The basic idea is that a consumer creates a contract that defines how it uses a service, including the requests it sends and the responses it expects. The service can then check its set of contracts whenever a change is made, without needing real instances of the consumers.
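A minimal Python sketch of the idea, assuming contracts are plain dictionaries and the provider can be called in-process as a function. `Contract` and `verify` are hypothetical names, not the Pact or Pacto API: each consumer declares a request and only the response fields it actually relies on, and the provider's build replays them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Contract:
    """One consumer's expectation: a request, and the response fields it relies on."""
    consumer: str
    request: dict[str, Any]
    expected_response: dict[str, Any]


def verify(provider: Callable[[dict[str, Any]], dict[str, Any]],
           contracts: list[Contract]) -> list[str]:
    """Replay every consumer's request against the provider.

    Returns the names of consumers whose expectations are not met. Only the
    fields a consumer declared are compared, so the provider can add new
    fields without breaking anyone.
    """
    failures: list[str] = []
    for contract in contracts:
        actual = provider(contract.request)
        if any(actual.get(k) != v for k, v in contract.expected_response.items()):
            failures.append(contract.consumer)
    return failures


# Hypothetical provider under test and two consumers' contracts.
def user_service(request: dict[str, Any]) -> dict[str, Any]:
    return {"id": request["id"], "name": "Ada", "email": "ada@example.com"}


contracts = [
    Contract("web-frontend", {"id": 1}, {"id": 1, "name": "Ada"}),
    Contract("mailer", {"id": 1}, {"email": "ada@example.com"}),
]
print(verify(user_service, contracts))  # []
```

Because only declared fields are checked, renaming a field fails exactly the consumers that use it, which is the feedback the provider team needs before deploying.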

There are two open-source projects that provide a system for doing this and can be integrated into your CI/CD workflow, Pact and Pacto; both are worth a look.

You still end up with basically an integration test suite, but without having to worry about setting up all those consumer services. And as with all integration testing you still have challenges of configuring the service under test and getting data into that service to test with.

Fault-Tolerance

Once you have services running, they are going to fail, and when that happens we need to recover gracefully. Again, this is a HUGE subject in itself, so let's look at some of the common problems and tools involved in providing fault-tolerance.

Timeouts

Timeouts always seem simple until you start to really look at the effect they have on the system as a whole. Without proper thought, a service calling multiple services with badly chosen timeouts can easily cascade into a waiting game that produces long response times.

For example, say you have a service that calls three other services one after another, each with a default timeout of 30s. If two of those services are down or overloaded, that is one minute spent waiting for a failure condition.
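The arithmetic is easy to check with a small helper. In this Python sketch (the function name is hypothetical) healthy services are assumed to respond instantly, so only the timeouts of failing services count; it compares calling the services one after another with calling them concurrently.

```python
from __future__ import annotations


def time_lost_to_timeouts(timeouts_s: list[float], failing: list[bool],
                          concurrent: bool = False) -> float:
    """Seconds spent waiting on calls that only end by timing out.

    Assumes healthy services answer instantly, so only failing calls count.
    Sequential calls wait for each timeout in turn; concurrent calls wait
    only as long as the longest one.
    """
    waits = [t for t, down in zip(timeouts_s, failing) if down]
    if not waits:
        return 0.0
    return float(max(waits) if concurrent else sum(waits))


# Three downstream services with 30s timeouts, two of them down.
print(time_lost_to_timeouts([30, 30, 30], [True, True, False]))                   # 60.0
print(time_lost_to_timeouts([30, 30, 30], [True, True, False], concurrent=True))  # 30.0
# The same failures with fail-fast 2s timeouts.
print(time_lost_to_timeouts([2, 2, 2], [True, True, False]))                      # 4.0
```

Calling independent services concurrently caps the wait at the largest single timeout, but each timeout still has to be chosen sensibly.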

The general rule of thumb is to fail fast, so we set low timeouts. However, it's not that easy when we get to slow services, and by slow I don't mean overloaded, I mean slower than the system norm. In this case we cannot set one arbitrary timeout across the whole system, because those slow services would always time out before they return. To deal with this we could simply raise the timeout for these services, which is basically what you do, but I would also suggest taking a data-driven approach and using metrics to decide the best timeout for each service based on real data. This way we keep a good balance between things working and cascading timeouts.
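One way to put the data-driven approach into practice is to derive each service's timeout from its observed latency. This Python sketch assumes you already collect latency samples per service; the nearest-rank p99, the 1.5x headroom and the 50 ms floor are illustrative defaults, not recommendations.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of `samples`, with `pct` in (0, 100]."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def timeout_for(samples_ms: list[float], pct: float = 99.0,
                headroom: float = 1.5, floor_ms: float = 50.0) -> float:
    """Timeout = observed high-percentile latency times headroom, never below a floor."""
    return max(floor_ms, percentile(samples_ms, pct) * headroom)


fast = [10.0] * 99 + [40.0]     # ms, a typical quick service
slow = [800.0] * 99 + [2000.0]  # ms, a consistently slow service
print(timeout_for(fast))  # 50.0 (p99 is 10 ms, so the floor wins)
print(timeout_for(slow))  # 1200.0
```

Each service ends up with a timeout that fits its own behaviour, so the quick service still fails fast while the slow one is not cut off before it can answer.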

Circuit Breaking

Next up is handling failed services: there is no point calling a service that is already down. This is where the circuit breaker pattern comes in. The basic idea of a circuit breaker is to record the state of services and skip services whose failures have crossed a threshold. The hard part is deciding on the failure thresholds; again, to set these you need decent metrics about your services, and to base the thresholds on that data.
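Here is a minimal, single-threaded Python sketch of the pattern. It opens after a configurable number of consecutive failures, rejects calls immediately while open, and lets calls through again (half-open) once a reset timeout has passed; a success closes it and a failure reopens it. The class names and defaults are assumptions for illustration, not Hystrix's API, and the injectable `clock` exists only to make it testable.

```python
from __future__ import annotations

import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the breaker is open."""


class CircuitBreaker:
    """Consecutive-failure circuit breaker (not thread-safe)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: skip the known-bad service without waiting on it.
                raise CircuitOpenError("skipping call: service marked as failed")
            # Half-open: reset timeout elapsed, so let this call test the service.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or reopen) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch `failure_threshold` and `reset_timeout_s` are exactly the numbers the paragraph above says should come from your metrics rather than guesswork.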

Netflix Hystrix is an interesting open-source project worth looking at, as is oDesk's Phystrix for the PHP folks.

Testing

To get consistent fault-tolerance we also need automated tests for our services, and since our services generally rely on the network, having tools that can simulate bad network conditions is important. Some nice tools for this are:

Conclusion

Hopefully I have covered some interesting points on Microservice architectures and the challenges to consider before moving to them. The important thing for me is that while some sell Microservices as a silver bullet, in reality you are switching from one problem domain to another: a Monolith has its problems, but so do Microservices. To move to Microservices you really need a driving factor coming from the business; if you have that, the additional challenges are worth taking on. If you don't, stick with what you already know. That said, I think Microservices will play a big role in the future (possibly with containers), and as the idea and tooling mature we could see more mainstream adoption.