Linkerd (linker-dee) is an RPC proxy for dealing with production microservices. It claims to help with command and control of services and deals with things like routing, load balancing and timeouts (to name a few). This blog post details my initial experiments with Linkerd, and I will hopefully do some follow-ups covering different use-cases.

Design

To test out Linkerd I first needed to come up with a semi-decent test case. For this I chose an exchange rate conversion utility, as it's fairly well known and lightweight.

The app will have two services: a currency converter and an exchange rate provider. Additionally, it will have a web frontend that calls the services.

Figure 1
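
As a rough illustration of the flow (the endpoints, service names and call chain here are my own sketch, not the actual API), the frontend might ask the converter, which in turn asks the rate provider:

curl "http://currency-converter/convert?from=USD&to=GBP&amount=100"
curl "http://exchange-rates/rate?from=USD&to=GBP"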

Implementation

To actually implement this I decided to give Rancher¹ and Docker² a try; Docker seemed like a decent way to package the services and Rancher looked like it would give me a more production-like environment to play with.

This gives us something like:

Rancher hosts four Docker containers for us and in theory allows us to scale by adding more hosts or scaling up individual services; I still need to try that out.
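
For reference, a rough docker-compose sketch of how the app services might be declared for Rancher (image and service names are hypothetical; the real definitions are in the repo):

web:
  image: example/web-frontend        # hypothetical image name
  ports:
    - "8080:80"
currency-converter:
  image: example/currency-converter
  expose:
    - "80"
exchange-rates:
  image: example/exchange-rates
  expose:
    - "80"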

Linkerd

With the basic infrastructure handled by Rancher and Docker, next up was configuring Linkerd. As Linkerd is a proxy it's going to act as our internal network router for the app, so all communication between services will be handled via Linkerd.

There are a few different options when doing this. The simplest way is to have a single Linkerd instance that all services talk to; however, we then need to deal with that as a single point of failure (failure = total loss of services). We could instead go for an instance per host, so the loss of a Linkerd is the same as losing a single host. What I went for, though, was a sidecar approach: with this we have a Linkerd instance per service, so a failure will be the same as losing a single service.

That looks like this:

As we can see, all communication now happens between the Linkerd instances, and any failure should be localised to the specific service instance, meaning another instance can take over. Linkerd also provides load balancing, so dead instances should be detected and taken out of rotation pretty quickly.
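
As a sketch of the wiring, pairing one service with its own Linkerd in docker-compose might look roughly like this (the image tag, paths and network sharing are assumptions on my part):

currency-converter:
  image: example/currency-converter    # hypothetical service image
  expose:
    - "80"
currency-converter-linkerd:
  image: buoyantio/linkerd:0.8.1       # assumed version tag
  net: "container:currency-converter"  # share the network namespace so 127.0.0.1 reaches the service
  volumes:
    - ./linkerd.yml:/config.yml
  command: /config.yml                 # the image takes the config path as its argument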

There are a couple of drawbacks to note: mainly, running more instances means more to manage and you will be using more memory. On the memory front, I used the 32-bit JVM from here and saw about 130MB per container.

OK, so there is one more aspect to decide when configuring Linkerd and that is how the sidecar works. We can choose from either service-to-linkerd, linkerd-to-service or linkerd-to-linkerd (docs). I decided to go for linkerd-to-linkerd, as it gives you latency metrics for both incoming and outgoing traffic.

That then looks like this:

Here we have a single instance of Linkerd per service but two routers configured, one for ingress and one for egress. Note that web does not have an ingress config because it's not a full service; it just calls other services.

By doing this we now have metrics for all latencies and load balancing for all communications, at the cost of a bit more complexity and memory usage.

Configuration

The config for the routers looks like this (basically what was in the docs):

- protocol: http
  label: incoming
  servers:
  - port: 4140
    ip: 0.0.0.0
  baseDtab: |
    /local => /$/inet/127.1/80;
    /http/1.1/*/* => /local ;
- protocol: http
  label: outgoing
  servers:
  - port: 4141
    ip: 0.0.0.0
  baseDtab: |
    /http/1.1/* => /#/io.l5d.consul/dc1;

The incoming router listens on port 4140 and routes all http traffic to localhost on port 80. The outgoing router listens on port 4141 and routes all http traffic to the consul namer (more on this next). The service will make all requests via localhost:4141.

With the sidecar configured, we should also look at how Linkerd actually decides where to send requests. Above we have this Dtab for the outgoing router: /http/1.1/* => /#/io.l5d.consul/dc1;. This routes all http requests through the consul namer; Consul acts as a service directory and Linkerd uses the Host http header to look up the service in that directory.
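
For the /#/io.l5d.consul path to resolve, the consul namer also needs to be declared in the Linkerd config. A minimal sketch (the agent address here is an assumption, in my setup it points at the local Consul agent):

namers:
- kind: io.l5d.consul
  host: 127.0.0.1
  port: 8500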

For example if we send the request:

curl -H "Host: my-service" "http://localhost:4141/some/rest/service"  

Linkerd will look up my-service in Consul and route to that address, so:

http://localhost:4141/some/rest/service

could be routed like:

http://172.x.x.x:xxxx/some/rest/service

Dtabs are very powerful and based on Finagle, so routing can be as complex as you need; see https://linkerd.io/doc/dtabs/ for details. It's not just limited to http traffic either: Linkerd also supports the Thrift and Mux protocols.
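
As a small illustration of that flexibility, one extra dtab entry could pin a single logical service to a different Consul entry, e.g. a versioned deployment (the service names here are hypothetical):

baseDtab: |
  /http/1.1/* => /#/io.l5d.consul/dc1;
  /http/1.1/*/exchange-rates => /#/io.l5d.consul/dc1/exchange-rates-v2;

Later entries take precedence, so requests with Host: exchange-rates would resolve to exchange-rates-v2 in Consul, while everything else falls through to the default rule.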

Service Discovery

Next up is service discovery; for our app to work we will need a way to find each service. To do this I chose Consul, as it's fairly easy to set up (compared to ZooKeeper ;)) and HashiCorp products are generally well made.

Obviously, with any service discovery system, services need to register to be known. I didn't want to build this into the services themselves and managed to find Registrator for Docker. This monitors Docker and automatically registers exposed ports with Consul.
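
A rough docker-compose sketch of the discovery pieces (image tags and flags are assumptions; the real setup is in the repo):

consul:
  image: consul
  command: agent -dev -client 0.0.0.0     # single-node dev mode, matching this experiment
  ports:
    - "8500:8500"
registrator:
  image: gliderlabs/registrator
  command: consul://consul:8500           # where to register discovered containers
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  links:
    - consul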

With the addition of Consul and Registrator, our architecture then looks like this:

So now whenever we create a new service it will be automatically available to Linkerd and will have traffic routed to it!

Note that in this experiment I run Consul as a single server; in production you would use a highly available cluster.

Thoughts

I would certainly consider Linkerd whenever rolling out services that need to talk to each other; it provides some great features and was pretty painless to get running. I still have some tinkering to do with it, but at this early stage I am impressed. One feature I would like to see is a way to sync all the metrics into a single dashboard :)

I am hoping to expand this experiment in the future; you can get the code on GitHub if you're interested.

Next Experiments

  • Try the live routing and route to a versioned service to simulate staging changes.
  • Try blue-green deployment to simulate production deployments.
  • Test failure handling by introducing "bad" services.
  • Look at how to aggregate metrics, maybe using something like logstash.
  • Make debugging easier; it's currently a pig to debug these services, there must be a better way :)

Footnotes

  1. Rancher turned out to be great; the visual UI helped a lot in getting over some of the Docker learning curve!

  2. I am still unsure about running Docker in production; it's sometimes unreliable, which could be OS X, but I think for now I will watch from the sidelines :) It is great for development though! It will also be interesting to see how CoreOS and rkt do.