DEMYSTIFYING AZURE TRAFFIC MANAGER

Azure Traffic Manager is designed for the case where we have multiple instances of the same Azure-hosted service running in different regions.

For example, say we have three cloud services hosted in three different regions: one in the US, one in Europe and one in Asia. Each instance hosts a web-based service. It could be a web role, an IIS virtual machine or an actual website.

So, each instance hosts the same service, geographically distributed. Now we have a user who wants to access this website. We need something that directs the user to the instance of the service closest to them, depending on the user’s location.

If the user is in or near the US, we want them to use the US instance. If the user is in Europe, we want them to use the Europe instance, and if the user is in Asia, the Asia instance. Availability is checked as well. So, if the user is in the US and the US instance is unavailable, they are sent to the next closest available instance.

This is where Azure Traffic Manager, an additional service, comes in.

Its effect is somewhat similar to that of a DNS service. It lets you create profiles. We create a name for the profile, and that name is globally unique in the Azure Traffic Manager namespace. So maybe my profile would be krunaltrivedi.trafficmanager.net. It always sits in the trafficmanager.net namespace.

What this profile lets you do is define a set of endpoints that it redirects to. So, my profile krunaltrivedi.trafficmanager.net would redirect me to the US version, the Europe version or the Asia version. Furthermore, we have different load balancing options that control how this redirecting works.

The first option is the default, the performance option, which 70% of people use. The concept of this option is that it redirects us to the closest service available. The way it works is that when the user enters the address on their machine, the query first goes to their local DNS server, whichever one is set in their IP configuration.

At this point, I don’t want to hand out krunaltrivedi.trafficmanager.net because that is not the name I want to publish to users. For example, I have my own corporate website, say www.techtrainingpoint.com, and I run the authoritative DNS service for that zone. So, I can create an alias in that authoritative zone and name it ttp, giving ttp.techtrainingpoint.com, which is a CNAME record. This alias record ttp.techtrainingpoint.com points to my Traffic Manager name.

Now when the user types ttp.techtrainingpoint.com in their browser, the query first goes to the local DNS server. That server works through recursive queries, finds the authoritative DNS server for techtrainingpoint.com, looks up ttp there and gets back the alias pointing to my Traffic Manager name. So, now the user’s lookup is hitting my Traffic Manager service.
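The alias chain described above can be sketched with a small in-memory record table. This is only an illustration, not a real resolver: the endpoint name ttp-us.cloudapp.net and the IP address are made up, and Traffic Manager's answer is modeled as a fixed CNAME, whereas in reality it is chosen per request.

```python
# Hypothetical record table. The first two names come from the article;
# the endpoint name and the IP (a documentation address) are assumptions.
RECORDS = {
    "ttp.techtrainingpoint.com": ("CNAME", "krunaltrivedi.trafficmanager.net"),
    "krunaltrivedi.trafficmanager.net": ("CNAME", "ttp-us.cloudapp.net"),
    "ttp-us.cloudapp.net": ("A", "203.0.113.10"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record is found; return (chain, ip)."""
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return chain, value
        name = value  # CNAME: continue the lookup with the target name
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


chain, ip = resolve("ttp.techtrainingpoint.com")
print(" -> ".join(chain), "->", ip)
```

The user only ever types the vanity name; the trafficmanager.net name appears in the middle of the chain and never has to be advertised.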

At this point, Traffic Manager looks at where the user’s local DNS server is located. It also looks at the latencies between that local DNS server and the different Azure regions (in this case US, Europe and Asia). It has a DNS latency map which it builds up over time. Based on the local DNS server that made the request, say our local DNS server is in the US, it will redirect the user to the US region. This is what Traffic Manager is doing: it looks at the local DNS server making the request and redirects me to whichever service is closest to that server.

It is only going to redirect me to the endpoints that are available. So, every 30 seconds Traffic Manager tries to contact each of the services in the different locations. I can specify whether the check uses HTTP or HTTPS, a custom port and a certain page within the website. After four failed attempts, i.e. after two minutes, it marks the region as unavailable and won’t redirect users to that region anymore. It still keeps trying every 30 seconds, and if the region becomes available again, it starts redirecting people to it again. This way it redirects the user to the closest location that is available.
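To make the combination of health checks and closest-region selection concrete, here is a minimal sketch. The 30-second interval and the four-failure threshold are the figures quoted above; the class, the function and the latency numbers are invented for illustration and are not Traffic Manager's actual implementation.

```python
from dataclasses import dataclass

PROBE_INTERVAL_S = 30   # from the article: one probe every 30 seconds
FAILURE_THRESHOLD = 4   # from the article: four failed probes in a row


@dataclass
class Endpoint:
    region: str
    consecutive_failures: int = 0

    @property
    def available(self):
        return self.consecutive_failures < FAILURE_THRESHOLD

    def record_probe(self, ok):
        """Reset the counter on success, otherwise count one more failure."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1


def pick_by_performance(endpoints, latency_ms):
    """Return the available endpoint with the lowest latency, or None.

    latency_ms maps region -> latency measured from the requesting DNS server.
    """
    candidates = [e for e in endpoints if e.available]
    if not candidates:
        return None
    return min(candidates, key=lambda e: latency_ms[e.region])


endpoints = [Endpoint("US"), Endpoint("Europe"), Endpoint("Asia")]
latency_from_us_resolver = {"US": 20, "Europe": 95, "Asia": 180}  # made-up values

print(pick_by_performance(endpoints, latency_from_us_resolver).region)  # US
for _ in range(FAILURE_THRESHOLD):   # four failed probes, i.e. two minutes
    endpoints[0].record_probe(ok=False)
print(pick_by_performance(endpoints, latency_from_us_resolver).region)  # Europe
endpoints[0].record_probe(ok=True)   # a later probe succeeds again
print(pick_by_performance(endpoints, latency_from_us_resolver).region)  # US
```

A single successful probe brings the region back in this sketch, which matches the behaviour described above of retrying every 30 seconds and resuming redirection once the region responds.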

Additionally, the record krunaltrivedi.trafficmanager.net has a time to live (TTL), just like all DNS records, which by default is very small: 5 minutes (300 seconds). That means the record is cached for that period of time. So, let’s say the US region’s site disappears. It takes 2 minutes to detect that it is gone, and then, in the worst case, a full TTL of 5 more minutes has to expire before I get redirected to another location, so about 7 minutes in total. I can reduce the 300 seconds to 30 seconds, but that has a cost impact because I pay for the requests made to the Traffic Manager name, krunaltrivedi.trafficmanager.net. I have to balance how important it is to get people switched over as quickly as possible against the cost of those requests.
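The trade-off can be worked through with simple arithmetic. The sketch below is a simplified model using the figures from this article; it ignores probe timeouts and any caching in browsers or operating systems, and the function names are my own.

```python
def worst_case_failover_s(probe_interval_s=30, failed_probes=4, ttl_s=300):
    """Detection time plus the longest a resolver may keep a cached answer."""
    detection_s = probe_interval_s * failed_probes
    return detection_s + ttl_s


def max_queries_per_day(ttl_s, resolvers=1):
    """Upper bound on lookups per day if each resolver re-queries as soon as its cache expires."""
    return resolvers * (86_400 // ttl_s)


print(worst_case_failover_s())           # 420 seconds, i.e. 7 minutes
print(worst_case_failover_s(ttl_s=30))   # 150 seconds, i.e. 2.5 minutes
print(max_queries_per_day(300))          # 288
print(max_queries_per_day(30))           # 2880
```

Under these assumptions, cutting the TTL from 300 to 30 seconds shortens the worst case from 7 minutes to 2.5 minutes, but a busy resolver can make up to ten times as many billable queries.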

I can also use round-robin, which just rotates between the different regions and returns only those that respond. I can also use failover by putting the regions in order. That means I can put the US first, and if that is unavailable, traffic goes to Europe and so on.
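These two options are easy to sketch side by side. The code below is only an illustration of the idea, with hypothetical function names; it says nothing about how Traffic Manager implements them internally.

```python
from itertools import count


def pick_failover(ordered_regions, available):
    """Return the first region in priority order that is currently available."""
    for region in ordered_regions:
        if region in available:
            return region
    return None


def make_round_robin(regions):
    """Return a picker that cycles through regions, skipping unavailable ones."""
    counter = count()

    def pick(available):
        # Try each region at most once per call, continuing from the last position.
        for _ in range(len(regions)):
            region = regions[next(counter) % len(regions)]
            if region in available:
                return region
        return None

    return pick


regions = ["US", "Europe", "Asia"]
print(pick_failover(regions, {"Europe", "Asia"}))   # Europe, because the US is down

rotate = make_round_robin(regions)
print([rotate({"US", "Europe", "Asia"}) for _ in range(4)])
# ['US', 'Europe', 'Asia', 'US']
```

Failover always prefers the earliest healthy entry in the list, while round-robin spreads requests across every healthy region regardless of where the request came from.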

At a high level, you can think of Traffic Manager as an alternate DNS name that you can hide behind a vanity domain that is an alias to it. Azure Traffic Manager can point to different cloud services running in Azure and direct clients to the closest available version, or use the round-robin or failover configuration, whichever you want. It gives you a sort of geo-awareness for redirecting.

The key point is that it is based on the location of the DNS server and NOT the client’s location. Normally, the client and the DNS server would be in the same place, but if the user is using something like a global DNS service, this does not work as expected. In that case, where the user is sent is based on the DNS server’s locale and not the user’s locale.
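A tiny sketch shows why this matters. The latency map below is entirely made up, and the function is hypothetical; the point is only that the client’s location never enters the decision.

```python
# Made-up latency map: resolver location -> latency in ms to each Azure region.
LATENCY_MAP = {
    "US":     {"US": 20,  "Europe": 95,  "Asia": 180},
    "Europe": {"US": 95,  "Europe": 15,  "Asia": 160},
    "Asia":   {"US": 180, "Europe": 160, "Asia": 25},
}


def region_for_query(client_location, resolver_location):
    """Pick the lowest-latency region for the resolver that sent the query."""
    del client_location  # deliberately unused: only the resolver is visible
    latencies = LATENCY_MAP[resolver_location]
    return min(latencies, key=latencies.get)


# A user in Asia whose DNS server is also in Asia is sent to Asia.
print(region_for_query("Asia", "Asia"))   # Asia
# The same user going through a DNS server located in the US is sent to the US.
print(region_for_query("Asia", "US"))     # US
```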

Bottom line is, it is not based on the client’s location. It is based on the local DNS server making the request.