In this last post of the GSLB BLOG series, I’d like to share some implementation tips based upon my experiences in the project referenced throughout these BLOG posts. Again, many of these are common sense, while others are obvious only in retrospect.
Understand the Network
This one is obvious – especially in retrospect!
Remember that the intent of this project was to overlay IP Proximity GSLB atop an existing network of SSLVPN portals – one in each datacenter with its unique FQDN. Early in the process, I had talked to the IT folks, and created my list of Site IP addresses, SSLVPN IP addresses, and routing information.
Since this was a four-site implementation, the communications matrix became a little daunting. To keep track of the sites and their status, I built a connectivity spreadsheet. This contained a list of all the internal and external facing IP addresses of the SSLVPN portals, ADNS servers, GSLB Site IPs, and others. It was a conglomeration of all the addresses that GSLB and users would use to effectively delegate DNS requests and access the appropriate services.
(You know I just couldn’t supply the real IP addresses!)
This spreadsheet was extended, and used to track which site could communicate with which site using both internal and external addresses. So I filled in the appropriate status during the initial connectivity tests when we were adjusting firewall ACLs and such.
Since some sites were managed by corporate, while the others were managed locally, ensuring that the appropriate NAT, ACLs, and routing policies were in place became interesting. During the deployment, we referred to the spreadsheet constantly. Often, in problem situations, I used it to show that “we could contact the site yesterday, what changed?”
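For readers who would rather script this than maintain a spreadsheet, the same tracking idea can be sketched in a few lines of Python. This is only an illustration of the approach, not what I used on the project: the site names, the "internal"/"external" path labels, and the status strings are all made up for the example.

```python
from itertools import permutations

# Hypothetical site names and path types; substitute your own.
SITES = ["SiteA", "SiteB", "SiteC", "SiteD"]
PATHS = ["internal", "external"]


def empty_matrix(sites, paths):
    """Create one 'untested' entry per ordered (source, destination, path)."""
    return {
        (src, dst, path): "untested"
        for src, dst in permutations(sites, 2)
        for path in paths
    }


def record(matrix, src, dst, path, ok):
    """Record the result of one connectivity test."""
    key = (src, dst, path)
    if key not in matrix:
        raise KeyError(f"unknown test: {key}")
    matrix[key] = "ok" if ok else "FAILED"


def outstanding(matrix):
    """List every test that has not yet passed (untested or failed)."""
    return sorted(key for key, status in matrix.items() if status != "ok")


if __name__ == "__main__":
    matrix = empty_matrix(SITES, PATHS)
    record(matrix, "SiteA", "SiteB", "internal", True)
    record(matrix, "SiteA", "SiteB", "external", False)
    for src, dst, path in outstanding(matrix):
        print(f"{src} -> {dst} ({path}): {matrix[(src, dst, path)]}")
```

Tracking each direction separately matters because a firewall ACL or NAT rule can allow traffic one way and block it the other. Keeping a dated copy of the results after each test run is what makes the “what changed?” conversation possible.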
Walk, Then Run!
To facilitate monitoring site IP address assignment, I used the Authoritative DNS (ADNS) service on only one NetScaler. While I actually had defined an ADNS on other NetScaler systems, I updated the corporate DNS server to contain only one NetScaler-based lookup delegation. This simplified gathering network traces for issue resolution if necessary.
I did, in fact, encounter “opportunities” to gather traces using the onboard tracing facility in the NetScaler appliance. While I am used to filtering trace data based upon VIP addresses, in this situation the packets of interest were those going to or coming from the ADNS service.
A somewhat cumbersome aspect of this is that when users report an apparent GSLB mis-route, they can easily give you their own IP address, but that address never appears in the trace data. It is the IP address of the user’s ultimate DNS server that appears as the source IP in the traces. I say “ultimate” because I found that requests were often passed up the hierarchy within the ISP’s DNS structure. That meant that coordinating user activity with tracing was important.
Of course I had the opportunity to use my IP address to decimal converter and the Access database SQL lookup routines (as referenced in an earlier BLOG) frequently. That allowed us to correlate the traces and the IP location database to fully understand the user DNS locations and necessary overrides.
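As a rough illustration of that conversion and lookup, here is a minimal Python sketch. It is not the converter or the Access routines from the earlier BLOG; the address ranges are documentation-only examples and the site labels are hypothetical. It assumes the location database stores each range as a pair of decimal start and end values that do not overlap.

```python
import ipaddress
from bisect import bisect_right


def ip_to_decimal(ip):
    """Convert a dotted-quad IPv4 address to its 32-bit decimal value."""
    return int(ipaddress.IPv4Address(ip))


# Hypothetical location rows: (range start, range end, assigned site).
# The ranges must not overlap and are kept sorted by start value.
RANGES = sorted([
    (ip_to_decimal("192.0.2.0"), ip_to_decimal("192.0.2.255"), "SiteA"),
    (ip_to_decimal("198.51.100.0"), ip_to_decimal("198.51.100.255"), "SiteB"),
])
STARTS = [row[0] for row in RANGES]


def lookup(ip):
    """Return the site whose range contains ip, or None if no range matches."""
    value = ip_to_decimal(ip)
    index = bisect_right(STARTS, value) - 1  # last range starting at or below value
    if index >= 0 and RANGES[index][0] <= value <= RANGES[index][1]:
        return RANGES[index][2]
    return None


if __name__ == "__main__":
    # Use the DNS server address seen in the trace, not the user's own address.
    resolver = "192.0.2.77"
    print(resolver, ip_to_decimal(resolver), lookup(resolver))
```

The address to feed in is the source IP from the trace, that is, the user’s DNS server. A result of None would indicate an address the database does not cover, which is a candidate for a location override.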
After completing the “walking” phase of implementation (that is, once the configuration was working to satisfaction), I activated the ADNS service on the other NetScaler systems. With the appropriate corporate DNS server updates, this allowed the other NetScaler systems to participate in the GSLB DNS/ADNS delegation for resiliency.
Formalized User Testing
Personally, I’d have preferred running the tests myself from hotels in Singapore, Paris, Rio, Jo’burg, Tokyo, Beijing, Acapulco, Honolulu, Monaco, and – well, you get the picture. In the absence of limitless travel budgets, however, I needed globally distributed users to help validate the GSLB site allocation algorithms, so I solicited the assistance of colleagues and friends for the worldwide testing.
There were two issues I had to consider.
The first is that these users were not necessarily technical IT staff, and this testing dealt with technicalities that included IP addresses, DNS servers, and such. Since I was asking them to do me a favor, I wrote clear and specific instructions pertaining to accessing and recording the sites allocated. This took the form of a seemingly user-proof task list that they would follow.
Secondly, I had to persuade the users to run the testing from home. Since providing GSLB services to users on the corporate network was outside of the scope of this project, we did not add location overrides for each of the datacenters. That meant that if the users performed our tests from the PCs on the corporate network, the results would be unpredictable.
In addition to the instruction sheet, I gave them a tracking sheet. On this they recorded the results of the tests. A snippet of this is included to the right.
I knew that some routing anomalies would occur because it was simply not possible to address all conditions. To ease the user experience, the IT group made all landing pages the same after the user testing. For problem determination, however, slight changes to the landing page for each SSLVPN Site were made.
Rather than blatantly display the site-id, embedded site-specific comments were used. In this way, problem determination could be simplified via a “View Source”.
This approach also keeps users happier by deemphasizing where they are actually routed in the event of a site failure or other incident.
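If you want to check the comment without opening “View Source” by hand, a small script can pull it out of the page source. This is a sketch only: the comment format shown here (`<!-- site: dc2 -->`) is my invention for the example, not the format the IT group actually used, so adjust the pattern to match your own landing pages.

```python
import re

# Hypothetical comment format: <!-- site: some-identifier -->
SITE_COMMENT = re.compile(r"<!--\s*site:\s*([\w-]+)\s*-->", re.IGNORECASE)


def site_from_source(html):
    """Return the site identifier embedded in the page source, or None."""
    match = SITE_COMMENT.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = "<html><head><!-- site: dc2 --></head><body>Welcome</body></html>"
    print(site_from_source(page))
```

A user reporting a problem can save the landing page and send it along, and the support person can identify the site that served it without the user ever needing to know which datacenter they reached.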
There were, of course, many minor considerations that came into the project: the use of redirects to manage bookmarks, the move to wildcard certificates to ease the migration from site-specific FQDNs to the GSLB FQDN, and others.
But we can save those for another day.
This concludes my series of BLOGs pertaining to GSLB. I’ll update these BLOGs when my detailed paper becomes available.
As always, I’d love to hear your comments!
And don’t forget to follow all of us at: Ask the Architect