Identifying and avoiding improper GSLB distribution

Raghav SN

4 years ago

One global server load balancing (GSLB) issue we often see in the field is improper distribution of client traffic across datacenters. It degrades the end-user experience by increasing the latency of accessing an application, which can reduce end-user productivity. In most reported cases, this turns out to be expected behavior caused by one or more runtime states. Rarely does it require a code fix.

In this blog post, we’ll look at why improper distribution in GSLB happens and how you can avoid it.

Review the State and Metastate of the GSLB Service

The first step in analyzing improper distribution of GSLB traffic is to review both the state and the metastate of the GSLB services bound to the application-specific GSLB virtual server. Together, they give you insight into the overall health of the application endpoint.

You might run across scenarios where a GSLB service shows as “Up,” even though it cannot actually serve client traffic. This can happen for a number of reasons, including:

- The virtual server (load balancing, content switching, or VPN) hosted on the Citrix ADC appliance and represented by the GSLB service has reached its configured operating threshold (the soThreshold setting on the virtual server), or all back-end servers bound to the virtual server have reached their configured operating thresholds (the maxClient or maxBandwidth setting on the service or servicegroup).
- In a VPN deployment, the VPN virtual server has reached its maximum AAA user limit.
- All back-end servers bound to the virtual server are down for maintenance.
- The LB or CS virtual server is down and is configured to return a redirect URL.
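
The conditions above can be made concrete with a small model. The following Python sketch is hypothetical: VserverSnapshot, Backend, blocking_reasons, and can_serve are illustrative names, not Citrix ADC APIs, and the fields simply restate the listed conditions. It shows how a virtual server can report UP while being unable to serve traffic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    """Hypothetical snapshot of one back-end service or servicegroup member."""
    up: bool
    at_threshold: bool  # maxClient or maxBandwidth reached


@dataclass
class VserverSnapshot:
    """Hypothetical snapshot of the LB, CS, or VPN virtual server behind a GSLB service."""
    vs_type: str  # "LB", "CS", or "VPN"
    up: bool
    so_threshold_reached: bool = False
    aaa_user_limit_reached: bool = False  # only meaningful for VPN
    redirect_url_configured: bool = False
    backends: List[Backend] = field(default_factory=list)


def blocking_reasons(vs: VserverSnapshot) -> List[str]:
    """Return each listed condition that stops the endpoint from serving clients."""
    reasons = []
    if vs.so_threshold_reached:
        reasons.append("virtual server reached its soThreshold")
    if vs.backends and all(b.at_threshold for b in vs.backends):
        reasons.append("all back-end servers reached maxClient/maxBandwidth")
    if vs.vs_type == "VPN" and vs.aaa_user_limit_reached:
        reasons.append("VPN virtual server reached its AAA user limit")
    if vs.backends and not any(b.up for b in vs.backends):
        reasons.append("all back-end servers are down")
    if vs.vs_type in ("LB", "CS") and not vs.up and vs.redirect_url_configured:
        reasons.append("virtual server is down and returns a redirect URL")
    return reasons


def can_serve(vs: VserverSnapshot) -> bool:
    return not blocking_reasons(vs)


# Both back ends are saturated, yet the virtual server itself still reports UP.
saturated = VserverSnapshot("LB", up=True,
                            backends=[Backend(up=True, at_threshold=True),
                                      Backend(up=True, at_threshold=True)])
print(can_serve(saturated), blocking_reasons(saturated))
```

A TCP-level check against such a virtual server would still succeed, which is why the monitor type matters in the next step.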

You can review the metastate of a GSLB service using the ‘show gslb service <name>’ command, as shown below:

show gslb service gslb_svc1
gslb_svc1 (10.102.217.57: 80)- HTTP Server: 10.102.217.57       State: UP
    Monitoring Owner: 0
    Last state change was at Tue Feb 6 13:58:25 2018
    Time since last state change: 0 days, 00:00:09.230
    Entity not participating in GSLB because: One or more services are in the transition out of service (TROFS) state.
    Idle timeout: Client: 180 sec   Server: 360 sec
    Effective State: DOWN
    Max Conn: 0         Max Bandwidth: 0 kbits
    PublicIP: 10.102.217.57     PublicPort: 80
    Type: LOCAL         Site Name: siteA
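
If you capture this output as text (for example, over an SSH session), a short script can flag services whose State is UP but whose Effective State is DOWN. This Python sketch assumes the field labels look exactly like the sample output above; other firmware releases may format the output differently, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional


def _field(pattern: str, text: str) -> Optional[str]:
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


def parse_gslb_service(output: str) -> Dict[str, Optional[str]]:
    """Pull the state fields out of captured 'show gslb service' text."""
    return {
        # The negative lookbehind skips the "Effective State:" label.
        "state": _field(r"(?<!Effective )State:\s*(\w+)", output),
        "effective_state": _field(r"Effective State:\s*(\w+)", output),
        "reason": _field(r"not participating in GSLB because:\s*(.+)", output),
    }


def looks_up_but_is_not(output: str) -> bool:
    """True when State is UP but the Effective State is DOWN."""
    info = parse_gslb_service(output)
    return info["state"] == "UP" and info["effective_state"] == "DOWN"


sample = """gslb_svc1 (10.102.217.57: 80)- HTTP Server: 10.102.217.57       State: UP
    Entity not participating in GSLB because: One or more services are in the transition out of service (TROFS) state.
    Effective State: DOWN"""
print(looks_up_but_is_not(sample), parse_gslb_service(sample)["reason"])
```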

Review the Monitor Setting

The next step is to review and correct the monitor settings. One of the main causes of inconsistency between state and metastate is a TCP-type monitor bound to the GSLB service: a successful TCP connection does not indicate that the application itself is healthy. Bind application-aware monitors instead, such as HTTP monitors with appropriate request and response settings. In the scenarios listed above, an HTTP monitor would fail, typically because it receives a “503 Service Unavailable” response, and the service would be marked “Down”.
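
The difference between the two monitor types can be sketched as two verdict functions. This is a simplified Python model under stated assumptions: ProbeResult, tcp_monitor, and http_monitor are hypothetical names, and a real HTTP monitor on the appliance is configured with its own request and response-code settings rather than code like this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ProbeResult:
    """Hypothetical outcome of probing one GSLB service endpoint."""
    tcp_connected: bool
    http_status: Optional[int] = None  # None when no HTTP response came back


def tcp_monitor(result: ProbeResult) -> str:
    # A TCP monitor only checks that the connection can be opened.
    return "UP" if result.tcp_connected else "DOWN"


def http_monitor(result: ProbeResult, expected: Tuple[int, ...] = (200,)) -> str:
    # An HTTP monitor also requires one of the expected response codes.
    if not result.tcp_connected or result.http_status is None:
        return "DOWN"
    return "UP" if result.http_status in expected else "DOWN"


# The endpoint accepts connections but answers "503 Service Unavailable".
overloaded = ProbeResult(tcp_connected=True, http_status=503)
print(tcp_monitor(overloaded), http_monitor(overloaded))
```

The TCP verdict stays UP for the overloaded endpoint, while the HTTP verdict turns DOWN, which is the behavior that keeps GSLB from sending clients there.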

GSLB Method and Persistence Setting

If GSLB services are healthy, the next step is to review the GSLB method setting. If the configured GSLB method is round trip time (RTT), review the following conditions:

If the GSLB appliance receives a request from an LDNS IP for the first time, no RTT value is available yet, so the backup GSLB method is used to choose the IP returned in the response. Depending on the backup GSLB method setting, that IP might not belong to the expected datacenter. You should:
- Configure static proximity as the backup method, because its decisions are closer to those of the RTT method.
- If you must always return an IP based on a calculated RTT, enable the following parameter to drop all requests from an LDNS IP until its RTT has been calculated: set gslb parameter -dropLdnsReq ENABLED
If persistence is configured and the backup GSLB method was used to return the application endpoint for a client, the same IP is returned even for subsequent requests from the same LDNS. This can look as though GSLB is wrongly routing the client to a suboptimal location.
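
The decision chain above (RTT first, then the backup method, then round robin, with optional request dropping and persistence) can be modeled in a few lines. This Python sketch is hypothetical and simplified: GslbDecider is not a Citrix ADC API, persistence never times out here, and the primary_failures and backup_failures fields only loosely mirror the “Primary LB Method Failures” and “Backup LB Method Failures” counters of ‘stat gslb vserver’. The usage at the end reproduces the persistence effect: the first answer comes from the backup method and sticks even after RTT favors another site.

```python
from itertools import cycle
from typing import Dict, List, Optional


class GslbDecider:
    """Hypothetical model of RTT with a static proximity backup and a round robin last resort."""

    def __init__(self, sites: List[str], proximity: Dict[str, str],
                 drop_ldns_req: bool = False, persistence: bool = False):
        self.rtt_ms: Dict[str, Dict[str, float]] = {}  # LDNS IP -> {site: RTT}
        self.proximity = proximity  # LDNS IP -> nearest site per the location database
        self.drop_ldns_req = drop_ldns_req
        self.persistence = persistence
        self._persisted: Dict[str, str] = {}
        self._round_robin = cycle(sites)
        self.primary_failures = 0
        self.backup_failures = 0

    def resolve(self, ldns_ip: str) -> Optional[str]:
        """Return the site for a DNS answer, or None if the request is dropped."""
        if self.persistence and ldns_ip in self._persisted:
            return self._persisted[ldns_ip]  # persistence wins, whatever the RTT says
        rtts = self.rtt_ms.get(ldns_ip)
        if rtts:
            site = min(rtts, key=lambda s: rtts[s])  # primary method: lowest RTT
        elif self.drop_ldns_req:
            return None  # no answer until the RTT for this LDNS is known
        else:
            self.primary_failures += 1
            site = self.proximity.get(ldns_ip)  # backup method: static proximity
            if site is None:
                self.backup_failures += 1
                site = next(self._round_robin)  # both methods failed
        if self.persistence:
            self._persisted[ldns_ip] = site
        return site


ldns = "198.51.100.10"
decider = GslbDecider(["siteA", "siteB"], proximity={ldns: "siteB"}, persistence=True)
first = decider.resolve(ldns)  # no RTT yet, so the backup method picks siteB
decider.rtt_ms[ldns] = {"siteA": 12.0, "siteB": 85.0}  # siteA is now measurably closer
second = decider.resolve(ldns)  # persistence still returns siteB
print(first, second)
```

Constructing the same decider with persistence=False would return siteA for the second request, and drop_ldns_req=True would return None for the first one.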

You can track GSLB decisions made with the backup GSLB method by using the “Primary LB Method Failures” counter reported by the ‘stat gslb vserver <name>’ command, shown in the output below. If both the primary and the backup GSLB methods fail, the GSLB decision is made using the round robin method. You can track this condition with the “Backup LB Method Failures” counter in the same output.

stat gslb vserver gv1

Gslb Vserver Summary
                   Protocol   State   Health   actSvcs   inactSvc
gv1                HTTP       UP          86         6          1

VServer Stats:
                                          Rate (/s)                Total
Vserver hits                                       0                    0
Primary LB Method Failures                        --                    0
Backup LB Method Failures                         --                    0
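
To watch these counters over time, you might parse captured ‘stat gslb vserver’ output. The Python sketch below assumes the row layout shown above (counter name, Rate column, Total column), and it assumes Vserver hits is a reasonable denominator for the failure shares; both the function names and that ratio are illustrative, not a documented metric.

```python
import re
from typing import Dict

COUNTERS = ("Vserver hits", "Primary LB Method Failures", "Backup LB Method Failures")


def parse_vserver_counters(output: str) -> Dict[str, int]:
    """Read the Total column of selected counters from captured 'stat gslb vserver' text."""
    counters = {}
    for name in COUNTERS:
        # Counter name, then the Rate column (a number or dashes), then the Total.
        pattern = rf"^\s*{re.escape(name)}\s+\S+\s+(\d+)\s*$"
        match = re.search(pattern, output, re.MULTILINE)
        if match:
            counters[name] = int(match.group(1))
    return counters


def fallback_shares(counters: Dict[str, int]) -> Dict[str, float]:
    """Fraction of hits where the primary method failed, and where both methods failed."""
    hits = counters.get("Vserver hits", 0)
    if hits == 0:
        return {"primary_failed": 0.0, "round_robin": 0.0}
    return {
        "primary_failed": counters.get("Primary LB Method Failures", 0) / hits,
        "round_robin": counters.get("Backup LB Method Failures", 0) / hits,
    }


captured = """Vserver hits                                       0                    0
Primary LB Method Failures                        --                    0
Backup LB Method Failures                         --                    0"""
print(fallback_shares(parse_vserver_counters(captured)))
```

A rising primary_failed share points at missing RTT data (or a backup-method-heavy workload), while a nonzero round_robin share means both methods are failing for some LDNS servers.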

When the GSLB method is static proximity or RTT and more than one datacenter is equidistant from the client, the datacenter IPs are returned to clients in round-robin fashion. The same IP might not be returned for subsequent requests, so a client might be routed to a different datacenter than the first time. An application might break in this case unless site persistence is configured.
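
The following Python sketch is a simplified model of that behavior, not the appliance's implementation. dns_answers alternates between equally close sites, and serving_site stands in for site persistence, which sends a request that already belongs to a session at one site back to that site (for example, based on a site cookie). Both functions are hypothetical.

```python
from itertools import cycle
from typing import List, Optional


def dns_answers(tied_sites: List[str], count: int) -> List[str]:
    """Round-robin DNS answers across datacenters that are equally close to the client."""
    rr = cycle(tied_sites)
    return [next(rr) for _ in range(count)]


def serving_site(dns_site: str, session_site: Optional[str], site_persistence: bool) -> str:
    """Site that actually handles an HTTP request in this model.

    With site persistence, a request that already belongs to a session at another
    site is sent back to that site instead of being handled where DNS pointed it.
    """
    if site_persistence and session_site is not None:
        return session_site
    return dns_site


answers = dns_answers(["siteA", "siteB"], 2)
session_site = answers[0]  # the session was created at the first site
for persist in (False, True):
    # Without persistence the second request lands at the other site.
    print(persist, serving_site(answers[1], session_site, persist))
```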

With the static proximity GSLB method, you should review the accuracy of the loaded IP location database. A client IP address range might be missing, or it might be associated with a location different from the one you assume. These anomalies can alter the GSLB decision. If the client IP address range is missing, the backup GSLB method is used to choose the DNS response. If the client's location is recorded differently in the database, the GSLB service is chosen based on the location in the database, which might not be the optimal one.
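
A static proximity lookup and its fallback can be sketched with Python's standard ipaddress module. The database entries, the nearest_site mapping, and the single backup_site standing in for the backup GSLB method are all hypothetical; the sketch assumes non-overlapping ranges and shows how a missing range changes the answer, while a mislabeled range would simply be followed as recorded.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical location database: non-overlapping (network, location) entries.
LocationDb = List[Tuple[ipaddress.IPv4Network, str]]


def lookup_location(db: LocationDb, client_ip: str) -> Optional[str]:
    """Return the location recorded for client_ip, or None if no range covers it."""
    ip = ipaddress.ip_address(client_ip)
    for network, location in db:
        if ip in network:
            return location
    return None


def choose_site(db: LocationDb, client_ip: str,
                nearest_site: Dict[str, str], backup_site: str) -> Tuple[str, str]:
    """Static proximity decision with a fallback when the client range is missing."""
    location = lookup_location(db, client_ip)
    if location is None:
        # Hypothetical stand-in for whatever the backup GSLB method would choose.
        return backup_site, "backup method: client range missing from database"
    # The answer follows whatever the database says, even if that location is wrong.
    return nearest_site[location], f"static proximity: {location}"


db = [
    (ipaddress.ip_network("203.0.113.0/24"), "Europe"),
    (ipaddress.ip_network("192.0.2.0/24"), "Asia"),
]
nearest = {"Europe": "siteEU", "Asia": "siteAPAC"}
print(choose_site(db, "203.0.113.25", nearest, "siteUS"))
print(choose_site(db, "198.51.100.7", nearest, "siteUS"))
```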

DNS Policy Setting

DNS policies with actions of type “DNS view” and “preferred location” can influence your GSLB decisions, so review your DNS policies to understand the resulting GSLB distribution.

If DNS policy evaluation selects a DNS view, only the GSLB services bound with that view are considered for the GSLB decision. Make sure that the DNS view is associated with all relevant GSLB services.

If DNS policy evaluation selects a list of preferred locations, the configured GSLB method is overridden, and the GSLB decision is based on the preferences configured in the DNS policies.
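
The order of these effects can be sketched as a small Python function: a selected DNS view filters the candidates, preferred locations then override the method, and only otherwise does the configured GSLB method decide. GslbService, pick_service, and the location strings are hypothetical, and falling back to the configured method when no preferred location matches is an assumption of this sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class GslbService:
    """Hypothetical view of a GSLB service for this sketch."""
    name: str
    location: str
    views: Sequence[str] = ()  # DNS views the service is bound with


Method = Callable[[List[GslbService]], GslbService]


def pick_service(services: List[GslbService], method: Method,
                 view: Optional[str] = None,
                 preferred_locations: Sequence[str] = ()) -> Optional[GslbService]:
    """Apply a DNS view filter, then preferred locations, then the GSLB method."""
    # A selected DNS view restricts the candidates to services bound with it.
    candidates = [s for s in services if view is None or view in s.views]
    if not candidates:
        return None
    # Preferred locations, when present, override the configured method.
    for location in preferred_locations:
        for service in candidates:
            if service.location == location:
                return service
    # Falling back to the method when no preference matches is this sketch's assumption.
    return method(candidates)


def first_listed(candidates: List[GslbService]) -> GslbService:
    """Stand-in for the configured GSLB method."""
    return candidates[0]


services = [
    GslbService("svc_east", "us.east", views=("internal",)),
    GslbService("svc_west", "eu.west"),
]
print(pick_service(services, first_listed).name)
print(pick_service(services, first_listed, preferred_locations=("eu.west",)).name)
print(pick_service(services, first_listed, view="external"))
```

The last call returns None because no service is bound with the "external" view, which is the situation the advice about associating views with all relevant services guards against.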

Conclusion

Applying the recommendations above helps you make optimal use of resources spread across multiple datacenters and deliver the best possible experience to your application users, which in turn improves their productivity.
