Load balancer plus service actions
Application Type
Service
Platform Version
11.8.0 (Build 12006)

Hi All,


We have a Production environment with a round-robin load balancer and 3 OS servers behind it. We are seeing weird behaviour with timeouts. The case:

We have an OLD-API that needs to be phased out and a NEW-API that is being phased in. In the module of the NEW-API we also made a Service Action module, let's call it NEW-SA, that re-routes requests from the OLD-API to the new module's core services. So any request coming in to the OLD-API goes to the NEW-SA for execution. The OLD-API is thus only a forwarder that stays in place until the entire organisation has moved from the OLD-API to the NEW-API.

What we see is that once in a while we get timeouts. A normal request to the NEW-API takes approx. 10-50 ms; a request to the OLD-API takes approx. 50-200 ms (probably due to the latency of calling the NEW-SA from the OLD-API). For about 0.1% (one in 1000) of the requests in a day, the response time goes up to 10,000+ ms (up to 100,000 ms, which I think is the timeout for OS service calls). These slow requests are mainly concentrated in sudden busy periods on one server. During such a period the server behaves as if something has just been deployed to it, which is not actually the case.
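To check whether the slow calls really cluster on one server during busy minutes, something like the following could be run over an export of the request logs. This is only a rough sketch: the field names (`server`, `minute`, `duration_ms`) are assumptions and not the actual log schema, and the 10,000 ms threshold is simply the figure mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Threshold taken from the post: responses of 10,000 ms or more count as slow.
SLOW_MS = 10_000


@dataclass
class Request:
    server: str       # hypothetical field: front-end server that handled the request
    minute: str       # hypothetical field: timestamp truncated to the minute
    duration_ms: int  # hypothetical field: total response time in milliseconds


def overall_slow_rate(requests):
    """Fraction of all requests that were slow (0.001 means one in 1000)."""
    total = len(requests)
    slow = sum(1 for r in requests if r.duration_ms >= SLOW_MS)
    return slow / total if total else 0.0


def slow_share_by_server_minute(requests):
    """Return {(server, minute): (slow_count, total_count)}."""
    buckets = defaultdict(lambda: [0, 0])
    for r in requests:
        bucket = buckets[(r.server, r.minute)]
        bucket[1] += 1
        if r.duration_ms >= SLOW_MS:
            bucket[0] += 1
    return {key: tuple(counts) for key, counts in buckets.items()}
```

If the slow counts pile up in a handful of (server, minute) buckets that also have a high total count, that would support the idea that the timeouts follow load on a single server rather than being spread evenly.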

The load balancer had sticky sessions enabled. We disabled them and now see that all requests are neatly spread over the servers. Still, things tend to go south when one of the servers is busy. Because these service action calls seem to time out more often when one server is fairly busy, it raised the question of how these calls are handled. They are OS-to-OS calls that are probably API calls under the hood. Does that mean they are handled as external calls (hence going via the load balancer) or as internal calls (staying on the same server)?

Does anyone here recognize this situation or know how this mechanism works? Or even have a tip on what we could do?

PS. The interesting problem here is why a server suddenly looks busy. The symptoms mimic those of a first-time compile into the cache, which happens shortly after a deployment. Sometimes we see a system job refreshing the deployment just before a server shows these signs, but this is not supposed to affect the server. We also suspected that a virus scanner going over the files in the IIS .NET cache might be the culprit, but we ruled that out by excluding those folders from its sweeps. A big mystery.


Hope someone has a bright idea on this one.


Kind regards,

Alexander Lange