Bharti Infratel has been a leading telecom tower infrastructure provider in India for years. An enterprise of that scale, with close to 2,000 employees, has to ensure the seamless availability of apps for all its users. But the CIO had to deal with an unusual problem: the apps went haywire during peak business hours.
Availability was a distant dream
Rajesh Mittal, head of IT operations at Bharti Infratel, had a two-fold problem at hand: application performance at the end-user level, and application availability at the end-user level. “Most of the time, the application worked fine at the datacenter level, but users often complained that the app didn’t perform as expected on several occasions, such as business hours between 10 and 12, peak hours, and month ends,” says Mittal.
When the technical team checked at the server level, the app worked fine, but users still complained. For example, a CXO of Bharti Infratel might need to access a critical application for a PI or PO approval, or a budget approval. In such a time-critical situation, a cost approval cannot take more than half a second; anything longer is counted as a delay. “At the same time, there wasn’t a measurement tool. We relied on the words of end users about the problem. By the time we approached the external vendor, the problem would have normalized,” says Mittal.
Mittal categorized the issue into three parts:
- User satisfaction was one problem.
- Productivity was a challenge.
- Availability of the application at end-user level was a challenge.
Diagnosis and troubleshooting
Mittal and his team readied a solution. The team deployed RUE at the edge of the network, i.e., the datacenter, so that every transaction on the server is tracked. This gives hop-by-hop tracking that pinpoints where a delay occurs. It works like an X-ray of the transaction: it analyses each transaction and detects faults.
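The hop-by-hop idea can be illustrated with a minimal Python sketch. It is not the tool Bharti Infratel deployed: the hop names, the `HopTiming` type, the `slowest_hop` helper and the timings are all hypothetical, invented only to show how per-hop timings point to where a transaction is delayed.

```python
from dataclasses import dataclass


@dataclass
class HopTiming:
    hop: str           # hypothetical hop name along the transaction path
    elapsed_ms: float  # time the transaction spent at this hop


def slowest_hop(timings):
    """Return the hop where the transaction spent the most time."""
    if not timings:
        raise ValueError("no hop timings recorded")
    return max(timings, key=lambda t: t.elapsed_ms)


# Illustrative values only, not measurements from the article.
trace = [
    HopTiming("branch network", 40.0),
    HopTiming("datacenter edge", 15.0),
    HopTiming("application server", 620.0),
    HopTiming("database", 95.0),
]

worst = slowest_hop(trace)
print(f"Slowest hop: {worst.hop} ({worst.elapsed_ms:.0f} ms)")
```

In this made-up trace, the delay sits at the application server rather than on the network, which is the kind of distinction the team could not make when it relied on user complaints alone.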
If any transaction takes longer than expected, an SMS and email alert is sent. For example, the IT team is notified before a CXO runs into the problem. A dashboard shows the health of the app and identifies the point and level of a fault. The accumulated alerts are then analysed: with an exception log built up over time, the team gets to know the root cause of a problem, which was hidden earlier. With this tool, diagnosis happens proactively and troubleshooting is faster. Hence, the rate of incidents has come down.
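The alerting rule described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the product's logic: the half-second limit is borrowed from the approval example earlier in the article, and the `Transaction` type, the function names and the sample durations are hypothetical.

```python
from dataclasses import dataclass

# Assumed limit, taken from the half-second approval example in the article.
THRESHOLD_SECONDS = 0.5


@dataclass
class Transaction:
    name: str
    duration_seconds: float


def find_slow_transactions(transactions, threshold=THRESHOLD_SECONDS):
    """Return the transactions that took longer than the threshold."""
    return [t for t in transactions if t.duration_seconds > threshold]


def format_alert(transaction, threshold=THRESHOLD_SECONDS):
    """Build the alert text that an SMS or email gateway would send."""
    return (
        f"ALERT: {transaction.name} took {transaction.duration_seconds:.2f}s "
        f"(limit {threshold:.2f}s)"
    )


# Illustrative durations only, not measurements from the article.
samples = [
    Transaction("PO approval", 0.32),
    Transaction("budget approval", 1.40),
]

alerts = [format_alert(t) for t in find_slow_transactions(samples)]
for line in alerts:
    print(line)  # a real deployment would pass this text to an SMS/email gateway
```

Storing every alert instead of only sending it is what produces the exception log mentioned above, which can later be mined for recurring patterns.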
The solution also measures the availability of the application. A proxy agent at each location runs synthetic transactions from that location to the company’s servers, 24x7. When an issue occurs, the agent captures the log and sends a notification. This gives a real measure of availability around the clock. Any problem at the server level is captured, and the data also gives a closer look at the network.
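A synthetic-transaction check of this kind can be sketched as follows. The sketch assumes that a probe is any Python callable that raises an exception on failure; `run_probe`, `availability_percent` and the two stand-in probes are hypothetical names, and a real agent would call the application over the network rather than a local function.

```python
import time


def run_probe(probe, timeout_seconds=5.0):
    """Run one synthetic transaction; return (succeeded, elapsed_seconds, error)."""
    start = time.perf_counter()
    try:
        probe()
        error = None
    except Exception as exc:  # capture the failure so it can be logged
        error = repr(exc)
    elapsed = time.perf_counter() - start
    succeeded = error is None and elapsed <= timeout_seconds
    return succeeded, elapsed, error


def availability_percent(results):
    """Share of successful probes, as a percentage."""
    if not results:
        raise ValueError("no probe results")
    good = sum(1 for succeeded, _, _ in results if succeeded)
    return 100.0 * good / len(results)


# Stand-ins for real transactions against the application (hypothetical).
def healthy_probe():
    return None


def failing_probe():
    raise ConnectionError("server unreachable")


results = [run_probe(healthy_probe) for _ in range(3)] + [run_probe(failing_probe)]
print(f"Availability: {availability_percent(results):.1f}%")
```

Running the same probe on a schedule from every location, and keeping the captured error for each failure, gives a per-location availability figure as seen by the end user rather than by the server.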
Benefits are two-fold
Mittal could finally reap the benefits of such an elaborate solution. “Productivity has been enhanced. People are facing fewer issues, and problems are being detected proactively.” Mittal says that he can now spot problem patterns too, so issues aren’t being repeated.
“There has been a reduction of tickets by 20 percent. This helped in baselining the application performance at various levels, and we could do an upgrade,” says Mittal.