Application Monitoring
- Best Practices for application monitoring
- Systems can fail for many reasons related to hardware, the operating
system, the network, or the applications themselves. Sometimes systems and
applications fail despite everyone's best efforts. Although no one can
guarantee that these components will always be available, the following
best practices help keep application availability high.
1. Plan Early: If a new application or software component is about to go
live and needs monitoring, get involved in the early architecture and design
discussions to get an overview of what is coming. This gives you time to
design and implement the monitoring solution before it is needed. Early
involvement often helps because the monitoring solution may not be
straightforward and may require additional resources and effort.
2. Monitor Proactively: Don't wait for a system or application to go down
and use its failure as the trigger for corrective action. Monitor systems
and applications proactively for symptoms of a problem so that corrective
action can begin before the system or application fails. Proactive
monitoring can be achieved by watching threshold values for resource
utilization, such as CPU, memory, and network bandwidth, and for application
health parameters. If the system crosses a threshold, perform a system health
check, which includes listing the running processes, checking memory
utilization per process, reviewing application logs, and so on. A proactive
health check and corrective action can prevent a system or application crash.
3. Balance the Load: Load balancers distribute incoming load across the
servers that are able to handle it. If one server is heavily loaded or down,
the load balancer can automatically direct traffic to a healthy server. This
redirection is transparent to users, who will not notice the difference. Load
balancers can be hardware or software based; a high-transaction application
that does not already have one should add one.
4. Cluster the Servers: Clustering removes the single point of failure by
providing multiple points for request processing. If one server goes down
because of a hardware failure, a network failure, or heavy load on its
resources, requests are sent to and processed by the other members of the
cluster.
5. Create a Recovery Plan: To avoid delays during an outage, online
applications should have a well-documented and tested recovery plan. The plan
should cover the steps and checklists to follow in the event of an
application failure. A simple example is to test the failover feature of a
server and record the total number of failed requests and the time taken to
fail over, which gives an estimate of when an alternate server will be up.
Having a plan at the time of failure avoids wasting time looking for
alternatives.
6. Deploy Application Code from a Trusted and Tested Source: Application
code should be released from a trusted and tested source, such as a version
control system or a staging or quality assurance environment. No code should
be released that contains changes made outside the trusted source, to which
only authorized persons have access. Releasing code this way gives the
development teams an opportunity to reproduce any code problems and to
examine the code base itself.
7. Create a Service Level Agreement: A written service level agreement
makes the need for monitoring, and its scope, explicit. It gives the support
team its monitoring requirements and gives the business groups a standard
against which to measure application availability. The document states the
expected time to respond to and fix issues, so teams can work in advance on
a recovery plan that meets the service level agreement.
8. Use Good Hardware: Use hardware with a proven record of reliability in
the industry for the production environment. All additional components,
cabling, and so on should be of a high standard to avoid problems caused by
hardware failures. Replacement components should match the specifications of
the originals exactly. The hardware should be covered by a support
arrangement with the manufacturer or another company that can supply
components and troubleshooting expertise in case of a failure.
9. Seek Professional Help: If your application is mission critical and a
failure would affect customers and revenue, it is not sufficient to rely on
home-grown monitoring solutions; seek professional advice from companies
that already provide monitoring for other organizations. Besides monitoring
applications, these companies can provide different types of reports, such
as response time, downtime, and uptime, which may be helpful in maintaining
the application and planning its resources.
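
The threshold check described in practice 2 can be sketched in a few lines of Python. This is only an illustration: the metric names, the percentage limits, and the `evaluate` function are assumptions made for the example, and collecting the actual readings is left to whatever agent or tool is already in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float   # start a health check at or above this value
    critical: float  # take corrective action at or above this value


# Hypothetical limits, in percent of capacity; tune them per system.
THRESHOLDS = {
    "cpu_percent": Threshold(warning=75.0, critical=90.0),
    "memory_percent": Threshold(warning=80.0, critical=95.0),
    "network_percent": Threshold(warning=70.0, critical=90.0),
}


def evaluate(sample: dict) -> dict:
    """Return the metrics that crossed a threshold, mapped to a severity."""
    alerts = {}
    for name, value in sample.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue  # no threshold configured for this metric
        if value >= limit.critical:
            alerts[name] = "critical"
        elif value >= limit.warning:
            alerts[name] = "warning"
    return alerts


if __name__ == "__main__":
    # Example readings; in practice these come from the monitoring agent.
    sample = {"cpu_percent": 82.5, "memory_percent": 61.0, "network_percent": 93.0}
    for metric, severity in evaluate(sample).items():
        print(f"{severity}: {metric} = {sample[metric]}")
```

Using two levels per metric leaves room to run the health check at the warning level, before the critical level forces corrective action.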
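
For practices 3 and 4, the way traffic is steered away from a failed server can be illustrated with a toy round-robin selector. The `RoundRobinBalancer` class and the server names are hypothetical; a real hardware or software load balancer runs its own health checks and offers many more policies.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._index = 0

    def mark_down(self, server):
        # Called when a health check fails for this server.
        self._healthy.discard(server)

    def mark_up(self, server):
        # Called when a previously failed server passes its health check.
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once, starting where the last call stopped.
        for _ in range(len(self._servers)):
            server = self._servers[self._index]
            self._index = (self._index + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app1", "app2", "app3"])
    balancer.mark_down("app2")  # simulate a failed cluster member
    print([balancer.next_server() for _ in range(4)])
```

Callers only ever ask for the next server, so they never need to know that a cluster member has dropped out.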
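
The failover test mentioned in practice 5 could be timed with a small helper like the one below. It is a sketch under stated assumptions: `measure_failover` and its parameters are invented for this example, and `probe` stands for any callable that returns True once the alternate server answers, for instance an HTTP health request.

```python
import time
from typing import Callable


def measure_failover(probe: Callable[[], bool],
                     interval: float = 1.0,
                     timeout: float = 300.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll `probe` after a deliberate failover and record the outage.

    Returns the number of failed probes and the seconds elapsed until the
    first success, or until `timeout` is reached without one.
    """
    start = clock()
    failed = 0
    while True:
        if probe():
            return {"recovered": True, "failed_requests": failed,
                    "seconds": clock() - start}
        failed += 1
        if clock() - start >= timeout:
            return {"recovered": False, "failed_requests": failed,
                    "seconds": clock() - start}
        sleep(interval)


if __name__ == "__main__":
    # Simulated probe: the standby server answers on the fourth attempt.
    responses = iter([False, False, False, True])
    print(measure_failover(lambda: next(responses), interval=0.01))
```

Recording these numbers from each rehearsal gives the recovery plan a measured failover time rather than a guess.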
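
To make the availability standard in practice 7 concrete, the arithmetic behind an availability target can be written out as follows. The 99.9% target, the 30-day period, and the function names are assumptions chosen for illustration, not values taken from any particular agreement.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Availability over a period, as a percentage of the total time."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(total_minutes: float, target_percent: float) -> float:
    """Downtime budget that still meets the availability target."""
    return total_minutes * (100.0 - target_percent) / 100.0


if __name__ == "__main__":
    period = 30 * 24 * 60   # minutes in a 30-day month
    target = 99.9           # hypothetical SLA target, in percent
    downtime = 90           # hypothetical measured downtime, in minutes

    achieved = availability_percent(period, downtime)
    budget = allowed_downtime_minutes(period, target)
    print(f"downtime budget: {budget:.1f} min, achieved: {achieved:.3f}%")
    print("SLA met" if achieved >= target else "SLA missed")
```

With these example figures the budget is about 43 minutes per month, so 90 minutes of downtime misses the target. A measured failover time from the recovery plan can be compared against the same budget.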