Server Monitoring Best Practices

As a business, you may be running many on-site or web-based applications and services: security, data handling, transaction support, load balancing, or the management of large distributed systems. All of these depend on the condition of your servers, so it's vital for you to continuously monitor their health and performance.

Here are some guidelines designed to help you get to grips with server monitoring and the implications that it carries.

Understand Server Monitoring Basics

The basic elements of server monitoring are events, thresholds, notifications, and health.

1. Events

Events are triggered on a system when a condition set by a given program occurs. An example would be when a service starts, or fails to start.

2. Thresholds

A threshold is the point on a scale that must be reached to trigger a response to an event. The response might be an alert, a notification, or a script being run.

Thresholds can be set by an application or by a user.
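As a rough illustration, here is a minimal Python sketch of a threshold triggering a response; the metric name, sample value, and 90% threshold are all invented for the example:

```python
# A minimal sketch of threshold-based alerting. The metric source and
# the response action are placeholders for whatever your tooling uses.

def check_threshold(metric_name, value, threshold, respond):
    """Trigger a response when a monitored value crosses its threshold."""
    if value >= threshold:
        respond(f"{metric_name} at {value} crossed threshold {threshold}")

# Hypothetical usage: alert when CPU usage passes 90%.
check_threshold("cpu_percent", 93.5, 90.0, respond=print)
```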

3. Notifications

Notifications are the methods of informing an IT administrator that something (an event, or a response to one) has occurred.

Notifications can take many forms (a small dispatch sketch follows this list), such as:

  • Alerts in an application
  • E-mail messages
  • Instant Messenger messages
  • Dialog boxes on an IT administrator’s screen
  • Pager text messages
  • Taskbar pop-ups
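To show how one alert might fan out over several of these channels, here is a hedged Python sketch; the channel names are placeholders, and print() stands in for real email, IM, or pager delivery:

```python
# A sketch of multi-channel notification: each channel is a callable,
# so email, IM, or pager senders can be registered alike. Real
# integrations (smtplib, chat APIs, paging services) are omitted.

channels = {
    "email": lambda msg: print(f"[email] {msg}"),
    "im":    lambda msg: print(f"[im] {msg}"),
    "pager": lambda msg: print(f"[pager] {msg}"),
}

def notify(message, via=("email", "im")):
    for name in via:
        channels[name](message)

notify("Web server latency above threshold")
```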

4. Health

Health describes the set of related measurements that together define the state of whatever is being monitored.

For instance, the overall health of a file server might be defined by read/write disk access, CPU usage, network performance, and disk fragmentation.
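As one hedged way of capturing such a health snapshot, the Python sketch below uses the third-party psutil package (pip install psutil); disk fragmentation is not exposed by psutil, so the sketch covers the other measures:

```python
# A rough health snapshot for a file server using psutil.
import psutil

def health_snapshot():
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),  # CPU usage over 1s
        "disk_used_percent": psutil.disk_usage("/").percent,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
        "net_bytes_recv": psutil.net_io_counters().bytes_recv,
    }

print(health_snapshot())
```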

Set Clear Objectives

Decide what it is you need to monitor. Identify those events most relevant to detecting potential issues that could adversely affect operations or security.

A checklist might include (a probe sketch for the first items follows the list):

  1. Uptime and performance statistics of your web servers
  2. Web applications supported by your web servers
  3. Performance and user experience of your web pages, as supported by your web server
  4. End-user connections to the server
  5. Measurements of load, traffic, and utilisation
  6. A log of HTTP and HTTPS sessions and transactions
  7. The condition of your server hardware
  8. Virtual Machines (VMs) and host machines running the web server
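To make the first couple of checklist items concrete, here is a hedged sketch that probes a web server's availability and response time with the third-party requests library; the URL is a placeholder:

```python
# Probe a web server's uptime and latency.
import requests

def probe(url, timeout=5):
    try:
        response = requests.get(url, timeout=timeout)
        return {"up": response.ok, "status": response.status_code,
                "latency_s": response.elapsed.total_seconds()}
    except requests.RequestException as exc:
        return {"up": False, "error": str(exc)}

print(probe("https://example.com/"))
```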

Fit Solid Foundations

It’s safe to say that most IT administrators appreciate useful data, clearly presented, so that they can take in a lot of information at a glance. This means that you should take steps to ensure that your monitoring output is easy to read and well presented.

A high-level “dashboard” can serve as a starting point. This should have controls for drilling down into more detail. Navigation around the monitoring tool and access to troubleshooting tools should be as transparent as possible.

It’s also necessary to:

• Identify the top variables to monitor, and make them the defaults. Prioritise them in the user interface (UI).

• Provide preconfigured monitoring views that match situations encountered on a day-to-day basis.

• Have a UI that also allows for easy customization.

• Let users/IT managers choose what they want to monitor at any given time, adjust the placement of their tools, and decide the format in which they view the data.

• The UI text should be consistent, clear, concise, and professional. From the outset, it should state clearly what is being monitored – and what isn’t.

Build, to Scale

Organizations of different sizes naturally have different monitoring needs. Small Organization IT administrators often look to fix problems after they’ve been identified. Monitoring is generally part of the troubleshooting process. Monitoring applications should intelligently identify problems, and notify the users via e-mail and other means. Keep the monitoring UI simple.

Medium Organization IT administrators monitor to isolate big and obvious problems. A monitoring system should provide an overview of the system, and explanations to help with the troubleshooting process. Preconfigured views, and the automation of common tasks performed on receiving negative monitoring information (e.g., ping, traceroute), will speed response. Again, keep the monitoring UI simple.

Large Organization/Enterprise IT administrators require more detailed and specific information. Users may be dedicated exclusively to monitoring, and will appreciate dense data, with mechanisms for collaborating. Long-term ease of use will take precedence over ease of learning.

Set Up Red Flags

You should provide a set of “normal” or “recommended” values, as a baseline. This will give context to the information being monitored. The system may give the range of normal values itself, or provide tools for users to calculate their own.

Within the application, make sure that data representing normal performance can be captured; this can be used later as a baseline for troubleshooting. In any case, users should be able to tell at a glance when a value is out of range and is therefore a possible cause for concern. Your monitoring software can assist in this by setting a standard alert scale across the application.
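A minimal Python sketch of this baseline idea follows; the sample data is invented, and the mean plus-or-minus three standard deviations rule is just one common choice:

```python
# Capture "normal" samples, then flag values outside the normal band.
from statistics import mean, stdev

normal_samples = [41.0, 39.5, 42.3, 40.8, 38.9]  # captured during normal load
baseline_mean, baseline_sd = mean(normal_samples), stdev(normal_samples)

def out_of_range(value, sigmas=3):
    return abs(value - baseline_mean) > sigmas * baseline_sd

print(out_of_range(40.1))  # False: within the normal band
print(out_of_range(88.0))  # True: a possible cause for concern
```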

In Western cultures, common colors for system alerts are:

  • Red = Severe or Critical
  • Yellow = Warning or Informational
  • Green = Good or OK

For accessibility, colors should be combined with icons for users who are sight-impaired. Words that can be dictated by a screen reader are also appropriate. Limit the use of custom icons in alerts, though, as users may resent having to learn too many new ones, and they may conflict with icons in other applications. Common, recognizable icons are fine, as there’s nothing new to learn.

Explain the Language

Don’t assume that your users will understand all the information your monitoring software provides. Help them interpret the data, by providing explanations, in the user interface.

  • Use roll-overs to display specific data points, such as a critical spike in a chart
  • Explain onscreen how the monitoring view is filtered. For example, some variables or events might be hidden (but not necessarily trouble-free). The filter mechanism, an explanation of the filter, and the data itself should be positioned close together
  • Give easy access to any variables that are excluded in a view
  • State when the last sample of data was captured
  • Reference the data sources
  • There should be links to table, column, and row headings, with pop-up explanations of the variables, abbreviations, and acronyms
  • Provide links beside the tables themselves, with pop-up explanations of the entire table

Let Them Know

Alerts should be sent out to indicate there is a problem with the system. Notifications should be informative enough to give IT administrators a starting point for addressing the problem. Information that helps the user take action should be displayed near the monitoring information. Probable causes and possible solutions should be prominently displayed.

Likewise, the tools needed for solving common problems should be easily accessible at the notification point.

You should log 24 to 48 hours of data. That way, when a problem arises, users will have enough information available to troubleshoot. Note that some applications need longer periods of monitoring, and some shorter. The log length will be determined by the scope of your day-to-day operations.
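As a sketch of this retention window, the following Python keeps 48 hours of entries in memory using a deque; timestamps are Unix seconds, and the structure is illustrative rather than a substitute for a real log store:

```python
# Rolling 48-hour log retention: prune entries older than the window.
import time
from collections import deque

WINDOW_SECONDS = 48 * 3600
log = deque()

def record(event):
    now = time.time()
    log.append((now, event))
    while log and log[0][0] < now - WINDOW_SECONDS:
        log.popleft()  # drop samples that have aged out of the window

record("disk_used_percent=71")
```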

Provide multiple channels for notification (email, instant messages, pager text, etc.).

Users should be able (and encouraged) to update, document, and share the information needed to start troubleshooting.

Keep Them Informed

Users often need to use monitoring data for further analysis, or for reports. The monitoring application itself should assist, with built-in reporting tools. Performance statistics and an overall summary should be generated at least once a week. Analysis of critical or noteworthy events should be available, on a daily basis.

Allow users to capture and save monitoring data – e.g., the “normal” performance figures used as a baseline for troubleshooting. Users should be able to easily specify what they want recorded (variables, format, duration, etc.). They should also be allowed to log the information they’re monitoring.

There should be a central repository, for all logs from different areas of monitoring. A single UI can then be used, to analyze the data. Export tools (to formats such as .xls, .html, .txt, .csv) should be provided. This will help to facilitate collaboration in reporting and troubleshooting.
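As a minimal example of such an export, this sketch writes captured samples to CSV with the standard library; the sample rows are invented:

```python
# Export monitoring samples to CSV for collaboration and reporting.
import csv

samples = [("2024-01-01T10:00", "cpu_percent", 41.0),
           ("2024-01-01T10:01", "cpu_percent", 39.5)]

with open("monitoring_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "variable", "value"])  # header row
    writer.writerows(samples)
```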

Take Appropriate Measures

Different graph types suit different kinds of information, so choose one appropriate to what you are analyzing.

Line graphs are good for displaying one or more variables on a scale, such as time. Ranges, medians, and means can all be shown simultaneously.

Table format makes it easy for users to see precise numbers. Table cells can also contain graphical elements associated with numbers, or symbols to indicate state. The most important information should appear first, or be highlighted so that it can be taken in at a glance.

Histograms or bar graphs allow values at a single point in time to be compared easily. Ranges, medians, and means can all be displayed simultaneously.

Some recommendations (a small plotting sketch follows the list):

  • When using a line graph, show as few variables as possible. Five is a safe maximum. This makes the graph easier to read
  • Avoid using stacked bar graphs. It’s better to use a histogram, and put the values in clusters along the same baseline. Alternatively, break them up into separate graphs
  • When using a graph to show percentage data, always use a pie chart
  • Consider providing a details pane; clicking a graph will display details about the graph in the pane
  • Avoid trying to convey too many messages in one graph
  • Never use a graph to display a single data point (a single number)
  • Avoid the use of 3D in your charts; it can be distracting
  • Allow users to easily flip between different views of the same data
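Here is a minimal sketch of the line-graph advice above, assuming the third-party matplotlib package is installed; the data is invented, and only two variables share one scale, with no 3D:

```python
# A simple line graph: few variables, one scale, a legend, no 3D.
import matplotlib.pyplot as plt

minutes = list(range(10))
cpu = [35, 38, 40, 42, 41, 39, 71, 90, 88, 60]
disk = [20, 21, 20, 22, 23, 22, 25, 30, 29, 24]

plt.plot(minutes, cpu, label="CPU %")
plt.plot(minutes, disk, label="Disk I/O %")
plt.xlabel("Minutes")
plt.ylabel("Utilisation (%)")
plt.legend()
plt.show()
```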

Push the Relevant Facts

Displaying a lot of stuff onscreen makes it harder for administrators to spot the information that is of most value – like critical error messages.

Draw attention to what needs attention, most:

  • by placing important items prominently
  • by putting more important information before the less important
  • by using visual signposts, such as text or an icon, to indicate important information

Preconfigured monitoring views will reduce the emphasis on users configuring the system. Allow users to customise the information and highlight what they think is important, so it can be elevated in the UI. Group similar events – and consider having a global overview of the system, visible at all times.

Hide the Redundant

If it hasn’t gone critical, or isn’t affecting anything, they don’t need to see it. At least, not immediately. If a failure recurs, don’t keep showing the same event over and over. Try to group similar events into one.
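One simple way to collapse repeats is to count events keyed on their source and message rather than their timestamp, as in this hedged sketch with invented events:

```python
# Group repeated events into one entry with a count.
from collections import Counter

events = [("web01", "service failed to start"),
          ("web01", "service failed to start"),
          ("db02", "disk 90% full")]

for (host, message), count in Counter(events).items():
    print(f"{host}: {message} (x{count})")
```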

Allow your users to tag certain events as ones they don’t want to view. Let them set thresholds that match their own monitoring criteria. This allows them to create environment-specific standards, and reduces false alarms. Use filters and views, to give users granular control of what they are monitoring.

Provide the ability to zoom in for more detailed information, or zoom out for aggregated data. Allow users to hide unimportant events, but still have them accessible.

Be Prepared, for the Worst

As well as probable causes, the application should suggest possible solutions, for any problems that occur. Administrators will likely have preferred methods of troubleshooting. But, in diagnostic sciences, it helps to get a second opinion. It’s essential to identify events most indicative of potential operational or security issues. Then, automate the creation of alerts on those events, to notify the appropriate personnel.

Being prepared also means that all data should be backed up and stored off the premises as well as on the network. This protects against the obvious, such as hardware failure or malware attacks, but also against complete disaster, such as a fire at the premises.

And the Best, that Can Happen

With proper monitoring measures in place, you greatly reduce the risk of losses due to poor server performance. This has a corresponding positive effect on your business – especially online services and transactions.

A well-tuned monitoring system will help facilitate the identification of potential issues, and accelerate the process of fixing unexpected problems before they can affect your users.


Troubleshooting Network & Computer Performance Problems

Problem solving is an inevitable part of any IT technician’s job. From time to time you will encounter a computer or network problem that will, simply put, just leave you stumped. When this happens it can be an extremely nerve-wracking experience, and your first instinct might be to panic.

Don’t do this. You need to believe that you can solve the problem. Undoubtedly you have solved computer performance or network troubles in the past, either on your job or during your training and education. So, if you come across a humdinger that, at first glance at least, you just can’t seem to see a way out of, instead of panicking, try to focus and get into the ‘zone’. Visualize the biggest problem that you’ve managed to solve in the past, and remember the triumph and elation that you felt when you finally overcame it. Tell yourself, “I will beat this computer,” get in the zone, and prepare for battle.

Top 3 Computer & Network Issues You’re Likely To Experience

Network staff and IT security personnel are forever tasked with identifying and solving all manner of difficulties, especially on large networks. Thankfully there are, generally speaking, three main categories that the causes of these issues will fall into. These are: Performance Degradation; Host Identification; and Security.

Let’s take a closer look at each of these categories.

1. Performance Degradation

Performance degradation is when speed and data integrity start to lapse, normally due to poor-quality transmissions. All networks, no matter their size, are susceptible to performance issues; however, the larger the network, the more problems there are likely to be. This is due in the main to the greater distances involved and the additional equipment, endpoints, and midpoints.

Furthermore, networks that aren’t properly equipped with an adequate number of switches, routers, domain controllers, etc. will inevitably put the whole system under severe strain, and performance will thereby suffer.

So, having an adequate amount of quality hardware is of course the start of the mission to reduce the risk of any problems that you may encounter. But hardware alone is not enough without proper configuration – so you need to get this right too.

2. Host Identification

Proper configuration is also key to maintaining proper host identification. Computer networking hardware cannot deliver all of the messages to the right places without correct addressing. Manual addressing can often be configured for small networks, but this is somewhat impractical in larger organizations. Domain controllers and DHCP servers, with their addressing protocols and software, are absolutely essential when creating and maintaining a large, scalable network.

3. Security

Host identification and performance issues will not make any difference to a network that finds itself breached by hackers. And so, security is also of utmost importance.

Network security means preventing unauthorized users from infiltrating a system and stealing sensitive information, maintaining network integrity, and protecting the network against denial-of-service attacks. Again, these issues all magnify in line with the size of the network, simply because there are more vulnerable points at which hackers may try to gain access. On top of this, more users mean more passwords, more hardware, and more potential entry points for hackers.

Your defenses against these types of threats will of course be firewalls, proxies, antivirus software, network analysis software, stringent password policies, and procedures that adequately compartmentalize large networks within internal boundaries – plenty of areas, then, which may encounter problems.

Troubleshooting the Problems

Ok, so those are the potential difficulties that you are most likely to encounter. Identifying the source of any given problem out of all of these things can of course cause a lot of stress for the practitioner tasked with solving it. So, once you’ve got into the ‘zone’, follow these next 5 simple problem solving strategies and you’ll get to the bottom of the snag in no time. Just believe.

1. Collect Every Piece of Information You Can

This means writing down precisely what is wrong with the computer or network. Just doing this very simple act starts to trigger your brain into searching for potential solutions. Draw a diagram to sketch out the problem as well; it will help you visualize the task at hand.

Next you need to ask around the office to find out if anything has changed recently – any new hardware, for instance, or any new programs that have been added. If it turns out that something has, try the simple step of reversing the engines first: revert everything back to how it was before and see if that fixes things.

One of the best troubleshooting skills that you can have is pattern recognition. So, look for patterns in scripts, check for anything out of the ordinary. Is there a spelling mistake somewhere? A file date that is newer than all the rest?
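As one small, hedged example of that last check, this Python sketch lists the most recently modified files in a directory so an unusually new one stands out; the path is a placeholder:

```python
# List the newest files in a directory by modification time.
import os

def newest_files(path, top=5):
    entries = [(e.stat().st_mtime, e.name)
               for e in os.scandir(path) if e.is_file()]
    return sorted(entries, reverse=True)[:top]

for mtime, name in newest_files("."):
    print(mtime, name)
```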

2. Narrow the Search Area

First, you need to figure out whether the problem is to do with hardware or software. This will cut your search down by half immediately.

If it’s software, then try to work out the scale of the problem – which programs are still running and which are not? Try uninstalling and then reinstalling the suspected program.

If it’s hardware, then try swapping the suspect component in question with something similar from a working machine.

3. Develop a Theory

Make a detailed list of all the possible causes of the problem at hand, and then ask yourself very seriously, using all of your experience, which one it is most likely to be. Trust your instincts, and also turn to the internet. The likelihood is that someone somewhere will have encountered just this very thing before, and may well have posted about it in a blog or forum. If you have an error number, then that will improve your chances of finding a reference. From here, you are in the perfect position to start the process of trial and error.

4. Test Your Theories Methodically

The best troubleshooters test one factor at a time. This actually takes quite a lot of discipline, but it is essential in order to be thorough. Write down every single change that you make, and keep listing potential causes as they occur to you, as well as possible solutions, and keep drawing diagrams to help you visualize the task.

5. Ask For Help!

Seriously, there is no shame in it, so don’t start getting precious. Try to figure out who the best person would be to solve the problem and get in touch with them. Send out emails, post to forums, call an expert or contact the manufacturer. Do whatever it takes. It’s all part of the troubleshooting process, and you need to know when you require assistance.



What Are the Differences Between Routers and Switches?

In this article I will talk about the differences between two of the most common networking devices: routers and switches. You may already be somewhat familiar with these devices, even if you are not working in an IT department. Home internet connections have become so common these days that we are practically addicted to them without even realizing it. Because technology has evolved so fast, newer, faster, and cheaper networking devices have been developed to fulfill our needs. Many of you may own a router to connect to the Internet. If you are an IT professional, you probably know how network devices work, but for a casual user, these things may sound a bit like science fiction. If you’ve ever been curious about how routers and switches work, this is a perfect opportunity to learn about their role and functionality.

Protocol Stacks and Layers

Two main protocol stacks are used in today’s communications: OSI and TCP/IP. These designs define the rules that manage data communications inside computer networks. You should know that these stacks are divided into several layers. Each layer is independent and provides an important and unique role in communications. For more information about this, check out the following link to download “Networking Short Review“.

After getting a general idea of protocol stacks, you can identify at what layer each networking device works. Based on a defined set of rules, both switches and routers make decisions on how and where data should be forwarded. If you don’t know by now, routers are also called layer 3 devices, while switches are layer 2 devices. But how did we get to this idea, and what is defined at each layer? Well, the network layer (as it is named in the OSI stack; the Internet layer in the TCP/IP design) is where routers make decisions based on the information gathered from the network. The IP (Internet Protocol) was developed as the central piece in data transmissions. There is much to talk about regarding this layer, but it’s not the main topic of this article. For those interested, read more here.

IP Addresses

An IP address is a 32-bit value used to identify a certain machine. Whenever data is sent between networking devices, it must be segmented into smaller pieces for better manipulation and transmission. At the network layer, these pieces are called packets. Each packet carries all the elements needed to communicate between devices. Layer three is responsible for the logical transmission between two devices. It’s called a logical transmission because even if the devices are not physically connected, at layer three the transmission is seen as a client-server communication. Source and destination IP addresses are used to identify each machine involved in this operation, and based on the information gathered from them, routers make forwarding decisions. All routing information is stored in routing tables.
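To make the 32-bit point concrete, here is a quick illustration using Python’s standard ipaddress module:

```python
# An IPv4 address is just 32 bits.
import ipaddress

addr = ipaddress.ip_address("192.168.1.10")
print(int(addr))          # the address as a single 32-bit integer
print(addr.packed.hex())  # the same four bytes, as carried in a packet header
```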

What Are Switches?

Switches are layer 2 devices because they make decisions based on the physical address (also known as the MAC address – Media Access Control). In the OSI stack, this layer is also known as the Data Link Layer. Each physical device uses the MAC address to uniquely identify itself in a computer network (two devices with the same MAC address cannot exist). Switches communicate between each other using physical addresses. To exchange information switches also use broadcasting and ARP mechanisms. The PDU, or the protocol data unit, defined at the Data Link Layer is known as the frame. A frame contains all the information involved in a layer 2 transmission. A frame is formed by adding the header (that contains the source and destination MAC address) and the trailer (error checking and other information) to a packet. This mechanism is also known as encapsulation.

Check out the Ethernet frame article on Wikipedia to better understand how Ethernet frames look. Switches store their layer 2 information in MAC address tables, which hold bindings between MAC addresses and switch ports. The whole concept is pretty simple: when a frame is received, the switch records the frame’s source MAC address and the port it arrived on as a new entry in the table (if it isn’t there already), then checks the frame’s destination MAC address. If the destination address is found in the table, the frame is forwarded through the corresponding interface directly to the destination machine. If it is not found, the switch floods the frame out all its interfaces except the one it was received from, much as it does for broadcast frames. Remember that switches will forward broadcasts while routers will block them.
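The learn-and-flood behavior can be modeled in a few lines of Python; this is a toy sketch with invented port numbers and shortened MAC addresses, not how real switch hardware is implemented:

```python
# A toy model of switch MAC learning, forwarding, and flooding.

mac_table = {}  # MAC address -> switch port

def handle_frame(src_mac, dst_mac, in_port, all_ports):
    mac_table[src_mac] = in_port                   # learn from the source
    if dst_mac in mac_table:
        return [mac_table[dst_mac]]                # forward out the known port
    return [p for p in all_ports if p != in_port]  # flood unknown destination

print(handle_frame("aa:aa", "bb:bb", in_port=1, all_ports=[1, 2, 3]))  # floods
print(handle_frame("bb:bb", "aa:aa", in_port=2, all_ports=[1, 2, 3]))  # [1]
```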

Differences Between Switches and Routers

You may already know that routers define broadcast domains while switches define collision domains. A broadcast domain is defined by a single physical interface on a router. We say that switches segment collision domains because, unlike hubs, each port defines a separate communication channel. In these channels collisions do not occur, and transmission is made in full duplex mode (sending and receiving of data can happen at the same time).

Another difference between these network devices is that routers usually have a lower port density than switches. Then why use routers when you have a higher number of ports available on switches? Because each router port connects a different network, transmission between routers is made at the highest speed available on the physical port, whereas on a switch the available speed is divided between all the transmitting ports. So even if you have fewer ports available on a router, those ports will forward data at the highest available speed. This is why routers are used when sending data between two distant networks.

Switches are used to create LANs while routers are used to interconnect LANs. A group of interconnected LANs is known as a WAN (Wide Area Network).

Routers and switches can use different ports. Besides the normal FastEthernet, fiber, or serial ports, they can also be equipped with console or aux ports and other special interfaces. Some advanced networking devices are modular, meaning their configuration can be changed even while the device is turned on, in order to reduce downtime. These modular devices can also be redundant, meaning that they have two or more components with the same functionality. Such network devices are expensive and are usually used by large enterprises or ISPs. Remember that the cost of a network device can vary from a few dollars to many thousands.

Unlike switches, routers can also support additional services like DHCP, NAT or packet filtering. These services can be activated using the router’s GUI or the console line. Network devices use different technologies to support their functionality. For example, switches use VLANs, STP or VTP technologies and routers use dynamic routing protocols, VLSM or CIDR.

I hope all the important aspects of these two network devices have been pointed out. If you think that there is more to be added here don’t hesitate to leave a comment.


Planning Your Network for AWS Cloud Migration


A core component in your company’s move to the Amazon Web Services cloud is the design of Amazon Virtual Private Cloud (VPC) network resources. Although AWS provides a VPC network design wizard, issues such as IP address range selection, subnet creation, route table configuration, and connectivity options must be carefully evaluated. Thinking through such issues ensures a smoother transition from in-house infrastructure while reducing the risk of time-consuming backtracking of your cloud architecture.

Single versus Multiple AWS Accounts

Before designing your VPCs, it is necessary to consider how many AWS accounts you will deploy into.

In some situations a single AWS account may be sufficient – for example, when using AWS for disaster recovery or as a development sandbox. In many other situations, however, multiple AWS accounts should be considered.

For instance, you may wish to separate development and testing environments into one account and place a production environment into a second account. Additional AWS accounts may be created to reflect the organizational structure of larger enterprises. Multiple accounts can also be utilized to separate workloads based on security requirements, such as the isolation of PCI-compliant workloads from those that are less sensitive.


Single versus Multiple VPCs

Choosing whether your AWS infrastructure utilizes a single VPC or several VPCs is not a straightforward decision.

Using multiple VPCs provides for better isolation between the systems, contains the scope of security audits, and limits the “blast radius” in case of an operator error or security breach. However, multiple VPCs increase the complexity of network topology, routing, and connectivity between the VPCs and on-premises data centers.

Using a single VPC simplifies the networking and connectivity but makes it harder to isolate workloads from one another. With a single VPC, isolation of workloads, user accounts, and network access leans heavily on the use of AWS Security Groups (SGs) and Network Access Control Lists (NACLs). The likelihood of running into AWS limits related to SGs and NACLs is higher in this scenario.

If you use multiple VPCs, consider isolating them from each other. A dedicated, isolated VPC is also appropriate for shared infrastructure tools such as authentication stores, management tools, or common entry points (e.g., bastion servers).

Single vs Multiple Region Deployments

AWS regions are by design isolated from each other, which means that virtual networks are also inherently separated. For most uses, a single-region network configuration is sufficient. However, circumstances that require low latency, with active processing workloads in globally shared configurations, bring additional considerations.

When evaluating interconnection between regions, scrutinize whether you can deploy to multiple regions within an isolated architecture or if a content delivery network (CDN) solution such as CloudFront meets your needs.

Subnetting

There are a number of factors with respect to VPC subnetting that must be taken into account:

  • High availability of AWS Managed Services, such as RDS, is achieved by using multiple subnets in multiple AWS Availability Zones
  • AWS subnets cannot be resized
  • AWS subnets can either share a route table or have independent route tables assigned

If your subnet IP space may not meet your future needs, consider adopting AWS’s newly added IPv6 support from the start.
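Because subnets cannot be resized, it pays to plan the CIDR split up front. Here is a hedged sketch that carves a VPC range into equal subnets, one per Availability Zone, using Python’s standard ipaddress module; the CIDR and zone names are examples:

```python
# Carve a /16 VPC CIDR into /20 subnets, one per Availability Zone.
import ipaddress

vpc_cidr = ipaddress.ip_network("10.0.0.0/16")
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example zones

subnets = vpc_cidr.subnets(new_prefix=20)  # yields 16 /20 subnets
for az, subnet in zip(azs, subnets):
    print(az, subnet)
```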

Network Connectivity Options

VPC Peering

Peering allows communication between VPCs using private IPv4 or public IPv6 addressing over a virtual connection. This feature enables cross-account connections within a single AWS region. It facilitates resource sharing between two or more VPCs, although it does not allow transitive peering relationships.
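For illustration, a peering connection can be requested with boto3, AWS’s Python SDK; in this sketch the VPC IDs are placeholders, credentials and region are assumed to be configured in your environment, and the accepter side must still accept the request:

```python
# Request a VPC peering connection between two VPCs (placeholder IDs).
import boto3

ec2 = boto3.client("ec2")
resp = ec2.create_vpc_peering_connection(
    VpcId="vpc-11111111",      # requester VPC (placeholder)
    PeerVpcId="vpc-22222222",  # accepter VPC (placeholder)
)
print(resp["VpcPeeringConnection"]["VpcPeeringConnectionId"])
```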

VPN

AWS provides several flavors of VPN connectivity depending on your needs. These are used to connect your VPCs to remote networks such as your corporate intranet:

  • AWS managed hardware VPN – A high-availability, redundant IPsec connection compatible with major vendors’ routers
  • Customer managed software VPN – Consists of an EC2 instance within a VPC running a software VPN appliance obtained from a third-party
  • AWS VPN CloudHub – For connection to multiple remote networks

AWS Direct Connect

AWS Direct Connect provides a dedicated physical connection for high-performance, high-reliability connectivity between AWS and on-premises data centers. Often, VPNs are configured over Direct Connect connections.

Conclusion

Choosing the correct VPC architecture for your cloud migration is a critical first step in moving to the cloud. “Re-dos” are unfortunately common when poor system partitioning, CIDR sizing, or VPC options lead to hard-to-manage, insecure, or inefficient cloud infrastructure.
