The methodology for solving corporate network problems: an introduction to best diagnostic practices
Diagnosing and troubleshooting a problem in a corporate network can be a
daunting task. With the potential for multiple branches, hundreds or even
thousands of hosts, and dozens of routers, switches and servers, all from
different vendors or running different firmware, plus good old-fashioned human
error, knowing where to start is the key to implementing a quick fix. There is
a well-established methodology for diagnosing a significant network problem,
and following it will help administrators maintain an organized approach to
troubleshooting.
Know where to start
Previous experience with the network in question can help administrators find
and resolve the problem. If most of the network problems that occur during NOC
as a Service come from specific errors with a known fix, that fix is a sensible
first choice when troubleshooting. Even without familiarity with the network,
it is possible to follow a procedure that will help keep everyone on the same
page.
The first and most obvious step in resolving network errors is to define the
problem. If a user is unable to connect to a file server to access their work,
that is the problem definition. This first step usually takes care of itself:
it is rare to be called in for troubleshooting without a clear problem already
occurring!
Then collect information from users or affected systems. In the example above
of a user having problems connecting to a file server, some basic questions are
worth asking. When was the last time the user was able to access the server?
Has anything changed since then? Do other users have the same problem? If the
problem is widespread, there is likely to be a problem upstream in the network.
If it is isolated to a single host, there is probably no bigger network problem
to solve. Gathering information is one of the most important and most often
overlooked steps in troubleshooting a large network. The data and user reports
collected here can be used to guide administrators through the troubleshooting
process.
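The scoping questions above can be turned into a rough triage rule. The sketch
below is illustrative only: the report format (a host name paired with the
device it connects through), the function name `triage_scope` and the returned
messages are all assumptions made for this example, not part of any tool.

```python
from collections import defaultdict


def triage_scope(reports):
    """Guess whether a fault is isolated to one host or sits upstream.

    `reports` is a list of (host, upstream_device) pairs for hosts that
    reported the problem. Both fields are hypothetical labels.
    """
    hosts_by_device = defaultdict(set)
    for host, device in reports:
        hosts_by_device[device].add(host)

    affected_hosts = {host for host, _ in reports}
    if not affected_hosts:
        return "no reports"
    if len(affected_hosts) == 1:
        return "isolated: start with the host itself"

    # If every affected host sits behind the same device, suspect that device.
    device, hosts = max(hosts_by_device.items(), key=lambda item: len(item[1]))
    if len(hosts) == len(affected_hosts):
        return f"shared upstream: start with {device}"
    return "widespread: look further upstream"
```

A single report points at the host, several reports behind one switch point at
that switch, and reports spread across devices suggest looking further
upstream. Real networks will need a richer picture than this.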
Data collection with ping and traceroute
It is important to collect data from the affected section of the network. The
ping and traceroute tools provide much more information than their simple
functions imply. A large amount of data can be collected for further analysis
using only these two commands.
Using another example, let's assume that some users in one part of an office
are unable to connect to the network. The ping command can be used to gather
information and isolate the problem. This diagnostic tool works at the network
layer, and using it first is part of the divide-and-conquer approach to
troubleshooting. It simply sends a packet from the host machine to the
destination and waits for a reply. Keep in mind that some interfaces may have
access controls, or that a hardware or software firewall may prevent pings from
reaching a host, so this command can be of limited use, especially on incoming
WAN interfaces.
Cisco recommends a
specific four-step procedure when using ping to diagnose network-level IP
errors:
- Ping the loopback address. This
is 127.0.0.1 and is used specifically for diagnostic purposes. A reply
confirms that TCP/IP is working on the host.
- Ping the local host's own IP address. This is
the internal IP address of the affected host, for example 10.0.0.2. If
this ping is successful, the network card works.
- Ping the default gateway. If
successful, the problem is likely to be upstream of the host computer.
- Ping an external IP. If
successful, but the host still cannot connect to the Internet or another
network, there may be a DNS error, an incorrectly configured access
control list, or a problem with a firewall.
Depending on the information gathered about the problem, some of these steps
may be skipped. In the example above, if it is already known that the hosts
within this part of the network can still communicate with each other, it makes
sense to skip steps one and two.
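The four steps above can be scripted. What follows is a minimal sketch, not a
vendor procedure: the function names, the `skip` parameter and the diagnostic
messages are assumptions made for illustration. It calls the operating
system's `ping` command and treats its exit code as success or failure, which
is only an approximation on some platforms.

```python
import platform
import subprocess


def system_ping(address, timeout_s=5):
    """Send one echo request with the OS ping command; True on a reply."""
    # Windows uses -n for the packet count, Unix-like systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def four_step_ping(host_ip, gateway_ip, external_ip, ping=system_ping, skip=0):
    """Run the four pings in order and report the first one that fails.

    `skip` drops that many steps from the start, for cases where earlier
    steps are already known to work.
    """
    steps = [
        ("loopback", "127.0.0.1", "TCP/IP stack on the host"),
        ("local address", host_ip, "network card or its configuration"),
        ("default gateway", gateway_ip, "link between host and gateway"),
        ("external IP", external_ip, "something upstream of the gateway"),
    ]
    for name, address, suspect in steps[skip:]:
        if not ping(address):
            return f"{name} ping failed: check {suspect}"
    return "all pings succeeded: check DNS, access control lists, firewalls"
```

A call such as `four_step_ping("10.0.0.2", "10.0.0.1", "203.0.113.10", skip=2)`
would start at the default gateway. The gateway and external addresses here
are placeholders. Passing a different `ping` function makes it possible to try
the logic without touching the network.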
Another powerful command is traceroute (on Cisco IOS) or tracert (on the
Windows command prompt). Traceroute sends packets toward the destination and
reports each hop along the way. If a router on the way to the destination does
not respond, this is reported to the user executing the command. This can
highlight where a potential problem is occurring and give administrators a good
idea of where to start looking.
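To make the idea concrete, here is a small sketch that scans saved traceroute
output for the first hop where every probe went unanswered. The sample output,
the addresses in it and the function name `first_silent_hop` are illustrative
assumptions, and the parsing only covers the common layout in which an
unanswered probe is printed as `*`.

```python
import re

# A hop line starts with the hop number, followed by the probe results.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")

# Made-up sample in the style of Unix traceroute output.
SAMPLE = """traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  10.0.0.1 (10.0.0.1)  0.412 ms  0.398 ms  0.385 ms
 2  192.0.2.1 (192.0.2.1)  1.204 ms  1.187 ms  1.176 ms
 3  * * *
 4  * * *
"""


def first_silent_hop(trace_output):
    """Return the number of the first hop with no replies, or None."""
    for line in trace_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        hop, rest = int(match.group(1)), match.group(2)
        # Windows tracert appends this phrase to a fully timed-out hop.
        probes = rest.replace("Request timed out.", "").split()
        if probes and all(probe == "*" for probe in probes):
            return hop
    return None
```

For the sample above, the function points at hop 3, so the device after
192.0.2.1 is a reasonable place to start looking. A silent hop is a hint
rather than proof, because some routers are configured not to answer
traceroute probes even though they forward traffic normally.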
Analyze the data and work on a solution
Once you have defined the problem and collected the information, you need to
analyze it in order to resolve the network problem. This can be simple or
complex, depending on the data available. Analyzing the available data is an
important step because it indicates which methodology to start with.
Top-down and bottom-up approaches
These methods are exactly what they sound like: troubleshooting from the top of
the OSI model down, or from the bottom up. Working with these methods can be
effective because, in general, if a layer works, the layers beneath it are
working properly too. This will not always be true, but in most cases it will
be. The downside is that if insufficient information has been collected,
starting at the wrong end of the model can create an unnecessary amount of
extra work. That's why collecting detailed information and analyzing it is so
important! If the problem is at the application layer and troubleshooting
starts from the physical layer, it will take a lot of time and effort to
confirm that the other six layers work before reaching the real problem.
Depending on network access, it can sometimes be difficult, if not impossible,
to verify the higher layers of the OSI model, which must be taken into
consideration before selecting this approach.
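The cost of starting at the wrong end can be shown with a toy model. In this
sketch the function name `layers_checked` and the approach labels are
assumptions made for illustration, and each layer is counted as a single
check, which is far simpler than real troubleshooting.

```python
# The seven OSI layers, listed from the bottom (layer 1) to the top (layer 7).
OSI_LAYERS = [
    "physical", "data link", "network", "transport",
    "session", "presentation", "application",
]


def layers_checked(fault_layer, approach):
    """Return the layers inspected, in order, until the fault is reached."""
    if approach == "bottom-up":
        order = OSI_LAYERS
    elif approach == "top-down":
        order = list(reversed(OSI_LAYERS))
    else:
        raise ValueError("approach must be 'top-down' or 'bottom-up'")
    # Every layer up to and including the faulty one has to be inspected.
    return order[: order.index(fault_layer) + 1]
```

With an application-layer fault, `layers_checked("application", "bottom-up")`
walks through all seven layers, while the top-down order reaches the fault on
the first check.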
Divide and conquer
Often the most effective methodology when information is limited, this approach
starts in the middle of the OSI model, usually at the network layer, and works
outward. This is where the ping and traceroute commands come in. The outcome of
the ping test, positive or negative, guides the troubleshooting up or down the
model. If the ping works correctly, there is probably a problem at a higher
layer. Likewise, if the ping fails, there is a problem at layer 3 or lower.
This can help you quickly find a path to the problem and get administrators
working on a solution.
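The decision described above fits in a few lines. This is only a sketch of the
reasoning: the function name `next_layers` is an assumption, and the order in
which the remaining layers are visited is one reasonable choice among several.

```python
# OSI layer numbers mapped to their names.
OSI_LAYERS = {
    1: "physical", 2: "data link", 3: "network", 4: "transport",
    5: "session", 6: "presentation", 7: "application",
}


def next_layers(ping_ok):
    """Layers to inspect next, in order, after a network-layer ping test."""
    if ping_ok:
        # Layer 3 and below carried the ping, so work upward from layer 4.
        numbers = range(4, 8)
    else:
        # The fault is at layer 3 or lower, so work downward from layer 3.
        numbers = range(3, 0, -1)
    return [OSI_LAYERS[number] for number in numbers]
```

A successful ping rules out three layers with a single test, which is why this
approach tends to pay off when little is known about the fault.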
Improvisation and other methods
A handful of methods
fall into this category and should generally only be used when the information
gathered indicates a very specific problem. Another reason to opt for this
method first would be that the same network problem appears regularly and that
a fix is already known. If there is a high probability that the problem will
be quickly identified and resolved using this method, this will save time and
resources compared to other methods. Knowing a particular network will help
administrators decide if this is the correct way to go about resolving a
problem.
Be flexible
Every network is
different, every problem is different, and administrators need to be able to
adapt to an evolving network environment in order to diagnose and resolve
network problems quickly and effectively. While a consistently followed and
well-documented troubleshooting plan will help keep everyone on the same page
to quickly resolve potential problems, flexibility is needed to speed up
response and correction times. Understanding when not to follow procedures is
essential to maintaining a large network.
Solve recurring problems
All networks will experience a significant number of errors and problems.
However, if the same problem constantly rears its ugly head, the search for a
permanent solution is important. If a router fails regularly, for example, it
may be time to replace it. Redundancy can help mitigate, but not solve,
recurring network problems. Likewise, stop-gap or quick-fix solutions should be
replaced with long-term solutions as early as possible to avoid future
headaches. Getting ahead of a problem is often the best way to solve it.
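Spotting a recurring problem is easier with a simple count of past incidents.
The sketch below assumes a hypothetical incident log made of (device, symptom)
pairs. The function name and the threshold of three occurrences are
illustrative choices, not a recommendation.

```python
from collections import Counter


def recurring_faults(incidents, threshold=3):
    """Return ((device, symptom), count) entries seen at least `threshold` times.

    `incidents` is a list of (device, symptom) tuples taken from a
    hypothetical incident log.
    """
    counts = Counter(incidents)
    # most_common() lists the most frequent pairs first.
    return [(pair, count) for pair, count in counts.most_common()
            if count >= threshold]
```

A router that shows up in this list again and again is a candidate for
replacement rather than another quick fix.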
The human factor and harmful intrusions
People make mistakes. They forget to plug things in, turn them on, set them up
correctly or just don't know how to make something work. The best way to combat
human error is through training and practice. An experienced user will cause
far fewer network nightmares than an untrained one. Always consider the human
factor when analyzing data and finding a solution to a problem.
Likewise, humans
sometimes have unscrupulous goals when they access a network. Always follow
best security practices and keep in mind that sometimes network errors can come
from malicious sources designed to stop the service. These types of attacks
come in many forms and the best way to prevent them is through education and
proactive defense.
Network monitoring software
There is a multitude of networking software that will help monitor, diagnose
and troubleshoot large networks. From open-source tools available for free on
the Internet to full-service business-oriented options, there is a software
solution to help any administrator manage their network. Using these tools can
help speed up the resolution of network problems by taking tasks that consume a
lot of time and human resources and putting them in the hands of the software.