“CF FIBERLINK” Enterprise Switches: Common Fault Classification and Troubleshooting Methods

Switches are among the most widely used devices in network construction. In daily operation, switch failures show up in many different forms and have many different causes. CF FIBERLINK divides switch faults into hardware faults and software faults, then analyzes and eliminates them category by category.


Switch fault classification:

Switch faults can generally be divided into hardware faults and software faults. Hardware faults are mainly failures of the switch's power supply, backplane, modules, ports, and other components, and fall into the following categories.

(1) Power failure:
The power supply is damaged or the fan stops because of unstable external power, an aging power line, static electricity, or a lightning strike, so the switch cannot work normally. Power problems also often damage other parts of the machine. To guard against such faults, first secure the external power supply: run a dedicated power line so the switch has an independent supply, and add a voltage regulator to avoid momentary overvoltage or undervoltage. In general, power can be supplied over two feeds (dual power), but for various reasons it is not possible to provide dual power for every switch. In that case, a UPS (uninterruptible power supply) can be added to keep the switch powered, preferably one that also stabilizes voltage. In addition, the machine room should have professional lightning protection to keep lightning from damaging the switch.

(2) Port failure:
This is the most common hardware failure. Whether the port is a fiber port or a twisted-pair RJ-45 port, be careful when plugging and unplugging connectors. If a fiber plug gets dirty, it can contaminate the fiber port so that it cannot communicate normally. Many people like to plug and unplug connectors while the switch is powered on. In theory this is acceptable, but it also increases the incidence of port failure. Careless handling when moving the switch can physically damage a port. An oversized RJ-45 plug (crystal head) can also easily damage the port when it is inserted. In addition, if part of the twisted pair attached to a port is exposed outdoors and the cable is struck by lightning, the switch port will be damaged, or the strike may cause even less predictable damage. In general, a port failure affects only one or a few ports. So, after ruling out a fault in the computer connected to the port, move the cable to another port to judge whether the original port is damaged. For such a failure, power off the switch and clean the port with an alcohol cotton ball. If the port is indeed damaged, the only remedy is to replace it.
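The port check described above can be written down as a small decision procedure. The following is a minimal Python sketch, not a vendor tool. The function name `diagnose_port` and its three yes/no inputs are hypothetical and stand for observations a technician makes by hand. The "suspect the cable" branch is an inference added for illustration and is not stated in the article.

```python
def diagnose_port(computer_ok: bool,
                  works_on_other_port: bool,
                  works_after_cleaning: bool) -> str:
    """Suggest the next step for a suspect switch port.

    All three arguments are manual observations (hypothetical inputs):
    computer_ok: the attached computer has been ruled out as the fault
    works_on_other_port: the same cable and computer work on another port
    works_after_cleaning: the original port works after a power-off cleaning
    """
    if not computer_ok:
        return "check the attached computer first"
    if not works_on_other_port:
        # Assumption: if the fault follows the cable to another port,
        # the original port is not the likely cause.
        return "suspect the cable or connector, not the port"
    if works_after_cleaning:
        return "port was dirty; cleaning fixed it"
    return "port is damaged; replace it"


print(diagnose_port(True, True, False))  # port is damaged; replace it
```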

(3) Module failure:
A switch is made up of many modules, such as the stacking module, the management module (also known as the control module), and expansion modules. These modules rarely fail, but when one does, the economic loss can be severe. Such failures can occur if a module is inserted or removed carelessly, if the switch is knocked during handling, or if the power supply is unstable. The three kinds of module mentioned above all have external interfaces, so they are relatively easy to identify, and on some the fault can also be read from the indicator lights on the module. For example, a stacking module has a flat trapezoidal port, and some switches use a USB-like interface instead. The management module has a CONSOLE port for connecting to the network management computer. An expansion module that uses fiber has a pair of fiber interfaces. When troubleshooting such faults, first make sure the switch and the module have power, then check that each module is inserted in the correct position, and finally check that the cable connecting the module is good. When connecting to the management module, also check whether the specified connection rate is used, whether parity checking is enabled, and whether data flow control is enabled. When connecting an expansion module, check that the communication mode matches, for example full-duplex versus half-duplex. If the module is confirmed to be faulty, there is only one solution: contact the supplier immediately for a replacement.
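The connection checks for the management and expansion modules amount to comparing the settings at both ends. The sketch below illustrates that comparison in Python under stated assumptions: `ConsoleSettings`, `mismatched_settings`, and `duplex_matches` are hypothetical names, and the sample values (9600 baud, no parity, no flow control) are placeholders only. The real values must come from the switch manual.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConsoleSettings:
    baud_rate: int      # connection rate in bits per second
    parity: str         # "none", "even" or "odd"
    flow_control: str   # "none", "hardware" or "software"


def mismatched_settings(required: ConsoleSettings,
                        actual: ConsoleSettings) -> list[str]:
    """Return the names of the settings where the terminal differs from the switch."""
    return [f.name for f in fields(ConsoleSettings)
            if getattr(required, f.name) != getattr(actual, f.name)]


def duplex_matches(local_mode: str, remote_mode: str) -> bool:
    """True when both ends use the same mode ("full" or "half")."""
    return local_mode == remote_mode


# Example values only (assumptions); take the real ones from the switch manual.
required = ConsoleSettings(baud_rate=9600, parity="none", flow_control="none")
terminal = ConsoleSettings(baud_rate=115200, parity="none", flow_control="hardware")
print(mismatched_settings(required, terminal))  # ['baud_rate', 'flow_control']
print(duplex_matches("full", "half"))           # False
```

An empty list from `mismatched_settings` means the terminal settings agree with the required ones, so the next thing to check is the cable.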

(4) Backplane failure:
Every module of the switch is connected to the backplane. If the environment is humid, the circuit board can absorb moisture and short-circuit, and components can also be damaged by high temperature, lightning strikes, and other factors, so that the board no longer works normally. For example, poor heat dissipation or a high ambient temperature raises the temperature inside the machine until components burn out. If the external power supply is normal but the internal modules of the switch do not work properly, the backplane may be broken, and the only remedy is to replace it. Note that after a hardware revision, circuit boards with the same name may exist in several different models. In general, a new board is compatible with the functions of the old board, but an old board is not compatible with the functions of the new one.

(5) Cable failure:
Cables, and the jumpers that connect cables to the distribution frame, are used to link modules, racks, and equipment. If a core in one of these cables or jumpers is short-circuited, open-circuited, or falsely connected, the communication system will fail.

Looking back over the hardware faults above, a poor machine room environment easily leads to all kinds of hardware failures. When building the machine room, therefore, first take care of lightning protection and grounding, power supply, indoor temperature and humidity, protection against electromagnetic interference, and anti-static measures, so that the network equipment has a good environment to work in.

Software failure of the switch:

A software failure of a switch is a failure of the system software or of its configuration. Such failures can be divided into the following categories.

(1) System error:
Program bugs are defects in the software itself. A switch system is a combination of hardware and software. Inside the switch is a rewritable read-only memory that holds the software system the switch needs. Because of limitations in the design at the time, this software may contain flaws that, under the right conditions, cause the switch to run at full load, drop packets, or produce erroneous packets. To deal with such problems, make a habit of checking the device manufacturer's website regularly, and install a new system version or patch promptly when one is released.

(2) Improper configuration:
Because configuration differs from one switch to another, network administrators often make mistakes when configuring switches. The main errors are:
1. System data error: system data, including software settings, defines the system as a whole. If the system data is wrong, it can cause a general failure of the system and affect the entire exchange office.
2. Office data error: office data is defined according to the specific situation of the exchange office. If the office data is wrong, it also affects the entire exchange office.
3. User data error: user data defines the situation of each user. If a user's data is set incorrectly, it affects that particular user.
4. Improper hardware settings: to reduce the number of circuit board types, one or more groups of switches are provided on a circuit board to define its working state or its position in the system. If these hardware settings are wrong, the board will not work properly.
This kind of failure is sometimes hard to find and takes a certain amount of accumulated experience. If you cannot tell whether the configuration is at fault, restore the factory default configuration and then reconfigure step by step. It is best to read the manual before configuring.
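The advice to restore the factory defaults and then reconfigure step by step can be pictured as a loop that applies one configuration step at a time and checks the network after each one. The following Python sketch illustrates the idea only. `first_bad_step`, `apply_step`, and `network_ok` are hypothetical names, and the step labels in the simulated run are invented rather than real switch commands.

```python
from typing import Callable, Optional, Sequence


def first_bad_step(steps: Sequence[str],
                   apply_step: Callable[[str], None],
                   network_ok: Callable[[], bool]) -> Optional[int]:
    """Apply steps in order, starting from factory defaults.

    Returns the index of the first step after which the health check
    fails, or None if every step passes.
    """
    for index, step in enumerate(steps):
        apply_step(step)
        if not network_ok():
            return index
    return None


# Simulated run: the step names and the "bad" step are invented for the demo.
applied: list[str] = []
steps = ["set hostname", "create vlan", "wrong trunk setting", "set uplink"]
result = first_bad_step(steps, applied.append,
                        lambda: "wrong trunk setting" not in applied)
print(result)  # 2
```

Stopping at the first failing step keeps the faulty setting isolated, so later steps cannot mask it.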

(3) External factors:
Because of viruses or hacker attacks, a host may send a large number of packets that do not conform to the encapsulation rules to the port it is connected to. The switch processor then becomes too busy to forward packets in time, which leads to buffer overflow and packet loss. Another case is the broadcast storm, which takes up not only a lot of network bandwidth but also a lot of CPU processing time. If the network is occupied by a large volume of broadcast packets for a long time, normal point-to-point communication cannot proceed, and the network slows down or is paralyzed.
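One way to spot a broadcast storm is to compare two readings of a port's packet counters and see how much of the traffic in between was broadcast. The Python sketch below assumes the counters have already been read from the switch by some other means. The names `CounterSample` and `looks_like_storm` and both thresholds are hypothetical illustration values, not vendor recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    timestamp_s: float       # time of the reading, in seconds
    total_packets: int       # cumulative packets seen on the port
    broadcast_packets: int   # cumulative broadcast packets seen on the port


def looks_like_storm(earlier: CounterSample,
                     later: CounterSample,
                     share_threshold: float = 0.5,
                     rate_threshold_pps: float = 1000.0) -> bool:
    """Flag an interval where broadcasts dominate and arrive at a high rate.

    Both thresholds are assumptions chosen for the example.
    """
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    total = later.total_packets - earlier.total_packets
    broadcast = later.broadcast_packets - earlier.broadcast_packets
    if total <= 0:
        return False  # no traffic in the interval
    share = broadcast / total
    rate = broadcast / elapsed
    return share >= share_threshold and rate >= rate_threshold_pps


# Invented readings taken 10 seconds apart.
start = CounterSample(0.0, 1_000, 100)
stormy = CounterSample(10.0, 101_000, 90_100)   # 90% broadcast, 9000 pps
quiet = CounterSample(10.0, 101_000, 600)       # 0.5% broadcast, 50 pps
print(looks_like_storm(start, stormy))  # True
print(looks_like_storm(start, quiet))   # False
```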

In short, software failures are generally harder to find than hardware failures. Solving them may not cost much money, but it takes more time. Network administrators should make a habit of keeping logs in their daily work. Whenever a fault occurs, promptly record the symptoms, the analysis process, the solution, and the fault category, to build up experience. After solving each problem, carefully review its root cause and the solution. Only in this way can we keep improving and better carry out the important task of network management.
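A fault log of the kind recommended above can be as simple as one record per incident plus a summary by category. The Python sketch below shows one possible shape. The `FaultRecord` fields mirror the items the article lists (phenomenon, analysis, solution, classification), while the names and the sample entries are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRecord:
    date: str       # when the fault occurred
    symptom: str    # the fault phenomenon
    analysis: str   # how the cause was narrowed down
    solution: str   # what fixed it
    category: str   # e.g. "power", "port", "configuration"


def count_by_category(records: list[FaultRecord]) -> dict[str, int]:
    """Summarize how many recorded faults fall into each category."""
    return dict(Counter(record.category for record in records))


# Invented sample entries.
log = [
    FaultRecord("2024-01-10", "no link on one port", "cable works on another port",
                "cleaned port after power-off", "port"),
    FaultRecord("2024-02-03", "one user cannot connect", "compared settings with manual",
                "corrected user data", "configuration"),
    FaultRecord("2024-03-21", "no link on one port", "port still dead after cleaning",
                "replaced port", "port"),
]
print(count_by_category(log))  # {'port': 2, 'configuration': 1}
```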

Post time: May-15-2024