Reliability in process control computing can be defined as the correct operation of a system up to a time t = T, given that it was operating correctly at the starting time t = 0. 1 However, correct operation can have many meanings, depending on the requirements previously established for the system. A common attitude today is that single or multiple failures can be accepted as long as the system does not go down or the desired operation is not interrupted or disturbed. Reliability is therefore a goal to be expected of a system and is set by the users. To obtain a given measure of reliability, fault-tolerant computing can be used. It may be defined as “the ability to execute specified algorithms correctly regardless of hardware errors and program errors.” 2 Since different computers in different applications have widely different requirements for reliability, availability, recovery time, data protection, and maintainability, an opportunity exists for the use of many different fault-tolerant techniques. 3

Understanding fault tolerance begins with understanding faults. A fault can be defined as “the deviation of one or more logic variables in the computer hardware from their design-specified values.” 1 A logic value for a digital computer is either a zero or a one. A fault is the appearance of an incorrect value, such as a logic gate “stuck on zero” or “stuck on one.” The fault causes an “error” if it, in turn, produces an incorrect operation of the previously correctly functioning logic elements. Therefore, the term fault is restricted to the actual hardware that fails.

Faults can be classified in several ways. Their most important characteristic is their duration. They can be either permanent (solid or “hard”) or transient (intermittent or “soft”). Permanent faults are caused by solid failures of components. 4 They are easier to diagnose but usually require more drastic correction techniques than do transient faults. Transient faults account for 80 to 90% of faults in most systems. 5 Transient faults, or intermittents, can be defined as random failures that prevent the proper operation of a unit for only a short period of time, not long enough to be tested and diagnosed as a permanent failure. Often, transient faults become permanent with further deterioration of the equipment. Techniques for permanent faults must then be used for system recovery.
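The distinction drawn above between a fault and an error can be illustrated with a minimal sketch. The two-input AND gate, the function names, and the `stuck_at` parameter are assumptions chosen for illustration and do not come from the source. The sketch lists the inputs for which a stuck output actually produces an incorrect result; for all other inputs the fault is present but causes no error.

```python
from itertools import product


def and_gate(a, b, stuck_at=None):
    """Two-input AND gate. If stuck_at is 0 or 1, the output is forced
    to that value regardless of the inputs (hypothetical fault model)."""
    if stuck_at is not None:
        return stuck_at
    return a & b


def erroneous_inputs(stuck_at):
    """Input pairs for which the faulty gate output differs from the
    fault-free output, i.e. where the fault shows up as an error."""
    return [
        (a, b)
        for a, b in product((0, 1), repeat=2)
        if and_gate(a, b, stuck_at) != and_gate(a, b)
    ]


# A stuck-on-zero output is wrong only when the correct output is 1.
print("stuck on zero errs for:", erroneous_inputs(0))
# A stuck-on-one output is wrong whenever the correct output is 0.
print("stuck on one errs for:", erroneous_inputs(1))
```

In this toy model a permanent fault would keep `stuck_at` set on every call, while a transient fault would set it only for a few calls, which is one reason transient faults are harder to diagnose.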
The goal of system reliability or of fault-tolerant computing, therefore, is either to prevent faults or to recover from them and continue correct system operation. This also includes immunity to software faults induced into the system. To achieve high reliability, it is essential that component reliability be as high as possible. “As the complexity of computer systems increase, almost any level of guaranteed reliability of individual elements becomes insufficient to provide a satisfactory probability of successful task completion.” 6 Therefore, successful fault-tolerant computers must use a judicious selection of protective redundancy to help meet the reliability requirements. The three redundancy techniques are as follows:

1. Hardware redundancy
2. Software redundancy
3. Time redundancy

These three techniques cover all methods of fault tolerance. Hardware redundancy can be defined as any circuitry in the system that would not be necessary for normal computer operation if no faults occurred. Software redundancy, similarly, is additional program instructions present solely to handle faults. Any retrial of instructions is known as time redundancy.

Hardware Redundancy

Hardware redundancy can be described as the set of all hardware components that need to be introduced into the system to provide fault tolerance with respect to operational faults. 1 These components would be superfluous should no faults occur, and their removal would not diminish the computing power of the system in the absence of faults. In achieving hardware fault tolerance, it is clear that one should use the most reliable components available. 7 However, increasing component reliability has only a small impact on increasing system reliability. Therefore, it is “more important to be able to recover from failures than to prevent them.” 8 Redundant techniques allow recovery and are thus very important in achieving fault-tolerant systems.
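The point about complexity and component reliability can be made concrete under a simplifying assumption that is not stated in the source: a nonredundant system of n elements fails if any one element fails, and element failures are independent. System reliability is then the product of the element reliabilities. The function name and the numbers below are illustrative only.

```python
def series_reliability(element_reliability, n_elements):
    """Reliability of a nonredundant system in which every element must
    work, assuming independent and identical elements (an assumption
    made for this sketch, not a claim from the source)."""
    return element_reliability ** n_elements


# Compare two element reliabilities across growing system sizes.
for n in (10, 1_000, 100_000):
    base = series_reliability(0.9999, n)
    improved = series_reliability(0.99999, n)
    print(f"n={n:>7}: R=0.9999 -> {base:.4f}, R=0.99999 -> {improved:.4f}")
```

Under these assumptions an element reliability of 0.9999 gives a system reliability of about 0.90 at 1,000 elements and close to zero at 100,000 elements. A tenfold reduction in element failure probability still leaves the largest system more likely to fail than to succeed, which is why recovery through redundancy matters more than further component improvement.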
The techniques used in achieving hardware redundancy can be divided into two categories: static (or masking) redundancy and dynamic redundancy.
Static techniques are effective in handling both transient and permanent failures. Masking is virtually instantaneous and automatic. It can be defined as any computer error correction method that is transparent to the user and often to the software. Redundant components serve to mask the effect of hardware failures of other components.

Many different techniques of static redundancy can be applied. The simplest, at the lowest level of complexity, is massive replication of the individual components of the system. 1 For example, four diodes connected as two parallel pairs that are themselves connected in series will not fail if any one diode fails “open” or “short.” Logic gates in similar quadded arrangements 9,10 can also guard against single faults, and even some multiple faults, for largely replicated systems.

More sophisticated systems use replication at higher levels of complexity to mask failures. Instead of using a mere massive replication of components configured in fault-tolerant arrangements, identical nonredundant computer sections or modules can be replicated and their outputs voted upon. Examples are triple modular redundancy (TMR) and its more massive generalization, N-modular redundancy (NMR), where N can stand for any odd number of modules.

In addition to component replication, coding can be used to mask faults as well as to detect them. With the use of some codes, data that has been garbled (i.e., bits changed due to hardware errors) can sometimes be recovered instantaneously with the use of redundant hardware.

Dynamic recovery methods are, however, better able to handle many of these faults. Higher levels of fault tolerance can be achieved more easily through dynamic redundancy, implemented through the dual actions of fault detection and recovery. This often requires software help in conjunction with hardware redundancy. Many of these methods are extensions of static techniques. Massive redundancy in components can often be better utilized when controlled dynamically.
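The voting step of the triple modular redundancy scheme described above can be sketched as a bitwise majority function. This is a minimal illustration, not the handbook's design: the function names are hypothetical, module outputs are modeled as integers treated as bit vectors, and the reliability formula assumes independent module failures and a perfect voter.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three module outputs: each result bit takes
    the value that at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r):
    """Probability that at least two of three modules work, assuming
    independent modules of reliability r and a fault-free voter."""
    return 3 * r**2 - 2 * r**3


good = 0b10110010
faulty = good ^ 0b00001000  # one module delivers a word with one flipped bit

# The single faulty module is outvoted, so the error is masked.
voted = majority_vote(good, faulty, good)
print("fault masked:", voted == good)

# Masking helps only when the modules are already fairly reliable.
for r in (0.5, 0.9, 0.99):
    print(f"module R={r}: TMR R={tmr_reliability(r):.6f}")
```

The masking is static: no fault is detected or reported, and a second faulty module that disagrees in the same bit position would defeat the vote.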
Redundant modules, or spares, can have better fault tolerance when they are left unpowered until needed, since they will not degrade while awaiting use. This technique, standby redundancy, 11 often uses dynamic voting techniques to achieve a high degree of fault tolerance. This union of the two methods is referred to as hybrid redundancy. 12 This technique needs additional hardware to detect and switch out faulty modules and to switch in good spares.

Error detecting and error correcting codes 13 can be used to dynamically achieve fault tolerance in a computing system. Coding refers to the addition of extra bits to, and the rearranging of the bits of, a binary word that contains information. The strategy of coding is to add a minimum number of additional bits, the check bits, to the message in such a way that a given degree of error detection or correction is achieved. 4 Error detection and correction is accomplished by comparing the new word, hopefully unchanged after transmission, storage, or processing, with a set of allowable configurations of bits. Discrepancies discovered in this manner signal the existence of a fault, which can sometimes be corrected if enough of the original information remains intact. This means that, with some such codes, the original binary word can be reconstructed as long as a set number of bits in the coded word remain unchanged.

Encoding and decoding words with the use of redundant hardware can be very effective in detecting errors. Through hardware or software algorithms, incorrect data can also often be reconstructed. Otherwise, the detected errors can be handled by module replacement and software recovery actions. The actions taken depend on the extent of the fault and on the recovery mechanisms available to the computing system.

Software Redundancy

Software redundancy refers to all additional software installed in a system that would not be needed for a fault-free computer. Software redundancy plays a major role in most fault-tolerant computers. Even computers that recover from failures mainly by hardware means use software to control their recovery and decision-making processes. The level of software used depends on the recovery system design. The recovery design depends on the type of error or malfunction that is expected. Different schemes have been found to be more appropriate for the handling of different errors. Some can be accomplished most efficiently solely by hardware means. Others need only software, but most use a mixture of the two.

For a functional system, i.e., one without hardware design faults, errors can be classified into two varieties: (1) software design errors and (2) hardware malfunctions. The first category can be corrected mainly by means of software. It is extremely difficult for hardware to be designed to correct for programmers’ errors. The software methods, though, are often used to correct hardware faults, especially transient ones.

The reduction and correction of software design errors can be accomplished through the techniques outlined below. Computers may be designed to detect several software errors. 14,15 Examples include the use of illegal instructions (i.e., instructions that do not exist), the use of privileged instructions when the system has not been authorized to process them, and address violations. The latter refers to reading or writing into locations beyond usable memory. These limits can often be set physically on the hardware. Computers capable of detecting these errors allow the programmer to handle the errors by causing interrupts. The interrupts route the program to specific locations in memory. The programmer, knowing these locations, can then add code to branch to specific subroutines, which can handle each error in a specified manner.

Software recovery from software errors can be accomplished via several methods. As mentioned before, parallel programming, in which alternative methods are used to determine a correct solution, can be used when an incorrect solution can be identified. Some less sophisticated systems print out diagnostics so that the user can correct the program off line
http://abzardaghigh.ir/duh/doc_download/143-redundant-and-voting-systems.html.