Redundant and Voting Systems

Reliability in process control computing can be defined as
the correct operation of a system up to a time t = T, given
that it was operating correctly at the starting time t = 0.[1]
However, correct operation can have many meanings,
depending on the requirements previously established for the
system. A common attitude today is that single or multiple
failures can be accepted as long as the system does not go
down or the desired operation is not interrupted or disturbed.
Reliability is therefore a goal to be expected of a system and
is set by the users.
The means of obtaining a certain measure of reliability can be
described by the term fault-tolerant computing. It may be defined as “the
ability to execute specified algorithms correctly regardless of
hardware errors and program errors.”[2] Since different computers
in different applications have widely different requirements
for reliability, availability, recovery time, data protection,
and maintainability, an opportunity exists for the use of
many different fault-tolerant techniques.[3]
The understanding of fault tolerance can be helped by
first understanding faults. A fault can be defined as “the
deviation of one or more logic variables in the computer
hardware from their design-specified values.”[1] A logic value
for a digital computer is either a zero or a one. A fault is the
appearance of an incorrect value such as a logic gate “stuck
on zero” or “stuck on one.” The fault causes an “error” if it,
in turn, produces an incorrect operation of the previously
correctly functioning logic elements. Therefore, the term fault
is restricted to the actual hardware that fails.
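
The distinction between a fault and an error can be illustrated with a small simulation. The sketch below is only an illustration: the two-gate circuit and the names `circuit`, `stuck_at`, and `erroneous_inputs` are assumptions made for this example and do not come from the source. The fault on the internal AND output is present for every input, but it produces an error only for those inputs where the stuck value changes the final output.

```python
from itertools import product


def circuit(a, b, c, stuck_at=None):
    """Compute (a AND b) OR c; optionally force the AND output to 0 or 1."""
    and_out = a & b
    if stuck_at is not None:
        and_out = stuck_at  # hypothetical fault: the gate output is stuck
    return and_out | c


def erroneous_inputs(stuck_at):
    """Return the input patterns for which the fault produces an error."""
    return [
        (a, b, c)
        for a, b, c in product((0, 1), repeat=3)
        if circuit(a, b, c, stuck_at) != circuit(a, b, c)
    ]


if __name__ == "__main__":
    print("stuck on zero:", erroneous_inputs(0))
    print("stuck on one: ", erroneous_inputs(1))
```

For most input patterns the faulty circuit still gives the correct output, which is one reason a fault can remain undetected for some time.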
Faults can be classified in several ways. Their most
important characteristic is their duration. They
can be either permanent (solid or “hard”) or transient (intermittent
or “soft”). Permanent faults are caused by solid failures
of components.[4] They are easier to diagnose but usually
require the use of more drastic correction techniques than do
transient faults. Transient faults account for 80 to 90% of faults in
most systems.[5] Transient faults, or intermittents, can be
defined as random failures that prevent the proper operation
of a unit for only a short period of time—not long enough
to be tested and diagnosed as a permanent failure. Often,
transient faults become permanent with further deterioration
of the equipment. Then, permanent fault-tolerant techniques
must be used for system recovery.

The goal of system reliability or of fault-tolerant computing
therefore is to either prevent or be able to recover from
faults and continue correct system operation. This also
includes immunity to software faults induced into the system.
To achieve a high reliability, it is essential that component
reliability be as high as possible. “As the complexity of computer
systems increase, almost any level of guaranteed reliability
of individual elements becomes insufficient to provide
a satisfactory probability of successful task completion.”[6]
Therefore, successful fault-tolerant computers must use a
judicious selection of protective redundancy to help meet the
reliability requirements. The three redundancy techniques are
as follows:
1. Hardware redundancy
2. Software redundancy
3. Time redundancy
These three techniques cover all methods of fault tolerance.
Hardware redundancy can be defined as any circuitry in the
system that is not necessary for normal computer operation
should no faults occur. Software redundancy, similarly, is additional
program instructions present solely to handle faults. Any
retrial of instructions is known as time redundancy.
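
Time redundancy, the retrial of an operation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `run_with_retry`, `make_flaky_adder`, and the acceptance check are hypothetical names invented for this example, and the check stands in for whatever detection mechanism (parity, comparison, range test) a real system would use.

```python
def run_with_retry(operation, check, max_attempts=3):
    """Repeat `operation` until `check` accepts its result (time redundancy).

    Returns the accepted result and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        result = operation()
        if check(result):
            return result, attempt
    # The fault outlasted every retry, so it can no longer be
    # treated as transient.
    raise RuntimeError("fault persisted; treat it as permanent")


def make_flaky_adder(bad_runs):
    """Hypothetical unit that returns a wrong sum for its first `bad_runs` calls."""
    state = {"calls": 0}

    def add():
        state["calls"] += 1
        return 2 + 2 if state["calls"] > bad_runs else 5

    return add


if __name__ == "__main__":
    result, attempts = run_with_retry(make_flaky_adder(1), lambda r: r == 4)
    print(result, "after", attempts, "attempts")
```

A retry of this kind helps only with transient faults; a fault that survives every attempt has to be handed to one of the other two techniques.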
Hardware Redundancy
Hardware redundancy can be described as the set of all hardware
components that need to be introduced into the system
to provide fault tolerance with respect to operational faults.[1]
These components would be superfluous should no faults
occur, and their removal would not diminish the computing
power of the system in the absence of faults.
In achieving hardware fault tolerance, it is clear that one
should use the most reliable components available.[7] However,
increasing component reliability has only a small impact on
increasing system reliability. Therefore, it is “more important
to be able to recover from failures than to prevent them.”[8]
Redundant techniques allow recovery and are thus very
important in achieving fault-tolerant systems. The techniques
used in achieving hardware redundancy can be divided into
two categories: static (or masking) redundancy and dynamic
redundancy.

Static techniques are effective in handling both transient
and permanent failures. Masking is virtually instantaneous
and automatic. It can be defined as any computer error correction
method that is transparent to the user and often to the
software. Redundant components serve to mask the effect of
hardware failures of other components.
Many different techniques of static redundancy can be
applied. The simplest, at the lowest level of complexity, is
massive replication of the individual components of the system.[1]
For example, four diodes connected as two parallel
pairs that are themselves connected in series will not fail if
any one diode fails “open” or “short.” Logic gates in similar
quadded arrangements[9,10] can also guard against single faults,
and even some multiple faults, for largely replicated systems.
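
The four-diode claim can be checked exhaustively. The following is a sketch under simplifying assumptions: each diode is modeled as an ideal switch, and the state names and function names are hypothetical, chosen only for this illustration.

```python
from itertools import product

OK, OPEN, SHORT = "ok", "open", "short"


def conducts(state, forward):
    """One idealized diode: a healthy diode conducts only when forward biased."""
    if state == OPEN:
        return False
    if state == SHORT:
        return True
    return forward


def quad_conducts(states, forward):
    """Two parallel pairs in series: (D1 || D2) in series with (D3 || D4)."""
    d1, d2, d3, d4 = (conducts(s, forward) for s in states)
    return (d1 or d2) and (d3 or d4)


def quad_works(states):
    """The network still behaves as a diode: conducts forward, blocks reverse."""
    return quad_conducts(states, True) and not quad_conducts(states, False)


def single_fault_tolerant():
    """Try every single open or short fault on each of the four diodes."""
    for i, fault in product(range(4), (OPEN, SHORT)):
        states = [OK] * 4
        states[i] = fault
        if not quad_works(states):
            return False
    return True


if __name__ == "__main__":
    print("tolerates every single fault:", single_fault_tolerant())
    print("both diodes of one pair open:", quad_works([OPEN, OPEN, OK, OK]))
    print("both diodes of one pair short:", quad_works([SHORT, SHORT, OK, OK]))
```

In this model the network survives some double faults (two shorts in the same pair) but not others (two opens in the same pair, or one short in each pair).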
More sophisticated systems use replication at higher levels
of complexity to mask failures. Instead of using a mere
massive replication of components configured in fault-tolerant
arrangements, identical nonredundant computer sections or
modules can be replicated and their outputs voted upon.
Examples are triple modular redundancy (TMR) and its more
massive generalization, N-modular redundancy (NMR), where N can
stand for any odd number of modules.
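
Voting on module outputs can be sketched as a majority function. This is an illustrative sketch, not a description of any particular voter design: the function names are hypothetical, and the reliability expression assumes a perfect voter and independent, identical modules, each with reliability r.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the modules."""
    if len(outputs) % 2 == 0:
        raise ValueError("NMR voting expects an odd number of modules")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority; the fault cannot be masked")
    return value


def tmr_reliability(r):
    """Probability that at least two of three modules work.

    Assumes a perfect voter and independent modules:
    r**3 + 3 * r**2 * (1 - r), which simplifies to 3r^2 - 2r^3.
    """
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    print(majority_vote([7, 7, 9]))        # one faulty module is masked
    print(majority_vote([1, 0, 1, 1, 0]))  # 5-module NMR masks two faults
    print(tmr_reliability(0.9))
```

Under these assumptions TMR improves on a single module only when r is above 0.5, which is why module reliability still matters in a voted system.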
In addition to component replication, coding can be used
to mask faults as well as to detect them. With the use of some
codes, data that has been garbled (i.e., bits changed due to
hardware errors) can sometimes be recovered instantaneously
with the use of redundant hardware. Dynamic recovery methods
are, however, better able to handle many of these faults.
Higher levels of fault tolerance can be achieved more
easily through dynamic redundancy and implemented
through the dual actions of fault detection and recovery. This
often requires software help in conjunction with hardware
redundancy. Many of these methods are extensions of static
techniques.
Massive redundancy in components can often be better
utilized when controlled dynamically. Redundant modules, or
spares, can have a better fault tolerance when they are left
unpowered until needed, since they will not degrade while
awaiting use. This technique, standby redundancy,[11] often uses
dynamic voting techniques to achieve a high degree of fault
tolerance. This union of the two methods is referred to as
hybrid redundancy.[12] Additional hardware is needed for the
detection and switching out of faulty modules and the switching
in of good spares within the system by this technique.
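
The combination of voting with standby spares can be sketched as follows. This is a simplified model under stated assumptions: modules are plain functions, the class `HybridRedundantUnit` and the modules `double` and `stuck` are hypothetical, and the disagreement detector and switching logic, which the text says require additional hardware, are folded into one method.

```python
class HybridRedundantUnit:
    """Three voted active modules plus a pool of standby spares."""

    def __init__(self, active, spares):
        if len(active) != 3:
            raise ValueError("this sketch uses a three-module voted core")
        self.active = list(active)
        self.spares = list(spares)

    def compute(self, x):
        outputs = [module(x) for module in self.active]
        voted = max(set(outputs), key=outputs.count)
        if outputs.count(voted) < 2:
            raise RuntimeError("no majority among active modules")
        # Disagreement detection: switch out any module that lost the
        # vote and switch in a spare, if one is still available.
        for i, out in enumerate(outputs):
            if out != voted and self.spares:
                self.active[i] = self.spares.pop(0)
        return voted


def double(x):
    """Hypothetical healthy module."""
    return 2 * x


def stuck(x):
    """Hypothetical faulty module whose output is stuck at -1."""
    return -1


if __name__ == "__main__":
    unit = HybridRedundantUnit([double, stuck, double], spares=[double])
    print(unit.compute(3))    # fault masked by the vote
    print(len(unit.spares))   # spare consumed by the replacement
```

The vote masks the fault immediately, while the replacement restores the full three-module core so that a later second fault can also be masked.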
Error detecting and error correcting codes[13] can be used
to dynamically achieve fault tolerance in a computing system.
Coding refers to the addition of extra bits to and the rearranging
of the bits of a binary word that contains information.
The strategy of coding is to add a minimum number of check
bits, the additional bits, to the message in such a way that a
given degree of error detection or correction is achieved.[4]
Error detection and correction are accomplished by comparing
the new word, hopefully unchanged after transmission, storage,
or processing, with a set of allowable configurations of
bits. Discrepancies discovered in this manner signal the
existence of a fault, which can sometimes be corrected if enough
of the original information remains intact.
This means that the original binary word can be reconstructed
with some such codes if a set number of bits in the
coded word have not changed. Encoding and decoding words
with the use of redundant hardware can be very effective in
detecting errors. Through hardware or software algorithms,
incorrect data can also often be reconstructed. Otherwise, the
detected errors can be handled by module replacement and
software recovery actions. The actions taken depend on the
extent of the fault and of the recovery mechanisms available
to the computing system.
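
As one concrete instance of adding check bits, the sketch below uses the well-known Hamming(7,4) code, which adds three check bits to four data bits and corrects any single changed bit. The source does not name a specific code, so the choice of Hamming(7,4) and the function names are assumptions made for illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions 1..7 hold p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (data bits, position of the corrected bit, or 0 if none)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # equals the 1-based position in error
    if syndrome:
        w[syndrome - 1] ^= 1         # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], syndrome


if __name__ == "__main__":
    code = hamming74_encode([1, 0, 1, 1])
    garbled = list(code)
    garbled[4] ^= 1                  # one bit changed by a hardware error
    print(hamming74_decode(garbled))
```

This code corrects one changed bit per word; if two bits change, the decoder silently produces wrong data, which is the limit the text describes as "a set number of bits."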
Software Redundancy
Software redundancy refers to all additional software installed
in a system that would not be needed for a fault-free computer.
Software redundancy plays a major role in most fault-tolerant
computers. Even computers that recover from failures
mainly by hardware means use software to control their
recovery and decision-making processes. The level of software
used depends on the recovery system design. The recovery
design depends on the type of error or malfunction that
is expected. Different schemes have been found to be more
appropriate for the handling of different errors. Some can be
accomplished most efficiently solely by hardware means.
Others need only software, but most use a mixture of the two.
For a functional system, i.e., one without hardware design
faults, errors can be classified into two varieties: (1) software
design errors and (2) hardware malfunctions.
The first category can be corrected mainly by means of
software. It is extremely difficult for hardware to be designed
to correct for programmers’ errors. The software methods,
though, are often used to correct hardware faults—especially
transient ones. The reduction and correction of software design
errors can be accomplished through the techniques outlined
below.
Computers may be designed to detect several software
errors.[14,15] Examples include the use of illegal instructions
(i.e., instructions that do not exist), the use of privileged
instructions when the system has not been authorized to
process them, and address violations. The last of these refers to
reading or writing into locations beyond usable memory.
These limits can often be set physically on the hardware.
Computers capable of detecting these errors allow the programmer
to handle the errors by causing interrupts. The interrupts
route the program to specific locations in memory. The
programmer, knowing these locations, can then add code
to branch to specific subroutines, which can handle
each error in a specified manner.
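
The interrupt mechanism described above can be imitated in software. The sketch below is a toy model, not a real instruction set: the opcodes, the memory limit, the exception classes, and the handler table (standing in for the fixed interrupt locations in memory) are all hypothetical names chosen for this illustration.

```python
class Trap(Exception):
    """Base class for errors the simulated hardware can detect."""


class IllegalInstruction(Trap):
    pass


class PrivilegeViolation(Trap):
    pass


class AddressViolation(Trap):
    pass


MEMORY_LIMIT = 16                    # hypothetical bound on usable memory
OPCODES = {"LOAD", "STORE", "HALT"}  # HALT is treated as privileged here


def execute(op, addr, memory, privileged=False):
    """Run one instruction, raising a Trap for each detectable error."""
    if op not in OPCODES:
        raise IllegalInstruction(op)
    if op == "HALT":
        if not privileged:
            raise PrivilegeViolation(op)
        return None
    if not 0 <= addr < MEMORY_LIMIT:
        raise AddressViolation(addr)
    if op == "STORE":
        memory[addr] = 1
    return memory[addr]


# Handler table: the programmer's own subroutine for each kind of error.
HANDLERS = {
    IllegalInstruction: lambda trap: f"illegal: {trap}",
    PrivilegeViolation: lambda trap: f"privileged: {trap}",
    AddressViolation: lambda trap: f"address: {trap}",
}


def run(program, handlers=HANDLERS, privileged=False):
    """Execute a program, routing each trap to its handler."""
    memory = [0] * MEMORY_LIMIT
    log = []
    for op, addr in program:
        try:
            execute(op, addr, memory, privileged)
        except Trap as trap:
            log.append(handlers[type(trap)](trap))
    return log


if __name__ == "__main__":
    print(run([("LOAD", 3), ("JUMPX", 0), ("HALT", 0), ("STORE", 99)]))
```

Each handler here only records the error; in a real system the handler would decide whether to retry, substitute a default, or abort the task.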
Software recovery from software errors can be accomplished
via several methods. As mentioned before, parallel
programming, in which alternative methods are used to determine
a correct solution, can be used when an incorrect solution
can be identified. Some less sophisticated systems print out
diagnostics so that the user can correct the program off line

http://abzardaghigh.ir/duh/doc_download/143-redundant-and-voting-systems.html.
