#distributed-systems


Paridhi Agarwal June 25, 2020 at 05:38 PM

3 b v)

Paridhi Agarwal June 25, 2020 at 03:28 PM

3 b iv) Phases in Fault Tolerance
• The implementation of a fault tolerance technique depends on the design, configuration, and application of the distributed system.
• Designers have nonetheless suggested some general principles that are commonly followed:
1) Fault Detection
2) Fault Diagnosis
3) Evidence Generation
4) Assessment
5) Recovery

1) Fault Detection
• Constantly monitor the system's performance and compare it with the expected outcome.
• A fault is reported if there is a deviation from the expected outcome.
2) Fault Diagnosis
• Done to understand the nature of the fault and its possible root cause.
3) Evidence Generation
• A report is generated based on the outcome of the fault diagnosis.
4) Assessment
• Understand the extent of the damage caused by the faulty component.
• Done by examining the flow of information that has passed from the faulty component to the rest of the system.
• A virtual boundary is created.
5) Recovery
• Make the system fault free and restore it to a consistent state, using either forward recovery or backward recovery.
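
The detection and recovery phases can be sketched roughly as follows. This is only a minimal illustration, assuming a single in-memory component; the names `Counter`, `detect_fault`, and `apply_with_recovery` are hypothetical, and the "fault" is simulated. It shows detection by comparing the observed result with the expected one, followed by backward recovery to a saved checkpoint.

```python
import copy


class Counter:
    """Hypothetical component whose state must stay consistent."""

    def __init__(self):
        self.state = {"total": 0}
        self._checkpoint = copy.deepcopy(self.state)

    def checkpoint(self):
        # Save a known-good state that can be restored later.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Backward recovery: restore the last saved consistent state.
        self.state = copy.deepcopy(self._checkpoint)


def detect_fault(observed, expected):
    # Fault detection: a deviation from the expected outcome is a fault.
    return observed != expected


def apply_with_recovery(component, delta, faulty=False):
    """Add delta to the total; roll back if the result deviates."""
    expected = component.state["total"] + delta
    component.checkpoint()
    # Simulated fault (assumption): a faulty run adds twice the delta.
    component.state["total"] += delta * (2 if faulty else 1)
    if detect_fault(component.state["total"], expected):
        component.rollback()
        return False
    return True
```

Forward recovery would instead move the component to a new correct state rather than returning to an old one; that variant is not shown here.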

Paridhi Agarwal June 25, 2020 at 03:13 PM

3 b iii contd

Paridhi Agarwal June 25, 2020 at 03:12 PM

3 b iii) Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail.
The objective of creating a fault-tolerant system is to prevent disruptions arising from a single point of failure, ensuring the high availability and business continuity of mission-critical applications or systems.
Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service. These include:
• Hardware systems that are backed up by identical or equivalent systems. For example, a server can be made fault tolerant by using an identical server running in parallel, with all operations mirrored to the backup server.
• Software systems that are backed up by other software instances. For example, a database with customer information can be continuously replicated to another machine. If the primary database goes down, operations can be automatically redirected to the second database.
• Power sources that are made fault tolerant using alternative sources. For example, many organizations have power generators that can take over in case main line electricity fails.
In a similar fashion, any system or component that is a single point of failure can be made fault tolerant using redundancy.
Fault tolerance can play a role in a disaster recovery strategy. For example, fault-tolerant systems with backup components in the cloud can restore mission-critical systems quickly, even if a natural or human-induced disaster destroys on-premises IT infrastructure.
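
The database example above (continuous replication plus automatic redirection) could look roughly like this. It is a simplified sketch, not a real database client: `Replica` and `FailoverStore` are hypothetical names, failure is simulated with an `alive` flag, and replication is done synchronously in memory.

```python
class Replica:
    """Hypothetical in-memory stand-in for one database instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class FailoverStore:
    """Mirrors every write to the backup and redirects reads on failure."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def write(self, key, value):
        # Send the write to every replica that is still reachable.
        written = 0
        for replica in (self.primary, self.backup):
            try:
                replica.write(key, value)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise ConnectionError("no replica available")

    def read(self, key):
        # Try the primary first; fall back to the backup if it is down.
        try:
            return self.primary.read(key)
        except ConnectionError:
            return self.backup.read(key)
```

The caller only talks to `FailoverStore`, so the loss of the primary is hidden from it, which is the point of removing the single point of failure.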

Paridhi Agarwal June 25, 2020 at 03:07 PM

3b i) Atomic multicast is a communication building block of scalable and highly available applications. With atomic multicast, messages can be ordered and reliably propagated to one or more groups of server processes. Because each message can be multicast to a different set of destinations, distributed message ordering is challenging. Some atomic multicast protocols address this challenge by ordering all messages using a fixed group of processes, regardless of the destination of the messages. To be efficient, however, an atomic multicast protocol must be genuine: only the message sender and destination groups should communicate to order a message.

To see why atomicity is so important, consider a replicated database constructed as an application on top of a distributed system. The distributed system offers reliable multicasting facilities. In particular, it allows the construction of process groups to which messages can be reliably sent. The replicated database is therefore constructed as a group of processes, one process for each replica. Update operations are always multicast to all replicas and subsequently performed locally. In other words, we assume that an active-replication protocol is used.
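
To make the replicated-database argument concrete, here is a small sketch of why all replicas must see updates in the same order. It assumes the simplest possible ordering scheme, a single fixed sequencer, so it is not a genuine atomic multicast protocol, and it ignores process crashes entirely. `AccountReplica` and `Sequencer` are hypothetical names, and the bank-balance update operations are an assumed example.

```python
class AccountReplica:
    """One replica of a hypothetical bank-balance database."""

    def __init__(self, balance):
        self.balance = balance

    def deliver(self, op):
        # Apply an update locally, as in active replication.
        kind, amount = op
        if kind == "deposit":
            self.balance += amount
        elif kind == "interest":
            # amount is a whole-number percentage of the current balance.
            self.balance += self.balance * amount // 100


class Sequencer:
    """Fixed sequencer: numbers each message and delivers it to every
    replica in that single global order (assumed, non-genuine scheme)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.next_seq = 0
        self.log = []

    def multicast(self, op):
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, op))
        for replica in self.replicas:
            replica.deliver(op)
        return seq


# Without a common order, two replicas can diverge.
a, b = AccountReplica(1000), AccountReplica(1000)
a.deliver(("deposit", 100))
a.deliver(("interest", 1))
b.deliver(("interest", 1))
b.deliver(("deposit", 100))
print(a.balance, b.balance)  # 1111 1110

# With the sequencer, both replicas apply the same order.
r1, r2 = AccountReplica(1000), AccountReplica(1000)
group = Sequencer([r1, r2])
group.multicast(("deposit", 100))
group.multicast(("interest", 1))
print(r1.balance, r2.balance)  # 1111 1111
```

A full atomic multicast would also guarantee that a message is delivered to all non-faulty group members or to none, which this sketch does not attempt.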

Paridhi Agarwal June 25, 2020 at 02:50 PM

2b iii)

Paridhi Agarwal June 25, 2020 at 02:43 PM

2b i) Names play a very important role in all computer systems. They are used to share resources, to uniquely identify entities, to refer to locations, and more. An important issue with naming is that a name can be resolved to the entity it refers to. Name resolution thus allows a process to access the named entity. To resolve names, it is necessary to implement a naming system. The difference between naming in distributed systems and non-distributed systems lies in the way naming systems are implemented.
 
In a distributed system, the implementation of a naming system is itself often distributed across multiple machines. How this distribution is done plays a key role in the efficiency and scalability of the naming system.
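
As a rough illustration of a naming system spread over several machines, the sketch below resolves a path-like name by asking one name server per level. Everything here is assumed for illustration: `NameServer` and `resolve` are hypothetical, the servers are plain objects rather than remote machines, and the name and address are made up.

```python
class NameServer:
    """Hypothetical server responsible for one level of the name space."""

    def __init__(self):
        self.children = {}  # label -> NameServer for the next level
        self.entries = {}   # label -> address of the named entity

    def lookup(self, label):
        if label in self.entries:
            return ("address", self.entries[label])
        if label in self.children:
            return ("referral", self.children[label])
        raise KeyError(label)


def resolve(root, name):
    """Iteratively resolve a name such as 'org/dept/printer'."""
    server = root
    labels = name.split("/")
    for i, label in enumerate(labels):
        kind, value = server.lookup(label)
        if kind == "address":
            if i != len(labels) - 1:
                # An address was found before the name was fully consumed.
                raise KeyError(name)
            return value
        # Referral: continue the resolution at the next server.
        server = value
    # The name ended at a server rather than at a named entity.
    raise KeyError(name)


root, org, dept = NameServer(), NameServer(), NameServer()
root.children["org"] = org
org.children["dept"] = dept
dept.entries["printer"] = "10.0.0.7:631"
print(resolve(root, "org/dept/printer"))  # 10.0.0.7:631
```

Each server holds only its own part of the name space, so no single machine needs the full table; how the levels are split across machines is what affects efficiency and scalability.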
