A Byte of Blockchain : Week 27 - Byzantine Generals' Problem

Recap

Last week we discussed how decentralized ledgers are synchronized across the nodes that form the blockchain network. We used the real-world example of production & backup servers, which must stay in sync at all times so that customer requests are processed seamlessly.

In a similar fashion, a new node updates its "blocks" by synchronizing with the nodes that have the most blocks in the network & downloading the "missing" blocks, so that all nodes hold the same up-to-date copy of the ledger.
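As a rough illustration, the catch-up step can be sketched in Python. This is a simplified model under stated assumptions: each ledger is a plain list of hypothetical block IDs, all peers share the same history, and the new node copies the tail it is missing from the peer with the most blocks. Real clients also validate every downloaded block, and Bitcoin nodes follow the chain with the most cumulative work rather than simply the most blocks.

```python
def sync_new_node(local_chain, peer_chains):
    """Return the local chain extended with the blocks it is missing.

    Hypothetical model: a chain is a list of block IDs, oldest first, and
    every peer is assumed to share the local node's history.
    """
    best_peer = max(peer_chains, key=len)  # the peer holding the most blocks
    if len(best_peer) <= len(local_chain):
        return list(local_chain)  # already up to date, nothing to download
    if best_peer[:len(local_chain)] != list(local_chain):
        raise ValueError("peer chain does not extend the local chain")
    missing = best_peer[len(local_chain):]  # the "missing" blocks
    return list(local_chain) + missing


peers = [
    ["b0", "b1", "b2"],
    ["b0", "b1", "b2", "b3", "b4"],
    ["b0", "b1"],
]
new_node = sync_new_node(["b0"], peers)
print(new_node)  # ['b0', 'b1', 'b2', 'b3', 'b4']
```

The prefix check is there so the sketch refuses to splice blocks onto a history it does not share; handling competing histories is a separate problem.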

We know that a Blockchain is a decentralized network of nodes scattered across the globe, unknown to each other & performing network-management tasks that depend on their type (Miner Nodes, Full Nodes, SPV Nodes, etc.). These nodes do not trust each other, yet they have to "agree" on everything regarding the functioning of the network through the consensus protocols we discussed in Weeks 9 & 10.

As part of exploring the intricacies of managing consensus in a blockchain, let us now look at the Byzantine Generals' Problem.

Byzantine Generals' Problem

What is the Byzantine Generals' Problem?

The Byzantine Generals' Problem is an analogy that describes the challenge of reaching consensus in a decentralized network where some nodes may be dishonest or where nodes communicate over an insecure network.

The research paper titled "The Byzantine Generals Problem" was published in 1982 by Leslie Lamport, Robert Shostak & Marshall Pease.

(Source: Wikipedia)

[Image: Byzantine Generals' Problem]

Imagine the Byzantine army trying to attack a city it has completely encircled.
The army is split into divisions, each headed by a General, and the divisions are camped at different locations around the city.

To capture the city, ALL the generals have to attack at the same time; otherwise the attack will fail. This means they have to reach a CONSENSUS on attacking the city.

But how will they reach consensus in such a scenario?

To add to the complication:

Some of the generals may be corrupt or untrustworthy.
As the generals are at different locations, they must coordinate & communicate with each other for the attack. The generals use messengers who, unfortunately, have to travel through the enemy city to deliver a message to the other generals, so a messenger can be captured and the message lost or altered.
How would such message coordination & communication take place?
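To see why this is hard, here is a toy Python model of the analogy (all names are hypothetical): a commanding general sends an order to each lieutenant by messenger, and each lieutenant simply obeys whatever order arrives. With a loyal commander everyone agrees; with a traitorous commander who sends conflicting orders, the lieutenants end up acting differently. This naive scheme is not a solution; it only illustrates the problem.

```python
def send_orders(commander_is_traitor, lieutenants, order="attack"):
    """Return the order each lieutenant receives from the commander."""
    received = {}
    for i, name in enumerate(lieutenants):
        if commander_is_traitor:
            # A traitor sends conflicting orders to different lieutenants.
            received[name] = "attack" if i % 2 == 0 else "retreat"
        else:
            received[name] = order
    return received


def coordinated(decisions):
    """Assuming every lieutenant is loyal and obeys, success needs one shared decision."""
    return len(set(decisions.values())) == 1


loyal = send_orders(False, ["A", "B", "C"])
traitor = send_orders(True, ["A", "B", "C"])
print(loyal, coordinated(loyal))      # every lieutenant receives 'attack' -> True
print(traitor, coordinated(traitor))  # A and C attack, B retreats -> False
```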

But what does the above have to do with a Blockchain?

Keeping the above in mind, let us ask ourselves a few questions:

How do decentralized nodes distributed across the globe agree on which copy of the ledger is accurate?
How do nodes agree on anything at all pertaining to transactions or blocks, e.g., balances or the source of funds?
What if some nodes exhibit faulty behavior, like going offline or sending conflicting information to other nodes in the network?
The nodes are connected over the internet, including insecure connections. How can the nodes trust each other to carry out their responsibilities?

In a Blockchain, a node that exhibits faulty behavior, like going offline or sending conflicting or malicious information to other nodes in the network, is called a Byzantine Node.

Solution to the Byzantine Generals' Problem

So, to summarize, the problems are:

Messages are broadcast across nodes over an insecure network
Nodes turn "Byzantine", either through malicious intent or unintentionally due to system or network issues, which in turn destabilizes the Blockchain network

So, what is the solution for the above issues?

Hashing &
Proof of Work

We discussed Hashing in Weeks 12 & 13 and Proof of Work in Week 23, but let us do a quick recap.

1. Hashing

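Hashing is the process of taking an input of any length, running it through a hashing algorithm and getting an output of fixed length. This means the length of the hash output is independent of the length of the input. The same input always produces the same hash, while even a tiny change to the input produces a completely different hash.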
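A quick Python sketch of these properties, using SHA-256 from the standard library (SHA-256 is the hash function used in Bitcoin's Proof of Work; the messages here are made up):

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("Attack")
long_hash = sha256_hex("Attack at dawn from the north gate. " * 1000)

print(len(short_hash), len(long_hash))  # 64 64: fixed length, whatever the input size
print(sha256_hex("Attack at dawn") == sha256_hex("Attack at dawn"))  # True: same input, same hash
print(sha256_hex("Attack at dawn") == sha256_hex("Attack at dusk"))  # False: a small change gives a different hash
```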

2. Proof of Work

A mining node can add a block to a blockchain only after solving a hard cryptographic puzzle, and mining nodes compete with each other to add blocks. The puzzle is so difficult that specialized hardware exists solely to solve it. The cost of mining a block is the processing power and electricity spent solving this puzzle.

The solution to the cryptographic problem is called PROOF OF WORK.

So, how do Hashing & Proof of Work resolve the Byzantine Generals' Problem?

We know that if the enemy captures a messenger and tampers with the message, the generals will act (or not act) on the tampered message, resulting in a loss.

So, to prevent tampering, General 1 will hash the message and sign the hash with their private key. The other generals, who have access to General 1's public key, use it to verify the signature: if the message was altered on the way, the check fails. This ensures everyone received the original message from General 1.
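The sign-and-verify idea can be sketched in Python with textbook RSA. This is a deliberately insecure toy under loud assumptions: the primes are small and hard-coded, there is no padding, and the SHA-256 hash is reduced modulo the key size so it fits. Real blockchains use elliptic-curve signatures instead (Bitcoin uses ECDSA and, since Taproot, Schnorr signatures over secp256k1), normally through a vetted library.

```python
import hashlib

# Textbook RSA with small, hard-coded primes: INSECURE, for illustration only.
P, Q = 1_000_000_007, 998_244_353  # two known primes (far too small for real use)
N = P * Q                          # public modulus
PHI = (P - 1) * (Q - 1)
E = 65_537                         # public exponent
D = pow(E, -1, PHI)                # private exponent (modular inverse, Python 3.8+)


def message_digest(message):
    """Hash the message with SHA-256 and reduce it modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message.encode("utf-8")).digest(), "big") % N


def sign(message):
    """General 1 signs the hash of the message with the private exponent D."""
    return pow(message_digest(message), D, N)


def verify(message, signature):
    """Any general checks the signature using only the public key (E, N)."""
    return pow(signature, E, N) == message_digest(message)


order = "Attack at dawn"
sig = sign(order)
print(verify(order, sig))              # True: the untouched message verifies
print(verify("Retreat at dawn", sig))  # False: the tampered message is rejected
```

The private exponent D stays with General 1; the other generals only ever need (E, N).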

Now, let us take this one step further. What if the enemy captures the messenger? The public key only lets someone verify a signature, not create one, so the enemy cannot forge General 1's message. But a traitorous general holds their own private key and can sign conflicting messages cheaply. So, the generals take steps to make every message costly to produce: they add a nonce to the message (Please refer week 23 for more details on Nonce). The nonce has to be iterated, re-hashing (message + nonce) each time, until a hash that meets the target is found.

The (message + nonce) combination whose hash meets the target is the Proof of Work solution. Finding it takes significant time and processing power, and changing the message afterwards changes its hash, so the work must be redone. Hence, it is impractical for malicious or Byzantine nodes to tamper with messages or transactions, as the cost of doing so would be prohibitively high. Meanwhile, other nodes can easily verify, with a single hash, that adequate effort went into the proof of work for the block.
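The nonce search can be sketched in Python as below. As a simplifying assumption, the "target" here is a number of leading hex zeros in the SHA-256 hash of the message followed by the nonce; Bitcoin instead double-hashes a block header with SHA-256 and compares the result numerically against a target. The asymmetry is the same, though: finding the nonce takes many attempts, while checking it takes one hash.

```python
import hashlib


def pow_hash(message, nonce):
    """SHA-256 of the message concatenated with the nonce."""
    return hashlib.sha256(f"{message}{nonce}".encode("utf-8")).hexdigest()


def mine(message, difficulty):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` hex zeros."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(message, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(message, nonce, difficulty):
    """Checking a solution costs one hash, no matter how long the search took."""
    return pow_hash(message, nonce).startswith("0" * difficulty)


message = "Attack at dawn"
nonce, digest = mine(message, difficulty=4)
print(nonce, digest)
print(verify(message, nonce, 4))  # True
```

Each extra leading hex zero multiplies the expected number of attempts by 16, which is how raising the difficulty raises the cost of producing, or re-producing after tampering, a valid proof.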

Together, the above make the network Byzantine Fault Tolerant (BFT): it can reach agreement, or consensus, on transactions & blocks based on Proof of Work, even if some nodes do not respond or respond with malicious messages. The main objective of BFT is to safeguard the Blockchain network even in the presence of faulty nodes.
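One small piece of this fault tolerance, an honest node filtering what its peers send, can be sketched in Python. Assumptions, all hypothetical: a block is just a (data, nonce) pair, validity means its SHA-256 hash has DIFFICULTY leading hex zeros, an offline peer is represented by None, and the Byzantine peer skips the work. This covers only the validation step, not a full consensus protocol (there is no chain selection).

```python
import hashlib

DIFFICULTY = 3  # hypothetical target: number of leading hex zeros required


def block_hash(data, nonce):
    return hashlib.sha256(f"{data}{nonce}".encode("utf-8")).hexdigest()


def is_valid(block):
    """A block is a (data, nonce) pair; it is valid only if its hash meets the target."""
    data, nonce = block
    return block_hash(data, nonce).startswith("0" * DIFFICULTY)


def mine(data):
    """Honest peer: search for a nonce that meets the target."""
    nonce = 0
    while not is_valid((data, nonce)):
        nonce += 1
    return (data, nonce)


def forge(data):
    """Byzantine peer: skips the work and sends a nonce that does not meet the target."""
    nonce = 0
    while is_valid((data, nonce)):
        nonce += 1
    return (data, nonce)


def accept_blocks(responses):
    """Keep only valid blocks; silent peers (None) and blocks without valid work are dropped."""
    return [block for block in responses if block is not None and is_valid(block)]


honest_block = mine("Alice pays Bob 5")
forged_block = forge("Alice pays Mallory 500")
responses = [honest_block, None, forged_block, honest_block]  # None = offline peer
print(accept_blocks(responses) == [honest_block, honest_block])  # True
```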

This is how nodes scattered across the globe, who do not know or trust each other, verify and validate transactions & blocks. The cost of malicious actions is prohibitively high, which discourages Byzantine behavior & encourages nodes to follow the consensus & other network protocols.

As a side note, last week we discussed server synchronization for application & data storage management. The Byzantine Generals' Problem is also relevant to such distributed data storage solutions & data centers, where data & applications must stay consistent across servers.

And we still have one pending question from last week:

When a new transaction or block is to be validated, do all the nodes "see" them at the same time?