Understanding Erasure Coding – Chainsafe at Sub0 Asia 2024

Explore erasure coding with Axay Sagathiya. Enhance data durability and efficiency in Polkadot's blockchain. Join ChainSafe.

Updated: June 12, 2024

In a recent tech talk at Sub0 Asia 2024, Axay Sagathiya, a software engineer at ChainSafe, explained erasure coding, a key technique for data protection and redundancy. Here’s an overview of his presentation.

What is Erasure Coding?

Erasure coding is a data protection method that ensures the durability and availability of crucial data without the excessive storage overhead associated with traditional data replication methods. Instead of creating multiple identical copies of data, erasure coding divides data into fragments and adds redundant information, enabling data reconstruction even if some fragments are lost.

How Erasure Coding Works

Imagine having crucial data that you absolutely cannot afford to lose. Traditionally, you might create identical copies of this data and store them in various locations. This is safe, but every extra copy costs as much storage space as the original. Erasure coding addresses this by dividing the data into fragments, adding redundancy, and distributing these fragments across multiple devices or nodes.

For example, if you have important data and six storage devices, erasure coding can create six fragments of this data, where four are original data fragments and two are redundant fragments (parity). With a code such as Reed-Solomon, any four of the six fragments are enough, so even if you lose up to two fragments, you can still reconstruct the original data from the remaining ones.
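The four-plus-two layout described above can be sketched in a few lines of Python. This is a toy illustration, not the scheme used by any particular system: it works over the small prime field GF(257) so the arithmetic stays readable, whereas production codes typically use binary fields. The names `encode`, `decode`, and `lagrange_eval` are hypothetical helpers chosen for this sketch.

```python
# Toy systematic erasure code: 4 data fragments + 2 parity fragments.
# Assumption: arithmetic is done modulo the prime 257 for readability only.
P = 257  # smallest prime above 255, so every byte value is a field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through the (xi, yi) points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k=4, m=2):
    """Split data into k data fragments and append m parity fragments."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    fragments = [[] for _ in range(k + m)]
    for off in range(0, len(padded), k):
        row = padded[off:off + k]
        points = list(enumerate(row))  # polynomial passes through (i, row[i])
        for i in range(k):
            fragments[i].append(row[i])  # data fragments hold the raw bytes
        for i in range(k, k + m):
            fragments[i].append(lagrange_eval(points, i))  # parity symbols
    return fragments


def decode(available, length, k=4):
    """Rebuild the data from any k surviving fragments ({index: fragment})."""
    if len(available) < k:
        raise ValueError("not enough fragments to reconstruct the data")
    chosen = sorted(available.items())[:k]
    out = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(idx, frag[pos]) for idx, frag in chosen]
        out.extend(lagrange_eval(points, i) for i in range(k))
    return bytes(out[:length])


message = b"erasure coding"
fragments = encode(message)
# Simulate losing one data fragment (index 1) and one parity fragment (index 4).
survivors = {i: f for i, f in enumerate(fragments) if i not in (1, 4)}
assert decode(survivors, len(message)) == message
```

Each group of four bytes defines a polynomial of degree at most three, and the two parity symbols are extra evaluations of that polynomial. Any four of the six values pin the polynomial down again, which is why any two fragments can go missing.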

Advantages of Erasure Coding

Improved Durability: Data can be reconstructed at any time, as long as enough fragments remain available.
Enhanced Availability: Fragments can be accessed simultaneously from different devices or nodes, so the data stays available even when some of them are unreachable.
Reduced Storage Space: No need to store entire redundant copies, thus saving storage space.
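To put a number on the storage saving, the short calculation below compares full replication with a data-plus-parity layout that tolerates the same number of losses. It assumes a code in which m parity fragments allow any m fragments to be lost (as with Reed-Solomon); the function names are hypothetical and exist only for this illustration.

```python
def replication_overhead(losses_tolerated):
    """Full copies that must be stored so the data survives that many lost copies."""
    return losses_tolerated + 1


def erasure_overhead(data_fragments, parity_fragments):
    """Bytes stored per byte of original data for a data + parity layout."""
    return (data_fragments + parity_fragments) / data_fragments


# Both setups survive the loss of any two devices.
print(replication_overhead(2))  # 3
print(erasure_overhead(4, 2))   # 1.5
```

Under these assumptions, replication stores three times the original data, while the four-plus-two erasure-coded layout stores one and a half times.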

Trade-offs of Erasure Coding

While erasure coding offers significant benefits, it also comes with trade-offs:

Computational Overhead: Encoding and decoding data require additional computational resources, potentially impacting system performance.
Slower Reconstruction: Recovering lost data is slower than reading a plain copy, because the missing fragments must be recomputed from the surviving ones.

Application of Erasure Coding in Polkadot

In the Polkadot ecosystem, erasure coding is used to ensure data availability and integrity. Polkadot’s architecture involves validators and collators working together to maintain the network. Here’s how erasure coding fits in:

Parachain Validators: Validators assigned to specific parachains (parallel chains) generate erasure-coded fragments of parachain blocks and their proofs of validity.
Distribution of Chunks: These fragments are distributed to corresponding validators on the relay chain.
Reconstruction: Even if some fragments are lost, the original data can be reconstructed using the remaining fragments.

Polkadot uses its own implementation of erasure coding, based on a polynomial approach that optimizes the encoding and decoding processes.

Formula for Reconstruction

Polkadot’s implementation uses a specific formula to determine, from the total number of validators, the minimum number of chunks required to reconstruct the original data. Roughly one third of the chunks suffice. For instance, with six validators, only two chunks are needed to regenerate the data.
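As a rough sketch, the threshold can be written as floor((n - 1) / 3) + 1 for n validators. This form is stated here as an assumption: it agrees with the six-validator example above and, to the best of our knowledge, with the rule in Polkadot's erasure-coding code, but the Polkadot source remains the authority. The function below is a Python illustration, not Polkadot's actual code, and rejecting fewer than two validators is likewise an assumption of this sketch.

```python
def recovery_threshold(n_validators):
    """Minimum number of chunks needed to rebuild the data (assumed formula)."""
    if n_validators < 2:
        # Assumption: erasure coding across a single validator is not meaningful.
        raise ValueError("need at least two validators")
    return (n_validators - 1) // 3 + 1


for n in (6, 10, 100, 1000):
    print(n, recovery_threshold(n))
# 6 2
# 10 4
# 100 34
# 1000 334
```

The threshold stays just above one third of the validator count, so the data remains recoverable as long as slightly more than a third of the chunk holders respond.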

Conclusion

Erasure coding offers a more efficient way than full replication to ensure data durability and availability, with lower storage overhead. As demonstrated in Polkadot, it is a vital component in maintaining a robust and scalable network.

For those interested in the technical details and implementation of erasure coding in Polkadot, Axay recommended checking out the research papers and resources provided in his presentation.