Episode 359

Celestia – The Layer-1 Data Availability and Consensus Solution

NB: Since the recording of this podcast, LazyLedger has changed its name to Celestia.

Celestia is a scalable general-purpose data availability layer for decentralized apps and trust-minimized sidechains. It is a minimal viable blockchain that does timestamping and block ordering.

Think back to Bitcoin in the early days, before Ethereum. Layer-2 systems were being built on top of Bitcoin and were leveraging Bitcoin’s consensus layer. This is what Celestia is doing, although it is purpose-built and scalable for exactly this use case. The implementation details are a lot more complex, and the vision is to create a modular, pluggable Layer-1 that does nothing but consensus and data availability. It is designed for people who want to create their own blockchain without having to bootstrap their own consensus network.

The project is yet to launch; however, we had Ismail Khoffi, Co-Founder and CTO, and Mustafa Al-Bassam, Co-Founder and CEO, on the show to give us a deep technical overview and the vision of Celestia.

Topics discussed in the episode

  • Ismail and Mustafa’s backgrounds, and how they got into the crypto space
  • How Ismail and Mustafa met and created Celestia
  • The Data Availability paper co-written with Vitalik, and an introduction to Celestia
  • The purpose and function of Celestia
  • How data availability works and the advantages and disadvantages of their ledger
  • How transaction fees work
  • The interoperability aspect
  • Honest validator assumptions and data availability
  • Non-interactive proofs on Celestia
  • The interaction between Celestia and Cosmos SDK
  • How Celestia will compete with other Layer Ones, in particular Filecoin
  • When the ledger will be launched

Brian: Hi. We’re here today with Ismail Khoffi and Mustafa Al-Bassam, the co-founders of LazyLedger. LazyLedger is this very innovative new kind of layer one blockchain protocol that we’re going to dive into today. Thanks so much for joining us today.

Mustafa: Thank you for having us.

Ismail: Thanks a lot. Very excited to be here.

Brian: Absolutely. Mustafa, you have a very interesting background. I was watching a talk you gave at some hacker conference where you talked about the work you did in Anonymous and this project called LulzSec. Do you mind going into that a little bit? What’s your history there, and how did that lead you to crypto?

Mustafa: Sure. That was actually a very long time ago. When I was a teenager, I was involved in various hacker groups, including Anonymous, and I co-founded a hacker group called LulzSec, which compromised many corporations and governmental entities. This was when I was about 15, 16 years old. In terms of how it relates to crypto, not much, but it was a very interesting time. I’ve moved on to other things.

Brian: I guess there’s at least some similarity, in that they’re both potentially disruptive activities going against the status quo. Do you see some similarity between the two fields?

Mustafa: Yeah, in that sense it’s quite similar to crypto: the political ideals and philosophy of the hacker movement and the ideals of the crypto cypherpunk movement are all interlinked with each other. The ideals that drove my hacktivism as a teenager are the same ideals that make me very interested in cryptography, which is to give people more freedom. Back then it was motivated by freedom of speech; financial transactions are a form of speech, and cryptocurrency allows people to transact money freely.

Sunny: How did you first get involved with the crypto space? What was your first foray into the field?

Mustafa: I first heard about Bitcoin in 2010. Even before I heard about Bitcoin, I was always very interested in peer-to-peer systems in general. I was very interested in BitTorrent, for example, and I was very closely following a website called The Pirate Bay, which is still the biggest torrent tracker and where people were uploading copyrighted movies and software. That idea of creating decentralized protocols was very interesting to me, so when I heard about Bitcoin, it was naturally very interesting to me.

Even though I heard about it in 2010, I only really got involved in a full-time capacity in 2016, when I started doing a PhD in London at University College London, focusing on the topic of on-chain scalability. I was specifically interested in that topic because I had been very closely following the Bitcoin community from 2010 onwards, and I was always thinking about the one-megabyte block size limit in Bitcoin. This was even before the massive debate in the Bitcoin community, when Bitcoin’s blocks started filling up and there was a huge debate about how we should scale Bitcoin and whether on-chain scalability is even possible to do securely.

Even before that, I was actually very worried about this block size limit, and prominent Bitcoin community members like Greg Maxwell told me that this wasn’t a problem and we shouldn’t worry about it. I started doing research on on-chain scalability to figure out how we can scale blockchains securely in a decentralized way, on layer one. As part of my PhD, I was a co-author on a paper called Chainspace, which was one of the first proposals for a sharded blockchain design. That was spun out into a company based in Gibraltar, with the developers based in London, and that company was later acquired by Facebook. Now many of the people involved are working on Facebook’s Libra project.

Sunny: Ismail, how about you? I know you also have a lot of background in distributed systems and peer-to-peer. How did you get involved with this space?

Ismail: I think the timeframe is actually very similar to Mustafa’s. I had been interested in distributed systems and decentralized systems for a while. After I finished my studies, around 2015, I don’t remember exactly when, I got really interested and tried to get more involved. I was working at a Fraunhofer research institute and proposed to do something in that direction. I believed this would get bigger and more relevant in the future, but there was no space for that in that job. So I was looking around for something where I could dive more deeply into research, but also work a bit on actually implementing real-world systems.

I found exactly that at DEDIS, the Decentralized and Distributed Systems lab of Bryan Ford at EPFL, where one of the goals was also to scale Bitcoin. Bryan does basically everything. That was a very chaotic and very interesting year where I learned a lot. I co-authored a bunch of papers there as well, one about scaling Bitcoin called ByzCoin, and we also did a bunch of work in privacy-preserving technology. That’s where I learned so much in very little time, because I had to implement a bunch of these systems in a team of three engineers, which is quite unique, I think, at a university or in academia.

I briefly started a PhD as well and did a detour in a PhD internship at Google, where I worked on something like CONIKS. Google has this project called Key Transparency, which is very similar to CONIKS; they use a log-based design, where they don’t just chain the data, they put the Merkle tree roots in the log. After that, I decided there was so much happening, so much going on, that it was probably not the best idea to stay in academia. Around that time I met with Zaki in London, and he was already kind of hinting that Tendermint might be hiring. Almost a year later, I started at Tendermint as well. There I got involved, knee deep, in the implementation side of layer one blockchains.

Sunny: So how did you guys end up meeting and starting to work on LazyLedger?

Ismail: Good question. I had known Mustafa for quite a while from Twitter and online. I’ve been going to these hacker conferences, the Chaos Communication Congress, where he’s also always been present. I think we met and talked more during my time at EPFL; we met at academic conferences, and he approached me about LazyLedger at the camp. He had this research paper, and when I first read it I felt like I didn’t fully understand it and had a bunch of questions, but I immediately saw the potential there. We started working on it shortly after.

Mustafa: I asked Ismail to join LazyLedger in the middle of a field about 50 miles north of Berlin, in a bunch of tents.

Ismail: Like a little hacker village.

Sunny: LazyLedger, a lot of it is derived from the data availability paper that you co-authored a couple of months ago. With that paper, did you have in mind that you wanted to create the LazyLedger project and the paper was sort of the white paper for it, or did the paper come first, and then after that you were like: hey, how do we productionize this?

Mustafa: You’re referring to the paper that I wrote with Vitalik. To give some context about that paper: it was about something called fraud and data availability proofs, and it was basically solving a fundamental problem in sharding. It was kind of the missing piece of the puzzle, or maybe the biggest missing piece of the puzzle, to make sharding complete, which is: if you increase on-chain throughput, if you increase the number of transactions that people can post on the chain, regardless of how you do that, whether through sharding or increasing the block size limit, you also need a way for people to verify the entire chain efficiently.

The question is how you can do that without requiring everyone to download every single transaction in every single shard to make sure every single transaction is valid. That’s very important for scaling, because the whole point of blockchains is that they’re decentralized. The reason they’re decentralized is that anyone can publicly verify that the blockchain is correct and the transactions are valid, but you can’t do that cheaply if there are a lot of transactions to verify. That’s why the Bitcoin community has been reluctant to increase the block size limit to more than about one megabyte: it wouldn’t be possible anymore for people to verify the blocks using Raspberry Pis, basically.

There’s this idea of fraud proofs. The idea of fraud proofs is that instead of verifying every transaction yourself, you can assume that the blocks are valid, specifically the blocks that consensus has agreed on. If they’re not, then any node on the network can generate a very succinct and small proof called a fraud proof, more specifically what’s called a state transition fraud proof, to prove to you that a block is invalid, and you can reject that block. Therefore you can indirectly verify the whole chain with very little resources.

To be able to do that, you need another prerequisite called data availability proofs, because you can only generate a fraud proof for a block if the data for that block has actually been published by the miner or block producer. What the miner might do is publish only the block header, but not the actual data in that block. What data availability proofs allow you to do is efficiently convince yourself that the block data has actually been published, without downloading the block yourself.
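To make that concrete, here is a minimal sketch in Go of a state transition fraud proof check. The FraudProof type and the applyTx re-execution function are illustrative assumptions, not LazyLedger’s actual interfaces:

```go
// Hypothetical types for illustration; not LazyLedger's real interfaces.
type FraudProof struct {
    PreStateRoot  [32]byte // state root before the disputed transaction
    PostStateRoot [32]byte // state root the block header claims afterwards
    Tx            []byte   // the disputed transaction
    Witnesses     [][]byte // Merkle proofs for the state the tx reads/writes
}

// verifyFraudProof re-executes the single disputed transaction against the
// pre-state. If the resulting root differs from the root committed in the
// header, the proof is valid and the light client rejects the block.
func verifyFraudProof(fp FraudProof,
    applyTx func(pre [32]byte, tx []byte, w [][]byte) [32]byte) bool {
    return applyTx(fp.PreStateRoot, fp.Tx, fp.Witnesses) != fp.PostStateRoot
}
```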

LazyLedger is using this data availability proof primitive to make it very efficient for people to prove to themselves that the data has been published in blocks. However, the idea of LazyLedger itself actually came about long before this fraud proof paper. The idea of LazyLedger came about when I was starting my PhD. I was thinking about what alternative design paradigms we could use to build layer ones, but more specifically: what is the most minimal layer one you can build, the most modular, basic layer one you can use to build a cryptocurrency on, or to allow cryptocurrencies to exist, and how can we scale that? This data availability primitive made that much more practical and scalable.

Brian: Cool, thanks. That was a great explanation and a great intro to LazyLedger. Before we go into the details of LazyLedger, I’m curious what you want the impact of LazyLedger to be.

Mustafa: As I mentioned, you can think of LazyLedger as a very basic layer one. I would call it a minimum viable layer one. If you stripped back layer one to its core components and made it as scalable as possible using existing technology, you would get LazyLedger. In simple terms, it’s basically a blockchain that people can dump arbitrary data onto. That data gets ordered, and the consensus doesn’t care about or process what that data is. It’s basically a blockchain for dumping data on, where the data gets ordered, and you can use this primitive to build all kinds of applications and blockchains.

To expand on what this is really useful for, the overall vision is as follows. When Bitcoin came about, its technical architecture used a blockchain for one application. That application is cryptocurrency, or payments, or store of value, depending on who you talk to. It was basically a single-purpose blockchain. Then Ethereum came about. The idea of Ethereum was: let’s create a general-purpose blockchain with a general-purpose virtual machine, where people can upload smart contracts for any application.

Those architectural visions were stark opposites of each other. At the same time, there was Tendermint, which is more similar to the Bitcoin vision. The idea of Tendermint is that it allows you to create your own blockchain for your application. At the moment, if you want to create an application blockchain, there are only really two ways to do it. The first is to use a shared execution environment, a shared computer, like Ethereum. You code up your smart contract, you upload it to Ethereum, and your smart contract runs on the same machine, the same chain, as everyone else’s smart contracts. That’s basically the world computer model that Ethereum created.

The problem with that is that the Ethereum Virtual Machine is very limited in terms of what you can deploy on it, and it has very high gas costs. If you want to build applications that the Ethereum Virtual Machine doesn’t natively support, such as complicated cryptography or cryptographic proofs that don’t have built-in functionality in the EVM, you basically have to build your own blockchain using something like Tendermint.

The drawback of Tendermint, or any solution like it, is that there’s a huge overhead to creating your own blockchain. There’s a lot of work involved. You also have to create your own consensus layer, and nowadays that consensus layer is usually based on proof of stake, which is what Tendermint provides for you. Bootstrapping a proof of stake network takes a lot of work: you have to do a token sale, make sure the token is distributed and decentralized, make sure there are lots of validators, and so on and so forth.

So this is where LazyLedger comes in: it provides a modular, pluggable layer one that does nothing but consensus and data availability. The LazyLedger layer one does not do smart contracts or execution. It does the core things a layer one does and nothing else. That makes it very useful as a pluggable layer one for people who want to create their own chains but don’t want the overhead of having to create a consensus network. Instead, they can simply plug LazyLedger into their chain as a consensus layer.

Sunny: Could you maybe walk through a little bit of the architecture, what that would look like? One way of thinking about it is that you’re building the minimum viable blockchain, in the sense that it’s basically just a timestamping system, right? You’re getting consensus on the ordering of transactions, but nothing else about the transactions. This was actually the idea behind a lot of the early Bitcoin extensions, things like colored coins, for example. Could you talk a little about how these compare to those?

Mustafa: Sure. Before Ethereum existed, there were various projects to extend Bitcoin to support many applications. The overall basis of those proposals was to embed data into Bitcoin transactions using an operation code called OP_RETURN. When submitting a Bitcoin transaction, you can attach data to it, which allows you to timestamp or order arbitrary data using the Bitcoin blockchain. That’s why projects like Colored Coins and Mastercoin were using Bitcoin as a data layer to dump data on. That allowed them to create all of these other applications, like smart contracts, but more specific applications; some people would say they were abusing the Bitcoin blockchain by dumping data it was not designed to receive.

The difference with LazyLedger is that it’s designed for that specific purpose. It’s designed to be more scalable for that specific purpose in its architecture, and also in its primitives, including data availability proofs, which allow nodes to verify the entire ledger without having to download every block. That’s what you would have to do in Bitcoin right now: if you were running a Mastercoin node, you would need to download every single Bitcoin block to make sure the Mastercoin data is valid.

Sunny: Isn’t one of the nice features of proof of work that, in a way, you could think of it as a consensus protocol that directly incentivizes data availability? Because if you’re not making your data available, no other miner is going to build on top of your blocks. Maybe that’s a reason why Bitcoin wouldn’t need data availability proofs. If I had data and I wanted to make it available, I could put it on the Bitcoin blockchain and have a very high level of certainty that the data will be available, just because of the nature of the Bitcoin network, how widely distributed it is, and the fact that its proof of work incentivizes widespread distribution of the data. Why wouldn’t I want to just do that?

Mustafa: There are several things to this. With Bitcoin that’s okay, because Bitcoin is a single execution environment, but the problem is that if you want to create a general-purpose data availability layer, you can’t just rely on the word of the consensus to tell you that the data is available, for two reasons. First of all, the threat models of Bitcoin and Ethereum are such that even if there’s a 51% attack on the consensus, the consensus cannot steal money or insert malicious state into the chain; the only thing it can do is censor or revert transactions.

With solutions like optimistic rollup sidechains that use a data availability layer like LazyLedger, if you configure the nodes in those systems to simply trust the consensus to tell you that data is available, then a dishonest consensus could potentially lie to you and inject invalid state into the system that you will never know about. That significantly increases the incentive for doing a 51% attack, because the financial reward is no longer just double spending on a transaction; the reward becomes, potentially, injecting state that generates unlimited money or steals people’s money.

That’s the first reason. The second reason is that with a general-purpose data availability layer, there may be many different applications or chains, independent of each other, that don’t necessarily connect to each other but are using the same data layer. You don’t want to end up in a situation where you have to download irrelevant data from other chains to verify the data for your chain. Data availability proofs allow you to verify that the data is available without downloading it.

Sunny: So one of the advantages here would be: let’s say I wanted to store data on Bitcoin. I had this idea once of storing all the Tendermint Cosmos Hub validator set changes on the Bitcoin blockchain, as long-range attack prevention. The problem there is that finding one of these transactions on the Bitcoin blockchain is really hard; you have to scan through every transaction. One of the benefits of having something that’s specialized for data availability is that it’s much easier for me to query for particular properties of transactions. Yeah?

Mustafa: Exactly. With LazyLedger, we expect the chain to be used by many different applications or sidechains or execution environments that co-exist but don’t necessarily communicate with each other. We need an efficient way to allow nodes in each of these applications to query the data relevant to their application, without having to worry about the data for other applications. The way we achieve this is with something called a namespaced Merkle tree, which allows you to query that tree for the specific messages of specific applications.

Each application has its own namespace identifier. You query for the namespace identifier you’re interested in, and you can very efficiently verify that the node you’re talking to has given you all the relevant transactions for that application that are in that block.
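As a rough illustration of the idea, here is a simplified namespaced Merkle tree walk in Go. The node layout and 64-bit namespace IDs are assumptions for this sketch, not the production data structure (in the real tree the namespace bounds are also hashed into each node, which is what makes the completeness of a query provable):

```go
// Simplified namespaced Merkle tree node: every subtree is annotated
// with the minimum and maximum namespace it covers.
type nmtNode struct {
    minNs, maxNs uint64
    left, right  *nmtNode
    data         []byte // leaf payload; nil for inner nodes
}

// collect gathers all leaves for one namespace. Subtrees whose namespace
// range cannot contain the target are skipped entirely, so a node only
// touches the parts of the block relevant to its own application.
func collect(n *nmtNode, ns uint64, out *[][]byte) {
    if n == nil || ns < n.minNs || ns > n.maxNs {
        return
    }
    if n.data != nil {
        *out = append(*out, n.data)
        return
    }
    collect(n.left, ns, out)
    collect(n.right, ns, out)
}
```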

Brian: You mentioned that in Ethereum, for example, you have this transaction ordering, but you also have transaction execution. You guys are getting rid of that part. I’m curious, what are the downsides of this? What do you give up?

Mustafa: Yeah, it’s a good question. You can think of it basically as pushing the execution to layer two. In terms of what you give up: to clarify, there’s no user execution environment on LazyLedger, therefore developers have to define their own execution environments using something like the Cosmos SDK, the Ethereum Virtual Machine, or one of the many optimistic rollup sidechain implementations out there.

Brian: Just on that: developers have to define their own execution environment. Using Cosmos as an example, can you explain what that would actually look like if somebody wanted to do something like that?

Mustafa: Sure. Effectively, as I mentioned, LazyLedger is basically a chain where people dump any data they want onto it. That data gets included in blocks, the blocks get ordered, and each application has its own namespace ID. Say you want to build a Cosmos SDK app, but you don’t want the overhead of having to create your own proof of stake network using Tendermint. What you would do is create your own Cosmos SDK chain, and then post the blocks of that Cosmos SDK chain directly onto LazyLedger. That would basically give you consensus for the Cosmos SDK chain, because the blocks immediately get ordered. The blocks live within the LazyLedger blocks; it’s like a sub-block, if you like.
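To make that concrete, here is a sketch of the submission path using a hypothetical client interface; LazyLedger’s real API may differ:

```go
// MessageSubmitter is a hypothetical LazyLedger client interface.
type MessageSubmitter interface {
    SubmitMessage(namespaceID uint64, data []byte) error
}

// This app's namespace ID; the value is purely illustrative.
const myChainNamespace uint64 = 0x2a

// publishBlock posts a serialized Cosmos SDK block under the chain's
// namespace. The layer one only orders and stores these bytes; all
// execution happens in the app's own nodes.
func publishBlock(c MessageSubmitter, serializedBlock []byte) error {
    return c.SubmitMessage(myChainNamespace, serializedBlock)
}
```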

Brian: Where do you keep agreement about how to execute those blocks? Let’s say we now have an upgrade. In Cosmos, we know this process: on-chain governance and then a hard fork. What would that look like here?

Mustafa: That’s all defined on layer two. The LazyLedger layer one has no concept of which transactions are valid or not. Instead, all of those concepts are defined by the users, or by the Cosmos SDK app itself. All the users of the Cosmos SDK app will know which transactions are valid or not, because they have the code for your Cosmos SDK app. As I said, the code for your Cosmos SDK app basically defines the execution and which transactions are valid or not.

Ismail: Maybe I can add to this: you can still have governance proposals and all of this on the optimistic rollup. You could basically have something built completely with the Cosmos SDK, but from an implementation point of view, as we’re looking at it right now, you can use all the modules, but on the optimistic rollup you replace Tendermint, which additionally provides consensus that you don’t need there, with another so-called ABCI client. We would implement our own node that fulfills the Tendermint side of the ABCI contract, and this would be used instead. Then you could ideally use all the SDK modules, for instance.

Sunny: Don’t you still need consensus on the state root? The LazyLedger side is giving you consensus on transaction ordering, but that would mean anyone who wants to interact with this chain has to actually run all the software. Realistically, clients oftentimes just want to query something that’s currently in state. For that, you want some sort of consensus on the state root.

Mustafa: Yeah, that’s right, and I think this is a good point. Let me talk a little bit about optimistic rollups, since I’m assuming some of your listeners might not be familiar with them. This is basically our answer to consensus on state roots, if you like. Going back to the Cosmos SDK example: you might create your Cosmos app and post the blocks for the Cosmos app onto LazyLedger. These blocks have block headers, these block headers contain state roots, and these state roots can be used by light clients.

That answers your question partially, but then the question becomes: what if people post invalid blocks? This is where optimistic rollups come in. Optimistic rollups are basically a sidechain technology where you can create sidechains that use a layer one as a data availability layer. The whole concept is on-chain data availability but off-chain execution: the execution for your sidechain happens off chain, but the data availability for your sidechain happens on chain, on some other layer like Ethereum or LazyLedger.

In optimistic rollups, there is a node called the aggregator. The job of the aggregator is to collect transactions for the sidechain, aggregate them into blocks, and then submit those blocks to the data availability layer. I should add, by the way, that in many of these optimistic rollup proposals, anyone can be an aggregator. Now, the question is: what happens if the aggregator submits an invalid block?

What would happen is that a fraud proof can be generated, because the data for that block is available to everyone, since it’s been published on the data availability layer. Because that’s the case, anyone who is watching the chain can generate a fraud proof to prove that the aggregator has generated an invalid block with an invalid state root.

Sunny: So is LazyLedger going to provide this place to do fraud proofs or would that happen on another chain, like Ethereum?

Mustafa: Well, the fraud proofs themselves don’t have to be posted on chain. The fraud proof system is also at layer two.

Sunny: I mean the challenge game. Where would the challenge game take place?

Mustafa: That depends on the layer two. All of this execution is irrelevant to the layer one. On layer two, there are different ways of doing it. I guess the simplest is that each optimistic rollup chain has its own sub-network where the users of that chain communicate with each other. If someone generates a fraud proof, then that fraud proof gets distributed and propagated across that sidechain’s peer-to-peer network. That allows all the users on that sidechain to verify whether the block is actually valid.

Sunny: You guys have a third team member, John Adler. I believe he also has another project he works on called Fuel Labs, which is an optimistic rollup system on Ethereum. What is the relationship between these two projects?

Mustafa: John is Chief Research Officer at LazyLedger. He’s actually the person who proposed the idea of optimistic rollups. Fuel Labs is basically an optimistic rollup sidechain library for Ethereum; specifically, it’s an EVM-compatible optimistic rollup implementation that allows you to do payments. The idea is that in the future, Fuel Labs will also support other data availability layers, such as LazyLedger. The main advantage of that is that because LazyLedger is designed from the ground up to be a scalable data availability layer, you’ll be able to process many more transactions on it than on Ethereum.

Sunny: I’m more familiar with the Optimism team’s rollup design, where I send transactions to the operator and the operator puts them into the calldata of an Ethereum block. In your model of LazyLedger plus a rollup, are users submitting transactions to the operator, who then builds a block and stores it on LazyLedger, or are users submitting transactions directly to LazyLedger, and the operator picks them up from the LazyLedger chain?

Mustafa: It’s the first one. The user submits the transactions directly to the aggregator, and then the aggregator aggregates them into a single block. That’s the block that gets posted onto LazyLedger. I should say that the original model of LazyLedger was actually the second option. The LazyLedger paper was released before the idea of optimistic rollups came about, and the model was simply that users submitted transactions directly on chain. But that has a major drawback: it means every user has to process every other user’s transactions, and light clients can’t be supported, because there are no fraud proofs involved and no aggregator to create a state root.

Brian: Curious about transaction fees in this world. Would you have transaction fees on the LazyLedger level and then potentially also transaction fees on the level of a particular application running on it? How do you see that working?

Mustafa: Yeah, there’ll definitely be transaction fees on the LazyLedger main chain. This will be very straightforward: it’s basically storage fees, because on the LazyLedger main chain, nodes don’t process or care about the contents of your messages and transactions, they simply take them and put them on the chain. There are no execution costs, there’s no gas.

Brian: Like in Bitcoin, right, where it’s basically dependent on the size.

Mustafa: Exactly. The transaction fees will be solely dependent on the size of the transactions. That being said, you can implement gas fees on the execution environments or the layer two chains that people build on LazyLedger, if that would be specifically useful. If you wanted to create a general smart contract platform using LazyLedger as a data layer, then you can implement gas fees. However, the main vision I envision for LazyLedger is that people will use it to build app-specific chains: chains for one app, where each app has its own chain. In that model, you don’t really need gas fees, because you can just directly hard-code the fees for the actual transactions in your app, since there’s a limited number of methods in the app.
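In other words, the layer one fee can be a pure function of size. A trivial sketch, with an assumed per-byte rate (both names are illustrative):

```go
// storageFee prices a message purely by its size, as described above.
// There is no gas metering, because the chain never executes the bytes.
func storageFee(messageBytes int, feePerByte uint64) uint64 {
    return uint64(messageBytes) * feePerByte
}
```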

Brian: I’m also curious: you have these different apps, so to what extent are different apps running on top of LazyLedger interoperable, and how would that work?

Mustafa: Sure. The interoperability aspect is also a layer two concern and depends on the execution environment that the chains use. If the chains are using a Cosmos SDK based execution environment, then they could use IBC, short for the inter-blockchain communication protocol, which is the protocol that Cosmos has developed for Cosmos chains to communicate with each other. That’s on layer two. On the layer one side of things, we want to make it possible for people on other layer ones, Ethereum for instance, to develop smart contracts that use LazyLedger as a data availability layer.

For example, you might develop an Ethereum smart contract that is very data heavy, and you might find it cheaper to post that data on LazyLedger, but you will need a way for that smart contract to verify that the data has been posted on LazyLedger. We’re developing a LazyLedger–Ethereum bridge, where the LazyLedger block headers are posted onto the Ethereum chain. That would allow you to verify, via an Ethereum smart contract, that certain pieces of data have been included in LazyLedger.
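The core check such a bridge enables is a Merkle inclusion proof against a posted header. Here is an illustrative sketch in Go; the real check would live in an Ethereum smart contract, and LazyLedger’s tree uses namespaced hashing rather than plain SHA-256:

```go
import "crypto/sha256"

// dataIncluded verifies a Merkle branch from a data chunk up to the data
// root taken from a LazyLedger block header posted on Ethereum.
func dataIncluded(dataRoot [32]byte, leaf []byte, index uint64, branch [][32]byte) bool {
    h := sha256.Sum256(leaf)
    for _, sib := range branch {
        if index%2 == 0 { // our node is the left child at this level
            h = sha256.Sum256(append(h[:], sib[:]...))
        } else { // our node is the right child
            h = sha256.Sum256(append(sib[:], h[:]...))
        }
        index /= 2
    }
    return h == dataRoot
}
```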

Sunny: One of the main things about this LazyLedger project is that it’s not just that you could have a Tendermint chain chugging along, storing transactions, with all this layer two stuff on top. You guys also provide these data availability guarantees for all the transactions. Before we even get into how those proofs work, I’m actually struggling a little bit to see why they’re necessary, because if you have a Tendermint chain, isn’t that already giving you some sort of data availability? Let’s say you have a hundred validators on your Tendermint chain.

In Tendermint consensus, honest validators are not supposed to sign prevotes unless they have the full block proposal. As long as one third of validators are honest, in the sense that they’re not just signing stuff without receiving the proposal, you’re guaranteed that the honest validators at least have access to the data. Why is that not sufficient?

Mustafa: Sure, this goes back to the threat models. One security model for data availability might be as you described: let’s just assume that one third of validators are honest and will only sign valid blocks. That sounds very reasonable, but when you actually compare this to the Bitcoin and Ethereum security model and understand the implications, it’s much less secure than those models. With Bitcoin and Ethereum at the moment, if there’s a 51% attack and the consensus goes rogue, the worst thing that can happen is that the rogue, malicious consensus can either censor transactions or undo transactions.

They can’t inject or insert invalid transactions, and they also can’t hide transactions. They can’t make data unavailable, because the validity rule on Bitcoin and Ethereum is such that full nodes also verify the data availability of the chain. If you start assuming that one third of validators are honest for other things, not just for double spending, things like data availability and validity of state, that completely changes the threat model. That completely changes the incentives for doing such an attack.

At the moment there isn’t really a big incentive for doing a 51% attack, from an economic perspective at least, because the worst thing you can do is undo transactions and do double-spend attacks. In the best-case scenario, maybe you can buy a Tesla or some expensive car with Bitcoin, let’s say it’s worth 100K or something, then do a 51% attack and undo that transaction, and you’ve got a free sports car. That’s the Bitcoin model. If you start assuming the consensus is honest for other things, like data availability, then the consensus can steal everyone’s money. That’s a lot more profitable than getting a free sports car.

Sunny: Yeah, but you can turn that one third honest into one third rational by adding in a challenge game. Here’s an example. Let’s say I take Tendermint, and then I say: every validator gets a random challenge saying, give me the Nth leaf in the Merkle tree. Every validator has to include their answer with their next vote in consensus. If they don’t, they’re slashed, right? Or their votes just don’t get counted. Now you suddenly turn that honesty assumption into a rationality assumption, where every validator now does have an incentive to say: I need the data before I sign a prevote, because if I don’t get the data, I’m not going to be able to answer the challenge next time. Now you’re not even depending on honesty anymore.
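A minimal sketch of the naive challenge game Sunny describes, with hypothetical types and an assumed verifyBranch helper that checks a Merkle branch:

```go
// Challenge asks a validator for one randomly chosen leaf of the block.
type Challenge struct{ LeafIndex uint64 }

// ChallengeAnswer accompanies the validator's next consensus vote.
type ChallengeAnswer struct {
    Leaf   []byte
    Branch [][]byte // Merkle branch from the leaf to the block's data root
}

// checkAnswer decides whether the vote counts (or the validator is
// slashed): a validator that never downloaded the block cannot produce
// a valid branch for a randomly chosen leaf.
func checkAnswer(dataRoot [32]byte, ch Challenge, a ChallengeAnswer,
    verifyBranch func(root [32]byte, idx uint64, leaf []byte, branch [][]byte) bool) bool {
    return verifyBranch(dataRoot, ch.LeafIndex, a.Leaf, a.Branch)
}
```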

Mustafa: Right. What you’re proposing is basically a naive protocol to prove data availability, which is essentially what we’re doing, so at that point I think we agree that you do need to verify data availability. I guess what you’re saying is different in that, in our design, the end clients themselves are verifying data availability, whereas you’re incentivizing miners to be honest. What you’re proposing is basically something similar to a proof of custody scheme. There are two problems with what you proposed. The first problem is that it allows the miners to prove that they have the data, but it does not prove that they’ve actually published the data, because they’re only publishing a very small piece of it.

The second problem is related to the first: they only prove that they have a very small part of the data, when in reality you need 100% of the data to be available, because you can hide an invalid transaction in a very small part of the block, a hundred bytes.

Sunny: But it’s random, right? The part they’re challenged with is going to be random, so they need to have the whole block to guarantee they can answer a random challenge. The point here is that you only need one honest validator to make sure the data is available. You’re no longer depending on all validators being honest, or the proposer being honest, or one third being honest; as long as you have one honest validator, the data will be available.

Mustafa: How much of the block will they have to publish in the challenge? Let’s say 1%, for example. That’s not good enough, because it means there’s a 99% chance of them winning the challenge without actually having the data. That’s why we use erasure coding in data availability proofs, to get a very high probability guarantee that the data has actually been published. I’m happy to go into a bit of detail to explain how the scheme works.

Sunny: Yeah. That’d be great.

Mustafa: Yeah, that’s a good place to start. First of all, I should explain erasure coding. Erasure coding is a very old mathematical primitive, first discovered, I think, in the fifties, and used in all kinds of technology: CD-ROMs, satellites, and so on. What it allows you to do is this: if you have a piece of data, let’s say one megabyte big, and you lose half of that data, say 500 kilobytes of it, you can still recover the entire one-megabyte file with only half the data.

Let’s say you have a 500-kilobyte file. You apply this erasure coding scheme to it, which blows the file up to one megabyte. The extra 500 kilobytes are not your original data but what’s called the erasure code. I won’t go into the mathematics of it, but it basically creates extra redundancy for your file, such that if you lose half of the data, you can recover the whole thing thanks to this extra part of the file, the erasure code.
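As a sketch of that rate-1/2 extension in Go, using the github.com/klauspost/reedsolomon library; note that the LazyLedger paper’s actual scheme is two-dimensional, so this one-dimensional version only illustrates the recovery property:

```go
import "github.com/klauspost/reedsolomon"

// extend takes 4 equally sized data shards and appends 4 parity shards,
// doubling the size; afterwards, any 4 of the 8 shards recover the data.
func extend(dataShards [][]byte) ([][]byte, error) {
    enc, err := reedsolomon.New(4, 4) // 4 data shards, 4 parity shards
    if err != nil {
        return nil, err
    }
    shards := make([][]byte, 8)
    copy(shards, dataShards) // shards[0..3]: the original data
    for i := 4; i < 8; i++ {
        shards[i] = make([]byte, len(dataShards[0])) // room for parity
    }
    if err := enc.Encode(shards); err != nil { // fills shards[4..7]
        return nil, err
    }
    // To recover after losses: set up to 4 missing shards to nil and
    // call enc.Reconstruct(shards).
    return shards, nil
}
```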

In Ethereum, when a miner creates a block, they commit to a Merkle root of all the data in that block. What we say is: instead of committing to the Merkle root of the block data, you commit to the erasure-coded version of that block. If the block was originally one megabyte big, with one megabyte of transactions, you apply the coding scheme to it and it becomes two megabytes, where the extra megabyte is the code itself. Then you commit the Merkle root of that data to the block header.

That creates this property: suppose a miner is malicious and wants to hide even one single byte in that block. They have to withhold half the block, because they can’t just withhold one byte; you could recover that byte from the erasure code. The only way to hide that byte is to withhold half of the block. That makes it possible to create a challenge scheme based on random sampling. Let’s say you’re a client, or a node, that wants to check that the full data in the block has been published, but you’re too lazy, or you don’t have the resources, to download the entire block.

What you do is ask anyone on the network to give you, for example, 10 random pieces of the block. For each sample you ask of that block, there’s a 50% chance that you will hit the portion of the block that has been withheld, right? If you do two samples, there’s a 75% chance. If you do three samples, there’s an 87.5% chance, and so on and so forth, until with 16 samples there’s a greater than 99.99% chance that you’ve sampled from the half of the block that the miner has withheld. In that case you will not get a response to your sample requests, and you can assume that the block is not available.
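To make the arithmetic explicit: each uniformly random sample independently lands in the withheld half with probability 1/2, so k samples detect withholding with probability 1 − 2⁻ᵏ. A small Go helper:

```go
import "math"

// detectionProbability returns the chance that at least one of k random
// samples lands in the withheld half of the erasure-coded block.
// k=1: 0.5, k=2: 0.75, k=3: 0.875, ..., k=16: ~0.99998.
func detectionProbability(k int) float64 {
    return 1 - math.Pow(0.5, float64(k))
}
```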

Sunny: Doesn’t that assume that the miner pre-decided what to withhold? Let’s say the original data is a hundred bytes and the new encoded data is 200 bytes. No matter what I request, for the first 99 requests the response will be: oh yeah, here’s the thing. It’s only when I hit the hundredth request that the miner says: I’m going to stop now, I’m not going to reply.

Mustafa: Yeah, that’s exactly right. For this to work, there has to be a minimum number of nodes making enough sample requests on the block that the miner is forced to release half of the block. Under a naive, basic network model, the miner successfully passes all of the sampling challenges for those first few light clients. Whether that’s acceptable or not depends on how big the network is. In Bitcoin, for example, you might model one million light clients, going by Google Play’s Bitcoin wallet download statistics. With reasonable parameters, you can then only fool a couple of hundred light clients. That’s a very small number of light clients, and the attack is not worth the cost.

If that’s not acceptable to you, or you have a much smaller network, then you can use a more advanced network model, where you make each sampling request through a mix network, or you use an onion network like Tor and add a mix network on top of that. You basically make it so that each sampling request is sent at a uniformly random time, which makes it impossible for the miner to link each sample request to a specific light client. That basically thwarts the attack.

Sunny: Is the idea here that light clients would inform each other? Let’s say there are a hundred light clients talking to a validator. For 90% of them the validator ends up not responding; 10% of them got lucky and only requested stuff that wasn’t withheld. How are light clients supposed to inform each other, saying: maybe in your challenge game with the miner it looked available, but for me it didn’t work out?

Mustafa: There’s no need to inform each other. The data availability challenge itself does not require any cooperation. You only need to verify it for yourself to be convinced.

Sunny: Let’s go back to the attack I was describing earlier, where the miner gives 49% of the data to anyone, and it could be a different 49% each time. If the miner only ever gives out 49%, all the light clients now need a way of combining their own 49% to recreate the original.

Mustafa: Right, they need to cooperate to share data with each other, but they don’t need to cooperate on the actual question of whether the data is available or not. They do have to work together to reconstruct the data. You can use BitTorrent for this, for example: you can represent the block as a BitTorrent file, and different peers in that torrent can have different chunks that they share with each other. You can use any peer-to-peer network for this, IPFS or whatever.

Sunny: One of the challenges here is: how do you turn this into a non-interactive proof? I see this as a very interactive game. I can convince myself that the data is available, but let’s say I have a rollup on Ethereum. Before the smart contract on Ethereum accepts the state root, it should have some non-interactive proof that the data is available. How do I construct that?

Mustafa: Yeah, I’ve been thinking about non-interactive proofs a lot. There are different definitions of a non-interactive proof. If we’re talking about: can I generate some string of data and give you that string to convince you that some other piece of data is available on the network? As far as I know, that’s not possible. I’ve tried to construct schemes to do this, but they require a lot of assumptions and aren’t really practical. However, if you’re talking more generally about the goal of verifying data availability proofs on Ethereum, there are two ways of looking at this.

With our current plan to create a LazyLedger–Ethereum bridge, the Ethereum part of the bridge does not verify data availability proofs for LazyLedger. Therefore, the Ethereum side of the bridge makes an honest majority assumption for the consensus: it assumes that the consensus is honest and only signs blocks that are actually available. That’s problematic for the reasons I described, but it’s okay for certain applications.

The second way of looking at it is that there have been proposals on Ethereum Research, and Vitalik made this proposal as well, to allow Ethereum to validate user-specified data availability proofs. That would make it possible for Ethereum to verify data availability proofs for third-party chains: you could submit the Merkle root of some data to Ethereum, perhaps via a special opcode added in a future hard fork, and then the Ethereum chain would verify the data availability of that specified Merkle root using data availability proofs, by making sampling requests. The nodes verifying the chain itself would have to do this.

Brian: Ismail, do you mind diving in a little bit? What’s the connection between LazyLedger and the Cosmos SDK? Is this currently being developed? Can you expand a bit on that?

Ismail: Sure. For the LazyLedger layer one, the current plan is to use Tendermint for consensus. Mustafa originally proposed not having any state execution on the layer one at all, which is the purest form of LazyLedger, but then we’d have to define the proof of stake layer as a rollup as well. Since we have to do this execution anyway, I think for a first implementation we will use the Cosmos SDK as much as possible and do that part of the execution, the minimal amount of execution needed to have a proof of stake network, on LazyLedger.

Then for the optimistic rollups, one way to build these is to use the SDK as well. Ideally we would, well, abstract away is the wrong word, because it’s already abstracted away through ABCI: we will remove Tendermint from the Cosmos SDK and write our own ABCI client, for instance, and then people could use the SDK to write their optimistic rollups.
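For illustration, driving an unmodified ABCI application without Tendermint could look roughly like the following; the abci.Application interface is Tendermint’s real one, while the surrounding driver logic is hypothetical:

```go
import abci "github.com/tendermint/tendermint/abci/types"

// replayBlock feeds a rollup block fetched from LazyLedger into a
// Cosmos SDK app via ABCI, mimicking what Tendermint does after
// consensus: begin the block, deliver each tx, end it, and commit.
func replayBlock(app abci.Application, height int64, txs [][]byte) []byte {
    app.BeginBlock(abci.RequestBeginBlock{})
    for _, tx := range txs {
        app.DeliverTx(abci.RequestDeliverTx{Tx: tx})
    }
    app.EndBlock(abci.RequestEndBlock{Height: height})
    res := app.Commit()
    return res.Data // the app hash (state root) after this block
}
```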

Brian: Yeah, cool, very interesting. If LazyLedger launches as this layer one, how do you think it will compete with other layer ones, whether that’s Ethereum, Solana, or various others that are coming along? How will it differentiate, and for what use cases do you think it will be better or worse than those?

Sunny: I’d be especially interested in the comparison to Filecoin.

Ismail: As far as I understand, Filecoin doesn’t do ordering, right? It’s more for dumping data only. If your application needs to dump data, but it also needs that data ordered, then you would prefer LazyLedger.

Brian: But if the rollup operator is the one picking the order and then submitting it to the LazyLedger chain, then…

Ismail: Well, it depends on what you say the data is. When I said the data, I was speaking more generally; in that case, the data would be the rollup block itself. There, obviously, you’d have an ordering type of chain. But for other applications, Ethereum applications for example, where you need the data to be available but also ordered, you couldn’t just use Filecoin, as far as I understand.

Back to Brian’s question: what applications would preferably build on LazyLedger? I think if you want to build an application-specific blockchain, or you have a cool idea for your fancy blockchain and decentralized app, you could do this with LazyLedger in a very simple way, because you don’t need to assemble a validator set and proof of stake network, as Mustafa mentioned earlier. You’d have the ease of deploying a smart contract on Ethereum, and you could launch your app without any further hassle. In an ideal world, you could just deploy it in a few minutes; that would be ideal.

Brian: When this LazyLedger chain launches, it’s a proof of stake chain, so I guess it will have some sort of staking token. Do you see any other role for that token besides validators putting up a bond?

Ismail: It will also be used to pay the fees for submitting data. The optimistic rollup chains could probably also use the LazyLedger native token; they don’t need to define their own token if they don’t want to. If you need a token on your chain, you could just use that one.

Mustafa: It can be used as a payment token as well, more generally, if the teams building on LazyLedger want to use it that way. The beauty of it is that you don’t have to: you can actually build chains on top of LazyLedger where users have no interaction whatsoever with the LazyLedger token. You don’t have to force users to use the token. The block aggregators or block proposers pay the storage fees on LazyLedger in LazyLedger tokens, but the users don’t have to be exposed to that at all, because they can simply do a currency conversion: the users pay the aggregators in some different currency, and the aggregator does the currency exchange and pays the LazyLedger storage fees.

To go back to your question about how LazyLedger differs from other layer ones: I feel the other layer ones have a very similar value proposition to each other, and LazyLedger has a very different one. The value proposition of most layer ones, let’s say ETH 2.0 or Avalanche or Algorand and so on, is effectively to create the coolest layer one with a superior execution environment that’s more scalable than everyone else’s, and they’re all doing that under the world computer model. They’re all basically creating competing computers and trying to attract developers to build applications on top of them, using their execution environments and whatever programming languages they provide.

LazyLedger’s value proposition is: we’re just providing a very minimal, modular, pluggable layer one, and developers should just create their own app-specific chains using whatever execution environment they want. I think this is a very useful piece of infrastructure that does not exist yet. The end goal is that, for the first time in history, it makes it possible for people to deploy their own blockchains in a decentralized way, very quickly, possibly in minutes, without the overhead of deploying a new network. In terms of impact, I think this is comparable to what the cloud and virtual machines did for the internet, because virtual machines and services like Amazon EC2 made it possible for the first time for developers to deploy their own virtual servers, with their own execution environments, on the internet.

Whereas previously, if they wanted to do that, they needed a physical server in a data center or in their own home, or they had to use someone else’s server with a limited execution environment: back in the day it was GeoCities, for example, and more modern ones include Bluehost. Virtual machines allow people to actually create their own servers with their own execution environments. I think that’s really what drove the later stages of web 2.0 development, as it allowed people to use things like Docker containers and all kinds of new environments and languages, like Rust and Ruby and Python, that just weren’t available on shared web hosts.

Brian: Cool, amazing. What’s the timeline for you guys to launch LazyLedger? When can people use it and build on it?

Mustafa: Yeah, it’s still very early stages; at the moment we’re just completing the legal aspects of a seed round that will allow us to hire developers. We expect to have a testnet release within about nine to twelve months, and then the mainnet release within twelve months after that. Before then we’ll have a devnet for people to play with and experiment with. There’s a lot of activity on GitHub at the moment: if you go to github.com/lazyledger, we’re actively developing the project there, and we welcome community contributions and input from anyone interested in following the project.

Brian: Cool. Thanks so much for joining us today, guys. I think it was very interesting to dive into this, and it definitely seems like a radically new approach. I’m excited to see what the impact will be.

Mustafa: Thanks for having us. It’s great to talk to you. Thank you very much.

Sponsors

  • Algorand

    To learn more about Algorand and how its unique design makes it easy for developers to build sophisticated applications, visit algorand.com/epicenter

