Solana – Reaching for the Limits of Blockchain Performance
The vision of a world computer on which anyone can deploy smart contracts has captured our imagination since the publication of the Ethereum whitepaper. But while Ethereum demonstrated the viability of the concept, its shortcomings in terms of capacity and throughput prevent it from realizing the vision. Today, throughput has become a major bottleneck for widescale adoption of decentralized technologies.
Many projects have set out to deliver where Ethereum 1.0 falls short. Most projects, including Ethereum 2.0, aim to create some kind of sharded network with many interoperable chains. Solana may be the only project that went the other way. By identifying every performance bottleneck for a single blockchain and developing novel ways of removing them, they aim to achieve a throughput of 50,000 tps.
We were joined by Solana creator and CEO Anatoly Yakovenko to discuss Solana’s approach to building a web-scale blockchain. We covered some of their novel ideas including proof-of-history and parallelizing smart contract execution.
Topics discussed in the episode
- The performance-obsessed engineering background of the Solana team
- The arguments against sharding for scalability
- The value of a shared sense of time in a distributed system
- How Solana’s Proof-of-History creates a global clock for the network
- How Solana uses GPUs to parallelize smart contract execution
- How value could be accrued for the native token if transaction throughput is abundant
- Solana’s upcoming Tour de Sol testnets and launch timeline
(17:25) How Anatoly got into Blockchain
(19:31) On working for Qualcomm
(22:01) How working in the cellular industry shaped his perspective on Blockchain scaling.
(25:08) On the idea of designing a system that could process and validate as many transactions as bandwidth would allow.
(26:18) How Bitcoin is designed to work for the weakest possible connection, so it can work for everyone. Considering the necessity of that constraint.
(29:01) On Ethereum’s shortcomings
(32:10) Ethereum’s current design goals
(33:00) On the design goal of Solana
(35:02) The barrier to having many validators is economic, since it’s quite difficult to produce enough income, vs time spent, for it to be viable.
(36:02) The effects of slashing on number of validating nodes
(38:05) Slashing in Solana
(38:51) On-chain vs Off-chain Governance
(40:46) Parallelization in Runtime
(45:25) The Interoperability between smart contracts written in different languages
(47:05) Analogy with Ethereum and Chess
(50:16) Parallel execution with GPU
(52:25) On GPU Burst Capacity and Performance
(54:16) Solana’s consensus layer
(56:58) Thinking of it in terms of Bitcoin consensus, parallels with radio technology and synchronizing clocks
(1:00:12) Censorship resistance and responsiveness
(1:00:43) Expanding upon the Bitcoin example
(1:03:45) The need for a clock
(1:05:35) Why proof of elapsed time is important
(1:06:51) How the clock actually works
(1:08:47) In Bitcoin you parallelize block production, and in Solana you parallelize verification.
(1:11:22) Making a global clock when the distributed speed of hashing is variable.
(1:15:12) What happens if an alien comes with 100x faster hashing, and gets 5% of the hash rate? (more on consensus)
(1:17:43) Validators betting on which fork of the chain will persist.
(1:20:18) Is Solana’s token valued because of its use for paying fees?
(1:25:38) Value based pricing rather than cost based pricing.
(1:28:27) If Solana could secure a monopoly, then maybe only use partial capacity and charge high fees, but otherwise there are so many competing chains, it’s difficult to use fees as a business model.
(1:30:05) Plans for a DEX
(1:32:12) What’s remaining between now and Solana’s launch?
(1:33:01) The problems of finding a custody solution that everyone can use in a secure way.
(1:34:57) Goal for the launch date?
Brian Fabian Crain: So we’re here today with Anatoly Yakovenko, who’s the founder and CEO of Solana. Meher Roy and I have been following Solana for quite a long time. We’ve spent a fair bit of time hanging out in San Francisco at the Solana office working with Solana, and we’ve both invested in Solana. And I’m really excited that we have Anatoly on today. It’s a complicated project, hard to explain, but also infinitely interesting. So yeah, thanks so much for joining us, Anatoly.
Anatoly Yakovenko: I’m super excited to be here. Yeah. I’ve been a huge fan of you guys for a long time as well.
Brian: Thanks so much for coming on. I think already when we talked in the beginning, I knew that at some point we’ve got to do a Solana episode. But explaining Solana is a challenge, so hopefully we’ll manage it today.
Anatoly: Solana is a beach in San Diego County, that’s the explanation.
Brian: So tell us how did you originally become interested in blockchain.
Anatoly: I’m an engineer, right? I’ve spent most of my career at Qualcomm working on one form or another of a distributed system, but the majority of my life working on semiconductors, chip-level operating system firmware stuff. And when Bitcoin came out, it was a curiosity. Oh, wow, this is a totally different way to solve this problem. I even tried mining when it was CPUs, but I thought this is just a toy. I didn’t really think too much about it.
Brian: And so I know a lot of people on the Solana team come from Qualcomm. So what’s Qualcomm like? What’s special about that place?
Anatoly: So I graduated in 2003 from the University of Illinois, and this was post dot-com crash. A couple of friends of mine were actually at a startup, if you guys ever remember Grand Central Dispatch or Vonage, so we were building these voice over IP systems, but in south-central Illinois. After the dot-com crash there was zero funding interest in new tech. And while we built this really cool thing, that project died, but it turned out Qualcomm was using a similar technology stack and hired me on the spot. I interviewed at a bunch of places, but at the interview there, everybody was wearing flip-flops and board shorts and had their own office and it was just dumped surfing. So I was smitten by the culture there; the people that work there were very laid back. San Diego is a beach town, there’s really nothing else to do there besides do yoga, go to the beach, and have a dog, and that’s literally what everybody does there. But it turned out that on the tech side, it was really an insane breakneck speed from 2003 up until I would say 22, when mobile phones went through this crazy evolution. If you guys remember what phones were in 2003, compared to what an iPhone 6 or iPhone 8 is, we went from running a microcontroller, basically a 16-bit, single-core, 400 megahertz dinky chip, to eight-core, 64-bit, desktop-grade or server-grade hardware with 20 different subsystems, GPUs, DSPs, modems, all this other stuff, all on the same die. So it was really crazy. Basically, in 10 years, every year, we had an insane architecture evolution and things just changed. So I learned a lot about how fast technology can move. It was really not something anybody could have predicted in 2003, that right now in your hand you hold a server-grade computer.
Brian: So coming with that background and approaching blockchain coming with that history, how do you think that changes your perspective relative to the rest of the field?
Anatoly: You know, software actually takes longer to build than hardware. People think that hardware is a huge investment. But the way hardware works is once you have a design, you can scale it up. The whole point of hardware design is you build something that can scale up as you add more lines to memory, higher-frequency memory, and more area that can be used for caches and stuff like that. Those kinds of tweaks are very easy to do. I mean, not easy, but they’re straightforward. They’re not design revisions. So the hardware you want to ship is the kind where, when you do have cheaper dies available to you, you can turn up those numbers and just get more performance. And then you start optimizing for power and stuff like that. And power management is probably the hardest part in hardware. Software takes so much longer to build because it’s an iterative process: we ship it, customers hate it, we have to rewrite it, there are far more requirements, and it’s a much broader set of APIs to fill.
The way to design it is to build it such that when those hardware improvements happen, you don’t have to do a lot of work, especially in the upper layers; your lower layers, you really want them to scale up with hardware. So the way I started thinking about software, especially the part that I was working on in the operating system, is: how do I make sure that when this hardware eventually changes, literally next year, right? I’m working on something that I’m shipping today that will have to run on new hardware next year that I know is going to be faster. How do we make sure that it gets faster? How do we make sure that the software doesn’t get in the way of the hardware? It’s one of my quotes.
So when I started thinking about blockchain, I’d already been taking that approach for 10 years. So really the first thing that I thought of was: what are our constraints? The obvious one, the main constraint, is bandwidth. So given that we have X amount of bandwidth, how do we utilize all of it? And that’s where the design started from. If you have one megabit of bandwidth, you can stuff, you know, about 5000 transactions in there; if you have 100 megabits of bandwidth, that’s 50,000 transactions, right? So you can think of it as: is 100 megabits of bandwidth a lot? Your 4G phones, the standard for 4G is point to point, anywhere globally, a hundred megabits.
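The bandwidth-to-throughput reasoning can be sketched as a one-line calculation. The 250-byte transaction size below is an assumption for illustration (it is not stated in the conversation); it is simply the size at which a 100 megabit link works out to the 50,000 transactions per second mentioned.

```python
def max_tps(bandwidth_bits_per_sec: float, tx_size_bytes: int = 250) -> float:
    """Upper bound on transactions per second that fit through a link.

    tx_size_bytes is a hypothetical average transaction size, not a
    figure taken from the interview.
    """
    bytes_per_sec = bandwidth_bits_per_sec / 8  # 8 bits per byte
    return bytes_per_sec / tx_size_bytes


if __name__ == "__main__":
    # 100 megabits per second with 250-byte transactions
    print(max_tps(100_000_000))  # 50000.0
```

This ignores protocol overhead, signatures sent separately, and retransmissions, so it is a ceiling rather than an estimate of real throughput.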
Meher Roy: Yeah. I mean, being from India, I debate that point.
Anatoly: The first project that I worked on, we got push-to-talk to work from San Francisco to Hyderabad in, I think, under 150 milliseconds on this thing that was the precursor to LTE. It’s called EV-DO, if I remember correctly.
Meher: So, stating the broader point: if I have a 100 megabit connection, which I do at my home, and presumably many people do at their homes, then we could get 50,000 transactions a second into our systems somehow. That’s sort of the model of our system, right? And so your perspective would be that if we were to be a full node, and we’re going to process these transactions, then somehow the software running on our systems has to be designed in a way that it can consume all of those 50,000 transactions and process them and validate them. If you could build your software in a way that it could consume whatever transactions the bandwidth would allow, that would be the ideal blockchain.
Anatoly: Yeah, that was basically the premise that I started with. I didn’t even think of sharding as an option. You know, I thought that every team out there was doing the exact same thing. It was: we have to build this as fast as possible, because everyone’s going to do the same thing, and we need to get ahead.
Brian: Yeah, I mean, it’s interesting, you coming from this outside perspective. Because if you approach it from the Bitcoin side, right, then you’d say, okay, there are all of these variances in bandwidth, but we almost have to adapt to the least common denominator, right? Because otherwise people fall behind and it centralizes. And I know there’s this well-known developer, Greg Maxwell, who lives somewhere in the bush, and he has a super terrible connection. So, you know, if the blocks are too big, he can’t mine there. And I guess this philosophy carried over to Ethereum with proof of work. But of course, it’s interesting to ask: is that necessary at all? And then I guess you have to come up with something very different, right, if you want to get rid of that constraint.
Anatoly: Yeah, so the trade-off is definitely that you need those 100 megabit connections. Every four years bandwidth basically drops in price by 50%. So if people think that 100 megabits is expensive now, it’s going to be twice as cheap four years from now. If you look at Bitcoin, it’s been around for, what, 12 years now. Compared to the original design, the amount of bandwidth available has, what is it, eight times more bandwidth; you have two to the sixth, so 32 times more compute available to each node than when the design started, but no aspect of Bitcoin is leveraging that. It doesn’t have any more throughput. It doesn’t have better latency, it’s not more censorship resistant. So there’s a design mismatch between Bitcoin and hardware, right, and the real world, and that design mismatch is the thing that is going to kill it, honestly. I think if they’re really focused on going the other direction, then build something like BunkerCoin, a project that doesn’t exist, but, you know, imagine blocks that are transmitted over shortwave radio, that cannot be censored, because there’s no network infrastructure; you’re bouncing the data off the ionosphere, so you’ll need 6-hour blocks, but then it’s as decentralized as possible, right?
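The "eight times more bandwidth" figure follows from the halving claim: if the price of bandwidth halves every four years, the same budget buys twice as much each period. A minimal sketch, taking the four-year period and the 12-year span from the conversation as given (the function name is made up for illustration):

```python
def bandwidth_multiple(years: float, halving_period_years: float = 4.0) -> float:
    """How many times more bandwidth the same budget buys after `years`,
    assuming the price halves once every `halving_period_years`."""
    return 2 ** (years / halving_period_years)


if __name__ == "__main__":
    # 12 years at one halving every 4 years is 3 halvings
    print(bandwidth_multiple(12))  # 8.0
```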
Brian: But yeah, I think it’s a great point about Bitcoin. Now, of course the thing that Solana is more aligned towards or I would say, competing with in a way is Ethereum. Knowing that Solana is a full smart contract platform,
Anatoly: We’re an open source project, competing to the death.
Brian: Yes. There’s no such thing as competition in the blockchain world, I know. But if you look at Ethereum, what do you think are the biggest problems and shortcomings there?
Anatoly: So let’s say that actually… I’m a huge supporter of Ethereum. If Ethereum fails, we’re doomed. My project is doomed. Everyone else’s project is doomed. So we all depend on Ethereum 2.0 succeeding, like it or not.
Brian: I don’t know if that’s true.
Anatoly: I think to some extent, a lot of gas will go out of the space if Ethereum dies, because it has so much momentum, and a community like that can’t just die from a failure, right? Its technology will ship eventually, right? So I’m very much hopeful that Ethereum succeeds. And the problem that they’re solving with sharding is trying to build a system that is as decentralized as possible. And that’s a really hard computer science problem, and I think that is the trade-off, right? What we’re doing is a hard engineering problem. You know, it’s just bandwidth. We have computers, we have SSDs and GPUs; how do we write the code that just makes them work? Right? Engineers can do that. Ethereum and these other sharded solutions are trying to build a computer where you have this mesh network of arbitrary machines with arbitrary profiles that are all interconnected and somehow contributing to the security of the network. That’s a computer science problem that hasn’t been solved yet. And there are a lot of designs and design trade-offs. And as you see, if you’ve been following Ethereum, they’re iterating on their designs pretty frequently still; it’s not set in stone. The number of shards changes. When you have a design that’s a computer science design, and then you try to convert the papers into engineering, that’s when the rubber hits the road. And you start iterating back, right? This is not going to work, this is going to be too expensive, you know? So that, I think, is the main challenge with Ethereum. But to speak broadly, why is sharding such a huge trade-off?
The fundamental problem in blockchain is this trust thing: I don’t trust you, you don’t trust anybody else. And when you shard the chain, the shards don’t trust each other, and verifying the consistency of all the other shards in the system, that’s the real hard problem. And how do you do that efficiently? How do you do that such that the network doesn’t stall and there aren’t weird attack vectors, for example for data availability between shards? There’s a bunch of really complex algorithms and protocols that folks have designed. Nobody knows if they’re going to work, right, if the incentives are going to align well. And the reality may be that there are, you know, what, 200 to 300 professional validators out there right now, at most. So is that even needed? Right?
Meher: In Ethereum, in one of the recent designs, there are 1024 shards. And there are 600,000 validation slots that you can occupy. And I think Ethereum is targeting the world where I have my laptop, and I should be able to put my 32 ETH on it and go and validate maybe one shard or one and a half shards or two shards out of the 1024. And the assumption appears to be that lots of people are going to do this on their laptops. And somehow you want to add the security of all of these machines together to have a very secure machine that’s secure across all
Anatoly: That’s amazing, right? That’s an amazing design goal. There’s no doubt about that.
Meher: Yeah, that’s an amazing design goal. Whereas, now, we run validators at Chorus, right, so we see the underlying systems of Tendermint and Solana up close. The assumption is different, and the imagined world is different. The imagined world is that there are maybe 200 professional validators. You can assume that these professional validators will go and source good bandwidth connections, and they will go and source really great hardware individually. So you can rely on the validators to have really high-performance hardware and really good bandwidth. And there are 200 of them that the network can source, and so how do we make these 200 high-powered, well-connected machines work together really well and process as many transactions as possible? That seems to be Tendermint’s main goal and the Solana goal.
Anatoly: Yeah, so one caveat there is that 100 megabits is not high-powered. Right? That’s the home connection right now in the United States. Data centers will give you 10 gigabits. A lot of data centers now will give you one gigabit for free. And the amount of compute you need to actually process 100 megabits of transactions is a $5,000 gamer machine; we use off-the-shelf GPUs. So that barrier to entry is actually quite low. If you look at how much people invest in mining Ethereum at home, it’s typically around that amount. So anyone that’s mining Ethereum can actually mine on Solana. Right, and we can have a very large set of validators. We’ve designed the software and the algorithms to go to 20,000 computers. That’s been the design goal, because realistically, I don’t think we’ll see 20,000 computers for five years. That’s crazy exponential growth.
Meher: At some level, running these validators, you start to realize that the barrier to having a lot of validators isn’t hardware. It’s economics. It’s the fact that if I am, let’s say, the thousandth validator in the system in terms of stake, I still need to invest all my weekends to get the machine up and running well. But is it going to produce enough money to compensate me for all the weekends that I’m going to lose to this validation? I don’t think blockchains have solved this economic trade-off for 1000 validators. Even with Cosmos, the 16th validator does not make enough money to justify the effort they’re putting into validation today. So somehow, there’s this economic barrier that would prevent a system from going to a thousand validators today, in my opinion.
Anatoly: What’s the slashing percentage on Cosmos?
Brian: 5% for double signing.
Anatoly: I think slashing needs to get to 100%, and then, you guys as investors in Solana… If we have 100% slashing, how many validators are you going to distribute your investment to?
Brian: You mean that would lead to more distributed, that people would split more validators?
Anatoly: You have to, and validators like you guys will run more nodes. I think what I want to see operationally is that you have separate nodes in separate geographical locations and that separate humans have access to the keys. You have some firewalls operationally between those. Then building out those processes and launching more nodes will be worthwhile, because the market will demand that. Because for investors, even in a small network, like a $200 million market network that’s doing a million dollars for the value, how many nodes would you distribute it to if 100% could be slashed? Right? I think the economics there will work out. Because I think what we see now is that the top 20 validators in Cosmos have 80% of the stake, right, from what I can tell. Last time I looked, it looks like that’s the problem.
Brian: I definitely think the whole economics around proof of stake is a very interesting and largely unsolved issue, also, of course, with the challenge of exchanges coming in and staking and allowing trading at the same time.
Anatoly: So I honestly think that 100% slashing will push exchanges to stake across the network too. There’s no way Binance wants to have egg on their face because all of their users’ tokens get slashed from one issue with their validator. There’s no way.
Brian: Yeah, it’s a good point.
Anatoly: And if it happens, that’s their fault, right, for not being decentralized, right?
Brian: I’m really excited. If Solana goes ahead, are you guys going to have a 100% slashing penalty?
Anatoly: Not from day one. I think we need soak time, and I think the Cosmos folks do as well. I think that’s why we don’t have 100% slashing networks: it’s brand new code, right? We need some time for it to actually run out there and see what the bugs are that cause these accidental slashings or operational failures. I think in these initial days, a lot of the folks that participate in these networks are cooperative and non-adversarial. We want to make sure that when there are operational failures we can recover from them and fix them, and then people can have high standards for how they run these things.
Brian: Yeah, I mean, if you do want to start low and go to 100% over time, I do think you have to basically program it in from the start, so that automatically, maybe it starts at 5% and every month it goes up some amount. Because otherwise... it’s almost a parameter choice, right, there from the start. It’s like in Bitcoin, you have your block reward and the halving; you can have a doubling of the thing. It’s a predetermined schedule, and then, okay, you can change it. But it’s always a question of where the default is, and it’s hard to move from the default.
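A predetermined schedule of the kind Brian describes could look like the sketch below. Only the 5% starting point and the idea of a monthly increase come from the conversation; the 5-point monthly step, the function name, and the linear shape are all hypothetical choices made for illustration, not a description of any network’s actual parameters.

```python
def slashing_percent(months_since_launch: int,
                     start: int = 5,
                     monthly_step: int = 5,
                     cap: int = 100) -> int:
    """Slashing penalty (in percent of stake) under a fixed ramp.

    start        -- initial penalty, 5% as in the discussion
    monthly_step -- hypothetical increase per month (an assumption)
    cap          -- the penalty never exceeds 100%
    """
    if months_since_launch < 0:
        raise ValueError("months_since_launch must be non-negative")
    return min(cap, start + monthly_step * months_since_launch)


if __name__ == "__main__":
    for month in (0, 6, 19, 24):
        print(month, slashing_percent(month))
```

With these illustrative parameters the penalty reaches the cap after 19 months and stays there, which is the "default you would have to actively vote to move away from" property under discussion.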
Anatoly: So that’s an on chain versus off chain governance thing. We have a very firm governance model, we have a motto, which is “optimize everything about the system that doesn’t sacrifice performance”. So the direction of where the project is going is based on how do we make any parameter better, as long as it doesn’t sacrifice performance, meaning that everything maximizes performance.
Brian: So how is that related to governance?
Anatoly: What you’re suggesting is that we encode this increasing slashing percentage on chain, effectively making an on-chain governance decision that is programmatically encoded in there, versus us just having an off-chain governance model where everybody knows that’s what’s going to happen, and the reasoning for that is that it maximizes decentralization.
Anatoly: Which I think works just the same, right? Because part of these networks is not just the tech, it’s the people involved, right? And if this is what we’re saying every day, then the community we’re building around us is a community that believes in it, and that’s what will happen. Because even if we encoded it, and there was a misalignment between the community and the code, they would just remove it. It wouldn’t be too hard to change that contract, if all the validators really wanted to.
Brian: Yeah, no, that’s a fair point. Well, so what Solana has done is that you basically took the smart contract blockchain, and then you went to every part of this system and said, okay, how can we optimize that? How can we optimize that? How can we optimize that? And then what you’ve ended up with is a system that really looks very different, with a lot of these innovations and novel approaches to lots of different things. So it’s not easy to wrap your head around. I think one of the things that’s particularly interesting is the consensus, so we will dive into that in depth. There’s a bunch of other stuff that you’re doing. We can maybe speak about a few of them. One is this thing called Pipeline VM, right? Can you talk a bit about that? I think it’s about parallelization, right, as one feature.
Anatoly: When you’re dealing with operating systems, you have this distinction between user space, which is where your regular programs run, and the kernel. And when you transition between the two, you pay this penalty because the kernel doesn’t trust any of the data from user space. It needs to do a bunch of verifications, and also guard itself from any registers and any memory coming from the user side, right? There’s this transition from user space to the kernel. So how operating systems deal with this problem is batching.
When I make a system call, and I need it to be high performance, I send in a bunch of memory at the same time. And the kernel does a bunch of these operations in a row, so you only pay this penalty once. So that’s a very simple technique. You can also expand that into something called DMA, where you also reorganize the memory. And you can know ahead of time which memory is going to be read and which memory is going to be written.
If you know that ahead of time, then you can flush some of the caches and clear out some of the other ones such that when the memory comes back to user space, all those operations can be done in parallel. Say I have a remote device, right, the GPU. User space sends me a program: “Hey, go run this on the GPU”. In parallel, in the kernel, I clear up memory and then send the job to the GPU; the GPU does its job, and by the time it comes back, everything’s ready to go. Those techniques are what you do as your bread and butter to make things faster on your mobile phone or desktop or server.
That was actually my job for 10 years, working in that layer. When we built this thing, the natural thing that I did was design the transaction format such that it can tell the chain ahead of time: hey, I’m going to do a bunch of instructions, execute a bunch of different contracts, and I’m going to read all these memory locations and I’m going to write to all these other memory locations. So that’s our transaction format: it’s a vector of instructions. Here, I’m going to run these contracts, and to execute them, I need to read this memory from the state machine, and I’m going to commit these results to the state machine. Once you have that, you can look at a bunch of transactions and basically analyze them, or find all the ones that don’t read and write the same memory. So you have a million transactions, you sort them by the memory addresses they’re going to read and write, and all the stuff that doesn’t collide, you execute all of it in parallel. And you have this guaranteed isolation. And that’s really a very standard approach to optimization in operating systems. That’s our way of optimizing our virtual machine layer.
So Pipeline VM is this. We really call it the runtime, the pipeline runtime. And I think we’re going to brand this whole thing and call it Sealevel, because we’re in an all-beach theme. And the goal of it is to run arbitrary transactions, as many as we can, that are language agnostic, in parallel. So this runtime allows us to host additional virtual machines. Libra just released Move, and it’s a cool language. So we ported it over; it’s just Rust. So we can run Move transactions in parallel alongside Rust, C, and C++. And we are agnostic to the language that’s executing; all we really care about is that this format is so explicit about everything an instruction or transaction is going to do that we can analyze it and execute everything in parallel.
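The idea of grouping transactions by their declared reads and writes can be sketched roughly as follows. This is a toy illustration, not Solana’s actual scheduler: the `Tx` class, the `conflicts` rule, and the greedy batching are all assumptions made for the example. Two transactions conflict when one writes memory the other reads or writes; everything inside one batch is conflict-free and could run in parallel, while batches run in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    name: str
    reads: frozenset   # addresses the transaction declares it will read
    writes: frozenset  # addresses the transaction declares it will write


def conflicts(a: Tx, b: Tx) -> bool:
    """True if a and b touch the same address and at least one writes it."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def schedule(txs: list) -> list:
    """Greedily place each transaction in the earliest batch that comes
    after every earlier transaction it conflicts with."""
    batches = []
    for tx in txs:
        target = 0
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                target = i + 1  # must run after this batch
        if target == len(batches):
            batches.append([])
        batches[target].append(tx)
    return batches


if __name__ == "__main__":
    txs = [
        Tx("t1", frozenset(), frozenset({"A"})),
        Tx("t2", frozenset(), frozenset({"B"})),
        Tx("t3", frozenset({"A"}), frozenset({"C"})),
        Tx("t4", frozenset(), frozenset({"A"})),
    ]
    for batch in schedule(txs):
        print([tx.name for tx in batch])
```

Here `t1` and `t2` write different addresses and share a batch; `t3` reads what `t1` wrote, and `t4` overwrites what `t3` read, so each lands in a later batch.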
Brian: So let’s say you have these contracts; some are written in Move, right, that Libra released, and some are written in some other language. Can they all talk with each other as well? Are they all fully interoperable, or are there limits on that?
Anatoly: So the way you would do this is you would use a bounce buffer. You call a Move program, and then you can read its state from the native Rust code. You can read the state of everything. And then, you know, you copy it over and convert it from Move into the format that your EVM contract is going to expect, and then you pass that memory into the VM. So that would be an approach to do that. And you can do this all atomically because transactions are atomic: you have an instruction vector that says run the Move stuff, do some memory formatting on the data, then pass that into the VM. And then that does the state transitions. And what’s cool is that the virtual machines are not loaded by us. They’re user loaded. So Move is compiled into effectively a shared object, an ELF. And that ELF is loaded as Solana’s Berkeley Packet Filter (BPF) bytecode. That bytecode is marked as executable, and now you have a virtual machine. And you can do the same thing with the EVM. We, as the company or as the project, are just trying to make that faster, but we don’t really care what languages are run on there. So it’s a multi-execution environment that’s user self-serve. If you want to run Python, there are 100 different Python implementations that would work well in an embedded system, so you can load one of those.
Meher: Yeah, so just trying to translate the key insight here. One way to imagine it is this: in Ethereum, you have the state, right? You can think of the state as “the ledger”. The state stores, for each account, how much balance it has and what code it runs if it’s a smart contract, and things like that. So the state stores this data for each of the accounts in the Ethereum system. And maybe you can start to think of this state as a chessboard. A normal chessboard is eight by eight. For Ethereum, you’d probably need a massive chessboard, maybe 3,000 by 3,000. And each square on that chessboard corresponds to some part of that state. So maybe the first square corresponds to account one and stores its balance and code, the next square stores the data for account two, and so on. So each of the squares on this massive chessboard is storing data corresponding to a different account. Now, the nature of an Ethereum transaction is that any transaction is going to alter the data in some of these squares. But as an Ethereum validator, when a transaction comes in, I cannot predict which of the squares on the chessboard this transaction is going to alter. In Solana, somehow, when a transaction comes in, I know which square, which part of the state, this transaction is going to alter. And I can do that for each transaction that comes in, so if a thousand come in, I know that these thousand impact different parts of the state; they impact different squares. So essentially, if, let’s say, transaction one comes in and it’s impacting square 10, then I can take transaction one and I can take the data in square 10.
And I can ship those two to one element of the GPU. And if some other transaction, transaction 100, comes in and impacts square 100, I can ship those two to a different element of the GPU. And these two elements of the GPU can process these two transactions in parallel. Whereas in Ethereum, because you cannot predict what squares the transactions are going to impact, you need to do all of them serially, one by one.
Anatoly: So databases do this technique. They’ve been doing this, I think, since the 80s, right? This isn’t new. You can actually predict what addresses, what state, Ethereum transactions will touch. And there’s a very simple way to do it. You just run the Ethereum transaction locally in your client. And that gives you all the addresses it reads and writes, and you submit that along with the transaction. So you optimistically guess that the state of the system will remain the same, and you submit this data alongside it. So you pay a bit of bandwidth for this parallel execution.
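The chessboard picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Solana’s scheduler: the `Tx` type, the `conflicts` rule, and the `schedule` function are hypothetical names, and the read and write sets are assumed to be declared up front with each transaction, as described above. Transactions that touch disjoint squares land in the same batch and could run in parallel; a transaction that conflicts with an earlier one is pushed to a later batch so the original order between them is preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """A transaction that declares up front which accounts it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def conflicts(a: Tx, b: Tx) -> bool:
    # Two transactions conflict if either one writes an account the other touches.
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def schedule(txs):
    """Pack transactions into ordered batches with no conflicts inside a batch.

    A transaction goes into the batch right after the last batch that contains
    something it conflicts with, so conflicting transactions keep their order.
    """
    batches = []
    for tx in txs:
        last_conflict = -1
        for i, batch in enumerate(batches):
            if any(conflicts(tx, other) for other in batch):
                last_conflict = i
        target = last_conflict + 1
        if target == len(batches):
            batches.append([])
        batches[target].append(tx)
    return batches


t1 = Tx("t1", frozenset(), frozenset({10}))       # writes square 10
t2 = Tx("t2", frozenset(), frozenset({100}))      # writes square 100
t3 = Tx("t3", frozenset({100}), frozenset({10}))  # reads 100, writes 10
t4 = Tx("t4", frozenset({7}), frozenset({8}))     # unrelated squares

plan = [[tx.name for tx in batch] for batch in schedule([t1, t2, t3, t4])]
print(plan)  # [['t1', 't2', 't4'], ['t3']]
```

Everything inside one batch could be handed to separate cores or GPU lanes; batches themselves run one after another.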
Meher: So the neat thing here is that, first of all, you can execute them in parallel. But then, of course, when you can execute things in parallel, you want hardware that is designed to efficiently execute things in parallel. And that’s not the CPU, that’s the GPU, because in a CPU you might have only four cores or eight cores. So you can execute only that many things in parallel. But in a GPU you can execute thousands of things in parallel.
Anatoly: As long as they do the same thing. So there is a caveat with GPUs. It’s not a free lunch. In GPUs you basically have 60 different threads that can take different branches, and each one maybe controls 80 different SIMD lanes. So each thread can execute different things, but all those 80 different things have to do the exact same thing. So for example, if we’re running a decentralized exchange on Solana, what does an exchange do? I send a price, right? Memory is loaded for the current price. It compares against the price that I just sent, right, and there’s a branch. And that’s whether I’ve won the bid or not. But all the exchange transactions do the exact same thing, right, just over different memory. So that kernel, you can load it on the GPU, and then in every different thread you can do 80 price updates at the same time, right, every cycle. So that’s really the power of the GPU. I don’t know what people think the contracts are that they imagine will run on chain. I think the way people design this stuff is that the lowest amount of code in the system will probably run on chain. So it’s not going to be complicated. It’s going to be a couple of branches, maybe a cryptographic operation, and one or two memory updates, those kinds of things. If there’s a spike in usage, this crazy CryptoKitties spike where everybody wants to breed their cats, we can run all of those, the exact same contract, on GPUs in parallel at the same time.
Meher: So what the GPUs help with is, in some sense, burst capacity. So when there’s a very successful dApp on Solana, and suddenly you get 10,000 transactions a second for that particular dApp, the internal logic of all these transactions is shared. And you can in some way ship their internal logic to the GPU and execute these 10,000 transactions in parallel on the GPU, so that’s what gives Solana a burst.
Anatoly: Yep. Which is really cool because then you have this compute capacity for the most important use cases, right? The most important use cases can actually scale higher. And everybody else that’s participating in the network is sharing the same security that’s coming from this, right? So I think it’s a really awesome, powerful thing to have in the system. So that’s not done yet. GPU offloading is not done yet. But the pipelined VM, that’s totally done. And that’s running. And you can go run Move contracts, if you want to, alongside Rust ones. And if you want to help with the EVM, we’re starting the work on that.
Brian: Just in terms of performance, right, what do you guys have on testnets now? Was it around 100,000 or a bit less?
Anatoly: We can talk about the design. In a configuration where we don’t rotate the block producer, we could saturate a hundred-megabit switch, and that was doing about 200,000 TPS. When we added fault tolerance and rotations, that performance took a hit. So I think right now we’re seeing about 50,000 on our internal networks. And we’ve been doing dry runs; you guys have participated in four dry runs, I think, so far. In the next one, we’ll see if those numbers hold on the open network.
Brian: Let’s talk about one of the most interesting aspects of Solana, and one that’s not easy to wrap your head around, which is the consensus layer. So one of the key ideas that you seem to have there is this idea of having a distributed clock. And there was this quote, I think it was in an article or blog post, which is that one of the most difficult problems in distributed systems is agreement on time. So can you explain, first of all, why is it so valuable that all of these different validators have a shared agreement on time, and why is it difficult to do this?
Anatoly: In a distributed system where people trust each other, it’s not that hard. So people can use an atomic clock, like they do at Google for Spanner. They just synchronize atomic clocks by hand. Well, not by hand, but trusted engineers make sure that they’re synchronized. Or you can use a time server, NTP. But the problem is when you have a system where nobody trusts each other: then we don’t have a common source of time. And, you know, when you have two different watches, you never know what time it is. And if there’s an opportunity to be malicious and lie about time, then you can’t really trust the submitter’s clock either. So the way that Tendermint deals with this, and correct me if I’m wrong, if this isn’t the latest design, is that Tendermint receives messages, and if the time in those messages is marked ahead of what I think the real time is, I queue them up, right? I basically wait to process those messages until my clock gets to the timestamp in those messages. So this introduces delays, obviously. And you may think that those delays are not important. But where these delays start having an effect is when you’re trying to make sure that the system has the fastest possible response rate. That means consensus and messages and state transitions between all these validators have to occur as fast as possible. You know, Tendermint, I think, is at what, three-second block times, right?
Brian: It depends. On Cosmos, I think right now it’s five or six seconds. There are some Tendermint networks, like Loom and Binance, that I think are running with one second.
Anatoly: The nodes are typically co located, right?
Brian: Not co-located, but I think there are only a few nodes. In Loom’s case you have around 25, but they’re still globally distributed.
Anatoly: Okay, so We’re trying to cut down our time to 40 milliseconds and lower. And that’s when those those things start making a huge difference is if anytime the clock drifts is 50 milliseconds, that’s extra 50 milliseconds that every state transition has to pay in the network. And that means we can’t start from sending data right ahead of time. One way to think about clocks is just from that approach is how fast can the system respond? If and that’s going to be based on the air between all the clocks in the network, the drift?
The other way to think about it is the classic problem, Bitcoin’s block production, right? How fast can we produce blocks? That’s based on this difficulty adjustment, where you have a node that can come into the network randomly every 10 minutes. And because 10 minutes is such a long time, there’s a lower probability of two nodes trying to produce at the same time, right? So you’re creating this very slow-resolution clock where, you know, the second hand ticks once every 10 minutes, right? Your smallest resolution is 10 minutes, effectively, on average.
Imagine the first time people started building radios and radio towers. With two radios transmitting over the same frequency at the same time, you get noise. The radio waves collide in the air and interfere with each other, and nobody can understand what the transmission is. That’s the exact same problem as with Bitcoin: two block producers produce at the same time, and nobody knows which is the fork, right? What is the actual best fork right now?
So how people solved this with radios is they synchronized clocks, and then alternated who can transmit, based on how well those clocks are synchronized. So if we had, let’s say, one-second-resolution clocks, right, we could rotate once every two seconds. So, let’s just make it simpler, I get every odd second and Brian gets every even second. Every time we transmit, the data doesn’t collide in the air, right? There’s no interference and everybody can hear our transmissions. So we now scale the number of participants based on this one-second slot.
So you can have more nodes transmitting, more block producers, because right now, with one-second resolution, right, I can fit 60 block producers in a minute. 60 different nodes can be the block producer and transmit. And that’s the importance of clocks for our specific system: we’re trying to rotate leaders every slot, and a slot is 400 milliseconds. Right now the best we can do is rotating them every four slots, so about 1.6 seconds, under load, right? So imagine you have one block producer every 10 minutes versus, I don’t know, say 600 block producers in the 10 minutes. Which system has more censorship resistance? Right? The one where there are 600 nodes that get a chance to add transactions, or only one?
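The radio analogy amounts to a time-division schedule, which can be sketched as below. This is a simplified illustration, not Solana’s actual leader schedule: plain round-robin over a fixed validator list is an assumption made here for brevity, and the names `SLOT_MS`, `SLOTS_PER_LEADER`, `slot_at`, `leader_for_slot`, and `leader_turns` are hypothetical. The 400 millisecond slot and the four-slot rotation are the figures mentioned in the conversation.

```python
SLOT_MS = 400          # slot length mentioned in the conversation
SLOTS_PER_LEADER = 4   # current rotation: a new leader every four slots (about 1.6 s)


def slot_at(elapsed_ms: int) -> int:
    """Index of the slot that contains the given elapsed time."""
    return elapsed_ms // SLOT_MS


def leader_for_slot(slot: int, validators: list) -> str:
    """Round-robin leader for a slot (an assumed, simplified schedule)."""
    return validators[(slot // SLOTS_PER_LEADER) % len(validators)]


def leader_turns(window_ms: int, slot_ms: int = SLOT_MS,
                 slots_per_leader: int = SLOTS_PER_LEADER) -> int:
    """How many distinct leader turns fit into a time window."""
    return window_ms // (slot_ms * slots_per_leader)


validators = ["anatoly", "brian", "meher"]
ten_minutes_ms = 10 * 60 * 1000

print(leader_for_slot(slot_at(0), validators))      # anatoly
print(leader_for_slot(slot_at(1600), validators))   # brian
print(leader_turns(ten_minutes_ms))                 # 375 turns at 4 x 400 ms
print(leader_turns(ten_minutes_ms, 1000, 1))        # 600 turns at one per second
```

With one turn per second the window holds the 600 block producers from the example; at the current four-slot rotation it holds 375, still far more than a single producer every 10 minutes.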
Brian: Okay, so the idea of rotating leaders often is because you think that makes the system more censorship resistant?
Anatoly: Actually, the main benefit of it is responsiveness. When a user submits the transaction, in the 400 millisecond block time we get a confirmation for this block, and then the application can take action. And with one confirmation we can actually calculate a very high probability that the block will be confirmed fully by the network.
Meher: So the way I tend to see this need for very small block times, and therefore a clock, is that it harks back to what we talked about earlier in the podcast. We talked about imagining this person with a home connection of a hundred megabits, and we said, well, with a hundred megabits you can get 50,000 transactions into your node. Now you need a way of processing those 50,000 transactions. Now, if you look at Bitcoin, what’s the nature of Bitcoin? If I have one of these nodes, I might get a block. And then my machine might spend something like 200 milliseconds to process that block, to validate that block. And then, yes, my ASICs are going to mine, but this machine that is processing transactions is going to wait 10 minutes to hear the next block. And then 10 minutes later, it hears the next block, and then it spends 200 milliseconds processing that block. So if you imagine the utilization of this machine that is processing transactions, it spikes up for 200 milliseconds, then goes to zero and sits there idle for 10 minutes, and then spikes up for 200 milliseconds and again sits idle for 10 minutes.
So you have a system where, on the one hand, you’re getting 50,000 transactions a second, yet most of the time the transaction processing element is sitting there idle. What you want is a system in which the transaction processing element never sits idle. It’s almost a system where it gets a block, processes it for x time, maybe has a short delay, and then gets another block and processes it for x time. And it’s nearly at a hundred percent utilization. The machine is always thrashing, trying to keep up with all of the transactions that are coming in with the bandwidth. In Solana, that’s the nature of a validator, right? So every 400 milliseconds it’s getting a block, and so my machine shoots up to 100% utilization for some milliseconds while it validates the block. But then as soon as that block is done, maybe in 20 or 30 milliseconds the next block has come in. So the utilization again shoots up to 100%. So the machines are always at full capacity; that is the thing that you’re trying to do by shortening the block times down to 400 milliseconds.
Anatoly: And, I mean, actually having a larger group, right? If you have 600 block producers over 10 minutes, you do have more censorship resistance, because there are 600 block producers that get a chance to encode transactions, so it becomes much harder to censor any particular client.
Meher: And the need for the clock in this system is because these leaders are rotating so quickly. Let’s say Anatoly is the leader first, then Brian is leader second, and Meher is leader third. There is an inherent problem in this system: when does Brian know that he has to produce? Brian needs to know when he has to produce. In Bitcoin the “when” is obvious: it’s when you receive a block from a different miner with the right proof of work; that is when I have to try to produce. In Tendermint the “when” is when the previous block has been confirmed; that is when I need to try to produce. So the question of this “when” is dependent on the consensus algorithm: when the consensus algorithm finalizes something, that is when the next person in line produces something. So Brian needs to wait for the consensus on Anatoly’s block before Brian can produce. And this invariably introduces a delay between the production of two blocks. And what Solana appears to say is that Brian should be able to produce without there being consensus on Anatoly’s block. But in order for that, Brian needs an accurate clock, right, if you could give Brian a sub-second clock.
Anatoly: I don’t want to say accurate. I think that the challenge here isn’t that Brian needs an accurate clock. Brian needs to convince the rest of the network that he’s actually waited the correct amount of time. So Brian needs to provide a proof that time has passed. And that’s where the clock comes in: we don’t have an NTP-style clock or an atomic clock, where everybody knows the time. What we have is a way to prove that time has elapsed.
Brian: And can you explain why that is the important part? Why is this proof that time has elapsed so valuable?
Anatoly: Right, so if I’m taking too long to produce my block, you don’t have to wait. You simply prove that you waited the appropriate amount of time and then you submit yours to the network, and the network verifies that proof and then takes action on your block immediately. It doesn’t actually have this delay of waiting and asking, hey, did the appropriate timeout actually occur anywhere in the network? It simply validates your proof.
Brian: Right. And if you compare that to Tendermint, you have this whole communication, right? The proposer has to send the block, and people have to vote on it and commit. And, you know, let’s say there is a timeout, or it’s too late, right? Again, you have to have this round of voting on it. Whereas in Solana, because I can prove I have waited that long, I can just move ahead. And there’s no need to have this round of communication around that.
Anatoly: Yeah, exactly.
Brian: And so let’s talk about how this clock actually works.
Anatoly: This was my 2017 fever dream. I had too much coffee and I was talking to a friend of mine, with whom we were building this deep learning thing, about how “proof of work sucks… it’s mining, it’s using all this horrendous amount of energy.” But it has this amazing feature to it: it’s based on a real-world physical constant. That’s physics, right? The amount of electricity is bounded by the physics of the universe, right, and of the world. And it’s really, really cool that this is the Sybil resistance mechanism. And we were trying to figure out, is there another physical constant that we can use? And we started thinking about time. Okay, so time is one. Can you build something like single-threaded mining?
So that’s what I had. I had too much coffee and a beer, and was up until 4am in this weird alpha state. And I realized that you can build a recursive SHA-256 function, and it’ll generate a data structure that represents that time has passed. So you have SHA-256, same as Bitcoin proof of work. You use the output as the next input. Because it’s recursive, you can’t predict any of the future states. But if you sample it, if you just simply record “at count 1 million it was state X, at count 2 million it was state Y”, you generate the samples, and you have a data structure that represents time passing somewhere for somebody. And you can verify this data structure faster than it was generated, because you can take the start and end of every sample and run that SHA-256 recursively, but you can check all those samples in parallel, right? Phones have over 1,000 SIMD lanes these days, my phone has over 1,000 SIMD lanes, and GPUs have 4,000. So one second can be verified in under one millisecond.
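The recursive hash and its sampled record can be sketched as follows. This is a minimal illustration of the idea as described above, not Solana’s implementation: the function names `hash_chain`, `generate`, and `verify` are hypothetical, and the thread pool only stands in for the GPU or SIMD lanes mentioned in the conversation (Python threads will not actually speed up these tiny hashes). Generation has to run step by step, while each sampled segment can be re-checked independently of the others.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_chain(state: bytes, n: int) -> bytes:
    """Apply SHA-256 n times, feeding each output back in as the next input."""
    for _ in range(n):
        state = hashlib.sha256(state).digest()
    return state


def generate(seed: bytes, num_samples: int, hashes_per_sample: int) -> list:
    """Sequentially build the chain, recording the state every hashes_per_sample steps."""
    samples = [seed]
    for _ in range(num_samples):
        samples.append(hash_chain(samples[-1], hashes_per_sample))
    return samples


def verify(samples: list, hashes_per_sample: int) -> bool:
    """Re-check every segment between consecutive samples.

    Each segment depends only on its own start and end, so the checks are
    independent and can be spread over as many workers as are available.
    """
    segments = list(zip(samples, samples[1:]))

    def check(segment):
        start, end = segment
        return hash_chain(start, hashes_per_sample) == end

    with ThreadPoolExecutor() as pool:
        return all(pool.map(check, segments))


samples = generate(b"genesis", num_samples=8, hashes_per_sample=1000)
print(verify(samples, 1000))  # True
```

A verifier with as many lanes as there are segments needs roughly the time of one segment, which is the asymmetry being described: production is serial, verification is parallel.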
Brian: Right. And so in a way, the interesting thing here is that in Bitcoin, right, I’m a miner and I’m trying to find a block, and I basically have this input of the block data, right? I add some sort of random nonce, and then I’m trying to create a hash that has enough zeros in it. And I can do that in parallel, right, a million times. So I can have this huge farm that does all the same thing, but in parallel. But what you guys are doing is: okay, I’m doing this once, and I’m using the hash of that as the input for the next one. So obviously you can’t parallelize that, right, because it has to sort of build on top of itself. But then when it comes to the verification of it, let’s say you have these 4,000 iterative loops; in the end, what you have to verify is every single jump from one to the next, right? So if you can just verify all of the individual jumps, then you know that the entire history is correct. So you can basically take that and split it up. So what you do is you parallelize the verification, which in Bitcoin you don’t have to parallelize because it’s only one hash you have to verify. But the production you cannot parallelize, whereas in Bitcoin the production is exactly what you parallelize. So it’s almost the exact opposite of Bitcoin.
Anatoly: Yeah, that’s a good explanation. So, a verifiable delay function: this is the set of algorithms that do this. Technically, I think ours is not in the VDF family, because the same amount of computational power is necessary for verification as for generation. More sophisticated VDFs have these cryptographic properties that allow the verification to take maybe log time, computationally, but they have trade-offs. So our VDF, if you want to call it a VDF (I just don’t have a better acronym; a delay function we can verify, a DFV), is probably the most secure construct, because it’s based on the preimage resistance of SHA-256. It requires no trusted setup, and it’s just guaranteed to not be parallelizable. So it is very, very simple to think about and code around, because things with trusted setups and those more complex cryptographic configurations are just hard, right? This is brand new research that’s still in the white paper stage. And I would love to switch to those when they’re ready. But what we have right now just works really, really well.
Brian: Yeah. And then, I guess the idea is that because the existing hardware is pretty good at SHA-256 hashes, the system works even if somebody is a little bit faster than others. Maybe you can explain that a bit, right, because in the end you still have differences in the speed at which these hashes, these chains, are generated. So how can you make a global clock when these differences exist?
Anatoly: So, Intel and AMD defined SHA-256-specific instructions for the AMD64 platform. So Threadripper has SHA-256-specific instructions, which perform a round of SHA-256 in 1.75 cycles. And the speed of the cycle is really based on the current manufacturing process, whether it’s TSMC’s or Intel’s fab. So no matter what chip you have, you basically are bound by the speed of this physical process, right, this physical bound. And because these are hardware instructions, they’re basically running as fast as possible. Justin Drake from Ethereum research and I have argued about this, and we disagree a bit. My view is that, at a high level, you can think of Bitcoin as the driving force for optimizing SHA-256. It’s optimizing for a slightly different thing, right, performance over power. So you’re trying to get the most dollars out of your electricity. If you had a SHA-256 implementation that ran faster for the power than what Bitcoin uses, then somebody would take that setup and use it in Bitcoin, right? So what you have with Intel is SHA-256 that does one round in 1.75 cycles. If you could speed it up by 10x, that means that you’re driving 10 times more power through this circuit. And that is a physically impossible thing to do right now. It’s maybe possible, but you have to design something that is super esoteric, optimized for cooling and extracting heat out of this thing. And any trade-off you make in space, you make with time, because the circuits get longer. So I think that’s a design that would be super, super hard to pull off. And the way our consensus model is set up, a 10x speedup doesn’t really get you anything. So we’re not worried about an attack vector coming from a faster ASIC. What we’re more concerned with is that we want the network to be somewhat homogeneous in the speed of SHA-256. So we would like everybody in the network to have CPUs that have those hardware instructions, because then they all roughly run at about the same speed.
And then the network behaves well, in terms of how messages are delivered. They’re delivered on time, and in real time.
Meher: In some sense, in production, the Solana network is a bunch of validators. And they’re saying, hey, this is the CPU I’m running, and, you know, how fast does the clock go. Maybe in the beginning they start off with different CPUs, but over time they’re going to naturally synchronize around the same speeds.
Anatoly: And every two years, as TSMC improves their process, everybody will upgrade, right? Then we get more performance, and the clocks will basically stay about the same speed. And clock speeds haven’t really changed much in the last 10 years. So that’s something that is really, really hard to optimize. How do we make a chip with a faster clock? It’s just not something we’re seeing improvements on.
Meher: And so now, okay, these validators, at least the honest validators, which presumably are a high percentage of the network, end up with synchronized clocks. But then an alien from Tau Ceti comes to Earth, and they have hundred-x faster hashing, and they get 5% of the voting power. What could they do?
Anatoly: So here’s the thing: we would have to jump into how our consensus works. But basically, they could potentially censor the previous validator, or maybe the previous two, but really no more than that. They could try to propose blocks that censor the previous validator. They basically generate a proof that they waited the appropriate amount of time, but because their ASIC is so much faster, it means they didn’t. From the perspective of the client, they will see the network behaving exactly the same way, right? Because blocks are generated at a high rate no matter what. So their submitted transaction gets encoded, and clients don’t see real-time drops. The previous validators that this producer is censoring will see a reduction in reward, because they’re missing the reward that comes from being a block producer. Right now we burn half of the fees and the other half goes to the validator. The reason we burn half of the fees is partly this attack vector: burning the fees basically rewards everybody in the system, right? And it’s our secondary fork choice rule if you try to produce a block that is contentious. In this case, right, Brian outran Anatoly. I’m still producing my block, but Brian has a faster ASIC, so now we have two concurrent blocks. How the network chooses between mine and Brian’s is based on the amount of weight that each block is proposing for consensus. And the secondary choice rule is how many fees are burned in each one. So that’s how we’re able to make that determination. So is this censorship attack vector so terrible that we should stop what we’re doing? I think we’ll figure that out, right, as we go along. From my perspective, what you’re censoring is a small amount of rewards, because every block has a very small amount of rewards, right? And for Brian to be effective at this attack, he has to reveal what he’s doing and do it for a long time.
And therefore, everybody knows that Brian has the faster ASIC, and the network can respond; we can then take action.
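The two-level choice between concurrent blocks described above can be sketched as a simple comparison. This is a toy illustration and not the actual fork choice code: the `Block` fields and the `choose_fork` function are hypothetical, and “weight” is modelled as a single integer standing in for the vote weight behind each block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    producer: str
    vote_weight: int   # stand-in for the consensus weight behind the block
    fees_burned: int   # fees burned by the transactions in the block


def choose_fork(candidates: list) -> Block:
    """Pick the block with the most weight; break ties by fees burned."""
    return max(candidates, key=lambda b: (b.vote_weight, b.fees_burned))


anatoly_block = Block("anatoly", vote_weight=100, fees_burned=50)
brian_block = Block("brian", vote_weight=100, fees_burned=20)

print(choose_fork([anatoly_block, brian_block]).producer)  # anatoly
```

Under this rule, a producer that outruns the previous leader with an emptier block loses the tie-break on fees unless it also attracts more weight.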
Brian: Makes sense. There is a whole other part to the consensus, right, which is basically that validators are sort of betting on the forks, on which chain will persist.
Anatoly: Yeah, so the problem with consensus, right, is that you receive a block in 400 milliseconds, and you have no clue what the rest of the network thinks about this block, whether they’re going to vote on it or not, whether they received all of this data or not, right? And you have to make a decision to vote on it or not. And if you make the wrong decision, then you pay a penalty, right? Because maybe you’re now in a minority partition. And if you had waited a little bit, you would have seen the real, the better block that the network is voting on. So how we deal with this is that whenever you vote, you make a small commitment to safety. You know, the CAP theorem is safety versus liveness; there’s always this trade-off. When I vote on a block, I commit to two slots, or two blocks, worth of safety: in the next two blocks, I will only vote if those blocks are children of the block that I voted on. If they’re not children of that block, then I can’t vote, and I can be slashed if I do so. But every time I do this vote, all the parent votes double their commitment to safety. So that commitment doubles. And that’s the exponential curve. So after 32 votes, right, 2 to the 32, at 400 milliseconds, is 53 years’ worth of commitment to safety, effectively infinite, because there’s no way I’m going to stop my node for 53 years and then wait for an alternative fork and switch, right? That’s effectively how our consensus mechanism works. It’s this very small amount of risk for any decision that you make with the information you currently have. But then that risk increases as you get more information from the network, right? Because as you see the future blocks being produced, all those blocks contain votes that represent everybody else’s commitment to safety in the network for all the parents. And that’s how you make progress, right?
When I choose to increase my commitment from 10 minutes to 20 on any particular block, I want to make sure that the rest of the network is at least on 10 minutes or five minutes. I configure some threshold function: I observe that the rest of the network is on five minutes, therefore I go from 10 to 20.
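The doubling commitment can be worked through numerically. The sketch below is a simplification of the mechanism as described above, with an assumed slot time of 400 milliseconds and an initial commitment of two slots; the names `lockouts_after` and `lockout_years` are hypothetical, and rules such as expiring old votes or slashing are left out.

```python
SLOT_SECONDS = 0.4        # assumed slot time of 400 milliseconds
INITIAL_LOCKOUT = 2       # a fresh vote commits to two slots of safety
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lockouts_after(votes: int) -> list:
    """Lockouts (in slots) of every vote on the stack, oldest first.

    Each new vote doubles the commitment of all earlier votes and then
    adds its own initial commitment.
    """
    stack = []
    for _ in range(votes):
        stack = [lockout * 2 for lockout in stack]
        stack.append(INITIAL_LOCKOUT)
    return stack


def lockout_years(slots: int) -> float:
    """Convert a lockout measured in slots into years."""
    return slots * SLOT_SECONDS / SECONDS_PER_YEAR


print(lockouts_after(3))               # [8, 4, 2]
oldest = lockouts_after(32)[0]
print(oldest == 2 ** 32)               # True
print(round(lockout_years(oldest), 1))
```

Under these assumptions the oldest of 32 votes is locked for 2^32 slots, which works out to roughly 54 years, the same ballpark as the figure quoted in the conversation.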
Brian: Yeah, so you almost have this Schelling point, right? Where you say, okay, we’re trying to agree on a particular block. And if I’m sort of early, and say I think it’s going to be this block, right, then I earn a little bit more money. And then if other people see that and they follow that, and then everyone follows that, then very quickly the cost of changing your mind is so high that you know this is going to be the block, right? So yep, that’s great. So let’s talk a little bit more about the high-level stuff. What you guys are trying to do is create a smart contract blockchain with, you know, a huge amount of capacity. And one of the interesting things is, I think I read this in the Multicoin blog, that’s a nice article, they phrase it as: okay, Solana creates abundance where there is currently scarcity, which is this trust-minimized computation. So we were thinking about this, Meher and I, we had a company retreat, and the thinking was: okay, let’s say you wanted to build on a smart contract chain today, what is there? There honestly is almost nothing, right, that’s currently live out there and proven to work. So there is Ethereum. And obviously, it’s over capacity and struggling, but, you know, many people are working on bringing more capacity online. Let’s say you guys bring a huge amount of capacity online. I’m curious: is the idea for SOL, the Solana token, to accrue value because people need it to pay for transaction fees? And if the supply of transaction space is so massive, doesn’t that mean the fees are going to be really low?
Anatoly: That is totally an unknown. How does the token accrue value? I don’t know. I think that’s a fundamental problem with the space, and maybe we’ll be the first to break it and then see what happens. And I think better economic models need to be built. This is how I think of it at a high level. What we’re trying to do, right, is build a network that basically scales up with bandwidth. And the scary part about it is that you run your home node, say it’s 100 megabits, and in your steady state, maybe the wildly successful network is doing 5,000 TPS. Right? That’s a crazy amount of transactions, honestly, right? That is an absurd amount of transactions right now for crypto, because of how small the space is. And if all of a sudden there’s a spike that requires 200 megabits, you can switch to a cloud provider within seconds, right, and then process that data and then come back to your node. You can just scale up on demand. So the costs of actually running the validator are going to be the lowest cost possible, right, for the steady state, with spikes up to the more expensive hardware when you need it. But the price for the fees is going to be based on the maximum capacity of the network. Because if we can actually handle 200, 300 megabits, then that’s what the fees should be priced at, right, because we want to fill it up.
So effectively, there are almost two forces pushing at it. It’s cheap to run the hardware, so the barriers to entry, if you do the software right, if you do the auto-scaling well, are pretty low. The high barriers to entry are the know-how, right? How to run the system is a lot harder than running something on your laptop that is just a desktop app, right? You actually have to build it, configure it and do some monitoring, right? So maybe the operational costs are higher, but the actual hardware barriers to entry, I think, are low, and capacity is high, right, as high as we can get people to auto-scale up to the best hardware available to them, which is 10-gigabit connections in AWS and Google Cloud, right? There’s nothing we can do to stop people from doing that. So what that means is, I think, fees can be basically at the lowest possible point above spam.
So how does the value accrue into the token? I think, ultimately, there has to be a better business model in the space, economics in the network that depend more on the volume of what’s actually being done on the network. And I think part of that is going to come from basically program stakes being treated as collateral in DeFi applications: this underlying collateral that’s right now driving the security of the network can’t be just stationary and static, right? It can’t just be staked and do nothing. We actually have to start using it for collateralizing BTC channels, right, building all these more complicated applications where it’s being utilized for something more. And then if I have 10 BTC that’s being transferred through Solana, right, at high frequency, people trying to do high-frequency price discovery for this BTC, then the collateral that ensures that, once the price is agreed upon, it’s going to go through, right, that has to be based on the amount, the volume. And that’s where I think we’ll start seeing more realistic models.
Brian: So you’re saying that SOL is money.
Anatoly: Sure. I don’t know, right? I have no clue. I’m just being honest.
Brian: If you look at a traditional business, right, a traditional business doesn’t try to price things based on how much it costs to produce the thing. It tries to price based on how much value is being delivered, and tries to, you know, capture a significant portion of that value being delivered. So if you think of this as a blockchain network, I mean, presumably they could do something similar, right? Let’s say Solana is able to process a transaction at a thousandth of a cent, but there is no good alternative. And people want to use that system. And you know, it’s secure enough and people are willing to pay 100%, then there could potentially be sort of revenue capture that way.
Anatoly: Except that at any time, people can take our open source code, and our community of third-party validators, and tell them to run Solana X.
Anatoly: So effectively, they get everybody to just switch, right? If the businesses that use this have a cost associated to them that’s so high that it’s above that threshold, then they’ll switch.
Brian: If you look at Ethereum today, I mean, you do sort of have that situation, right? You have Ethereum Classic, which is, you know, technically, I think, for the most part probably an equivalent network, but with much more throughput available, and okay, it may not be as secure. But you do have that, and people are still building on Ethereum. So I think you have certain network effects that are still difficult.
Anatoly: I think the markets basically will decide if fees can be a revenue or not. I don’t know. Honestly, my perspective is, again, our goal is to drive the fees to the lowest possible point, and then build the next generation of things, because I think those are more interesting. Because the cost of hardware is so low, as soon as we solve the software problems that make operations slow, that also becomes cheap. And then if we have this amazing state machine that’s globally distributed, that is as cheap as possible for trust-minimizing computation, then let’s do cool things there. Right? Let’s see how far we can actually take the space. So to me that’s my dream. Obviously it’s an open source project with an open community and a lot of input from a lot of different people with their own ideas of what they want to do, but at least for myself, this is what I’m going to drive: increased capacity to the point where it’s ridiculous. I want it to be absurd how many transactions we can process and how fast we can do it.
Meher: Yeah, it’s an interesting question. It’s the question of, now you have lots of blockchains with capacity. And if Solana could in some way ensure a monopoly, where everybody wants to come here, then maybe the optimum might be that we use only 37% of the capacity, but we charge high, and that’s the business model if they could be a monopoly. But then the opposite world is that there are so many chains, open source, different designs, et cetera, that it’s very hard to secure a monopoly, and then it tends towards perfect competition. In a perfect competition space, it will be very hard to preserve any transaction fee levels. And so there, transaction fees cannot be a business model.
Anatoly: I think a monopoly is very hard to build in this space, for proof of stake networks especially. With Bitcoin mining, you have ASICs, and this community is really optimized for this one business plan. They have a business, right? Everything depends on this mining capital investment cycle. That’s very hard to compete with, because you need so many folks to be on board with this new idea. So there I think it’s hard. And what they’re maximizing is not throughput, it’s security, which is totally fine, right? To have the most security in a single chain, that’s also a really cool thing to have. But with proof of stake networks, I think you have to maximize utility, and I think the only way to do it is to make it as cheap as possible to utilize.
Brian: Yeah. So speaking of utilization, one keeps hearing about plans for a decentralized exchange on Solana. So I’m curious about what your plans are, and what DEX you would like to build. Because the interesting thing in the crypto space appears to be that there is not one, but there are five or 10 interesting DEX designs. Nobody seems to know which one is going to win out.
Anatoly: Yeah, I think there are two designs. One is ours. The other one is tBTC, which is where you have a distributed group that takes custody of assets, and then it can behave like a centralized network with some different security properties. And that’s cool. What we’re doing is pretty simple. We run the order book and matching engine on chain. So the execution and clearing contract is on chain, right? Traders can submit price updates in real time, at high frequency, and matching agents run off chain, but matching occurs on chain, so you have an audit trail of what things were matched. So one thing that’s very apparent, and very easy to do, is that you can guarantee that the matching engine will match the first order entered before anyone else, or they get slashed. So because all the trading is occurring on chain, you have some better properties for an audit trail and better tools for building something that is more fair to the traders involved. We still have this problem of block producers that can basically censor transactions or insert their own ahead of time, if there’s an opportunity for them to do so. That one is hard to fix. There are some cool designs out there. Maybe we can talk about them. So we actually have that built as part of our GitHub. And there’s the bench client that we ran, with which we could demonstrate 30,000 price updates per second.
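The "first order entered gets matched first" guarantee described above can be sketched roughly as follows. This is a toy illustration and not Solana's actual exchange contract: the `Order` and `OrderBook` names, the single-sided book, and the `apply_match` check are all assumptions made for the example, and a rejected match is only logged where a real system might slash the agent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    trader: str
    price: int      # price in smallest units (hypothetical)
    quantity: int


class OrderBook:
    """Toy book of resting sell orders, kept in arrival order per price."""

    def __init__(self):
        self.asks = {}        # price -> deque of Orders, oldest first
        self.audit_log = []   # stands in for the on-chain audit trail

    def submit_ask(self, order):
        self.asks.setdefault(order.price, deque()).append(order)
        self.audit_log.append(("submit", order.trader, order.order_id))

    def apply_match(self, agent, price, order_id):
        """Accept a match proposed by an off-chain agent only if it
        picks the earliest resting order at that price."""
        queue = self.asks.get(price)
        if not queue:
            raise ValueError("no resting orders at this price")
        if queue[0].order_id != order_id:
            # The agent skipped the queue; record it so it could be penalized.
            self.audit_log.append(("violation", agent, order_id))
            return False
        queue.popleft()
        if not queue:
            del self.asks[price]
        self.audit_log.append(("match", agent, order_id))
        return True


book = OrderBook()
book.submit_ask(Order(1, "alice", 100, 5))
book.submit_ask(Order(2, "bob", 100, 5))
print(book.apply_match("agent-1", 100, 2))  # bob arrived second, so this is rejected
print(book.apply_match("agent-1", 100, 1))  # alice arrived first, so this is accepted
```

Because every submission and every match attempt lands in the same log, anyone replaying it can check that arrival order was respected, which is the audit-trail property mentioned above.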
Brian: Cool. So I know originally you guys were planning to launch around now. Of course, as with all blockchains, it always takes longer. So what are the main things remaining to get done until Solana mainnet launches?
Anatoly: So there’s some tail end stuff that we’re working on, which is making sure that when we launch we have a secure custody solution for everybody. That’s been taking longer than expected. Honestly, I think we should have expected it to take longer, because when you think about it on the surface, that’s an easy problem. But then when you really think about it, it’s very hard. Just delivering the tokens to people and having a secure way for them to take custody.
Brian: So the custody solution means that… I don’t know, companies like BitGo or one of those, that they support it, is that the idea?
Anatoly: So that’s one of the approaches, but we need to give people a way to do it on their own, right, in a secure way that doesn’t involve them storing plain text private keys on their computer. And then who do we trust? Do we trust Ledger or Trezor? Or do we do a paper wallet? How does that interface with our command line client that actually does all the staking operations and things like that? So that’s one of those things, and we want to make sure we get it right, even if we basically optimize towards security there and less towards convenience, because that’s more important in the early days, and then start making things more convenient.
And the other one is the real delay… We’ve been doing dry runs where we fixed a bunch of stuff in our Google Cloud based network, or AWS, and then we boot with the validator community and a bunch of stuff falls over. One of the assumptions that we made early on was that we would have high MTU paths, so we’d be able to reliably send jumbo frames across the internet. And that assumption was wrong. We had to go from the assumption of being able to send 64 kilobyte packets down to 500 to 1,000 byte packets. So the networking stack is effectively doing roughly 50 times more work, so we had to optimize the hell out of that. And that’s done. Now we want to do dry run 5, where we want people to boot with GPUs. And then we’ll see what falls over when we ramp up the transaction throughput, and then see if we can match what we see internally, which is 50 to 60k sometimes…
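To put rough numbers on the packet size change described above, here is a back-of-the-envelope sketch. The `packets_needed` helper and the 1 MB payload are illustrative assumptions, and per-packet header overhead is ignored, so the exact multiplier depends on details not given in the conversation.

```python
import math


def packets_needed(payload_bytes: int, packet_bytes: int) -> int:
    """Number of packets needed to carry a payload, ignoring headers."""
    return math.ceil(payload_bytes / packet_bytes)


payload = 1_000_000  # hypothetical 1 MB of transaction data

jumbo = packets_needed(payload, 64 * 1024)  # the original 64 KB assumption
small = packets_needed(payload, 1000)       # upper end of the 500-1,000 byte range
smaller = packets_needed(payload, 500)      # lower end of that range

print(f"64 KB packets:     {jumbo}")
print(f"1,000 B packets:   {small} ({small / jumbo:.1f}x as many)")
print(f"500 B packets:     {smaller} ({smaller / jumbo:.1f}x as many)")
```

Under these assumptions the packet count grows by somewhere between about 60x and 125x, the same order of magnitude as the "roughly 50 times more work" quoted above, since per-packet processing cost is what the networking stack has to absorb.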
Brian: And so what’s the goal in terms of the launch date timeline?
Anatoly: I think I’ll probably announce the dry run at the validator meeting in a half hour anyways, so I think it’s going to be October 29. So we’ll do dry run five. And if that works, because we want to launch with at least 50,000 TPS capacity, then we’ll do our stage one of TdS, which is to stress test the network for about a week. And then I think we’re ready for mainnet.
Brian: So maybe sometime Q1 probably.
Anatoly: I mean, the earliest I think we can do it is if the dry run runs fine, right, on the 29th, and then in November, the next week, we do stage one, that runs fine, and then a November 1 launch. That would be the fastest timeline, with no failures. Okay, cool. I’d love to set up an Augur bet on the engineering timelines of blockchain projects. Is there going to be a mainnet by date X, for any team? We’re doing everything in the open, so you guys can actually go to our Discord and bug us about what’s taking so long. We can point you to the remaining issues.
Brian: Yeah, absolutely. Well, Anatoly, thanks so much for coming on. We’ve taken quite a bit of time, but I think there’s so much to dive into. Yeah, I mean, we’re really excited to see Solana go live and to see, you know, how the network will evolve, how it will play out in reality. So super excited for that. And yeah, thanks so much for joining us.
Anatoly: Yeah, yeah, this was super fun. Thank you so much.
- Solana Website
- Proof of History: A Clock for Blockchain
- The World Computer Should Be Logically Centralized - Multicoin Capital
- Multicoin Capital: The Separation of Time and State - Multicoin Capital
- How Solana's Proof of History is a Huge Advancement for Block Time
- Tower BFT: Solana’s High Performance Implementation of PBFT