Episode 283

The Graph – A Marketplace for Web3 Data Indexes Based on GraphQL

Yaniv Tal

At the core of the Web 2.0 stack lies the REST API. It’s the fiber which allows frontend applications to communicate with their backend counterparts, as well as the services on which they depend. But the API model is highly constrained and inflexible. The API is divorced from the data model, which creates a number of restrictions and inefficiencies. Most blockchain clients, including Geth, Parity and Bitcoin Core, use a JSON-RPC model which suffers from similar issues. Several Ethereum DApps maintain high-availability, centralized data indexes which sit between the client and the blockchain. Though user experience is greatly improved, the practice means most of the ecosystem relies on centralized infrastructure.

We’re joined by Yaniv Tal, Project Lead at The Graph. The project aims to create a scalable marketplace of robust and high-availability blockchain data indexes. Relying on the modern GraphQL data query language initially developed by Facebook, The Graph allows developers to make complex queries to a robust and high-availability data infrastructure. Launched as a hosted service earlier this year, The Graph plans to move to a decentralized model in the future.

Topics discussed in the episode

  • The vision of The Graph and why the team chose to work on this problem
  • The REST API client-server model in the Web 2.0 paradigm
  • The state of the Ethereum ecosystem and the challenges relating to data availability
  • How DApps work behind the scenes and their backend infrastructure
  • GraphQL as the evolution of the API model
  • How The Graph addresses the issue of data querying and availability
  • Their hosted services and plans to move to a hybrid model
  • How The Graph addresses privacy and scalability
  • The incentive mechanisms and economics related to data integrity
  • Early applications and the project’s near-term roadmap

Brian Fabian Crain: So we’re here today with Yaniv Tal, the founder and Project Lead of The Graph. We’re going to speak with Yaniv about, you know, a lot of different things about querying, and probably one aspect of the decentralized application Web3 stack that a lot of people are not as familiar with. So thanks so much for joining us today, Yaniv.

Yaniv Tal: Thank you so much for having me.

Brian: Maybe where we can start is if you can speak a little bit about how you became interested in blockchain and the perspective you brought to blockchain.

Yaniv: Sure, so I first started getting interested in the idea of blockchains in 2011 I think. You know, following the Occupy Wall Street movement, I became very interested in thinking through how we could improve governance and the economy through digital money, but I didn’t actually make it like my day job until 2017. So a lot happened in between, I spent a lot of time in different startups. I did my own startups with my co-founders here at The Graph and focused on developer tools for quite a few years. And the perspective that I took into blockchains was that I was trying to figure out how to make it easier for people to build software just in general, so we had a startup where we were doing reactive developer tools to make it easier to build user interfaces. We got into functional programming, looking at building on immutable data, and along that path what got me really excited about Ethereum was the idea that for the first time we could have this kind of global immutable database that isn’t just bounded by a single organization, so that everybody can agree on a global state of truth, and that to me was just really exciting as being like a foundational primitive for just how software development could move generally.

Brian: What was interesting, I listened to another interview you did and it was interesting to hear you speak about blockchain kind of from this perspective of, okay, developer efficiency, you know, because generally when people speak about developing blockchain applications, everyone complains that it’s so inefficient, so hard to build, and you know, there’s these other benefits like maybe decentralization, and you put up with all of the hard time of developing something because there’s other things that make it so much better. And now you’re kind of looking at it from the perspective of, oh, actually it can make it easier to develop, so I thought that was super interesting to hear.

Yaniv: That’s exactly right and I think that this immaturity is just because it’s a brand new platform and we’ve spent like 20 years building up the web platform to what it is today. So at the beginning people were building their own web servers in like C++, and that wasn’t easy but they put up with it because, you know, the web was exciting for them. And so similarly I think we are in the infrastructure phase and I think as we build out more of this infrastructure there’s no reason why it wouldn’t be even easier to build on Web3 than it is on Web2, and I’m quite confident that that will be the case, but there’s plenty of work that needs to be done before we get to that point.

Sebastien Couture: So why did you guys decide to start working on The Graph?

Yaniv: Well, we got interested in Ethereum seriously in early 2017 and we started building different DApps and quickly stumbled upon this problem that it’s actually really difficult to get the data that you need to power a web or mobile app directly from an Ethereum node. You know, I think we got into the space probably for a lot of like idealistic reasons, the same reason a lot of other people got drawn to the idea of decentralization, but The Graph specifically was basically partially realizing that this indexing and query layer is completely missing from the Web3 stack today, and we’ve actually spent a lot of time working on this part of the stack at previous companies. So for example at our last startup, we built a custom framework on top of an immutable database called Datomic, and we built, you know, this GraphQL-like query language on top of this immutable database, and we just did that kind of out of passion just because, you know, we’re always trying to refine what’s just the best way to build software, what are the right abstractions just for building applications. And so, you know, we’d already spent a lot of time thinking about that before we got into Ethereum. And then when we saw that this part of the stack was missing for Web3 we decided that’s where we’d focus our efforts.

Brian: One thing I thought would be interesting to speak a little bit about, and that I thought was really fascinating, is the way you guys described kind of the problem of, you know, the way databases and APIs work in Web2 and the kind of effects of that, such as the rigidity of APIs and the duplication of data and effort. Can you explain a little bit how that works? What’s the status quo here and what are some of the downsides of the status quo?

Yaniv: Yeah, so today every Web2 company is basically running a fully vertical stack. They manage the infrastructure, the database, the application, the user interfaces, and you really have to trust these companies to continue to manage this data and make it available to their applications. And then you end up with these kind of data silos where, you know, the model for Web2 is you build some kind of process around some data and then you restrict access to that data, and that is generally called like a moat, and how well you’re able to build a moat basically determines how successful of a company you are in Web2, and I think that’s just very limiting for enabling innovation to continue. And if you kind of look at this larger arc of software development, it’s taken maybe 20 years to really figure out how to build applications that people really want to use, you know, even just on the web, and we’ve seen the transition from PHP to Ruby on Rails to client-side frameworks and microservices and data science and all of these different disciplines where we were just figuring out how do you actually deliver software at scale globally on the web. And the companies that figured this out were able to build these moats and data silos where they control access to that data, but we’ve kind of gotten to a point now where first of all these are more or less solved problems. And so people now know how to build full stack web applications that scale globally, and yet we’ve continued to have these kind of, you know, monopolies that now kind of control access to that data. And so it seems to me that the next kind of phase here is to basically commoditize what we’ve collectively learned and to actually make that data more generally available so that we can continue to make progress.

Brian: And then the thing that you guys described is if you have like the current paradigm where, you know, I have some data and maybe I make an API available, but then I have to decide what gets exposed in what way, and so I have to somehow anticipate what people are going to try to build with it. Right? And then if they try to do something else, it doesn’t really work and so...

Yaniv: Yeah, so that’s a big problem with REST. So basically the way REST has kind of evolved is initially the idea is you have one endpoint per resource. And that way if you have, you know, some interface that requires you to combine multiple resources together, then you can make several round trip calls to each of those different endpoints and that’s how you get your data. And as we try to build applications that load really fast, we realize that you actually can’t make all of these round trip requests because otherwise you’re going to be staring at a spinner for too long. And so what companies have ended up doing is building custom endpoints for their different user interfaces so that you can load the data that you need really quickly. But the problem is that now you have this tight coupling between that one API and the client, and so if I’m building a new feature and suddenly I need, you know, a new field to put up in the interface, I have to go talk to the server engineer and ask them to add that field, and it really slows down the pace of development. And so Facebook is a massive company, they’re constantly adding features and building new products, and they, like everybody else, were grappling with this limitation, and because of that they invented GraphQL, which is a query language that is really built with clients in mind so that you can specify a schema upfront on the server and then the clients can request exactly what data they need and get just that data back in a single response.

Sebastien: Okay, that’s a good explanation. I think one way of looking at this issue is that APIs themselves are sort of data silos. So for every resource there is a data silo: if the company building the API actually has need for that silo they’ll build it into the API, or if they think their users will have need for that silo they’ll build it into the API. So one way to look at this is to say, okay, if I need to know the data relative to the organization which is attached to a user, well, first I have to make a query to the user endpoint, which is kind of a silo in itself, and that will return the data about the user, so name, you know, bio, photos, etc., and then an ID for an organization, which I then need to query to retrieve the information about that organization. So in this case it would require two queries, whereas with something like GraphQL you could do this in one query. So at this point, maybe let’s dive into GraphQL and how that sort of changes the paradigm with regards to SQL and how we’ve been doing things for the last 20 years plus?

Yaniv: Yeah, so GraphQL is a query language that is really built with application developers in mind. And so you can define your schema, which is basically all the different entities and how they relate to each other. So in the example you gave you say this is an organization, an organization has many members, and these are the different fields that the organizations and the members have. And then depending on what screen I’m building, what particular user interface I want to build, I can specify the fields that I want and I can seamlessly traverse across these relationships so that I can get whatever data I want back in just a single response.
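
The traversal described here can be sketched in a few lines of Python. This is a toy stand-in rather than a GraphQL implementation: the sample data, the `resolve` helper, and the dictionary-based selection format are all assumptions made for illustration. It shows one request returning the user's name together with the organization and its members, where a one-endpoint-per-resource API would need a user call followed by an organization call.

```python
# Toy in-memory data; every name and field here is hypothetical.
USERS = {
    "u1": {"id": "u1", "name": "Alice", "bio": "Builder", "organization": "o1"},
    "u2": {"id": "u2", "name": "Bob", "bio": "Designer", "organization": "o1"},
}
ORGS = {
    "o1": {"id": "o1", "name": "Acme", "members": ["u1", "u2"]},
}

# Relationship fields: (owner type, field name) -> target type.
RELATIONS = {
    ("User", "organization"): "Organization",
    ("Organization", "members"): "User",
}
TABLES = {"User": USERS, "Organization": ORGS}


def resolve(type_name, entity_id, selection):
    """Return only the selected fields, following relationships as needed.

    `selection` maps a field name to True (plain field) or to a nested
    selection (relationship to traverse).
    """
    record = TABLES[type_name][entity_id]
    result = {}
    for field, sub in selection.items():
        value = record[field]
        if sub is True:
            result[field] = value
        else:
            target = RELATIONS[(type_name, field)]
            if isinstance(value, list):
                result[field] = [resolve(target, v, sub) for v in value]
            else:
                result[field] = resolve(target, value, sub)
    return result


# One request: the user's name, the organization's name, and member names.
response = resolve(
    "User",
    "u1",
    {"name": True, "organization": {"name": True, "members": {"name": True}}},
)
print(response)
```

The shape of the response mirrors the shape of the selection, which is the property Yaniv goes on to describe.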

Sebastien: So it’s effectively putting a query language at the API endpoint itself. So rather than having a database that’s within the company, you know, siloed, closed off, where you then need to build the API endpoints for each of the queries that you think your users are going to be using, it’s effectively putting the query language right at the edge where the user can access it directly and make those complex queries in a very seamless kind of way.

Yaniv: Yeah, that’s exactly right and you know, there are a few restrictions on what you can do. So it’s not as generally powerful as a query language like SQL, but it actually maps really well to what you need if you’re building an application. So one of these limitations is that you basically need to know how all of this data is structured ahead of time. So you have to define that schema, and if you have data that you want to be able to query you can only query according to that schema, but you know, that actually is how most applications are built. So you generally know how your different entity types relate to each other ahead of time and you can, for example, denormalize your data to make aggregations and various computations that you want available ahead of time. You can also parameterize these things. So GraphQL allows you to, for example, if you want to query a collection, include parameters so you can do pagination or filtering or any of these kinds of things, but essentially what you get back is the JSON object that exactly matches the structure of those entities and fields that you requested.
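
A parameterized collection query of this kind might look like the following sketch. The `query_members` function and its parameter names (`where`, `first`, `skip`, `fields`) are hypothetical and chosen only to illustrate filtering and pagination; they are not presented as any particular server's API.

```python
# Hypothetical collection of organization members.
MEMBERS = [
    {"id": i, "name": f"member-{i}", "role": "admin" if i % 3 == 0 else "member"}
    for i in range(1, 13)
]


def query_members(where=None, order_by="id", descending=False,
                  first=None, skip=0, fields=("id",)):
    """Filter, sort, and paginate the collection, returning only `fields`."""
    where = where or {}
    rows = [m for m in MEMBERS if all(m[k] == v for k, v in where.items())]
    rows.sort(key=lambda m: m[order_by], reverse=descending)
    end = None if first is None else skip + first
    return [{f: m[f] for f in fields} for m in rows[skip:end]]


# Second "page" of admins, newest id first, two per page, name and id only.
page = query_members(
    where={"role": "admin"},
    descending=True,
    first=2,
    skip=1,
    fields=("id", "name"),
)
print(page)
```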

Brian: So there is a nice blog post, we will link to it in the show notes, where your co-founder talked a lot about, you know, kind of the benefit of GraphQL, and one of the ways he explained it was that with SQL-based databases you have this over-fetching and under-fetching problem. Can you explain what that is and how GraphQL solves that?

Yaniv: Yeah. So this is a problem with most REST API endpoints. So typically with REST if you request a resource you get back all of the fields that the server engineer thinks you might want for that resource. And so it might be that I want to render a card that just has the user’s name and profile photo, but if I request the user object, I actually get back, you know, a hundred different fields, everything that the server knows about that user. And so that’s not very efficient. The converse is maybe I want the user’s name and photo and also the count of their friends, but the only way that I can get back the friend information is if I actually request all of their friend objects, and so now you’re also sending me back one friend object for each of those friends. And so these are the types of problems that you end up with with REST APIs all the time. And so, you know, GraphQL solves that very elegantly by just sending you back the data that you’re requesting, no more, no less.
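
The difference can be made concrete by comparing payload sizes. The sketch below uses made-up data and hypothetical helper names (`rest_get_user`, `rest_get_friends`, `select_user`); it assumes the server can expose a derived `friendCount` field, and it only measures serialized JSON length, not real network cost.

```python
import json

# Hypothetical data: one user with fifty friends.
FRIENDS = [
    {"id": f"f{i}", "name": f"Friend {i}", "photo": f"f{i}.png", "bio": "placeholder bio"}
    for i in range(50)
]
USER = {
    "id": "u1",
    "name": "Alice",
    "photo": "alice.png",
    "bio": "placeholder bio",
    "email": "alice@example.com",
    "friend_ids": [f["id"] for f in FRIENDS],
}


def rest_get_user():
    """REST-style endpoint: returns every field the server knows about."""
    return dict(USER)


def rest_get_friends():
    """REST-style endpoint: the only way to learn the friend count."""
    return [dict(f) for f in FRIENDS]


def select_user(fields):
    """Selection-style endpoint: returns exactly the requested fields."""
    derived = {"friendCount": len(USER["friend_ids"])}
    return {f: derived[f] if f in derived else USER[f] for f in fields}


rest_bytes = len(json.dumps(rest_get_user())) + len(json.dumps(rest_get_friends()))
selected = select_user(["name", "photo", "friendCount"])
selected_bytes = len(json.dumps(selected))
print(rest_bytes, selected_bytes)
```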

Brian: Okay. And so then this is something that Facebook basically built because, okay, of course they have so many queries internally, they were just like, okay, we need to save costs, save bandwidth, make it more efficient, and they built this kind of query language because of that?

Yaniv: That’s right. The other thing that Facebook deals with is, you know, with mobile apps people tend to run old versions of these applications and so they’re supporting probably hundreds of different client versions if not more, and so that makes it really difficult if you have this tight coupling between the user interface and the API because, you know, the backwards compatibility becomes a nightmare to support. And this is actually quite similar to the situation that we have in Web3 where you have a protocol that is going to be supporting a large number of clients. You know, one of the big benefits of Web3 is you can have a lot of different applications built on top of these protocols. And so to be able to support multiple applications and multiple versions of those applications you end up having the same exact kind of situation.

Sebastien: So in one of the blog posts that you’ve published you mentioned that GraphQL will likely be the preferred query language and database of the decentralized web. What did you mean by that? And why do you think that is the case?

Yaniv: GraphQL has been rising in popularity just in traditional web development. So there’s a lot of companies like GitHub and Twitter and Yelp and many others that, you know, have already switched to GraphQL. And so this is already like a really big trend happening in web development generally, and really what you need when you’re building applications is essentially like a standard for how you want to access your data. You basically need an abstraction, and we believe that GraphQL is just the right level for this sort of abstraction, where you don’t want to have to know what block some data was updated in and all of the like internals of a blockchain if you’re trying to build an application. And so having an abstraction on top that is just the data model is, we think, the right level for how people will choose to build their apps, and then you can kind of deal with all of the implementation details on the back side of that GraphQL API.

Sebastien: All right, now switching over to DApps, how do these problems relate to the DApp space? I mean, because in the DApp space we don’t have databases or SQL or any of that stuff. We’re just querying the blockchain, right? You know, why would we need a database on top of the blockchain?

Yaniv: Yeah, so, a lot of the applications that were built in the early days of Ethereum were very simplistic, but you’re exactly right that basically this indexing and query layer was completely missing from blockchains. And you know, I think it’s just a function of these blockchains having a lot of work to do, it’s a lot of work just to maintain consensus. And so Ethereum for example is focused on these scalability problems and how to make it so you can build these smart contracts, but indexing is very much its own kind of layer in the stack and its own problem altogether. And you know, I remember when people started building DApps on top of Ethereum, maybe there’s only ten transactions that have gone into the smart contract and so you want to view whatever is in your application, and it’s easy to just show 10 things on a screen and feel like everything’s okay. But if you remember, with the applications that start to take off, suddenly there’s like a hundred things on the screen, a thousand things on the screen, and suddenly you just have this really long list and you’re scrolling really far, and anyone who’s used like a good web or mobile app would never think to themselves that that’s an acceptable experience. But we’ve been kind of building toys and it’s kind of this growing up process to go from having like a proof of concept to having an application that people are happy to use, and I think in that transition a lot of people have realized that like, wait a minute, you actually need to be able to, you know, search for specific data and filter and sort and paginate, and these are all things that you need to have indexes to enable.

Brian: And so one company that I think is quite significant in the Ethereum space is Infura, right, which is a ConsenSys project that serves tons of API calls. Is that basically the problem that they aim to solve as well?

Yaniv: No, so Infura doesn’t do any indexing, Infura is just kind of a managed Ethereum node service. So it is nice to not have to sync a whole node in order to start interfacing with Ethereum. And so that’s the problem that Infura solves. But then the indexing problem actually happens a level on top, and so assuming that you can talk to an Ethereum node, you basically have access to the JSON-RPC interface. That’s the interface that the Ethereum nodes expose. And this JSON-RPC interface allows you to get certain fields that are in storage in your smart contracts, but you’re very limited in how you can access that data. And so for example, if I’ve got some field that’s a count that’s stored in my smart contract then I can get that count. But if I have a list, for example, let’s say that I’m a marketplace and I have a bunch of different listings of things that people can buy and sell in the marketplace, and I want to filter to find a specific category of listings and then I want to sort that to just get the most expensive items for sale in that category, for that I need to be able to do filtering, and that’s not something that is exposed in that JSON-RPC interface. And it’s not exposed because it would be too expensive to actually expose that without having indices, and Ethereum nodes don’t maintain these indices. And so that’s that missing indexing layer.
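
The marketplace example can be sketched to show why an index matters. The code below is a toy under stated assumptions: the listings are generated, not read from any contract, and `top_by_scan` and `top_by_index` are hypothetical helpers. Without an index, finding the most expensive items in a category means touching every listing; with an index maintained as listings arrive, the same answer comes from reading only the entries returned.

```python
import bisect
from collections import defaultdict

# Generated sample data; prices are unique so the ordering is unambiguous.
CATEGORIES = ["art", "music", "games", "books"]
listings = [
    {"id": i, "category": CATEGORIES[i % 4], "price": (i * 37) % 1000}
    for i in range(1000)
]


def top_by_scan(category, limit):
    """No index: examine every listing, then sort the matches."""
    touched = 0
    matches = []
    for item in listings:
        touched += 1
        if item["category"] == category:
            matches.append(item)
    matches.sort(key=lambda item: item["price"], reverse=True)
    return [item["id"] for item in matches[:limit]], touched


# Index maintained at write time: category -> (price, id) kept in sorted order.
index = defaultdict(list)
for item in listings:
    bisect.insort(index[item["category"]], (item["price"], item["id"]))


def top_by_index(category, limit):
    """With the index: read only the last `limit` entries of one bucket."""
    bucket = index.get(category, [])
    entries = bucket[max(0, len(bucket) - limit):][::-1]
    return [item_id for _price, item_id in entries], len(entries)


scan_ids, scan_touched = top_by_scan("art", 3)
index_ids, index_touched = top_by_index("art", 3)
print(scan_ids, scan_touched, index_touched)
```

The trade-off is that the index has to be built and kept up to date by someone, which is the cost Yaniv attributes to the missing layer.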

Sebastien: So how are companies building the DApps and all these projects that need indexing solving this problem now? Because a lot of DApps are relying on indexing of data in one form or another, and Ethereum is fully decentralized, so there must be a way that we’re dealing with this issue.

Yaniv: Yeah. So every project that has gotten to the point where they’re trying to build a really good web or mobile application has hit this issue, and what most of them have done is built these custom proprietary indexing servers. So they’re like, you know, crap, we can’t actually run the queries that we need. And so let’s build a server that syncs data from Ethereum, stuffs it into a SQL database, and serves it over an API, and then our front ends will just hit that custom API. I’d say that that’s probably what like 90 plus percent of the projects have done so far. The other option is if you want to keep your app completely decentralized then you can try to just sync everything on the client. So basically, you know, in that case where you want to let people filter particular listings, you could load up all of the listings on the client and then filter it locally on the client, and there are a lot of applications that do that also. So, that second option only works when you have some small amount of data or if you’re willing to make the users wait a very very long time before you can show them a screen. So some applications have chosen to do that. But you know, the alternative is you have to run and operate your own servers, and I think it’s one of these cognitive dissonance things where we’re trying to build DApps and a big part of building a DApp is this idea that it’s completely serverless and you don’t have to trust anyone to operate servers and infrastructure, and yet in order to build applications that are actually usable we have to do exactly that.
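
A proprietary indexing server of the kind described here can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not any project's actual code: the event format, the table layout, and the `ingest` and `api_top_listings` functions are assumptions, and the events are hard-coded rather than synced from an Ethereum node.

```python
import sqlite3

# Hypothetical contract events, in block order. Listing "a" is repriced later.
events = [
    {"block": 1, "listing_id": "a", "category": "art", "price": 5},
    {"block": 2, "listing_id": "b", "category": "art", "price": 9},
    {"block": 2, "listing_id": "c", "category": "music", "price": 7},
    {"block": 3, "listing_id": "a", "category": "art", "price": 12},
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE listings ("
    "id TEXT PRIMARY KEY, category TEXT, price INTEGER, block INTEGER)"
)
# The index that makes "filter by category, sort by price" cheap to serve.
db.execute("CREATE INDEX idx_category_price ON listings (category, price)")


def ingest(event):
    """Sync step: write the latest state of a listing into the SQL table."""
    db.execute(
        "INSERT OR REPLACE INTO listings (id, category, price, block) "
        "VALUES (?, ?, ?, ?)",
        (event["listing_id"], event["category"], event["price"], event["block"]),
    )


for event in events:
    ingest(event)
db.commit()


def api_top_listings(category, limit):
    """What the custom API endpoint would return to the front end."""
    rows = db.execute(
        "SELECT id, price FROM listings "
        "WHERE category = ? ORDER BY price DESC LIMIT ?",
        (category, limit),
    ).fetchall()
    return [{"id": row[0], "price": row[1]} for row in rows]


top_art = api_top_listings("art", 2)
print(top_art)
```

Whoever operates this server is trusted to sync correctly and stay online, which is the centralization concern raised in the conversation.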

Sebastien: Okay. So let me get this straight just so we’re clear here. So currently what you’re saying is that 90% of Ethereum DApps are writing their own proprietary software that’s sitting on top of the Ethereum node, that is being hosted on server infrastructure, and they are serving SQL queries through APIs to users that are using those DApps on their clients. So the databases are these centralized sort of choke points before they can even access the blockchain database.

Yaniv: That’s exactly right. Very well said.

Sebastien: Okay great. I’m glad we got that settled. No, I’m being a little facetious here because this is something that I’ve kind of stumbled upon recently. I hadn’t really realized to what extent this was actually happening, and through digging and speaking to a lot of DApp developers about different devops problems that they were having, I realized that yeah, this is a big issue. And one of the ways that people have talked to me about potentially solving this is making this proprietary layer open source. So then the idea is that different people, out of the goodness of their hearts, would host this infrastructure. For example, you have DApp A, and DApp A has all this proprietary infrastructure; they open source that stuff, and then users of those DApps, maybe those who are particularly interested in making sure that their DApp client runs well and who need access to certain types of data, like for example if you’re running a fund on one of these DApps, might host that infrastructure. I just thought, well, if you don’t have the incentive mechanisms there to make that work, it’s unlikely that we’ll ever get past this point of hyper centralization.

Yaniv: Yeah, you’re exactly right. We want Web3 to operate on top of this public global infrastructure and if we want that infrastructure to be sustainable then payment needs to be built in, there needs to be incentives for operating the infrastructure and that’s the only way to ensure that it continues to be available.

Sebastien: So my next question, maybe I’m missing something about the technical infrastructure of Ethereum, but why wouldn’t we just have this query language built into Ethereum and why would Ethereum nodes themselves simply not expose GraphQL endpoints so that the clients would have direct access to the node and direct access to these GraphQL queries.

Yaniv: So there actually is an effort to introduce GraphQL natively into the nodes as an improvement for JSON-RPC, because Ethereum nodes actually do maintain a few indexes. So for example, if I want to get the ether balance for a particular account, that’s pretty fast because that is something that’s indexed internally. So that is something that we’re going to be seeing. But the reason why the Ethereum nodes don’t maintain more indices is that maintaining indexes is actually really expensive, and which indexes to maintain is a function of which applications are getting built on top. So, you know, with traditional SQL databases you usually have like a DBA, for example, that’s looking at what are the slow queries, let’s add these specific indexes to make these queries faster, and there’s no way for the Ethereum nodes to know which indices are the right ones to maintain and there’s no incentive for them to do that. So generally with software we look to build things in stacks, where there’s layers that build on top of each other, and I think it really makes sense to separate out basically layer one from these, you could almost call this like a layer two problem. Layer one is really concerned with things like consensus and data availability, and then what they produce is blocks and a single global state that everyone can agree on, and then the problem of how do you organize that data so that it’s easy to access for all the different applications that want to build on top really sits very cleanly as a separate layer on top in the stack.
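
The cost argument can be illustrated with a toy store that counts writes. The `IndexedStore` class below is hypothetical and greatly simplified: it assumes each secondary index costs one extra write per inserted row, which is enough to show that the bill grows with the number of indexes, so choosing them depends on knowing which queries applications will actually run.

```python
class IndexedStore:
    """Toy key-value store with optional secondary indexes on chosen fields."""

    def __init__(self, indexed_fields):
        self.rows = {}
        self.indexes = {field: {} for field in indexed_fields}
        self.writes = 0  # counts the row write plus one write per index

    def insert(self, row_id, row):
        self.rows[row_id] = row
        self.writes += 1
        for field, index in self.indexes.items():
            index.setdefault(row[field], set()).add(row_id)
            self.writes += 1

    def find(self, field, value):
        if field in self.indexes:
            # Indexed: direct lookup.
            return sorted(self.indexes[field].get(value, set()))
        # Not indexed: fall back to scanning every row.
        return sorted(rid for rid, row in self.rows.items() if row[field] == value)


lean = IndexedStore([])
heavy = IndexedStore(["owner", "category", "status"])
for i in range(100):
    row = {"owner": f"acct-{i % 10}", "category": "art", "status": "open"}
    lean.insert(i, row)
    heavy.insert(i, row)

print(lean.writes, heavy.writes)
```

Both stores answer `find("owner", "acct-3")` identically; the lean one pays at read time with a scan, the heavy one pays at write time on every insert.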

Brian: Cool. Well, let’s dive in a little bit into The Graph. I mean, we’ve spoken about the problem of databases and querying in traditional web applications and in the decentralized context, and we spoke a little bit about GraphQL. So what is it that The Graph brings to the table?

Yaniv: So The Graph is an indexing and query protocol for blockchains and storage networks. So we index data from these different Web3 data sources and make it available over GraphQL. So the very first thing that we set out to do was to basically build a standardized way of doing this indexing. So we’ve already kind of talked about how, you know, most of the projects in the space have done their own custom proprietary indexing servers, and the first step towards being able to kind of introduce this indexing layer in the stack is to basically come up with a standardized way of doing that indexing. So we launched our standalone graph node in July of last year, and open sourced it, and basically that defined the developer APIs for building what we call a subgraph. A subgraph basically defines how to do this indexing work in a way that can eventually run on a decentralized network. And essentially what you do is you define: here are the data sources that I want to listen to; here’s a mapping script, so it’s a Turing-complete language or way that you can transform that data at ingestion time; and here’s the GraphQL schema for how I want to be able to query that data. And with that subgraph definition you now have a standardized way of indexing that data and making it available over GraphQL.
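
The three parts of a subgraph definition listed here (data sources, a mapping, and a schema) can be mimicked with a small Python model. This is a conceptual stand-in only: `ToySubgraph`, `handle_transfer`, the event fields, and the addresses are all invented for illustration and do not reproduce The Graph's actual manifest format or mapping API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToySubgraph:
    data_sources: List[str]                # contract addresses to listen to
    schema: Dict[str, List[str]]           # entity type -> queryable fields
    mapping: Callable[[dict, dict], None]  # transforms an event at ingestion
    store: Dict[str, Dict[str, dict]] = field(default_factory=dict)

    def ingest(self, event):
        """Apply the mapping to events from the configured data sources."""
        if event["address"] not in self.data_sources:
            return
        self.mapping(event, self.store)

    def query(self, entity, fields):
        """Serve only fields that the schema declares for the entity."""
        unknown = [f for f in fields if f not in self.schema[entity]]
        if unknown:
            raise KeyError(f"fields not in schema: {unknown}")
        return [{f: e[f] for f in fields} for e in self.store.get(entity, {}).values()]


def handle_transfer(event, store):
    """Hypothetical mapping: keep the latest owner of each token."""
    tokens = store.setdefault("Token", {})
    tokens[event["token_id"]] = {
        "id": event["token_id"],
        "owner": event["to"],
        "updatedAtBlock": event["block"],
    }


subgraph = ToySubgraph(
    data_sources=["0xMarket"],
    schema={"Token": ["id", "owner", "updatedAtBlock"]},
    mapping=handle_transfer,
)
for event in [
    {"address": "0xMarket", "token_id": "1", "to": "alice", "block": 10},
    {"address": "0xOther", "token_id": "2", "to": "bob", "block": 11},
    {"address": "0xMarket", "token_id": "1", "to": "carol", "block": 12},
]:
    subgraph.ingest(event)

tokens = subgraph.query("Token", ["id", "owner"])
print(tokens)
```

Because the definition is declarative data plus one function, any node given the same definition and the same events would build the same store, which is the property a standardized format is meant to provide.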

Brian: Basically the way it was before is people built their own SQL databases and their own way of querying that database. And now you guys say, okay, there is basically a standard way of doing it, you can use this graph node and then use the subgraphs, which is basically kind of like, okay, I want to have this data, and as long as I comply exactly with this format I can sort of add it. And in the end is the idea that we will have a graph node that contains all of the different subgraphs, or would I have like, maybe my local node with different subgraphs than you do?

Yaniv: Eventually the goal is to combine all of these graph nodes together into a global decentralized network. So any graph node itself can choose which subgraphs it wants to index. And what we want to do is actually open this up to a global marketplace and use market forces to do the resource allocation for which nodes are indexing what data.

Brian: So yeah, let’s get to the public decentralized network in a little bit. But first, right now you guys are running basically a hosted service. Why did you decide to start with that?

Yaniv: So, you know, we really believe in just shipping early and often and that the best way to build software is to get it in people’s hands and to work closely with users in developing that software. And so it was really just kind of a pragmatic choice for basically coming up with what are these intermediate milestones that we can hit, where we can, you know, ship something, make sure that we’re solving real problems for developers, and then improve it over time. So the first milestone we had was just open sourcing the graph node, and we learned a ton after that milestone because projects were able to build on us and so we were able to quickly improve the software at that stage, but there was still this barrier to entry where people would have to run their own nodes if they wanted to use it. And so the second milestone for us was launching the hosted service, where we could run a bunch of nodes for you and you can just deploy to our nodes, and we have a really nice user interface where you can see the status as the nodes are like indexing these subgraphs and you can easily run queries in the browser so you can like test as you’re developing, and that’s something that we launched at Graph in January this year. And the next milestone for us is the hybrid network. And so it was really just kind of a practical intermediate step for making it really easy for people to build on The Graph today.

Brian: Right. So basically the choice that they would have then is they can either build their own SQL database and host it on their own server and build their own querying and API for that, or they do the subgraphs, which is a little bit like developing their API, sort of, right? They define the data format and stuff, and then you guys would host it and they can just query that, and you basically offer it as a hosted service.

Yaniv: Exactly and really, we’ve been very kind of transparent about our goals to build this decentralized network from the very beginning. So our model for the hosted service is we are going to be releasing a paid tier, and the idea is to essentially just kind of cover the costs of running the infrastructure because as I mentioned it’s really important that you know payment is built-in and that this infrastructure is sustainable. And so, you know, this is kind of one step along the path of launching the decentralized network, but this way it’s really easy to kind of get started and we can start proving out a lot of these pieces that need to be built on top.

Brian: And so what about the hybrid network? What’s that going to look like?

Yaniv: So the hybrid network is our next milestone, where we are no longer running all of these indexing nodes. In the hybrid network we’re going to introduce our work token. At Graph Day, Brandon Ramirez gave a research talk where he described a bunch of new details about our decentralized network design, and we published the specs for the hybrid network. You can check that out if you go to github.com/graphprotocol/research. The token has two uses in the network: one is as a work token for people that want to run these indexing nodes, and the second is for data curation and staking on the subgraphs themselves. So for the work token model, anyone can come in and stake tokens to run a graph node, and then they can charge fees per query, and that’s done in an open marketplace where they can set their own prices per query. So in the hybrid network we’re going to open it up so that other people can run these nodes, but we’re still going to be running a centralized service, and Graph Protocol the company is going to be very involved in enforcing security in the network in its intermediate phase.
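The work token idea described here (stake tokens to be allowed to serve queries, then compete on a self-chosen price per query) can be illustrated with a toy model. This is a sketch under stated assumptions, not the published hybrid network design: the `Indexer` type, the `MIN_STAKE` threshold and the "pick the cheapest eligible indexer" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Indexer:
    name: str
    stake: float            # tokens staked in order to serve queries
    price_per_query: float  # price the indexer sets for itself


MIN_STAKE = 100.0  # hypothetical threshold, not taken from the spec


def eligible(indexers: List[Indexer]) -> List[Indexer]:
    """Only indexers with enough stake may take part in the market."""
    return [i for i in indexers if i.stake >= MIN_STAKE]


def cheapest(indexers: List[Indexer]) -> Indexer:
    """Pick the eligible indexer offering the lowest price per query."""
    candidates = eligible(indexers)
    if not candidates:
        raise ValueError("no indexer has staked enough to serve queries")
    return min(candidates, key=lambda i: i.price_per_query)


indexers = [
    Indexer("a", stake=500.0, price_per_query=0.00002),
    Indexer("b", stake=50.0, price_per_query=0.000001),  # cheapest, but under-staked
    Indexer("c", stake=1000.0, price_per_query=0.00001),
]
chosen = cheapest(indexers)
```

In this toy market the under-staked indexer is skipped even though it quotes the lowest price, which is the point of requiring stake before work.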

Sebastien: So a couple of things here, and I guess the first is regarding scaling. How does this scale? Because it occurs to me that the Ethereum blockchain is already quite large, at several hundred gigabytes now. If you have to build an indexing service that has all of this data, but structured in a way that presumably makes it much larger, how do you deal with that? But also, the other question is, you mentioned earlier that with GraphQL you have the schemas, and it’s unclear to me how we’re going to come up with a generally agreed upon schema, like this is the schema that Ethereum should have and that is optimized for all of the different DApps. Unless I’m missing something here, one DApp might require a different data schema than another DApp.

Yaniv: Yeah, great question. So first on the scaling part. You’re exactly right: from the sheer amount of data perspective, The Graph is probably going to be one of the largest networks in terms of just how much data we need to store and make available, and that’s really what our layer in the stack is focused on. That’s why we have sharding, which right now is at the subgraph level and then is actually going to go down to the individual index level. Each graph node only needs to index some very small subset of the data. But yeah, to make data very quickly accessible to clients all over the world, you would really need to scale that out to many, many nodes that are geographically distributed at the edge, and that’s a core part of what we need to do. Now as far as coming up with these schemas, that’s where governance comes in. Right now anyone can define their own schemas for their particular application, for their subgraphs, and so you can think of subgraphs as being a unit of governance. But over time, one of the problems that we would love to solve is to add this layer of governance for these subgraphs and for these data types, to solve exactly the problem that you’re describing. If I have a very custom application, I should be able to define my own schema and not have to talk to anybody. But one of the big benefits of Web3 is to enable interoperability, and if you want to provide these APIs and make data available so that many different applications can be built on top, then you do need to have some coordination. We’ve seen standards bodies in the past that try to come up with standardized data formats, and these standards bodies tend to move really slowly.
And you know, this is something that really excites me about Ethereum, that we actually now have a platform that enables large-scale coordination. My hope is that we can build great governance systems on top where, for example, people can vote on changes that they want to make to these globally shared schemas, and then this can create a scalable way of evolving standards that enables a lot more interoperability.

Brian: So you mentioned that you guys will need sharding, and that there could initially be some sort of shard that you have per subgraph, but what’s the timeline here? I mean, sharding seems to be not super close, right, if you look at all of the sharding efforts. What’s the timeline for when the hybrid network is going to launch, and for a sharded network?

Yaniv: Well, luckily the problem is quite a bit easier for us than it is for these layer one blockchains. Layer ones are responsible for solving things like data availability and a whole set of issues that we don’t need to solve, because at our layer of the stack we only accept these strong layer ones as inputs, as our data sources. So we get to assume that that data will always be available, and all we’re doing is processing that data and indexing it to make it more easily accessible. At our layer of the stack all we’re concerned about is quality of service. If for some reason an underlying shard goes offline, we can always rebuild the index. And so that makes sharding a lot easier for us than for some of these other ones.

Brian: Yeah. Let me see if I understand that, because I guess if you look at Ethereum and the sharding there, what’s hard is that you have these different shards and they have to be able to communicate with each other, so you have these smart contract calls across shards and it gets a bit complicated. But here it would be like, okay, I want to run a node on The Graph, and I can just go in and say I’m going to index the Augur subgraph, and I just take that schema and I just have that data and I don’t have to care about any other subgraph. I just have to be able to serve that information. Is that roughly correct?

Yaniv: Yeah. So as long as you have access to the underlying data sources that you need to process your sub graph, then you’re good to go and you can build up your index and you can start serving clients.
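The point that an index is derived data, and can therefore be rebuilt from the underlying chain at any time, can be sketched as follows. The event log, the contract names and the "volume per market" index are hypothetical stand-ins; a real graph node processes on-chain events through the mappings a subgraph defines.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical event log standing in for the layer-one data source.
CHAIN_EVENTS: List[dict] = [
    {"block": 1, "contract": "augur", "market": "m1", "volume": 10},
    {"block": 2, "contract": "moloch", "proposal": "p1", "votes": 3},
    {"block": 3, "contract": "augur", "market": "m1", "volume": 5},
    {"block": 4, "contract": "augur", "market": "m2", "volume": 7},
]


def build_index(events: List[dict], contract: str) -> Dict[str, int]:
    """Derive a per-market volume index for a single subgraph.

    Events from other contracts are skipped, so a node serving this
    subgraph never needs to care about any other subgraph.
    """
    index: Dict[str, int] = defaultdict(int)
    for event in events:
        if event["contract"] != contract:
            continue
        index[event["market"]] += event["volume"]
    return dict(index)


augur_index = build_index(CHAIN_EVENTS, "augur")

# If a node loses its index, replaying the same events reproduces it.
rebuilt_index = build_index(CHAIN_EVENTS, "augur")
```

Because the function is deterministic over the same input events, losing an index costs re-processing time rather than data, which is the quality-of-service framing used above.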

Brian: Another question that came up for me: even at Chorus One we’ve been doing some of this stuff, like getting the data off the blockchain and putting it in a SQL database. And let’s say different projects will have their own analytics or insights, proprietary stuff that they’re doing, and they may not want to expose exactly what they’re creating. I could imagine that if you can see all of the queries that Coinbase is doing, maybe you can figure out what they’re going to develop next. So how does privacy work in The Graph network?

Yaniv: Yeah, so I think there are a lot of use cases like you’re describing, where people are doing prop trading and things like that, and they don’t want to make that data available to others. For that you wouldn’t need to use The Graph network. It might be that you actually still want to build on The Graph because it just makes it easier for you to use those tools internally, but then you’ll just use it offline. You can run your own graph node, it’s all open source, and so that option is available to you. The network that we’re building is all around public data, and we think that there’s going to be orders of magnitude more open data that becomes available as part of this move to blockchains. So if you have data that you want to make available for other developers and for different applications, that’s when you would use The Graph.

Brian: Actually I think that’s a good point, right, because you could just run your own Graph node with the necessary subgraphs and then query that, and then that’s not visible to the network. Right?

Yaniv: Right. Yeah. So you have that option available just like you can run your own private Ethereum network yourself if you want to do that.

Sebastien: So moving now to economics, we talked about the hybrid network a little bit. So talk about some of the game theory and economics that go into the hybrid network, which is sort of the next phase in the life of the product.

Yaniv: Yeah, so the first step is making sure that you’ve got indexing nodes that are incentivized to index data sets. The main incentive for the nodes is that they can set their own prices per query, and they can set those prices based essentially on demand. So if there’s a subgraph that isn’t being used a lot and they need to cover their costs of doing the indexing, then they could set the prices higher. If it’s a really profitable subgraph and people are querying it all the time, they can set the price lower, and that’s essentially the incentive. Then you want to have an incentive for people to add data onto The Graph, and so we’ve got this curation, which is basically that you can stake on a subgraph and that entitles you to future revenue that’s proportional to the number of queries on that subgraph. Now there are actually a lot of things that could make doing that quite difficult, and this is one element of the game theory that gets quite interesting. You would want to make it so that if I create a valuable data set, then I can make more money, and that creates an incentive for people to add more and more valuable data onto The Graph. But that can actually be quite easy to game. For example, maybe I could find a subgraph that’s doing really well, copy it, and then try to get people to use my subgraph instead of yours. Or I could choose to run my own graph node and then query myself, so I’m just moving money from my left hand to my right hand to make it look like there’s a lot of demand for this data set when really I’m just paying myself. So in order to create a long-term incentive for adding valuable data to The Graph, we have to solve these kinds of problems.
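The self-dealing problem can be made concrete with a little arithmetic. The model below is purely illustrative and is not the protocol's actual fee design: it assumes a hypothetical fixed `curator_cut` of each query fee goes to curators, split in proportion to their stake on the subgraph, with the rest going to the indexer that served the query.

```python
from typing import Dict


def curator_payouts(
    stakes: Dict[str, float], query_fees: float, curator_cut: float
) -> Dict[str, float]:
    """Split the curators' share of the fees in proportion to stake."""
    pool = query_fees * curator_cut
    total_stake = sum(stakes.values())
    return {name: pool * stake / total_stake for name, stake in stakes.items()}


def self_query_net_cost(
    num_queries: int, price: float, curator_cut: float, attacker_share: float
) -> float:
    """Net cost to someone who queries their own node on a subgraph they curate.

    attacker_share is the fraction of the curation stake the attacker holds.
    """
    paid = num_queries * price
    earned_as_indexer = paid * (1 - curator_cut)
    earned_as_curator = paid * curator_cut * attacker_share
    return paid - earned_as_indexer - earned_as_curator


payouts = curator_payouts({"alice": 300.0, "bob": 100.0}, query_fees=40.0, curator_cut=0.25)

# Attacker is both the only indexer and the only curator: the fees come straight back.
cost_when_sole_curator = self_query_net_cost(1000, 0.01, 0.25, attacker_share=1.0)

# With other curators staked, part of the self-paid fees leaks to them.
cost_with_other_curators = self_query_net_cost(1000, 0.01, 0.25, attacker_share=0.5)
```

Under these assumptions, fake demand is free when one party controls both sides, which is why query volume alone is a weak signal of a subgraph's value.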

Brian: Yeah, that reminds me a lot of Ocean Protocol, which I spent some time looking at, because they have a lot of the same problems, and I think it’s pretty tricky to get the game theory and incentives right there.

Yaniv: Yeah. Now, one thing that we’re doing very differently is that all of this is for public data. We’re not trying to solve the problem where someone’s keeping data private and you have to set up an agreement in order to access it, because we think that ultimately most information is going to be public. Information wants to be free, and for the next evolution of the web we should just assume that that’s going to be the case. But just because information is free and available doesn’t mean that you are not willing to pay for fast, efficient and secure access. So what you’re paying for isn’t really the information, it’s the work that somebody is doing to organise it and make it easily accessible to you. One way of thinking about it is maybe Uber. If I want to travel somewhere in San Francisco I can walk. That option is available to me, nothing is stopping me from tying my shoes and walking 5 miles, but if I want a car to take me to my destination a lot faster, then I can pay a little money and get in a car and get there significantly faster. And how fast I can get to my destination is basically a function of the number of cars that are moving around the city. If Uber only had 10 cars then it would take me longer to get a car, and if Uber has thousands of cars in the city, then the average time I have to wait to get a car is going to be significantly shorter. So this is a way to think about liquidity in an indexing marketplace, where you essentially have this two-sided market with the data producers and the data consumers, and you want to have a thriving marketplace to provide a consumer service.

Sebastien: That’s an interesting point, to look at this as a marketplace, which ties into my next question: what are the costs that we are looking at here? Already, as a DApp user, there are costs associated with using the Ethereum network and making transactions, and people are working on ways to improve things, like reducing the cost of maintaining stake for instance, but there’s a bunch of costs there. And then for DApp developers there are costs as well, whether those be development costs or costs to store data. So there is already a bunch of costs here, and with this marketplace as a layer on top you are introducing another set of costs, which is to access data and index things like this. What are your thoughts on the increasing costs of using a blockchain and adding more layers of cost there?

Yaniv: Yeah, so blockchains today are expensive, but I would expect to see the cost go down significantly. If you have computation that you need to have replicated on tens of thousands of machines, that’s going to be expensive, and if you have data that you need to make available forever, that’s going to be expensive. But I think we’re already seeing a lot of new designs for blockchains that should be able to make it significantly cheaper, and maybe different applications need different levels of security, and so I think we’ll see those costs go down. For the data layer we are introducing a cost, but it’s significantly cheaper, so I wouldn’t be surprised to see the costs end up somewhere around 0.001 cents per query. If you actually think about the cost of computation, it’s one of the cheapest things that a company spends money on. In the traditional startup, maybe you’re spending a thousand dollars on server costs, at least in the early days, whereas maybe you’re spending over a hundred thousand dollars a year for engineering. So I think the social scalability costs are actually much more important to consider than the raw costs of computation, and I think that when you actually build the payments into the protocols, in the end we are going to see that it ends up being cheap enough when all is said and done.
So today most people are getting used to spending thirty dollars a month maybe, actually it’s probably more, like seventy dollars a month, for your mobile broadband, and you’re spending on your different services like Netflix and whatever else. I think that with Web3 it’s kind of a new category, a new expense that people need to get used to, paying to access Web3. But I think we can look at how much it’s going to cost in total and what users are going to get back in return, and I think it’s going to be more than worth it to basically have an explosion in the number of applications that you use that all interoperate, and I think that the kinds of apps that will be able to come out of that are going to be more than worth the cost.
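To put the per-query figure in perspective, here is a small worked calculation. It takes the 0.001 cents per query number mentioned in the conversation as a working assumption (it is an estimate, not a published price), and the query volume is a made-up example.

```python
from decimal import Decimal

# Working assumption taken from the figure quoted in the conversation.
CENTS_PER_QUERY = Decimal("0.001")


def monthly_cost_dollars(queries_per_month: int) -> Decimal:
    """Convert a monthly query count into dollars at the assumed price."""
    total_cents = CENTS_PER_QUERY * queries_per_month
    return total_cents / Decimal(100)


# Hypothetical application serving one million queries a month.
cost = monthly_cost_dollars(1_000_000)
```

At that rate, one million queries a month comes to 10 dollars, which is small next to the server and engineering budgets mentioned above.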

Brian: So right now you guys are focussing on Ethereum. Is the idea that The Graph is going to be the network that’s going to index all of the data of all of the blockchains and make it queryable, or what are the plans there?

Yaniv: Yes, that’s the vision. We are going to index all the data, and I think this is going to become increasingly important as we see more and more blockchains. Already, just with Ethereum, we are seeing a lot of people choosing to go to Infura, for example, because it is just an easier way to access data. Imagine what’s going to happen when there are 10 or 20 different blockchains and each one has many different shards, and you have to figure out which full node to go talk to in order to get which data, or you yourself are going to have to start running 50 full nodes. It’s not really going to scale, so I think that makes it even more important to have a single network that’s indexing all of the data across all of the different data sources and making it easily available. One thing that I would want to mention here is that we are essentially acting as a sort of aggregator. Web3 is all about decentralization and these different networks, and we want to be decentralized, but from a user perspective there is a reason why aggregators exist.
If the web itself was decentralized but I couldn’t go to Google to search for something and find a website, those web pages wouldn’t be very useful to me. Similarly, there are lots of different items for sale from different merchants on Amazon, but if I couldn’t browse and search for items for sale, those merchants wouldn’t be much use to me. So you always have this aggregation that becomes important to make things usable, and typically those aggregators become your centralized points of failure. That’s exactly why decentralization is so important here: we want to make Web3 usable, but if you have this global API that’s indexing all of the information and making it available, and every application can use this global API to access this data, then it’s really important that that API is not owned and controlled by a single company or a single entity. So that’s why we’re so passionate about making this a decentralized network.

Sebastien: So some projects are already using The Graph. You recently had this event in San Francisco where several projects participated and spoke, and you unveiled the hosted version of The Graph and the Graph Explorer, which from a product perspective I thought looked really nice. I played around with the explorer and it looks like you guys spent a lot of time on design and making sure it’s quite usable, and on developer docs and things like this. So can you tell us a little bit about the projects that are using The Graph now, and maybe highlight one that comes to mind and how it’s using the product?

Yaniv: Yeah, so we do have a bunch of projects using The Graph already. One that I’m really excited about is Moloch. If you go to molochdao.com, that’s actually using The Graph to power that interface. As I’m sure many of your listeners know, Moloch is a DAO for funding Ethereum infrastructure, and it’s got members, and those members vote on different proposals for how they’d like to spend their funds, and all of that is being indexed on The Graph and powering their user interface. You can go to the Graph Explorer at thegraph.com/explorer to find all of this data that is being indexed, and it’s available for people building applications. You can see a bunch of featured subgraphs right now and then a bunch of community subgraphs, so it’s really easy for anyone who wants to build a subgraph: you can deploy to the hosted service and it’ll show up in the explorer. We just want to grow the amount of data that’s being indexed and is available to people that want to build applications.

Sebastien: Cool. So let’s talk about the timeline and the roadmap. How long should it be before the hybrid network is released, and what are some important points on the roadmap?

Yaniv: We have a bunch of features that we are going to be adding this year. One of the big ones is expanding to multiple blockchains. We’ve also got things like pending transactions and something called the confirmations API, which we’re really excited about. Right now when you query The Graph it just gives you back the latest state that it knows from whatever Ethereum node it’s connected to, but we think that one element that’s been very overlooked in building DApps is communicating to users the finality of various actions that are taking place in the interface. A typical UI that you’d see today is that you perform some action and then it sends you a link to Etherscan, where you pop over to this other website to see if this transaction has been accepted in a block. We’ve all kind of put up with this, but I think it’s a really great example of a UX paradigm that is not going to scale to mainstream users. So it’s about being able to actually show in the interface when a transaction has been accepted, when an action has been performed. If I’m looking at a crypto collectible and it says I own it, have I owned it for a while, or has this just changed as of two blocks ago? So that’s the confirmations API, coming later this year. We’ve made a lot of progress on the hybrid network. As I mentioned, the specs are already available and we’d love for people to jump in and provide feedback. We’ve actually built the first version of the smart contracts to run that, so we’re doing a lot of the testing and development internally. I would expect it in the early part of next year, but we generally try not to comment on dates too much. But that’s the roadmap.
What we’re really focused on this year is just getting adoption, because we think that the most important thing is to help DApps build new usable experiences as soon as possible. And there’s something nice about this progression of starting centralized and decentralizing over time: it allows us to iterate a lot faster. If we get to a point where there are a hundred or more applications built on top and The Graph works really well for them, then we know that by the time we launch the decentralized network and open it up to many more nodes to do the indexing, we have something that works really, really well.
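The idea of surfacing finality in the interface, rather than sending users to a block explorer, can be sketched with simple block arithmetic. This is not the confirmations API itself, whose design is not described here; the function names and the `REQUIRED_CONFIRMATIONS` threshold are hypothetical, and the right threshold depends on the chain and the application.

```python
from typing import Optional

# Hypothetical threshold; the appropriate value depends on the chain and use case.
REQUIRED_CONFIRMATIONS = 12


def confirmations(tx_block: Optional[int], latest_block: int) -> int:
    """Count confirmations, including the block that contains the transaction.

    tx_block is None while the transaction is still pending.
    """
    if tx_block is None or tx_block > latest_block:
        return 0
    return latest_block - tx_block + 1


def finality_label(
    tx_block: Optional[int],
    latest_block: int,
    required: int = REQUIRED_CONFIRMATIONS,
) -> str:
    """Turn a confirmation count into a label a UI could show next to an item."""
    count = confirmations(tx_block, latest_block)
    if count == 0:
        return "pending"
    if count < required:
        return f"confirming ({count}/{required})"
    return "confirmed"


# Ownership changed in block 100 and the chain head is now block 101.
label = finality_label(tx_block=100, latest_block=101)
```

A collectible whose ownership changed two blocks ago would be shown as still confirming, while one that changed long ago would be shown as confirmed.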

Sebastien: It was very interesting to be speaking to you about The Graph today and great to have you on to discuss this topic. As I was alluding to earlier, too much centralization in Ethereum obviously isn’t desirable, and that’s really the situation that we’re in at the moment, at least for most DApps. So the vision of The Graph is one that I think will resonate with a lot of people, and it goes in the direction that the original ideas behind Ethereum meant to lead us towards. I wouldn’t want to call it a criticism, but one thing that we should be careful about is creating too much dependency on one service. For any additional layer that sits on top of Ethereum, I don’t think it would be much better if every DApp in the ecosystem depended on that one layer. Should that layer cease to function, or the economics be flawed or for some reason no longer really work, then you have a situation that’s not a whole lot better than the centralized platform. But hopefully this standard will allow for a multitude of marketplaces and similar types of products to emerge, where people can choose which version of The Graph they’re pointing their DApps to.

Yaniv: Yeah I think this idea of having it open, which means open source so you can run your own nodes, and you can verify everything yourself and you can experiment and that nobody is locked into any one particular solution is really important and so we agree with that vision completely.

Sebastien: Thanks for coming, and we look forward to seeing the developments of The Graph. We have links in the show notes, but is there anywhere else you want to point people to for more information?

Yaniv: Yeah, we are most active on Twitter and Medium.
