Data Talks on the Rocks 7 - Kishore Gopalakrishna, StarTree

By Michael Driscoll | April 4, 2025 | 5 minute read
Data Talks on the Rocks is a series of interviews from thought leaders and founders discussing the latest trends in data and analytics.

Data Talks on the Rocks 1 features: 

  • Edo Liberty, founder & CEO of Pinecone
  • Erik Bernhardsson, founder & CEO of Modal Labs
  • Katrin Ribant, founder & CEO of Ask-Y, and the former founder of Datorama

Data Talks on the Rocks 2 features: 

  • Guillermo Rauch, founder & CEO of Vercel
  • Ryan Blue, founder & CEO of Tabular, which was recently acquired by Databricks

Data Talks on the Rocks 3 features:

  • Lloyd Tabb, creator of Malloy, and the former founder of Looker

Data Talks on the Rocks 4 features:

  • Alexey Milovidov, co-founder & CTO of ClickHouse

Data Talks on the Rocks 5 features:

  • Hannes Mühleisen, creator of DuckDB

Data Talks on the Rocks 6 features:

  • Simon Späti, technical author & data engineer

We've had the opportunity to speak in depth with founders of cutting-edge technologies. The past few interviews include the creators of real-time analytical databases. For our seventh installment of Data Talks on the Rocks, we found it fitting to interview Kishore Gopalakrishna, co-founder and CEO of StarTree and creator of the wildly popular database Apache Pinot. Kishore and I were able to have a deep technical discussion on the following items:

  • Three things Pinot looked to differentiate on - freshness, latency, concurrency
  • Pinot's real-time benefits and actual use cases across verticals - Uber, Stripe, Walmart
  • Pinot's unique architectural decisions - "index is a first class citizen"
  • Understanding users' needs in product development

I’ve noted some of my favorite highlights below:

On real-time data: (15:42)

“There is this amazing potential in data in the first few seconds, or even first minutes, and if you are able to leverage that and see the value, then it's endless.”
“Uber Freight [is] tracking the progress of truck drivers… providing real time insights to them on like, hey, this is the route you should take, you shouldn't be spending time here. They saw a huge improvement in business in terms of being on time, and then they saved millions of dollars.”

On the value of the open-source community: (39:54)

“When you hear from the users… if you keep an open mind, there is so much value that you can actually get from them… I still spend a lot of time on the Pinot Slack... We wouldn't have done upserts if it was not for Uber.”

On index support: (26:30)

“Most [real-time analytics databases] use the regular indexes, which is like the inverted index or bloom filter… some of them like ClickHouse, for example, have skip indexes, but it doesn't even have the row level indexes. We go way beyond that.”

On query efficiency: (24:51)

“Look at all these databases… what is the actual work that is being done [per query]? Pinot does the least amount of work, and that was the goal that I was actually shooting for… being able to have that [low] latency curve maintained, as you add more data, as you add more queries per second was a challenge that I took on while most people said… don't try to solve that problem.”

On updates: (20:27)

“[Update support] took multiple tries for us to actually get this architecture. Especially on the upserts… we did one version that didn't actually work out well, so we had to redesign that again…. we are glad that we finally got it right.”

On SQL join support: (35:00)

“One of the things that we focused heavily on [was] user facing, external facing applications. But over the last 2 years we have seen huge pull on all the internal applications as well… all of these started with us adding the support for joins. [This] has become a huge strength, and we are pretty much beating every other system out there in terms of join performance.”

Michael Driscoll

Ladies and gentlemen, developers around the world, welcome to Data Talks on the Rocks. I am delighted today to be with my guest, Kishore Gopalakrishna, who joins me from StarTree, where he is the founder, CEO, and creator of the wildly popular database Apache Pinot. Kishore, thanks for joining Data Talks on the Rocks. 

Kishore Gopalakrishna (00:34)

Thank you, Mike, and it's my pleasure to be on the podcast.

Michael Driscoll (00:39)

I want to start by talking about the past and present of the Apache Pinot project, how it came to be. I've listened to a number of your interviews over the years talking about your past at LinkedIn. I know you were a Senior Staff Engineer at LinkedIn and previously worked at Yahoo, where a number of innovations in distributed systems came out of. Rather than go back and retell the origin story of Pinot that began at LinkedIn (I'd like to refer to it), I'd really like to focus on … I think the question that I ask anyone who's created a net new technology innovation is: why did you choose to build a new database? I'll start with a quote I heard from another database creator who said "every database is an overnight success, after 7 years." Common wisdom says it takes about 7 years to get a database to a level of maturity. Maybe just tell our audience and our listeners a little bit about what was the real motivation, a decade ago now, to create Pinot. What were the pain points that you were addressing that were not addressed by other technologies at the time?

Kishore Gopalakrishna (02:12)

I think that's a great question, Mike. I think one of the things that I probably mentioned in a few interviews, but I think it's the reality is, I actually didn't want to build a database. Because this was not the first database that I built. I had built multiple systems before this. And I also built Espresso, which is a key value, NoSQL distributed system. I mean, 7 years is probably an understatement. I think it takes a lot longer than that, and especially the last 10%. You can get from 0 to 90 very quickly, but I think beyond that is what takes a lot of years. So for me, when I actually got into the Pinot project there was already a solution which was kind of built on Elasticsearch. So for me, I was like let's take this and then take it to the next level. So I was hesitant in terms of building a new system. But then I think the challenge is always, as you get deeper and deeper, you kind of feel that you can't really make this work. You can't really take it very, very far. And I think that's when you go to the fundamentals and say, how are these systems actually built from ground up? And in this case I'm referring to Elasticsearch, which was the original solution. And that's where we had to take a step back and then see like, I can probably make this work a little bit, push it here and there, put some hacks around it, but then I can't really go very far. I think that was one of the things.

Michael Driscoll (03:51)

Well, to get specific, Yahoo and LinkedIn were places where a lot of really powerful innovations were created. Obviously, Hadoop came out of Yahoo. Kafka came out of LinkedIn. The engineer who initially created Apache Druid, Eric Tschetter, was also a LinkedIn engineer at one time; he's now at Imply. And one question I would have is, and of course ClickHouse was probably quite early, but ClickHouse did exist at that time. There was Vertica, there were some open source projects out there. My question would be, when you started looking at creating Apache Pinot inside of LinkedIn, what didn't exist in those other tools, whether it be Apache Druid or early versions of ClickHouse? I don't know if it was open source at that point. What was missing in particular from those other architectures that led you and some of your colleagues to say, let's start something new?

Kishore Gopalakrishna (04:58)

Yeah, I think that's a great question. Druid was definitely something that was available at that point. The biggest thing that we wanted to do, if you look at the whole real-time ecosystem that was coming up, with Kafka having originated there, was to take a different approach in terms of ingesting data. And if you go back and look at Druid, they had this Middle– I forgot what.

Michael Driscoll (05:25)

MiddleManager.

Kishore Gopalakrishna (05:26)

MiddleManager. Yes, they had really creative names. And then it was pulling data from Kafka and then writing it to their other servers. For me, given the importance of real time and how quickly they wanted to actually reflect this in the dashboards, or in the products that we are building, that was always going to be a kind of impedance mismatch. Because as soon as something shows up in Kafka, you should be able to query it immediately. There shouldn't be any micro batching, and there should also be the ability to get the best performance. The partitioning concepts and all those things actually matter a lot. So, being very tightly coupled with these new kinds of systems, which are not push based but kind of pull based, and if we can embed those consumers directly within Pinot, then we can actually do a lot more than what the traditional systems can actually do. So that was the first thing: I want zero latency between the source and being able to provide insights. So that was the first part.

So then, the second part is, how to do it? While architecturally it was the same, I wanted to get into the how part of it. I think that is where the optimizations came into the picture, which is, how can we do less work on a given query? And one of the things that we noticed is, even though these are all time series data, each data segment can actually have a very different data profile. If you actually dig deep into Pinot, the query plan is actually different for every segment. It's not that, oh, we just look at the SQL layer and then come up with a plan at a very logical level. We went deep into optimizing the physical plan. Because if you just try to optimize at a logical level, which most databases do, it was not good enough, because we are not really looking for second-level latency. Our goal was that p99 should be less than 100ms; Jeff Weiner would actually call if “who viewed my profile” didn't load within 100ms. So we were not looking at, hey, here is the dashboard, let's make that faster, right? We were not going after building faster dashboards. We were really bridging… there is the concept of OLTP, the workloads and the kind of expectations that people have there, but for OLAP functionality. So our real thesis was OLAP at OLTP scale. That was what we actually wanted. And then, for me, it was always about having a strong fundamental foundation so that I can go very far.

We may not choose to solve all the problems at once. But how strong are our core foundations? Which is really about optimizing the physical plan, making sure that we don't do even a single unnecessary scan. If a row doesn't need to be scanned, it shouldn't be scanned. The index is a first class citizen. It's not the other way around, where you just do a brute force scan. So a lot of these fundamentals were very important for us, and we saw the problems. And I could clearly see that taking any of these existing systems wouldn't let us go to the scale that we really wanted. And if you look at it today, LinkedIn is serving 650,000 queries per second.
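To make the per-segment planning idea concrete, here is a minimal sketch, not Pinot's code; the metadata fields and plan names are assumptions for illustration. It shows how an engine might choose a different physical operator for each segment based on that segment's own statistics and indexes, skipping segments that cannot match at all.

```python
from dataclasses import dataclass

@dataclass
class SegmentMetadata:
    """Hypothetical per-segment metadata; field names are illustrative only."""
    name: str
    min_value: int            # min of the filtered column within this segment
    max_value: int            # max of the filtered column within this segment
    has_range_index: bool = False

def plan_for_segment(seg: SegmentMetadata, lo: int, hi: int) -> str:
    """Pick a physical plan for `value BETWEEN lo AND hi` on one segment."""
    # Prune: this segment cannot contain any matching rows.
    if seg.max_value < lo or seg.min_value > hi:
        return "SKIP_SEGMENT"
    # Prefer an index that answers the range predicate directly.
    if seg.has_range_index:
        return "RANGE_INDEX_LOOKUP"
    # Otherwise fall back to scanning only this segment's rows.
    return "FULL_SEGMENT_SCAN"

segments = [
    SegmentMetadata("seg_2024_01", min_value=0,   max_value=50,  has_range_index=True),
    SegmentMetadata("seg_2024_02", min_value=200, max_value=900),   # pruned entirely
    SegmentMetadata("seg_2024_03", min_value=10,  max_value=120),   # falls back to a scan
]
for seg in segments:
    print(seg.name, "->", plan_for_segment(seg, lo=20, hi=100))
```

The point of the sketch is only that the plan is chosen per segment, using that segment's metadata, rather than once at the logical SQL layer.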

Michael Driscoll (09:11)

Off of their internal cluster, Pinot.

Kishore Gopalakrishna (09:14)

Who Viewed My Profile started with 1,000 queries per second, right? And that's really where I wanted it to get to: truly OLTP scale. Generally, when you think of half a million to 1 million queries per second, that's all OLTP. It's just key value stores; you just keep looking up. I really wanted to push to the fact that an OLAP engine can actually do this, and I'm very, very happy that we are there, and I'm looking forward to reaching 1 million queries per second.

Michael Driscoll (09:45)

1 million mark. Well, who knows? The thing about open source is, there might be a company out there using StarTree and Pinot at that scale. 

I think zooming out a little bit, and then I'm gonna zoom back in and talk a little bit about the index support that's very unique about StarTree. In fact, the origin of your name is the StarTree index. I've heard you say there's really 3 things that Pinot looked to differentiate on. 

The first being data freshness: that tight integration with Kafka and making data available as soon as it hits. The moment an event occurs, you want the analytical applications to be aware of that data, not to have this kind of high latency between when events happen and when the database system can be queried. So that's OLTP scale, and frankly, maybe OLTP latency as well.

Second, I heard you just talk about another form of latency, which is just query latency: getting that p99 to a place under a hundred milliseconds. I think I've heard quotes for different systems, but it feels like in general Pinot tries to target around sub-100-millisecond queries, which is again a level of latency that you certainly don't see with more traditional cloud data warehouses like BigQuery, Snowflake, or Redshift.

Then maybe the third one which I'll just throw out there. And then I'm gonna zoom out and ask you about use cases. The third one I've heard you talk about before is concurrency and certainly user facing analytics, embedded analytics. When you have something that's facing the world, you can get a million queries a second, because there's a lot of users out there on the planet Earth that are interacting with these platforms that are running Pinot in the background. I think given that architectural profile - low latencies, high concurrency, freshness of data - given that architecture…if you could name names, that's great, but what are some of the companies, or the use cases that tend to be attracted to an engine like Pinot?  Given what it can do, what are some of the concrete things, whether in fintech or crypto or retail or social media. I would love to just hear a couple of stories where this was beyond LinkedIn, where this was really the right tool for the job.

Kishore Gopalakrishna (12:32)

I definitely talk about this a lot, but I'll start with Uber. That was the next biggest place where Pinot became widely popular. And it's just amazing the kind of use cases that they have been building. I mean, Uber Eats is one of them, and I think you talked about the concurrency part. And again, this was one of the visions for us, to truly democratize data. Data democratization has been talked about a lot, but it's always limited to people within the company. Our goal was always, hey, take it beyond the company to your users, and Uber Eats did that fantastically.

If you look at Uber Eats, you see the delivery time, and that time is not a stale number, like what the restaurant took yesterday to serve an order. It's querying the live orders in the last 30 minutes: how long is it actually taking? So that's a great example of a live query being run whenever you look at the application.

Same thing with Stripe: think about all the merchants of Stripe, hundreds of millions of merchants. They're looking at their dashboards, live analytics, and they're getting these numbers from Pinot behind the scenes.

Walmart, since you mentioned retail: every order that you place on walmart.com actually gets into Pinot. And you can think of the state transitions, like the order was placed, it was in the inventory, it was shipped. The entire fulfillment tracking is done behind the scenes with Pinot.

So I think the key that we found was what we saw at LinkedIn. Then Uber, and pretty much every company, went through that transition 5 years later, because they had Kafka adoption, they had some sort of streaming system, whether it is Kafka or Redpanda or Pulsar, or any of these things, where they say, hey, now I have access to these events, live.

But then the first approach most people take is to use this system more like an ETL system, where they take the data from Kafka and just dump it into a data lake. And that's a fine thing to do first, because it's one of the best and most widely used uses of Kafka. But then suddenly you realize, wait a minute, I have this very, very fast system, and I'm taking that and putting it into a slow system. The value of this data is completely lost.

And the reason I started with this architectural shift is to be able to capitalize on this data, and that's where Pinot comes in. People start doing this 2, 3 years into their journey after adopting a streaming system like Kafka. Then they start thinking, okay, now can I actually get the value out of the system immediately? We have some of these use cases in Citibank, for example; they went from 10 minutes to 10 seconds in terms of the freshness that they wanted, and there was huge value that they were able to get.

So there is this amazing potential in data in the first few seconds, or even first minutes, and if you are able to leverage that and see the value, then it's endless. You will start coming up with so many applications that you had not before.

I mean Uber, for example, had this Uber Freight which is a very interesting application, which is, they're just tracking the progress of their drivers, the truck drivers, and they kind of gamified that and then provide real time insights to them on like, hey, this is the route you should take, you shouldn't be spending time here, or whatever it is. They saw a huge improvement in business in terms of being on time, and then they saved millions of dollars. 

And so I think it's just one of those things where most people don't think, don't realize the possibility of having a technology like Pinot. 

Michael Driscoll (16:43)

Truly real time.

Kishore Gopalakrishna (16:44)

Truly real time, and the kind of impact they can have on their users. And once they see that you can't unsee that, that's when it starts blowing up. And they say, I want this in every application.

Michael Driscoll (16:54)

More and more.

Kishore Gopalakrishna (16:55)

More and more of it. That's kind of how LinkedIn has come to this scale: we just started off with Who Viewed My Profile, and then every team wants it, every product, every vertical. Any number you see on LinkedIn is now being powered by Pinot.

Michael Driscoll (17:16)

There's essentially an exponential decay that occurs with data. So knowing something about 10 min later is much less valuable than knowing about it within 10 seconds, especially for use cases like banking or logistics routing drivers. I think in some ways, Pinot might be an example of a technology where sometimes the tail does wag the dog where the existence of a database that can deliver real time insights can actually reshape how the business operates. Because it changes what's possible. 

Talking about the architecture of Pinot and some of the decisions that went into it: one of the things I've noticed when I've talked to technologists and creators like yourself is that there are often some contrarian decisions that have to occur in the early days. Just by way of example, because they're obviously addressing a different segment of the market, when I talked to Hannes, the creator of DuckDB, one of his contrarian decisions was, we're going to focus on single node architecture. We're going to focus on scale up, not scale out, which at the time was very much not in vogue. And of course, fast forward a decade or more, that decision has been borne out for a certain class of use cases, that you can scale up quite a bit.

What were some of those contrarian decisions that you felt were going against the grain when you were thinking about the architecture of Pinot, both at its creation? And then I'll ask a second question, what are some decisions that if you could go back you might have made differently?

And I'll add there that I saw today Kafka announced that, after many years of service, they're finally getting ZooKeeper out of Kafka as a dependency.

So those are the two questions: contrarian decisions you made that you were happy with, and decisions you made that you would now like to contradict if you could go back in time.

Kishore Gopalakrishna (19:37)

I think the first one was definitely that there was a huge pushback on building this on a columnar store. If you go back and look at it, people get scared of, hey, how are you going to do updates? And how are you going to do things in a column store? And how are you going to make sure that you are not compromising on the freshness? That's when most people resort to micro batching so that they can actually flush. So that was a big, big challenge. And most people thought it couldn't be done.

Michael Driscoll (20:08)

How have you done that? Because I do know something that Pinot is known for is its support for upserts and updates. Tell us a little bit about, even though you decided to go with this columnar architecture, how does Pinot support updates to existing segments?

Kishore Gopalakrishna (20:27)

This took multiple tries for us to actually get this architecture, especially on the upserts. Because, again, this is a contrarian thing that we had to go for. If you look at most databases that support upserts, they use LSM trees, which means they do the merge during read, not during write. But I had to make the decision that no, I'm going to resolve that conflict at write and not at read, because I don't want to compromise on my read performance; I want each partition to be as fast as possible in parallel. Because if you put this additional step in your query, that you have to do the merge before you actually start processing that record, then your latency is gone; you won't be able to serve at that millisecond level. So for us, we actually resolve that at write. We maintain where the actual row for that particular key is, and then we maintain a bitmap within the system to make sure that it gets updated. And the nice part about this is, we now get a change log as well. So we're not only able to support what's the latest record, but what the history was as well. That came as a secondary benefit that we didn't anticipate, but it actually became another interesting feature: you can ask, hey, what was the value of this record at this time, and you can actually even skip the upsert and look at all the changes for this record as well. So we had to make that big decision on how to solve upserts.
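To illustrate the write-time conflict resolution Kishore describes, here is a minimal sketch, not Pinot's actual data structures: a map from primary key to the latest row location, plus a validity bitmap that queries consult, so reads never have to merge versions. The change-log side benefit falls out of the same bookkeeping.

```python
from collections import defaultdict

class UpsertTable:
    """Toy model of resolving upserts at ingest time rather than at query time."""

    def __init__(self):
        self.rows = []                      # append-only row storage (the "segments")
        self.valid = []                     # validity bitmap: True if row is the latest for its key
        self.latest = {}                    # primary key -> row index of the current version
        self.changelog = defaultdict(list)  # primary key -> all row indexes ever written

    def ingest(self, key, value):
        """On write, mark the previous version invalid and record the new one."""
        idx = len(self.rows)
        self.rows.append((key, value))
        self.valid.append(True)
        if key in self.latest:
            self.valid[self.latest[key]] = False   # conflict resolved at write time
        self.latest[key] = idx
        self.changelog[key].append(idx)

    def query_sum(self):
        """Reads never merge: they just skip rows the bitmap marks invalid."""
        return sum(v for i, (_, v) in enumerate(self.rows) if self.valid[i])

    def history(self, key):
        """The side benefit Kishore mentions: a change log per key."""
        return [self.rows[i] for i in self.changelog[key]]

t = UpsertTable()
t.ingest("ride-1", 10.0)
t.ingest("ride-2", 7.5)
t.ingest("ride-1", 12.0)        # fare changed after the ride
print(t.query_sum())            # 19.5, only the latest versions are counted
print(t.history("ride-1"))      # [('ride-1', 10.0), ('ride-1', 12.0)]
```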

Michael Driscoll (22:07)

And I'll just say, having obviously played in this space: as much as, from a technical perspective, we love the idea of a world that is append only and the past is immutable, the messiness of the real world is such that we are always dealing with late arriving data that was maybe even incorrect.

Kishore Gopalakrishna (22:30)

It's one of the most popular features of Pinot, right? And that's scaling like crazy in terms of the number of records that need to be maintained. So I think we are very happy. I mean, again, we did one version that didn't actually work out well, so we had to redesign that again. These are all the fun parts of distributed systems. You value the vision you have, you take on the challenge, and you make a bunch of mistakes along the way. We had to spend one year, then throw out the entire work and restart. But we are glad that we finally got it right.

Michael Driscoll (23:06)

So the other contrarian sorts of decisions you made: the first was choosing a columnar store. What are some of the other architectural decisions?

Kishore Gopalakrishna (23:16)

So the second one was the indexing itself. While on paper people think indexing is good at a high level, most people hesitate. I mean, you talked about DuckDB, right? Their approach is, let's scan very, very fast. And for them, that is a good enough solution.

But then, as you scale, I was always thinking about, hey, what will happen as we scale the data? What will happen as we scale the number of records? Is my latency going to be proportional to the number of records that I scan? And yes, that's going to be the case, even though you can try to do all sorts of vectorization, all sorts of compression, everything.

But then you're always limited by physics, because you are getting faster not because you're doing something unique, but because your SSDs are becoming faster and your network is becoming faster. I mean, if you look at anything that is going over the network, and then looking at S3 and things like that, there is very little innovation.

There is some innovation in the application layer, but the majority of the gains are coming from S3 being able to provide faster latency, and from the networks actually getting improved. You're not really doing anything clever in your application to leverage that even more. So for me, it was always about something that I got inspired by from Kafka, which is that I really wanted to look at the cost per query.

So when I look at cost per query, it's not the pure latency, but the actual work that is done during the query. And if you look at Pinot, Pinot actually does the most minimal work. If you look at all these databases and try to go deep and ask, what is the actual work that is being done?

Pinot does the least amount of work, and that was the goal that I was actually shooting for, not just to say, like, okay, let's throw a bunch of machines, and then we will still be able to get the latency because you can have the parallelism. 

But then, what's your total work done? Because that comes and bites you when you have more concurrency. You might run just one query and get better performance, but what if you run hundreds of queries?

So being able to keep that latency curve maintained as you add more data and more queries per second was a challenge that I took on, while most people said, don't try to solve that problem.

Michael Driscoll (25:50)

So I guess the point there is that indexes help StarTree and Pinot work smarter, not harder. And again, the name of your company, the commercial leader for Pinot, is StarTree, after the eponymous index. Tell us a little bit about some of those indexing structures in Pinot, and maybe more specifically, what are the indexing structures that do not exist in, let's say, other OLAP engines that people might evaluate, like DuckDB, StarRocks, or Druid?

Kishore Gopalakrishna (26:30)

Yeah, most of them use the regular indexes, which is like the inverted index or bloom filter, and some of them, like ClickHouse, for example, have skip indexes, but don't even have the row-level indexes. We go way beyond that. We have a range index, for example. So if you are looking at a performance log and asking, what are the requests that were less than 3 seconds or greater than 3 seconds, you don't want to scan every record. And even if you're just using min and max, that's not enough. We have a very amazing range index: we actually look at the bitmaps and do some crazy stuff there to figure out the exact rows that actually match. So we don't actually inspect every record to see which one crosses 3 seconds. The range index is one. And then, within JSON, we can go to the next level: any nested field can actually be indexed as well. And there's geospatial, where we worked with Uber.

And then, instead of using lat long (latitude/longitude), we actually have the H3 index. So if you search for a 5 mile radius, it should be a 5 mile radius; it shouldn't be some approximate squares around that. Very, very accurate. I think that's another one. The thing that we had was we built the index as a first class citizen. It was not an afterthought. I think that's the major difference when you look at other databases versus us. So when we keep coming up with new indexes, when the vector index came, it was very easy for us to add, because we start with that as first class.

So a lot of things are actually designed around that. And last, but not least, is the StarTree index that you mentioned, which we kind of call an index.

But it's really a smart, very, very smart materialized view. That's really the concept there. But if you look at most materialized view concepts, they're dumb. They basically say, okay, these are the dimensions, and go and create all possible combinations.

There's nothing innovative there. That was the old world of OLAP cubes, right? You would pretty much compute every answer that is possible upfront, put it in a key value store, and query that.

But the StarTree index was really about being smart about understanding the data and then creating aggregates only for the things that we think are going to be expensive at runtime.

So it's not just based on the workload. It's a very cool concept. I generally try to explain this with an example: if you look at the ads world, or something like that, and ask how many impressions happened in the US, you might have an inverted index, but that's going to match 50% of your records, because most of your ad impressions might be coming from the US.

So the query is going to be slow, right, once you start counting the revenue of those impressions. But if you ask for something like Kenya, it's going to be fast, because not many records match Kenya. So now you have this index, but your query latency is highly variable.

If you ask for the US, it's very bad. If you ask for Kenya, it's very fast. But that's a very tough world to be living in, from an experience, from an application point of view.

Michael Driscoll (29:59)

Having consistency, right.

Kishore Gopalakrishna (30:00)

Consistency. So that's kind of where the StarTree index comes into the picture. What the StarTree index will do is automatically compute an aggregate for the US, but it will not compute anything for Kenya.
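A minimal sketch of the selectivity idea behind this, with made-up data and a simple row-count threshold standing in for the real StarTree structure: pre-aggregate only the dimension values that would otherwise force a query to touch many rows, and let sparse values fall through to a small scan, so latency stays consistent either way.

```python
from collections import Counter, defaultdict

# Toy ad-impression data: (country, revenue)
rows = [("US", 1.0)] * 5000 + [("Kenya", 2.0)] * 3 + [("DE", 1.5)] * 40

THRESHOLD = 100   # pre-aggregate only values that would otherwise touch many rows

counts = Counter(country for country, _ in rows)
pre_agg = defaultdict(float)
for country, revenue in rows:
    if counts[country] >= THRESHOLD:
        pre_agg[country] += revenue       # built once, at segment creation time

def revenue_for(country):
    """Heavy values are answered from the aggregate; sparse ones scan a few rows."""
    if country in pre_agg:
        return pre_agg[country]
    return sum(r for c, r in rows if c == country)

print(revenue_for("US"))      # served from the pre-aggregate
print(revenue_for("Kenya"))   # scans only 3 rows, fast anyway
```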

Michael Driscoll (30:14)

Oh. So one area, and maybe this gets to the future thinking: I think there was a paper that the very widely admired Jeff Dean of Google wrote several years back, maybe before the current AI boom, about learned indexes. And I think many of us look at what's been happening in the world of large language models. It's pretty inspiring to look at what DeepSeek has been able to do in terms of — let's put aside whether they're honest about where their training data came from — just the amount of information that can be compacted into a large language model that you can actually run on a single machine. Now, a beefy machine, a beefy M4 or M3 machine. It feels like there's an opportunity for those of us thinking about the database world to just do a better job with our indexes. When we think about the nature of LLMs, it's these vector embeddings that really compress the world down into much more efficient data structures.

What do you think about what kinds of indexes might be possible in the future for an engine like Pinot to work smarter and not harder? You mentioned StarTree indexes. Is there a sense where we could start to think about the empirical usage of the database informing what actually gets indexed, and how we maintain it on disk? The physical layout.

Kishore Gopalakrishna (31:57)

This is a fantastic question. I'm really happy that you asked this, and I think you should actually be building databases, the fact that you're actually thinking along these lines. The reason why I say this is that this was actually a project that one of our interns did last year for us, and we found a really interesting, very exciting performance improvement. Without getting into the details, the way to think about this is: if you were to increment the count of every record that got hit, and then zoom out, you have a heat map of everything. Think of the red ones as your hot bits, and everything else is not actually accessed frequently. Now, if you feed this to an LLM model and ask, hey, these are all the different hits that I'm actually getting, can you tell me what to pre-aggregate and what not to pre-aggregate upfront? That's kind of what we did, and it's actually fascinating to see those results. So we are working along some of those lines. You mentioned the learned index. Learned indexes are fantastic, but they're really more about solving the hash lookup, which is, I need to look up this key, what is the probability? But in OLAP you can't really just do that. You need to give exact results in terms of the answers, and you are scanning multiple rows as well. So you're spot on in the sense that models can actually learn some of these things and say, hey, this is the query that is coming in, the potential of finding answers in this part of the data is quite high. That can actually reduce a lot. I think the challenge with models is always the false positives, or rather the false negatives, where it probably points to a region and there is data in other regions as well. And you can't really compromise on that. So that last part is what is going to be challenging. But it can definitely simplify a lot of things for us. And even if it can just solve the bloom filter, right, that's fine. That's actually still a positive.

Michael Driscoll (34:20)

Yes, I was going to mention bloom filters, of course. Just excluding, just knowing where I know for sure my data doesn't exist, can drastically reduce how many segments I scan. Well, maybe before I shift to my last set of questions: what other things are you excited about on the roadmap for Pinot? For folks who are using Pinot or considering Pinot, what are some of the other directions that your team, colleagues, and the community are looking to take the project in, let's say, the coming year?

Kishore Gopalakrishna (34:59)

One of the things that we focused heavily on was the user facing, external facing applications. But over the last 2 years we have seen a huge pull from all the internal applications as well. Think about product analytics: people were using tools like Mixpanel and Amplitude and other stuff. Now they're doing a lot of funnel analytics on top of it.

And I think all of this started with us adding the support for joins, which was not there in our previous version. So joins, something that was a weakness for us, have become a huge strength, and we are pretty much beating every other system out there in terms of join performance.

Michael Driscoll (35:42)

I'm gonna drill down on that. Tell us about the architecture there. Joins sound similar to upserts, certainly not a small feature to implement in a large scale analytics [platform].

Kishore Gopalakrishna (35:55)

Yeah, it's the same thing as upserts. We didn't want to do exactly the same thing as what Presto or Trino would be doing, because then you are just trying to play the same game with the same strategy. So for us it again goes back to the first principles of, can we reduce the amount of work done in a join?

The two things that kill join performance are the scan and filtering, and the shuffle. For the first, there is a lot of push down and other stuff, but we can leverage our indexes there. The second part is the shuffle: how can we avoid the shuffle? That's where we leverage a lot of our indexes as well, in terms of avoiding the shuffle, and also all the placement strategies. We have a lot of ability to co-locate these different partitions together. So again, going back to the same thing, how can we leverage the indexes that are there in Pinot? Instead of pulling both the left and right sides and then doing the join, we actually just pull one side and then push the other side down into the join, instead of doing the traditional join. We call that a dynamic filtering kind of concept: you pull from one side and push it down, and then you can leverage the indexes quite heavily. So our goal is still to be able to solve even joins in the subsecond range, so that you can use them for interactive applications.
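As a rough illustration of the dynamic filtering concept, here is a toy sketch with assumed table shapes, not Pinot's execution engine: evaluate the filtered side first, then push its join keys down as a filter on the other side, where an index could serve them, instead of shuffling both sides across the network.

```python
# Toy tables
orders = [  # (order_id, customer_id, amount)
    (1, "c1", 30.0), (2, "c2", 15.0), (3, "c1", 22.5), (4, "c3", 99.0),
]
customers = [  # (customer_id, region)
    ("c1", "EU"), ("c2", "US"), ("c3", "EU"),
]

def join_with_dynamic_filter(region):
    # Step 1: evaluate the small, filtered side first.
    matching_customers = {cid for cid, r in customers if r == region}
    # Step 2: push those keys down as a filter on the big side, where an
    # inverted index on customer_id could serve the IN-clause directly,
    # instead of shuffling both sides.
    return [(oid, cid, amt) for oid, cid, amt in orders if cid in matching_customers]

print(join_with_dynamic_filter("EU"))   # [(1, 'c1', 30.0), (3, 'c1', 22.5), (4, 'c3', 99.0)]
```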

Michael Driscoll (37:31)

Of course, which is a huge advantage. Again, from the experience that we've had over time, especially in analytics, the requirement to actually materialize all of the information that you want to make available in an application can sometimes be challenging; a more flexible system helps where labels may change or attributes may change. And of course there's the joke people tell: everything is much harder when it's distributed, right? Joins are easy in single node systems. MapReduce is three lines of Lisp, right? All of the code for MapReduce is about making it distributed, parallel, robust, and, frankly, fast is another angle.

I want to just shift a little bit to talking about your journey as a technologist, as an open source creator. I know we've got a few minutes left here, and one of the things I heard in an interview you did a couple of years ago was about how you think about the roadmap and what to build next. I think your quote, which I wrote down here, was, “you don't need a product manager when you have a community.”

Kishore Gopalakrishna (39:05)

Oh, my god, that's gonna get me in trouble!

Michael Driscoll (39:09)

And you know product managers are very, very valuable. But maybe the answer is not either or, right? 

Maybe you don't need as many product managers when you have a community!

But maybe just talk a little bit about that Pinot community and how it's informed the growth of the project, and really, just in your own words, what you meant when you gave that quote.

Kishore Gopalakrishna (39:33)

Yeah, I think it's more about understanding your users at the end of the day. Because they see things sometimes that you can’t. As founders and creators, we come with our own opinions and mindset and say this cannot be done. This shouldn't be done. This is not how the world should be, and things like that. 

But then when you hear from the users, and then, if you keep an open mind, there is so much value that you can actually get from them.

So for me, I still spend a lot of time on the Pinot Slack. Whenever someone asks a question, I get intrigued by their use case and ask, how did you think about using Pinot for this? Because I never would have thought about it myself. And then they come up with, oh, you see this use case, there is a blog on this, we are actually doing something very similar but for a different domain. I'm like, wow, that's crazy, how they are actually able to think along those lines.

So I think for me, it is just about getting ideas from every part of the world and from every developer, because generally they are thinking in completely different directions than what we would be thinking, and that opens up a lot of possibilities. That's kind of how we got here. It was always listening to the community, and we wouldn't have done upserts if it was not for Uber asking us, hey, the fare of a ride changes after the ride, how do we handle it? Some simple things like that. And then sometimes you take it as a challenge and see, hey, how can we actually solve these things, and one thing leads to another.

Michael Driscoll (41:17)

There's an interview podcast that I heard that referenced [Scott] Cook, the founder of Intuit. He would ask product managers, when they were thinking about a product feature and went and interviewed customers or their community of users, what surprised them? And if a product manager came back and said nothing surprised us, he would say, you didn't actually do the interviews, because in some ways surprise is the essence of information, right? The unexpected.

And we know the only way we make progress with our products is when we get an unexpected bit of information from the users; otherwise there's no point in listening to them, if everything they say can be compressed away as expected.

Kishore Gopalakrishna (42:06)

No, no, I think that's a great point. It's just asking the right questions and also being open. I think I have made that mistake myself, being very dismissive, like, this person doesn't know what he or she is talking about, but then they come up with really cool things. And I think just having that open mind helps a lot.

Michael Driscoll (42:28)

I think the key, the key element for maybe product managers listening to the community is always separating between what people are asking for versus why they're asking for that.

Kishore Gopalakrishna (42:39)

I think that why is such a critical piece, and most people don't get it. It's about the why. That's something that I am very conscious about as well now, whether it is customers or users.

I don't want them to come up with solutions, because that's not their… I really want them to talk more about what's the problem that you're trying to solve, why are you trying to solve it, and then we will figure it out. There are people who have been living and breathing these databases for multiple decades, a long time, so they know how to come up with solutions. And sometimes you do get really good answers from users, but generally, if you don't work on this day in and day out, this is so deep that what feels like a simple thing doesn't really work at scale. I think you hit on it: it's asking the why. That's the most important part.

Michael Driscoll (43:39)

I'm gonna end with one broad question before we let you get back to your busy life as the founder and CEO at StarTree. We're in the broader space of data infrastructure. This is where StarTree is playing, and we've got some behemoths on the landscape. We've got Databricks and Snowflake, most notably. 

I think one of the things we've seen in the last few years is this shift towards object storage as an incredibly disruptive technology, namely S3 from Amazon, but certainly Cloudflare has their version, R2, and for folks that are open source or on prem, MinIO continues to make progress.

We saw that Tabular got acquired for 2 billion dollars after a bidding war between Snowflake and Databricks. And to be maybe more specific, given the rise of data lakes and object storage, we saw WarpStream, an object store backed streaming system, get acquired by Confluent.

With regard to object stores and Iceberg, where is StarTree? How do you look at this trend of embracing object stores? I know we talk about low latency and freshness. How are the Pinot community, the project, and StarTree as a commercial business thinking about these macro trends… and really, maybe, the convergence of the data warehouse, with Snowflake on one side and data platforms backed by Delta Lake or Iceberg on the Databricks side?

Kishore Gopalakrishna (45:24)

No, I think this is a great time to be in, when advancements like these are happening, especially on the object store part.

I think S3 is definitely setting some of the standards there. I can see that there is still a huge gap from S3 to some of these other file systems, but being able to work at a low level API rather than just at an object level matters, because we are still trying to push the boundaries of analytics.

I think Databricks and Snowflake are working towards a slightly different objective in terms of just enabling anyone to be able to query data on object store.

But if you get down into it, I feel very angry sometimes about the amount of compute that gets wasted reading data from the object store. A lot of these limitations are actually because of how object stores are built: you can't append, you can't upsert on those things.

Michael Driscoll (46:24)

You can't. Although recently S3 did add support for appends on S3 objects.

Kishore Gopalakrishna (46:31)

Yes, I think it again comes down to the cost, what the cost is under the hood. But those are all great enhancements. If you now go back and look at the formats, are they capitalizing on that? They're not really leveraging the concept: if you look at Iceberg and Hudi and Delta Lake, all these formats, they're rewriting the files. There's just so much of that. I mean, it makes sense.

But coming from where we are trying to optimize every microsecond and every nanosecond, and seeing on the other side the amount of CPU that gets wasted fetching all this data and then filtering it, I think there is still a long way to go in terms of how object storage can actually become mainstream.

And that's some of the things that we are doing in StarTree. We have a tiered storage version of Pinot, where we are actually able to provide sub-second latency on the data in S3, but not by going through the regular APIs.

We are not just trying to do a lazy load of, okay, let's fetch everything. We actually go to the byte level, because of indexes again. What we do is use the index, so we know exactly the byte range that needs to be queried.

And then we fetch only those byte ranges, because you're limited by the throughput on S3, let's say 1 Gbps. You don't want to pull in trash and use up all your bandwidth to get data that you probably don't need for your query.

So those are kind of the enhancements that we are doing. And that's what I'm really proud of in terms of how we have architected it, because it was actually not that hard for us to go from something that was relying on local file systems to something like S3.

So we were able to actually abstract out the buffer concept and then be able to directly work on S3 without having to pull the data locally, and we are doing a lot of pipelining and other concepts as well. 
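Here is a minimal sketch of the byte-range idea, with a local byte string standing in for an S3 object and hypothetical offsets standing in for what an index lookup would return; in production the read would be a ranged GET against the object store rather than a slice.

```python
# A toy stand-in for a remote columnar segment: in production this would be an
# object in S3 and the read below would be an HTTP GET with a Range header.
remote_segment = bytes(range(256)) * 1024          # ~256 KB "object"

def fetch_byte_range(obj: bytes, start: int, end: int) -> bytes:
    """Read only bytes [start, end] of the object (inclusive), like a ranged GET."""
    return obj[start:end + 1]

# Hypothetical index lookup: the rows matching our filter live in two small
# regions of the segment, so we fetch just those regions.
ranges_from_index = [(4_096, 8_191), (131_072, 139_263)]
chunks = [fetch_byte_range(remote_segment, lo, hi) for lo, hi in ranges_from_index]

print(sum(len(c) for c in chunks), "bytes fetched")       # 12,288
print(len(remote_segment), "bytes in the full segment")   # 262,144
```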

Michael Driscoll (48:43)

So many of these architectural innovations that started with how do we make Pinot smarter in terms of how we access data, in a coupled architecture, are now paying dividends in terms of how you access that data on object stores in an efficient way, versus pulling gigs and gigs across the network.

Kishore Gopalakrishna (49:08)

Yeah. A lot of these decisions, I mean, I think there is a little bit of craftsmanship, but there is also a little bit of luck. We never anticipated some of these things would actually pay dividends like this. Because if you look at the segment format, we separated out the indexes from the forward index, from all these different entities, and that allowed us to leverage enhancements that happen outside of Pinot as well, like whatever is happening in S3 and other systems. So now we can control it; we can actually keep just the index local, at a column level, at a table level, at a time level. We have full flexibility to say, only this data or this index will be local, and everything else will be remote. So we get all this flexibility, and you can pick and choose and have the right trade off in terms of latency and cost, which was a very important thing for us, because most people come and say, Pinot is too fast. I don't need so much speed, it's okay for me to be in the seconds range. But can I just work on tiered storage?

Michael Driscoll (50:17)

Right, can they take advantage of some of the efficiency that Pinot's indexing structures allow, but not necessarily need to have the local storage?

Well, I've really enjoyed this conversation, Kishore. I appreciate you taking the time to chat with me and, hopefully, with some of the engineers and developers around the globe who will tune into this interview.

I look forward to doing some work together. I know we're colleagues in this data ecosystem, so I have really enjoyed getting to know you and the StarTree team over the last year, and thank you for joining us.

Thank you for sharing your wisdom, and I look forward to more in the future from you.

Kishore Gopalakrishna (51:05)

Absolutely, Mike. It was a pleasure being on the show, and I think this was one of the shows where I could go deep, so thank you, for bringing out the engineer in me. I thoroughly enjoyed it. 

Thanks again for having me here.
