On June 4, 2024, ClickHouse hosted a meetup with presentations from OpenAI, Streamkap, Braintrust Data, and Rill Data.
Below is the video and full transcript of Mike's lightning talk where you will hear how:
- Rill powers fast, exploratory dashboards for 1000s of business users across e-commerce, advertising, and AI firms.
- Rill's dashboard service leverages ClickHouse capabilities such as parallel data ingestion, fast aggregation, and partition-aware queries.
- Developers can easily get started with open-source versions of Rill and ClickHouse.
00:00 - Today I'm going to talk about how we use ClickHouse at Rill to power fast, exploratory BI. I’m going to do my best to use some of the products from the companies that preceded me. This morning I asked ChatGPT to make a data center with a ClickHouse palette, so it turns out that some of this vector stuff is working pretty well. I’m going to talk a little bit about Rill, what operational BI is, and how it differs from traditional BI. I’ll mention why we chose ClickHouse and then be brave enough to do a live product demo.
00:55 - So what is Rill? Rill is an operational BI platform. We spun the core IP out of a company called Snapchat. We didn’t come to choosing OLAP databases easily because the company that we sold to Snapchat was a business called Metamarkets which created Apache Druid almost 10 years ago. So we know a thing or two about building databases. Today we operate at a significant scale. We’ve got about 100 billion events daily. We serve thousands of customers with our BI platform and we mostly work with Adtech and E-commerce platforms, although we do work with a few AI and Fintech businesses as well.
01:47 - So what is operational BI and why do we think it’s different? Traditional BI, the kind that most of us I think have gotten used to working with, is unfortunately slow and rigid. It’s kind of amazing that as consumers we’re used to speed everywhere, but as we saw in Ankur's demo, people are just amazed when their data applications are fast. It shouldn't be that way. Data applications should be fast all the time, but we've gotten used to these slow, canned dashboards with report spinners. If you ask most people how they feel about their BI tools today, they probably feel the way they felt about their phones before the iPhone came along. So we think of operational BI as something that’s fast and flexible and allows ad hoc exploratory analysis. Every click should be instant, and it requires a high-performance database to deliver that experience. We saw today the announcement that Tabular just got acquired for a few billion dollars by Databricks. We might be approaching a place where we're seeing a bifurcation into basically two classes of data platforms: fast and slow. I think ClickHouse certainly has a chance of being the fast layer, and maybe SQL on S3 is enough for everyone else. We don't know where that leaves Redshift, BigQuery, Snowflake, and some of the other cloud data warehouses.
03:10 - So why did we choose ClickHouse? I think for a few reasons, and you can probably read what's on the slide, but I would say that it represented the right balance, including scalability. I think the challenge, as anyone who's worked with some of the other in-memory databases out there knows, is that they might be ergonomic and they might be fast, but scale is really difficult to achieve. ClickHouse is probably the most scalable of the in-memory databases that we've worked with and has the best ergonomics in terms of SQL, yet it's simple and it's open-source. There’s a vibrant community, represented by all the folks here, which makes it easy for us to adopt: before we decide to sign a big contract with Mike and ClickHouse Cloud, we can use the open-source version.
04:03 - I think broadly, to overgeneralize, there are two classes of analytics that I do think are on a collision course of sorts. Historically, a very mature category is business intelligence. We see Snowflake, Redshift, BigQuery, and Databricks; these folks have been building backends for your traditional BI tools for decades. But recently, and if you look at where ClickHouse has been widely adopted, there's another class of tools which are really positioning themselves as observability tools. Traditionally these are tools that are observing IT systems, but I think as we put microchips in everything we are starting to evolve to a place where we're not just observing servers. We're observing fleets of cars, we're observing payment transactions. For almost everything we do in the world there's a mirror digital universe of signal that's getting tracked, and we need to observe that collected signal. So I think operational intelligence is actually sitting between these two classes. BI typically looks at historical data, usually data that is days old. Observability systems are often looking at real-time data; I think Braintrust is a great example of that. But there's also a place for near-time data, and in fact most companies that have humans in the loop looking at data are looking at data that might be minutes to hours old, for intraday decision-making by operators. ClickHouse is really the only engine today that actually spans both BI use cases on the one side and observability use cases on the other, so I think it's a very versatile engine. Rill is looking to build a BI tool that also focuses on this middle. I think there's some white space there between what traditional BI systems do and what observability systems do today.
06:03 - So what makes our tool different? First of all, we're faster. Believe it or not, people have gotten so used to slow dashboards they don't know what they're missing until they see it. We also embrace BI-as-code. We really do target a persona of data engineers, folks who are comfortable with code: analytics engineers who can develop locally and then deploy globally. We also embrace a metrics-first philosophy. Metrics really are the core primitive in data. Tables are one step too low, and really people shouldn't be designing dashboards. In our view they should be designing metrics and then exploring those metrics. Rill is in essence a metrics explorer or metrics browser for data workers.
06:49 - So I'll get into a live demo with the caveat that three weeks ago by this timestamp, I wrote on LinkedIn that demos using publicly available data sets are useless and no one uses these demos. And a certain individual from ClickHouse wrote back and said “I use my demos every day”. So in deference to Alexey, who's not here, I am going to be using a demo data set. I am going to use one of the data sets that he mentioned in his comment, so maybe it turns out it was good he responded.
07:41 - It’s very easy if you want to try this demo at home yourself. We have a very short command. Rill is a binary with no dependencies that you can download and install on your command line. This is Rill, the dashboard tool. I’m actually running it right now on localhost. Just to give a feel for how we actually got here, the way that you would actually use Rill is you would run this curl command that would bring down this binary, then you would fire it up, and you can specify the port. This is actually looking at a New York taxi metrics data set. It’s about three million rows to show what's possible with Rill. Rill recently built a connector for ClickHouse that's directly querying this data, and we can actually go in and basically build a dashboard from any one of these tables with a single click. So we continue to use some of the tools of the folks that were presenting earlier. We're using OpenAI to actually look at that table and infer what would be a good set of metrics for the data set in that table. We're going to build a metrics schema and then from that we will get a dashboard. So I'm going to show how we would actually have done that. If you fire up Rill, you would go to “add data”. Whether you're running a ClickHouse cluster yourself locally, running one in ClickHouse Cloud, or running it on your own machine as I am, you can click here, connect to that ClickHouse table, and generate a dashboard. Just to give a brief walkthrough of what Rill allows you to do: under the covers here there is a local ClickHouse cluster that I'm running. I've connected to it and I've actually already inserted about 3 million taxi rides and done a little bit of data modeling here. Rill really focuses on defining these dashboards as YAML, and there's a set of dimensions and measures that are described here. If OpenAI had helped us, we would have generated this dashboard instantly.
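As a rough illustration of the dimensions-and-measures idea described above, here is a minimal Python sketch of how a metrics definition could be turned into a single aggregate query. This is not Rill's implementation or its YAML schema: the `Measure` class, the `build_query` helper, and the `trips`, `payment_type`, and `trip_distance` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """A named metric (hypothetical structure, not Rill's schema)."""
    name: str
    expression: str  # a SQL aggregate expression, e.g. "avg(trip_distance)"


def build_query(table, dimension, measures, limit=10):
    """Build one GROUP BY query computing every measure per dimension value.

    Identifiers are interpolated as-is, so this sketch assumes trusted input.
    """
    select_parts = [dimension] + [f"{m.expression} AS {m.name}" for m in measures]
    return (
        f"SELECT {', '.join(select_parts)} "
        f"FROM {table} "
        f"GROUP BY {dimension} "
        f"ORDER BY {measures[0].name} DESC "
        f"LIMIT {limit}"
    )


# Hypothetical table and column names for a taxi data set.
measures = [
    Measure("total_trips", "count()"),
    Measure("avg_distance", "avg(trip_distance)"),
]
sql = build_query("trips", "payment_type", measures)
print(sql)
```

The point of the sketch is that the measures are defined once and reused for any dimension, so changing the grouping only changes one argument.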
When these are published, the value of this dashboard is that everything is extremely fast. Every click that we do in Rill is actually querying that ClickHouse database that I'm running locally. An example would be doing some operational intelligence on why on Earth there is suddenly a massive spike here in the average distance traveled for taxi riders in the year 2015. I can zoom into that pretty quickly and basically figure out that there is a rider here who had a single customer that had a very long trip. So I can actually see here, I believe, average distance, and there's a trip ID. Rill just shows you how quickly and easily you can go from 3 million records in a tool, all the way down to a single record, and identify what might be a root cause. The other thing we've done in Rill is focus on building out pivot table functionality, so if you want to look at this data in the pivot table view, that's just a few clicks. Rill’s motto is that these explorations should be point-and-click. You shouldn't have to build these graphs. You should be able to just look at 7 days of data or 24 hours of data, and each of these things is running again against that local ClickHouse cluster and returning these queries instantly. For Rill, analytics is not really a creation task. Again, it’s a discovery task, and we’ve done a lot of work to make it possible if you want to go and look at data and compare, say, the volume of cash versus credit transactions for taxi rides. Here you can easily navigate to different styles of visualization in Rill, and again that's sort of instant. So in essence we've taken the power of ClickHouse as a backend and we've tried to create a much more immersive, experiential BI tool, and frankly it's an experience that we feel really isn't possible with some of the more traditional BI tools out there today.
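The drill-down described above, from an aggregate spike to the single record behind it, can be sketched in a few lines of plain Python. This is only an illustration of the reasoning, not how Rill or ClickHouse execute it: the rows, field names, and the `average_by` helper are hypothetical, and the real demo runs against about 3 million trips.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the taxi data set.
trips = [
    {"trip_id": 1, "year": 2014, "distance": 2.0},
    {"trip_id": 2, "year": 2014, "distance": 4.0},
    {"trip_id": 3, "year": 2015, "distance": 3.0},
    {"trip_id": 4, "year": 2015, "distance": 900.0},  # the anomalous trip
    {"trip_id": 5, "year": 2015, "distance": 3.0},
    {"trip_id": 6, "year": 2016, "distance": 3.0},
]


def average_by(rows, dimension, field):
    """Return the average of `field` for each value of `dimension`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Step 1: the aggregate view shows which year has the spike.
averages = average_by(trips, "year", "distance")
spike_year = max(averages, key=averages.get)

# Step 2: filter to that year and find the record that drives the average.
in_spike = [trip for trip in trips if trip["year"] == spike_year]
outlier = max(in_spike, key=lambda trip: trip["distance"])

print(averages)
print(spike_year, outlier["trip_id"])
```

In a dashboard, each of these steps corresponds to a click that issues a new filtered aggregate query, which is why query latency matters so much for this style of exploration.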