Loading
Interviews with Experts Bonus 25 exercises
interview

Enhancing SQLite with Ben Johnson

Ben Johnson has had a 20-year long coding career, working in roles from Oracle to the creation of BoltDB. In this episode, Ben speaks with Kent about his latest projects, LiteFS and Litestream. These initiatives aim to expand SQLite's utility.

From its embedded nature and the transactional write-ahead logs (WAL), Ben talks in-depth about SQLite's architectural elements. Although SQLite faces limitations like restricted concurrent writes, Ben points out its commendable scalability on a single node, particularly when coupled with LiteFS.

The discussion covers more than just databases. Topics include:

  • Content creation’s impact on career progression
  • The downsides of relying on pre-configured setups
  • And the value of user feedback for refining software projects.

Resources

Loading interview

Transcript

00:00:00 Kent: What is up everybody? I am so excited to be joined by my friend Ben Johnson and Ben and I met through his work on light stream and light FS at over at fly and Yeah, it's been a just a pleasure to work with

00:00:20 Ben and test the limits of LightFS a little bit. And yeah, Ben has just been a really awesome help in me getting my website in a good place and feeling confident with LightFS. And so yeah, Ben works full-time on LightFS. I think it'd actually be good for you to kind of introduce yourself, give us an

00:00:40 idea of some of your background and everything too, Ben. So can you give us some of that, please?

00:00:45 Ben: Yeah, yeah, for sure. Thanks for having me on Kent. I was happy to talk about databases and SQLite. And yeah, so I've been writing code for probably 20 years professionally and just kind of did like a whirlwind journey of I used to be like an Oracle DBA back in the day. I've done JavaScript, web apps, I've done data visualization work,

00:01:05 I've done Rails apps. And then I got into like a weird specialty of like writing databases. And yeah, I wrote a database called BoltDB, which in the Go community is a pretty common embedded key value store. So and that kind of trickled out and a lot of people have used that over the years.

00:01:25 Things like console etcd programs like that. So you kind of find it a little bit everywhere. And then I just kind of slowly kind of I realized like I really love like embedded databases, but I also really like schemas too. So like the natural conversion of that is SQLite. And kind of the realization is like I really want

00:01:45 to use SQLite for most everything, but what are my biggest issues trying to make it work? So I kind of got into figuring out how to do disaster recovery and just make it really usable on the server side, where it used to be more of an embedded device database. So that's kind of what I've been working on the last couple of years. So I made a tool called Lightstream, which does

00:02:06 like a streaming backup of SQLite. And then now I'm working on LightFS, which is more like a high availability distributed SQLite, which sounds really fancy, but we're trying to make it as simple as possible.

00:02:17 Kent: Yeah, yeah. Well, it is pretty fancy. It's pretty cool. So with your interest in embedded, but schema-based databases, was that just like, Was that professional or personal interests? Like, did you have a requirement at work that kind of required that?

00:02:38 Ben: You know, like actually getting into the embedded database side was actually like a weird, it was really more like, so I was actually at this job where we did analytics and we'd pull in like logs from these huge like Fortune 500 companies and we stick all the log information into like a SQL server like a central database and then process it all and it took like a week or 2 to process

00:02:59 this reports and it was brutal And I tried to tell my boss, I was like, hey, this is like when Hadoop first came out. This was years ago. I was like, hey, there's this thing called Hadoop where you can kind of like spread out all your, what is it, processing of the data to all your different nodes. And he's like, no, we'll just keep kind of doing it this way. So like I had in my head, I was like, you know what, I

00:03:19 bet I could make this a lot faster. And I just kind of like in my free time, like even after that job, I was like, I want to try to make like a faster version. So like I built on Redis at first and I was like, you know, it's still too slow. And I built kind of like my own data layer. And I started understanding like all the pieces of how you write a database. And that thing was just like, it ended up being wicked fast. So like the things that were taking a week to run could

00:03:39 run like a second. So you could really do like ad hoc queries, like when you really like lay out your data in a very specific way for the use case you have. So it was kind of interesting. And actually that project, it was called SkyDB. It was an open source behavioral analytics database that kind of didn't go anywhere, but Shopify. I actually, here's a funny story. Actually, I

00:03:59 gave a talk here in Denver about like behavioral analytics and how you can do data analysis with it. And this is kind of like funnel analysis and kind of like where you go on past that. And I gave this talk and I was like, hey, you know, for example, what if you were like Shopify for example, And you know, you've got like people come into your site and they click on

00:04:19 this and they go to this and like, you can see every step, like where people diverge and how people do, you know, make different decisions. And then like 2 weeks later, someone from Shopify, they were like, hey, we saw that talk, It was online. So I ended up actually working at Shopify for a couple of years. And we built out the analytics platform over there.

00:04:37 Kent: Oh, that's pretty cool. Yeah, I actually, so here's a quick little shout out to speaking at events in general. 1 of my early talks at a meetup got picked up by folks at Egghead. This is an online training platform.

00:04:58 And they reached out to me to make a course. And that was the first course I ever made was based on that talk. And the rest is history, as they say. So yeah, speaking and creating content, that can change the trajectory of your career, for sure.

00:05:13 Ben: Yeah, it really can. I always think it's funny too, this isn't really SQLite related, but like, I've seen about this today where like, I feel like there's a community. Like I always think about people I interact with on like Twitter and they're like, most people that are developers, you know, aren't like putting out content all the time or whatnot. So like the group of people, like, especially like in the Go community, right? Hang out a lot. Like, you know, it's

00:05:33 a couple dozen people, maybe, like kind of infrequently do content. So honestly, if you're just somebody that just puts out some content, like good content on a regular basis, you are like far ahead of, like, what, over 99% of developers. And you're like in this little small community of people that just make content. So you can really, I agree, it can accelerate your career, John.

00:05:52 Kent: Yeah, yeah, totally. And the big secret is that it's like, some people ask me how I like stay on top of things or how I have such a deep understanding of certain things and the secret is that like, I create content about it and that forces me to.

00:06:08 Ben: Yeah. So. And if you're wrong about something, everyone will let you know.

00:06:12 Kent: Yeah, yeah, you get these professional people who like really know their stuff telling you, hey, you got that wrong. It's like, wow, you never would have given me the light of day if I hadn't gotten it wrong.

00:06:21 Ben: Yeah, there's this guy on Hacker News, and like every time I post like a SQLite thing on there, he runs like, their company, He's at some company, but he runs like this pretty large, like multi gigabyte database, SQLite database, and they do all kinds of stuff on it and all kinds of tricks and whatnot. But he always, he'll always comment. I always learn something from his comments. His name's like Bob1243

00:06:43 or something like that. But yeah, it's like those, those people out there, like there's some guy that just like knows this knowledge and like, it's a great way to really expand, like challenge yourself really, to expand your own knowledge.

00:06:55 Kent: Yeah, you do have to develop a bit of a thick skin. Sometimes people aren't always kind about how they share their knowledge. And sometimes they're wrong too. Well cool, so that's how you got into embedded stuff. And then SQLite shows up

00:07:15 for you when? When you're at Shopify?

00:07:20 Ben: You know it was after Shopify, I'm trying to think of what actually spurred it on. I think I just started, I kind of started playing around with like, some people have added like schema stuff on top of Bolt. And that works to some degree, but like, like just having the SQL piece, like, you know, 1 of the thing I actually realized that was really helpful is having your your app

00:07:40 like Bolt, since it's an embedded database, you kind of have your application schema kind of embedded into your, but it's not so much JavaScript. You guys are very loosely typed, but like, you know, like typed, structs and whatnot within your program, they're really tightly coupled to your data, which is fine. Except when you want to like migrate data from like an old version

00:08:00 to a new version. It's nice to have this like intermediary data language like SQL to be able to do those kind of translations without having like a bunch of different versions of your structures. So yeah, yeah. As far as when I got into it, I actually don't even know. Like I've always kind of known about SQLite.

00:08:16 Kent: And

00:08:17 Ben: I always thought of it as that database you use when you need to run some tests or something locally. You know,

00:08:22 Kent: like, you actually ship a

00:08:23 Ben: real database later.

00:08:25 Kent: Yeah, yeah. Or like they throw it on a Android device or something to just store, you know, bits and pieces of information. Yeah, really just a toy.

00:08:34 Ben: Yeah, and then once you actually like start digging in like the amount of testing that goes into SQLite is just ridiculous. Just because it's deployed on its most widely deployed database in the world. So like if they make changes, you know it has to work. So it's yeah, their test suite I think runs for days and it's, they have like multiple, what is it? They have,

00:08:54 they don't, it's not certified for like aviation, whatever compliance, but the, the tests that they do are based on like aviation compliance stuff. Like the way that they do like branch coverage and whatnot is really strict around, you know, not just that you test individual branches, but that you actually test like the different

00:09:15 permutations of like Boolean logic within the kind of if statement, you know, the switch. That's too nerdy. That's

00:09:23 Kent: very interesting.

00:09:25 Ben: But yeah, it's really pretty, it's pretty interesting what they do. And you can find some talks too. Like the person that does the project is Dr. Richard Hipp. He's the 1 that started it back in, I think 2000. I'm gonna totally botch that, somewhere around 2000. And I think he originally did it for some kind of military system.

00:09:45 But anyway, he's done some talks on how they test this stuff and he's great, honestly.

00:09:51 Kent: Yeah, yeah. Over time as, so I'll say for me, where I started being interested in SQLite was in a conversation at RemixConf with Kurt, the CEO of Fly. He was there and we chatted about stuff and he said,

00:10:11 have you seen this Lightstream stuff that has been going on? And I actually had seen that. I think Michael Jackson had talked about SQLite and that SQLite was just really cool and there's this Lightstream thing. And so I hadn't really looked too closely into it, but the more, at that point after that, I was like, well, maybe

00:10:32 I should take a look at this. Because at the time I was having a lot of trouble with Postgres and Redis, each of these services on my site, I had like my site and then Redis and Postgres. And then I had those deployed to 6 regions all over the world. So I've got 18 services that I'm worried about.

00:10:53 And this was a couple of years ago now, I guess. So it was not a hundred percent reliable. And so if something were to go down, like the entire thing just kind of came crumbling down and it just was not pleasant with all those services. And so I was kind of in other database curious, I suppose.

00:11:13 And so the more I looked into SQLite and played around with the different options there. And the more I realized, like, this thing is really powerful. It's the biggest drawback was the lack of data replication so that I can deploy it to multiple regions. That was the reason I chose Fly,

00:11:33 was because I wanted to deploy my app to multiple regions, but I didn't want to be limited to the serverless runtime environment and that infrastructure out. Well, Fly is like serverless with long running servers. So that word kind of is weird, but I didn't wanna have like a non long running server. I didn't want

00:11:53 to just have a Lambda function. And so Fly fit the bill with its Postgres clusters. But as I said, the services were not always great for what I was doing. And so once I, once Kurt said, Hey, we're actually, now we're putting together this light of fasting. And so you can do data replication with

00:12:13 SQLite. That's when I got really, really interested because as it's an embedded database, now it's not a service. It's just like running alongside, it's not even running really, it's just like a file that's like sitting in your volume alongside your application that's running. So That was my experience getting interested in SQLite. And

00:12:33 so I guess Ben, what I'm saying is, I can just attribute all of my love for SQLite to you. And the work that you've been doing. So I think it would be good for people to get a little background on what is Lightstream and how did that convert into LightFS?

00:12:52 Ben: Yeah, so Lightstream, so it stemmed out of this whole idea of like how do you make SQLite work on the server? And like Lightstream was not my first idea. Like I went through a lot of iterations and probably spent, I mean, probably a couple of years honestly trying to like try different things. I tried for a little while, I was honestly just like rewriting

00:13:12 SQLite. I was putting it into Go just so I could better understand how SQLite is structured and how it works internally and trying to see if there's like ways I could hook in to how it writes and just different ways to kind of safely pull out data and be able to replicate it somewhere else. And I don't know what it was that kind of like,

00:13:32 kind of hit me that made me realize like, actually, so the whole premise of Lightstream works on this, what would I call it? It's kind of a quirk of SQLite. We love those, right? Yeah, yeah. And it works well. It's not meant to be like, it's a hack, but it's like an actual legit hack.

00:13:54 So like I guess I'll step back a little bit and explain how SQLite works. Really just like a really brief primer. So SQLite is an embedded database. It actually it doesn't run as a separate process. It actually embeds into the same process as your application. So there's no like IPC mechanism where you're like communicating over socket or anything. So it runs inside your process and then

00:14:14 it writes out to a database file. So it's a single file that stores all your data. And that's laid out in these contiguous blocks that are called pages. And then those pages are kind of logically structured inside of the database as this tree in your schema and all that stuff. You don't need to worry about how all that works necessarily. Just know that like you have these fixed size blocks, they're usually like

00:14:34 4 kilobytes. So what SQLite will do is whenever you make updates, it has a whole system of making those updates and they update an entire block at a time or multiple blocks for a transaction. It has a way to do that safely so that when you if you need to roll back, then it can recover all your data and there's no problem.

00:14:55 So there's a mode inside SQLite called the write-ahead log or the wall. And that's where every time it makes a change to your SQLite database, it'll write the pages to a separate file called the write-ahead log. And then once that write-ahead log gets too big, then it'll kind of transfer over all the pages back to SQLite, or the main data

00:15:15 file.

00:15:18 Kent: That doesn't really necessarily to a transaction, right? That's a separate concept.

00:15:23 Ben: So the transaction is kind of like a series of pages written to the wall. And then they're not actually committed until like the last page is written. There's checksums all along the way to make sure that you don't have like a partial right by the recovery mechanisms in there. But they use the wall to do transactions. But the nice thing about the

00:15:43 wall is that since it keeps depending on new versions to the end of the file. Any transactions that have started before that your transaction started can keep running even if more writes happen because they have the old versions of those pages. Oh, interesting. Yeah, so that's why you don't have, you can have multiple readers at the same time and they all have their own snapshot of the database at the given time

00:16:03 they started, even while the database is changing underneath, which is kind of cool.

00:16:08 Kent: That is interesting.

00:16:10 Ben: So what we do with Lightstream, so the trick is SQLite can't transfer pages back from that write-ahead log until basically all the readers, all the transactions are kind of stopped for a second. It has a thing called checkpointing and it'll kind of make sure everything's stopped. No one's actually reading or accessing the write ahead log and you can push the pages back over.

00:16:30 So Lightstream actually runs as a separate process, and it will basically hook on with a long-running transaction and block that checkpointing process from happening, just to ensure that it's able to pick up all those changes on the write-ahead log, and it can ship them out in chunks out to somewhere like S3 as a backup,

00:16:50 and then kind of control that process of writing the changes back into SQLite.

00:16:55 Kent: Okay, that's interesting. So, Let me, I'll restate it in my own terms

00:17:03 Ben: of what

00:17:03 Kent: you have so far. I know you're not done, but so like you're taking, the write-ahead log is just like, these are the things that are going to happen to the database. You know, once these things are committed, then, is that the right word? You'd say they're committed, Yeah. So once they're committed, then that's what the database is. And so you just keep track of those,

00:17:24 the write ahead log, the wall, and that's what you're streaming over to, you know, wherever you're doing your backups. Is that kind of the right? Okay, so you can almost think of the wall as like commits in a Git repo. So if you have that history, you can generate the final version just by processing each 1 of those.

00:17:44 Ben: Yep, yep. So when we do a restore off of that, we can just grab a snapshot, which is basically we'll pull every day or every, you know, some interval, like usually 24 hours,

00:17:53 Kent: you

00:17:53 Ben: can get a full copy of the database, push it up to S3, and then the wall tracking will just basically push up the incremental updates every second. The other nice thing, actually, this is another kind of quirk of not a sequel light, but S3 is S3 is super cheap to upload to, but really expensive to download from. That's where they make their money. So it's a great

00:18:13 like disaster recovery place because usually don't need to download and restore. You just always want to keep uploading. So you can run Lightstream even on a frequently updated database for less than a dollar a month.

00:18:26 Kent: Wow, that's cool.

00:18:27 Ben: That's pretty dirt cheap.

00:18:28 Kent: Yeah, yeah. So With that architecture, though, how do you handle situations where the wall file is created but then the transaction fails? And so, like, some of those walls are not actually going to be applied to the database, right?

00:18:44 Ben: Yeah. So, with Lightstream, Lightstream tracks the last place in the right head log it read from, and then it'll keep reading every second or so. It'll check to see if there's updates, and then it'll try to read forward all the transactions that have occurred since that point. And the way that SQLite works is that it has a rolling checksum

00:19:04 where every time there's another page written, it's basically there's a header on it where you'll have your data and then the header will have the checksum of that data and all preceding data. So that you have to have the checksums of every page be correct in order for it to be a valid transaction. So that's how it actually

00:19:25 keeps track and makes sure that it works. Yeah,

00:19:28 Kent: yeah. Cool. Okay, so I kind of interrupted you partway through, so you can finish now.

00:19:33 Ben: Oh, yeah. So, I mean, that's the long and short of it, is you kind of, you were streaming up all these little tiny updates to S3, and then when you need to recover, you can just simply say restore. Like if your server just completely gets hosed, or even if you like deleted your table or your database, you can go back and say, hey, I want to restore to this point in time. I don't know

00:19:53 a minute ago and it can go and rebuild that file from that state. So it's really nice in that sense. Yeah, that's really kind of all Lightstream is meant to do. And 1 of the biggest, and this might be your next question, but like 1 of the biggest things that like people always wanted, actually 2 different things people always wanted. 1 issue with Lightstream is that it runs on a single

00:20:13 process, like a SQLite database only runs on 1 server. So people can't, if you do a deployment, you have to kind of take down that server and then you're gonna have like a couple seconds of downtime before you can redeploy it up. And some people just need to minimize the downtime. If they do a lot of deploys, that can really add up. So that's 1 thing they

00:20:33 wanted to be able to have, like a primary server that can switch between different servers. The other thing is, like you mentioned earlier, is you really want to, there's a nice part about SQLite is that you can, It's really fast, but if you live in Chicago and your servers in India, then your request is going to be super slow because you have probably 200 milliseconds

00:20:53 of delay just to go send your request out there and get back. So replication was a big issue that people wanted to see fixed too. And I originally tried to put all this stuff into Lightstream, but it wasn't really architected exactly for that kind of stuff. So we built out this project called LightFS, which just kind of does those 2 pieces. You can change

00:21:13 your primary. You can switch between different nodes. And you can replicate your data out globally.

00:21:20 Kent: And that is where I came in to wanting to use.

00:21:24 Ben: You were very, very alpha stage of LightFS, and I appreciate your patience for sure. You brought up a lot of bugs, we fixed a lot of stuff, and it was great, so I appreciate that.

00:21:32 Kent: Yeah, yeah, Well, I appreciate being able to do that. I'll be honest, a couple of years ago, I was full on front end dev. I did not want to have to do anything with backend stuff. And even when I was at PayPal, I did plenty of backend stuff. It was all Node and everything there, but I still didn't get anywhere close to the database and

00:21:53 I didn't wanna touch that. And then, Remix kinda forced me into the backend and I wanted my own database. And I created all of these requirements for myself that like this is the way that I wanna teach people how to build sites. I feel like this is the right way to do it. And unfortunately, they're

00:22:13 just for me, there wasn't a whole lot available there. Like, they're just, I wanted to limit the number of third party services I was using. And so like, I'm not gonna be using, even though there are cool things like FanaDB that allows your data to be replicated all over, like that's cool, but I don't wanna use a service for that. I wanna use

00:22:33 my own database. And so, but I also want to deploy all over the world. And if you deploy all over the world, but your data is not all over the world, then like, why are you deploying all over the world? Like it doesn't necessarily make things faster anyway. And so, and then I can't use third party auth either because they're not deployed all over the world. So you have to resolve user authentication

00:22:53 sessions, you know, and so like, they're just, this actually, honestly, the 1 constraint of, I want my app to be globally fast everywhere, that 1 constraint just made it so that I had to use stuff like SQLite, LightFS, or Postgres clusters, but then, yeah, Postgres clusters just

00:23:14 didn't do it for me. So anyway, I'm thrilled that LightFS came to be. And so no, I don't mind the bumps that we had early on with LightFS and it's just been really awesome now, like from that point. I definitely do

00:23:34 some weird things. I'm definitely the kind of person who gets into edge cases very easily. So I'm glad that my weirdness has been helpful to LightFS.

00:23:46 Ben: No, yeah, I mean, you took it to another level. I mean, you started deploying to production on the very alpha build, so it was...

00:23:52 Kent: Yeah, yeah, so...

00:23:54 Ben: It really helps the ante.

00:23:56 Kent: Yeah, for a long time, there was this notice at the top of the docs that said, you know, not recommended for production use. And I made this meme of like, I don't know what movie this is from, but there's like this clown in the sewer and there's this girl looking down at him and he's like, come on down here. And then she like climbs into the sewer

00:24:17 in the next frame of the meme. And I had it saying, like the clown would say, not recommended for production use. And then I have me be the girl going down. Like it's so obvious you shouldn't be doing this because it says right there, don't use it for production. That won't stop me.

00:24:34 Ben: No.

00:24:35 Kent: But that now has been taken off the docks. I use it for production. I've got, I should look at how big my database is, but as far as like volume size, but it, yeah, I shouldn't say a number, but I can say for sure that I have over half a million rows in my database.

00:24:56 And I'm querying to that database like constantly. So it is very, very active database, and it's just, it's humming along great. I love it So

00:25:07 Ben: yeah, yeah, I'm glad to hear that

00:25:09 Kent: yeah,

00:25:09 Ben: it's nice to I mean when you do queries you're doing locally So it's not like yes You know like if you look at like like I love the whole idea of serverless databases that are coming out, but really you're pushing off your compute and your queries off to them and you have to pay for that every time.

00:25:25 Kent: But the

00:25:25 Ben: nice thing is when you have all your data locally on disk, the queries are super fast for 1 thing and there's no cost to it either.

00:25:32 Kent: So yeah. And no latency. Because like the, they, they not only do they have to pay for their compute, but they also pay for their inbound and outbound traffic too. And they have to forward those costs onto you so they can make money. And yeah, so like so many things are simpler and cheaper by just using SQLite.

00:25:55 And so LightFS is super cool, and that's what I recommend people use. But SQLite is being used in a lot of really interesting places for web stuff too. Do you wanna talk a little bit about that and maybe what you think, like why you think SQLite is getting so much attention recently?

00:26:11 Ben: Sure, yeah. I mean, interestingly, I think that SQLite, it's a great database, but it's been a great database for a long time. So I think, you know, the bigger question of like, why now? Like my gut feeling is that it's not that SQLite has gotten better. I mean, it has gotten better, but like, I don't think that's the reason. I think it's really more like computers have gotten better.

00:26:31 Like, you know, I don't know how long people have been in the industry, but like if you even go back 10 years, 15 years, like, you know, running a server was really expensive. And like you got, you know, a little bit of RAM and it was like a tiny little hard disk and like everything was super slow. And just the rate of change or just improvements of

00:26:51 you know, we have like NVMe disk drives now that are like crazy fast and you can get a couple gigabytes of RAM. You can get, you know, multi-core CPUs for pretty cheap. So I think the combination of all those things just made it so that more and more people can run even just a moderately, small to moderate sized website,

00:27:12 even on a single node, honestly, if you want to start there and not have any problems. Like you used to be, like, you know, back in the day, this is where I sound old. Back in the day, you know, you'd have like a dedicated database server and you put so much money into that thing to make it as fast as possible because that was really where you needed to optimize and you have a bunch of little app servers around it. But

00:27:32 now those app servers these days are as fast as that, you know, big clunky expensive database server back in the day, or faster really. So you just, I think it's just a, yeah, it's just expanded what apps can do on a single node, really.

00:27:47 Kent: Yeah, yeah. And actually that's very true. The Epic stack is built with LightFS support from the get-go. And that's not just for multi-region. There's like cloud backup and other reasons you wanna use LightFS, even if you have 1 region. But I do recommend that people start with just 1 region and 1 instance,

00:28:07 because I would say the vast majority of people don't actually need multi-region. For many of the apps that we're building, it's like US only or Europe only or whatever. And so having just a single instance, like these apps are fast and that network latency is really only a

00:28:27 differentiator for a subset of apps that people are building. So yeah, SQLite, I like your perspective as the aged, I'm just kidding. But yeah, I think that perspective is really helpful. And what's cool is,

00:28:47 yeah,

00:28:48 Ben: go ahead. Well, 1 thing I will say on that is like, it's funny when I was writing Lightstream, you know, people would always complain that like, oh, I'm not going to use SQLite because it doesn't scale and like I can't, you know, they need, you know, whatever, web scale, you know, MongoDB or, you know, Postgres or whatever, people complain it wouldn't scale. And it does, honestly, on a single node, but you can't scale, it

00:29:08 couldn't scale before to multiple nodes. And then once I wrote LightFS, everyone's like, who needs multi-node SQLite? Like, I don't need to scale. So it's always like someone always has a problem with 1 or the other. But I mean, I think having that gamut of like, sure, you can start small and then you can grow large. You can scale horizontally

00:29:28 pretty well for reads. And I think that's really where people need it typically. And I think having the story in place and like being able to do that, even if you don't need it now, I think is a big deal when you're choosing a database initially.

00:29:40 Kent: Yeah, 100%. And relatively recently, SQLite isn't done, Like they're still actively developing it. That's why they have that huge test suite, of course. And so relatively recently, there was something that caught my attention, which is that SQLite

00:30:00 now supports an exabyte-sized database. It's just like, say what? It's, yeah, this thing can scale in volume as well.

00:30:14 Ben: No, yeah, you're gonna hit your disk drive size limits before you hit SQLite limits for sure. Excuse me.

00:30:21 Kent: That's pretty remarkable, honestly. Okay, so I would like to talk a little bit about some of the trade-offs though with SQLite. So like we've talked about how amazing SQLite is, and it totally is, and it satisfies most use cases for most web apps. And in fact, I have gone so far as to say that most web apps

00:30:41 should use SQLite, because, like, for the vast majority of us, it is enough, and it is simpler than other solutions. And simple is better so long as it satisfies your requirements. However, there's no panacea. Like there's no silver bullet that can do all the things and is the best at that. So

00:31:02 I'll lead in with 1 that I think is kind of ironic a little bit So you got into embedded systems with that job of analytics? That the or embedded databases with the analytics stuff sequel light. I think would probably not be a great database for capturing the analytics rights. Is that,

00:31:23 would you say that or am I off?

00:31:25 Ben: You know, I think that, I think it depends on the scale of analytics you're looking at. Honestly, you can get really far just throwing it into SQLite and calling it a day. If you start scaling up, there's definitely more optimized things you can look at. Like DuckDB is kind of like the SQLite version for analytics. It's like an embedded database, but instead of...

00:31:47 So SQLite is row-oriented, So every time it stores a row of your data, it puts it in like 1 contiguous little block of data. Whereas DuckDB is column oriented, which means they split up all your columns and store them separately for every row, all the different values. And you can do a lot of really interesting compression stuff and optimize analytics queries a lot with that.

00:32:07 So if you do start scaling up, if you really do start hitting limits of SQLite, honestly, DuckDB is great. And there's, I think, MotherDuck is a startup that's started up around them, doing analytics with some super smart people. So, yeah, that's definitely

00:32:21 Kent: a cool

00:32:22 Ben: 1 to grab.

00:32:22 Kent: The reason that I said that was because it's my understanding where SQLite, or at least my thought of SQLite's biggest trade-off is in rights and the fact that you can't have concurrent rights. Can you talk to that a little bit?

00:32:39 Ben: Sure. Yeah. And yeah, so concurrent rights is you can't have. So sure. Let me back up. So you can have a single writer at a time with SQLite. So if you do a transaction and you write some data, any other transactions, if you set something called a busy timeout on there, they'll just queue up after that 1 and

00:32:59 they'll get processed immediately after.

00:33:02 Kent: So they'll show up eventually. It's not like it's just going to fail. Yeah, or it'll time

00:33:06 Ben: out eventually if you have too many writes. But the trade-off is really like if you keep your writes short, you can do it really fast. I mean, sub-milliseconds. So you can have thousands of writes per second for small to medium-sized writes. I think if you get into big analytics, that

00:33:26 could be a problem. If you're taking in tens of thousands or hundreds of thousands of writes a second, I mean, you're really going to have to scale up to something else at some point, but that's not usually most people's problem. Yeah. And that's also per database too. So honestly, like a nice property of SQLite is, you know, your users, maybe you have a database per user, honestly. Like if you're tracking websites, each website could

00:33:46 have their own SQLite database, it's just a file on disk. So you don't have a lot of overhead with each 1.

00:33:52 Kent: Yeah, and that's not an extra service per user. That actually, so you wrote a blog post that I saw today that mentions that. So like if you have a multi-tenant app, then you could just have a separate database for each 1. And normally that's like this big consideration. You're like, oh, I don't know if I want to do that. Let's just do a workspace ID on all

00:34:12 the row tables and stuff. But in SQLite, it's just a file. So just make another 1. It's pretty cool.

00:34:18 Ben: Yeah, it's kind of cool. I mean, you can, and it really protects you from accidentally forgetting to like limit your queries on a certain workspace. So it's pretty nice in that sense. And yeah, And honestly, like, so you can also, you can optimize your write throughput as well with that. And there's some other settings in there within SQLite you can adjust to make it so that you can get faster writes.

00:34:39 So it's all, you know, things you can tweak.

00:34:41 Kent: Yeah, yeah. So you can get quite a few writes, but with LightFS though, isn't there like quite a bit stronger limitation on the number of writes per second you can handle? And can you explain why?

00:34:55 Ben: Yep. So LightFS right now, so it runs through a layer called Fuse. So we, LightFS exists in the file system layer. So we actually mount it as a file system. You write databases onto there, and it'll track. It basically passes through to the other underlying database files, but it'll track when the application is taking locks and doing

00:35:15 some other things to actually determine where transactions start and stop. And when they start and stop, we can grab the information, kind of the data that's changed within that, package it up and can ship it off. And we have a lot of control because they're right there in the file system layer. The downside of that is it's this layer is called FUSE, file system

00:35:36 as in user space. I think that's what it stands for. It's a Linux system where you can build your own file systems basically. And that's notoriously not fast, which is interesting. It seems like a terrible thing to write a database layer on. And for the right side it does slow things down. You tend to get limited

00:35:56 around like 100 writes per second, which is still above and beyond what most people probably need.

00:36:01 Kent: Yeah, but by a long shot, Like if you're listening right now and you're like, oh, only a hundred writes a second. No, no, no. Your app probably does not have that load.

00:36:10 Ben: No, yeah,

00:36:11 Kent: for sure. It's possible, like you're an event ticket, like you're selling tickets and the tickets go on sale and it's Taylor Swift, yeah, okay, go ahead and use Postgres instead or something. But yeah, for most of us, that's not even close to how many writes a second we're getting.

00:36:27 Ben: Yeah, and when you start adding it up and multiplying by number of seconds in a day, like 100 times whatever is 86,400 seconds in a day, like doing like 8 billion or whatever million, I don't know, some crazy number of rights a day. It's like, you're not doing that. So yeah, most people honestly get. A small handful and that's, that's

00:36:47 most web apps. I think everyone always looks at Facebook and Google and they're like, oh, what do they do? They have a gajillion rights a second or requests a second. It's like, that's not the vast majority of people out there, so don't build your systems like that.

00:36:59 Kent: Yeah, yeah. That actually, I think that's such a key point that like people look at those apps and they say, I wanna build my app that way. And it just, that does not make sense. That's kind of like saying, I need to go to my neighbor's house. And so I'm gonna get charter a plane.

00:37:18 Ben: Like, no. Yeah, like Bezos does that, yeah.

00:37:21 Kent: It's very efficient at getting you places. It's very good at that, but you don't have to go that far.

00:37:28 Ben: Yeah, yeah, for sure. There's a lot of context for sure when you're choosing things. But yeah, so like on the right side, LightFS is slow right now. We are building out a virtual file system layer, so we will avoid that fuse layer entirely if you want to. The fuse layer is just nice because you can treat it just like SQLite, and it just works like the regular SQLite

00:37:49 on disk. But so we have plans to make that faster on writes. On the read side, you really shouldn't notice almost any difference from regular SQLite. What happens is the Fused Layer does serve out pages from the database, But then the operating system has a thing called the OS page cache. And as it reads pages in, it keeps them in memory and like a

00:38:09 little inside the kernel, like the file system kernel or the OS kernel. And it'll automatically just reserve those pages super fast. So you generally don't notice much of a difference on the read side.

00:38:22 Kent: Yeah, yeah. And the reads are like going to be pretty dang fast anyway. In fact, there's this post on the SQLite site that's titled 35% Faster Than the File System, which is very interesting. So in the Epic stack and in the workshop, we store all the user upload

00:38:43 or image uploads in SQLite, which is like, at first glance, you're like, whoa, that sounds wrong. But in some cases, it actually can be faster, just because of the way that paging works. Is that right?

00:38:58 Ben: Yeah, yeah. And you honestly, a lot of times you won't even notice.

00:39:01 Kent: Yeah, if it is lower, it won't be noticeably slower.

00:39:04 Ben: Yeah, for sure. I think once you get to like certain size, like you probably shouldn't store videos in there. Yeah. That kind of thing, but like most images are probably fine.

00:39:12 Kent: Yeah, and in fact, you can correct me if I'm wrong, but I don't know if you can stream data into SQLite, or this is probably a little outside of my depth. But would that be possible to set a video? Like, let's say it's a gigabyte in size.

00:39:31 Ben: Yeah, there is an optimized way to write in blobs. It's called the blob AVI. And not every client library supports it, but

00:39:39 Kent: yeah,

00:39:40 Ben: some do. It's definitely a much more optimized way to like read in and out blobs without copying so much data into memory.

00:39:46 Kent: Yeah, yeah. So in the workshop, we take the file uploads and load them up into memory and then we save those blobs all together at once, which works great for an enormous number of use cases. But that actually brings us to another topic that I wanted to touch on. I know we're getting toward the end of our time, but

00:40:06 LightFS being used potentially for more than just SQLite, but files in general. Can you speak to that at all?

00:40:16 Ben: Oh, sure. That's been like 1 really big request to have, like people always ask, hey, can we also stick other stuff in this thing and distributed out to edge nodes as well? So there's an issue out for it. I don't think it's a super hard problem to solve. We really kind of have the very standardized packaging format for all the changes

00:40:36 and when you're uploading a like a file it's really just packaging into this format and shipping it off to your other nodes. So I think it's definitely doable but that would give you the ability just to instead of using an S3 or something like that, you know, for at least initially, I mean, if you don't have a ton of blob data that you really need to worry about or that you're trying to optimize

00:40:56 costs for, I think it's a Good alternative where you could just put it into LightFS. It'll distribute out to your edge nodes and you can serve them from there.

00:41:06 Kent: And distribute it out to your edge nodes. It's nothing to sneeze at either. Like you put something on S3 and you're choosing a region, but with LightFS, it's gonna just be deployed where you're at, so, or like whatever regions you're in. So that is cool for some use cases still. Like, again, this is not for everything, but

00:41:26 1 of the really, the things I really wanna focus on with the Epic stack is just let's see how simple we can get this to be, how few services we can use and have it be reasonable still. So that you can get started with something. A lot of people will say, let's just use as many services

00:41:46 as possible so we don't have to build all this stuff ourselves so that we can move fast. And I can appreciate that, but when the thing does end up working out, you're gonna have to migrate off of all that stuff to save money or for whatever, they don't satisfy your requirements, all that stuff. And so I would rather be able to say,

00:42:06 oh no, you don't have to worry about building stuff, but you also don't have to worry about setting up these accounts because you just generate this project and it has all of these features that don't require paying another separate service for all that stuff. So I am definitely looking forward to having that support. I think that it would satisfy

00:42:26 an enormous number of requirements for people with existing apps and then for pretty much everybody just getting started as well. So yeah, LightFS is pretty cool.

00:42:38 Ben: Thanks, yeah, thank you. Yeah, no, I agree. I think that every new service you add is just another liability. So I think if you can minimize it, I mean, like you slowly become an expert in all those services, which is painful. And every time you, if you hire somebody and you bring them on, they have to slowly become an expert in all those things. Cause like, they just, they all break in like weird little ways. And

00:42:58 the older I get, the more I just want something simple that just kind of works.

00:43:02 Kent: Yeah, yeah, 100%. There are definitely trade-offs upside down and sideways.

00:43:07 Ben: Like,

00:43:07 Kent: you know, now you're managing this, but if it's built for you and like you just get started and there it is, then I don't know, it seems kind of nice to have it that way. So this has been awesome to chat with you, Ben. Thank you so much for giving me some of your time. Is there anything that we didn't touch on

00:43:27 that you really wanted to make sure we talk about?

00:43:30 Ben: No, I think that's a great overview of kind of where everything's at. No, I think you did a great job. Yeah, thanks for having me on too. I always love chatting SQLite. So, I mean, if anybody ever has any questions, happy to reply.

00:43:41 Kent: Yeah, yeah. What's the best way for people to reach out to you?

00:43:44 Ben: Oh, you can find me on Twitter. I'm Ben B. Johnson. That's a good way. I would say Discord, but you just messaged me earlier and I missed it, so yeah, probably Twitter's the best. Or GitHub. Awesome,

00:43:55 Kent: cool, cool. Well, good. And if anybody wants to contribute or help in some way, Like what's the best way that somebody could support the work that you're doing?

00:44:05 Ben: You know, honestly, the best thing is just tell us, like, how are you using it? And like, what things do you find confusing? What things could make simpler? Like, that's really the goal. You know, I live my world for probably the last decade, like in low level databases. So, you know, different perspectives. I always love hearing, and I always just want to make things as simple as possible. I think that's kind of the best

00:44:25 software out there. Is a software you don't really think about. Yeah, you just like, You type a command, it's in there, and you forget about it almost, because it's just that easy. So, yeah, I would love to hear what people think and how we can make it better.

00:44:40 Kent: Awesome. Well, thank you so much again, Ben, and thanks everybody for listening. We'll talk to you later.