Loading
Current section: Query Optimization 4 exercises
solution

Query Performance with Indexes in SQLite

Transcript

00:00 To be able to optimize this query, we need to explore it a little bit and have SQLite explain this to us. I'm going to run SQLite 3 on my Prisma data DB file, and I'm going to copy the explain query plan from the instructions. We have explained query plan. Here's our query.

00:19 We filled it in with some pre-filled values, and here's the query plan that SQLite is planning on for this particular query. Let's start out with actually simplifying this a little bit, because there's a lot going on here. We've got a scan.

00:36 Right here, we have this use temp B-tree, all of this stuff. I think it'd be easier if we scope it down to the specific parts and then we can slowly expand it. We're going to start actually with when we originally wrote these queries, we had a much simpler query before we added the order by and the left join and all that stuff.

00:56 I'm going to say, don't ask again, and we're going to paste this right here. We're going to select the user ID, the username, and the name from the user, limit 50. Here, we're getting a scan of the user. The scan is not awesome, but the reason that the scan is happening is because we don't

01:14 have a where clause, so there's nothing to even use an index for. That's why it's scanning, and that doesn't really apply to us anyway. Let's go ahead and add a where clause and or just like an order by clause. It's actually what we're going to do here.

01:33 With the order by clause, here, let's bump that up right there. We are going to have a scan of the user, but we're using an index this time. With this order by user.username, we're going to get that order or that index,

01:51 and so that will be plenty fast. That's just fine. Using a scan with an index is not going to be a problem. Next, we're going to add the name column. The name column is not indexed, and so now we no longer get the index because we

02:08 have to look for things by the name. We have to go through every single row to order it properly by the name, and then we can take the top 50 and return those. We're getting a scan of the user, and the SQLite is using this temp B-tree for the order by.

02:24 This is a special data structure that it's creating. That's just an in-memory thing to handle ordering this stuff. Certainly not something you need to yourself worry about, and it's not 100 percent always a bad thing. This is just the best way that SQLite feels to organize this,

02:43 but would be nice if you could avoid creating that temporary data structure while it's doing this order by, and that's because we don't have an index on the name. Probably wouldn't make much sense to do an index on the name because we don't have a lot of queries where we're actually going to be ordering by the username.

03:02 On top of that, there's actually not a lot of similarity between different rows anyway. You'd basically be duplicating that column for the index, which would be probably not the best. But it is important to know that the B-tree structure is going to eat up

03:20 some of your memory and some CPU. If you ever see a spike in your memory or CPU, it could be because some of your queries are using this structure, which again, not always a bad thing, but can be problematic. Let's do another query. We're going to select from the user and do our left join with the image.

03:39 Then we're going to order by the username, something that was indexed. Because that was indexed, we can actually do the scan using the index there, and that works just fine. This left join is also on these columns that are the primary and foreign keys, and those are indexed as well, so we're safe there.

03:57 That's nice. That doesn't have any issues. Let's go a little further. Now, we're adding the image ID here. Actually, I should have mentioned before, we didn't even have the image ID, and so that's why we weren't even using the left join. In fact, that wasn't even in our query plan.

04:16 But now that we are selecting on the image ID, it has to do a scan on the user because we're ordering by the username, so we can use that index. But then we're searching on the image using the index because that is an indexed column for us,

04:33 and so that also works just fine, no problem. Now, we're going to add the where clause with our like. Here, we're going to be able to use the index for this order by on the username,

04:50 and then we're searching on the image using the index. Actually, our query plan didn't really change at all because this username is indexed. What's interesting though is that this is probably not what's actually happening because according to

05:08 the rules that SQLite has, it says that once you have a wildcard search like this, it cannot use an index for that particular column. That one is a little suspect. I'm not sure that the query plan is giving us the right thing,

05:27 but it is saying that we are using an index. I'm just not sure how it could do that. Let's add the or here and see what that does to us. Now, we're adding this or user.name, like Cody. Again, it's still saying we're able to use the index for both of these things.

05:44 I'm not totally buying it, but what are you going to do? If you can't trust this, well, you can't trust anything. But I'm pretty sure we're not able to use an index, but let's keep on going. Now, we're going to go to that order by sub-query that we had. We're going to select the updated app from the note where

06:03 the owner ID is whatever ID it's going to be, and then order by updated app and limit one. We have an index on the owner ID. We added that in the last step. But we're going to use this temp B-tree for the order by. We have to create a temporary data structure

06:21 to handle that order by on the updated app. Because of course, that's not going to be indexed in, probably be a really useless index anyway. There's definitely something not entirely awesome about this. Let's put that back as our sub-query and look at the whole thing.

06:41 Here we have our select on user with the user image where the username and name are like this, and then we're going to limit 50, and we've got our order by. We're doing a scan on the user. We're not able to use the index, and that's because of this order by. Yeah, that's not super-duper awesome.

07:01 So we're getting no index used on the user table. So as we have more and more users, we're just going to be scanning more and more of those users. This query is going to continue to grow in the amount of time that it takes.

07:19 And then we're also showing the B-tree for the order by because that's a sub-query that will happen for every single user. So we thought the B-tree, this temp B-tree thing was sometimes not great and probably not always great, but probably fine. Well, the fact is that every single user is going to get

07:38 this temporary data structure in memory, and the CPU is going to have to create that and then destroy it and create and destroy it for every single one of our users. So how do we optimize this query? What are we going to do? I think the query we really need to optimize is this sub-query right here. This is the one that's going to be taking the longest.

07:57 And so if you recall back with our photo album metaphor, you want to first have an index for the bigger buckets. And in our case, I think a multi-column index makes a lot of sense. So you want to have an index for your WHERE clause first, not always, but in our case, the WHERE clause

08:17 being the first thing that you have an index on, and then the order by being the second thing you have the index on. So we're going to have a index on the owner ID and the updated at, and hopefully that will allow us to get rid of this temp B-tree and allow us to use that index.

08:34 So we're going to open up the schema Prisma, and we're going to make the single line code change that's going to make all the difference. So this is going to be an index on two columns. This is our owner ID and updated at. And yeah, you definitely want to add at a comment explaining this.

08:54 This speeds up our user search query by a lot, hopefully. Maybe write the comment after you're sure about that. Okay, so then what we're going to do here is I'm going to get out of here and we're going to write npx Prisma db push, and we're just experimenting.

09:17 So we're not going to actually do the migration. And there we go. It did take a moment because indexes, you have to keep in mind that indexes will slow down write speed. Like it takes some time to create that index and every record you add, you have to update the index. In practical cases, that's not going to have an impact. We already talked about that.

09:36 So yeah, let's give this a whirl now that we have an index. Whoa, oh my word. Look at how much faster that is. Come over here. 47 milliseconds from 900 to 47. Thank you, index. That's awesome. And also if I do a search, yep, that's plenty fast.

09:55 Yeah, like there you go. It's amazing to me that just by inspecting a little bit and thinking a little bit about the stuff that we, the indexes that we could have and how the query is going to be executed, we can have such a drastic influence

10:13 on the speed of the query. Now I'm not going to clear this. I'm going to open up a new tab so we can compare the before and after. So SQLite 3, Prisma, DataDB, and then here, let's stick this on this side and that way we can compare these two. So let's grab that explain query plan again.

10:32 And now what we've got is kind of interesting. So we are still scanning the user. We do have to look at every user record, but it's going to be quite fast because the rest of these parts of the query are going to be fast. So we're going to search on the image using an index that's unchanged there.

10:50 And then we've got this correlated scalar subquery one, which is exactly what we had before. Before we had this note using index, we're now using this search note using covering index,

11:08 note owner ID and updated at IDX. So we no longer need to have this temp B-tree. We do still have to do that for our users, but again, that's for the entire query, not like one B-tree for every single user. So that's fine.

11:26 But yeah, doing this covering index on our note means that the database doesn't even actually have to look at the records at all. It only looks at the index, which is going to be screaming fast. So yeah, that works out. Covering indexes are also an interesting thing. We just happened to create a covering index

11:46 based on our analysis of this order by subquery. But even if our index hadn't been a covering index, it would have still definitely made things quite a bit faster. But yeah, and feel free to dive in. I've got a link that explains what a covering index is.

12:04 But the point is that you're going to analyze your query using the explain query plan. And then you're going to look for things like scans and these temp B-trees. And especially when you've got these subqueries, if you're doing that, you have to keep in mind

12:22 that that's going to happen for every record that matches. And so that's going to be a subquery that you want to optimize. And yeah, by adding this multi-column query or a multi-column index, we have drastically sped up this page.

12:40 And I wouldn't think about adding a cache to this page until we get way, way further down the line, which is awesome. The idea is like kick the can further down before you have to start adding a bunch of complexity. And now all of a sudden you're x.com and you have to like deal with the many, many users

12:58 who are constantly posting and all of that stuff too. So kick that can down the road as far as you can by using the performance optimizations that are offered to us by the database. You've done an awesome job that our product manager is really happy with it. So well done on analyzing this and have, yeah,

13:18 like it just feels so good to have such a huge impact, especially when the amount of code necessary to make the change was so little. So at this point you would definitely want to do a NPX Prisma, whoops, Prisma migrate dev. And we're going to, yes, we're gonna delete all the data. I don't need this stuff anymore.

13:38 And we're gonna call this migration fast index. That's not what I would call it. Maybe a user index or something like that would be better. But yeah, now it's gonna see it again. It'll take forever. So yeah, well done. Yeah, you should feel proud of yourself. Good job.