![]() ![]() The rule of thumb in relational database systems is to never deal with distinct queries. As a planner, you'd be correct in choosing a sequential scan, since walking over an index completely is more costly, as we already saw above. This command does not use the index and resorts to sequential scan. SELECT COUNT(DISTINCT(id)) FROM sample_data Count(Distinct() Now, let's try to count the number of unique rows in the table. The speed depends on many factors, including cardinality, size of the table, and whether the index is cached. This command uses an index-only scan but still takes around 3.5 seconds. SELECT DISTINCT(id) FROM sample_data Distinct() Let's touch upon distinct, which is often used alongside count. duplicate countsīy default, count query counts everything, including duplicates. Partial indexes are faster, easier to cache because of their size, and easier to maintain. CREATE INDEX id_btree ON sample_data USING BTREE(id) WHERE id = 200000 Partial Index If we are going to count rows only with a specific where clause, then we can use a partial index. SELECT COUNT(id) FROM sample_data WHERE id = 200000 Partial Index Partial index Since we already have a full index, a count query with a where clause on the id the column would be very fast. Let's consider a full index and a partial index. ![]() Once there is a where clause involved, then it is very easy for the planner to use the index since it cuts down a lot of rows from the result. There are also increased access costs by directly accessing the index, due to which a sequential scan over normal tuples/table data is preferable. SET enable_seqscan = OFF Ĭounting nodes in a BTree data structure takes O(n), where n is the number of rows and also takes extra memory - O(h) where h is the height of the tree. ![]() We can understand that when we turn off Seq scan. An index helps if there is a where clause but otherwise, scanning the index will be slow. This is because the count has to touch all of the rows of the table. Unlike other queries, the index does not help here. CREATE INDEX id_btree ON sample_data USING BTREE(id) The first thing that any database user would do is add an index to speed up a query. But in a transactional database, we cannot simply rely on parallel queries - as you can see in this example, it still takes 342 milliseconds. They just throw hardware at the problem to enable faster query execution. This uses three workers out of the four we have configured. INSERT INTO sample_data SELECT generate_series(200000, 4000000) Īnd then try to analyze the plan, as shown below: Analyzing the plan Let's insert more data into our sample table and enable PostgreSQL to use more workers. PostgreSQL does not use parallel processing since the row count is too small, and using parallel workers might slow the counting process. ![]()
0 Comments
Leave a Reply. |