Distinct on: a confusing, unique and useful feature in Postgres, Hacker News

We saw how DISTINCT works at the end of this post: Select Statement In Postgres With Examples

Let’s explore it further.

When I saw DISTINCT ON, I thought there couldn’t be anything new about it, you know, just another similar kind of feature with a different name. But I was wrong! It seems like a very powerful feature, to me at least!

What I did was nothing special. I googled it and went through tons of articles. Most of them just repeat the official documentation without a simple explanation, and only a few have a decent explanation. To be frank, I couldn’t understand most of it, and I wondered why the feature was so hard to get a hold of! So, I played with it to find out more. Before I show some of my findings, let’s go through some crappy theory first!

As per the official documentation, SELECT DISTINCT ON (expression [, …]) keeps only the first row of each set of rows where the given expressions evaluate to equal.

In simple terms, if we use DISTINCT ON, it gives us the first row from each group of rows that share the same values for the DISTINCT ON expressions.

Now the question is: when did we group them? We only used DISTINCT ON. Well, that is the grouping: with DISTINCT ON, we ask PostgreSQL to return a single row for each distinct group defined by the ON clause.

The DISTINCT ON clause returns only the first row of each group, where the groups come from the DISTINCT ON (column) expressions and “first” is decided by the ORDER BY clause of the query. The other selected columns are taken from that same row. Basically, it works like a LIMIT 1 applied to each group rather than to the whole result.
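Here is a minimal sketch of that “LIMIT 1 per group” idea, using a tiny hypothetical `readings` table (the table and its rows are made up for illustration). The second query swaps in a different technique, the `row_number()` window function, to keep row number 1 of each group explicitly; on this data both queries should return the same rows.

```sql
-- Hypothetical sample data, made up for this illustration.
CREATE TEMP TABLE readings (
  sensor      text,
  taken_at    int,
  temperature int
);

INSERT INTO readings VALUES
  ('a', 1, 10), ('a', 2, 20), ('a', 3, 30),
  ('b', 1, 5),  ('b', 2, 15);

-- DISTINCT ON: keep the first row of each sensor group, where "first"
-- is decided by ORDER BY (the latest taken_at in each group here).
SELECT DISTINCT ON (sensor) sensor, taken_at, temperature
FROM readings
ORDER BY sensor, taken_at DESC;
-- returns ('a', 3, 30) and ('b', 2, 15)

-- Same rows with a window function: number the rows inside each
-- group in the same order, then keep only row number 1.
SELECT sensor, taken_at, temperature
FROM (
  SELECT sensor, taken_at, temperature,
         row_number() OVER (PARTITION BY sensor ORDER BY taken_at DESC) AS rn
  FROM readings
) ranked
WHERE rn = 1
ORDER BY sensor;
```

DISTINCT ON is the shorter of the two; the window-function version is standard SQL, so it is the usual fallback in databases that do not support DISTINCT ON.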

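The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY, and the official documentation adds that they must match the leftmost ORDER BY expressions. That means we decide which row we get for each group through the ORDER BY clause: the expressions that follow the DISTINCT ON ones, sorted ascending or descending, set the order of the rows inside each group.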

Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. That means the results may differ between runs if we don’t use ORDER BY, as Postgres can’t read our minds! In other words, which row is picked from each group is not guaranteed. For example,

  SELECT DISTINCT ON (location) location, time, report
  FROM weather_reports;

The example above is from the official documentation. It has no ORDER BY clause, which is why we don’t know which row will be selected for each location. If we add ORDER BY as in the following example, we can be sure which row we get.

  SELECT DISTINCT ON (location) location, time, report
  FROM weather_reports
  ORDER BY location, time DESC;

The query retrieves the most recent weather report for each location.
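A self-contained version of that documentation example, assuming a temporary `weather_reports` table filled with a few made-up rows (the data is hypothetical; only the column names come from the example above). The commented-out query at the end shows the matching rule: Postgres rejects it because the DISTINCT ON expression is not the leftmost ORDER BY expression.

```sql
-- Hypothetical rows, made up for this sketch.
CREATE TEMP TABLE weather_reports (
  location text,
  time     timestamp,
  report   text
);

INSERT INTO weather_reports VALUES
  ('Berlin', '2024-01-01 08:00', 'cloudy'),
  ('Berlin', '2024-01-01 12:00', 'rain'),
  ('Oslo',   '2024-01-01 09:00', 'snow'),
  ('Oslo',   '2024-01-01 07:00', 'clear');

-- Most recent report per location: ORDER BY location lines up the
-- groups, and time DESC puts the newest row of each group first.
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
-- returns Berlin's 12:00 'rain' report and Oslo's 09:00 'snow' report

-- Not allowed: DISTINCT ON (location) must match the leftmost ORDER BY
-- expression, so a query that orders by time first is rejected:
-- SELECT DISTINCT ON (location) location, time, report
-- FROM weather_reports ORDER BY time DESC, location;
```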

Now, let’s just take a few more examples.

Is there any difference between the results of DISTINCT and DISTINCT ON queries?

Yes, there is. That’s why they have both! Take a look at the queries and results below.

I am using this database: https://github.com/ydchauh/yogeshchauhan.com-public

Simple DISTINCT query:

  SELECT DISTINCT country, contact_name, company_name FROM customers;

Output:

DISTINCT ON query:

  SELECT DISTINCT ON (country) contact_name, country, company_name FROM customers;

Output:

What about adding ORDER BY?

  SELECT DISTINCT ON (country) contact_name, country, company_name
  FROM customers
  ORDER BY country;

Output:

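There is no difference between the outputs of the previous and the current query. That’s because ORDER BY sorts in ascending (ASC) order by default, and without ORDER BY, Postgres happened to return the countries in ascending order too (most likely because it sorts on the DISTINCT ON expression internally to find each group; that ordering is not guaranteed). Note that ORDER BY country alone does not say which row comes first inside a country, so the contact picked for each country is still not guaranteed.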

Let’s try descending (DESC) order.

  SELECT DISTINCT ON (country) contact_name, country, company_name
  FROM customers
  ORDER BY country DESC;

Output:

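As we can see, the results are different: the countries now come out in reverse order, and the contact picked for each country may differ as well. So, when we use DISTINCT ON, we need to be careful about the ORDER BY clause and what we want in our results.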
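Since the screenshots are not included here, a minimal sketch on a tiny hypothetical `sample_customers` table (the name and rows are made up; it only mimics the columns used above) shows the same comparison. Adding `contact_name` as a second ORDER BY expression is an assumption on my part about how to make the pick deterministic; it is not part of the original queries.

```sql
-- Hypothetical stand-in for the customers table; rows are made up.
CREATE TEMP TABLE sample_customers (
  country      text,
  contact_name text,
  company_name text
);

INSERT INTO sample_customers VALUES
  ('Germany', 'Anna',  'Acme GmbH'),
  ('Germany', 'Bernd', 'Beta AG'),
  ('Mexico',  'Carla', 'Casa SA'),
  ('Mexico',  'Diego', 'Delta SA');

-- DISTINCT: one row per distinct (country, contact_name, company_name)
-- combination, so all 4 rows come back here.
SELECT DISTINCT country, contact_name, company_name FROM sample_customers;

-- DISTINCT ON (country): one row per country. The extra contact_name
-- sort key makes the pick deterministic: the alphabetically first contact.
SELECT DISTINCT ON (country) contact_name, country, company_name
FROM sample_customers
ORDER BY country, contact_name;            -- Anna (Germany), Carla (Mexico)

-- Descending: countries come out in reverse order, and within each
-- country the alphabetically last contact is kept.
SELECT DISTINCT ON (country) contact_name, country, company_name
FROM sample_customers
ORDER BY country DESC, contact_name DESC;  -- Diego (Mexico), Bernd (Germany)
```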

By this point, we know that it acts a bit like GROUP BY.

Then, why do we need GROUP BY in Postgres?

Well, let’s find out with a few queries.

  SELECT country, contact_name, company_name FROM customers group by country;

The query above raises an error because contact_name and company_name are neither listed in the GROUP BY clause nor used inside an aggregate function such as COUNT. So, we have to choose one of two options: either wrap those columns in aggregate functions or add them to the GROUP BY clause.

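Postgres reports an error along the lines of: column "customers.contact_name" must appear in the GROUP BY clause or be used in an aggregate function.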

So, if we add all the columns to the GROUP BY clause, we basically get the same results as a simple DISTINCT query.

  SELECT country, contact_name, company_name
  FROM customers
  GROUP BY country, contact_name, company_name;

Output:

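We cannot aggregate just one of the other columns, either: every selected column that is not in the GROUP BY clause must be inside an aggregate function. The query below still fails, because company_name is neither grouped nor aggregated.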

  SELECT country, COUNT (contact_name), company_name FROM customers group by country;

So, we need to write the query like this:


  SELECT country, COUNT (contact_name), COUNT (company_name) FROM customers GROUP BY country;

Output:

So, as we can see in the output above, GROUP BY gives us the number of rows in each group, but if we want the first row from each of those groups, we need to use DISTINCT ON, and that’s why I consider it a powerful feature of Postgres.
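To put the GROUP BY comparison side by side, here is a minimal sketch on another tiny hypothetical table, `customers_demo` (name and rows made up; it includes one exact duplicate row on purpose). It checks the two claims above: grouping by every selected column behaves like SELECT DISTINCT, while DISTINCT ON returns one real row per group instead of a summary.

```sql
-- Hypothetical sample rows, made up for this sketch.
CREATE TEMP TABLE customers_demo (
  country      text,
  contact_name text,
  company_name text
);

INSERT INTO customers_demo VALUES
  ('Germany', 'Anna',  'Acme GmbH'),
  ('Germany', 'Bernd', 'Beta AG'),
  ('Germany', 'Bernd', 'Beta AG'),   -- exact duplicate row
  ('Mexico',  'Carla', 'Casa SA');

-- GROUP BY on every selected column collapses exact duplicates,
-- just like SELECT DISTINCT: 3 rows.
SELECT country, contact_name, company_name
FROM customers_demo
GROUP BY country, contact_name, company_name;

-- GROUP BY country with aggregates: one summary row per country.
-- COUNT(contact_name) counts non-NULL values, duplicates included.
SELECT country, COUNT(contact_name), COUNT(company_name)
FROM customers_demo
GROUP BY country;                      -- Germany: 3, Mexico: 1

-- DISTINCT ON (country): one actual row per country, with the
-- contact_name and company_name that belong to that same row.
SELECT DISTINCT ON (country) country, contact_name, company_name
FROM customers_demo
ORDER BY country, contact_name;        -- Germany: Anna / Acme GmbH
```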

Sources: https://www.postgresql.org/docs/9.5/sql-select.html
