2020-03-21

Hibernate - (HQL and JPQL) Distinct

Distinct

For JPQL and HQL, DISTINCT has two meanings:

It can be passed to the database so that duplicates are removed from a result set

It can be used to filter out the same parent entity references when join fetching a child collection

Using DISTINCT with SQL projections

For SQL projections, DISTINCT needs to be passed to the database because the duplicated entries need to be filtered out before being returned to the database client.

Using DISTINCT with projection queries example

         List<String> lastNames = entityManager.createQuery(
"select distinct p.lastName " +
"from Person p", String.class)
.getResultList();
When running the query above, Hibernate generates the following SQL query:

SELECT DISTINCT
    p.last_name as col_0_0_
FROM person p
For this particular use case, passing the DISTINCT keyword from JPQL/HQL to the database is the right thing to do.

Using DISTINCT with entity queries
DISTINCT can also be used to filter out entity object references when fetching a child association along with the parent entities.

Using DISTINCT with entity queries example

List<Person> authors = entityManager.createQuery(
"select distinct p " +
"from Person p " +
"left join fetch p.books", Person.class)
.getResultList();
In this case, DISTINCT is used because there can be multiple Books entities associated with a given Person. If in the database there are 3 Persons in the database and each person has 2 Books, without DISTINCT this query will return 6 Persons since the SQL-level result-set size is given by the number of joined Book records.

However, the DISTINCT keyword is passed to the database as well:

SELECT DISTINCT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id
In this case, the DISTINCT SQL keyword is undesirable since it does a redundant result set sorting, as explained in this blog post. To fix this issue, Hibernate 5.2.2 added support for the HINT_PASS_DISTINCT_THROUGH entity query hint:

Using DISTINCT with entity queries example

List<Person> authors = entityManager.createQuery(
"select distinct p " +
"from Person p " +
"left join fetch p.books", Person.class)
.setHint( QueryHints.HINT_PASS_DISTINCT_THROUGH, false )
.getResultList();
With this entity query hint, Hibernate will not pass the DISTINCT keyword to the SQL query:

SELECT
    p.id as id1_1_0_,
    b.id as id1_0_1_,
    p.first_name as first_na2_1_0_,
    p.last_name as last_nam3_1_0_,
    b.author_id as author_i3_0_1_,
    b.title as title2_0_1_,
    b.author_id as author_i3_0_0__,
    b.id as id1_0_0__
FROM person p
LEFT OUTER JOIN book b ON p.id=b.author_id
When using the HINT_PASS_DISTINCT_THROUGH entity query hint, Hibernate can still remove the duplicated parent-side entities from the query result.

4 comments: