2020-03-21

Hibernate Search - Predicate DSL

Predicate DSL

The main component of a search query is the predicate, i.e. the condition that every document must satisfy in order to be included in search results.

The predicate is configured when building the search query:

Defining the predicate of a search query

SearchSession searchSession = Search.session( entityManager );

List<Book> result = searchSession.search( Book.class ) 
        .where( f -> f.match().field( "title" ) 
                .matching( "robot" ) )
        .fetchHits( 20 ); 

Start building the query.
Require that hits have a title field matching the value robot. If the field does not exist or cannot be searched on, an exception will be thrown.
Fetch the results, which will match the given predicate.

Alternatively, if you don’t want to use lambdas:

Defining the predicate of a search query — object-based syntax

SearchSession searchSession = Search.session( entityManager );

SearchScope<Book> scope = searchSession.scope( Book.class );

List<Book> result = searchSession.search( scope )
        .where( scope.predicate().match().field( "title" )
                .matching( "robot" )
                .toPredicate() )
        .fetchHits( 20 );

The predicate DSL offers more predicate types, and multiple options for each type of predicate. To learn more about the match predicate, and all the other types of predicate, refer to the following sections.


matchAll: match all documents

Matching all documents

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll() )
        .fetchHits( 20 );

Matching all documents except those matching a given predicate
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.matchAll()
                .except( f.match().field( "title" )
                        .matching( "robot" ) )
        )
        .fetchHits( 20 );

id: match a document identifier
Matching a document with a given identifier

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.id().matching( 1 ) )
        .fetchHits( 20 );
Matching all documents with an identifier among a given collection
List<Integer> ids = new ArrayList<>();
ids.add( 1 );
ids.add( 2 );
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.id().matchingAny( ids ) )
        .fetchHits( 20 );

match: match a value
Matching a value

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match().field( "title" )
                .matching( "robot" ) )
        .fetchHits( 20 );
Matching multiple terms
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match().field( "title" )
                .matching( "robot dawn" ) )
        .fetchHits( 20 );
For full-text fields, the value passed to matching may be a string containing multiple terms. The string will be analyzed and each term identified.
All returned hits will match at least one term of the given string. Hits matching multiple terms will have a higher score.
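
The behavior described above can be modeled in plain Java. The sketch below is not Hibernate Search code: the analyze method is a hypothetical stand-in for an analyzer (lowercasing and splitting on non-alphanumeric characters, no stemming), and the count of matched terms is only a rough proxy for the score a real backend would compute.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of multi-term matching; all names here are hypothetical.
final class MultiTermMatchSketch {

    // Stand-in for an analyzer: lowercase, then split on anything
    // that is not a letter or a digit. Real analyzers do much more.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Number of distinct query terms that also appear in the field value.
    // Zero means "no hit"; a higher count stands for a higher score.
    static int matchedTerms(String fieldValue, String query) {
        Set<String> matched = analyze(query);
        matched.retainAll(analyze(fieldValue));
        return matched.size();
    }

    public static void main(String[] args) {
        System.out.println(matchedTerms("The Robots of Dawn", "robots dawn"));
        System.out.println(matchedTerms("I, Robot", "robot dawn"));
        System.out.println(matchedTerms("Foundation", "robot dawn"));
    }
}
```

In this model a title matching one term is still a hit, and a title matching both terms simply ranks higher.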

Matching a value in any of multiple fields

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .field( "title" ).field( "description" )
                .matching( "robot" ) )
        .fetchHits( 20 );

Alternatively, passing all field names in a single call:

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .fields( "title", "description" )
                .matching( "robot" ) )
        .fetchHits( 20 );


Matching a text value approximately

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .field( "title" )
                .matching( "robto" )
                .fuzzy() )
        .fetchHits( 20 );
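
To give an intuition of why a misspelled term such as robto can still match robot, here is a plain-Java sketch of an edit distance that counts a swap of two adjacent characters as a single edit. This is an illustration only: it assumes the backend measures fuzziness with this kind of distance, and it says nothing about the maximum distance that fuzzy() actually allows.

```java
// Plain-Java illustration of edit distance with adjacent transpositions
// (optimal string alignment). Not Hibernate Search code.
final class FuzzySketch {

    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all characters of a
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Two adjacent characters swapped: count it as one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("robto", "robot"));
    }
}
```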

Matching a value, analyzing it with a different analyzer

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .field( "title_autocomplete" )
                .matching( "robo" )
                .analyzer( "autocomplete_query" ) )
        .fetchHits( 20 );

Matching a value without analyzing it

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.match()
                .field( "title" )
                .matching( "robot" )
                .skipAnalysis() )
        .fetchHits( 20 );

range: match a range of values
Matching a range of values

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .between( 210, 250 ) )
        .fetchHits( 20 );

Matching values equal to or greater than a given value
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .atLeast( 400 ) )
        .fetchHits( 20 );

Matching values strictly greater than a given value
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .greaterThan( 400 ) )
        .fetchHits( 20 );

Matching values equal to or less than a given value
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .atMost( 400 ) )
        .fetchHits( 20 );

Matching values strictly less than a given value
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .lessThan( 400 ) )
        .fetchHits( 20 );

Matching a range of values with explicit bound inclusion/exclusion
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.range().field( "pageCount" )
                .between(
                        200, RangeBoundInclusion.EXCLUDED,
                        250, RangeBoundInclusion.EXCLUDED
                ) )
        .fetchHits( 20 );
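
The difference between the range variants above comes down to whether each bound is included. A minimal plain-Java model, where the Inclusion enum is a hypothetical stand-in that only mirrors the idea of RangeBoundInclusion:

```java
// Plain-Java model of range bounds; not Hibernate Search code.
final class RangeBoundsSketch {

    // Hypothetical stand-in for the INCLUDED / EXCLUDED choice on a bound.
    enum Inclusion { INCLUDED, EXCLUDED }

    static boolean between(int value,
            int lower, Inclusion lowerInclusion,
            int upper, Inclusion upperInclusion) {
        boolean aboveLower = lowerInclusion == Inclusion.INCLUDED
                ? value >= lower : value > lower;
        boolean belowUpper = upperInclusion == Inclusion.INCLUDED
                ? value <= upper : value < upper;
        return aboveLower && belowUpper;
    }

    // "Equal to or greater than" versus "strictly greater than".
    static boolean atLeast(int value, int bound) { return value >= bound; }
    static boolean greaterThan(int value, int bound) { return value > bound; }

    // "Equal to or less than" versus "strictly less than".
    static boolean atMost(int value, int bound) { return value <= bound; }
    static boolean lessThan(int value, int bound) { return value < bound; }
}
```

With both bounds excluded, a book of exactly 200 or 250 pages falls outside the range 200 to 250, while a book of 201 pages falls inside it.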

phrase: match a sequence of words
Matching a sequence of words

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.phrase().field( "title" )
                .matching( "robots of dawn" ) )
        .fetchHits( 20 );

Matching a sequence of words approximately
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.phrase().field( "title" )
                .matching( "dawn robot" )
                .slop( 3 ) )
        .fetchHits( 20 );

exists: match fields with content

The exists predicate, applied to a field, will match all documents for which this field has a non-null value.

Matching fields with content
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.exists().field( "comment" ) )
        .fetchHits( 20 );
There isn’t any built-in predicate to match fields with exclusively null values, but you can easily create one yourself using an exists predicate in a mustNot clause in a boolean predicate.
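
As a rough illustration of that combination, the plain-Java sketch below models exists and mustNot with java.util.function.Predicate over a document represented as a map. The class and its methods are hypothetical; the comment shows how the same idea would presumably be written with the DSL calls used elsewhere on this page, which is an untested assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java model of "exists inside mustNot"; not Hibernate Search code.
// Assumed DSL equivalent, built only from calls shown on this page:
//   .where( f -> f.bool().mustNot( f.exists().field( "comment" ) ) )
final class MissingFieldSketch {

    // Model of exists(): the document has a non-null value for the field.
    static Predicate<Map<String, Object>> exists(String field) {
        return document -> document.get(field) != null;
    }

    // Model of a boolean predicate holding a single mustNot clause.
    static Predicate<Map<String, Object>> mustNot(Predicate<Map<String, Object>> clause) {
        return clause.negate();
    }

    public static void main(String[] args) {
        Map<String, Object> withComment = new HashMap<>();
        withComment.put("comment", "A classic");
        Map<String, Object> withoutComment = new HashMap<>();
        withoutComment.put("comment", null);

        Predicate<Map<String, Object>> noComment = mustNot(exists("comment"));
        System.out.println(noComment.test(withComment));
        System.out.println(noComment.test(withoutComment));
    }
}
```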

The exists predicate can also be applied to an object field. In that case, it will match all documents for which at least one sub-field of the object field has a non-null value.

Matching object fields with content

List<Author> hits = searchSession.search( Author.class )
        .where( f -> f.exists().field( "placeOfBirth" ) )
        .fetchHits( 20 );
Object fields need to have at least one sub-field with content in order to be considered as "existing".

Let’s consider the example above, and let’s assume the placeOfBirth object field only has one sub-field: placeOfBirth.country:

an author whose placeOfBirth is null will not match.

an author whose placeOfBirth is not null and has the country filled in will match.

an author whose placeOfBirth is not null but does not have the country filled in will not match.

Because of this, it is preferable to use the exists predicate on object fields that are known to have at least one sub-field that is never null: an identifier, a name, …​
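
The three cases listed above can be checked against a small plain-Java model, in which an object field is represented as a map of sub-field names to values. This representation is an assumption made for illustration and is not how Hibernate Search stores documents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java model of exists() on an object field; not Hibernate Search code.
final class ObjectFieldExistsSketch {

    // An object field "exists" only if at least one sub-field is non-null.
    static boolean exists(Map<String, Object> objectField) {
        return objectField != null
                && objectField.values().stream().anyMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        Map<String, Object> countryFilled = new HashMap<>();
        countryFilled.put("country", "France");
        Map<String, Object> countryMissing = new HashMap<>();
        countryMissing.put("country", null);

        System.out.println(exists(null));           // placeOfBirth is null
        System.out.println(exists(countryFilled));  // country filled in
        System.out.println(exists(countryMissing)); // country not filled in
    }
}
```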

For object fields with NESTED storage, the exists predicate needs to be wrapped in a nested predicate.

wildcard: match a simple pattern

Matching a simple pattern
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.wildcard().field( "description" )
                .matching( "rob*t" ) )
        .fetchHits( 20 );
If a normalizer has been defined on the field, the patterns used in wildcard predicates will be normalized.

If an analyzer has been defined on the field:

when using the Elasticsearch backend, the patterns will be neither analyzed nor normalized, and will be expected to match a single indexed token, not a sequence of tokens.

when using the Lucene backend, the patterns will be normalized, but not tokenized: the pattern will still be expected to match a single indexed token, not a sequence of tokens.

For example, a pattern such as Cat* could match cat when targeting a field having a normalizer that applies a lowercase filter when indexing.

A pattern such as john gr* will not match anything when targeting a field that tokenizes on spaces. gr* may match, since it doesn’t include any space.
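
The single-token rule is easier to see in a small model. The plain-Java sketch below assumes a field that is tokenized on whitespace and lowercased, and a pattern that is lowercased but never tokenized (the normalized case described above). It only handles the * wildcard and is not Hibernate Search code.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java model of wildcard matching against single tokens.
final class WildcardTokenSketch {

    // Translate a pattern where '*' means "any sequence of characters"
    // into a regular expression; every other character is taken literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The pattern must match one whole token; it is never split on spaces.
    static boolean matches(String fieldValue, String wildcard) {
        Pattern pattern = toRegex(wildcard.toLowerCase(Locale.ROOT));
        return Arrays.stream(fieldValue.toLowerCase(Locale.ROOT).split("\\s+"))
                .anyMatch(token -> pattern.matcher(token).matches());
    }

    public static void main(String[] args) {
        System.out.println(matches("John Grisham", "john gr*"));
        System.out.println(matches("John Grisham", "gr*"));
        System.out.println(matches("cat", "Cat*"));
    }
}
```

In this model john gr* fails because no single token contains a space, while gr* succeeds against the token grisham.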

When the goal is to match user-provided query strings, the simple query string predicate should be preferred.

bool: combine predicates (or/and/…​)

Matching a document that matches any of multiple given predicates (~OR operator)
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .should( f.match().field( "title" )
                        .matching( "robot" ) ) 
                .should( f.match().field( "description" )
                        .matching( "investigation" ) ) 
        )
        .fetchHits( 20 ); 
The hits should have a title field matching the text robot, or they should match any other clause in the same boolean predicate.
The hits should have a description field matching the text investigation, or they should match any other clause in the same boolean predicate.
All returned hits will match at least one of the clauses above: they will have a title field matching the text robot or they will have a description field matching the text investigation.

Matching a document that matches all of multiple given predicates (~AND operator)
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .must( f.match().field( "title" )
                        .matching( "robot" ) ) 
                .must( f.match().field( "description" )
                        .matching( "crime" ) ) 
        )
        .fetchHits( 20 ); 
The hits must have a title field matching the text robot, independently from other clauses in the same boolean predicate.
The hits must have a description field matching the text crime, independently from other clauses in the same boolean predicate.
All returned hits will match all of the clauses above: they will have a title field matching the text robot and they will have a description field matching the text crime.

Matching a document that does not match a given predicate
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .must( f.match().field( "title" )
                        .matching( "robot" ) ) 
                .mustNot( f.match().field( "description" )
                        .matching( "investigation" ) ) 
        )
        .fetchHits( 20 ); 
The hits must have a title field matching the text robot, independently from other clauses in the same boolean predicate.
The hits must not have a description field matching the text investigation, independently from other clauses in the same boolean predicate.
All returned hits will match all of the clauses above: they will have a title field matching the text robot and they will not have a description field matching the text investigation.
While it is possible to execute a boolean predicate with only "negative" clauses (mustNot), performance may be disappointing because the full power of indexes cannot be leveraged in that case.

Matching a document that matches a given predicate without affecting the score
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool() 
                .should( f.bool() 
                        .filter( f.match().field( "genre" )
                                .matching( Genre.SCIENCE_FICTION ) ) 
                        .must( f.match().field( "description" )
                                .matching( "crime" ) ) 
                )
                .should( f.bool() 
                        .filter( f.match().field( "genre" )
                                .matching( Genre.CRIME_FICTION ) ) 
                        .must( f.match().field( "description" )
                                .matching( "robot" ) ) 
                )
        )
        .fetchHits( 20 ); 
Create a top-level boolean predicate, with two should clauses.
In the first should clause, create a nested boolean predicate.
Use a filter clause to require documents to have the science-fiction genre, without taking this predicate into account when scoring.
Use a must clause to require documents with the science-fiction genre to have a description field matching crime, and take this predicate into account when scoring.
In the second should clause, create a nested boolean predicate.
Use a filter clause to require documents to have the crime fiction genre, without taking this predicate into account when scoring.
Use a must clause to require documents with the crime fiction genre to have a description field matching robot, and take this predicate into account when scoring.
The score of hits will ignore the filter clauses, leading to fairer sorts if there are many more "crime fiction" documents than "science-fiction" documents.

Using optional should clauses to boost the score of some documents
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .must( f.match().field( "title" )
                        .matching( "robot" ) ) 
                .should( f.match().field( "description" )
                        .matching( "crime" ) ) 
                .should( f.match().field( "description" )
                        .matching( "investigation" ) ) 
        )
        .fetchHits( 20 ); 
The hits must have a title field matching the text robot, independently from other clauses in the same boolean predicate.
The hits should have a description field matching the text crime, but they may not, because matching the must clause above is enough. However, matching this should clause will improve the score of the document.
The hits should have a description field matching the text investigation, but they may not, because matching the must clause above is enough. However, matching this should clause will improve the score of the document.
All returned hits will match the must clause, and optionally the should clauses: they will have a title field matching the text robot, and the ones whose description matches either crime or investigation will have a better score.


Fine-tuning should clauses matching requirements with minimumShouldMatch
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool()
                .minimumShouldMatchNumber( 2 ) 
                .should( f.match().field( "description" )
                        .matching( "robot" ) ) 
                .should( f.match().field( "description" )
                        .matching( "investigation" ) ) 
                .should( f.match().field( "description" )
                        .matching( "disappearance" ) ) 
        )
        .fetchHits( 20 ); 
At least two "should" clauses must match for this boolean predicate to match.
The hits should have a description field matching the text robot.
The hits should have a description field matching the text investigation.
The hits should have a description field matching the text disappearance.
All returned hits will match at least two of the should clauses: their description will match either robot and investigation, robot and disappearance, investigation and disappearance, or all three of these terms.
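
The matching rules described in this section (must, mustNot, optional or required should clauses, and minimumShouldMatch) can be summarized in a plain-Java model. BoolSketch is a hypothetical class written for illustration, not part of Hibernate Search; it ignores scoring and filter clauses and only decides whether a document matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of boolean predicate matching; scoring is not modeled.
final class BoolSketch<T> {
    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private Integer minimumShouldMatch; // null means "use the default"

    BoolSketch<T> must(Predicate<T> clause) { must.add(clause); return this; }
    BoolSketch<T> mustNot(Predicate<T> clause) { mustNot.add(clause); return this; }
    BoolSketch<T> should(Predicate<T> clause) { should.add(clause); return this; }
    BoolSketch<T> minimumShouldMatchNumber(int n) { minimumShouldMatch = n; return this; }

    boolean test(T document) {
        // A boolean predicate without any clause matches nothing.
        if (must.isEmpty() && mustNot.isEmpty() && should.isEmpty()) {
            return false;
        }
        for (Predicate<T> clause : must) {
            if (!clause.test(document)) return false;
        }
        for (Predicate<T> clause : mustNot) {
            if (clause.test(document)) return false;
        }
        long matched = should.stream().filter(c -> c.test(document)).count();
        // Default: should clauses are optional next to a must clause,
        // otherwise at least one of them has to match.
        int required = minimumShouldMatch != null
                ? minimumShouldMatch
                : (must.isEmpty() && !should.isEmpty() ? 1 : 0);
        return matched >= required;
    }
}
```

For example, with three should clauses and minimumShouldMatchNumber( 2 ), a description containing only robot does not match, while one containing robot and investigation does.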

Easily adding clauses dynamically with the lambda syntax
MySearchParameters searchParameters = getSearchParameters(); 
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.bool( b -> { 
            b.must( f.matchAll() ); 
            if ( searchParameters.getGenreFilter() != null ) { 
                b.must( f.match().field( "genre" )
                        .matching( searchParameters.getGenreFilter() ) );
            }
            if ( searchParameters.getFullTextFilter() != null ) {
                b.must( f.match().fields( "title", "description" )
                        .matching( searchParameters.getFullTextFilter() ) );
            }
            if ( searchParameters.getPageCountMaxFilter() != null ) {
                b.must( f.range().field( "pageCount" )
                        .atMost( searchParameters.getPageCountMaxFilter() ) );
            }
        } ) )
        .fetchHits( 20 ); 
Get a custom object holding the search parameters provided by the user through a web form, for example.
Call .bool(Consumer). The consumer, implemented by a lambda expression, will receive a builder as an argument and will add clauses to that builder as necessary.
By default, a boolean predicate will match nothing if there is no clause. To match every document when there is no clause, add a must clause that matches everything.
Inside the lambda, the code is free to check conditions before adding clauses. In this case, we only add clauses if the relevant parameter was filled in by the user.
The hits will match the clauses added by the lambda expression.

simpleQueryString: match a user-provided query

Matching a simple query string: AND/OR operators
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "description" )
                .matching( "robots + (crime | investigation | disappearance)" ) )
        .fetchHits( 20 );

Matching a simple query string: NOT operator
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "description" )
                .matching( "robots + -investigation" ) )
        .fetchHits( 20 );

Matching a simple query string: AND as default operator
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "description" )
                .matching( "robots investigation" )
                .defaultOperator( BooleanOperator.AND ) )
        .fetchHits( 20 );

Matching a simple query string: prefix
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "description" )
                .matching( "rob*" ) )
        .fetchHits( 20 );

Matching a simple query string: fuzzy
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "description" )
                .matching( "robto~2" ) )
        .fetchHits( 20 );

Matching a simple query string: phrase
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "title" )
                .matching( "\"robots of dawn\"" ) )
        .fetchHits( 20 );

Matching a simple query string: phrase with slop
List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.simpleQueryString().field( "title" )
                .matching( "\"dawn robot\"~3" ) )
        .fetchHits( 20 );

nested: match nested documents


Matching a predicate against nested objects

List<Book> hits = searchSession.search( Book.class )
        .where( f -> f.nested().objectField( "authors" ) 
                .nest( f.bool()
                        .must( f.match().field( "authors.firstName" )
                                .matching( "isaac" ) ) 
                        .must( f.match().field( "authors.lastName" )
                                .matching( "asimov" ) ) 
                ) )
        .fetchHits( 20 ); 
Create a nested predicate on the authors object field.
The author must have a first name matching isaac.
The author must have a last name matching asimov.
All returned hits will be books for which at least one author has a first name matching isaac and a last name matching asimov. Books that happen to have multiple authors, one of which has a first name matching isaac and another of which has a last name matching asimov, will not match.
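
The difference between the two situations can be modeled in plain Java. In the sketch below, which is not Hibernate Search code, the Author class and both methods are hypothetical: nestedMatch requires a single author to satisfy both conditions, whereas flattenedMatch lets different authors satisfy each condition, which is the behavior the nested predicate is meant to avoid.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of nested versus flattened matching.
final class NestedMatchSketch {

    static final class Author {
        final String firstName;
        final String lastName;

        Author(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Nested semantics: one and the same author matches both clauses.
    static boolean nestedMatch(List<Author> authors) {
        return authors.stream().anyMatch(a ->
                a.firstName.equalsIgnoreCase("isaac")
                        && a.lastName.equalsIgnoreCase("asimov"));
    }

    // Flattened semantics: each clause may be satisfied by a different author.
    static boolean flattenedMatch(List<Author> authors) {
        return authors.stream().anyMatch(a -> a.firstName.equalsIgnoreCase("isaac"))
                && authors.stream().anyMatch(a -> a.lastName.equalsIgnoreCase("asimov"));
    }

    public static void main(String[] args) {
        List<Author> mixed = Arrays.asList(
                new Author("Isaac", "Newton"), new Author("Janet", "Asimov"));
        System.out.println(nestedMatch(mixed));
        System.out.println(flattenedMatch(mixed));
    }
}
```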


within: match points within a circle, box, polygon

Matching points within a circle

GeoPoint center = GeoPoint.of( 53.970000, 32.150000 );
List<Author> hits = searchSession.search( Author.class )
        .where( f -> f.spatial().within().field( "placeOfBirth.coordinates" )
                .circle( center, 50, DistanceUnit.KILOMETERS ) )
        .fetchHits( 20 );
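
To get a feel for what "within 50 kilometers of the center" means, the plain-Java sketch below computes a great-circle distance with the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the backends may compute distances differently, so treat the numbers as approximations rather than as what the query will return.

```java
// Plain-Java approximation of "point within a circle"; not Hibernate Search code.
final class CircleSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    // Great-circle distance between two points given in decimal degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static boolean withinCircle(double centerLat, double centerLon,
            double pointLat, double pointLon, double radiusKm) {
        return haversineKm(centerLat, centerLon, pointLat, pointLon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Same center as in the example above.
        System.out.println(withinCircle(53.97, 32.15, 53.99, 32.13, 50));
        System.out.println(withinCircle(53.97, 32.15, 55.75, 37.62, 50));
    }
}
```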


Matching points within a box
GeoBoundingBox box = GeoBoundingBox.of(
        53.99, 32.13,
        53.95, 32.17
);
List<Author> hits = searchSession.search( Author.class )
        .where( f -> f.spatial().within().field( "placeOfBirth.coordinates" )
                .boundingBox( box ) )
        .fetchHits( 20 );


Matching points within a polygon
GeoPolygon polygon = GeoPolygon.of(
        GeoPoint.of( 53.976177, 32.138627 ),
        GeoPoint.of( 53.986177, 32.148627 ),
        GeoPoint.of( 53.979177, 32.168627 ),
        GeoPoint.of( 53.876177, 32.159627 ),
        GeoPoint.of( 53.956177, 32.155627 ),
        GeoPoint.of( 53.976177, 32.138627 )
);
List<Author> hits = searchSession.search( Author.class )
        .where( f -> f.spatial().within().field( "placeOfBirth.coordinates" )
                .polygon( polygon ) )
        .fetchHits( 20 );

Lucene: fromLuceneQuery


Matching a native org.apache.lucene.search.Query
List<Book> hits = searchSession.search( Book.class )
        .extension( LuceneExtension.get() )
        .where( f -> f.fromLuceneQuery(
                new RegexpQuery( new Term( "description", "neighbor|neighbour" ) )
        ) )
        .fetchHits( 20 );

Elasticsearch: fromJson


Matching a native Elasticsearch JSON query provided as a JsonObject
JsonObject jsonObject =
        /* ... */;
List<Book> hits = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() )
        .where( f -> f.fromJson( jsonObject ) )
        .fetchHits( 20 );
Matching a native Elasticsearch JSON query provided as a JSON-formatted string
List<Book> hits = searchSession.search( Book.class )
        .extension( ElasticsearchExtension.get() )
        .where( f -> f.fromJson( "{"
                        + "\"regexp\": {"
                                + "\"description\": \"neighbor|neighbour\""
                        + "}"
                + "}" ) )
        .fetchHits( 20 );
