2023-09-05

MarkLogic: Find documents where at least one parent element does not have a particular child

Using the built-in MarkLogic cts functions, I'd like to write a query that finds documents like the following: ones containing at least one <parent> element that does not have a <child1> element (parents/parent[2]). However, I do not want a document excluded from the search results just because some of its other <parent> elements do have a <child1> element (parents/parent[1] or parents/parent[3]).

<doc>
   <root>
      <parents>
         <parent>
            <child1>someValue</child1>
            <child2>someValue</child2>
         </parent>
         <parent>
            <child2>someValue</child2>
         </parent>
         <parent>
            <child1>someValue</child1>
            <child2>someValue</child2>
         </parent>
      </parents>
   </root>
</doc>

My thought process was that simply negating the following would return what I'm searching for:

Positive XQuery:

let $query :=
cts:element-query(
   xs:QName('parent')
   ,cts:element-query(
      xs:QName('child1')
      ,cts:true-query()
      )
   )
return cts:search(fn:doc(),$query)

or using the search module:

xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";

let $options := 
<options xmlns="http://marklogic.com/appservices/search">
        
  <extract-document-data selected="include">
    <extract-path xmlns:es="http://marklogic.com/entity-services">//root</extract-path>
  </extract-document-data>

  <additional-query>
      <cts:element-query>
        <cts:element>parent</cts:element>
          <cts:element-query>
            <cts:element>child1</cts:element>
            <cts:true-query>
            </cts:true-query>
        </cts:element-query>
      </cts:element-query>
  </additional-query>
  
</options>
return search:search("",$options)

Leading to my attempted query:

Negative XQuery:

let $query :=
cts:not-query(
   cts:element-query(
      xs:QName('parent')
      ,cts:element-query(
         xs:QName('child1')
         ,cts:true-query()
         )
      )
   )
return cts:search(fn:doc(),$query)

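On further evaluation, though, it's clear why the "Negative" query does not behave as I'd expect. The positive query returns documents in which the path //parent/child1 exists, so its negation means "return documents in which //parent/child1 does not exist anywhere". The sample document above does contain //parent/child1, so the negated query excludes it, even though its second <parent> has no <child1>.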
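To make the difference concrete, here is a minimal sketch in plain XPath (not cts, and not index-backed, so it only illustrates the semantics and is not a solution at scale). It runs against an inline copy of the sample document; the variable names are mine and purely illustrative.

```xquery
xquery version "1.0-ml";

(: Inline copy of the sample document from the question. :)
let $doc :=
  <doc>
    <root>
      <parents>
        <parent><child1>someValue</child1><child2>someValue</child2></parent>
        <parent><child2>someValue</child2></parent>
        <parent><child1>someValue</child1><child2>someValue</child2></parent>
      </parents>
    </root>
  </doc>

(: What the negated query expresses: no parent/child1 anywhere in the document. :)
let $no-child1-anywhere := fn:not(fn:exists($doc//parent/child1))

(: What I actually want: at least one parent that lacks a child1. :)
let $some-parent-lacks-child1 := fn:exists($doc//parent[fn:not(child1)])

return ($no-child1-anywhere, $some-parent-lacks-child1)
```

For the sample document the first condition is false and the second is true, which is exactly the mismatch described above.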

Nonetheless, I am perplexed about how to do this efficiently with MarkLogic's cts functions. This database harvests billions of documents, so vanilla XQuery/XPath will be too time-consuming. I'm really hoping to achieve this using the search module/API. Please keep in mind that, despite my search module example above, I'm hoping to run this query via an API call to the search REST endpoint, so I will not be able to enhance the server-side search with XQuery. That said, if it can only be achieved using pure XQuery, it is what it is, and I can just use the eval REST endpoint.

While looking for information, I did come across this similar post from 6 years ago: search-xmls-which-do-not-have-particular-element-in-marklogic

But a fair amount of time has passed since that was asked, it's tagged marklogic-8, and my question differs to a good degree, since I'm hoping to achieve this with the out-of-the-box search module/API.


