MarkLogic: Find documents where at least one parent element does not have a particular child
Using the built-in MarkLogic cts functions, I'd like to write a query that finds a document like the following, where there exists at least one <parent> element that does not have a <child1> element (parents/parent[2]). BUT I do not want to exclude documents from the search results just because they also have parents with the <child1> element (parents/parent[1] or parents/parent[3]).
<doc>
  <root>
    <parents>
      <parent>
        <child1>someValue</child1>
        <child2>someValue</child2>
      </parent>
      <parent>
        <child2>someValue</child2>
      </parent>
      <parent>
        <child1>someValue</child1>
        <child2>someValue</child2>
      </parent>
    </parents>
  </root>
</doc>
My thought process was that simply negating the following would return what I'm searching for:
Positive XQuery:
let $query :=
  cts:element-query(
    xs:QName('parent'),
    cts:element-query(
      xs:QName('child1'),
      cts:true-query()
    )
  )
return cts:search(fn:doc(), $query)
or using the search module:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <extract-document-data selected="include">
      <extract-path xmlns:es="http://marklogic.com/entity-services">//root</extract-path>
    </extract-document-data>
    <additional-query>
      <cts:element-query>
        <cts:element>parent</cts:element>
        <cts:element-query>
          <cts:element>child1</cts:element>
          <cts:true-query>
          </cts:true-query>
        </cts:element-query>
      </cts:element-query>
    </additional-query>
  </options>
return search:search("", $options)
Leading to my attempted query:
Negative XQuery:
let $query :=
  cts:not-query(
    cts:element-query(
      xs:QName('parent'),
      cts:element-query(
        xs:QName('child1'),
        cts:true-query()
      )
    )
  )
return cts:search(fn:doc(), $query)
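Upon further evaluation, though, it's clear why the "Negative" query does not behave as I'd expect. The positive query returns documents where the path //parent/child1 exists, so its negation is "return documents where //parent/child1 does not exist". That excludes every document containing even one <child1>, which is not the same as "return documents where at least one <parent> lacks a <child1>".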
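To make the distinction concrete, here is a minimal sketch in plain XPath against the sample document above. The $doc variable is a hypothetical inline copy of that document, used only for illustration. This shows the semantics I'm after; it is not the indexed cts solution I'm looking for, and I would not expect it to scale to my data.

```xquery
xquery version "1.0-ml";

(: $doc is a hypothetical inline copy of the sample document, for illustration only :)
let $doc :=
  <doc>
    <root>
      <parents>
        <parent><child1>someValue</child1><child2>someValue</child2></parent>
        <parent><child2>someValue</child2></parent>
        <parent><child1>someValue</child1><child2>someValue</child2></parent>
      </parents>
    </root>
  </doc>
return (
  (: "Positive" condition: some parent has a child1 :)
  fn:exists($doc//parent/child1),
  (: "Negative" condition: no parent anywhere has a child1 :)
  fn:empty($doc//parent/child1),
  (: What I actually want: some parent lacks a child1 :)
  fn:exists($doc//parent[fn:empty(child1)])
)
```

For the sample document the three expressions evaluate to true, false, and true: the document matches the positive condition, fails the negated one, and still has a parent without a <child1>, which is the case I want to match.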
Nonetheless, I am perplexed about how to do this efficiently using MarkLogic's cts functions. This database harvests billions of documents, so vanilla XQuery/XPath will be time consuming. I'm really hoping to achieve this using the search module/API. Please keep in mind (despite my search module example above) that I'm hoping to run this query via an API call to the search REST endpoint, so I will not be able to enhance the server-side search with XQuery. If it can only be achieved using pure XQuery, it is what it is, and I can just use the eval REST endpoint.
While looking for information, I did come across this similar post from 6 years ago: search-xmls-which-do-not-have-particular-element-in-marklogic
But a fair amount of time has passed since that was asked, it's tagged for marklogic-8, and my question differs to a good degree, since I'm hoping to achieve this with the out-of-the-box search module/API.