Split an RDF dataset into two random datasets

I have an RDF dataset with 100M triples generated by the WatDiv RDF benchmark. How can I split it into two smaller, randomly sampled datasets of about 50M triples each? It is fine if some triples appear in both datasets.

The only approach I can think of is to sort the triples by predicate, then shuffle the triples within each predicate group and randomly pick triples from each group.
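For comparison, here is a minimal sketch of a simpler streaming alternative, assuming the dataset is stored as N-Triples (one triple per line). Each triple is kept for each output independently with probability 0.5, so both outputs end up with roughly half of the triples, about a quarter of the triples land in both, and about a quarter land in neither. Because every line is sampled with the same probability, the share of each predicate is preserved in expectation without sorting first. The function name `split_ntriples` and its parameters are made up for this illustration.

```python
import random
from typing import TextIO, Tuple


def split_ntriples(
    source: TextIO,
    out_a: TextIO,
    out_b: TextIO,
    fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[int, int]:
    """Stream triples from `source` into two independently sampled outputs.

    Assumes one triple per line (N-Triples). Each line is written to
    `out_a` with probability `fraction` and, independently, to `out_b`
    with the same probability, so a line may end up in both, one, or
    neither output. Returns the number of lines written to each output.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    count_a = 0
    count_b = 0
    for line in source:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if rng.random() < fraction:
            out_a.write(line)
            count_a += 1
        if rng.random() < fraction:
            out_b.write(line)
            count_b += 1
    return count_a, count_b
```

The function only reads one line at a time, so it can be called with three open file handles (the input file and two output files) without loading the 100M triples into memory. If every triple must appear in at least one output, the two independent draws would have to be replaced by a different assignment rule.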


