2022-06-19

Split an RDF dataset into two random datasets

I have an RDF dataset with 100M triples generated by the WatDiv RDF benchmark. How can I split it into two smaller, randomly sampled datasets of about 50M triples each? It is acceptable for some triples to appear in both datasets.
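A minimal sketch of the simplest approach, assuming the WatDiv data is an N-Triples file with one triple per line: stream the file once and flip a seeded coin for each triple. The `overlap` parameter is an assumption introduced here (it is not something WatDiv defines). It is the probability that a triple is copied into both subsets. Otherwise the triple goes to exactly one subset, so each subset has an expected size of N·(1 + overlap)/2. With `overlap=0.05` on 100M triples that is roughly 52.5M triples per subset. The function and parameter names (`split_triples`, `emit_a`, `emit_b`) are hypothetical.

```python
import random
import sys
from typing import Callable, Iterable


def split_triples(lines: Iterable[str],
                  emit_a: Callable[[str], object],
                  emit_b: Callable[[str], object],
                  overlap: float = 0.05,
                  seed: int = 42) -> tuple[int, int]:
    """Randomly route each N-Triples line to subset A, subset B, or both.

    Hypothetical helper. With probability `overlap` a triple is copied into
    both subsets; otherwise a fair coin decides which single subset gets it.
    Expected size of each subset: N * (1 + overlap) / 2.
    """
    rng = random.Random(seed)  # fixed seed => reproducible split
    n_a = n_b = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        if rng.random() < overlap:
            in_a = in_b = True  # shared triple
        else:
            in_a = rng.random() < 0.5
            in_b = not in_a
        if in_a:
            emit_a(line)
            n_a += 1
        if in_b:
            emit_b(line)
            n_b += 1
    return n_a, n_b


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage (assumed file names): python split.py watdiv.nt part_a.nt part_b.nt
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as fa, \
         open(sys.argv[3], "w", encoding="utf-8") as fb:
        print(split_triples(src, fa.write, fb.write))
```

Because the file is processed one line at a time, memory use stays constant regardless of dataset size. Independent coin flips also preserve each predicate's share of the data in expectation, although not exactly.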

One approach I can think of is to sort the triples by predicate, then shuffle the triples of each predicate and randomly pick from them for each of the two datasets.
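The per-predicate idea above is a form of stratified sampling. Each predicate is split separately, so every predicate contributes the same number of triples to both subsets, up to rounding. Here is a hedged in-memory sketch. It assumes N-Triples input where the predicate is the second whitespace-separated token, which holds because IRIs and blank-node labels cannot contain whitespace. `stratified_split` and `predicate_of` are hypothetical helpers, and `overlap` is the fraction of each predicate's triples copied into both subsets.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def predicate_of(line: str) -> str:
    """Return the predicate token of an N-Triples line (hypothetical helper).

    Assumes subject and predicate are IRIs or blank nodes, which never contain
    whitespace, so splitting on the first two whitespace runs is safe.
    """
    return line.split(None, 2)[1]


def stratified_split(lines: Iterable[str],
                     overlap: float = 0.05,
                     seed: int = 42) -> Tuple[List[str], List[str]]:
    """Split triples so each predicate is divided evenly between A and B.

    Per predicate: shuffle its triples, copy the first `overlap` fraction into
    both subsets, then give half of the remainder to A and half to B.
    """
    rng = random.Random(seed)
    by_pred: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            by_pred[predicate_of(stripped)].append(line)

    part_a: List[str] = []
    part_b: List[str] = []
    for pred in sorted(by_pred):  # fixed order => reproducible for a given seed
        triples = by_pred[pred]
        rng.shuffle(triples)
        k = round(len(triples) * overlap)  # triples shared by both subsets
        shared, rest = triples[:k], triples[k:]
        half = len(rest) // 2
        # if the remainder is odd, a coin flip decides which side gets the extra triple
        if len(rest) % 2 and rng.random() < 0.5:
            half += 1
        part_a.extend(shared + rest[:half])
        part_b.extend(shared + rest[half:])
    return part_a, part_b
```

This version keeps all 100M lines in Python lists, which needs a lot of RAM. For the full dataset, a more practical variant is to first sort the file by predicate with an external (on-disk) sort. You can then apply the same shuffle-and-split logic to one predicate group at a time while streaming.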


