Split an RDF dataset into two random datasets
I have an RDF dataset with 100M triples from the WatDiv RDF benchmark. How can I split it into two smaller, randomly distributed datasets of about 50M triples each? Some triples may appear in both datasets.
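If the halves do not need an exact per-predicate balance, a single streaming pass may be enough: decide independently for each triple which output it goes to. The following is a minimal Python sketch, assuming the data is in N-Triples format (one triple per line). The function name `random_split` and the `overlap` parameter are hypothetical, chosen for illustration.

```python
import random
from typing import Iterable, Optional, TextIO, Tuple


def random_split(lines: Iterable[str], out_a: TextIO, out_b: TextIO,
                 overlap: float = 0.0,
                 seed: Optional[int] = None) -> Tuple[int, int]:
    """Stream N-Triples lines into two outputs in a single pass.

    With probability `overlap` a triple is written to both outputs; otherwise
    a fair coin decides which single output receives it.
    """
    rng = random.Random(seed)
    count_a = count_b = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and N-Triples comment lines
        if rng.random() < overlap:
            # Shared triple: goes into both datasets.
            out_a.write(line)
            out_b.write(line)
            count_a += 1
            count_b += 1
        elif rng.random() < 0.5:
            out_a.write(line)
            count_a += 1
        else:
            out_b.write(line)
            count_b += 1
    return count_a, count_b
```

With `overlap=p`, each output receives about N(1+p)/2 triples in expectation. For 100M triples, `overlap=0.0` gives roughly 50M per file and `overlap=0.05` roughly 52.5M. Because each triple is assigned independently, every predicate is also split about evenly in expectation, without any sorting. Memory use stays constant, so it can be called directly on open files, e.g. `random_split(src, a, b, overlap=0.05, seed=42)` where `src`, `a` and `b` are file objects for the input and the two (hypothetically named) output files.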
One approach I can think of is to sort the triples by predicate, then shuffle the triples within each predicate group and randomly pick triples from each group for each dataset.
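The predicate-based approach could be sketched like this, again assuming N-Triples input that has already been sorted by predicate (for example with GNU `sort -k2,2`; setting `LC_ALL=C` usually makes the sort faster). The names `stratified_split`, `predicate_of` and `overlap` are hypothetical. Each predicate group is shuffled in memory, so the largest group has to fit in RAM.

```python
import random
from itertools import groupby
from typing import Iterable, Iterator, Optional, TextIO, Tuple


def predicate_of(line: str) -> str:
    # N-Triples: "<subject> <predicate> <object> ." The predicate is the second
    # whitespace-separated token; split at most twice because literal objects
    # may contain spaces.
    return line.split(maxsplit=2)[1]


def _triples(lines: Iterable[str]) -> Iterator[str]:
    # Drop blank/comment lines and make sure every triple ends with a newline,
    # since shuffling can move the file's last line into the middle.
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            yield line if line.endswith("\n") else line + "\n"


def stratified_split(sorted_lines: Iterable[str], out_a: TextIO, out_b: TextIO,
                     overlap: float = 0.0,
                     seed: Optional[int] = None) -> Tuple[int, int]:
    """Split triples that are already sorted (grouped) by predicate.

    For each predicate: shuffle its triples, copy round(overlap * size) of them
    to both outputs, and divide the remainder in half. Only one predicate group
    is held in memory at a time.
    """
    rng = random.Random(seed)
    count_a = count_b = 0
    for _pred, group in groupby(_triples(sorted_lines), key=predicate_of):
        members = list(group)
        rng.shuffle(members)
        n_shared = round(overlap * len(members))
        shared, rest = members[:n_shared], members[n_shared:]
        # For odd-sized remainders, randomly choose which side gets the extra
        # triple so neither output is systematically larger.
        cut = (len(rest) + rng.randint(0, 1)) // 2
        out_a.writelines(shared + rest[:cut])
        out_b.writelines(shared + rest[cut:])
        count_a += n_shared + cut
        count_b += n_shared + len(rest) - cut
    return count_a, count_b
```

Compared with an independent per-triple coin flip, this guarantees that every predicate is split as evenly as possible (within one triple per group), at the cost of an external sort beforehand.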