2021-12-05

Find combination of string sentences - Combinations of frequency tables to target frequency table

I have a list of sentences, for example a list of 1000 sentences.

I would like to find a combination of sentences whose summed letter frequencies match, or come as close as possible to, a given frequency table:

[a:100, b:80, c:90, d:150, e:100, f:100, g:47, h:10 ..... z:900]

I thought about generating all possible combinations of the sentences list (so comb(1000, 1) through comb(1000, 1000)) and comparing every combination with the target frequency table so that the distance is minimal. That is: sum the frequency tables of the sentences in a candidate combination, compare this sum with the target, and record the combination with the smallest difference. There could be multiple combinations that match equally closely.
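To make the idea concrete, here is a minimal sketch of the exhaustive search I have in mind. The names `distance` and `best_combinations` are only illustrative, and the distance metric (sum of absolute per-letter differences) is an assumption on my part; any other metric could be swapped in. It tries every non-empty subset, so it is only usable for a handful of sentences.

```python
from collections import Counter
from itertools import combinations


def distance(total, target):
    # Assumed metric: sum of absolute per-letter differences.
    return sum(abs(total[k] - target[k]) for k in set(total) | set(target))


def best_combinations(tables, target):
    """Try every non-empty subset of tables.

    Returns (smallest distance, list of index tuples that reach it).
    """
    tables = [Counter(t) for t in tables]
    target = Counter(target)
    best, winners = None, []
    for r in range(1, len(tables) + 1):
        for combo in combinations(range(len(tables)), r):
            # Sum the frequency tables of the chosen sentences.
            total = Counter()
            for i in combo:
                total.update(tables[i])
            d = distance(total, target)
            if best is None or d < best:
                best, winners = d, [combo]   # new best, reset the ties
            elif d == best:
                winners.append(combo)        # another equally close match
    return best, winners
```

With n sentences this loops over 2^n - 1 subsets, which is exactly why it does not scale to a list of 1000.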

The problem is that calculating all combinations takes far too long to complete, apparently a couple of days. Is there a known algorithm that could solve this efficiently, ideally in a couple of minutes at most?

Input sentences:

More RVs were seen in the storage lot than at the campground.

She did her best to help him. There have been days when I wished to be separated from my body, but today wasn’t one of those days.

The swirled lollipop had issues with the pop rock candy.

The two walked down the slot canyon oblivious to the sound of thunder in the distance.

Acres of almond trees lined the interstate highway which complimented the crazy driving nuts.

He is no James Bond; his name is Roger Moore.

The tumbleweed refused to tumble but was more than willing to prance.

She was disgusted he couldn’t tell the difference between lemonade and limeade.

He didn’t want to go to the dentist, yet he went anyway.

Find the combination of sentences that most closely matches the following frequency table:

[a:5, b:5, c:5, d:5, e:5, f:5, g:5, h:5 ..... z:5]

Example:

The frequency table of the sixth sentence

He is no James Bond; his name is Roger Moore.

is [a:2, b:1, d:1, e:5, g:1, h:2, i:3, j:1, m:3, n:3, o:5, r:3, s:4]

The frequency table treats uppercase and lowercase letters as equal and excludes special characters.
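For reference, a frequency table like the one above could be computed roughly as follows. This is a sketch, not necessarily the exact code I run: the function name `frequency_table` is illustrative, and it assumes that only the ASCII letters a to z count, so digits, punctuation, spaces, and accented characters are all dropped.

```python
from collections import Counter


def frequency_table(sentence):
    # Lowercase first so that 'H' and 'h' are counted together,
    # then keep only the letters a-z (assumption: ASCII letters only).
    return Counter(ch for ch in sentence.lower() if "a" <= ch <= "z")


table = frequency_table("He is no James Bond; his name is Roger Moore.")
print(", ".join(f"{letter}:{count}" for letter, count in sorted(table.items())))
# a:2, b:1, d:1, e:5, g:1, h:2, i:3, j:1, m:3, n:3, o:5, r:3, s:4
```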



from Recent Questions - Stack Overflow https://ift.tt/3xWWDja
