2022-04-20

Splitting strings by a list of separators, irrespective of order

I have a string, text, and a list, names.

  • I want to split text every time an element of names occurs.

text = 'Monika goes shopping. Then she rides bike. Mike likes Pizza. Monika hates me.'

names = ['Mike', 'Monika']

desired output:

output = [['Monika', ' goes shopping. Then she rides bike.'], ['Mike', ' likes Pizza.'], ['Monika', ' hates me.']]

FAQ

  • The order of the separators within names is independent of their occurrence in text.
  • Separators within names are unique but can occur multiple times throughout text. Therefore the output can have more lists than names has strings.
  • text will never have the same names element occurring twice consecutively.
  • Ultimately, I want the output to be a list of lists in which each text slice is paired with the separator it was split by. The order of the lists doesn't matter.

re.split() won't let me use a list as the separator argument. Can I re.compile() my separator list?
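
One way this could work, as a minimal sketch rather than a definitive answer: join the names into a single alternation pattern and compile that. re.escape() keeps any special characters in a name literal, and wrapping the alternation in a capturing group makes re.split() return the separators along with the slices. The variable names pattern and parts are only illustrative, and the rstrip() call is an assumption made to match the desired output shown above, which has no trailing spaces.

```python
import re

text = 'Monika goes shopping. Then she rides bike. Mike likes Pizza. Monika hates me.'
names = ['Mike', 'Monika']

# Escape each name so it is matched literally, then join with "|" (alternation).
# Longer names go first so a name that is a prefix of another cannot shadow it.
ordered = sorted(names, key=len, reverse=True)
pattern = re.compile('(' + '|'.join(re.escape(n) for n in ordered) + ')')

# Because the pattern has a capturing group, re.split keeps the matched names:
# parts[0] is whatever precedes the first name, then names and slices alternate.
parts = pattern.split(text)

# Pair every name with the slice that follows it.
# rstrip() is an assumption: it drops the space left before the next name.
output = [[name, chunk.rstrip()] for name, chunk in zip(parts[1::2], parts[2::2])]

print(output)
# [['Monika', ' goes shopping. Then she rides bike.'], ['Mike', ' likes Pizza.'], ['Monika', ' hates me.']]
```

Any text before the first name ends up in parts[0] and is dropped by this sketch; keep it separately if it matters.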


help:

I think somebody has already had a similar problem here: https://stackoverflow.com/a/4697047/14648054

def split(txt, seps):
    default_sep = seps[0]
    for sep in seps[1:]: # skip seps[0] as the default separator
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]

and here: https://stackoverflow.com/a/2911664/14648054

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

print(my_split('1111  2222 3333;4444,5555;6666', [' ', ';', ',']))
['1111', '', '2222', '3333', '4444', '5555', '6666']
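
Both helpers above throw the separator away, so they cannot say which name a slice belongs to. A regex-free variant that keeps that information might look like the sketch below. The function name split_keep_seps is hypothetical, and the code assumes that separators never overlap each other in the text (for example, one name being a substring of another).

```python
def split_keep_seps(txt, seps):
    """Pair each separator occurrence with the text that follows it."""
    # Record (position, separator) for every occurrence of every separator.
    hits = []
    for sep in seps:
        start = txt.find(sep)
        while start != -1:
            hits.append((start, sep))
            start = txt.find(sep, start + len(sep))

    # Sort by position so the slices come out in reading order.
    hits.sort()

    result = []
    for i, (pos, sep) in enumerate(hits):
        # A slice runs from the end of this separator to the start of the next one
        # (or to the end of the text for the last separator).
        end = hits[i + 1][0] if i + 1 < len(hits) else len(txt)
        result.append([sep, txt[pos + len(sep):end]])
    return result


text = 'Monika goes shopping. Then she rides bike. Mike likes Pizza. Monika hates me.'
names = ['Mike', 'Monika']
print(split_keep_seps(text, names))
```

Unlike the desired output in the question, the slices here keep the space that precedes the next name; strip them if that matters.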

