How to get the index of a certain sentence in Python using NLTK?
I need to find the sentences in a text that contain certain words and output those sentences together with their indexes (that is, each sentence's number within the text).
Using the NLTK library, I split my text into sentences and output the ones I need:
Code:
from nltk.tokenize import sent_tokenize, word_tokenize
text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
search_words = ["Ipsum", "Aldus"]
matches = []
sentances = sent_tokenize(text)
for word in search_words:
    for sentance in sentances:
        if word in sentance:
            matches.append(sentance)
print(matches)
Using len I can also get the total number of sentences, but I can't get the matching sentences to output their indexes when I try to use .index:
index = sentances.index(matches)
print(index)
Does anybody know how to resolve this?
I've tried to get the indexes of the matching sentences.
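For reference, here is a minimal sketch of one possible way to do this. list.index looks up a single element, so passing it the whole matches list raises ValueError unless that exact list happens to be an element. One option is to record the position during the loop with enumerate. To keep the sketch runnable without downloading NLTK data, the sentences list below is a hand-written stand-in for the output of sent_tokenize(text), and find_matches is a hypothetical helper name, not part of NLTK.

```python
# Stand-in for the list returned by sent_tokenize(text) (assumption:
# shortened sample sentences, not the original Lorem Ipsum text).
sentences = [
    "Lorem Ipsum is simply dummy text.",
    "It has survived five centuries.",
    "It is used by Aldus PageMaker and Lorem Ipsum fans.",
]
search_words = ["Ipsum", "Aldus"]


def find_matches(sentences, search_words):
    """Return (index, sentence) pairs for sentences containing any search word.

    Hypothetical helper for illustration; uses plain substring matching,
    the same test as `word in sentence` in the question.
    """
    return [
        (i, sentence)
        for i, sentence in enumerate(sentences)
        if any(word in sentence for word in search_words)
    ]


matches = find_matches(sentences, search_words)
for i, sentence in matches:
    # i is the 0-based position of the sentence in the tokenized text
    print(i, sentence)
```

The indexes here are 0-based; enumerate(sentences, start=1) would give 1-based sentence numbers instead. Unlike the nested loop in the question, this version reports each sentence only once even if it contains several of the search words.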