How can I get a #text child node as an element with BeautifulSoup? - Python
I want to get every part of the inner text of a parsed <p> tag as a soup element with BeautifulSoup in Python. I'm currently migrating a parser from PHP to Python. Here is the PHP code and my attempt at recreating the same functionality with BeautifulSoup:
PHP (working)
foreach ($pTagNode->childNodes as $innerNode) {
    if ($innerNode->nodeName == "#text") {
        # Edit and paraphrase the text part of the <p> tag...
    }
    else if ($innerNode->nodeName == "a") {
        # Do something with the "a" tag, like removing a blacklisted link or changing its text...
    }
}
Python (not working)
node = soup.select("p")[0]
# <a> tag
for pnode in node.select("a"):
    print("link found: " + pnode.string)
# <#text> tag
for pnode in node.select("#text"):
    print("text found: " + pnode.string)  # This message is never printed :(
HTML structure I want to parse:
...
<body>
<p>Some text 1 and this is <a href="">the link</a></p>
<p>Some text 2 and this is <a href="">the another link</a></p>
<p>Some text 3 and this is <a href="">the link 3</a></p>
</body>
From this HTML I want to get: [Some text 1 and this is ][the link]
I am looking for a way to get #text as an element. For example, PHP has DOMXPath, which allows you to do this. Does anyone have any ideas? If anything else is needed, I can add it to the question.
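One possible approach, offered as a minimal sketch rather than a definitive answer: in a CSS selector, "#text" is an id selector (it matches elements with id="text"), so select("#text") finds nothing here. BeautifulSoup instead exposes text nodes as NavigableString objects among a tag's direct children, which is roughly what the PHP childNodes loop walks over. The helper name split_children, the sample HTML with closed <a> tags, and the choice of the built-in "html.parser" are assumptions made for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Sample markup modelled on the question; the closing </a> tags are an assumption.
html = """
<body>
<p>Some text 1 and this is <a href="">the link</a></p>
<p>Some text 2 and this is <a href="">the another link</a></p>
</body>
"""


def split_children(p_tag):
    """Return (kind, text) pairs for the direct children of p_tag.

    split_children is a hypothetical helper, not part of BeautifulSoup.
    """
    parts = []
    for child in p_tag.children:
        if isinstance(child, NavigableString):
            # Plain text node: the counterpart of nodeName == "#text" in PHP.
            parts.append(("text", str(child)))
        elif isinstance(child, Tag) and child.name == "a":
            # Element node: the counterpart of nodeName == "a" in PHP.
            parts.append(("a", child.get_text()))
    return parts


soup = BeautifulSoup(html, "html.parser")
first_p = soup.select("p")[0]
result = split_children(first_p)
print(result)

# "#text" is treated as an id selector, so nothing matches in this document.
no_match = soup.select("#text")
print(no_match)
```

To edit a text part in place, child.replace_with("new text") should work on a NavigableString; iterating over list(p_tag.children) instead of p_tag.children is a safer choice when the tree is modified inside the loop.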
from Recent Questions - Stack Overflow https://ift.tt/39HZLEa