How can I get a #text child node as an element with BeautifulSoup? - Python

I want to get each part of the inner text of a parsed <p> tag as a soup element with BeautifulSoup in Python. I'm currently migrating a parser from PHP to Python. Here is the PHP code and my attempt at recreating the same functionality with BeautifulSoup:

PHP (working)

foreach($pTagNode->childNodes as $innerNode){
    if($innerNode->nodeName == "#text"){
        # Edit and paraphrase the text part of the <p> tag...
    }
    else if($innerNode->nodeName == "a"){
        # Do something with the "a" tag, like removing a blacklisted link or changing its text...
    }
}

Python (not working)

node = soup.select("p")[0]

# <a> tags
for pnode in node.select("a"):
    print("link found: " + pnode.string)
# "#text" nodes
for pnode in node.select("#text"):
    print("text found: " + pnode.string)  # This message is never shown :(
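
A likely reason the second loop prints nothing: select() takes CSS selectors, and in CSS "#text" means "the element whose id is text", not a DOM text node. The sketch below illustrates this on a made-up sample string (the `sample` markup and the `<span id="text">` element are assumptions for illustration, not part of the page being parsed), and shows find_all(string=True, recursive=False) as one possible way to reach the direct text children instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, only to show how "#text" is interpreted.
sample = '<p>plain text <a href="">a link</a> <span id="text">span with id</span></p>'
soup = BeautifulSoup(sample, "html.parser")
p = soup.select("p")[0]

# "#text" is a CSS id selector: it matches the <span id="text"> element,
# not the text nodes inside <p>.
matches = p.select("#text")
print([m.name for m in matches])

# Direct text children of <p> can be requested with string=True;
# recursive=False keeps the search from descending into <a> or <span>.
text_nodes = p.find_all(string=True, recursive=False)
print([str(t) for t in text_nodes])
```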

HTML structure I want to parse:

...
<body>
    <p>Some text 1 and this is <a href="">the link</a></p>
    <p>Some text 2 and this is <a href="">the another link</a></p>
    <p>Some text 3 and this is <a href="">the link 3</a></p>
</body>

From this HTML I want to get: [Some text 1 and this is ][the link]

I am looking for a way to get #text as an element. In PHP, for example, DOMXPath allows you to do this. Does anyone have any ideas? If anything else is needed, I can add it to the question.
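
For reference, here is a minimal sketch of what the PHP loop might look like in BeautifulSoup, assuming the built-in "html.parser" backend. It walks the direct children of a <p> tag and separates text nodes (NavigableString) from <a> tags (Tag). The helper name split_paragraph and the shortened html string are made up for illustration; this is one possible approach, not necessarily the only or best one.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened copy of the structure from the question (assumed well-formed).
html = """
<body>
    <p>Some text 1 and this is <a href="">the link</a></p>
    <p>Some text 2 and this is <a href="">the another link</a></p>
</body>
"""


def split_paragraph(p_tag):
    """Return (kind, node) pairs for the direct children of a <p> tag."""
    parts = []
    for child in p_tag.children:
        if isinstance(child, NavigableString):
            # Rough counterpart of nodeName == "#text" in the PHP version.
            # Note: comments are NavigableString subclasses as well.
            parts.append(("text", child))
        elif isinstance(child, Tag) and child.name == "a":
            parts.append(("a", child))
    return parts


soup = BeautifulSoup(html, "html.parser")
first_p = soup.select("p")[0]
parts = split_paragraph(first_p)

for kind, node in parts:
    if kind == "text":
        print("text found: " + str(node))
    else:
        print("link found: " + node.get_text())
```

Because each text part is a real node in the tree, it should be possible to edit it in place with node.replace_with("new text"), and to drop an unwanted link with node.decompose().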



from Recent Questions - Stack Overflow https://ift.tt/39HZLEa