How can I extract text from a flex container?
I'm a beginner in Java and I'm trying to extract some text from a website. However, the text sits between an opening and a closing tag, and when I use getByXPath I get everything except the text I need.
This is the layout of the website I'm scraping from: Website HTML Layout
The two highlighted portions are the pieces of text I actually need.
And this is the code I've got so far:
List<HtmlElement> name = (List<HtmlElement>) page.getByXPath("//ul/li/a[@class='title']");
List<HtmlElement> subText = (List<HtmlElement>) page.getByXPath("//ul/li/p[@data-af=' (Secret)']");
However, this results in two lists of element objects rather than text:
name - which has HtmlAnchor objects within
[HtmlAnchor[<a class="title" data-af="10" href="/a180775/daddys-home-achievement">], HtmlAnchor[<a class="title" data-af="11" href="/a180776/protector-achievement">], HtmlAnchor[<a class="title" data-af="12" href="/a180777/sinclairs-solution-achievement">]]
subText - which has HtmlParagraph objects within.
[HtmlParagraph[<p data-af=" (Secret)">], HtmlParagraph[<p data-af=" (Secret)">], HtmlParagraph[<p data-af=" (Secret)">], HtmlParagraph[<p data-af=" (Secret)">]]
URL if you want to take a look at the whole website: https://truesteamachievements.com/game/BioShock-2-Remastered/achievements
I need the lists to look something like this:
["Daddy's Home", "Protector", "Sinclair's Solution"]
["Found your way back to the ruins of Rapture.", "Defended yourself against Lamb's assault in the train station.", "Joined forces with Sinclair in Ryan Amusements."]
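To make the goal concrete, here is a minimal sketch of the kind of extraction I'm after. It uses only the JDK's built-in XPath support, not HtmlUnit. The SAMPLE markup is a simplified assumption about the page structure and not the real HTML, and the class and helper names (TextExtractionSketch, texts) are made up for illustration. The idea is to select the same elements as above and then read each node's text content instead of printing the element itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextExtractionSketch {

    // Hypothetical, simplified markup; the real page layout may differ.
    static final String SAMPLE =
          "<ul>"
        + "<li><a class=\"title\" data-af=\"10\" href=\"/a180775/daddys-home-achievement\">Daddy's Home</a>"
        + "<p data-af=\" (Secret)\">Found your way back to the ruins of Rapture.</p></li>"
        + "<li><a class=\"title\" data-af=\"11\" href=\"/a180776/protector-achievement\">Protector</a>"
        + "<p data-af=\" (Secret)\">Defended yourself against Lamb's assault in the train station.</p></li>"
        + "</ul>";

    // Evaluates the XPath expression and returns the text content of every matching node.
    static List<String> texts(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() returns the text between the tags, not the tag itself.
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(SAMPLE, "//ul/li/a[@class='title']"));
        System.out.println(texts(SAMPLE, "//ul/li/p[@data-af=' (Secret)']"));
    }
}
```

HtmlUnit's element classes implement org.w3c.dom.Node, so getTextContent() should be callable on each HtmlElement in my lists as well. I have not confirmed that it returns the right text on the real page.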
This is the HTML library I'm using (HtmlUnit): https://htmlunit.sourceforge.io/apidocs/overview-summary.html
Appreciate any help.