pd.read_html(url) - awkward table design

By Ritesh Sahu - November 30, 2022

Table headings through the table are being converted into single column headings.

url = "https://www.environment.nsw.gov.au/topics/animals-and-plants/threatened-species/programs-legislation-and-framework/nsw-koala-strategy/local-government-resources-for-koala-conservation/north-coast-koala-management-area#:~:text=The%20North%20Coast%20Koala%20Management,Valley%2C%20Clarence%20Valley%20and%20Taree."
dfs = pd.read_html(url)
df = dfs[0]
df.head()

output

Be great if I could have the High preferred use as a column that assigns to the correct species. Tried reset_index() this did not work. I'm lost for searching can't find anything similar.

Response to @Master Oogway and thanks @DYZ for the edits.

The amendment suggested removes the error, but does not interact with the second table. Take White Box, Eucalyptus albens. Occurs in second table and not first. If I export dftable:

If I write htmltable to .txt when using find_all:

I have never done this before and appreciate that this is annoying. Thanks for the help so far.

It appears that find_all is gathering all the table data. But the creating of dftable is limiting to the first "table-striped".

Search This Blog

Theprogrammersfirst | A technical portal.

pd.read_html(url) - awkward table design

Comments

Post a Comment

Popular posts from this blog

Spring Elasticsearch Operations

Hibernate Search - Elasticsearch with JSON manipulation

Today Walkin 14th-Sept