2022-11-30

pd.read_html(url) - awkward table design

Table headings through the table are being converted into single column headings.

url = "https://www.environment.nsw.gov.au/topics/animals-and-plants/threatened-species/programs-legislation-and-framework/nsw-koala-strategy/local-government-resources-for-koala-conservation/north-coast-koala-management-area#:~:text=The%20North%20Coast%20Koala%20Management,Valley%2C%20Clarence%20Valley%20and%20Taree."
dfs = pd.read_html(url)
df = dfs[0]
df.head()

output

Be great if I could have the High preferred use as a column that assigns to the correct species. Tried reset_index() this did not work. I'm lost for searching can't find anything similar.

Response to @Master Oogway and thanks @DYZ for the edits.

Screen shot inspect element - multiple class ="table-striped"

The amendment suggested removes the error, but does not interact with the second table. Take White Box, Eucalyptus albens. Occurs in second table and not first. If I export dftable: Filter no White Box

If I write htmltable to .txt when using find_all:

enter image description here

I have never done this before and appreciate that this is annoying. Thanks for the help so far.

It appears that find_all is gathering all the table data. But the creating of dftable is limiting to the first "table-striped".



No comments:

Post a Comment