Drop certain rows based on quantity of rows with specific values

I am newer data science and am working on a project to analyze sports statistics. I have a dataset of hockey statistics for a group of players over multiple seasons. Players have anywhere between 1 row to 12 rows representing their season statistics over however many seasons they've played.

Example:

    Player  Season  Pos GP  G   A   P   +/- PIM P/GP    ... PPG PPP SHG SHP OTG GWG S   S%  TOI/GP  FOW%
0   Nathan MacKinnon    2022    1   65  32  56  88  22  42  1.35    ... 7   27  0   0   1   5   299 10.7    21.07   45.4
1   Nathan MacKinnon    2021    1   48  20  45  65  22  37  1.35    ... 8   25  0   0   0   2   206 9.7 20.37   48.5
2   Nathan MacKinnon    2020    1   69  35  58  93  13  12  1.35    ... 12  31  0   0   2   4   318 11.0    21.22   43.1
3   Nathan MacKinnon    2019    1   82  41  58  99  20  34  1.21    ... 12  37  0   0   1   6   365 11.2    22.08   43.7
4   Nathan MacKinnon    2018    1   74  39  58  97  11  55  1.31    ... 12  32  0   1   3   12  284 13.7    19.90   41.9
5   Nathan MacKinnon    2017    1   82  16  37  53  -14 16  0.65    ... 2   14  2   2   2   4   251 6.4 19.95   50.6
6   Nathan MacKinnon    2016    1   72  21  31  52  -4  20  0.72    ... 7   16  0   1   0   6   245 8.6 18.87   48.4
7   Nathan MacKinnon    2015    1   64  14  24  38  -7  34  0.59    ... 3   7   0   0   0   2   192 7.3 17.05   47.0
8   Nathan MacKinnon    2014    1   82  24  39  63  20  26  0.77    ... 8   17  0   0   0   5   241 10.0    17.35   42.9
9   J.T. Compher        2022    2   70  18  15  33  6   25  0.47    ... 4   6   1   1   0   0   102 17.7    16.32   51.4
10  J.T. Compher        2021    2   48  10  8   18  10  19  0.38    ... 1   2   0   0   0   2   47  21.3    14.22   45.9
11  J.T. Compher        2020    2   67  11  20  31  9   18  0.46    ... 1   5   0   3   1   3   106 10.4    16.75   47.7
12  J.T. Compher        2019    2   66  16  16  32  -8  31  0.48    ... 4   9   3   3   0   3   118 13.6    17.48   49.2
13  J.T. Compher        2018    2   69  13  10  23  -29 20  0.33    ... 4   7   2   2   2   3   131 9.9 16.00   45.1
14  J.T. Compher        2017    2   21  3   2   5   0   4   0.24    ... 1   1   0   0   0   1   30  10.0    14.93   47.6
15  Darren Helm         2022    1   68  7   8   15  -5  14  0.22    ... 0   0   1   2   0   1   93  7.5 10.55   44.2
16  Darren Helm         2021    1   47  3   5   8   -3  10  0.17    ... 0   0   0   0   0   0   83  3.6 14.68   66.7
17  Darren Helm         2020    1   68  9   7   16  -6  37  0.24    ... 0   0   1   2   0   0   102 8.8 13.73   53.6
18  Darren Helm         2019    1   61  7   10  17  -11 20  0.28    ... 0   0   1   4   0   0   107 6.5 14.57   44.4
19  Darren Helm         2018    1   75  13  18  31  3   39  0.41    ... 0   0   2   4   0   0   141 9.2 15.57   44.1

[sample of my dataset][1] [1]: https://ift.tt/M4Sz5la

If any player has played more than 6 seasons, I want to drop the row corresponding to Season 2021. This is because COVID drastically shortened the season and it is causing issues as I work with averages.

As you can see from the screenshot, Nathan MacKinnon has played 9 seasons. Across those 9 seasons, except for 2021, he plays in no fewer than 64 games. Due to the shortened season of 2021, he only got 48 games. Removing Season 2021 results in an Average Games Played of 73.75. Keeping Season 2021 in the data, the Average Games Played becomes 70.89.

While not drastic, it compounds into the other metrics as well.

I have been trying this for a little while now, but as I mentioned, I am new to this world and am struggling to figure out how to accomplish this.

I don't want to just completely drop ALL rows for 2021 across all players, though, as some players only have 1-5 years' worth of data and for those players, I need to use as much data as I can and remove 1 row from a player with only 2 seasons would also negatively skew averages.

I would really appreciate some assistance from anyone more experienced than me!



Comments

Popular posts from this blog

Today Walkin 14th-Sept

Spring Elasticsearch Operations

Hibernate Search - Elasticsearch with JSON manipulation