2021-05-30

How to work with continuous overdispersed data

I am having trouble analyzing my data. I have cross-sectional data on user activity and spending in a mobile game. There are paying and non-paying users, and I need to find out which independent variables (such as time spent in the game, session length, clan membership (binary), number of chat messages, etc.) explain higher spending among paying users. The data were pre-transformed, so instead of spending in dollars I can only analyze variables scaled from 0 to 1 (1 is the maximum spending in the sample; other values were divided by the maximum value). I filtered the dataset down to paying users and got 12k observations. However, the data are overdispersed, and the distribution generally looks like a negative exponential.

I tried applying a log transformation to my dependent variable (spending) and to the independent variables that were skewed (time spent, session length and chat messages) and ran a linear regression, but R² was only 0.09, which is quite low. I also tried multiplying and rounding my data in order to use negative binomial regression. Is that legitimate, given that the data were originally counts? What else could I try?
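For reference, here is a minimal sketch of the log-transformed linear regression described above. It runs on synthetic data, not on my real dataset: the column names (`time_spent`, `session_length`, `chat_messages`, `in_clan`), the distributions and the helper `fit_log_ols` are all assumptions made up for illustration. It also assumes `log1p` for the predictors, because chat messages can be zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12_000

# Synthetic stand-ins for the real columns (all values are made up).
time_spent = rng.lognormal(mean=3.0, sigma=1.0, size=n)
session_length = rng.lognormal(mean=2.0, sigma=0.8, size=n)
chat_messages = rng.poisson(lam=5.0, size=n).astype(float)
in_clan = rng.integers(0, 2, size=n).astype(float)

# Skewed, strictly positive spending, then rescaled to (0, 1] by the sample max.
raw_spend = np.exp(
    0.3 * np.log(time_spent) + 0.2 * in_clan + rng.normal(0.0, 1.5, size=n)
)
spend = raw_spend / raw_spend.max()


def fit_log_ols(y, skewed, other):
    """OLS of log(y) on log1p(skewed predictors) plus untransformed predictors."""
    columns = [np.ones(len(y))] + [np.log1p(x) for x in skewed] + list(other)
    X = np.column_stack(columns)
    log_y = np.log(y)  # requires y > 0, which holds for paying users
    beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    resid = log_y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((log_y - log_y.mean()) ** 2)
    return beta, r2


skewed = [time_spent, session_length, chat_messages]
beta, r2 = fit_log_ols(spend, skewed, [in_clan])
beta_raw, r2_raw = fit_log_ols(raw_spend, skewed, [in_clan])

print("coefficients (intercept first):", np.round(beta, 3))
print("R^2 on the log scale:", round(float(r2), 3))
```

Dividing spending by its maximum only shifts log(spending) by a constant, so the slopes and R² are the same for `spend` and `raw_spend`; only the intercept changes.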

Thank you in advance!

P.S. I also tried dividing users into groups by a fixed percentage of overall spending (10%, which gave groups of 6000 users, 1000 users, 500 users and so on). This seemed reasonable, because each successive group had higher time spent in the game, session length, number of messages, etc.
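A rough sketch of that grouping, under the assumption that users are sorted from lowest to highest spending and cut wherever the cumulative share of total spending crosses a multiple of the chosen percentage. The function name `spend_share_groups` and the toy numbers are hypothetical.

```python
def spend_share_groups(spend, share=0.10):
    """Sort users by spend (ascending) and cut them into groups that each
    account for roughly `share` of total spending. Returns lists of indices."""
    order = sorted(range(len(spend)), key=lambda i: spend[i])
    total = sum(spend)
    n_groups = round(1 / share)
    groups = [[] for _ in range(n_groups)]
    running = 0.0
    for i in order:
        # Assign by the cumulative share reached *before* adding this user.
        g = min(int(running / total / share), n_groups - 1)
        groups[g].append(i)
        running += spend[i]
    return groups


# Toy example: ten users, total spend 32, four groups of 25% each.
toy_spend = [1, 1, 1, 1, 2, 2, 4, 4, 8, 8]
toy_groups = spend_share_groups(toy_spend, share=0.25)
sizes = [len(g) for g in toy_groups]
print(sizes)  # [6, 2, 1, 1]
```

Group sizes shrink quickly because a few heavy spenders account for the same share of spending as many light spenders, which matches the 6000 / 1000 / 500 pattern.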



from Recent Questions - Stack Overflow https://ift.tt/3p19ZpQ