Logistic model validation
Hi, you mention in the GLM Validation section, "Recall a logistic regression model produces an estimate of the probability that an event occurs. Quantile plots, Lorenz curves, and the Gini index can be computed in this situation by ordering the predicted probabilities in increasing order." Do you have an example of creating a Lorenz curve for a logistic model? For quantile plots, I can see averaging the zeroes and ones (yes/no) within each bucket and comparing that to the average predicted probability within the bucket, but I can't see how you would create a Lorenz curve when the data are zeroes and ones plotted against increasing probabilities. Would you need to group them into buckets? I would love an example if you have one!
Comments
Apologies: I found the answer in the text: "a Lorenz curve can be created by sorting the records by predicted probability and graphing cumulative risks against cumulative occurrences of the event, and a Gini index can be computed from the resulting graph by taking the area between the curve and the line of equality." I would still love an example if you have one!
You're fine. We'll work on clarifying in the wiki how to do this type of problem.
We've added a new problem to the GLM PowerPack file which we think is one way the CAS could test this. It's also available here in PDF format: https://battleacts8.ca/8/pdf/GLM_LogisticLorenz.pdf
Please let us know what you think of this problem. Is there enough detail in the Excel file? What would you prefer to see in the accompanying PDF? The PDF will eventually end up in the OneStop file for quick reference needs.
Thank you! Yes, that was clear. I ended up coding true = 1 and false = 0, sorting by predicted probability, and then accumulating the % of losses, with each record contributing 0 or 1/23 (23 total losses). I ended up with the same numbers and a graph that looked reasonable.
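For anyone who wants to try the same calculation outside of Excel, here is a minimal Python sketch of the approach described above. It is not the PowerPack solution: the function names and the five-record data set are hypothetical, made up purely for illustration. It also assumes the common convention that the Gini index is twice the area between the Lorenz curve and the line of equality; drop the factor of 2 if your source defines it as the area itself.

```python
def lorenz_curve(predicted, actual):
    """Return (cum_risks, cum_events), both starting at 0 and ending at 1.

    predicted: predicted probabilities from the logistic model
    actual:    observed outcomes coded as 1 (event) or 0 (no event)
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total_events = sum(actual)
    if total_events == 0:
        raise ValueError("need at least one event to build the curve")

    # Sort record indices by predicted probability, lowest first.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)

    cum_risks, cum_events = [0.0], [0.0]
    running_events = 0
    for rank, i in enumerate(order, start=1):
        running_events += actual[i]                        # adds 0 or 1
        cum_risks.append(rank / n)                         # x: share of records so far
        cum_events.append(running_events / total_events)   # y: share of events so far
    return cum_risks, cum_events


def gini_index(cum_risks, cum_events):
    """Twice the area between the line of equality and the Lorenz curve (assumed convention)."""
    # Trapezoid rule for the area under the Lorenz curve.
    area_under = sum(
        (cum_risks[k] - cum_risks[k - 1]) * (cum_events[k] + cum_events[k - 1]) / 2
        for k in range(1, len(cum_risks))
    )
    return 2 * (0.5 - area_under)


# Hypothetical data: five records, three of which had the event.
predicted = [0.1, 0.4, 0.2, 0.8, 0.6]
actual = [0, 1, 0, 1, 1]

xs, ys = lorenz_curve(predicted, actual)
gini = gini_index(xs, ys)
```

No bucketing is needed: each record moves the curve one step to the right, and moves it up by 1/(total events) only when the event occurred. Plotting `ys` against `xs` gives the Lorenz curve.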
Thank you!