Goldburd.Basics practice questions

edited May 2024 in GLM.Basics

In the first example question, why don't we take the ln of age, since it's a continuous variable and there's a log link? Is it because the question doesn't specifically say to, like it does in 2016 #6a? Can we do it either way? Thanks in advance!

Comments

  • In general we want continuous predictors to be on the same scale as the link function, so logging would make sense under a log link. However (p12 of the GLM text), logging a variable assumes a linear relationship between the log of that variable and the log of the mean of the response, i.e. ln(μ) = β0 + β1·ln(age). That makes the mean proportional to age^β1, which can only move in one direction as age increases. Driver age generally doesn't follow that pattern, as we expect drivers at either end of the age spectrum to be worse risks. This is why we often tackle driver age with hinge functions, categorizations, or piecewise linear functions.

    That said, you can definitely log driver age and then go ahead and fit hinge functions, categorizations, etc. The main reason not to is that you're potentially making life more complicated, as you're superimposing an exponential structure on the underlying curves (see the AOI example in the section on interactions for more on this).

    Ultimately, it's best to state your assumptions. Something like "I'm treating driver age as a continuous variable, and since we're not told the coefficient is for the log of driver age, I'm not logging it." Or state that the GLM text says ln(driver age) isn't expected to have a linear relationship with the response, so logging isn't appropriate. Clearly stating your assumptions gives you the best chance of convincing the grader you've carefully thought about the problem.
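
To make the contrast in the comment above concrete, here is a rough Python sketch comparing a logged age term with a hinge function under a log link. All coefficients, the knot at 45, and the function names are made up for illustration; none of them come from the GLM text or from any exam question.

```python
import numpy as np

def mean_with_logged_age(age, b0, b1):
    """Log-link mean when the predictor is ln(age).

    exp(b0 + b1 * ln(age)) simplifies to exp(b0) * age**b1,
    a power curve that is monotonic in age.
    """
    return np.exp(b0 + b1 * np.log(age))

def hinge(age, knot):
    """Hinge function max(0, age - knot)."""
    return np.maximum(0.0, age - knot)

def mean_with_hinge(age, b0, b_age, b_hinge, knot):
    """Log-link mean with unlogged age plus one hinge term.

    The slope on the log scale is b_age below the knot and
    b_age + b_hinge above it, so the curve can change direction.
    """
    return np.exp(b0 + b_age * age + b_hinge * hinge(age, knot))

ages = np.array([18.0, 30.0, 45.0, 60.0, 80.0])

# Hypothetical coefficients, chosen only to show the shapes.
logged = mean_with_logged_age(ages, b0=0.0, b1=-0.5)
hinged = mean_with_hinge(ages, b0=0.0, b_age=-0.03, b_hinge=0.06, knot=45.0)

for a, m_log, m_hinge in zip(ages, logged, hinged):
    print(f"age {a:>4.0f}: logged-age mean {m_log:.3f}, hinge mean {m_hinge:.3f}")
```

With these assumed values the logged-age curve falls steadily with age, while the hinge version falls until the knot and rises afterwards, which is the "worse risks at both ends" shape the comment describes.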
