2015 Fall Question 2c
How do we classify vehicle type for levels that we eliminate? I understand how it would work if we, for instance, rolled all Trucks into the base level. But the model solutions say "eliminate." In this context, does eliminate imply that we are rolling it into the base level?
Comments
Great question, and it really depends on the context. In general it's appropriate to roll into the base level, so if "trucks" is being eliminated and "cars" is the base level then you could recode all trucks to read "cars" and proceed. Alternatively, you could map "cars" and "trucks" to a new base level called "cars & trucks" and proceed. Either of these is a usual approach; it mostly comes down to how easy you want to make it for someone else to understand what you did.
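As a quick sketch of the two recoding options (plain Python with made-up vehicle types, since the actual data isn't shown):

```python
# Hypothetical vehicle-type column from a rating data set.
records = ["car", "truck", "van", "truck", "car"]

# Option 1: roll the eliminated level ("truck") into the existing base level ("car").
rolled = ["car" if v == "truck" else v for v in records]

# Option 2: map both levels to a new, clearly named combined base level.
combined = ["car & truck" if v in ("car", "truck") else v for v in records]

print(rolled)    # ['car', 'car', 'van', 'car', 'car']
print(combined)  # ['car & truck', 'car & truck', 'van', 'car & truck', 'car & truck']
```

Either way the model sees one fewer level; option 2 just makes the combination explicit to the next person who reads the data.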
A different approach would be to look at the data for "trucks" and decide it has enough data errors to merit removing those records from the data set. A suitable factor for "trucks" could then be adopted from, say, a competitor filing.
Hi,
A few questions on 2015 Fall 2c:
Thanks!
This is subtle. The key here is to read closely and notice there is no mention of the intercept, which is usually denoted by the coefficient beta_0.
Intrinsic aliasing means the model is over-specified. We can see that here because we have a beta coefficient for every possible category within each rating variable, so we can always express one of the categories as a linear combination of the others. However, that only holds if the model has an intercept. So, assuming the model consists only of beta_1 through beta_8, we can either remove two coefficients (one from each rating variable) and introduce an intercept parameter, or remove one coefficient (from either variable) and let the remaining over-specified coefficient serve as the intercept.
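The over-specification can be seen directly as rank deficiency of the design matrix. A minimal numpy sketch (a made-up example with two rating variables of two levels each, smaller than the question's setup, just to show the mechanism):

```python
import numpy as np

# Toy design matrix: intercept plus a dummy for EVERY level of two rating
# variables (variable 1 levels A, B; variable 2 levels X, Y). Rows are
# hypothetical records, one per territory/type combination.
rows = [
    # intercept, A, B, X, Y
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
X = np.array(rows)

# Within each variable the dummies sum to the intercept column, so the
# matrix cannot have full column rank: 5 columns but only rank 3.
print(np.linalg.matrix_rank(X))  # 3
```

Removing one dummy per variable (or dropping the intercept and one dummy) restores full column rank, which is exactly the coefficient-counting argument above.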
We can make a smart choice about which coefficient to remove by dealing with the extrinsic aliasing where beta_3 = beta_7: remove one of those and let the other behave as the intercept. This comes with issues, though, as the parameter confidence intervals vary depending on the level of exposure associated with the chosen base class.
For your second question, we remove intrinsic aliasing whenever we make a change after which no category in a rating variable can be expressed as a linear combination of the remaining categories of that variable. In this question, because there is implicitly no intercept, removing any one beta will remove the intrinsic aliasing. However, other issues with the model will still remain.
For your last question, it depends on how you view the term "eliminate". It could mean "reclassify to some other level which is being treated as the base level", or it could mean "drop these records from the data set". The latter is acceptable and avoids the need to reclassify, but doing so too freely may limit the credibility/usefulness of the data set.
Thanks for explaining - follow up questions:
I am really confused about how the intercept affects intrinsic aliasing and how to address this...
It's worth remembering that in 2022 this is a question from a paper that is no longer on the syllabus. Currently, the syllabus and GLM text indicate you should be thinking about this question in terms of correlation and multicollinearity. Intrinsic and extrinsic aliasing aren't defined in the current readings.
Correlation between a pair of variables means the response is picked up by both variables, and the GLM gets stuck deciding how much of the response to allocate to each. This is why we remove one of the correlated variables. Multicollinearity is an extension of this where a linear combination of several rating variables can describe the response of another. Aliasing is when you can write out an exact equation to do this, whereas multicollinearity is problematic even when you can't get exact equality but can closely approximate one variable using a linear combination of other variables.
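To illustrate the exact-vs-approximate distinction, here's a small numpy sketch on synthetic data: an exact linear combination drops the matrix rank (aliasing), while a near-exact one keeps full rank but leaves the matrix badly conditioned (multicollinearity), which is what makes the coefficient estimates unstable.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# Aliasing: the third column is an EXACT linear combination -> rank drops.
exact = np.column_stack([x1, x2, 2 * x1 - x2])
print(np.linalg.matrix_rank(exact))  # 2

# Multicollinearity: the third column is only APPROXIMATELY a combination.
# Rank is full, but the matrix is ill-conditioned, so the fitted betas
# swing wildly with small changes in the data.
approx = np.column_stack([x1, x2, 2 * x1 - x2 + 1e-6 * rng.normal(size=100)])
print(np.linalg.matrix_rank(approx))  # 3
print(np.linalg.cond(approx))         # very large: unstable estimates
```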
This problem is tough to describe in part because there are multiple types of aliasing going on at once. How you deal with the extrinsic aliasing affects how you explain dealing with the intrinsic aliasing.
Let's think about the problem from a degrees-of-freedom perspective. We have two rating variables, each with 4 categories. To fully specify a GLM we need at most an intercept plus 2*(4-1) coefficients, i.e. 7 coefficients in total. However, that's without taking into account the multicollinearity that's present. To make sure the GLM converges we need to remove the multicollinearity. Let's look at how they do that in sample answer 1.
They notice territory D basically only contains Other vehicles, so we have beta_4 ~ beta_8. They decide to delete all records with Territory = D and Type = Other, so then beta_4 = beta_8; to remove the over-specification, they get rid of beta_8 from the model. Similarly, they notice territory C only contains Trucks, so beta_3 = beta_7, and to remove that over-specification they get rid of beta_7.

Now they are left with the intrinsic aliasing part. They notice there is correlation between territories A and B and vehicle type. In particular, if you're in territory A you're more likely to be a car, whereas in territory B you're more likely to be a van. This means a linear combination of beta_1 and beta_2 will be the same as a linear combination of beta_5 and beta_6. Or, rearranging, we can write, say, beta_6 as a linear combination of beta_1, beta_2, and beta_5. So again the model is over-specified, and we choose to remove beta_6. Now the model is appropriate: there is no intrinsic or extrinsic aliasing, because we dealt with all of the multicollinearity present.
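As a tiny numpy illustration of the Territory C / Truck step (made-up data, assuming, as the sample answer implies, that the remaining Territory C records are exactly the Truck records): the two dummy columns coincide, so together they contribute only one rank and one of the two coefficients has to go.

```python
import numpy as np

# Hypothetical records: 1 where the record is in Territory C / is a Truck.
terr_C = np.array([1, 0, 1, 0, 0])
truck  = np.array([1, 0, 1, 0, 0])  # identical to terr_C in this data

X = np.column_stack([terr_C, truck])

# Two columns, but rank 1: beta_3 and beta_7 cannot both be estimated,
# so one of them (here beta_7) is dropped from the model.
print(np.linalg.matrix_rank(X))  # 1
```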
We could have achieved the same result using the degrees of freedom approach by specifying the base class for each rating variable (removes 2 coefficients) and then looking for aliasing. We'd notice Territory D/Other and Territory C/Trucks and remove a further two coefficients to arrive at the same answer.
Thanks for explaining!
Hi - can you note in the BattleCard for this question that Intrinsic / Extrinsic aliasing are no longer on the syllabus? I only happened to notice it isn't anywhere in the current GLM paper and came here to see if it was no longer on the syllabus. Otherwise I would have memorized those for no reason, since they're in the BattleCard answer.
Done. It's worth noting though that aliasing is still on the syllabus. The terminology and level of detail you're expected to understand has changed. I encourage you to at least read/familiarize yourself with these battlecards in case the CAS adds a twist to the exam.