2015 Fall Question 2c

edited May 2024 in GLM.Basics

How do we classify vehicle type for levels that we eliminate? I understand how it would work if we, for instance, rolled all Trucks into the base level. But the model solutions say "eliminate." In this context, does eliminate imply that we are rolling it into the base level?

Comments

  • Great question, and it really depends on the context. In general it's appropriate to roll the eliminated level into the base level, so if "trucks" are being eliminated and "cars" is the base level, you could recode all trucks to read "cars" and proceed. Or you could map "cars" and "trucks" to a new base level called "cars & trucks" and proceed. Either of these is the usual approach; which one you pick depends on how easy you want to make it for someone else to understand what you did.

    A different approach would be if you looked at the data for "trucks" and decided it's got enough data errors to merit removing those records from the data set. A suitable factor for "trucks" could then be adopted from, say, a competitor filing.
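
To make the two readings of "eliminate" concrete, here is a minimal Python sketch. The records, the field names, and the helpers `roll_into_base` and `drop_level` are hypothetical; none of them come from the exam question or the model solutions.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"territory": "A", "vehicle": "car"},
    {"territory": "B", "vehicle": "truck"},
    {"territory": "B", "vehicle": "van"},
    {"territory": "A", "vehicle": "truck"},
]


def roll_into_base(rows, level, base):
    """Recode every occurrence of `level` to `base`, keeping all records."""
    return [
        {**r, "vehicle": base if r["vehicle"] == level else r["vehicle"]}
        for r in rows
    ]


def drop_level(rows, level):
    """Remove every record whose vehicle type is `level`."""
    return [r for r in rows if r["vehicle"] != level]


rolled = roll_into_base(records, "truck", "car")   # trucks now read "car"
dropped = drop_level(records, "truck")             # truck records removed

print(len(rolled), len(dropped))
```

Rolling keeps every record (and its exposure) in the fit, while dropping shrinks the data set, which is why dropping too freely can limit credibility.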

  • Hi,

    A few questions on 2015 Fall 2c:

    1. the examiner report for Part states "To address the intrinsic aliasing, the candidate needed to remove one additional parameter or remove two parameters (one territory and one vehicle type) and introduce an intercept". If we remove one additional parameter only (Sample solution 1), how does it remove intrinsic aliasing since Beta 4 is still = 1 – Beta 1 – Beta 2 – Beta 3?
    2. Also, is it true that removing any parameter will address intrinsic aliasing? Sample solution 1 eliminates Vehicle Class = Van so the model is uniquely defined; will it address intrinsic aliasing if I remove, say, beta 1/2/3/4/5?
    3. From your comments above, you mentioned that the levels being removed are rolled into the base level. In sample solution 1, an intercept is not introduced, so if there is no base level, how do we classify the levels that are eliminated?

    Thanks!

  • This is subtle. The key here is to read closely and notice there is no mention of the intercept which is usually denoted by the coefficient beta_0.

    Intrinsic aliasing means the model is over-specified. We can see that here because we have a beta coefficient for every possible category within each rating variable. Therefore we can always express one of the categories as a linear combination of the others. However, this is provided we have an intercept. So assuming the model only consists of beta_1 through beta_8 then we can remove two coefficients (one from each rating variable) and introduce an intercept parameter, or we can remove one coefficient (from either of the variables) and use the remaining over-specified coefficient as the intercept.
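
A minimal sketch of the point above, assuming a toy data set with one record for every territory/vehicle combination (territories A to D; vehicle types car, van, truck, other). The helper names `rank` and `dummies` are illustrative and not part of the exam. It checks that eight indicator columns without an intercept contain only seven independent columns, and that either fix described above leaves a full-rank design.

```python
from fractions import Fraction
from itertools import product


def rank(matrix):
    """Matrix rank via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


TERRITORIES = ["A", "B", "C", "D"]
VEHICLES = ["car", "van", "truck", "other"]  # ordering assumed for illustration


def dummies(value, levels):
    """One indicator column per level."""
    return [1 if value == lvl else 0 for lvl in levels]


# Toy assumption: every territory/vehicle combination appears once.
cells = list(product(TERRITORIES, VEHICLES))

# beta_1..beta_8: an indicator for every level of both variables, no intercept.
full = [dummies(t, TERRITORIES) + dummies(v, VEHICLES) for t, v in cells]
rank_full = rank(full)

# Fix 1: remove one coefficient (here the last vehicle indicator).
drop_one = [row[:-1] for row in full]
rank_drop_one = rank(drop_one)

# Fix 2: remove one level from each variable and add an intercept column.
with_intercept = [[1] + row[1:4] + row[5:8] for row in full]
rank_with_intercept = rank(with_intercept)

print(rank_full, rank_drop_one, rank_with_intercept)
```

The territory indicators sum to a column of ones, and so do the vehicle indicators, which is the single linear dependency that either fix removes.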

    We can make a smart choice about which coefficient to remove by dealing with the extrinsic aliasing where beta_3 = beta_7. Remove one of those and let the other behave as the intercept. This comes with issues, though, as the parameter confidence intervals vary depending on the level of exposure associated with the chosen base class.

    For your second question, we remove intrinsic aliasing whenever we make a change that means we can no longer express a category in a rating variable as a linear combination of the remaining categories in that rating variable. In this question, because implicitly there's no intercept, removing any one beta will remove the intrinsic aliasing. However, there will still be other issues with the model that remain.

    For your last question, it depends on how you view the term "eliminate". This could mean "reclassify to some other level which is being treated as the base level", or it could mean "drop these records from the data set". The latter is acceptable and avoids the need to reclassify but doing so too freely may limit the credibility/usefulness of the data set.

  • Thanks for explaining - follow up questions:

    1. On your 2nd paragraph, "Therefore we can always express one of the categories as a linear combination of the others. However, this is provided we have an intercept." Do you mean that in this question we have an intercept, so intrinsic aliasing occurs where beta 4 = 1 – Beta 1 – Beta 2 – Beta 3? Shouldn't there be a beta_0 if there is an intercept? You also mentioned in the 4th paragraph that in this question there is no intercept.
    2. Would you mind explaining why removing two coefficients and introducing an intercept will address intrinsic aliasing?
    3. Sample solution 1 only has beta 1-5 (total 5 covariates) without an intercept; how does this remove intrinsic aliasing?

    I am really confused about how the intercept affects intrinsic aliasing and how to address this...

  • It's worth remembering that in 2022 this is a question from a paper that is no longer on the syllabus. Currently, the syllabus and GLM text indicate you should be thinking about this question in terms of correlation and multicollinearity. Intrinsic and extrinsic aliasing aren't defined in the current readings.

    Correlation between a pair of variables means the response is picked up by both variables and the GLM gets stuck deciding how much of the response to allocate to each. This is why we remove one of the correlated variables. Multicollinearity is an extension of this where you can use a linear combination of several rating variables to describe the response of another. Aliasing is when you can write out an exact equation to do this, whereas multicollinearity is problematic even when you can't get equality but can closely approximate one variable using a linear combination of other variables.

    This problem is tough to describe in part because there are multiple types of aliasing going on at once. How you deal with the extrinsic aliasing affects how you explain dealing with the intrinsic aliasing.

    Let's think about the problem from a degrees-of-freedom perspective. We have two rating variables, each with 4 categories. To fully specify a GLM we need at most an intercept plus 2*(4-1) coefficients = 7 coefficients. However, that's without taking into account the multicollinearity that's present. To make sure the GLM converges we need to remove the multicollinearity. Let's look at how they do that in sample answer 1.

    They notice territory D basically only contains Other vehicles. So we have beta_4 ~ beta_8. They decide to delete all records with Territory = D and Type = Other so then beta_4 = beta_8. To remove the overspecification, they get rid of beta_8 from the model. Similarly, they notice territory C only contains Trucks so beta_3 = beta_7. To remove the overspecification, they get rid of beta_7. Now they are left with the intrinsic aliasing part. They notice there is correlation between territories A and B, and vehicle type. In particular, if you're in territory A you're more likely to be a car, whereas in territory B you're more likely to be a van. This means a linear combination of beta_1 and beta_2 will be the same as a linear combination of beta_5 and beta_6. Or, rearranging, we can write, say, beta_6 as a linear combination of beta_1, beta_2, and beta_5. So again, the model is overspecified and we choose to remove beta_6. Now the model is appropriate as there is no intrinsic or extrinsic aliasing because we dealt with all of the multicollinearity present.

    We could have achieved the same result using the degrees of freedom approach by specifying the base class for each rating variable (removes 2 coefficients) and then looking for aliasing. We'd notice Territory D/Other and Territory C/Trucks and remove a further two coefficients to arrive at the same answer.
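
The walkthrough of sample answer 1 can be sketched on toy data. The cells below are an assumption made for illustration (territory C holds only trucks, territory D holds only Other, and A and B hold only cars and vans); they are not the exam's actual data, and the column names are hypothetical.

```python
from itertools import combinations

TERRITORIES = ["A", "B", "C", "D"]
VEHICLES = ["car", "van", "truck", "other"]

# Assumed toy cells: C <-> truck and D <-> other coincide exactly.
cells = [
    ("A", "car"), ("A", "van"),
    ("B", "car"), ("B", "van"),
    ("C", "truck"),
    ("D", "other"),
]

# One indicator column per level, in the order beta_1..beta_8.
columns = {f"terr_{t}": [int(ct == t) for ct, _ in cells] for t in TERRITORIES}
columns.update({f"veh_{v}": [int(cv == v) for _, cv in cells] for v in VEHICLES})

# Extrinsic aliasing: two columns that are identical on this data.
aliased_pairs = [
    (a, b) for a, b in combinations(columns, 2) if columns[a] == columns[b]
]


def add(x, y):
    """Element-wise sum of two indicator columns."""
    return [p + q for p, q in zip(x, y)]


# What remains after removing those duplicates: terr_A + terr_B equals
# veh_car + veh_van, so one more coefficient (e.g. beta_6) is redundant.
remaining_relation = add(columns["terr_A"], columns["terr_B"]) == add(
    columns["veh_car"], columns["veh_van"]
)

print(aliased_pairs)
print(remaining_relation)
```

On this toy data, removing one column from each identical pair and then one of the four columns in the remaining relation leaves five coefficients, matching the count in sample solution 1.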

  • Thanks for explaining!

  • Hi - can you note in the BattleCard for this question that Intrinsic / Extrinsic aliasing are no longer on the syllabus? I only happened to notice it isn't anywhere in the current GLM paper and came here to see if it wasn't on the syllabus anymore. Otherwise I would have memorized those for no reason since it's in the BattleCard answer.

  • Done. It's worth noting though that aliasing is still on the syllabus. The terminology and level of detail you're expected to understand have changed. I encourage you to at least read/familiarize yourself with these battlecards in case the CAS adds a twist to the exam.
