Setting Base Levels for Categorical Variables
Hi. I'm looking at the GLM_DesignMatrix.pdf.
I picked the B1 base level to be [Shutters: Yes] from the number of claims table (no exposures are given, so I just used this): there are 49 claims with shutters and 41 with no shutters. Likewise, I selected [distance >25] because there were 53 claims >25 miles and 37 claims <25 miles.
My design matrix ended up looking exactly like the solution, but the solution says the following:
Alice: "It's important you follow the given description of the parameters because this tells you the base levels. Here it's implicit the base levels are: 1. Homes with no hurricane shutters 2. Homes less than or equal to 25 miles from the coast. Remember the base level is usually the one with the most exposures. This makes sense here as people tend to live close to the coast and not always have hurricane shutters."
I'm confused because the solution doesn't seem to match the above statement. Any clarification would be helpful. Thanks.
Comments
Ok. I reread the statement from Alice above and the question description gives what the base levels should be, but the part quoted below still doesn't make sense to me.
"1. Homes with no hurricane shutters 2. Homes less than or equal to 25 miles from the coast. Remember the base level is usually the one with the most exposures. This makes sense here as people tend to live close to the coast and not always have hurricane shutters."
It still seems to contradict what's given in the problem.
I think I see where you're coming from. There is ambiguity about what we're defining here as an exposure. In a pure premium model, the response variable we're looking at is loss $ / earned house years, but in a severity model we're looking at loss $ / claims, so one may consider the number of claims to be the exposure base instead of the number of earned house years.
The part you quoted assumes the base levels were chosen by looking at the number of earned house years rather than the number of claims. More people tend to live by the coast and/or not have hurricane shutters (at the moment). The pattern in the number of claims doesn't follow our earned house year intuition, but this could be due to random volatility.
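To make the difference concrete, here is a minimal sketch of how the choice of base level changes the indicator columns of a design matrix. It uses only the claim counts quoted in the question; the function names (`pick_base_level`, `design_row`), the "<=25" label, and the column ordering are illustrative assumptions, not anything taken from GLM_DesignMatrix.pdf.

```python
# Claim counts quoted in the question (no exposures were given).
claim_counts = {
    "shutters": {"Yes": 49, "No": 41},
    "distance": {">25": 53, "<=25": 37},
}


def pick_base_level(counts):
    """Rule of thumb: the base level is the level with the largest count."""
    return max(counts, key=counts.get)


def design_row(record, base_levels, levels):
    """Build one row: an intercept, then a 0/1 indicator for every
    non-base level of each variable (variables in alphabetical order)."""
    row = [1]  # intercept
    for var in sorted(levels):
        for level in levels[var]:
            if level != base_levels[var]:
                row.append(1 if record[var] == level else 0)
    return row


levels = {var: list(counts) for var, counts in claim_counts.items()}

# Base levels implied by using claim counts as the exposure measure.
base_by_claims = {var: pick_base_level(c) for var, c in claim_counts.items()}

# Base levels stated in the solution (no shutters, within 25 miles).
base_by_solution = {"shutters": "No", "distance": "<=25"}

# Hypothetical risk: a home with shutters, within 25 miles of the coast.
record = {"shutters": "Yes", "distance": "<=25"}
row_claims = design_row(record, base_by_claims, levels)
row_solution = design_row(record, base_by_solution, levels)

print(base_by_claims)  # base levels picked from claim counts
print(row_claims)      # columns: intercept, distance<=25, shutters=No
print(row_solution)    # columns: intercept, distance>25, shutters=Yes
```

The same home gets different indicator values under the two choices, so the fitted parameters have different interpretations even though the model's predictions would be equivalent.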
Ultimately, in the exam, if the base level isn't specified, I would state your choice of exposure base. Using the number of claims as the exposure base in a severity model is fine because the greater the number of claims, the lower the variance and the narrower the confidence intervals. That said, you could justify using earned house years for determining base levels to reduce random volatility and to make it easier to combine with a separate frequency model to get a collective pure premium model.
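As a rough illustration of why more claims means lower variance, the standard exponential-family GLM variance form can be written as below. The symbols are assumptions introduced here, not notation from the problem: $Y_i$ is the observed severity for risk class $i$, $\mu_i$ its mean, $V(\cdot)$ the variance function, $\phi$ the dispersion parameter, and $\omega_i$ the prior weight, taken to be the number of claims in a severity model.

```latex
\operatorname{Var}(Y_i) = \frac{\phi \, V(\mu_i)}{\omega_i},
\qquad \omega_i = \text{number of claims in class } i
```

Under that assumption, a level with more claims has a smaller variance for its observed severity, which is one reason it makes a stable base level.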