Challenge of Regularization
Introduction
Different from China, citizens of U.S.A can own firearm legally under the regulation of state or local government as a liberty written in the constitution. However the frequent shooting accident in recent have shake the belief of gun holding to its foundation. The appeal of imposing more constraints into firearm transaction is rising.
Different people affected by different characteristics and value address different attitude towards the proposal. For example, the supporters of Democracry definitely lean to the approval of the suggestion comparing to the Republicans.
Data Description
To capture what factors effect people’s preference to the issue most, I choose 56 variables that might correlate with it from GSS(U.S. General Social Survey) and run both Lasso and elastic net method to regularize the regression model. The variables are listed below:
Variable name | Variable Label |
---|---|
bigbang | sci knowledge: the universe began with a huge explosion |
boyorgrl | sci knowledge: father gene decides sex of baby |
childs | number of children |
chldidel | ideal number of children |
class | subjective class identification |
conarmy | confidence in military |
conbiz | confidence in business and industry |
conchurh | confidence in churches and religious orgs |
concong | confidence in us congress |
concourt | confidence in courts and legal system |
condrift | sci knowledge: the continents have been moving |
conschls | confidence in schools and educ system |
divorce | ever been divorced or separated |
earthsun | sci knowledge: the earth goes around the sun |
electron | sci knowledge: electrons are smaller than atoms |
evolved | sci knowledge: human beings developed from animals |
finalter | change in financial situation |
god | r’s confidence in the existence of god |
grass | should marijuana be made legal |
helpoth | to help others |
hlthdays | days of activity limitation past 30 days |
homosex1 | is homosexual sex wrong? |
hotcore | sci knowledge: the center of earth is very hot |
id | respondent id number |
intecon | interested in economic issues |
intintl | interested in international issues |
inttech | interested in technologies |
makefrnd | religion helps people to make friends |
mntlhlth | days of poor mental health past 30 days |
obey | to obey |
othersex | r had sex with some other last year |
othlang | can r speak language other than english/spanish |
owngun | have gun in home |
popular | to be well liked or popular |
racwork | racial makeup of workplace |
radioact | sci knowledge: the center of earth is very hot |
rank | r’s self ranking of social position |
realinc | family income in constant $ |
realrinc | r’s income in constant $ |
relpersn | r consider self a religious person |
rowngun | does gun belong to r |
satfin | satisfaction with financial situation |
sexeduc | sex education in public schools |
sexornt | sexual orientation |
socbar | spend evening at bar |
socfrend | spend evening with friends |
socommun | spend evening with neighbor |
socrel | spend evening with relatives |
solarrev | sci knowledge: how long the earth goes around the sun |
sprtprsn | r consider self a spiritual person |
thnkself | to think for ones self |
trustsci | we trust too much in science |
viruses | sci knowledge: antibiotics kill viruses as well as bacteria |
workhard | to work hard |
xmarsex1 | is extramarital sex wrong? |
year | year for this respondent |
These variables are manually picked from 1064 variables which are related to human’s characteristics and background. I expected that the difference in those variables will lead to significant dissention. For instance, Ceteris paribus, parents may approve the constraints to avoid their child being threaten by probable shooting if they have more offsprings. A high-class gentle ment may incline to decrease the risk of gunshot wound. However, a data set containing 56 variables are over cumbersome for regression. Thus, I impose the Lasso and elastic net method for regularization.
Raw data and scripts are attached here
Data processing
Since this assignment focuses on the regularization, the process of data cleaning is skipped here, which is operated with STATA.
library(glmnet)
library(AER)
rm(list = ls())
setwd("D:/All courses/Data Analysis for Economics(Microeconometrics)/HW4")
data<-read.csv("hw4.csv")
#split into training and test data sets
n<-nrow(data)
train_pos<-sample(n,round(n/2))
train<-data[train_pos,]
test<-data[-train_pos,]
#logistic regression
logit<-glm(gunlaw~.,train,family='binomial',control=list(maxit=100))
coeftest(logit)
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0964e+01 3.6123e+00 3.0353 0.0024029 **
bigbang 7.0992e-01 4.6875e-01 1.5145 0.1298990
boyorgrl -3.7576e-01 3.3289e-01 -1.1288 0.2589957
childs -8.3269e-02 8.7675e-02 -0.9497 0.3422448
chldidel 1.7834e-01 1.0345e-01 1.7239 0.0847286 .
class -2.1467e-01 2.3340e-01 -0.9197 0.3577116
conarmy -2.2725e-01 2.8863e-01 -0.7873 0.4310932
conbiz -1.5100e-02 3.2533e-01 -0.0464 0.9629798
conchurh 1.2749e-01 2.5209e-01 0.5058 0.6130273
concong 1.8584e-01 2.8695e-01 0.6476 0.5172267
concourt -1.0101e+00 2.8656e-01 -3.5248 0.0004238 ***
condrift -1.2049e+00 4.9991e-01 -2.4103 0.0159410 *
conschls -2.3614e-01 2.6782e-01 -0.8817 0.3779250
divorce 3.7545e-02 1.6120e-01 0.2329 0.8158302
earthsun 3.5836e-01 5.8541e-01 0.6122 0.5404370
electron 4.1608e-01 3.2078e-01 1.2971 0.1945935
evolved -6.1902e-01 4.3868e-01 -1.4111 0.1582125
finalter 7.4788e-02 1.4234e-01 0.5254 0.5992864
god -2.9511e-01 1.1203e-01 -2.6343 0.0084321 **
grass -5.7406e-01 3.0865e-01 -1.8599 0.0628944 .
helpoth 6.3852e-02 2.1485e-01 0.2972 0.7663207
hlthdays 2.0931e-02 4.5265e-02 0.4624 0.6437917
homosex1 -1.2441e-01 1.7702e-01 -0.7028 0.4821653
hotcore 2.7305e-02 6.0172e-01 0.0454 0.9638056
id -1.2267e-04 2.0988e-04 -0.5845 0.5589075
intecon 2.0959e-01 3.4136e-01 0.6140 0.5392253
intintl -1.5646e-01 3.1153e-01 -0.5022 0.6154992
inttech -3.5710e-01 3.5624e-01 -1.0024 0.3161514
makefrnd 1.3120e-01 2.5020e-01 0.5244 0.6000042
mntlhlth 2.2605e-02 2.4234e-02 0.9328 0.3509400
obey 1.0665e-01 1.7131e-01 0.6225 0.5336035
othersex -8.0252e-01 2.4054e-01 -3.3364 0.0008487 ***
othlang -3.1363e-01 2.9871e-01 -1.0499 0.2937450
owngun 2.9981e-01 2.5591e-01 1.1715 0.2413875
popular 9.7919e-02 2.3130e-01 0.4233 0.6720471
racwork 1.4458e-01 1.1384e-01 1.2701 0.2040450
radioact -6.8292e-01 3.3044e-01 -2.0667 0.0387629 *
rank 5.5001e-02 7.1463e-02 0.7696 0.4415116
realinc 1.2851e-05 7.0784e-06 1.8156 0.0694380 .
realrinc -1.3818e-05 6.5172e-06 -2.1202 0.0339897 *
relpersn -3.1808e-01 1.8741e-01 -1.6972 0.0896543 .
rowngun -2.2877e-01 1.8780e-01 -1.2182 0.2231533
satfin 1.8878e-01 1.9551e-01 0.9655 0.3342744
sexeduc 1.5789e-01 6.8143e-01 0.2317 0.8167655
sexornt -1.2067e+00 6.7942e-01 -1.7761 0.0757244 .
socbar -3.1598e-01 1.4114e-01 -2.2388 0.0251719 *
socfrend 2.7524e-01 1.6013e-01 1.7189 0.0856402 .
socommun 9.2997e-02 1.0928e-01 0.8510 0.3947801
socrel -3.0281e-01 1.3105e-01 -2.3107 0.0208506 *
solarrev -3.2621e-02 2.1055e-01 -0.1549 0.8768752
sprtprsn -7.0377e-02 1.5131e-01 -0.4651 0.6418579
thnkself 2.5489e-01 1.9263e-01 1.3233 0.1857497
trustsci0 -5.0583e+00 2.6751e+00 -1.8909 0.0586375 .
trustsci1 1.8300e+00 1.7832e+00 1.0262 0.3047742
trustsci2 1.4006e-01 1.6628e+00 0.0842 0.9328736
trustsci3 1.2898e+00 1.6788e+00 0.7683 0.4422983
trustsci4 4.4581e-01 1.7148e+00 0.2600 0.7948823
trustsci5 -4.5711e-01 1.7766e+00 -0.2573 0.7969491
viruses 4.1056e-01 3.4308e-01 1.1967 0.2314251
workhard 5.8777e-02 2.1220e-01 0.2770 0.7817946
xmarsex1 4.2989e-01 3.1604e-01 1.3602 0.1737565
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#predict on test data
pre=predict(logit,test,type='response')
logitpred<-round(pre)
table(logitpred,test$gunlaw,dnn=c("predicted","Disagree"))
Disagree
predicted 0 1
0 31 63
1 86 257
logiterr<-1-mean(logitpred==test$gunlaw)
logiterr
[1] 0.3409611
Then I regularize it with Lasso approach:
#Regularized by LASSO
# Fit on training data
x<-model.matrix(gunlaw~.,train)
y<-train$gunlaw
lassofit.all<-glmnet(x,y,alpha=1,family="binomial")
plot(lassofit.all,xvar="lambda")
# Cross Validation
cv.lasso <- cv.glmnet(x,y,alpha=1,family="binomial")
plot(cv.lasso)
lambda.star <- cv.lasso$lambda.min
lassofit.star <- glmnet(x,y,alpha=1,lambda=lambda.star,family="binomial")
coef(lassofit.star)
63 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) 2.818170e+00
(Intercept) .
bigbang .
boyorgrl .
childs .
chldidel 3.383098e-04
class .
conarmy .
conbiz .
conchurh .
concong .
concourt -1.634249e-01
condrift .
conschls .
divorce .
earthsun 1.564713e-02
electron .
evolved .
finalter .
god -3.370955e-02
grass -2.575110e-01
helpoth .
hlthdays 6.726614e-03
homosex1 .
hotcore .
id -1.210066e-04
intecon .
intintl .
inttech .
makefrnd .
mntlhlth 8.770455e-03
obey .
othersex -4.821947e-01
othlang -2.180727e-01
owngun 1.441947e-01
popular .
racwork 7.066309e-02
radioact .
rank .
realinc .
realrinc -2.394779e-06
relpersn .
rowngun -8.385165e-02
satfin .
sexeduc .
sexornt -3.966880e-01
socbar .
socfrend 8.685264e-04
socommun .
socrel .
solarrev .
sprtprsn -6.448268e-02
thnkself .
trustsci0 .
trustsci1 1.989689e-01
trustsci2 .
trustsci3 3.114173e-01
trustsci4 .
trustsci5 .
viruses .
workhard .
xmarsex1 1.111911e-01
year .
# Predict on test data
newx <- model.matrix(gunlaw ~.,test)
lassopred <- predict(lassofit.star,newx,type="class")
table(lassopred,test$gunlaw,dnn=c("predicted","Disagree"))
Disagree
predicted 0 1
0 5 17
1 112 303
lassoerr <- 1-mean(lassopred==test$gunlaw)
lassoerr
[1] 0.2951945
Furthermore, I regularize it via elatic net approach:
#Regularized by Elastic Net
x<-model.matrix(gunlaw~.,train)
y<-train$gunlaw
elastic<-glmnet(x,y,alpha=0.3,family="binomial")
plot(elastic,xvar="lambda")
#Cross Validation
cv.elastic<-cv.glmnet(x,y,alpha=0.3,family="binomial")
plot(cv.elastic)
lambda.star_elastic<-cv.elastic$lambda.min
elastic.star<-glmnet(x,y,alpha=0.3,lambda=lambda.star_elastic,family="binomial")
coef(elastic.star)
63 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) 2.395657e+00
(Intercept) .
bigbang .
boyorgrl .
childs .
chldidel 2.064755e-03
class .
conarmy .
conbiz .
conchurh .
concong .
concourt -1.001474e-01
condrift .
conschls .
divorce .
earthsun 1.525546e-02
electron .
evolved .
finalter .
god -3.376082e-02
grass -1.576088e-01
helpoth .
hlthdays 8.515723e-03
homosex1 .
hotcore .
id -1.170116e-04
intecon .
intintl .
inttech .
makefrnd .
mntlhlth 8.226933e-03
obey .
othersex -4.064984e-01
othlang -2.006813e-01
owngun 1.458341e-01
popular .
racwork 6.459718e-02
radioact .
rank .
realinc .
realrinc -2.167317e-06
relpersn .
rowngun -9.000636e-02
satfin 3.608882e-03
sexeduc .
sexornt -3.233116e-01
socbar .
socfrend 4.087481e-03
socommun .
socrel .
solarrev .
sprtprsn -5.726824e-02
thnkself .
trustsci0 .
trustsci1 1.593161e-01
trustsci2 .
trustsci3 2.629314e-01
trustsci4 .
trustsci5 .
viruses .
workhard .
xmarsex1 7.945114e-02
year .
#Predict on the test data
newx<-model.matrix(gunlaw~.,test)
elasticpred<-predict(elastic.star,newx,type='class')
table(elasticpred,test$gunlaw,dnn=c("Predicted","Disagree"))
Disagree
Predicted 0 1
0 5 13
1 112 307
elasticerr<-1-mean(elasticpred==test$gunlaw)
elasticerr
[1] 0.2860412
Conclusion
From result above, with these two approaches we can draw an illustration about people who are more incline to support the restriction of private firearm: a gun owner with more expected children, more rational and know more about science . On the contrary people who have more confidence in courts and legal system are in the opposite site. The impact of income of the respondents and their family surprisingly is not signifcant in this process, which deviates from our previous expectation.