Challenge(Regularization)

Assignment 4

Posted by Li Zichao on April 21, 2019

Challenge of Regularization

Introduction

Different from China, citizens of U.S.A can own firearm legally under the regulation of state or local government as a liberty written in the constitution. However the frequent shooting accident in recent have shake the belief of gun holding to its foundation. The appeal of imposing more constraints into firearm transaction is rising.

Different people affected by different characteristics and value address different attitude towards the proposal. For example, the supporters of Democracry definitely lean to the approval of the suggestion comparing to the Republicans.

Political Affiliation

Data Description

To capture what factors effect people’s preference to the issue most, I choose 56 variables that might correlate with it from GSS(U.S. General Social Survey) and run both Lasso and elastic net method to regularize the regression model. The variables are listed below:

Variable name Variable Label
bigbang sci knowledge: the universe began with a huge explosion
boyorgrl sci knowledge: father gene decides sex of baby
childs number of children
chldidel ideal number of children
class subjective class identification
conarmy confidence in military
conbiz confidence in business and industry
conchurh confidence in churches and religious orgs
concong confidence in us congress
concourt confidence in courts and legal system
condrift sci knowledge: the continents have been moving
conschls confidence in schools and educ system
divorce ever been divorced or separated
earthsun sci knowledge: the earth goes around the sun
electron sci knowledge: electrons are smaller than atoms
evolved sci knowledge: human beings developed from animals
finalter change in financial situation
god r’s confidence in the existence of god
grass should marijuana be made legal
helpoth to help others
hlthdays days of activity limitation past 30 days
homosex1 is homosexual sex wrong?
hotcore sci knowledge: the center of earth is very hot
id respondent id number
intecon interested in economic issues
intintl interested in international issues
inttech interested in technologies
makefrnd religion helps people to make friends
mntlhlth days of poor mental health past 30 days
obey to obey
othersex r had sex with some other last year
othlang can r speak language other than english/spanish
owngun have gun in home
popular to be well liked or popular
racwork racial makeup of workplace
radioact sci knowledge: the center of earth is very hot
rank r’s self ranking of social position
realinc family income in constant $
realrinc r’s income in constant $
relpersn r consider self a religious person
rowngun does gun belong to r
satfin satisfaction with financial situation
sexeduc sex education in public schools
sexornt sexual orientation
socbar spend evening at bar
socfrend spend evening with friends
socommun spend evening with neighbor
socrel spend evening with relatives
solarrev sci knowledge: how long the earth goes around the sun
sprtprsn r consider self a spiritual person
thnkself to think for ones self
trustsci we trust too much in science
viruses sci knowledge: antibiotics kill viruses as well as bacteria
workhard to work hard
xmarsex1 is extramarital sex wrong?
year year for this respondent

These variables are manually picked from 1064 variables which are related to human’s characteristics and background. I expected that the difference in those variables will lead to significant dissention. For instance, Ceteris paribus, parents may approve the constraints to avoid their child being threaten by probable shooting if they have more offsprings. A high-class gentle ment may incline to decrease the risk of gunshot wound. However, a data set containing 56 variables are over cumbersome for regression. Thus, I impose the Lasso and elastic net method for regularization.

Raw data and scripts are attached here

Data processing

Since this assignment focuses on the regularization, the process of data cleaning is skipped here, which is operated with STATA.

library(glmnet)
library(AER)
rm(list = ls())
setwd("D:/All courses/Data Analysis for Economics(Microeconometrics)/HW4")
data<-read.csv("hw4.csv")

#split into training and test data sets
n<-nrow(data)
train_pos<-sample(n,round(n/2))
train<-data[train_pos,]
test<-data[-train_pos,]

#logistic regression
logit<-glm(gunlaw~.,train,family='binomial',control=list(maxit=100))
coeftest(logit)

z test of coefficients:

           Estimate  Std. Error z value  Pr(>|z|)    
(Intercept)  1.0964e+01  3.6123e+00  3.0353 0.0024029 ** 
bigbang      7.0992e-01  4.6875e-01  1.5145 0.1298990    
boyorgrl    -3.7576e-01  3.3289e-01 -1.1288 0.2589957    
childs      -8.3269e-02  8.7675e-02 -0.9497 0.3422448    
chldidel     1.7834e-01  1.0345e-01  1.7239 0.0847286 .  
class       -2.1467e-01  2.3340e-01 -0.9197 0.3577116    
conarmy     -2.2725e-01  2.8863e-01 -0.7873 0.4310932    
conbiz      -1.5100e-02  3.2533e-01 -0.0464 0.9629798    
conchurh     1.2749e-01  2.5209e-01  0.5058 0.6130273    
concong      1.8584e-01  2.8695e-01  0.6476 0.5172267    
concourt    -1.0101e+00  2.8656e-01 -3.5248 0.0004238 ***
condrift    -1.2049e+00  4.9991e-01 -2.4103 0.0159410 *  
conschls    -2.3614e-01  2.6782e-01 -0.8817 0.3779250    
divorce      3.7545e-02  1.6120e-01  0.2329 0.8158302    
earthsun     3.5836e-01  5.8541e-01  0.6122 0.5404370    
electron     4.1608e-01  3.2078e-01  1.2971 0.1945935    
evolved     -6.1902e-01  4.3868e-01 -1.4111 0.1582125    
finalter     7.4788e-02  1.4234e-01  0.5254 0.5992864    
god         -2.9511e-01  1.1203e-01 -2.6343 0.0084321 ** 
grass       -5.7406e-01  3.0865e-01 -1.8599 0.0628944 .  
helpoth      6.3852e-02  2.1485e-01  0.2972 0.7663207    
hlthdays     2.0931e-02  4.5265e-02  0.4624 0.6437917    
homosex1    -1.2441e-01  1.7702e-01 -0.7028 0.4821653    
hotcore      2.7305e-02  6.0172e-01  0.0454 0.9638056    
id          -1.2267e-04  2.0988e-04 -0.5845 0.5589075    
intecon      2.0959e-01  3.4136e-01  0.6140 0.5392253    
intintl     -1.5646e-01  3.1153e-01 -0.5022 0.6154992    
inttech     -3.5710e-01  3.5624e-01 -1.0024 0.3161514    
makefrnd     1.3120e-01  2.5020e-01  0.5244 0.6000042    
mntlhlth     2.2605e-02  2.4234e-02  0.9328 0.3509400    
obey         1.0665e-01  1.7131e-01  0.6225 0.5336035    
othersex    -8.0252e-01  2.4054e-01 -3.3364 0.0008487 ***
othlang     -3.1363e-01  2.9871e-01 -1.0499 0.2937450    
owngun       2.9981e-01  2.5591e-01  1.1715 0.2413875    
popular      9.7919e-02  2.3130e-01  0.4233 0.6720471    
racwork      1.4458e-01  1.1384e-01  1.2701 0.2040450    
radioact    -6.8292e-01  3.3044e-01 -2.0667 0.0387629 *  
rank         5.5001e-02  7.1463e-02  0.7696 0.4415116    
realinc      1.2851e-05  7.0784e-06  1.8156 0.0694380 .  
realrinc    -1.3818e-05  6.5172e-06 -2.1202 0.0339897 *  
relpersn    -3.1808e-01  1.8741e-01 -1.6972 0.0896543 .  
rowngun     -2.2877e-01  1.8780e-01 -1.2182 0.2231533    
satfin       1.8878e-01  1.9551e-01  0.9655 0.3342744    
sexeduc      1.5789e-01  6.8143e-01  0.2317 0.8167655    
sexornt     -1.2067e+00  6.7942e-01 -1.7761 0.0757244 .  
socbar      -3.1598e-01  1.4114e-01 -2.2388 0.0251719 *  
socfrend     2.7524e-01  1.6013e-01  1.7189 0.0856402 .  
socommun     9.2997e-02  1.0928e-01  0.8510 0.3947801    
socrel      -3.0281e-01  1.3105e-01 -2.3107 0.0208506 *  
solarrev    -3.2621e-02  2.1055e-01 -0.1549 0.8768752    
sprtprsn    -7.0377e-02  1.5131e-01 -0.4651 0.6418579    
thnkself     2.5489e-01  1.9263e-01  1.3233 0.1857497    
trustsci0   -5.0583e+00  2.6751e+00 -1.8909 0.0586375 .  
trustsci1    1.8300e+00  1.7832e+00  1.0262 0.3047742    
trustsci2    1.4006e-01  1.6628e+00  0.0842 0.9328736    
trustsci3    1.2898e+00  1.6788e+00  0.7683 0.4422983    
trustsci4    4.4581e-01  1.7148e+00  0.2600 0.7948823    
trustsci5   -4.5711e-01  1.7766e+00 -0.2573 0.7969491    
viruses      4.1056e-01  3.4308e-01  1.1967 0.2314251    
workhard     5.8777e-02  2.1220e-01  0.2770 0.7817946    
xmarsex1     4.2989e-01  3.1604e-01  1.3602 0.1737565    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#predict on test data
pre=predict(logit,test,type='response')
logitpred<-round(pre)
table(logitpred,test$gunlaw,dnn=c("predicted","Disagree"))

         Disagree
predicted   0   1
        0  31  63
        1  86 257
        
logiterr<-1-mean(logitpred==test$gunlaw)
logiterr

[1] 0.3409611

Then I regularize it with Lasso approach:

#Regularized by LASSO
# Fit on training data
x<-model.matrix(gunlaw~.,train) 
y<-train$gunlaw
lassofit.all<-glmnet(x,y,alpha=1,family="binomial") 
plot(lassofit.all,xvar="lambda")

Lasso

# Cross Validation
cv.lasso <- cv.glmnet(x,y,alpha=1,family="binomial") 
plot(cv.lasso)

CV of Lasso

lambda.star <- cv.lasso$lambda.min
lassofit.star <- glmnet(x,y,alpha=1,lambda=lambda.star,family="binomial") 
coef(lassofit.star)

63 x 1 sparse Matrix of class "dgCMatrix"
                   s0
(Intercept)  2.818170e+00
(Intercept)  .           
bigbang      .           
boyorgrl     .           
childs       .           
chldidel     3.383098e-04
class        .           
conarmy      .           
conbiz       .           
conchurh     .           
concong      .           
concourt    -1.634249e-01
condrift     .           
conschls     .           
divorce      .           
earthsun     1.564713e-02
electron     .           
evolved      .           
finalter     .           
god         -3.370955e-02
grass       -2.575110e-01
helpoth      .           
hlthdays     6.726614e-03
homosex1     .           
hotcore      .           
id          -1.210066e-04
intecon      .           
intintl      .           
inttech      .           
makefrnd     .           
mntlhlth     8.770455e-03
obey         .           
othersex    -4.821947e-01
othlang     -2.180727e-01
owngun       1.441947e-01
popular      .           
racwork      7.066309e-02
radioact     .           
rank         .           
realinc      .           
realrinc    -2.394779e-06
relpersn     .           
rowngun     -8.385165e-02
satfin       .           
sexeduc      .           
sexornt     -3.966880e-01
socbar       .           
socfrend     8.685264e-04
socommun     .           
socrel       .           
solarrev     .           
sprtprsn    -6.448268e-02
thnkself     .           
trustsci0    .           
trustsci1    1.989689e-01
trustsci2    .           
trustsci3    3.114173e-01
trustsci4    .           
trustsci5    .           
viruses      .           
workhard     .           
xmarsex1     1.111911e-01
year         .           

# Predict on test data
newx <- model.matrix(gunlaw ~.,test)
lassopred <- predict(lassofit.star,newx,type="class")
table(lassopred,test$gunlaw,dnn=c("predicted","Disagree"))

         Disagree
predicted   0   1
        0   5  17
        1 112 303

lassoerr <- 1-mean(lassopred==test$gunlaw)
lassoerr

[1] 0.2951945

Furthermore, I regularize it via elatic net approach:

#Regularized by Elastic Net
x<-model.matrix(gunlaw~.,train)
y<-train$gunlaw
elastic<-glmnet(x,y,alpha=0.3,family="binomial") 
plot(elastic,xvar="lambda")

Elastic net

#Cross Validation
cv.elastic<-cv.glmnet(x,y,alpha=0.3,family="binomial")
plot(cv.elastic)

CV of Elastic net

lambda.star_elastic<-cv.elastic$lambda.min
elastic.star<-glmnet(x,y,alpha=0.3,lambda=lambda.star_elastic,family="binomial")
coef(elastic.star)

63 x 1 sparse Matrix of class "dgCMatrix"
                       s0
(Intercept)  2.395657e+00
(Intercept)  .           
bigbang      .           
boyorgrl     .           
childs       .           
chldidel     2.064755e-03
class        .           
conarmy      .           
conbiz       .           
conchurh     .           
concong      .           
concourt    -1.001474e-01
condrift     .           
conschls     .           
divorce      .           
earthsun     1.525546e-02
electron     .           
evolved      .           
finalter     .           
god         -3.376082e-02
grass       -1.576088e-01
helpoth      .           
hlthdays     8.515723e-03
homosex1     .           
hotcore      .           
id          -1.170116e-04
intecon      .           
intintl      .           
inttech      .           
makefrnd     .           
mntlhlth     8.226933e-03
obey         .           
othersex    -4.064984e-01
othlang     -2.006813e-01
owngun       1.458341e-01
popular      .           
racwork      6.459718e-02
radioact     .           
rank         .           
realinc      .           
realrinc    -2.167317e-06
relpersn     .           
rowngun     -9.000636e-02
satfin       3.608882e-03
sexeduc      .           
sexornt     -3.233116e-01
socbar       .           
socfrend     4.087481e-03
socommun     .           
socrel       .           
solarrev     .           
sprtprsn    -5.726824e-02
thnkself     .           
trustsci0    .           
trustsci1    1.593161e-01
trustsci2    .           
trustsci3    2.629314e-01
trustsci4    .           
trustsci5    .           
viruses      .           
workhard     .           
xmarsex1     7.945114e-02
year         .          

#Predict on the test data
newx<-model.matrix(gunlaw~.,test)
elasticpred<-predict(elastic.star,newx,type='class')
table(elasticpred,test$gunlaw,dnn=c("Predicted","Disagree"))

         Disagree
Predicted   0   1
        0   5  13
        1 112 307

elasticerr<-1-mean(elasticpred==test$gunlaw)
elasticerr

[1] 0.2860412

Conclusion

From result above, with these two approaches we can draw an illustration about people who are more incline to support the restriction of private firearm: a gun owner with more expected children, more rational and know more about science . On the contrary people who have more confidence in courts and legal system are in the opposite site. The impact of income of the respondents and their family surprisingly is not signifcant in this process, which deviates from our previous expectation.

Reference

Model_Selection_and_Regularization. Jiaming Mao