Challenge of Regularization

Introduction

Different from China, citizens of U.S.A can own firearm legally under the regulation of state or local government as a liberty written in the constitution. However the frequent shooting accident in recent have shake the belief of gun holding to its foundation. The appeal of imposing more constraints into firearm transaction is rising.

Different people affected by different characteristics and value address different attitude towards the proposal. For example, the supporters of Democracry definitely lean to the approval of the suggestion comparing to the Republicans.

Political Affiliation

Data Description

To capture what factors effect people’s preference to the issue most, I choose 56 variables that might correlate with it from GSS(U.S. General Social Survey) and run both Lasso and elastic net method to regularize the regression model. The variables are listed below:

Variable name	Variable Label
bigbang	sci knowledge: the universe began with a huge explosion
boyorgrl	sci knowledge: father gene decides sex of baby
childs	number of children
chldidel	ideal number of children
class	subjective class identification
conarmy	confidence in military
conbiz	confidence in business and industry
conchurh	confidence in churches and religious orgs
concong	confidence in us congress
concourt	confidence in courts and legal system
condrift	sci knowledge: the continents have been moving
conschls	confidence in schools and educ system
divorce	ever been divorced or separated
earthsun	sci knowledge: the earth goes around the sun
electron	sci knowledge: electrons are smaller than atoms
evolved	sci knowledge: human beings developed from animals
finalter	change in financial situation
god	r’s confidence in the existence of god
grass	should marijuana be made legal
helpoth	to help others
hlthdays	days of activity limitation past 30 days
homosex1	is homosexual sex wrong?
hotcore	sci knowledge: the center of earth is very hot
id	respondent id number
intecon	interested in economic issues
intintl	interested in international issues
inttech	interested in technologies
makefrnd	religion helps people to make friends
mntlhlth	days of poor mental health past 30 days
obey	to obey
othersex	r had sex with some other last year
othlang	can r speak language other than english/spanish
owngun	have gun in home
popular	to be well liked or popular
racwork	racial makeup of workplace
radioact	sci knowledge: the center of earth is very hot
rank	r’s self ranking of social position
realinc	family income in constant $
realrinc	r’s income in constant $
relpersn	r consider self a religious person
rowngun	does gun belong to r
satfin	satisfaction with financial situation
sexeduc	sex education in public schools
sexornt	sexual orientation
socbar	spend evening at bar
socfrend	spend evening with friends
socommun	spend evening with neighbor
socrel	spend evening with relatives
solarrev	sci knowledge: how long the earth goes around the sun
sprtprsn	r consider self a spiritual person
thnkself	to think for ones self
trustsci	we trust too much in science
viruses	sci knowledge: antibiotics kill viruses as well as bacteria
workhard	to work hard
xmarsex1	is extramarital sex wrong?
year	year for this respondent

These variables are manually picked from 1064 variables which are related to human’s characteristics and background. I expected that the difference in those variables will lead to significant dissention. For instance, Ceteris paribus, parents may approve the constraints to avoid their child being threaten by probable shooting if they have more offsprings. A high-class gentle ment may incline to decrease the risk of gunshot wound. However, a data set containing 56 variables are over cumbersome for regression. Thus, I impose the Lasso and elastic net method for regularization.

Raw data and scripts are attached here

Data processing

Since this assignment focuses on the regularization, the process of data cleaning is skipped here, which is operated with STATA.

library(glmnet)
library(AER)
rm(list = ls())
setwd("D:/All courses/Data Analysis for Economics(Microeconometrics)/HW4")
data<-read.csv("hw4.csv")

#split into training and test data sets
n<-nrow(data)
train_pos<-sample(n,round(n/2))
train<-data[train_pos,]
test<-data[-train_pos,]

#logistic regression
logit<-glm(gunlaw~.,train,family='binomial',control=list(maxit=100))
coeftest(logit)

z test of coefficients:

           Estimate  Std. Error z value  Pr(>|z|)    
(Intercept)  1.0964e+01  3.6123e+00  3.0353 0.0024029 ** 
bigbang      7.0992e-01  4.6875e-01  1.5145 0.1298990    
boyorgrl    -3.7576e-01  3.3289e-01 -1.1288 0.2589957    
childs      -8.3269e-02  8.7675e-02 -0.9497 0.3422448    
chldidel     1.7834e-01  1.0345e-01  1.7239 0.0847286 .  
class       -2.1467e-01  2.3340e-01 -0.9197 0.3577116    
conarmy     -2.2725e-01  2.8863e-01 -0.7873 0.4310932    
conbiz      -1.5100e-02  3.2533e-01 -0.0464 0.9629798    
conchurh     1.2749e-01  2.5209e-01  0.5058 0.6130273    
concong      1.8584e-01  2.8695e-01  0.6476 0.5172267    
concourt    -1.0101e+00  2.8656e-01 -3.5248 0.0004238 ***
condrift    -1.2049e+00  4.9991e-01 -2.4103 0.0159410 *  
conschls    -2.3614e-01  2.6782e-01 -0.8817 0.3779250    
divorce      3.7545e-02  1.6120e-01  0.2329 0.8158302    
earthsun     3.5836e-01  5.8541e-01  0.6122 0.5404370    
electron     4.1608e-01  3.2078e-01  1.2971 0.1945935    
evolved     -6.1902e-01  4.3868e-01 -1.4111 0.1582125    
finalter     7.4788e-02  1.4234e-01  0.5254 0.5992864    
god         -2.9511e-01  1.1203e-01 -2.6343 0.0084321 ** 
grass       -5.7406e-01  3.0865e-01 -1.8599 0.0628944 .  
helpoth      6.3852e-02  2.1485e-01  0.2972 0.7663207    
hlthdays     2.0931e-02  4.5265e-02  0.4624 0.6437917    
homosex1    -1.2441e-01  1.7702e-01 -0.7028 0.4821653    
hotcore      2.7305e-02  6.0172e-01  0.0454 0.9638056    
id          -1.2267e-04  2.0988e-04 -0.5845 0.5589075    
intecon      2.0959e-01  3.4136e-01  0.6140 0.5392253    
intintl     -1.5646e-01  3.1153e-01 -0.5022 0.6154992    
inttech     -3.5710e-01  3.5624e-01 -1.0024 0.3161514    
makefrnd     1.3120e-01  2.5020e-01  0.5244 0.6000042    
mntlhlth     2.2605e-02  2.4234e-02  0.9328 0.3509400    
obey         1.0665e-01  1.7131e-01  0.6225 0.5336035    
othersex    -8.0252e-01  2.4054e-01 -3.3364 0.0008487 ***
othlang     -3.1363e-01  2.9871e-01 -1.0499 0.2937450    
owngun       2.9981e-01  2.5591e-01  1.1715 0.2413875    
popular      9.7919e-02  2.3130e-01  0.4233 0.6720471    
racwork      1.4458e-01  1.1384e-01  1.2701 0.2040450    
radioact    -6.8292e-01  3.3044e-01 -2.0667 0.0387629 *  
rank         5.5001e-02  7.1463e-02  0.7696 0.4415116    
realinc      1.2851e-05  7.0784e-06  1.8156 0.0694380 .  
realrinc    -1.3818e-05  6.5172e-06 -2.1202 0.0339897 *  
relpersn    -3.1808e-01  1.8741e-01 -1.6972 0.0896543 .  
rowngun     -2.2877e-01  1.8780e-01 -1.2182 0.2231533    
satfin       1.8878e-01  1.9551e-01  0.9655 0.3342744    
sexeduc      1.5789e-01  6.8143e-01  0.2317 0.8167655    
sexornt     -1.2067e+00  6.7942e-01 -1.7761 0.0757244 .  
socbar      -3.1598e-01  1.4114e-01 -2.2388 0.0251719 *  
socfrend     2.7524e-01  1.6013e-01  1.7189 0.0856402 .  
socommun     9.2997e-02  1.0928e-01  0.8510 0.3947801    
socrel      -3.0281e-01  1.3105e-01 -2.3107 0.0208506 *  
solarrev    -3.2621e-02  2.1055e-01 -0.1549 0.8768752    
sprtprsn    -7.0377e-02  1.5131e-01 -0.4651 0.6418579    
thnkself     2.5489e-01  1.9263e-01  1.3233 0.1857497    
trustsci0   -5.0583e+00  2.6751e+00 -1.8909 0.0586375 .  
trustsci1    1.8300e+00  1.7832e+00  1.0262 0.3047742    
trustsci2    1.4006e-01  1.6628e+00  0.0842 0.9328736    
trustsci3    1.2898e+00  1.6788e+00  0.7683 0.4422983    
trustsci4    4.4581e-01  1.7148e+00  0.2600 0.7948823    
trustsci5   -4.5711e-01  1.7766e+00 -0.2573 0.7969491    
viruses      4.1056e-01  3.4308e-01  1.1967 0.2314251    
workhard     5.8777e-02  2.1220e-01  0.2770 0.7817946    
xmarsex1     4.2989e-01  3.1604e-01  1.3602 0.1737565    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#predict on test data
pre=predict(logit,test,type='response')
logitpred<-round(pre)
table(logitpred,test$gunlaw,dnn=c("predicted","Disagree"))

         Disagree
predicted   0   1
        0  31  63
        1  86 257
        
logiterr<-1-mean(logitpred==test$gunlaw)
logiterr

[1] 0.3409611

Then I regularize it with Lasso approach:

#Regularized by LASSO
# Fit on training data
x<-model.matrix(gunlaw~.,train) 
y<-train$gunlaw
lassofit.all<-glmnet(x,y,alpha=1,family="binomial") 
plot(lassofit.all,xvar="lambda")

Lasso

# Cross Validation
cv.lasso <- cv.glmnet(x,y,alpha=1,family="binomial") 
plot(cv.lasso)

CV of Lasso

lambda.star <- cv.lasso$lambda.min
lassofit.star <- glmnet(x,y,alpha=1,lambda=lambda.star,family="binomial") 
coef(lassofit.star)

63 x 1 sparse Matrix of class "dgCMatrix"
                   s0
(Intercept)  2.818170e+00
(Intercept)  .           
bigbang      .           
boyorgrl     .           
childs       .           
chldidel     3.383098e-04
class        .           
conarmy      .           
conbiz       .           
conchurh     .           
concong      .           
concourt    -1.634249e-01
condrift     .           
conschls     .           
divorce      .           
earthsun     1.564713e-02
electron     .           
evolved      .           
finalter     .           
god         -3.370955e-02
grass       -2.575110e-01
helpoth      .           
hlthdays     6.726614e-03
homosex1     .           
hotcore      .           
id          -1.210066e-04
intecon      .           
intintl      .           
inttech      .           
makefrnd     .           
mntlhlth     8.770455e-03
obey         .           
othersex    -4.821947e-01
othlang     -2.180727e-01
owngun       1.441947e-01
popular      .           
racwork      7.066309e-02
radioact     .           
rank         .           
realinc      .           
realrinc    -2.394779e-06
relpersn     .           
rowngun     -8.385165e-02
satfin       .           
sexeduc      .           
sexornt     -3.966880e-01
socbar       .           
socfrend     8.685264e-04
socommun     .           
socrel       .           
solarrev     .           
sprtprsn    -6.448268e-02
thnkself     .           
trustsci0    .           
trustsci1    1.989689e-01
trustsci2    .           
trustsci3    3.114173e-01
trustsci4    .           
trustsci5    .           
viruses      .           
workhard     .           
xmarsex1     1.111911e-01
year         .           

# Predict on test data
newx <- model.matrix(gunlaw ~.,test)
lassopred <- predict(lassofit.star,newx,type="class")
table(lassopred,test$gunlaw,dnn=c("predicted","Disagree"))

         Disagree
predicted   0   1
        0   5  17
        1 112 303

lassoerr <- 1-mean(lassopred==test$gunlaw)
lassoerr

[1] 0.2951945

Furthermore, I regularize it via elatic net approach:

#Regularized by Elastic Net
x<-model.matrix(gunlaw~.,train)
y<-train$gunlaw
elastic<-glmnet(x,y,alpha=0.3,family="binomial") 
plot(elastic,xvar="lambda")

Elastic net

#Cross Validation
cv.elastic<-cv.glmnet(x,y,alpha=0.3,family="binomial")
plot(cv.elastic)

CV of Elastic net

lambda.star_elastic<-cv.elastic$lambda.min
elastic.star<-glmnet(x,y,alpha=0.3,lambda=lambda.star_elastic,family="binomial")
coef(elastic.star)

63 x 1 sparse Matrix of class "dgCMatrix"
                       s0
(Intercept)  2.395657e+00
(Intercept)  .           
bigbang      .           
boyorgrl     .           
childs       .           
chldidel     2.064755e-03
class        .           
conarmy      .           
conbiz       .           
conchurh     .           
concong      .           
concourt    -1.001474e-01
condrift     .           
conschls     .           
divorce      .           
earthsun     1.525546e-02
electron     .           
evolved      .           
finalter     .           
god         -3.376082e-02
grass       -1.576088e-01
helpoth      .           
hlthdays     8.515723e-03
homosex1     .           
hotcore      .           
id          -1.170116e-04
intecon      .           
intintl      .           
inttech      .           
makefrnd     .           
mntlhlth     8.226933e-03
obey         .           
othersex    -4.064984e-01
othlang     -2.006813e-01
owngun       1.458341e-01
popular      .           
racwork      6.459718e-02
radioact     .           
rank         .           
realinc      .           
realrinc    -2.167317e-06
relpersn     .           
rowngun     -9.000636e-02
satfin       3.608882e-03
sexeduc      .           
sexornt     -3.233116e-01
socbar       .           
socfrend     4.087481e-03
socommun     .           
socrel       .           
solarrev     .           
sprtprsn    -5.726824e-02
thnkself     .           
trustsci0    .           
trustsci1    1.593161e-01
trustsci2    .           
trustsci3    2.629314e-01
trustsci4    .           
trustsci5    .           
viruses      .           
workhard     .           
xmarsex1     7.945114e-02
year         .          

#Predict on the test data
newx<-model.matrix(gunlaw~.,test)
elasticpred<-predict(elastic.star,newx,type='class')
table(elasticpred,test$gunlaw,dnn=c("Predicted","Disagree"))

         Disagree
Predicted   0   1
        0   5  13
        1 112 307

elasticerr<-1-mean(elasticpred==test$gunlaw)
elasticerr

[1] 0.2860412

Conclusion

From result above, with these two approaches we can draw an illustration about people who are more incline to support the restriction of private firearm: a gun owner with more expected children, more rational and know more about science . On the contrary people who have more confidence in courts and legal system are in the opposite site. The impact of income of the respondents and their family surprisingly is not signifcant in this process, which deviates from our previous expectation.

Reference

Model_Selection_and_Regularization. Jiaming Mao

Challenge(Regularization)

Assignment 4

Challenge of Regularization

Introduction

Data Description

Data processing

Conclusion

Reference

CATALOG

FEATURED TAGS

FRIENDS