Title: | Multivariate and Propensity Score Matching with Balance Optimization |
---|---|
Description: | Provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. A variety of univariate and multivariate metrics to determine if balance has been obtained are also provided. For details, see the paper by Jasjeet Sekhon (2011, <doi:10.18637/jss.v042.i07>). |
Authors: | Jasjeet Singh Sekhon [aut, cre], Theo Saarinen [aut] |
Maintainer: | Jasjeet Singh Sekhon <[email protected]> |
License: | GPL-3 |
Version: | 4.10-15 |
Built: | 2024-12-13 05:58:27 UTC |
Source: | https://github.com/jasjeetsekhon/matching |
This function provides a number of univariate balance metrics. Generally, users should call MatchBalance and not this function directly.
balanceUV(Tr, Co, weights = rep(1, length(Co)), exact = FALSE, ks=FALSE, nboots = 1000, paired=TRUE, match=FALSE, weights.Tr=rep(1,length(Tr)), weights.Co=rep(1,length(Co)), estimand="ATT")
Tr | A vector containing the treatment observations. |
Co | A vector containing the control observations. |
weights | A vector containing the observation-specific weights. Only use this option when the treatment and control observations are paired (as they are after matching). |
exact | A logical flag indicating whether the exact Wilcoxon test should be used instead of the test with a correction. See wilcox.test for details. |
ks | A logical flag for whether the univariate bootstrap Kolmogorov-Smirnov (KS) test should be calculated. If the ks option is set to TRUE, the univariate KS test is calculated for all non-dichotomous variables. The bootstrap KS test is consistent even for non-continuous variables. See ks.boot for more details. |
nboots | The number of bootstrap samples to be run for the ks test. |
paired | A flag for whether the paired t.test should be used. |
match | A flag for whether the Tr and Co objects are the result of a call to Match. |
weights.Tr | A vector of weights for the treated observations. |
weights.Co | A vector of weights for the control observations. |
estimand | This determines whether the standardized mean difference returned by the sdiff object is standardized by the variance of the treatment observations (if the estimand is "ATT" or "ATE") or by the variance of the control observations (if the estimand is "ATC"). |
sdiff | The standardized difference between the treated and control units multiplied by 100. That is, 100 times the mean difference between treatment and control units divided by the standard deviation of the treatment observations alone if the estimand is either "ATT" or "ATE". The standard deviation of the control observations is used if the estimand is "ATC". |
sdiff.pooled | The standardized difference between the treated and control units multiplied by 100 using the pooled variance. That is, 100 times the mean difference between treatment and control units divided by the pooled standard deviation as in Rosenbaum and Rubin (1985). |
mean.Tr | The mean of the treatment group. |
mean.Co | The mean of the control group. |
var.Tr | The variance of the treatment group. |
var.Co | The variance of the control group. |
p.value | The p-value from the two-sided weighted t.test. |
var.ratio | var.Tr/var.Co. |
ks | The object returned by ks.boot. |
tt | The object returned by the two-sided weighted t.test. |
qqsummary | The return object from a call to qqstats with standardization, i.e., balance tests based on the empirical CDF. |
qqsummary.raw | The return object from a call to qqstats without standardization, i.e., balance tests based on the empirical QQ-plot which retain the scale of the variable. |
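The sdiff, sdiff.pooled and var.ratio quantities described above can be reproduced by hand for the unweighted case. The following is only a sketch in base R: std_diffs is a hypothetical helper, not a function of the package, and it assumes that the pooled standard deviation is the square root of the average of the two group variances.

```r
# Hypothetical helper (not part of the Matching package): unweighted
# versions of the sdiff, sdiff.pooled and var.ratio quantities.
std_diffs <- function(Tr, Co, estimand = "ATT") {
  mean_diff <- mean(Tr) - mean(Co)
  # "ATT" and "ATE" standardize by the treated SD, "ATC" by the control SD.
  sd_ref <- if (estimand == "ATC") sd(Co) else sd(Tr)
  # Assumption: pooled SD = sqrt of the average of the two variances.
  sd_pooled <- sqrt((var(Tr) + var(Co)) / 2)
  c(sdiff        = 100 * mean_diff / sd_ref,
    sdiff.pooled = 100 * mean_diff / sd_pooled,
    var.ratio    = var(Tr) / var(Co))
}

Tr <- c(2, 4, 6, 8)   # made-up treated values
Co <- c(1, 2, 3, 4)   # made-up control values
res <- std_diffs(Tr, Co)
print(round(res, 2))
```

For weighted or matched data the package's own output should be preferred, since this sketch ignores observation weights.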
Jasjeet S. Sekhon, UC Berkeley, [email protected], https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932-945. https://www.jsekhon.com
Rosenbaum, Paul R. and Donald B. Rubin. 1985. “Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score.” The American Statistician 39(1): 33-38.
Hollander, Myles and Douglas A. Wolfe. 1973. Nonparametric Statistical Methods. New York: John Wiley & Sons.
Also see summary.balanceUV, qqstats, ks.boot, Match, GenMatch, MatchBalance, GerberGreenImai, lalonde
data(lalonde)
attach(lalonde)
foo <- balanceUV(re75[treat==1], re75[treat!=1])
summary(foo)
This function finds optimal balance using multivariate matching where a genetic search algorithm determines the weight each covariate is given. Balance is determined by examining cumulative probability distribution functions of a variety of standardized statistics. By default, these statistics include t-tests and Kolmogorov-Smirnov tests. A variety of descriptive statistics based on empirical-QQ (eQQ) plots can also be used or any user provided measure of balance. The statistics are not used to conduct formal hypothesis tests, because no measure of balance is a monotonic function of bias and because balance should be maximized without limit. The object returned by GenMatch can be supplied to the Match function (via the Weight.matrix option) to obtain causal estimates. GenMatch uses genoud to perform the genetic search. Using the cluster option, one may use multiple computers, CPUs or cores to perform parallel computations.
GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1, weights=NULL, pop.size = 100, max.generations=100, wait.generations=4, hard.generation.limit=FALSE, starting.values=rep(1,ncol(X)), fit.func="pvals", MemoryMatrix=TRUE, exact=NULL, caliper=NULL, replace=TRUE, ties=TRUE, CommonSupport=FALSE, nboots=0, ks=TRUE, verbose=FALSE, distance.tolerance=1e-05, tolerance=sqrt(.Machine$double.eps), min.weight=0, max.weight=1000, Domains=NULL, print.level=2, project.path=NULL, paired=TRUE, loss=1, data.type.integer=FALSE, restrict=NULL, cluster=FALSE, balance=TRUE, ...)
Tr | A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X | A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
BalanceMatrix | A matrix containing the variables we wish to achieve balance on. This is by default equal to X, but it can in principle be a matrix which contains more or fewer variables than X or variables which are transformed in various ways. See the examples. |
estimand | A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect, and "ATC" is the sample average treatment effect for the controls. |
M | A scalar for the number of matches which should be found. The default is one-to-one matching. Also see the ties option. |
weights | A vector the same length as Tr which provides observation-specific weights. |
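pop.size | Population Size. This is the number of individuals genoud uses to solve the optimization problem. See genoud for more details. |
max.generations | Maximum Generations. This is the maximum number of generations that genoud will run when optimizing. This is a soft limit: it is binding only if hard.generation.limit has been set to TRUE. Otherwise, wait.generations controls when optimization stops. |
wait.generations | If there is no improvement in the objective function in this number of generations, optimization will stop. The other options controlling termination are max.generations and hard.generation.limit. |
hard.generation.limit | This logical variable determines whether the max.generations variable is a binding constraint. If hard.generation.limit is FALSE, the algorithm may exceed the max.generations count if the objective function has improved within a given number of generations (determined by wait.generations). |
starting.values | This vector's length is equal to the number of variables in X. It contains the starting weight each of the variables is given. |
fit.func | The balance metric GenMatch should optimize. The default is "pvals"; the user may also provide a function. |
MemoryMatrix | This variable controls whether genoud sets up a memory matrix. |
exact | A logical scalar or vector for whether exact matching should be done. If a logical scalar is provided, that logical value is applied to all covariates in X. If a logical vector is provided, a logical value should be provided for each covariate in X. |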
caliper | A scalar or vector denoting the caliper(s) which should be used when matching. A caliper is the distance which is acceptable for any match. Observations which are outside of the caliper are dropped. If a scalar caliper is provided, this caliper is used for all covariates in X. If a vector of calipers is provided, a caliper value should be provided for each covariate in X. The caliper is interpreted to be in standardized units. |
replace | A logical flag for whether matching should be done with replacement. Note that if FALSE, the order of matches generally matters. |
ties | A logical flag for whether ties should be handled deterministically. By default ties=TRUE. If ties=FALSE, ties are randomly broken. |
CommonSupport | This logical flag implements the usual procedure by which observations outside of the common support of a variable (usually the propensity score) across treatment and control groups are discarded. The caliper option is to be preferred to this option because CommonSupport only drops outliers and leaves inliers, while the caliper option drops both. |
nboots | The number of bootstrap samples to be run for the ks test. By default this option is set to zero, so no bootstraps are done. |
ks | A logical flag for whether the univariate bootstrap Kolmogorov-Smirnov (KS) test should be calculated. If the ks option is set to TRUE, the univariate KS test is calculated for all non-dichotomous variables. The bootstrap KS test is consistent even for non-continuous variables. By default, the bootstrap KS test is not used. To change this see the nboots option. |
verbose | A logical flag for whether details of each fitness evaluation should be printed. Verbose is set to FALSE if the cluster option is used. |
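distance.tolerance | This is a scalar which is used to determine if distances between two observations are different from zero. Values less than distance.tolerance are deemed to be equal to zero. |
tolerance | This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |
min.weight | This is the minimum weight any variable may be given. |
max.weight | This is the maximum weight any variable may be given. |
Domains | This is a ncol(X) by 2 matrix. The first column is the lower bound, and the second column is the upper bound for each variable over which genoud will search for weights. If the user does not provide this matrix, the bounds for each variable are determined by the min.weight and max.weight options. |
print.level | This option controls the level of printing. There are four possible levels: 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug). If level 2 is selected, GenMatch will print details about the population at each generation. |
project.path | This is the path of the genoud project file. |
paired | A flag for whether the paired t.test should be used when determining balance. |
loss | The loss function to be optimized. The default value, 1, implies "lexical" optimization: the balance statistics are sorted from the most discrepant to the least, and weights are picked which minimize the maximum discrepancy, with ties broken by the next largest discrepancy. If the value of 2 is used, then only the maximum discrepancy is examined. |
data.type.integer | By default, floating-point weights are considered. If this option is set to TRUE, search will be done over integer weights. |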
restrict | A matrix which restricts the possible matches. This matrix has one row for each restriction and three columns. The first two columns contain the two observation numbers which are to be restricted (for example 4 and 20), and the third column is the restriction imposed on the observation-pair. Negative numbers in the third column imply that the two observations cannot be matched under any circumstances, and positive numbers are passed on as the distance between the two observations for the matching algorithm. The most commonly used positive restriction is 0, which implies that the two observations will always be matched. Exclusion restrictions are even more common. For example, if we want to exclude the observation pair 4 and 20 and the pair 6 and 55 from being matched, the restrict matrix would be: restrict=rbind(c(4,20,-1),c(6,55,-1)) |
cluster | This can either be an object of the 'cluster' class returned by one of the makeCluster commands in the parallel package or a vector of machine names so that GenMatch can set up the cluster automatically. |
balance | This logical flag controls if load balancing is done across the cluster. Load balancing can result in better cluster utilization; however, increased communication can reduce performance. This option is best used if each individual call to Match takes at least several minutes to calculate or if the nodes in the cluster vary significantly in their performance. |
... | Other options which are passed on to genoud. |
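The CommonSupport argument above describes discarding observations that fall outside the region where treated and control values of a variable overlap. A minimal base-R sketch of one common way to do this follows. The helper common_support is hypothetical, and the overlap rule it uses (the larger of the two group minima up to the smaller of the two group maxima) is an assumption rather than a statement of what the package does internally.

```r
# Hypothetical helper, not part of the Matching package.
# Returns TRUE for observations inside the assumed overlap region.
common_support <- function(score, Tr) {
  lo <- max(min(score[Tr == 1]), min(score[Tr == 0]))
  hi <- min(max(score[Tr == 1]), max(score[Tr == 0]))
  score >= lo & score <= hi
}

Tr    <- c(1, 1, 1, 0, 0, 0)                  # made-up treatment indicator
score <- c(0.9, 0.6, 0.4, 0.5, 0.3, 0.05)     # made-up propensity scores
keep  <- common_support(score, Tr)
print(which(!keep))   # observations that would be discarded
```

Dropping observations in this way changes the sample over which the effect is estimated, which is worth keeping in mind when interpreting results.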
value | The fit values at the solution. By default, this is a vector of p-values sorted from the smallest to the largest. There will generally be twice as many p-values as there are variables in BalanceMatrix, unless there are dichotomous variables in this matrix. |
par | A vector of the weights given to each variable in X. |
Weight.matrix | A matrix whose diagonal corresponds to the weight given to each variable in X. This object corresponds to the Weight.matrix option of the Match function. |
matches | A matrix where the first column contains the row numbers of the treated observations in the matched dataset. The second column contains the row numbers of the control observations. The third column contains the weight that each matched pair is given. These objects may not correspond respectively to the index.treated, index.control and weights objects which are returned by Match because they may be ordered in a different way. |
ecaliper | The size of the enforced caliper on the scale of the X variables. |
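To illustrate the relationship between a caliper and the enforced caliper (ecaliper) on the scale of the matching variables, here is a small base-R sketch. It assumes the caliper is expressed in standard-deviation units and that each covariate's unweighted sample standard deviation sets the scale; the data and object names are made up for illustration.

```r
# Illustration only; not the package's internal code.
X <- cbind(age = c(20, 30, 40, 50), educ = c(8, 10, 12, 14))  # made-up covariates
caliper <- 0.25   # assumed to be in standard-deviation units

# Assumption: the enforced caliper is the caliper times each column's SD.
ecaliper <- caliper * apply(X, 2, sd)
print(ecaliper)

# A candidate pair is acceptable only if every covariate is within its caliper.
within <- all(abs(X[1, ] - X[2, ]) <= ecaliper)
print(within)
```

With observation weights, the relevant standard deviations would presumably be weighted, so the ecaliper reported by the package is the authoritative value.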
Jasjeet S. Sekhon, UC Berkeley, [email protected], https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932-945. https://www.jsekhon.com
Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. “Genetic Optimization Using Derivatives: Theory and Application to Nonlinear Models.” Political Analysis 7: 187-210. https://www.jsekhon.com
Also see Match, summary.Match, MatchBalance, genoud, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
data(lalonde)
attach(lalonde)

# The covariates we want to match on
X <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

# The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

# Let's call GenMatch() to find the optimal weight to give each
# covariate in 'X' so as we have achieved balance on the covariates in
# 'BalanceMat'. This is only an example so we want GenMatch to be quick
# so the population size has been set to be only 16 via the 'pop.size'
# option. This is *WAY* too small for actual problems.
# For details see https://www.jsekhon.com.
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

# The outcome variable
Y <- re78/1000

# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)

# Let's determine if balance has actually been obtained on the variables of interest
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
                   re75+ re74+ I(re74*re75),
                   match.out=mout, nboots=500)

# For more examples see: https://www.jsekhon.com.
This is the dataset used by Imai (2005) to replicate and evaluate the field experiment done by Gerber and Green (2000). The accompanying demo replicates Imai's propensity score model which is then used to estimate the causal effect of get-out-the-vote telephone calls on turnout.
data(GerberGreenImai)
A data frame with 10829 observations on the following 26 variables.
Number persons in household
Ward of residence
Asked to commit to voting
Sent mail
Phone batch #1
Personal contact attempted
Content of message
Personal contact occurred
Number of mailings sent
Age of respondent
Democratic or Republican
Abstained in 1996
Voted in 1996
Phone batch #2
Voted in 1998
Script read to phone respondents
Contacted by phone in batch #2
Contacted by phone in batch #1
Contacted by phone
Phone contact attempted (no blood or blood/civic)
Phone contact attempted (no blood)
Contact occurred in phntrt1
Contact occurred in phntrt2
New voter
Contacted by phone
Age squared
The demo provided, entitled GerberGreenImai, uses Imai's propensity score model to estimate the causal effect of get-out-the-vote telephone calls on turnout. The propensity score model fails to balance age.
Gerber, Alan S. and Donald P. Green. 2000. “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment.” American Political Science Review 94: 653-663.
Gerber, Alan S. and Donald P. Green. 2005. “Correction to Gerber and Green (2000), replication of disputed findings, and reply to Imai (2005).” American Political Science Review 99: 301-313.
Imai, Kosuke. 2005. “Do Get-Out-The-Vote Calls Reduce Turnout? The Importance of Statistical Methods for Field Experiments.” American Political Science Review 99: 283-300.
Hansen, Ben B. and Jake Bowers. Forthcoming. “Attributing Effects to a Cluster Randomized Get-Out-The-Vote Campaign.” Journal of the American Statistical Association.
Also see Match, MatchBalance, GenMatch, balanceUV, ks.boot, lalonde
This function executes a bootstrap version of the univariate Kolmogorov-Smirnov test which provides correct coverage even when the distributions being compared are not entirely continuous. Ties are allowed with this test unlike the traditional Kolmogorov-Smirnov test.
ks.boot(Tr, Co, nboots=1000, alternative = c("two.sided","less","greater"), print.level=0)
Tr | A vector containing the treatment observations. |
Co | A vector containing the control observations. |
nboots | The number of bootstraps to be performed. These are, in fact, really Monte Carlo simulations which are performed in order to determine the proper p-value from the empirical distribution. |
alternative | Indicates the alternative hypothesis and must be one of "two.sided" (default), "less", or "greater". You can specify just the initial letter. See ks.test. |
print.level | If this is greater than 1, then the simulation count is printed out while the simulations are being done. |
ks.boot.pvalue | The bootstrap p-value of the Kolmogorov-Smirnov test for the hypothesis that the probability densities for both the treated and control groups are the same. |
ks | Return object from ks.test. |
nboots | The number of bootstraps which were completed. |
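The idea behind a bootstrap Kolmogorov-Smirnov test can be sketched in a few lines of base R: compute the KS statistic on the observed groups, then repeatedly resample from the pooled data and count how often the resampled statistic is at least as large. The helpers ks_stat and ks_boot_sketch below are hypothetical, cover only the two-sided case, and are not the package's implementation.

```r
# Hypothetical helpers; a sketch of the resampling idea only.
ks_stat <- function(x, y) {
  z <- sort(unique(c(x, y)))
  # Largest vertical gap between the two empirical CDFs.
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

ks_boot_sketch <- function(Tr, Co, nboots = 1000) {
  observed <- ks_stat(Tr, Co)
  pooled   <- c(Tr, Co)
  n_tr     <- length(Tr)
  exceed   <- 0
  for (b in seq_len(nboots)) {
    # Resample the pooled data with replacement and split it into
    # pseudo-treated and pseudo-control groups of the original sizes.
    draw <- sample(pooled, length(pooled), replace = TRUE)
    stat <- ks_stat(draw[seq_len(n_tr)], draw[-seq_len(n_tr)])
    if (stat >= observed - 1e-12) exceed <- exceed + 1
  }
  exceed / nboots   # bootstrap p-value
}

set.seed(1)
p_same <- ks_boot_sketch(c(1, 1, 2, 3, 3), c(1, 1, 2, 3, 3), nboots = 200)
p_diff <- ks_boot_sketch(1:20, 101:120, nboots = 200)
print(c(p_same = p_same, p_diff = p_diff))
```

Because the statistic is computed from empirical CDFs, ties in the data pose no problem for this procedure, which is the property the description above emphasizes.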
Jasjeet S. Sekhon, UC Berkeley, [email protected], https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932-945. https://www.jsekhon.com
Abadie, Alberto. 2002. “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models.” Journal of the American Statistical Association 97(457): 284-292.
Also see summary.ks.boot, qqstats, balanceUV, Match, GenMatch, MatchBalance, GerberGreenImai, lalonde
#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs." Journal of the American Statistical Association 94 (448):
# 1053-1062.
#
data(lalonde)

# Estimate the propensity model
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)

# Save data objects
X <- glm1$fitted
Y <- lalonde$re78
Tr <- lalonde$treat

# One-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option defaults to ATT).
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)
summary(rr)

# Do we have balance on 1975 income after matching?
ks <- ks.boot(lalonde$re75[rr$index.treated], lalonde$re75[rr$index.control], nboots=500)
summary(ks)
Dataset used by Dehejia and Wahba (1999) to evaluate propensity score matching.
data(lalonde)
A data frame with 445 observations on the following 12 variables.
age | age in years. |
educ | years of schooling. |
black | indicator variable for blacks. |
hisp | indicator variable for Hispanics. |
married | indicator variable for marital status. |
nodegr | indicator variable for high school diploma. |
re74 | real earnings in 1974. |
re75 | real earnings in 1975. |
re78 | real earnings in 1978. |
u74 | indicator variable for earnings in 1974 being zero. |
u75 | indicator variable for earnings in 1975 being zero. |
treat | an indicator variable for treatment status. |
Two demos are provided which use this dataset. The first, DehejiaWahba, replicates one of the models from Dehejia and Wahba (1999). The second demo, AbadieImbens, replicates the models produced by Abadie and Imbens in their Matlab code. Many of these models are found to produce good balance for the Lalonde data.
Dehejia, Rajeev and Sadek Wahba. 1999. “Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs.” Journal of the American Statistical Association 94(448): 1053-1062.
LaLonde, Robert. 1986. “Evaluating the Econometric Evaluations of Training Programs.” American Economic Review 76:604-620.
Also see Match, GenMatch, MatchBalance, balanceUV, ks.boot, GerberGreenImai
Match implements a variety of algorithms for multivariate matching including propensity score, Mahalanobis and inverse variance matching. The function is intended to be used in conjunction with the MatchBalance function which determines the extent to which Match has been able to achieve covariate balance. In order to do propensity score matching, one should estimate the propensity model before calling Match, and then send Match the propensity score to use. Match enables a wide variety of matching options including matching with or without replacement, bias adjustment, different methods for handling ties, exact and caliper matching, and a method for the user to fine tune the matches via a general restriction matrix. Variance estimators include the usual Neyman standard errors, Abadie-Imbens standard errors, and robust variances which do not assume a homogeneous causal effect. The GenMatch function can be used to automatically find balance via a genetic search algorithm which determines the optimal weight to give each covariate.
Match(Y=NULL, Tr, X, Z = X, V = rep(1, length(Y)), estimand = "ATT", M = 1, BiasAdjust = FALSE, exact = NULL, caliper = NULL, replace=TRUE, ties=TRUE, CommonSupport=FALSE,Weight = 1, Weight.matrix = NULL, weights = NULL, Var.calc = 0, sample = FALSE, restrict=NULL, match.out = NULL, distance.tolerance = 1e-05, tolerance=sqrt(.Machine$double.eps), version="standard")
Y | A vector containing the outcome of interest. Missing values are not allowed. An outcome vector is not required because the matches generated will be the same regardless of the outcomes. Of course, without any outcomes no causal effect estimates will be produced, only a matched dataset. |
Tr | A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X | A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. All columns of this matrix must have positive variance or Match will return an error. |
Z | A matrix containing the covariates for which we wish to make bias adjustments. |
V | A matrix containing the covariates for which the variance of the causal effect may vary. Also see the Var.calc option, which takes precedence. |
estimand | A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect, and "ATC" is the sample average treatment effect for the controls. |
M | A scalar for the number of matches which should be found. The default is one-to-one matching. Also see the ties option. |
BiasAdjust | A logical scalar for whether regression adjustment should be used. See the Z matrix. |
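exact | A logical scalar or vector for whether exact matching should be done. If a logical scalar is provided, that logical value is applied to all covariates in X. If a logical vector is provided, a logical value should be provided for each covariate in X. |
caliper | A scalar or vector denoting the caliper(s) which should be used when matching. A caliper is the distance which is acceptable for any match. Observations which are outside of the caliper are dropped. If a scalar caliper is provided, this caliper is used for all covariates in X. If a vector of calipers is provided, a caliper value should be provided for each covariate in X. The caliper is interpreted to be in standardized units. |
replace | A logical flag for whether matching should be done with replacement. Note that if FALSE, the order of matches generally matters. |
ties | A logical flag for whether ties should be handled deterministically. By default ties=TRUE. If ties=FALSE, ties are randomly broken. |
CommonSupport | This logical flag implements the usual procedure by which observations outside of the common support of a variable (usually the propensity score) across treatment and control groups are discarded. The caliper option is to be preferred to this option because CommonSupport only drops outliers and leaves inliers, while the caliper option drops both. |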
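Weight | A scalar for the type of weighting scheme the matching algorithm should use when weighting each of the covariates in X. The default value of 1 denotes that weights are equal to the inverse of the variances. 2 denotes the Mahalanobis distance metric, and 3 denotes that the user will supply a weight matrix (Weight.matrix). |
Weight.matrix | This matrix denotes the weights the matching algorithm uses when weighting each of the covariates in X (see the Weight option). For most uses, this matrix has zeros in the off-diagonal cells. This matrix can be used to weight some variables more than others. For example, if X contains three variables and we want to match as closely as possible on the first, the first diagonal element can be given a much larger weight than the other two. |
weights | A vector the same length as Y which provides observation-specific weights. |
Var.calc | A scalar for the variance estimate that should be used. By default Var.calc=0, which means that homoscedasticity is assumed. For values of Var.calc > 0, robust variances are calculated using Var.calc matches. |
sample | A logical flag for whether the population or sample variance is returned. |
distance.tolerance | This is a scalar which is used to determine if distances between two observations are different from zero. Values less than distance.tolerance are deemed to be equal to zero. |
tolerance | This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |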
restrict | A matrix which restricts the possible matches. This matrix has one row for each restriction and three columns. The first two columns contain the two observation numbers which are to be restricted (for example 4 and 20), and the third column is the restriction imposed on the observation-pair. Negative numbers in the third column imply that the two observations cannot be matched under any circumstances, and positive numbers are passed on as the distance between the two observations for the matching algorithm. The most commonly used positive restriction is 0, which implies that the two observations will always be matched. Exclusion restrictions are even more common. For example, if we want to exclude the observation pair 4 and 20 and the pair 6 and 55 from being matched, the restrict matrix would be: restrict=rbind(c(4,20,-1),c(6,55,-1)) |
match.out | The return object from a previous call to Match. If this object is provided, then Match will use the matches found by the previous invocation of the function. |
version | The version of the code to be used. The "fast" C/C++ version of the code does not calculate Abadie-Imbens standard errors. Additional speed can be obtained by setting ties=FALSE or replace=FALSE if the dataset is large and/or has many ties. |
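The three-column layout described for the restrict argument can be assembled with rbind, one row per observation pair. The sketch below is plain base R; the third row, its distance of 2.5, and the column names are made up for illustration and are not required by the package.

```r
# Each row is c(observation 1, observation 2, restriction).
restrict <- rbind(
  c(4, 20, -1),   # negative: observations 4 and 20 may never be matched
  c(6, 55, -1),   # negative: observations 6 and 55 may never be matched
  c(7, 31, 2.5)   # positive: 2.5 is passed on as the distance (made-up value)
)
# Column names are illustrative only.
colnames(restrict) <- c("obs1", "obs2", "restriction")

# List the excluded pairs as a quick sanity check before matching.
excluded <- restrict[restrict[, "restriction"] < 0, c("obs1", "obs2"), drop = FALSE]
print(excluded)
```

A matrix built this way would then be supplied through the restrict argument; checking the excluded pairs first helps catch transposed observation numbers.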
This function is intended to be used in conjunction with the MatchBalance function which checks if the results of this function have actually achieved balance. The results of this function can be summarized by a call to the summary.Match function. If one wants to do propensity score matching, one should estimate the propensity model before calling Match, and then place the fitted values in the X matrix; see the provided example.
The GenMatch function can be used to automatically find balance by the use of a genetic search algorithm which determines the optimal weight to give each covariate. The object returned by GenMatch can be supplied to the Weight.matrix option of Match to obtain estimates.
Match is often much faster with large datasets if ties=FALSE or replace=FALSE, i.e., if matching is done by randomly breaking ties or without replacement. Also see the Matchby function. It provides a wrapper for Match which is much faster for large datasets when it can be used.
Three demos are included: GerberGreenImai, DehejiaWahba, and AbadieImbens. These can be run by calling the demo function such as by demo(DehejiaWahba).
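To make the matching step concrete, here is a deliberately simplified base-R sketch of one-to-one nearest-neighbour matching with replacement on a single score, followed by a difference-in-means estimate for the treated. The helper nn_match_att is hypothetical and is not how Match is implemented: it ignores ties (only the first closest control is kept), observation weights, calipers, bias adjustment and standard errors.

```r
# Hypothetical helper, not part of the Matching package.
nn_match_att <- function(Y, Tr, score) {
  treated  <- which(Tr == 1)
  controls <- which(Tr == 0)
  # For each treated unit, pick the control with the closest score.
  # Controls can be reused (matching with replacement).
  matched <- vapply(treated, function(i) {
    controls[which.min(abs(score[controls] - score[i]))]
  }, integer(1))
  list(index.treated = treated,
       index.control = matched,
       est = mean(Y[treated] - Y[matched]))
}

Tr    <- c(1, 1, 0, 0, 0)                    # made-up treatment indicator
score <- c(0.80, 0.30, 0.75, 0.35, 0.10)     # made-up propensity scores
Y     <- c(10, 6, 7, 5, 1)                   # made-up outcomes
out <- nn_match_att(Y, Tr, score)
print(out)
```

The element names index.treated and index.control are borrowed from the value list of Match purely for readability; real analyses should rely on Match itself, which handles ties and variance estimation.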
est | The estimated average causal effect. |
se | The Abadie-Imbens standard error. This standard error has correct coverage if X consists of either covariates or a known propensity score because it takes into account the uncertainty of the matching procedure. |
est.noadj | The estimated average causal effect without any BiasAdjust. If BiasAdjust is not requested, this is the same as est. |
se.standard | The usual standard error. This is the standard error calculated on the matched data using the usual method of calculating the difference of means (between treated and control) weighted by the observation weights provided by weights. |
se.cond | The conditional standard error. The practitioner should not generally use this. |
mdata | A list which contains the matched datasets produced by Match. Three datasets are included in this list: Y, Tr and X. |
index.treated | A vector containing the observation numbers from the original dataset for the treated observations in the matched dataset. This index in conjunction with index.control can be used to recover the matched dataset produced by Match. |
index.control | A vector containing the observation numbers from the original data for the control observations in the matched data. This index in conjunction with index.treated can be used to recover the matched dataset produced by Match. |
index.dropped | A vector containing the observation numbers from the original data which were dropped (if any) in the matched dataset because of various options such as caliper and exact. |
weights | A vector of weights. There is one weight for each matched-pair in the matched dataset. If all of the observations had a weight of 1 on input, then each matched-pair will have a weight of 1 on output if there are no ties. |
orig.nobs | The original number of observations in the dataset. |
orig.wnobs | The original number of weighted observations in the dataset. |
orig.treated.nobs | The original number of treated observations (unweighted). |
nobs | The number of observations in the matched dataset. |
wnobs | The number of weighted observations in the matched dataset. |
caliper | The caliper which was used. |
ecaliper | The size of the enforced caliper on the scale of the X variables. |
exact | The value of the exact function argument. |
ndrops | The number of weighted observations which were dropped either because of caliper or exact matching. This number, unlike ndrops.matches, takes into account observation-specific weights which the user may have provided via the weights argument. |
ndrops.matches | The number of matches which were dropped either because of caliper or exact matching. |
Jasjeet S. Sekhon, UC Berkeley, [email protected], https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932-945. https://www.jsekhon.com
Abadie, Alberto and Guido Imbens. 2006. “Large Sample Properties of Matching Estimators for Average Treatment Effects.” Econometrica 74(1): 235-267.
Imbens, Guido. 2004. Matching Software for Matlab and Stata.
Also see summary.Match, GenMatch, MatchBalance, Matchby, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs." Journal of the American Statistical Association 94 (448):
# 1053-1062.

library(Matching)
data(lalonde)

#
# Estimate the propensity model
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)

#
# save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

#
# one-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option defaults to ATT).
#
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)
summary(rr)

# Let's check the covariate balance
# 'nboots' is set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.
mb <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                   hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                   u74 + u75, data=lalonde, match.out=rr, nboots=10)
This function provides a variety of balance statistics useful for determining if balance exists in any unmatched dataset and in matched datasets produced by the Match function. Matching is performed by the Match function, and MatchBalance is used to determine if Match was successful in achieving balance on the observed covariates.
MatchBalance(formul, data = NULL, match.out = NULL, ks = TRUE, nboots=500, weights=NULL, digits=5, paired=TRUE, print.level=1)
formul |
This formula does not estimate any model. The formula is simply an efficient way to use the R modeling language to list the variables we wish to obtain univariate balance statistics for. The dependent variable in the formula is usually the treatment indicator. One should include many functions of the observed covariates. Generally, one should request balance statistics on more higher-order terms and interactions than were used to conduct the matching itself. |
data |
A data frame which contains all of the variables in the formula. If a data frame is not provided, the variables are obtained via lexical scoping. |
match.out |
The output object from the |
ks |
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If the ks option
is set to true, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. See |
weights |
An optional vector of observation specific weights. |
nboots |
The number of bootstrap samples to be run. If zero, no
bootstraps are done. Bootstrapping is highly recommended because
the bootstrapped Kolmogorov-Smirnov test provides correct coverage
even when the distributions being compared are not continuous. At
least 500 |
digits |
The number of significant digits that should be displayed. |
paired |
A flag for whether the paired |
print.level |
The amount of printing to be done. If zero, there is no printing. If one, the results are summarized. If two, details of the computations are printed. |
This function can be used to determine if there is balance in the pre- and/or post-matching datasets. Differences of means between treatment and control groups are provided as well as a variety of summary statistics for the empirical CDF (eCDF) and empirical-QQ (eQQ) plot between the two groups. The eCDF results are the standardized mean, median and maximum differences in the empirical CDF. The eQQ results are summaries of the raw differences in the empirical-QQ plot.
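To make the eCDF summaries concrete, here is a small base R sketch that computes mean, median and maximum differences between two empirical CDFs. The samples are made up, and evaluating both CDFs at the pooled unique values is an assumption about one reasonable way to compute such summaries, not necessarily the package's exact procedure.

```r
# Two hypothetical samples of one covariate.
treated <- c(1, 2, 2, 5)
control <- c(1, 3, 4, 4)

# Evaluate both empirical CDFs on the pooled set of observed values.
vals <- sort(unique(c(treated, control)))
F.tr <- ecdf(treated)(vals)
F.co <- ecdf(control)(vals)

# Absolute eCDF differences and their summaries.
ecdf.diff <- abs(F.tr - F.co)
ecdf.stats <- c(meandiff   = mean(ecdf.diff),
                mediandiff = median(ecdf.diff),
                maxdiff    = max(ecdf.diff))
print(ecdf.stats)
```

Because the differences live on the probability scale, these summaries are comparable across covariates measured in different units, which is the sense in which they are standardized.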
Two univariate tests are also provided: the t-test and the bootstrap Kolmogorov-Smirnov (KS) test. These tests should not be treated as hypothesis tests in the usual fashion because we wish to maximize balance without limit. The bootstrap KS test is highly recommended (see the ks and nboots options) because the bootstrap KS is consistent even for non-continuous distributions. Before matching, the two sample t-test is used; after matching, the paired t-test is used.
Two multivariate tests are provided: the KS and Chi-Square null deviance tests. The KS test is to be preferred over the Chi-Square test because the Chi-Square test is not testing the relevant hypothesis. The null hypothesis for the KS test is equal balance in the estimated probabilities between treated and control. The null hypothesis for the Chi-Square test, however, is that all of the parameters are insignificant; it is a comparison of residual versus null deviance. If the covariates being considered are discrete, this KS test is asymptotically nonparametric as long as the logit model does not produce zero parameter estimates.
NAs are handled by the na.action option. However, it is highly recommended that NAs not simply be deleted; one should check that missingness is balanced.
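One way to check that missingness is balanced is to turn it into an ordinary covariate. The base R sketch below builds a missingness indicator for a hypothetical income variable and compares missing rates across treatment groups; the data and variable names are invented for illustration.

```r
# Hypothetical data: a treatment flag and a covariate with missing values.
treat  <- c(1, 1, 1, 1, 0, 0, 0, 0)
income <- c(NA, 20, NA, 35, 18, 22, NA, 30)

# Indicator that equals 1 when income is missing.
income.miss <- as.numeric(is.na(income))

# Share of missing values within each treatment group.
miss.rate <- tapply(income.miss, treat, mean)

# Treated-minus-control gap in the missing rate.
rate.gap <- unname(miss.rate["1"] - miss.rate["0"])
print(miss.rate)
print(rate.gap)
```

An indicator like income.miss could then be listed as one more term in the balance formula, so that it is reported alongside the other covariates.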
BeforeMatching |
A list containing the before matching univariate
balance statistics. That is, a list containing the results of
the |
AfterMatching |
A list containing the after matching univariate
balance statistics. That is, a list containing the results of
the |
BMsmallest.p.value |
The smallest p-value found across all of the before matching balance tests (including t-tests and KS-tests). |
BMsmallestVarName |
The name of the variable with the
|
BMsmallestVarNumber |
The number of the variable with the
|
AMsmallest.p.value |
The smallest p-value found across all of the after matching balance tests (including t-tests and KS-tests). |
AMsmallestVarName |
The name of the variable with the
|
AMsmallestVarNumber |
The number of the variable with the
|
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932–945. https://www.jsekhon.com
Abadie, Alberto. 2002. “Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models.” Journal of the American Statistical Association, 97:457 (March) 284-292.
Hall, Peter. 1992. The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag.
Wilcox, Rand R. 1997. Introduction to Robust Estimation. San Diego, CA: Academic Press.
Conover, William J. 1971. Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295-301 (one-sample "Kolmogorov" test), 309-314 (two-sample "Smirnov" test).
Shao, Jun and Dongsheng Tu. 1995. The Jackknife and Bootstrap. New York: Springer-Verlag.
Also see Match, GenMatch, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs." Journal of the American Statistical Association 94 (448):
# 1053-1062.

library(Matching)
data(lalonde)

#
# Estimate the propensity model
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)

#
# save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

#
# one-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option defaults to ATT).
#
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)

# Let's summarize the output
summary(rr)

# Let's check the covariate balance
# 'nboots' is set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.
mb <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                   hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                   u74 + u75, data=lalonde, match.out=rr, nboots=10)
This function is a wrapper for the Match function which separates the matching problem into subgroups defined by a factor. This is equivalent to conducting exact matching on each level of a factor. Matches within each level are found as determined by the usual matching options. This function is much faster for large datasets than the Match function itself. For additional speed, consider doing matching without replacement (see the replace option). This function is more limited than the Match function. For example, Matchby cannot be used if the user wishes to provide observation specific weights.
Matchby(Y, Tr, X, by, estimand = "ATT", M = 1, ties=FALSE, replace=TRUE, exact = NULL, caliper = NULL, AI=FALSE, Var.calc=0, Weight = 1, Weight.matrix = NULL, distance.tolerance = 1e-05, tolerance = sqrt(.Machine$double.eps), print.level=1, version="Matchby", ...)
Y |
A vector containing the outcome of interest. Missing values are not allowed. |
Tr |
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X |
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
by |
A "factor" in the sense that |
estimand |
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect (for all), and "ATC" is the sample average treatment effect for the controls. |
M |
A scalar for the number of matches which should be
found. The default is one-to-one matching. Also see the
|
ties |
A logical flag for whether ties should be handled
deterministically. By default |
replace |
Whether matching should be done with replacement. Note
that if |
exact |
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is provided, that logical value is
applied to all covariates of
|
caliper |
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in |
AI |
A logical flag for if the Abadie-Imbens standard error
should be calculated. It is computationally expensive to calculate
with large datasets. |
Var.calc |
A scalar for the variance estimate
that should be used. By default |
Weight |
A scalar for the type of
weighting scheme the matching algorithm should use when weighting
each of the covariates in |
Weight.matrix |
This matrix denotes the weights the matching
algorithm uses when weighting each of the covariates in For most uses, this matrix has zeros in the off-diagonal
cells. This matrix can be used to weight some variables more than
others. For
example, if |
distance.tolerance |
This is a scalar which is used to determine if distances
between two observations are different from zero. Values less than
|
tolerance |
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |
print.level |
The level of printing. Set to '0' to turn off printing. |
version |
The version of the code to be used. The "Matchby" C/C++ version of the code is the fastest, and the end-user should not change this option. |
... |
Additional arguments passed on to |
Matchby is much faster for large datasets than Match. But Matchby only implements a subset of the functionality of Match. For example, the restrict option cannot be used, Abadie-Imbens standard errors are not provided and bias adjustment cannot be requested.
Matchby is a wrapper for the Match function which separates the matching problem into subgroups defined by a factor. This is equivalent to doing exact matching on each level of the factor, and the way in which matches are found within each level is determined by the usual matching options.
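The idea of splitting the matching problem by a factor can be sketched in base R as follows. This is an illustrative nearest-neighbour match on a scalar score within each group, using invented data and a hypothetical helper named match.within; it assumes every group contains at least one treated and one control unit and does not reproduce the package's algorithm.

```r
# Hypothetical toy data: a scalar score, a treatment flag and a grouping factor.
score <- c(0.20, 0.80, 0.25, 0.70, 0.60, 0.90)
treat <- c(1, 0, 0, 1, 0, 0)
group <- factor(c("a", "a", "a", "b", "b", "b"))

# Hypothetical helper: match the treated units among the row numbers 'idx'
# to the control with the closest score from the same set of rows.
match.within <- function(idx) {
  tr <- idx[treat[idx] == 1]
  co <- idx[treat[idx] == 0]
  nearest <- sapply(tr, function(i) co[which.min(abs(score[co] - score[i]))])
  data.frame(treated = tr, control = nearest)
}

# Split the row numbers by group, match inside each group, then stack.
pairs <- do.call(rbind, lapply(split(seq_along(score), group), match.within))
print(pairs)
```

Because candidate controls are drawn only from the treated unit's own group, every resulting pair agrees exactly on the grouping factor.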
Note that by default ties=FALSE although the default for the Match and GenMatch functions is TRUE. This is done because randomly breaking ties in large datasets often results in a great speedup. For additional speed, consider doing matching without replacement, which is often much faster when the dataset is large (see the replace option).
There will be slight differences in the matches produced by Matchby and Match because of how the covariates are weighted. When the data are broken up into separate groups (via the by option), Mahalanobis distance and inverse variance will imply different weights than when the data are taken as a whole.
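A small numerical illustration of this point, using invented data in base R: when two groups are centered far apart, the pooled variance of a covariate is much larger than its within-group variance, so an inverse variance weight computed on the whole sample differs from the one computed inside each group.

```r
# Hypothetical covariate and grouping factor: two tight, well-separated groups.
x  <- c(1, 2, 3, 11, 12, 13)
by <- factor(c("a", "a", "a", "b", "b", "b"))

# Inverse variance weight when the data are taken as a whole.
pooled.w <- 1 / var(x)

# Inverse variance weight computed separately within each group.
group.w <- 1 / tapply(x, by, var)

print(pooled.w)
print(group.w)
```

Here the within-group weights are far larger than the pooled weight, so the same raw difference in x counts for more when distances are computed group by group.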
est |
The estimated average causal effect. |
se.standard |
The usual standard error. This is the standard error calculated on the matched data using the usual method of calculating the difference of means (between treated and control) weighted so that ties are taken into account. |
se |
The Abadie-Imbens standard error. This is only calculated
if the |
index.treated |
A vector containing the observation numbers from
the original dataset for the treated observations in the
matched dataset. This index in conjunction with |
index.control |
A vector containing the observation numbers from
the original data for the control observations in the
matched data. This index in conjunction with |
weights |
The weights for each observation in the matched dataset. |
orig.nobs |
The original number of observations in the dataset. |
nobs |
The number of observations in the matched dataset. |
wnobs |
The number of weighted observations in the matched dataset. |
orig.treated.nobs |
The original number of treated observations. |
ndrops |
The number of matches which were dropped because there were not enough observations in a given group and because of caliper and exact matching. |
estimand |
The estimand which was estimated. |
version |
The version of |
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932–945. https://www.jsekhon.com
Abadie, Alberto and Guido Imbens. 2006. “Large Sample Properties of Matching Estimators for Average Treatment Effects.” Econometrica 74(1): 235-267.
Imbens, Guido. 2004. Matching Software for Matlab and Stata.
Also see Match, summary.Matchby, GenMatch, MatchBalance, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
#
# Match exactly by racial groups and then match using the propensity score within racial groups
#

library(Matching)
data(lalonde)

#
# Estimate the Propensity Score
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)

# save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

# one-to-one matching with replacement (the "M=1" option) after exactly
# matching on race using the 'by' option. Estimating the treatment
# effect on the treated (the "estimand" option defaults to ATT).
rr <- Matchby(Y=Y, Tr=Tr, X=X, by=lalonde$black, M=1)
summary(rr)

# Let's check the covariate balance
# 'nboots' is set to small values in the interest of speed.
# Please increase to at least 500 each for publication quality p-values.
mb <- MatchBalance(treat~age + I(age^2) + educ + I(educ^2) + black +
                   hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
                   u74 + u75, data=lalonde, match.out=rr, nboots=10)
This function calculates a set of summary statistics for the QQ plot of two samples of data. The summaries are useful for determining if the two samples are from the same distribution. If standardize==TRUE, the empirical CDF is used instead of the empirical-QQ plot. The latter retains the scale of the variable.
qqstats(x, y, standardize=TRUE, summary.func)
x |
The first sample. |
y |
The second sample. |
standardize |
A logical flag for whether the statistics should be standardized by the empirical cumulative distribution functions of the two samples. |
summary.func |
A user-provided function to summarize the
difference between the two distributions. The function should
expect a vector of the differences as an argument and return a summary
statistic. For example, the |
meandiff |
The mean difference between the QQ plots of the two samples. |
mediandiff |
The median difference between the QQ plots of the two samples. |
maxdiff |
The maximum difference between the QQ plots of the two samples. |
summarydiff |
If the user provides a |
summary.func |
If the user provides a |
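The unstandardized summaries and a user-supplied summary function can be sketched in base R as below. The samples are invented and of equal size, so the empirical QQ plot simply pairs the sorted values; taking absolute differences and the helper name my.summary are assumptions made for illustration and do not describe the function's internals.

```r
# Two hypothetical samples of equal size.
x <- c(1, 4, 2, 8)
y <- c(2, 3, 6, 11)

# With equal sample sizes, the empirical QQ plot pairs the sorted values.
qq.diff <- abs(sort(x) - sort(y))

# A user-style summary function: it receives the vector of differences
# and returns a single number (here the 75th percentile).
my.summary <- function(d) unname(quantile(d, probs = 0.75))

qq.stats <- list(meandiff    = mean(qq.diff),
                 mediandiff  = median(qq.diff),
                 maxdiff     = max(qq.diff),
                 summarydiff = my.summary(qq.diff))
print(qq.stats)
```

Because the differences are taken on the raw values, these summaries stay in the units of the variable itself.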
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Sekhon, Jasjeet S. 2011. “Multivariate and Propensity Score Matching Software with Automated Balance Optimization.” Journal of Statistical Software 42(7): 1-52. doi:10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Review of Economics and Statistics 95(3): 932–945. https://www.jsekhon.com
Also see ks.boot, balanceUV, Match, GenMatch, MatchBalance, GerberGreenImai, lalonde
#
# Replication of Dehejia and Wahba psid3 model
#
# Dehejia, Rajeev and Sadek Wahba. 1999. "Causal Effects in
# Non-Experimental Studies: Re-Evaluating the Evaluation of Training
# Programs." Journal of the American Statistical Association 94 (448):
# 1053-1062.
#

library(Matching)
data(lalonde)

#
# Estimate the propensity model
#
glm1 <- glm(treat~age + I(age^2) + educ + I(educ^2) + black +
            hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
            u74 + u75, family=binomial, data=lalonde)

#
# save data objects
#
X  <- glm1$fitted
Y  <- lalonde$re78
Tr <- lalonde$treat

#
# one-to-one matching with replacement (the "M=1" option).
# Estimating the treatment effect on the treated (the "estimand" option defaults to ATT).
#
rr <- Match(Y=Y, Tr=Tr, X=X, M=1)
summary(rr)

#
# Do we have balance on 1975 income after matching?
#
qqout <- qqstats(lalonde$re75[rr$index.treated], lalonde$re75[rr$index.control])
print(qqout)
summary method for class balanceUV
## S3 method for class 'balanceUV'
summary(object, ..., digits=5)
object |
An object of class " |
digits |
The number of significant digits that should be displayed. |
... |
Other options for the generic summary function. |
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Also see balanceUV, Match, GenMatch, MatchBalance, qqstats, ks.boot, GerberGreenImai, lalonde
summary method for class ks.boot
## S3 method for class 'ks.boot'
summary(object, ..., digits=5)
object |
An object of class " |
digits |
The number of significant digits that should be displayed. |
... |
Other options for the generic summary function. |
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Also see ks.boot, balanceUV, qqstats, Match, GenMatch, MatchBalance, GerberGreenImai, lalonde
summary method for class Match
## S3 method for class 'Match'
summary(object, ..., full=FALSE, digits=5)
object |
An object of class " |
full |
A flag for whether the unadjusted estimates and naive standard errors should also be summarized. |
digits |
The number of significant digits that should be displayed. |
... |
Other options for the generic summary function. |
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Also see Match, GenMatch, MatchBalance, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
summary method for class Matchby
## S3 method for class 'Matchby'
summary(object, ..., digits=5)
object |
An object of class " |
digits |
The number of significant digits that should be displayed. |
... |
Other options for the generic summary function. |
Jasjeet S. Sekhon, UC Berkeley, https://www.jsekhon.com.
Also see Matchby, Match, GenMatch, MatchBalance, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde