Spatial data are typically collected from points or regions located in space and thus tend to be spatially dependent. Ignoring this violation of independence between observations produces estimates that are biased, inconsistent or inefficient. A wide variety of spatial models that take the dependence among observations into account has been developed [1-3].
Measurement errors in the spatially lagged explanatory variables are not routinely accounted for, in spite of their serious consequences. If covariate measurement error is ignored, the estimators of the coefficients of the spatially lagged exogenous variables are attenuated, while the estimators of the variance components are inflated. However, the amount of attenuation depends on the degree of spatial correlation in both the true covariates and the random error term of the regression model.
Several approaches to correct for measurement error in spatially lagged exogenous regressors have been proposed in the literature. Maximum Likelihood (ML) estimation based on an Expectation-Maximization (EM) algorithm corrects the bias of the naive estimator, i.e. the estimator that ignores the measurement error, but is associated with larger variances. Other approaches adjust the estimates by means of an estimated attenuation factor obtained by the method of moments, or use an appropriate transformation of the error-prone covariate. Additionally, a semiparametric approach, i.e. penalized least squares, can be used as an alternative to obtain a bias-corrected estimator of the parameters.
The error-prone covariates and the random errors are usually assumed to be symmetric and normally distributed [4-6]. However, the assumption of normality may be too restrictive in many applications [7,8]. Linear models with Skew-normal measurement error perform better when there is evidence of departure from symmetry or normality. Furthermore, the Skew-normal linear mixed measurement error model outperforms the normal mixed measurement error model when the actual covariate distribution is Skew-normal.
Among the approaches to correct for measurement error, Bayesian methods provide the most flexible framework. Their advantage is that prior knowledge, and in particular prior uncertainty about the error variance, can be incorporated in the model. While frequentist approaches require fixing regression coefficients and variance component parameters to guarantee identifiability, the Bayesian setting allows this uncertainty to be represented with suitable prior distributions.
The purpose of this paper is to analyze, by way of Monte Carlo simulation, Bayesian inference for spatial regression models with a Skew-normal covariate measured with error.
2. Materials and Methods
The spatial linear model with measurement error
The spatial regression model is defined as follows.
Let x_i denote the true value of the error-prone covariate for spatial unit i, i = 1, ..., n, which is related to the response y_i through the linear model:
where the error term has a covariance matrix with a spatial structure. Suppose the observed error-prone covariate for spatial unit i is related to the true covariate according to a classical measurement error model:
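where the measurement error is normally distributed. When the true covariate is also normally distributed (with a given mean and covariance), the true and observed covariates have a joint multivariate normal distribution,
where 1 is an n × 1 vector of ones. The response, conditional on the observed covariate, is normally distributed with conditional mean
and conditional variance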
These results indicate that the regression coefficients obtained by regressing the response on the observed, error-prone covariate are biased. The same holds for the conditional variance.
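For reference, the standard Gaussian conditioning result behind this statement can be sketched as follows. The notation is an assumption, since the original symbols are not recoverable here: w denotes the observed covariate, u the measurement error with variance \sigma_u^2, \Sigma_x the covariance of the true covariate and \Sigma_\varepsilon the covariance of the regression error, with x, u and \varepsilon taken to be mutually independent.

```latex
% Assumed notation (not taken from the paper):
%   y = \beta_0 \mathbf{1} + \beta_1 x + \varepsilon, \qquad w = x + u
\begin{align*}
x &\sim N(\mu_x \mathbf{1}, \Sigma_x), \qquad
u \sim N(0, \sigma_u^2 I), \qquad
\varepsilon \sim N(0, \Sigma_\varepsilon), \\
\Lambda &= \Sigma_x \left(\Sigma_x + \sigma_u^2 I\right)^{-1}, \\
E[y \mid w] &= \beta_0 \mathbf{1} + \beta_1 (I - \Lambda)\,\mu_x \mathbf{1} + \beta_1 \Lambda\, w, \\
\operatorname{Var}[y \mid w] &= \beta_1^2 (I - \Lambda)\,\Sigma_x + \Sigma_\varepsilon .
\end{align*}
```

In the non-spatial special case \Sigma_x = \sigma_x^2 I, the matrix \Lambda reduces to the scalar reliability ratio \sigma_x^2 / (\sigma_x^2 + \sigma_u^2) < 1, which is the familiar attenuation of the slope toward zero.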
Bayesian analysis of measurement error
The joint density of all relevant variables of the measurement error model (1) can be factored as
where the densities are indexed by the vector of model parameters. The first term on the right-hand side of (5), known as the outcome model, represents the relationship between the response y and the true covariate x; its parameter vector contains the regression parameters of the outcome model. The second term is the measurement error model, and the third term is the covariate (exposure) model.
In the presence of measurement error, we observe the surrogate covariate instead of the true covariate; hence the integral of the joint density (5) over the true covariate, given in (6),
is required to form the likelihood. In some cases, this integral does not have a closed form. However, the Bayesian MCMC approach can be applied with (5) and works with the integral in (6) only implicitly.
Furthermore, equation (5) can be combined with a prior and written as
where the additional factor is the prior distribution of the model parameters. The joint posterior density of the unknown true covariate values and model parameters, conditional on the observed response data and surrogate covariate values, is given by
Given the joint posterior distribution, it is straightforward to derive the full posterior conditional for each unobserved quantity given the observed quantities and the remaining unobserved quantities. Bayesian inference can then be carried out based on these posterior conditionals by applying appropriate MCMC algorithms.
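The structure described in this subsection can be summarized in the usual three-part form. The block below is a sketch in assumed notation rather than a reproduction of the paper's equations: w stands for the observed surrogate, and \theta = (\theta_R, \theta_M, \theta_E) collects the parameters of the outcome, measurement error and exposure models. Nondifferential measurement error (y independent of w given x) is also assumed.

```latex
\begin{align*}
% joint density: outcome model x measurement error model x exposure model
f(y, x, w \mid \theta)
  &= f(y \mid x, \theta_R)\, f(w \mid x, \theta_M)\, f(x \mid \theta_E), \\
% likelihood of the observed data: the true covariate is integrated out
f(y, w \mid \theta)
  &= \int f(y, x, w \mid \theta)\, dx, \\
% joint posterior of the unobserved x and theta, with prior f(theta)
f(x, \theta \mid y, w)
  &\propto f(y \mid x, \theta_R)\, f(w \mid x, \theta_M)\, f(x \mid \theta_E)\, f(\theta).
\end{align*}
```

Sampling x and \theta jointly from the last expression by MCMC and discarding the draws of x yields the marginal posterior of \theta, which is why the integral never has to be evaluated explicitly.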
Skew-normal covariate model
In this paper we extend the above measurement error model (2) by assuming that the covariate follows a Skew-normal distribution. The univariate Skew-normal distribution with location parameter μ, a scale parameter and skewness parameter γ is defined as:
where φ and Φ denote the probability density function and cumulative distribution function of the standard normal distribution, respectively. A random variable with μ = 0 and unit scale follows the standard Skew-normal distribution, which is denoted as SN(γ).
The Skew-normal distribution has the following properties,
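For orientation, the standard density and moment formulas of the Skew-normal distribution are sketched below. The symbols \sigma (scale) and \delta are assumptions introduced here for illustration, and the list is not necessarily the same set of properties given in the original.

```latex
\begin{align*}
% density with location \mu, scale \sigma > 0 and skewness \gamma
f(z \mid \mu, \sigma, \gamma)
  &= \frac{2}{\sigma}\,
     \phi\!\left(\frac{z - \mu}{\sigma}\right)
     \Phi\!\left(\gamma\,\frac{z - \mu}{\sigma}\right), \\
\delta &= \frac{\gamma}{\sqrt{1 + \gamma^2}}, \\
E[Z] &= \mu + \sigma\,\delta\,\sqrt{2/\pi}, \\
\operatorname{Var}[Z] &= \sigma^2 \left(1 - \frac{2\delta^2}{\pi}\right), \\
% stochastic representation with independent U_0, U_1 \sim N(0, 1)
Z &\overset{d}{=} \mu + \sigma\left(\delta\,|U_0| + \sqrt{1 - \delta^2}\;U_1\right).
\end{align*}
```

Setting \gamma = 0 gives \delta = 0 and recovers the normal distribution, so the normal covariate model is a special case of the Skew-normal one.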
We consider the spatial regression model as follows,
where Y is the response, α the intercept, X the single true covariate with coefficient β, and ε the error term. The unobserved true covariate X was generated with spatial autocorrelation by means of a spatial weight matrix W, i.e., as λWX plus an innovation term, where the weight for areas i and j is 1 if they are neighbors and 0 otherwise, and λ is the spatial dependence parameter.
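The generation of a spatially autocorrelated covariate can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the rook (shared-edge) neighbor definition, the skewness value 3.0 and the helper names `rook_weights`, `skew_normal` and `spatial_covariate` are all assumptions made here. The equation X = λWX + e is solved by fixed-point iteration, which converges for a row-normalized W and |λ| < 1.

```python
import math
import random


def rook_weights(side):
    """Row-normalized rook-contiguity weights for a side x side grid (assumed neighbor rule)."""
    n = side * side
    W = [[0.0] * n for _ in range(n)]
    for r in range(side):
        for c in range(side):
            i = r * side + c
            nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs = [(a, b) for a, b in nbrs if 0 <= a < side and 0 <= b < side]
            for a, b in nbrs:
                # binary weight 1 for each neighbor, divided by the row sum
                W[i][a * side + b] = 1.0 / len(nbrs)
    return W


def skew_normal(gamma, rng):
    """Draw from SN(gamma) via delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = gamma / math.sqrt(1.0 + gamma * gamma)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def spatial_covariate(W, lam, innovations, tol=1e-12, max_iter=10000):
    """Solve X = lam * W X + e by fixed-point iteration."""
    n = len(W)
    x = list(innovations)
    for _ in range(max_iter):
        new = [lam * sum(W[i][j] * x[j] for j in range(n)) + innovations[i]
               for i in range(n)]
        diff = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x


rng = random.Random(1)
W = rook_weights(7)                              # 7 x 7 grid, 49 spatial units
e = [skew_normal(3.0, rng) for _ in range(49)]   # hypothetical skewness value
X = spatial_covariate(W, 0.6, e)                 # medium spatial dependence
```

For larger grids a direct linear solve of (I - λW)X = e would be the more usual choice; the iteration is used here only to keep the sketch free of external dependencies.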
We assume that
where Q is the observed covariate, related to the true covariate X according to a classical measurement error model with . We assume with , and .
We take the data to lie on a regular grid, with the grid size taken to be 7 and representing small, medium and large sample sizes. The weight matrix W is row-normalized. We allow three values for λ, namely 0.3, 0.6 and 0.9, for weak, medium and strong spatial dependence. The observed error-prone covariate Q is generated by adding Gaussian noise with variance and 0.7 to X. The outcome data Y are then generated with slope and intercept parameters set at . We further take ε with .
For each combination of sample size (T), λ and measurement error variance, we generate 100 Monte Carlo datasets. For each generated dataset, the spatial regression model is estimated under the following three specifications:
• Naive model without measurement error correction;
• Normal distribution for the error-prone covariate and for the random errors;
• Skew-normal distribution for the error-prone covariate and Normal distribution for the random errors.
Independent priors were specified for the model parameters to run the Gibbs sampler. For these prior densities, we generated three parallel independent runs of the Gibbs sampler, each chain of size 25 000 for each parameter. We discarded the first 5 000 iterations to eliminate the effect of the initial values. We assessed chain convergence using the Brooks-Gelman-Rubin scale reduction factor; a value of approximately 1 indicates convergence. We estimated the models using the R2jags package available in R.
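The convergence check can be illustrated with a small Python sketch of the basic potential scale reduction factor. This is the textbook Gelman-Rubin form rather than the corrected Brooks-Gelman version that MCMC software may report, and the function name `gelman_rubin` and the simulated chains are hypothetical. Burn-in draws are assumed to have been discarded before the chains are passed in.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    means = [statistics.mean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = statistics.mean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
# three chains sampling the same distribution: factor close to 1
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# three chains stuck around different values: factor well above 1
stuck = [[rng.gauss(3.0 * j, 1.0) for _ in range(2000)] for j in range(3)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

The factor compares the pooled variance across chains with the average within-chain variance, so it approaches 1 only when all chains explore the same distribution.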
For each simulation, we compute the relative bias (RelBias) and the Root Mean Square Error (RMSE) for each parameter estimate over 100 samples. These statistics are defined as
where β̂_j is the estimate of β for the jth sample and k = 100.
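As an illustration, the two performance measures can be computed as below. The sketch assumes the usual definitions, RelBias = (1/k) Σ (β̂_j − β)/β and RMSE = sqrt((1/k) Σ (β̂_j − β)²), which may differ in detail from the formulas in the original; the function names and the toy estimates are hypothetical.

```python
import math


def rel_bias(estimates, true_value):
    """Average relative deviation of the estimates from the true value."""
    k = len(estimates)
    return sum((b - true_value) / true_value for b in estimates) / k


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the true value."""
    k = len(estimates)
    return math.sqrt(sum((b - true_value) ** 2 for b in estimates) / k)


# toy example: three estimates of a coefficient whose true value is 1.0
beta_hats = [0.8, 0.9, 1.0]
rb = rel_bias(beta_hats, 1.0)
err = rmse(beta_hats, 1.0)
```

A negative RelBias, as in this toy example, corresponds to attenuation of the estimate toward zero when the true coefficient is positive.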
We also compare the models based on the expected Akaike information criterion (EAIC) and the expected Bayesian information criterion (EBIC). The EAIC and EBIC can be estimated using MCMC output as follows
where D̄ is the posterior mean of the deviance, p is the number of parameters in the model, and T is the total number of observations.
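A minimal sketch of how these criteria can be computed from MCMC output is given below. It assumes the commonly used forms EAIC = D̄ + 2p and EBIC = D̄ + p log(T), which are taken here as an assumption about the lost formulas; the function names and the deviance draws are hypothetical.

```python
import math
import statistics


def eaic(deviance_draws, p):
    """Expected AIC: posterior mean deviance plus 2p."""
    return statistics.mean(deviance_draws) + 2 * p


def ebic(deviance_draws, p, T):
    """Expected BIC: posterior mean deviance plus p * log(T)."""
    return statistics.mean(deviance_draws) + p * math.log(T)


# toy example: three deviance draws, 3 parameters, 49 observations
draws = [100.0, 102.0, 104.0]
eaic_value = eaic(draws, p=3)
ebic_value = ebic(draws, p=3, T=49)
```

Smaller values indicate a better trade-off between fit and complexity, and the EBIC penalizes additional parameters more heavily than the EAIC whenever log(T) exceeds 2.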
3. Results and Discussion
Tables 1, 2 and 3 show that, for the spatial regression model with Skew-normal data, the average RelBias (in absolute value) and the average RMSE of the coefficient under the normal prior (N-N) are quite similar to those under the Skew-normal prior (SN-N) for all sample sizes T, all values of the spatial dependence parameter and all three measurement error variances. For the Naive model, however, both are larger than under the normal (N-N) and Skew-normal (SN-N) priors.
We observed that the naive estimate of the regression coefficient is attenuated toward zero. Additionally, the RelBias and RMSE of the coefficient for the three estimators increase with the measurement error variance but decrease with the spatial dependence parameter. This is consistent with the argument that stronger dependence implies that neighboring areas provide more information, so that the estimates are more resistant to the effect of measurement error.
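The attenuation of the naive estimate can be reproduced with a toy simulation. The sketch below deliberately ignores the spatial structure and uses hypothetical parameter values (intercept 1, slope 2, true covariate variance 1, measurement error variance 0.7), so it illustrates the mechanism only and does not reproduce the results in the tables. Under these assumptions the naive slope is expected to shrink by the reliability ratio 1 / (1 + 0.7).

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000
alpha, beta = 1.0, 2.0      # hypothetical intercept and slope
sigma2_u = 0.7              # hypothetical measurement error variance

x = [rng.gauss(0.0, 1.0) for _ in range(n)]                # true covariate
q = [xi + rng.gauss(0.0, sigma2_u ** 0.5) for xi in x]     # observed covariate
y = [alpha + beta * xi + rng.gauss(0.0, 0.5) for xi in x]  # response

slope_true = ols_slope(x, y)     # regression on the true covariate
slope_naive = ols_slope(q, y)    # regression on the error-prone covariate
expected_naive = beta / (1.0 + sigma2_u)
```

With spatially dependent data the shrinkage factor is no longer a single scalar, which is in line with the observation that the bias varies with the spatial dependence parameter.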
Note also that the RelBias and RMSE of in the case of the normal and Skew-normal prior with the measurement error variance are smaller than . Moreover, for the measurement error variance the RelBias of with the spatial dependence parameter are larger than , but for the RMSE the opposite holds.
Tables 4, 5 and 6 show the overall fit statistics for the spatial measurement error model. Compared to the normal model, the EAIC and EBIC tend to favor the Skew-normal model for all sample sizes (T), all three values of the dependence parameter, and all three measurement error variances. Note that the Naive model has the smallest EAIC and EBIC values, but this model does not account for the measurement error. Therefore, the above results show that the Skew-normal prior outperforms the normal, symmetrical prior and the Naive model without measurement error correction.
4. Concluding Remarks
This paper analyzed, by way of Monte Carlo simulation, Bayesian inference for spatial regression models with a Skew-normal, spatially lagged covariate measured with error. The simulation examined the performance of Bayesian estimators under (i) Naive models without measurement error correction; (ii) a Normal distribution for the error-prone covariate and the random errors; (iii) a Skew-normal distribution (SN) for the error-prone covariate and a Normal distribution for the random errors.
The simulation results show that the Skew-normal prior estimator outperforms the normal, symmetrical prior and the Naive models without measurement error correction.