KnE Engineering | International Conference on Basic Sciences and Its Applications (ICBSA-2018) | pages: 204–214

1. Introduction

Spatial data are typically collected from points or regions located in space and thus tend to be spatially dependent. Ignoring this violation of independence between observations produces estimates that are biased, inconsistent or inefficient. A large variety of spatial models that take spatial dependence among observations into account have been developed [1-3].

Measurement errors in spatially lagged explanatory variables are not routinely accounted for, in spite of the fact that their consequences are serious. If covariate measurement error is ignored, the estimators of the coefficients of spatially lagged exogenous variables are attenuated, while the estimators of the variance components are inflated [4]. However, the amount of attenuation depends on the degree of spatial correlation in both the true covariates and the random error term of the regression model [5].

Several approaches to correct for measurement error in spatially lagged exogenous regressors have been proposed in the literature. Maximum Likelihood (ML) estimation based on an Expectation-Maximization (EM) algorithm corrects the biases of the naive estimator, i.e. the estimator that ignores the measurement error, but the corrected estimators have larger variances [4]. Other approaches adjust the estimates by means of an estimated attenuation factor obtained by the method of moments, or use an appropriate transformation of the error-prone covariate [5]. Additionally, a semiparametric approach, i.e. penalized least squares, can serve as an alternative way to obtain a bias-corrected estimator of the parameters [6].

The error-prone covariates and the random errors are usually assumed to be symmetrically, normally distributed [4-6]. However, the assumption of normality may be too restrictive in many applications [7,8]. Linear models with Skew-normal measurement error perform better when there is evidence of departure from symmetry or normality [7]. Furthermore, the Skew-normal linear mixed measurement error model outperforms the normal mixed measurement error model when the actual covariate distribution is Skew-normal [8].

Among the approaches to correct for measurement error, Bayesian methods provide the most flexible framework. The advantage of Bayesian approaches is that prior knowledge, and in particular prior uncertainty about the error variance, can be incorporated in the model. While frequentist approaches require fixing the regression coefficients and the variance component parameters to guarantee identifiability, the Bayesian setting allows this uncertainty to be represented with suitable prior distributions [9].

The purpose of this paper is to analyze, by way of Monte Carlo simulation, Bayesian inference for spatial regression models in which a Skew-normal covariate is measured with error.

2. Materials and Methods

The spatial linear model with measurement error

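The spatial regression model with covariate measurement error is defined as follows.

Let $x_i$ represent the true value of the error-prone covariate for spatial unit $i$, $i = 1, \ldots, n$, which is related to the response $y_i$ through the linear model

$$y_i = \beta_0 + \beta_x x_i + \varepsilon_i \tag{1}$$

where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T \sim N(0, \Sigma_\varepsilon)$ and $\Sigma_\varepsilon$ is a covariance matrix with a spatial structure. Suppose $q_i$ is the observed, error-prone covariate for spatial unit $i$, related to the true covariate $x_i$ according to a classical measurement error model:

$$q_i = x_i + u_i \tag{2}$$

where $u = (u_1, \ldots, u_n)^T \sim N(0, \Sigma_U)$. When $x$ is also normally distributed (say with mean $\mu_x 1$ and covariance $\Sigma_X$), then $y = (y_1, \ldots, y_n)^T$ and $q = (q_1, \ldots, q_n)^T$ have a joint multivariate normal distribution,

$$\begin{pmatrix} y \\ q \end{pmatrix} \sim MVN\left( \begin{pmatrix} (\beta_0 + \beta_x \mu_x) 1 \\ \mu_x 1 \end{pmatrix}, \begin{pmatrix} \Sigma_\varepsilon + \beta_x^2 \Sigma_X & \beta_x \Sigma_X \\ \beta_x \Sigma_X & \Sigma_X + \Sigma_U \end{pmatrix} \right)$$

where $1$ is an $n \times 1$ vector of 1's. The conditional distribution of $y$ given $q$ is normal with conditional mean

$$E(y \mid q) = \beta_0 1 + \beta_x (I - \Lambda) \mu_x 1 + \beta_x \Lambda q$$

and conditional variance

$$\mathrm{Var}(y \mid q) = \Sigma_\varepsilon + \beta_x^2 (I - \Lambda) \Sigma_X$$

where

$$\Lambda = \Sigma_X (\Sigma_X + \Sigma_U)^{-1}$$

These results indicate that the regression coefficients obtained by regressing the response $y$ on the observed, error-prone covariate $q$ are biased: the coefficient of $q$ in the conditional mean is $\beta_x \Lambda$ rather than $\beta_x$. The same holds for the conditional variance, which contains the extra term $\beta_x^2 (I - \Lambda) \Sigma_X$ in addition to $\Sigma_\varepsilon$ [5].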
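The attenuation can be illustrated with a minimal sketch. It assumes the simplest special case, independent units with scalar variances, so that $\Lambda$ reduces to $\sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$; this is a simplification for illustration, not the spatial setting studied here. The function names and the parameter values are hypothetical.

```python
import random


def attenuation_factor(s2_x, s2_u):
    """Scalar analogue of Lambda = Sigma_X (Sigma_X + Sigma_U)^{-1}."""
    return s2_x / (s2_x + s2_u)


def naive_slope(y, q):
    """Ordinary least squares slope from regressing y on the error-prone q."""
    n = len(y)
    mean_q = sum(q) / n
    mean_y = sum(y) / n
    s_qy = sum((qi - mean_q) * (yi - mean_y) for qi, yi in zip(q, y))
    s_qq = sum((qi - mean_q) ** 2 for qi in q)
    return s_qy / s_qq


def simulate_naive_slope(n=20000, beta0=1.0, beta_x=2.0,
                         s2_x=1.0, s2_u=0.3, s2_e=1.0, seed=1):
    """Simulate independent (non-spatial) data and return the naive slope."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, s2_x ** 0.5) for _ in range(n)]
    q = [xi + rng.gauss(0.0, s2_u ** 0.5) for xi in x]  # q = x + u
    y = [beta0 + beta_x * xi + rng.gauss(0.0, s2_e ** 0.5) for xi in x]
    return naive_slope(y, q)


if __name__ == "__main__":
    lam = attenuation_factor(1.0, 0.3)
    print("slope implied by beta_x * Lambda:", 2.0 * lam)
    print("naive slope from simulated data: ", simulate_naive_slope())
```

In this scalar case the naive slope settles near $\beta_x \Lambda$, which is below $\beta_x$ whenever $\sigma_u^2 > 0$.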

Bayesian analysis of measurement error

The joint density of all relevant variables of the measurement error model (1) can be factored as

$$f(y, x, q \mid \theta_R, \theta_M, \theta_E) = f(y \mid x, \theta_R)\, f(q \mid x, y, \theta_M)\, f(x \mid \theta_E) \tag{5}$$

where $\theta = (\theta_R, \theta_M, \theta_E)$ is the vector of model parameters. The first term on the right-hand side of (5), known as the outcome model, represents the relationship between the response $y$ and the true covariate $x$; the vector $\theta_R$ contains the regression parameters of the outcome model. The second term is the measurement error model, and the third term is the covariate (exposure) model.

In the presence of measurement error, we observe $(y, q)$ instead of $(y, x)$, hence

$$f(y, q \mid \theta_R, \theta_M, \theta_E) = \int f(y, x, q \mid \theta_R, \theta_M, \theta_E)\, dx \tag{6}$$

is required to form the likelihood. In some cases, this integral does not have a closed form. However, the Bayesian MCMC approach can be applied directly to (5) and handles the integral in (6) only implicitly [10].
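As a hedged illustration of the integral in (6), the sketch below evaluates it numerically for a single unit in the all-normal case with nondifferential error (so $q$ depends on $x$ only) and compares it with the closed-form bivariate normal density. The scalar setup, the trapezoidal rule and all function names are assumptions made for this example.

```python
import math


def norm_pdf(v, mean, var):
    """Density of N(mean, var) evaluated at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_numeric(y, q, beta0, beta_x, mu_x, s2_x, s2_e, s2_u,
                     half_width=10.0, steps=4000):
    """Trapezoidal approximation of the integral over x of f(y|x) f(q|x) f(x)."""
    sd_x = math.sqrt(s2_x)
    lo = mu_x - half_width * sd_x
    hi = mu_x + half_width * sd_x
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (norm_pdf(y, beta0 + beta_x * x, s2_e)
                           * norm_pdf(q, x, s2_u)
                           * norm_pdf(x, mu_x, s2_x))
    return total * h


def marginal_closed_form(y, q, beta0, beta_x, mu_x, s2_x, s2_e, s2_u):
    """Bivariate normal density of (y, q) implied by the scalar model."""
    v_y = s2_e + beta_x ** 2 * s2_x   # Var(y)
    v_q = s2_x + s2_u                 # Var(q)
    c = beta_x * s2_x                 # Cov(y, q)
    det = v_y * v_q - c * c
    dy = y - (beta0 + beta_x * mu_x)
    dq = q - mu_x
    quad = (v_q * dy * dy - 2.0 * c * dy * dq + v_y * dq * dq) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


if __name__ == "__main__":
    args = (2.5, 0.4, 1.0, 2.0, 0.0, 1.0, 1.0, 0.3)
    print(marginal_numeric(*args), marginal_closed_form(*args))
```

When the covariate model is not normal, for instance Skew-normal, a closed form of this kind may not be available, which is one motivation for sampling $x$ within MCMC instead of integrating it out.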

Posterior distribution

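Including the prior distribution, the joint density based on (5) can be written as

$$f(y, x, q, \theta) = \prod_{i=1}^{n} f(y_i \mid x_i, \theta_R)\, f(q_i \mid x_i, \theta_M)\, f(x_i \mid \theta_E) \times \pi(\theta_R, \theta_M, \theta_E)$$

where $\pi(\theta_R, \theta_M, \theta_E)$ is the prior distribution of the model parameters. The joint posterior density of the unknown $\theta$ and $x$, conditional on the observed response data and surrogate covariate values $(y, q)$, is given by

$$f(x, \theta \mid y, q) \propto \prod_{i=1}^{n} f(y_i \mid x_i, \theta_R)\, f(q_i \mid x_i, \theta_M)\, f(x_i \mid \theta_E) \times \pi(\theta_R, \theta_M, \theta_E)$$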

Given the joint posterior distribution, it is straightforward to derive the full posterior conditional for each unobserved quantity given the observed quantities and the remaining unobserved quantities. The Bayesian inference can then be carried out based on the posterior conditionals by applying appropriate MCMC algorithms [10].
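To make the notion of a full posterior conditional concrete, the following sketch gives the conditional of a single latent $x_i$ under assumptions chosen for simplicity: independent normal errors and a normal covariate model, in which case the conditional is itself normal. It is not the sampler used in this paper (which relies on JAGS), and the function names are hypothetical.

```python
import math
import random


def x_full_conditional(y, q, beta0, beta_x, mu_x, s2_x, s2_e, s2_u):
    """Mean and variance of x_i | y_i, q_i, theta in the all-normal model.

    The log density is quadratic in x, so completing the square gives a
    normal distribution whose precision is the sum of three precisions.
    """
    precision = beta_x ** 2 / s2_e + 1.0 / s2_u + 1.0 / s2_x
    weighted = beta_x * (y - beta0) / s2_e + q / s2_u + mu_x / s2_x
    return weighted / precision, 1.0 / precision


def draw_x(y, q, beta0, beta_x, mu_x, s2_x, s2_e, s2_u, rng):
    """One Gibbs update of the latent true covariate x_i."""
    mean, var = x_full_conditional(y, q, beta0, beta_x, mu_x, s2_x, s2_e, s2_u)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    rng = random.Random(0)
    print(x_full_conditional(5.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0))
    print(draw_x(5.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 1.0, rng))
```

The update pulls $x_i$ toward three sources of information at once: the outcome $y_i$, the surrogate $q_i$ and the covariate model mean $\mu_x$, each weighted by its precision.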

Skew-normal covariate model

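In this paper we extend the above measurement error model (2) by assuming that the covariate follows a Skew-normal distribution. The univariate Skew-normal distribution with location parameter $\mu$, scale parameter $\sigma^2$ and skewness parameter $\gamma$ is defined as:

$$f(x; \mu, \sigma^2, \gamma) = \frac{2}{\sigma}\, \phi\!\left(\frac{x - \mu}{\sigma}\right) \Phi\!\left(\gamma\, \frac{x - \mu}{\sigma}\right), \qquad x, \mu, \gamma \in \mathbb{R},\ \sigma > 0$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the probability density function and cumulative distribution function of the standard normal distribution, respectively. The distribution is denoted as $SN(\mu, \sigma^2, \gamma)$. The standardized variable $Z = (X - \mu)/\sigma$ follows a standard Skew-normal distribution with $\mu = 0$ and $\sigma^2 = 1$, which is denoted as $SN(\gamma)$ [11].

The Skew-normal distribution has the following properties,

  • $E(X) = \mu + \sigma \sqrt{\frac{2}{\pi}}\, \frac{\gamma}{\sqrt{1 + \gamma^2}}$,

  • $\mathrm{Var}(X) = \left(1 - \frac{2\gamma^2}{\pi (1 + \gamma^2)}\right) \sigma^2$,

  • $\upsilon = \frac{1}{2}(4 - \pi)\, \mathrm{sign}(\gamma) \left(\frac{E^2(X - \mu)}{\mathrm{Var}(X)}\right)^{3/2}$ and $\kappa = 2(\pi - 3) \left(\frac{E^2(X - \mu)}{\mathrm{Var}(X)}\right)^{2}$, where $\upsilon$ and $\kappa$ are the asymmetry and kurtosis indexes, respectively,

  • If $\gamma = 0$ then $X \sim N(\mu, \sigma^2)$,

  • If $Z \sim SN(\gamma)$ then $Z \overset{d}{=} \frac{\gamma}{\sqrt{1 + \gamma^2}}\, |Z_0| + \frac{1}{\sqrt{1 + \gamma^2}}\, Z_1$

where $Z_0$ and $Z_1$ are iid $N(0, 1)$ random variables and $\overset{d}{=}$ means “distributed as” [7,8].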
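The stochastic representation in the last property gives a direct way to draw Skew-normal variates. The sketch below is illustrative only (the function names are hypothetical): it draws from $SN(\mu, \sigma^2, \gamma)$ and compares the sample mean and variance with the moment formulas listed above.

```python
import math
import random


def sn_delta(gamma):
    """delta = gamma / sqrt(1 + gamma^2), the weight on |Z0|."""
    return gamma / math.sqrt(1.0 + gamma ** 2)


def sn_mean(mu, sigma2, gamma):
    """E(X) for X ~ SN(mu, sigma2, gamma)."""
    return mu + math.sqrt(sigma2) * sn_delta(gamma) * math.sqrt(2.0 / math.pi)


def sn_var(sigma2, gamma):
    """Var(X) for X ~ SN(mu, sigma2, gamma)."""
    return sigma2 * (1.0 - 2.0 * sn_delta(gamma) ** 2 / math.pi)


def sn_draw(mu, sigma2, gamma, rng):
    """One draw using Z = delta * |Z0| + sqrt(1 - delta^2) * Z1."""
    delta = sn_delta(gamma)
    z0 = rng.gauss(0.0, 1.0)
    z1 = rng.gauss(0.0, 1.0)
    z = delta * abs(z0) + math.sqrt(1.0 - delta ** 2) * z1
    return mu + math.sqrt(sigma2) * z


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sn_draw(0.0, 1.0, 3.0, rng) for _ in range(50000)]
    m = sum(draws) / len(draws)
    v = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
    print("sample mean, theory:", m, sn_mean(0.0, 1.0, 3.0))
    print("sample var,  theory:", v, sn_var(1.0, 3.0))
```

Note that $\sqrt{1 - \delta^2} = 1/\sqrt{1 + \gamma^2}$, so the code matches the representation term by term.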

Simulation

We consider the following spatial regression model,

$$Y = \alpha + X\beta + \varepsilon$$

where $Y$ is the response, $\alpha$ the intercept, $X$ the single true covariate with coefficient $\beta$, and $\varepsilon$ the error term. The unobserved true covariate $X$ was generated to be spatially autocorrelated by means of a spatial weight matrix $W$, i.e., $X = \lambda W X + \epsilon$, where the weight $w_{ij}$ is 1 if areas $i$ and $j$ are neighbors and 0 otherwise, and $\lambda$ is the spatial dependence parameter [12].

We assume that

$$Q = X + U$$

where $Q$ is the observed covariate, related to the true covariate $X$ according to a classical measurement error model with $U \sim N(0, \sigma_U^2)$. We assume $X \sim SN(\mu_x, \sigma_x^2, \gamma_x)$ with $\mu_x = 0$, $\sigma_x^2 = 1$, and $\gamma_x = 3$.

We take the data to lie on a regular grid with grid sizes 7 ($n = 7 \times 7$), 10 ($n = 10 \times 10$) and 20 ($n = 20 \times 20$), representing small, medium and large sample sizes. The weight matrix $W$ is row normalized. We allow three different values for $\lambda$, namely 0.3, 0.6, and 0.9, for weak, medium, and strong spatial dependence [13]. The observed error-prone covariate $Q$ is generated by adding Gaussian noise with variance $\sigma_U^2 = 0.1$, 0.3 and 0.7 to $X$. Outcome data $Y$ are then generated with intercept and slope parameters set at $(\alpha, \beta)^T = (1, 2)^T$. We further take $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon^2 = 1$.
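One possible reading of this data-generating process is sketched below. Several details are assumptions made for the example because the text does not fix them: neighbors are taken to be rook neighbors (cells sharing an edge), the innovation term in $X = \lambda W X + \epsilon$ is drawn from $SN(0, 1, 3)$, and the system is solved by fixed-point iteration, which converges because $W$ is row normalized and $\lambda < 1$. All function names are hypothetical.

```python
import math
import random


def rook_weights(side):
    """Row-normalized rook-neighbor weights, one list of (j, w_ij) per cell."""
    rows = []
    for r in range(side):
        for c in range(side):
            nb = []
            if r > 0:
                nb.append((r - 1) * side + c)
            if r < side - 1:
                nb.append((r + 1) * side + c)
            if c > 0:
                nb.append(r * side + c - 1)
            if c < side - 1:
                nb.append(r * side + c + 1)
            rows.append([(j, 1.0 / len(nb)) for j in nb])
    return rows


def skew_normal(gamma, rng):
    """Standard SN(gamma) draw via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = gamma / math.sqrt(1.0 + gamma ** 2)
    return (delta * abs(rng.gauss(0.0, 1.0))
            + math.sqrt(1.0 - delta ** 2) * rng.gauss(0.0, 1.0))


def spatial_covariate(w, lam, innov, iters=500):
    """Solve x = lam * W x + innov by fixed-point iteration."""
    x = list(innov)
    for _ in range(iters):
        x = [lam * sum(wij * x[j] for j, wij in w[i]) + innov[i]
             for i in range(len(x))]
    return x


def simulate(side, lam, s2_u, alpha=1.0, beta=2.0, s2_e=1.0, gamma=3.0, seed=1):
    """Return (x, q, y) for one simulated dataset on a side-by-side grid."""
    rng = random.Random(seed)
    w = rook_weights(side)
    innov = [skew_normal(gamma, rng) for _ in range(side * side)]  # assumed SN
    x = spatial_covariate(w, lam, innov)
    q = [xi + rng.gauss(0.0, math.sqrt(s2_u)) for xi in x]   # Q = X + U
    y = [alpha + beta * xi + rng.gauss(0.0, math.sqrt(s2_e)) for xi in x]
    return x, q, y


if __name__ == "__main__":
    x, q, y = simulate(side=7, lam=0.6, s2_u=0.3)
    print(len(x), x[:3], q[:3], y[:3])
```

A direct linear solve of $(I - \lambda W) X = \epsilon$ would give the same result; the iteration is used here only to keep the sketch free of external dependencies.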

For each combination of sample size ($T$), $\lambda$ and $\sigma_U^2$, we generate 100 Monte Carlo simulation datasets. For each generated dataset, the spatial regression models are estimated under the assumption of

  • Naive models without measurement error correction,

  • Normal distribution for the error-prone covariate, $X \sim N(\mu_x, \sigma_x^2)$, and for the random errors, $\varepsilon \sim N(0, \sigma_\varepsilon^2)$,

  • Skew-normal distribution for the error-prone covariate, $X \sim SN(\mu_x, \sigma_x^2, \gamma_x)$, and Normal distribution for the random errors, $\varepsilon \sim N(0, \sigma_\varepsilon^2)$.

The following independent priors were used for the Gibbs sampler: $\alpha, \beta \sim N(0, 100)$, $\sigma_\varepsilon^2 \sim IG(0.01, 0.01)$, $\sigma_U^2 \sim IG(0.01, 0.01)$, $\mu_x \sim N(0, 1000)$, $\sigma_x^2 \sim IG(0.01, 0.01)$. With these prior densities, we ran three parallel, independent Gibbs sampler chains of 25 000 iterations each for every parameter. We discarded the first 5 000 iterations to eliminate the effect of the initial values. We assessed chain convergence using the Brooks-Gelman-Rubin scale reduction factor ($\hat{R}$); a value of $\hat{R}$ of approximately 1 indicates convergence [14]. We estimated the models using the R2jags package available in R [15].
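For readers unfamiliar with the convergence diagnostic, the sketch below computes the basic Gelman-Rubin potential scale reduction factor from several chains of equal length. It is an illustration of the idea only; it is not the exact variant reported by R2jags, and the function name is hypothetical.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between_over_n
    return math.sqrt(var_plus / within)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three chains sampling the same distribution: R-hat close to 1.
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
    # Three chains stuck around different values: R-hat well above 1.
    stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0, 6.0)]
    print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Chains that explore the same distribution have similar means, so the between-chain term is negligible and the ratio approaches 1.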

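For each simulation, we compute the relative bias (RelBias) and the Root Mean Square Error (RMSE) for each parameter estimate over 100 samples. These statistics are defined as

$$\mathrm{RelBias}(\beta) = \frac{1}{k} \sum_{j=1}^{k} \left(\frac{\hat{\beta}_j}{\beta} - 1\right), \qquad \mathrm{RMSE}(\beta) = \sqrt{\frac{1}{k} \sum_{j=1}^{k} \left(\hat{\beta}_j - \beta\right)^2}$$

where $\hat{\beta}_j$ is the estimate of $\beta$ for the $j$th sample and $k = 100$.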
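These two summary statistics translate directly into code. The sketch below uses hypothetical function names and made-up estimates purely to show the arithmetic.

```python
import math


def rel_bias(estimates, true_value):
    """Average of (estimate / true_value - 1) over the Monte Carlo samples."""
    return sum(b / true_value - 1.0 for b in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Square root of the mean squared deviation from the true value."""
    return math.sqrt(sum((b - true_value) ** 2 for b in estimates) / len(estimates))


if __name__ == "__main__":
    # Illustrative values only, not results from the simulation study.
    estimates = [1.8, 2.2, 1.9, 2.1]
    print(rel_bias(estimates, 2.0), rmse(estimates, 2.0))
```

A negative RelBias means the estimates fall below the true value on average, which is the signature of attenuation.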

We also compare the models based on the expected Akaike information criterion (EAIC) and the expected Bayesian information criterion (EBIC). The EAIC and EBIC can be estimated from the MCMC output as follows

$$\widehat{EAIC} = \bar{\mathcal{D}} + 2p, \qquad \widehat{EBIC} = \bar{\mathcal{D}} + p \log T$$

where $\bar{\mathcal{D}}$ is the posterior mean of the deviance, $p$ the number of parameters in the model, and $T$ the total number of observations [16].
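A minimal sketch of the two criteria follows, with hypothetical function names. The `log_base` argument is an addition made for this example: the conventional definition uses the natural logarithm, whereas the gap $\widehat{EBIC} - \widehat{EAIC} = p(\log T - 2)$ in Tables 4 to 6 (negative at $T = 49$, zero at $T = 100$) appears consistent with a base-10 logarithm and $p = 4$, 6 and 7 for the Naive, N-N and SN-N models. This is an inference from the tabulated numbers, not something stated in the text.

```python
import math


def eaic(mean_deviance, p):
    """Posterior mean deviance plus 2 * number of parameters."""
    return mean_deviance + 2.0 * p


def ebic(mean_deviance, p, t, log_base=math.e):
    """Posterior mean deviance plus p * log(T); natural log by default."""
    return mean_deviance + p * math.log(t, log_base)


if __name__ == "__main__":
    d_bar = 151.2988  # illustrative posterior mean deviance
    print("EAIC:", eaic(d_bar, 4))
    print("EBIC (natural log):", ebic(d_bar, 4, 49))
    print("EBIC (base 10):    ", ebic(d_bar, 4, 49, log_base=10.0))
```

Because both criteria share the same mean deviance, only the penalty term differs between them, so the choice of logarithm base changes the EBIC level but not the ranking of models with a fixed $T$ unless their parameter counts differ.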

Table 1. RelBias and RMSE of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.1.

| T | $\lambda_X$ | Naive RelBias | Naive RMSE | N-N RelBias | N-N RMSE | SN-N RelBias | SN-N RMSE |
|---|---|---|---|---|---|---|---|
| 49 | 0.3 | -0.185 | 0.4411 | -0.0088 | 0.2317 | -0.0087 | 0.2323 |
| 49 | 0.6 | -0.1495 | 0.3542 | -0.0049 | 0.1626 | -0.005 | 0.1628 |
| 49 | 0.9 | -0.0488 | 0.169 | 0.0149 | 0.1236 | 0.0149 | 0.1224 |
| 100 | 0.3 | -0.1957 | 0.4249 | -0.0136 | 0.1509 | -0.0133 | 0.1508 |
| 100 | 0.6 | -0.1351 | 0.3051 | 0.0036 | 0.1334 | 0.0034 | 0.1335 |
| 100 | 0.9 | -0.0637 | 0.1669 | -0.0059 | 0.0891 | -0.0058 | 0.0895 |
| 400 | 0.3 | -0.1804 | 0.3699 | -0.0032 | 0.073 | -0.0034 | 0.0731 |
| 400 | 0.6 | -0.1403 | 0.2885 | 0.0001 | 0.053 | 0.0002 | 0.0528 |
| 400 | 0.9 | -0.0531 | 0.1154 | 0.0011 | 0.0365 | 0.001 | 0.0364 |
| Average | | -0.1280 | 0.2928 | -0.0019 | 0.1171 | -0.0019 | 0.1171 |

Table 2. RelBias and RMSE of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.3.

| T | $\lambda_X$ | Naive RelBias | Naive RMSE | N-N RelBias | N-N RMSE | SN-N RelBias | SN-N RMSE |
|---|---|---|---|---|---|---|---|
| 49 | 0.3 | -0.3888 | 0.8043 | 0.0215 | 0.181 | 0.022 | 0.1811 |
| 49 | 0.6 | -0.33 | 0.6997 | -0.0012 | 0.186 | -0.0011 | 0.1859 |
| 49 | 0.9 | -0.1743 | 0.4071 | 0.0041 | 0.1517 | 0.0049 | 0.1507 |
| 100 | 0.3 | -0.4028 | 0.8212 | 0.0038 | 0.1605 | 0.0038 | 0.161 |
| 100 | 0.6 | -0.3301 | 0.6773 | -0.0127 | 0.1312 | -0.0127 | 0.1311 |
| 100 | 0.9 | -0.1483 | 0.326 | 0.003 | 0.0813 | 0.003 | 0.0803 |
| 400 | 0.3 | -0.3876 | 0.7786 | 0.0056 | 0.0759 | 0.0056 | 0.0757 |
| 400 | 0.6 | -0.3316 | 0.6669 | -0.0001 | 0.0591 | 0 | 0.0591 |
| 400 | 0.9 | -0.1468 | 0.3026 | -0.0004 | 0.0395 | -0.0005 | 0.0394 |
| Average | | -0.2934 | 0.6093 | 0.0026 | 0.1185 | 0.0028 | 0.1183 |

Table 3. RelBias and RMSE of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.7.

| T | $\lambda_X$ | Naive RelBias | Naive RMSE | N-N RelBias | N-N RMSE | SN-N RelBias | SN-N RMSE |
|---|---|---|---|---|---|---|---|
| 49 | 0.3 | -0.6074 | 1.2298 | -0.01 | 0.2305 | -0.0102 | 0.23 |
| 49 | 0.6 | -0.5369 | 1.0998 | 0.0023 | 0.1809 | 0.0029 | 0.1811 |
| 49 | 0.9 | -0.3363 | 0.7162 | 0.0003 | 0.1236 | 0.0003 | 0.1245 |
| 100 | 0.3 | -0.5966 | 1.1996 | -0.0026 | 0.1495 | -0.0026 | 0.1495 |
| 100 | 0.6 | -0.5396 | 1.0879 | 0.0034 | 0.1254 | 0.0035 | 0.1256 |
| 100 | 0.9 | -0.2982 | 0.6205 | -0.0035 | 0.0782 | -0.0034 | 0.0786 |
| 400 | 0.3 | -0.6048 | 1.2117 | 0 | 0.0743 | 0 | 0.0745 |
| 400 | 0.6 | -0.5409 | 1.0845 | 0.0006 | 0.0566 | 0.0007 | 0.0562 |
| 400 | 0.9 | -0.2874 | 0.5814 | -0.0027 | 0.0375 | -0.003 | 0.0375 |
| Average | | -0.4831 | 0.9813 | -0.0014 | 0.1174 | -0.0013 | 0.1175 |

Table 4. EAIC and EBIC of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.1.

| T | $\lambda_X$ | Criterion | Naive | N-N | SN-N |
|---|---|---|---|---|---|
| 49 | 0.3 | EAIC | 159.2988 | 274.4879 | 208.6433 |
| 49 | 0.3 | EBIC | 158.0596 | 272.629 | 206.4747 |
| 49 | 0.6 | EAIC | 161.8768 | 289.3854 | 224.1755 |
| 49 | 0.6 | EBIC | 160.6376 | 287.5266 | 222.0068 |
| 49 | 0.9 | EAIC | 163.3675 | 341.1097 | 267.4752 |
| 49 | 0.9 | EBIC | 162.1283 | 339.2509 | 265.3066 |
| 100 | 0.3 | EAIC | 320.3707 | 551.9545 | 418.4622 |
| 100 | 0.3 | EBIC | 320.3707 | 551.9545 | 418.4622 |
| 100 | 0.6 | EAIC | 322.7067 | 584.9334 | 473.0221 |
| 100 | 0.6 | EBIC | 322.7067 | 584.9334 | 473.0221 |
| 100 | 0.9 | EAIC | 324.1289 | 681.606 | 584.0127 |
| 100 | 0.9 | EBIC | 324.1289 | 681.606 | 584.0127 |
| 400 | 0.3 | EAIC | 1253.649 | 2187.9461 | 1614.2875 |
| 400 | 0.3 | EBIC | 1256.0572 | 2191.5584 | 1618.5019 |
| 400 | 0.6 | EAIC | 1260.2144 | 2304.397 | 1828.0391 |
| 400 | 0.6 | EBIC | 1262.6226 | 2308.0094 | 1832.2536 |
| 400 | 0.9 | EAIC | 1272.5743 | 2737.4345 | 2469.2567 |
| 400 | 0.9 | EBIC | 1274.9826 | 2741.0468 | 2473.4711 |

Table 5. EAIC and EBIC of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.3.

| T | $\lambda_X$ | Criterion | Naive | N-N | SN-N |
|---|---|---|---|---|---|
| 49 | 0.3 | EAIC | 174.4723 | 329.846 | 260.8013 |
| 49 | 0.3 | EBIC | 173.2331 | 327.9872 | 258.6326 |
| 49 | 0.6 | EAIC | 176.6635 | 346.5873 | 286.8002 |
| 49 | 0.6 | EBIC | 175.4243 | 344.7285 | 284.6315 |
| 49 | 0.9 | EAIC | 180.0722 | 385.6472 | 316.7996 |
| 49 | 0.9 | EBIC | 178.833 | 383.7883 | 314.631 |
| 100 | 0.3 | EAIC | 346.6795 | 665.5882 | 533.479 |
| 100 | 0.3 | EBIC | 346.6795 | 665.5882 | 533.479 |
| 100 | 0.6 | EAIC | 350.8364 | 698.126 | 593.5298 |
| 100 | 0.6 | EBIC | 350.8364 | 698.126 | 593.5298 |
| 100 | 0.9 | EAIC | 358.7151 | 792.5358 | 685.0883 |
| 100 | 0.9 | EBIC | 358.7151 | 792.5358 | 685.0883 |
| 400 | 0.3 | EAIC | 1367.4139 | 2631.9434 | 2079.6499 |
| 400 | 0.3 | EBIC | 1369.8221 | 2635.5557 | 2083.8643 |
| 400 | 0.6 | EAIC | 1377.4296 | 2729.3906 | 2300.703 |
| 400 | 0.6 | EBIC | 1379.8378 | 2733.003 | 2304.9174 |
| 400 | 0.9 | EAIC | 1426.5864 | 3165.1932 | 2892.4515 |
| 400 | 0.9 | EBIC | 1428.9947 | 3168.8056 | 2896.6659 |

Table 6. EAIC and EBIC of the Naive, Normal (N-N), and Skew-normal (SN-N) priors for the spatial regression model with measurement error variance 0.7.

| T | $\lambda_X$ | Criterion | Naive | N-N | SN-N |
|---|---|---|---|---|---|
| 49 | 0.3 | EAIC | 183.9067 | 376.9027 | 306.301 |
| 49 | 0.3 | EBIC | 182.6675 | 375.0439 | 304.1324 |
| 49 | 0.6 | EAIC | 186.2684 | 383.4209 | 319.1254 |
| 49 | 0.6 | EBIC | 185.0291 | 381.562 | 316.9568 |
| 49 | 0.9 | EAIC | 199.8853 | 432.0987 | 365.3364 |
| 49 | 0.9 | EBIC | 198.6461 | 430.2398 | 363.1678 |
| 100 | 0.3 | EAIC | 364.5025 | 748.9659 | 618.5672 |
| 100 | 0.3 | EBIC | 364.5025 | 748.9659 | 618.5672 |
| 100 | 0.6 | EAIC | 377.1581 | 779.0555 | 655.9786 |
| 100 | 0.6 | EBIC | 377.1581 | 779.0555 | 655.9786 |
| 100 | 0.9 | EAIC | 398.2557 | 878.7195 | 779.1036 |
| 100 | 0.9 | EBIC | 398.2557 | 878.7195 | 779.1036 |
| 400 | 0.3 | EAIC | 1442.9183 | 2972.8864 | 2411.4332 |
| 400 | 0.3 | EBIC | 1445.3266 | 2976.4987 | 2415.6477 |
| 400 | 0.6 | EAIC | 1477.3687 | 3074.5225 | 2589.6749 |
| 400 | 0.6 | EBIC | 1479.7769 | 3078.1348 | 2593.8893 |
| 400 | 0.9 | EAIC | 1582.8831 | 3508.7772 | 3244.6129 |
| 400 | 0.9 | EBIC | 1585.2913 | 3512.3896 | 3248.8273 |

3. Results and Discussion

Tables 1, 2 and 3 show that, for the spatial regression model with Skew-normal data, the average RelBias (in absolute value) and the average RMSE of the coefficient $\beta_x$ under the normal prior (N-N) are quite similar to those under the Skew-normal prior (SN-N), across all $T$, all $\lambda_X$ and the three measurement error variances. For the naive model, however, both are larger than for the normal (N-N) and Skew-normal (SN-N) priors.

We observed that the naive estimate of the regression coefficient $\beta_x$ is attenuated toward zero. Additionally, the RelBias (in absolute value) and RMSE of the coefficient $\beta_x$ increase with the measurement error variance $\sigma_U^2$ and decrease with the spatial dependence parameter $\lambda_X$; this pattern is clearest for the naive estimator. According to [4], stronger dependence implies that neighboring areas provide more information, and hence the estimates are more resistant to the effect of measurement error.

Note also that, for the normal and Skew-normal priors, the average RelBias (in absolute value) and RMSE of $\beta_x$ with measurement error variance $\sigma_U^2 = 0.7$ are smaller than with $\sigma_U^2 = 0.3$. Moreover, for measurement error variance $\sigma_U^2 = 0.1$, the RelBias (in absolute value) of $\beta_x$ with spatial dependence parameter $\lambda_X = 0.9$ is larger than with $\lambda_X = 0.6$, whereas for the RMSE the opposite holds.

Tables 4, 5 and 6 show the overall fit statistics for the spatial measurement error model. Compared to the normal model, the EAIC and EBIC favor the Skew-normal model for all sample sizes ($T$), all three values of the dependence parameter $\lambda_X$, and all three measurement error variances $\sigma_U^2$. Note that the naive model has the smallest EAIC and EBIC values, but this model does not account for the measurement error. Therefore, the above results show that the Skew-normal prior outperforms the normal, symmetric prior and the naive model without measurement error correction.

4. Concluding Remarks

This paper analyzed, by way of Monte Carlo simulation, Bayesian inference for spatial regression models with a Skew-normally distributed, spatially lagged covariate measured with error. The simulation examined the performance of Bayesian estimators in the case of (i) naive models without measurement error correction; (ii) normal distributions for the error-prone covariate and the random errors; (iii) a Skew-normal (SN) distribution for the error-prone covariate and a normal distribution for the random errors.

The simulation results show that the Skew-normal prior estimator outperforms the normal, symmetric prior and the naive models without measurement error correction.

References

[1] LeSage, J. P. (1999). The Theory and Practice of Spatial Econometrics. Department of Economics, University of Toledo.

[2] Anselin, L. (2007). Spatial Econometrics. In B. H. Baltagi (Ed.), A Companion to Theoretical Econometrics, pp. 310-330. John Wiley & Sons, New York.

[3] Waller, L. A., Gotway, C. A. (2004). Applied Spatial Statistics for Public Health Data, Vol. 368. John Wiley & Sons, Hoboken, New Jersey, U.S.A.

[4] Li, Y. et al. (2009). Spatial linear mixed models with covariate measurement errors. Stat. Sinica 19(3), 1077-1093.

[5] Huque, M. H. et al. (2014). On the impact of covariate measurement error on spatial regression modelling. Environmetrics 25, 560-570. [doi: 10.1002/env.2305].

[6] Huque, M. H. et al. (2016). Spatial regression with covariate measurement error: A semiparametric approach. Biometrics 72(3), 678-686. [doi: 10.1111/biom.12474].

[7] Arellano-Valle, R. B. et al. (2005). Skew-normal measurement error models. J. Multivariate Anal. 96, 265-281. [doi: 10.1016/j.jmva.2004.11.002].

[8] Kheradmandi, A. et al. (2015). Estimation in skew-normal linear mixed measurement error models. J. Multivariate Anal. 136, 1-11. [doi: 10.1016/j.jmva.2014.12.007].

[9] Muff, S. et al. (2015). Bayesian analysis of measurement error models using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. C. Appl. Stat. 64(2), 231-252.

[10] Hossain, S. et al. (2009). Bayesian adjustment for covariate measurement errors: A flexible parametric approach. Statist. Med. 28, 1580-1600. [doi: 10.1002/sim.3552].

[11] Azzalini, A. (1985). A class of distributions which includes the normal ones. Scand. J. Stat. 12(2), 171-178.

[12] Plant, R. E. (2012). Spatial Data Analysis in Ecology and Agriculture Using R. CRC Press, New York.

[13] LeSage, J. P. (2014). Spatial econometric panel data model specification: A Bayesian approach. Spat. Statist. 9, 122-145. [http://dx.doi.org/10.1016/j.spasta.2014.02.002].

[14] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis. Chapman & Hall/CRC, New York, NY.

[15] Su, Y. S. et al. (2015). R2jags: A package for running jags from R. R package version 0.5-7.

[16] Spiegelhalter, D. J. et al. (2014). The deviance information criterion: 12 years on. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76, 485-493.


ISSN: 2518-6841