Monte-carlo-simulation In Excel (simulation Der Losgre

Monte-carlo-simulation In Excel (simulation Der Losgre
Methods We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I. pop; and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on a 1987-2004 HIV and a 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I. pophad powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I.
1. Monte Carlo Simulation Excel Help
View James Colwell’s profile on LinkedIn, the world's largest professional community. James has 7 jobs listed on their profile. See the complete profile on LinkedIn and discover James’ connections and jobs at similar companies. History of the Deer Lodge Valley to 1870, Virginia Lee Speck. History of the development and operations of Missoula Snow Bowl, Inc., 1961-1971, Jens Gran. History of the Milk River valley, Carl Martin Gunderson. History of the School of Music, Montana State University (1895-1952), John Roswell Cowan. Monte Carlo Simulation is a process of using probability curves to determine the likelihood of an outcome. You may scratch your head here and say “Hey Rick, a distribution curve has an array of values.
pop(with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I has powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data.
SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article.
Tango's MEET and Oden's I. popperform best in global clustering scenarios among the selected methods. The use of SaTScan for data with global clustering patterns should be used with caution since SatScan may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan. In medical research and epidemiological studies, it is important to understand the spatial heterogeneity (clustering) of disease cases across the study regions. If global or local clustering patterns among the responses (e.g. Cancer cases) exist, it is essential to consider the spatial correlation of the individuals in a statistical model to evaluate the association between the response (e.g., cancer death and incidence) and the risk factors.
Some statistical methods that are used in spatial analysis involve aggregated summaries (count data) and Poisson regression models that assume independence of neighboring locations without overdispersion. These assumptions are violated if spatial dependence exists such as in Kulldorff et al. which examined unusually high breast cancer rates in the northeastern part of the United States and research done by Pickle et al. 2001 to detect a local cluster of an extreme lung cancer mortality rate in a single county in Montana caused by arsenic exposure.
It is important to identify models with possible spatial correlation when performing statistical analysis. For example, a risk factor (such as the distance a region is from a contaminated water source when measuring incidence of cholera) may turn out to be significant in a model assuming independent responses. However, this distance-oriented risk factor may not be identified when spatial correlation in residuals is considered in the model.
Many statistical methods for testing spatial clustering and heterogeneity were developed and can be classified into two groups: methods for global clustering evaluation and for local cluster detection ,. Global clustering measures assess spatial trends (the tendency of spatial clustering) across an entire study region. Cluster detection methods identify specific local clusters.
Both clustering patterns (global and local) occur in many cancer incidence and mortality data sets. Simulation studies have been conducted to evaluate methods for global clustering evaluation and local cluster detection , , but none were designed to study and compare the performance of the methods on data with two extreme situations, such as global clustering and local outlier patterns. The purpose of this paper is to investigate the performance of the methods for testing spatial heterogeneity on data with the two extreme situations. We simulate disease data with homogenous populations to allow for a stronger power study. We provide guidance for the proper use of the statistical methods selected in the two situations of global clustering and cluster detection.
Among methods for global clustering evaluation, we selected Tango's MEET , because it has been shown to be the most powerful test for testing spatial heterogeneity , ; Moran's I is selected as a reference for comparison since it has been used widely in these areas; as an alternative version of Moran's I adjusting for heterogeneous population density, Oden's I. pop is selected because it has not been compared with Tango's MEET in earlier works. Among cluster detection methods, SaTScan elliptic version is selected because it has proven to be very powerful in detecting local clusters (with either regular shapes or irregular shapes) with good precision, and reasonable computation time ,.
We also selected a local version of Moran's I among the cluster detection methods because of the ability of the statistic to perform well on outlier detection even though it is not a good method for detecting large clusters ,. The rest of the paper is organized as follows. First, we briefly present selected global indices of spatial autocorrelation, local indices of spatial association (LISA) and SaTScan elliptic version. Second, we discuss the steps and parameters utilized in the simulation study.
Next, the results of the simulation study are included, and the application of the selected methods on HIV and lung cancer mortality data in the United States are given. The paper ends with a discussion. 2.1 Global indices of spatial autocorrelation Global indices of spatial autocorrelation as defined by Waller and Gotway provide a summary over the entire study area of the level of spatial similarity observed among neighboring observations.
In this section we briefly describe the three common global indices of spatial autocorrelation we intend to evaluate. In the following sections, i and j denote geographic units (e.g. Counties), y iis the number of cases at geographic unit i, n iis the population at risk at geographic unit i, N is the total number of geographic units, w ijis a weight assigned to the pair of geographic units i and j. Tango's Index. (5) Where, v i= n i/ n +, v j= n j/ n +, e i= y i/ y +, e j= y j/ y+,. Oden notes that symmetry is not required for I. popand w ii≠ 0 (but can be fixed at any specified value).
For this study we refer to I. popas I. popADJ with the adjacent neighbor weight function and I. popPD with the population density weight function. Note that I. popPD should be more sensitive to the global clustering patterns if a larger λ is assigned, and more sensitive to local clusters with a smaller λ value.
Cluster detection methods While global indices of spatial association evaluate the tendency of global spatial clustering across an entire region, local indices of spatial association (LISA) detect patterns in geographic units that deviate extremely from neighboring units (local outliers). SaTScan is another tool that has been widely used for local cluster detection, which is good for detecting large clusters. It may also evaluate outliers (local clusters with a small geographic or population size) when the outlier pattern is very strong or a small maximum search window is used. Local Moran's I.
Poisson model based SaTScan circular version and SaTScan elliptic version has been widely used for cluster detection on aggregated count data. For each centroid (each geographic unit is a centroid), we construct many areas (circles or ellipses) with varying sizes (and angles) sharing the same centroid. We only evaluate SaTScan elliptic version in this paper because the elliptic version has better power and precision compared with circular version for most of situations.
Defining the shape of ellipses by the ratio of the long radius to the short radius, we use the combination of shapes 1.5, 2, 3, and 4 in our analysis. We rotate each ellipse of fixed shape in the circle (the radius of the circle is the length of the long axis of the ellipse) sharing the same centroid, each rotation moves the ellipse to a new angle. The number of angles for those shapes 1.5, 2, 3 and 4 is selected to be 4, 6, 9, and 12, respectively. For a given zone Z, the likelihood is then defined as. Which is independent of the Z.
Where do i find microsoft office for mac. The Z that maximizes the λ over all the Z's in G is the most likely cluster. When searching for large clusters, we use 50% of total population as the maximum size of Z, and for local small cluster or outlier detection, we use 5% of the total population as the maximum size of Z in the simulation. Simulating geographic data We simulate count data with fixed total number of cases (or sample size) for i = 1.,3109 representing data from the 3109 counties of the continental United States (US) (multinomial data.) All distance measures are calculated using the latitude and longitude coordinates of the county centroids. We use the actual configuration of U.S. Counties but not the real U.S.
Populations for the counties, because the U.S. Has a complex configuration at the county level with a varying number of neighbors for each county and it can provide a real and complicated structure of the adjacent neighbors and nearest neighbors. Waller et al. discussed the issue with power analysis of the tests of clusters and clustering in heterogeneous populations. They found that power depends on the local population at risk. Here, in order to evaluate the performance of the methods on data with varying relative risk patterns without confounding from heterogeneous population, we simulate data assuming homogeneous county population in the US. We set n i(the population at risk) equal to 5,000 for all simulations, thus n + = 15,545,000.
Note that the performance of the tests will be poorer when data are generated from a heterogeneous population, because more spatial variation will be introduced through the population variation. We experimented with simulations that had 5,000-50,000 fixed cases (results not shown) to obtain a relevant range of power. In order to simulate data with global clustering pattern, the relative risks are defined to have a directional west-east trend with either a linear or exponential increasing function (this trend can also reflect the pattern in data with a north-south trend since a simple rotation is all that is necessary.) With the minimum relative risk (on the west) as 1, we experimented with maximum relative risk (on the east) with values ranging from 1.5-10 (results not shown). However, we were unable to detect differences in the performance of the selected statistics when the relative risk was above 2 (all methods had power close to 1). Therefore we used a maximum relative risk of 1.5 in our final study. The monotone increasing function is defined as.
For the relative risks with an exponential trend, where a is an integer from 0-99. The longitudinal coordinates of the continental United States ranges from -124.161 to -67.623. We divide this large interval into 100 sub-intervals of equal length where each interval corresponds to a value of i used in equations and. Both equations allow for a relative risk of 1.5 between the lowest and highest longitudinal coordinates.
As shown in Figure, the relative risk increases from west to east monotonically, while the relative risk with the exponential trend (figure ) increases faster than that with the linear trend (figure ). In order to simulate data with small local clusters (outlier), we set all values of r iequal to 1, except for one geographic unit which has a relative risk of 4. We only consider a single outlier in the simulated data. The number of adjacent neighbors for all counties in the continental U.S.
Ranges from 0-14. In order to evaluate if the power varies by the number of adjacent neighbors around one outlier, we select three counties with different number of adjacent neighbors as the outliers with relative risk of 4, which are Manassas City, Virginia (with 2 adjacent neighbors), Jackson County, Kansas (with 6 adjacent neighbors), and Fulton County, Georgia (with 10 adjacent neighbors).
Figure includes the maps of the assigned relative risks for the three patterns. Step 1: Simulate 10,000 data sets using the multinomial distribution (equation 6) under the null hypothesis (no outlier or global clustering pattern). Step 2: Calculate TangoPDM, TangoADJ, Moran's I, I. popPD, I. popADJ, and SaTScan elliptic version (SaTScan-E) statistics for each data set from step 1. Calculate Local Moran's I for each county i in each data from step 1. Step 3: Find the 95th percentile for TangoADJ, Moran's I, I.
popPD, I. popADJ, and SaTScan-E. The 95 th percentile is the critical point for the empirical distribution of each of the statistics. Find the 5th percentile for TangoPDM, which serves as the critical point for Tango's I with the PDM weight function. Find the 95th percentile for the local Moran's I at each county i.
There are then 3109 critical points for Local Moran's I (one for each county.). Step 4: Simulate 1000 alternative data sets using the multinomial distribution (equation 7) under each of the alternative hypotheses described in the previous section.
The five settings mentioned above are. For the cluster detection methods SaTScan-E and local Moran's I, we also report the chance of a county being detected inside clusters, which is defined to be the number of times that a county is counted inside a detected cluster or as an outlier out of the 1000 replicates. If that ratio for a county is close to 1, then there is a large chance that a particular county is inside a cluster or an outlier as determined by this method. If the ratios for the counties inside the true cluster or a true outlier are all high, we claim that the method has a high chance to find the correct cluster location, which indicates good precision of cluster detection. Table includes a summary of all the statistics and weight functions we use for this study.
Since the adjacent neighbor weight function is one of the most common weight functions used in spatial data analysis we included it in our study. For comparison we also used a population density weight function which utilized the population when determining weights. Methods for global clustering evaluation cannot detect local cluster locations; therefore we do not evaluate the chance of a county being detected inside a cluster as defined in step 6c for those methods. Note that we evaluate the performance of Moran's I, Tango's statistics, Oden's I.
Monte Carlo Simulation Excel Help
popand SaTScan-E for data with global clustering pattern and outlier pattern. We report statistical power for all methods except local Moran's I. Local Moran's I is only used for local outlier identification and we only report chance of geographic unit being detected as an outlier in the outlier detection for Local Moran's I since a power calculation is not possible due to fact that Local Moran's I provides a statistic for each geographic unit.
For SatScan-E we report both power and the chance of geographic units being detected as inside clusters. The results of the chance of being inside clusters are presented in maps and the powers are provided in a table.
Among all the selected methods for global clustering evaluation, TangoPDM has the highest power to reject the null hypothesis of homogeneous relative risk over the entire study region and claim that the simulated data have global clustering pattern (results presented in Table.) The I. popstatistics with varying weights also has good power (0.8) for the population density weight function.
Specifically, when there is a large λ (50% of total population) in the population density weight function, the statistics have good power because they are more sensitive to large clustering pattern according to the formulation. All methods using an adjacent neighbor weight functions (TangoADJ, Moran's I, I. popADJ) that only evaluate limited adjacent neighbor regions have low power (around 0.1-0.2) for detecting the global clustering patterns. A for global clustering evaluation, we used 50% of total population as the maximum search window. With the outlier detection, we use 5% of the total population as the maximum search window in SaTScan-E.
+ Power calculations are not available for Local Moran's I due to the fact that a statistics is produced for each geographic unit in the study. It is difficult to define power in the presence of multiple comparisons. Designed for cluster detection, SaTScan-E with maximum spatial window as 50% of total population has moderate power (around 0.75 (See Table )) in identifying the data with spatial heterogeneity successfully. The type of increasing function in the global trend (exponential or linear) was not a key factor in power for all the statistics for the data with the same maximum relative risk. Power is slightly higher for data with exponential function compared with linear function because the relative risk in data with exponential function increases faster than that in data with linear function, even though the maximum relative risk in both types are the same (maximum relative risk = 1.5). We also note that with maximum relative risk above two, all methods obtained a statistical power above.95. SaTScan-E also detects cluster locations with significantly higher risks.
We compute the chance of a county being inside the detected clusters out of the 1000 replicates and present the chances in Figure. The counties with orange through red color have moderate to high chance (60% to 100%) being detected as inside clusters of high risk. The blue color areas indicate counties with low chance to be detected. The patterns in Figure do reflect a spatial variation (high risk in the west vs. Low risk in the east).
The chances of counties being inside detected clusters are lower for data with linearly increasing function (right) compared with that with exponentially increasing function (left), which is expected because the risks increase faster with an exponential function. Compared with the true risk pattern in Figure, SaTScan-E only detects clusters with high risk in limited eastern areas, but does not capture the global clustering pattern as shown in Figure. Also, we cannot evaluate the precision of SaTScan-E because there is no true cluster in the data with global clustering pattern. Figure 3 SatScan and local version of Moran's I results for Global clustering and local cluster patterns. Graphical representation showing the chance of geographic units being detected as inside clusters (the number of times a geographic unit was selected as a unit inside a cluster over 1000 simulations). (A) Results from SatScan-E on data with global clustering pattern as shown in Figure 1, (B) Results from SatScan-E on data with outlier as the three counties shown in Figure 1, and (C) Using local version of Moran's I on data with outlier pattern as shown in Figure 2.
Performance of methods for data with local clusters (outliers) Based on our simulation study, the ability of the statistics to identify the spatial heterogeneity in data with only outliers is not very sensitive to the number of adjacent neighbors. SatScan-E with maximum search window as 5% of total population performed significantly better (with power above 0.95) than all methods for global clustering evaluation (with power lower than 0.2) in terms of power (Table ). This result was not surprising since SaTScan-E is designed for local cluster detection and with a small search window it searches for clusters with a smaller size (outliers). Methods for global clustering evaluation do not perform well because there is no strong global clustering tendency when the data have a single outlier including one county. We also compared the chance of the true outlier counties being detected as inside clusters from the two methods for local cluster detection (SaTScan-E vs. LISA) in Figure ( vs. SaTScan-E has a very good chances (above 0.9) to detect the true outlier counties.
The maximum chance of being detected as inside clusters for local Moran's I are lower than those from SaTScan-E (around 0.4-0.5). However, we note that local Moran's I did detect the location of the true simulated outlier with a higher chance (0.4-0.5) than all other counties (with chance. HIV disease, as a non-cancer disease, has the public's attention in recent decades because of its high mortality. To understand the spatial pattern of HIV mortality, we use an HIV mortality data during 1987-2004, which is available from the National Center for Health Statistics and extracted from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI).
The total number of HIV deaths during 1987-2004 is 406,465 in our data set. First, we simply map the observed mortality rate and the smoothed rate in Figure (observed) and 4B (smoothed) using GIS. In Figure, the rates are smoothed using the software Headbang with the county population at risk as the weight for smoothing with 30 neighboring counties. The smoothed map provides a better sense of spatial pattern with less noise compared with the observed map. From the maps, we visually see a south to north global clustering trend of the mortality rates similar to the global clustering trend we evaluated in our simulation study except in a different direction (West to East trend in our simulated data set.). Figure 4 Example using data with a global clustering pattern. (A) Raw and (B) Smoothed HIV mortality rates per 100,000 for the years 1987-2004 using county population as weight for smoothing over 30 nearest neighbors (top).
(C) is detected clusters of high rates from SaTScan-E and (D) is detected clusters of low rates from SaTScan-E We then used the selected methods on the raw HIV mortality data for testing of the spatial heterogeneity. TangoPDM successfully detected the global clustering pattern (p-value. SatScan-E with maximum search window as 5% of total population at risk detects 15 significant clusters of high rates with p-values smaller than 0.05 (5A). When 50% of total population at risk is used, 8 clusters of high rates are detected with p-values smaller than 0.05 (5B). The relative risks in those detected clusters are ranging from 1.07 to 1.81. There are many clusters with only one or several counties (outliers) in both maps and there are more outliers detected in Figure. We notice a small cluster including only two counties (Silver Bow County and Deer Lodge County) in the state of Montana, which is a typical type of outlier with high relative risk (1.81).
The mortality rate inside this outlier region is 67.1 per 100,000 people vs. 37 per 100,000 outside. The observed mortality rates in Montana are presented in Figure and the smoothed rates (using Headbang software) is in Figure, which also shows a possible outlier location in Silver Bow county and Deer Lodge County. It has been known that Silver Bow County was the location of the pollution source caused by a copper plant which explained the high risks of lung cancer mortality in this area. This is also consistent with the results noticed by Lee et al. that an extreme lung cancer mortality rate in a single county (Silver Bow County) in Montana was caused by arsenic exposure. Here, we can claim that there exist both global clustering pattern and outliers in white male lung cancer mortality during 1950s-60s in this county.
In this article, we explored the performance of several global indices of spatial autocorrelation, local Moran's I and SaTScan-E to detect global clustering and identify outliers. TangoPDM had the highest statistical power for identifying global clustering patterns among all methods considered. The power for I. popwith proper λ are very close to that for TangoPDM, but the user has to choose the value of λ, which is subjective and it is very hard to find the optimal λ. However, TangoPDM evaluates regions with varying total population and provides a single statistic and p-value, so it is easier for users without much knowledge of the geographic features of the study region to identify global spatial clustering. SaTScan with a large search window (50% of total population) may have moderate power to detect spatial heterogeneity but it may not reveal the correct global clustering pattern. SaTScan-E performed well in detecting the outliers in terms of power, which is much better than local Moran's I and the methods for global clustering evaluation.
From our simulation study we also found that for a large relative risk difference (greater than 2), SaTScan-E as well as all the other methods considered were able to detect spatial heterogeneity with a power above.98. For local cluster detections (outlier), the power is sensitive to the change in sample size with increased sensitivity. When the sample size was less than 25,000 we had very low power in detecting outliers. Even though SaTScan and local Moran's I did obtain a higher chance to locate the true location of outliers compared with other locations, the overall chance of detecting the true outlier is low. The weight function does affect the power of the methods for evaluating global clustering pattern.
For the methods using adjacent neighbor weight function, the order of the power is Tango's Index, I. pop, and Moran's I, which implies that Tango's index is the best in detecting spatial heterogeneity among the selected methods for global clustering evaluation even with the same weight function. When comparing the power for I. popwith the weight functions PD (with λ = 50% of total population) and ADJ, we notice that the power for the data with global clustering is much higher with the PD weight function compared to the ADJ weight function.
A limitation of our simulation is that we simulated homogeneous county populations across an entire region. This assumption likely does not hold in real data. Heterogeneous population densities complicate comparisons of statistical power between hypothesis test evaluating spatial clusters or global clustering. Our intent was to investigate statistical power while controlling for as many variations and confounding factors as possible, in order to effectively evaluate and compare each statistics. However, we do provide an example based on real mortality data with heterogeneous population across the U.S. Our example based on heterogeneous population revealed that TangoPDM method detected the global clustering pattern which validates the simulation results based on a similar global clustering pattern.
Although SaTScan was designed for cluster detection, it can still be used to access global clustering patterns if larger clusters exist. However, SaTScan can be misleading in detecting clusters as shown in Figure. Therefore one should be careful when applying SaTScan to evaluate global clustering patterns. SaTScan did find the outlier (Silver County) in Montana with real heterogeneous population and unusually high lung cancer mortality rate among white males using a 5% and 50% window. Usually, it is difficult to detect outliers using a 50% window. In our case a 50% window was sufficient to detect an outlier. This result is partially due to the fact that we have a large relative risk of 1.8 and a large sample size which made outlier detection more feasible.
There are many directions for future research. First there are several alternative autocorrelation and spatial association patterns (not only global clustering and outlier patterns) that could be considered. For example, Song and Kulldorff simulate various cluster sizes for their power study.
In addition, there is much work to be done in defining appropriate spatial weight functions. We considered common spatial weights: adjacent neighbors and population density. However alternative weight functions (such as the weight functions included in Song and Kulldorf ) may influence the performance of each test. Other outlier detection methods (e.g.
Multi-item Gamma Poisson Shrinker (MGPS)) may be evaluated and compared with SaTScan-E and Local Moran's I in the future. Comparison with heterogeneous population data is also possible if one is interested in the performance of the tests in the presence of real heterogeneous population from the US. Acknowledgements The authors would like to thank Dr. Dave Stinchcomb and Dr. Ruth Pfeiffer of the National Cancer Institute for their many useful comments which greatly improved this paper.
Yongwu Shao of Information Management Science for his programming help. We are grateful to Dr. Linda Pickle of Statnet, LLC for providing statistical consultation on locating data sets. Financial support was provided to Dr.
Monica Jackson by the Statistical Research and Applications Branch of the National Cancer Institute as part of contract number 263-MQ-706620 and the Intergovernmental Personnel Act (IPA). Competing interests The authors declare that they have no competing interests. Authors' contributions MCJ, LH and EF conceived and designed the study, interpreted the simulation results, and prepared the manuscript.
MH and JL wrote the simulation routines. All authors read and approved the final manuscript. Supplementary material.

Monte-carlo-simulation In Excel (simulation Der Losgre

Monte Carlo Simulation Excel Help