Scale Effect and Spatial Data Aggregation

This week's lab focused on the effects that scale and resolution can have on the properties and subsequent analysis of data. The lab was split into three parts, each focusing on a different topic: vector data, raster data, and spatial data aggregation with areal influence.

In the first portion of the lab, we were provided with six shapefiles representing water features in Wake County, North Carolina. Three of these shapefiles represented the locations of the same river and stream features at three different scales. The remaining three were polygons representing the same lake and pond features at those same three scales.

The premise behind this analysis is that when vector data are recorded at a large scale ('zoomed in', i.e., a higher ratio value for scale), they are more detailed. This higher detail reflects an increase in data capture, resulting in higher measured values in analysis. In some cases the converse may be true, as feature measurements can become overextended due to generalization in how the feature is measured; in this lab, however, that effect was offset by the tremendous increase in the number of features recorded at the larger scale.

The three scales that we were given were 1:1200, 1:24,000, and 1:100,000. The total length of the polylines was calculated first, followed by the perimeters of the polygons and their area measurements. When compared, our findings demonstrated that as the scale increased from 1:100,000 to 1:1200, the overall length of the polylines increased by 500%.
In regard to the polygons, the number of polygons increased at the larger scale (once again by almost 500%), and their perimeter and area measurements also increased with scale.
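For reference, here is a minimal sketch of how the length totals might be tallied with ArcPy, assuming hypothetical shapefile names for the three stream layers; the polygon perimeters and areas can be summed the same way with the SHAPE@AREA token:

```python
# Sketch only: hypothetical workspace and layer names.
import arcpy

arcpy.env.workspace = r"C:\lab\wake_county"  # hypothetical folder of shapefiles

# Sum the total polyline length at each scale straight from the geometry.
for scale in ("1200", "24000", "100000"):
    with arcpy.da.SearchCursor(f"streams_{scale}.shp", ["SHAPE@LENGTH"]) as rows:
        total = sum(length for (length,) in rows)
    print(f"1:{scale} streams: total length = {total:,.0f} map units")
```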

Below is a comparison of the polylines and polygons at their different scales. One can see how the accuracy and the number and detail of features increase with scale.

We were next tasked with finding the influence that resolution (cell size) has on a digital elevation model. For this exercise we were provided with a DEM of a small coastal watershed in a mountainous region of California. The provided raster had an initial cell size of 1 meter. We then used the Resample tool to interpolate this raster at 2-, 5-, 10-, 30-, and 90-meter cell sizes.
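A minimal sketch of that resampling step with ArcPy, assuming a hypothetical workspace and raster names:

```python
# Sketch only: hypothetical workspace and raster names.
import arcpy

arcpy.env.workspace = r"C:\lab\watershed.gdb"  # hypothetical workspace

# Resample the 1 m DEM to each coarser cell size.
for cell in (2, 5, 10, 30, 90):
    arcpy.management.Resample(
        in_raster="dem_1m",             # hypothetical name for the 1 m DEM
        out_raster=f"dem_{cell}m",
        cell_size=cell,
        resampling_type="BILINEAR",     # bilinear suits continuous elevation data
    )
```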
The premise behind this exercise is that a more accurate slope model (as well as other analysis models) will be produced with a smaller cell size (higher resolution), because a greater number of values per unit area can be modeled. If a model of a small area has a large cell size (lower resolution), then the data values get averaged out over the area of each cell and less detail is communicated. A toy illustration of this averaging effect is sketched below.
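This toy NumPy example (made-up elevation values, not the lab data) shows how aggregating cells by averaging flattens local relief and lowers the mean slope:

```python
import numpy as np

rng = np.random.default_rng(0)
dem_fine = rng.uniform(0, 10, (64, 64))  # noisy toy "1 m" elevation surface

# Aggregate each 2x2 block of cells into one "2 m" cell by averaging.
dem_coarse = dem_fine.reshape(32, 2, 32, 2).mean(axis=(1, 3))

def mean_slope(dem, spacing):
    """Mean slope in degrees, from the elevation gradient."""
    gy, gx = np.gradient(dem, spacing)
    return np.degrees(np.arctan(np.hypot(gx, gy))).mean()

print(mean_slope(dem_fine, 1.0))    # steeper: cell-to-cell relief survives
print(mean_slope(dem_coarse, 2.0))  # flatter: relief averaged away
```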
From the DEM we created at each requisite cell size, a slope model was created in turn. In comparing the average slope of each of these models, the data showed that as the cell size increased (resolution decreased), the average slope value decreased. This was visually communicated by the models as well: the sharp slopes of the high-resolution models became washed out as resolution decreased.
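A sketch of deriving and comparing the slope models, assuming ArcPy with the Spatial Analyst extension and the hypothetical rasters from the previous sketch:

```python
import arcpy
from arcpy.sa import Slope

arcpy.env.workspace = r"C:\lab\watershed.gdb"  # hypothetical workspace
arcpy.CheckOutExtension("Spatial")

for cell in (1, 2, 5, 10, 30, 90):
    slope = Slope(f"dem_{cell}m", "DEGREE")
    slope.save(f"slope_{cell}m")
    # Mean slope should fall as cell size grows and detail is averaged out.
    mean = arcpy.management.GetRasterProperties(slope, "MEAN").getOutput(0)
    print(f"{cell} m cells: mean slope = {mean} degrees")
```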

The final part of the project had us witness the influence that areal units can have on data aggregation. We created a series of scatterplots showing the apparent change in relationship that aggregated data exhibit as the areal unit is changed. We modeled the relationship between the percent of the population below the poverty line and the percent of the population identifying as non-white. These percentages were then analyzed with the Ordinary Least Squares tool to illustrate how the relationship shifts across features with different areal units: counties, zip codes, house voting districts, and the original block groups (a scripted version of these OLS runs is sketched after the plots). In the scatterplots below, you can see how the intercept, slope, and r^2 value changed with each areal unit:
Block Groups:

Zip Codes:

House Voting Districts:

Counties:
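For reference, here is how those four OLS runs might be scripted, assuming ArcPy and hypothetical feature class and field names (the tool requires an integer unique ID field, here called UID):

```python
# Sketch only: hypothetical workspace, layer names, and field names.
import arcpy

arcpy.env.workspace = r"C:\lab\aggregation.gdb"  # hypothetical workspace

# One regression per areal unit: % below poverty line vs. % non-white.
for fc in ("block_groups", "zip_codes", "house_districts", "counties"):
    arcpy.stats.OrdinaryLeastSquares(
        Input_Feature_Class=fc,
        Unique_ID_Field="UID",
        Output_Feature_Class=f"{fc}_ols",
        Dependent_Variable="PCT_POVERTY",      # hypothetical field name
        Explanatory_Variables="PCT_NONWHITE",  # hypothetical field name
    )
```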

This effect, known as the modifiable areal unit problem, can be exploited to manipulate data and communicate incorrect notions to the public.

We then performed a quick investigation into gerrymandering with the use of the Polsby-Popper score. We were given all of the voting districts in the United States and had to determine which ones appeared suspect for gerrymandering. Through the use of GIS, isolating these violations has become much simpler.
Each voting district should be represented by one polygon unless it has some form of geographic influence that requires more, e.g., islands. If multiple polygons represent one voting district, it merits an investigation as to why. Additionally, as we found in several instances in our investigation, some of these multipart polygons extend over an unusually large amount of geographic space. Using Select By Attributes to create new feature classes and adding a field that calculates each geometry's number of parts (to identify multipart polygons), we isolated the multipart polygons.
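A minimal sketch of that isolation step with ArcPy, assuming hypothetical paths and field names:

```python
import arcpy

districts = r"C:\lab\districts.gdb\voting_districts"  # hypothetical path

# Add a field holding each polygon's part count; values > 1 flag
# multipart districts that merit a closer look.
arcpy.management.AddField(districts, "PART_COUNT", "LONG")
arcpy.management.CalculateField(
    districts, "PART_COUNT", "!Shape!.partCount", "PYTHON3"
)

# Pull the multipart districts into their own feature class for review.
arcpy.analysis.Select(
    districts,
    r"C:\lab\districts.gdb\multipart_districts",
    "PART_COUNT > 1",
)
```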
The polygons were investigated visually, and any with reasonable multipart features due to geography were removed. A new field was then added in which we calculated the 'compactness' of these voting districts. Compactness describes how close a district's extremities sit to its center: in essence, the more compact the district, the less jagged it is. It is measured on a scale of 0 to 1, with 0 being not compact and 1 being perfectly compact.
To find this we used the Polsby-Popper equation, which compares a district's area to the area of a circle with the same perimeter: score = 4πA / P², where A is the district's area and P is its perimeter. This was computed for each suspect voting district.
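A sketch of that calculation with ArcPy's field calculator, again with hypothetical names:

```python
import arcpy

suspects = r"C:\lab\districts.gdb\multipart_districts"  # hypothetical path

# Polsby-Popper score: 4 * pi * area / perimeter^2, on a 0-1 scale.
# Assumes a projected coordinate system so area and length are planar.
arcpy.management.AddField(suspects, "PP_SCORE", "DOUBLE")
arcpy.management.CalculateField(
    suspects,
    "PP_SCORE",
    "4 * math.pi * !Shape!.area / !Shape!.length ** 2",
    "PYTHON3",
)
```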
The district with the least compact design that also had a suspect extra polygon was district 2408 (per GEOID) in Washington, DC. It had a Polsby-Popper score of 0.08.