Using 'big data' to explain visits to lakes in 17 US states
We use a large dataset on lakes in 17 US states to estimate the relationship between summertime visits to lakes, as proxied by social media use, and the lakes' water quality, amenities, surrounding landscape features, and socioeconomic conditions. Before estimating these relationships, we (1) selected a parsimonious set of explanatory variables from a roster of more than 100 lake attributes and (2) accounted for the non-random pattern of missing water quality data. The first step improved the interpretability of the estimated visit models; the second widened their scope of statistical inference. We used machine learning techniques to select parsimonious sets of explanatory variables and multiple imputation to estimate water quality at lakes missing these data. We found the following relationships between summertime visits to lakes and lake attributes across the 17-state region. First, every additional meter of average summertime Secchi depth between 1995 and 2014 was associated with at least 7.0% more summertime visits to a lake between 2005 and 2014, all else equal. Second, lake amenities, such as beaches, boat launches, and public toilets, were consistently more powerful predictors of visits than water quality. Third, visits to a lake were strongly influenced by the lake's accessibility, its distance to nearby lakes, and the amenities those nearby lakes offered. Finally, our findings highlight the bias that "big data"-based research on recreation can produce if non-random patterns of missing observations in the data are not corrected.