For species distribution models, species frequency is termed prevalence and prevalence in samples should be similar to natural species prevalence, for unbiased samples. However, modelers commonly adjust sampling prevalence, producing a modeling prevalence that has a different frequency of occurrences than sampling prevalence. The separate effects of (1) use of sampling prevalence compared to adjusted modeling prevalence and (2) modifications necessary in thresholds, which convert continuous probabilities to discrete presence or absence predictions, to account for prevalence, are unresolved issues. We examined effects of prevalence and thresholds and two types of pseudoabsences on model accuracy. Use of sampling prevalence produced similar models compared to use of adjusted modeling prevalences. Mean correlation between predicted probabilities of the least (0.33) and greatest modeling prevalence (0.83) was 0.86. Mean predicted probability values increased with increasing prevalence; therefore, unlike constant thresholds, varying threshold to match prevalence values was effective in holding true positive rate, true negative rate, and species prediction areas relatively constant for every modeling prevalence. The area under the curve (AUC) values appeared to be as informative as sensitivity and specificity, when using surveyed pseudoabsences as absent cases, but when the entire study area was coded, AUC values reflected the area of predicted presence as absent. Less frequent species had greater AUC values when pseudoabsences represented the study background. Modeling prevalence had a mild impact on species distribution models and accuracy assessment metrics when threshold varied with prevalence. Misinterpretation of AUC values is possible when AUC values are based on background absences, which correlate with frequency of species.

},
    keywords = {distribution models, prediction, sampling},
    doi = {10.5194/we-13-13-2013},
    author = {Hanberry, Brice B. and Hong He}
}

@proceedings {743,
    title = {Using mixed models to quantify variability in fish populations},
    journal = {Georgia Chapter of the American Fisheries Society},
    year = {2013},
    abstract = {Monitoring programs are widely used to provide essential information for the restoration and management of fish populations. It is generally assumed that these monitoring surveys produce representative data on how fish populations vary over space and time. For example, observed fish-population metrics may vary among repeated samples from a single location, from site to site within a lake, from lake to lake, and among sampling years. We will discuss the use of mixed models to partition variability into multiple spatial and temporal components. Models for estimating variance components have been applied to a wide variety of aquatic indices including water chemistry variables, measurements of species richness, stream habitat characteristics, metrics of fish growth, and catch-per-unit effort data. To date, most variance-components frameworks have been based on linear models that assume normally distributed error structures. However, assuming a normal distribution for observations of abundance is often not ideal because these counts are typically non-negative integers with high variances and low means, not to mention other issues that arise when log-transforming data such as how to treat zero observations during the analysis. We will use data collected by fishery-independent surveys to illustrate the idea of variance partitioning and discuss its relevance for monitoring programs. We will also describe the negative binomial distribution within the mixed-model framework as an alternative to log-transformation (e.g., an alternative assumption about the mean-variance relationship) that can be applied to discrete count data in a variance-partitioning context.},
    keywords = {distribution models, fish populations, management, monitoring programs, restoration, variance},
    author = {Irwin, Brian J. and Wagner, T.}
}