Statistics Resources
Useful Apps for Statistics
Most students collect and manage their data in Google Sheets or Excel. These spreadsheet programs can be useful for calculating descriptive statistics (averages, range, standard deviation) and can also be used to calculate some basic inferential statistics (t-test, chi-square, etc.). However, their overall functionality is somewhat limited. Therefore, it is worth knowing about some alternatives:
 Statskingdom.com - a free website with many statistical tests that you can run. Very user friendly. Provides examples of how to report results for many tests and also tests assumptions (normality).
 Link to Miami University software downloads
 Microsoft Excel, along with all Microsoft Office 365 apps (PowerPoint, MS Word), is available free for all Miami students.
 SPSS - this is a large install file and SPSS is a major statistical program. We only recommend downloading it if you plan to use advanced statistics regularly.
Getting Started With Spreadsheets and Descriptive Statistics
As you get started, you want to organize your data in your spreadsheet and report your descriptive statistics.
Video Resources
 Spreadsheet data management basics (7:01) - Using Google Sheets to calculate Sum, Average, Max, and Min.
 Useful spreadsheet functions (9:00) - Remove duplicates, Remove white space, Find and replace, and concatenate functions.
 Pivot tables in Google Sheets (7:26) - Pivot tables are critical when you have a large dataset with dozens or hundreds of rows. Knowing how to use a pivot table is a very useful skill.
 Calculating mean and standard deviation in a larger data set (7:57) - Details how to freeze your top row in a dataset with many rows, paste special - transposed (rotating data), and more.
 Measures of Central Tendency (2:32) - Using spreadsheets to calculate basic measures including Mean, Median, and Mode.
 Measures of Dispersion (8:24) - How to use spreadsheets to calculate Range and Standard Deviation; also demonstrates the math behind Standard Deviation.
 Writing up descriptive statistics (1:10) - Simple explanation of how to write up descriptive statistics in your research paper.
Inferential Statistics, Interpreting P-Values, and What Test to Use
When it comes to generating a p-value, one of the most challenging aspects is determining which statistical test to use. This often relates to your hypothesis and goals, but also to the structure of the data you collected. The following video resources can help you identify which statistical test is appropriate for your data.
Video Resources
 Interpreting p-values (17:36) - Start here if you are unclear on what a p-value is or why it is important.
 What statistical test to use (11:06) - Walks through how to look at your data and determine which statistical test may be most appropriate.
 How your variables determine your statistics (4:19) - Understanding the types of variables (categorical, numerical, etc.) is critical to determining which statistical test you should use.
 Considerations for your sample size (3:51) - A common question is "How many samples should I take?" This video provides some insights on how to assess that question for the data you are collecting.
Comparing Two (Or More) Groups with Numerical Data
When comparing two groups, you will most likely use a t-test (or its nonparametric alternative) or a chi-square test. Below are links with details on how to use these tests.
If your data look like the table below, with two groups and numerical data as replicates, you will likely use a t-test:
Table 1. Example data table for which you would likely use a t-test. Note that you may have many rows of data and you do not need an equivalent number of rows in each column (it would be fine to have 7 rows of data under Group 1 and 9 rows of data under Group 2, for example).
Group 1 | Group 2
12      | 8
5       | 3
6.4     | 12
etc.    | etc.
Real-world examples: 1) You are comparing bee abundance on native versus non-native flowers; 2) You are evaluating pre- and post-test student scores or participant opinions; 3) You are assessing the number of trash items on beaches (Group 1) compared to in parks (Group 2).
Checking Normality of your Data
Before using a t-test or ANOVA, you are expected to check the normality of your data. In statistics, "normality" refers to the degree to which your data have most values centered around the mean, with fewer extreme high or low values. This is also known as a "bell-curve" histogram, where most values in the dataset are near the mean and fewer values are at the upper or lower ranges. Technically, you should only use a t-test or ANOVA if your data are normally distributed, although some statisticians contend that both tests are robust to slight deviations from normality.
How do I test my data for normality?
You can plot a histogram of the data and see if it generally looks like a bell curve. However, this is a bit subjective. A more common method is to use the Shapiro-Wilk test, which evaluates whether a dataset significantly deviates from a normal distribution.
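If you work in Python rather than Statskingdom.com, a minimal sketch of the Shapiro-Wilk check looks like this (assumes SciPy is installed; the scores below are made-up example values):

```python
# Shapiro-Wilk normality check - a sketch using SciPy.
from scipy import stats

scores = [62, 71, 68, 75, 66, 70, 73, 69, 64, 72]  # hypothetical test scores

stat, p = stats.shapiro(scores)
# Null hypothesis: the data came from a normal distribution.
# A p-value above 0.05 means we cannot reject normality.
if p > 0.05:
    print("No significant deviation from normality; a t-test/ANOVA is reasonable.")
else:
    print("Significant deviation from normality; consider a nonparametric test.")
```

Either way, the interpretation is the same as on Statskingdom.com: a small p-value means your data significantly deviate from normal.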
Video Resources
 How to check your data for normality on Statskingdom.com (5:08) - Example uses fabricated pre- and post-test student scores. The hypothesis is that student scores increased.
What if my data is not normally distributed? One option is to uniformly transform the data so it more closely resembles a normal distribution. For example, you could take the log10 of each value in your dataset and then reevaluate normality.
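As a sketch of this idea (assuming SciPy/NumPy; the counts below are hypothetical right-skewed values), you can transform and then re-run the normality check:

```python
# Log-transforming skewed data, then re-checking normality with Shapiro-Wilk.
import numpy as np
from scipy import stats

counts = [1.2, 2.0, 2.5, 3.1, 4.0, 5.5, 8.0, 15.0, 40.0, 120.0]  # made-up skewed data

# log10 requires positive values; if your data contain zeros, a common fix
# is to add a small constant (e.g., log10(x + 1)) before transforming.
transformed = np.log10(counts)

_, p_raw = stats.shapiro(counts)
_, p_log = stats.shapiro(transformed)
print(f"raw p = {p_raw:.4f}, log10 p = {p_log:.4f}")
# If the transformed p-value exceeds 0.05, you can analyze the transformed
# values with a t-test or ANOVA; just report that you analyzed log10 values.
```

The same transformation can be done in a spreadsheet with the LOG10 function applied to each cell.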
Video Resources
 Transforming your data to improve normality (3:15) - How to apply a uniform mathematical transformation to all data points to possibly improve normality.
 How to create a histogram in Google Sheets (11:07) - How to create a histogram of your data to visually assess normality.
Nonparametric Tests (for when your data is not normally distributed)
If transformations do not work, you can use a nonparametric statistical test. Below is a table with the nonparametric alternatives to a few common statistical tests.
Table 2. Examples of parametric tests and their nonparametric alternatives.
Parametric test (use if data is normally distributed) | Nonparametric alternative (use when data is not normally distributed)
Unpaired t-test | Mann-Whitney Test
Paired t-test   | Wilcoxon Signed-Rank Test
ANOVA           | Kruskal-Wallis Test
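As one hedged illustration of a nonparametric alternative (assuming SciPy; the abundance counts below are invented), a Mann-Whitney test in Python looks like this:

```python
# Mann-Whitney U test - the nonparametric alternative to an unpaired t-test.
from scipy import stats

# Hypothetical bee abundance counts on two flower types.
native = [12, 8, 15, 9, 22, 7, 30]
nonnative = [3, 5, 2, 8, 4, 6]

u_stat, p = stats.mannwhitneyu(native, nonnative, alternative="two-sided")
print(f"U = {u_stat}, p = {p:.4f}")
# A small p-value means the two groups' distributions differ significantly,
# without assuming either group is normally distributed.
```

Statskingdom.com will run the same test through a point-and-click interface.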
Video Resources
 Differences in types of t-tests (one- vs. two-tailed, paired vs. unpaired) (7:05) - Example demonstrating t-test variations using Google Sheets.
 How to check your data for normality on Statskingdom.com (5:08) - Example using fabricated pre- and post-test student scores.
 Checking normality on pre-post survey data (4:35) - Example shows how to run a Wilcoxon signed-rank test (nonparametric alternative to a t-test) because the data are not normally distributed.
 Testing normality on numerical data (bee abundance on native versus non-native plants) (4:06) - Example uses a Mann-Whitney test (nonparametric alternative to a t-test) because the data were not normally distributed.
If your subjects are the same (the same students take the pre- and post-test), you can do a "paired t-test". To set this up in Excel, make your first column student names, your second column scores from the first test, and your third column scores from the second test. Then type "=T.TEST" in a blank cell and a few options will come up (the exact formula name may differ slightly depending on your version of Excel). Choose the "arrays" by highlighting the scores for test 1 as array 1 and the scores for test 2 as array 2.
Next, choose "1 tail" or "2 tails". Choose 1 tail if you hypothesize that scores will specifically go up (or specifically go down). Choose 2 tails if you are not sure which direction they will change; you just think they will be different. In most cases you will probably choose 1 tail, because you probably expected the scores to go up, which is why you are testing this. Finally, choose "1" for the type of t-test; type 1 means a "paired" t-test. A paired t-test simply means the test recognizes that each row is linked or paired (in this case, the two scores in each row come from the same student). Thus, one student could go from a 65 to a 75 and that would be a nice increase, while another student could go from a 90 to a 100, the same size increase for that student.
It won't make sense until you try it! :) Putting in fake data and testing this is the best way to understand. The formula will provide a p-value, which is the probability that any difference is due to chance alone. The lower the p-value, the more likely it is that there is a true difference in test scores. If the p-value is less than 0.05, you can reject your null hypothesis (no difference in test scores) and say that the difference is significant.
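If you prefer Python, a minimal sketch of the same paired, one-tailed test (assuming SciPy 1.6+; the pre/post scores are made up) is:

```python
# Paired, one-tailed t-test - a Python sketch mirroring Excel's
# T.TEST(array1, array2, 1, 1) with hypothetical pre/post scores.
from scipy import stats

pre  = [65, 70, 58, 80, 72, 60, 75]
post = [72, 74, 63, 85, 70, 68, 81]

# alternative="less" tests the one-tailed hypothesis that pre < post,
# i.e., that scores increased.
t_stat, p = stats.ttest_rel(pre, post, alternative="less")
print(f"t = {t_stat:.2f}, one-tailed p = {p:.4f}")
```

Because each row pairs the same student's two scores, `ttest_rel` (paired) is the right call rather than `ttest_ind` (unpaired).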
It does not seem like Microsoft Excel or Google Sheets reports the t-statistic, and it is not critical that you report it in every paper. However, for those who are interested, it is fairly easy to calculate yourself. In the numerator you have the mean of the first group minus the mean of the second group. In the denominator you have the square root of (the squared standard deviation, i.e., the variance, of the first group divided by the sample size of the first group) + (the variance of the second group divided by the sample size of the second group). You can visualize the formula here - https://www.scribbr.com/statistics/ttest/
Before computers, folks would calculate the t-statistic using the above formula, then go to a large table in the back of a statistics textbook and look up their t-statistic for their sample size to see if their data were significantly different or not. With the advent of computing it became possible to get the p-value directly, so the t-statistic is less important, but it is still common practice to report it.
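The hand calculation can be sketched in a few lines of Python (the two groups below are invented; this is the Welch form of the formula, difference in means over the square root of variance1/n1 + variance2/n2):

```python
# Computing the t-statistic by hand with made-up data.
import math
import statistics

group1 = [12, 8, 5, 6.4, 9, 11]
group2 = [8, 3, 12, 4, 5, 6]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variance (s^2)
n1, n2 = len(group1), len(group2)

# numerator: difference in means; denominator: sqrt(s1^2/n1 + s2^2/n2)
t_stat = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
print(f"t = {t_stat:.3f}")
```

The same arithmetic can be done in a spreadsheet with AVERAGE, VAR, and COUNT.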
You also want to report the sample size of both groups, and you can do this in a few ways. You could say something like "Although scores were higher for group A (Mean = XX; Standard deviation = XX; N = XX) than group B (Mean = XX; Standard deviation = XX; N = XX), the difference between these groups was not significant (t = XX, p = 0.XX)."
This is common! First, it is important to realize that something can fail to be "statistically significant" but still be "biologically significant". For instance, a species may be declining in population number, but the decline may be so slow, with so much variation, that it is not "significant". However, the decreasing trend is still very biologically relevant. Also, if you have a p-value that is greater than 0.05 but still lower than, say, 0.15, some researchers would discuss this as being "not statistically significant" but still "suggestive of a trend" that might become significant if you had a larger sample size or controlled for some additional confounding variables. Instead of simply declaring "there was not a significant difference (p = 0.11)", it might be more appropriate to say "Given the available sample size and variation, it was not possible to detect a significant difference (p = 0.11)."
Also, sometimes we have low "statistical power," which means we have a low likelihood of detecting a significant difference even if one exists. Low statistical power can result from a low sample size and large variation within the groups (high standard deviation). It is possible to calculate statistical power using fancier statistical programs. The main take-home point is that a p-value > 0.05 really indicates no significant difference that we could detect with the available data. A larger sample size might result in a significant difference.
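One way to build intuition for power, sketched here by simulation (assuming NumPy and SciPy; all numbers are invented): draw many fake experiments where a true difference exists and count how often the t-test detects it.

```python
# Estimating statistical power by simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the result is reproducible
true_diff, sd, n, trials = 5.0, 10.0, 15, 2000

detected = 0
for _ in range(trials):
    a = rng.normal(50, sd, n)              # control group
    b = rng.normal(50 + true_diff, sd, n)  # group with a real +5 difference
    if stats.ttest_ind(a, b).pvalue < 0.05:
        detected += 1

power = detected / trials  # fraction of experiments that found the true difference
print(f"Estimated power: {power:.2f}")
```

With this sample size and variation, the test misses the real difference most of the time, which is exactly what "low power" means; increasing n raises the detection rate.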
Finally, if you find no significant difference despite being satisfied with your statistical power (you had a good sample size and not much variation), you should not just give up. Now you can consider other hypotheses. This is what science and inquiry are all about: a continual investigation where one piece of evidence leads to another question.
What if I have three (or more) groups? If you have a third group (or more), and your data are organized as shown in Table 1 above (but with more columns), you can use an ANOVA (Analysis of Variance) or its nonparametric alternative, a Kruskal-Wallis Test.
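A hedged sketch of both tests in Python (assuming SciPy; the three groups below are hypothetical values loosely inspired by the per-capita emissions example in the video list):

```python
# One-way ANOVA and its nonparametric alternative (Kruskal-Wallis).
from scipy import stats

low    = [2.1, 1.8, 2.5, 2.0, 1.6]
medium = [4.0, 3.5, 4.4, 3.9, 4.1]
high   = [9.8, 11.2, 10.5, 9.0, 12.1]

f_stat, p_anova = stats.f_oneway(low, medium, high)
h_stat, p_kw = stats.kruskal(low, medium, high)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4g}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4g}")
# A significant omnibus p-value says at least one group differs; a post-hoc
# pairwise test (e.g., Tukey's) then identifies which pairs differ.
```

Statskingdom.com runs both tests and reports the post-hoc comparisons as well.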
Video Resources
 Calculating an ANOVA (10:49) - Explains the term "omnibus test" and how to run an ANOVA using statskingdom.com. Example compares CO2 emissions per capita across countries classified as "Low", "Medium", and "High" gross domestic product.
 Writing up Kruskal-Wallis and other test results (2:16) - There are style guides, such as APA, that provide examples of how to write up research results. However, sometimes the best guide is the published primary literature.
Comparing Two (or more) Groups with Categorical Data
If your data look like the table below, where you have at least two groups and two categories and have counted the frequency of occurrences in each cell, you will likely use a chi-square test.
Table 3. Example contingency data table for which you would use a chi-square test.
        | Category 1 | Category 2
Group 1 | 35         | 22
Group 2 | 11         | 6
Real-world examples: 1) You are interested in comparing the percentage of dogs with aversive vs. non-aversive collars. You randomly observe 100 dogs and note that 86 of 100 are wearing non-aversive collars. You compare that to an "expected" equal split of 50 aversive and 50 non-aversive to see if 86 out of 100 is significantly different from the null hypothesis of no difference. 2) You are comparing the racial demographics of those who participate in a citizen science program with the racial demographics of the city where the program is located.
What if I have three (or more) groups or categories? The chi-square test can accommodate more than two groups and more than two categories. The table above is known as a Contingency Table. Because it has two categories and two groups, it is known as a 2 x 2 contingency table. However, depending on your data, you might have a 2 x 3 contingency table or other sizes (3 x 2, 3 x 3, 2 x 4, etc.).
Video Resources
 How to set up and run a chi-square test in Statskingdom.com (5:36) - Example looks at the number of people interacting with nature, versus not directly interacting with nature, in different city parks.
 Chi-square: Testing Your Hypothesis (14:40) - By Craig Beals, Dragonfly graduate and current instructor, this video provides an overview of the chi-square test for determining whether observed frequencies significantly differ from expected frequencies.
 Setting up Pearson chi-square test of independence data in an Excel spreadsheet (13:22) - Dragonfly graduate Katie Dell discusses how to set up a Pearson's chi-square test of independence.
This is up to you as the researcher. You could create one or more metrics by combining responses from similar question types and then test whether those metrics change as a whole; you could also take a single average of all responses. Or you could test each question separately. This really depends on your hypothesis: do you think individual responses to different questions will change in isolation of each other, or do you think all will increase together?
Important: the more tests you run to see if there is significance, the greater the likelihood that one will be significant just by chance. If you think about it, we accept a p-value < 0.05 as being "significant" because a 5% chance that the results differ due to chance alone seems low. However, each time you run a statistical test you inflate the chance that a result will appear significant (p < 0.05) simply due to chance. Think about it like this: if you roll a 20-sided die once, you have a 5% chance of landing a 1. However, if you roll the 20-sided die twice, you now have nearly a 10% chance of landing at least one 1. This may be confusing (or may not), but the main idea is to be careful and skeptical of low p-values if you run many tests.
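The die-roll intuition can be made exact with a short calculation: with each test run at alpha = 0.05, the chance that at least one of k independent tests comes up "significant" purely by chance is 1 - 0.95^k.

```python
# Inflation of false-positive risk with multiple tests.
alpha = 0.05
for k in [1, 2, 5, 10, 20]:
    chance = 1 - (1 - alpha) ** k  # P(at least one false positive in k tests)
    print(f"{k:2d} tests -> {chance:.1%} chance of at least one false positive")
```

By 20 tests, the chance of at least one spurious "significant" result is roughly 64%, which is why researchers sometimes apply corrections (such as the Bonferroni correction) when running many tests.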
Comparing species abundance, richness, and diversity
A relatively common question in ecological research is how the number of species differs through time or across land use types. Common measures include:
 Species abundance - the number of individuals of a species in an area
 Species richness - the number of different species in an area
 Species diversity - a measure of the balance or evenness of different species' abundances in an area
Before you start your study, it is important to specify which measure(s) you are most interested in evaluating. The videos below clarify the differences between abundance, richness, and diversity. They also demonstrate how to calculate the Shannon-Wiener diversity index, a useful statistic when comparing the biodiversity of two populations.
Video Resources
 Abundance, Richness, Diversity (4:22) - Shows the differences in these measures and how to calculate the Shannon-Wiener Index.
 Shannon-Wiener Index (8:33) - Goes into more detail on how to calculate Shannon's diversity index using Excel.
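The index itself is a one-line formula, H' = -sum(p_i * ln(p_i)), where p_i is the proportion of all individuals belonging to species i. A minimal Python sketch with made-up counts:

```python
# Shannon-Wiener diversity index (H') from hypothetical species counts.
import math

counts = [40, 25, 20, 10, 5]  # individuals of each of 5 species at one site

total = sum(counts)           # species abundance (all individuals)
richness = len(counts)        # species richness (number of species)
# H' = -sum(p_i * ln(p_i)); higher H' means a more even, diverse community
h = -sum((c / total) * math.log(c / total) for c in counts)
print(f"Abundance = {total}, Richness = {richness}, H' = {h:.3f}")
```

The same calculation can be built in a spreadsheet column by column, as the video above demonstrates.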
You can compare abundance, species richness, and/or Shannon-Wiener diversity for the two water bodies (see the videos above for differences in measuring each of those metrics). Note that this will give you one measure per water body, so you will not be able to use a t-test or other inferential statistics. To analyze the two bodies of water with inferential statistics, you would need replicates at each site (multiple samples at different times) or more lakes to compare. That can be a lot of work!
Data Visualization (Making Professional Figures and Graphs)
Once you have run your statistical tests, you need to create visuals to communicate the findings. Excel and Google Sheets have numerous chart-making capabilities. Creating a clear and professional graph or figure is a key skill to develop. The Purdue OWL (Online Writing Lab) provides guidance on creating visuals, including Tables and Figures. The video resources below will also be helpful as you consider the best Figure and Table options for your work.
Video Resources
 Making professional Figures for your papers (part 1) (3:10) - Discusses differences between Figures and Tables.
 Making professional Figures for your papers (part 2) (5:31) - How to write clear, professional captions.
 Using Figures to tell the story of your research (2:42) - Be creative and embrace multi-panel Figures. You can create these in PowerPoint or Google Slides.
 Make the Tables in your Google Doc look professional in five minutes (4:49) - These simple, small improvements can make your data table look much more professional.
 Building ANOVA bar graph significance brackets using Google Slides (7:09) - This is useful if you ran an ANOVA test and also a post-hoc pairwise test (e.g., Tukey's test) to see which pairs were significantly different.
 Scatterplot recommendations (3:40) - When and when not to use a pie chart; the importance of scaling on correlation scatterplots.
Dragonfly Introduction to Statistics - Free Online Course
Need more guidance? We have developed a free and entirely optional Dragonfly Introduction to Statistics course in which all Dragonfly students and alumni can self-enroll. The course is open year-round and you can start at any time. There are currently three Modules:
 Module 1: Descriptive Statistics - General principles of experimental design, when to use median vs. mean, how to calculate standard deviation, and how to write up descriptive statistics in your papers.
 Module 2: T-Tests, Normality, and P-Values - Includes paired vs. unpaired t-tests, the Mann-Whitney Test, the Wilcoxon Signed-Rank Test, and more.
 Module 3: Beyond the T-Test, Other Inferential Statistics - Includes one-way ANOVA, chi-square tests, correlation/regression, and more.
You can work at your own pace, following along with short video tutorials to make the calculations in your own spreadsheet. We estimate that each Module will take 1.5-3 hours to complete. The modules provide an introduction only and are designed to generally improve or refresh your knowledge of statistics. As a small incentive, successful completion of each module earns a digital Badge for that Module. :)
To get started, you can self-enroll here:
Additional Reading
Pfannkuch, M., Ben-Zvi, D. "Chapter 31: Developing Teachers' Statistical Thinking." Teaching Statistics in School Mathematics - Challenges for Teaching and Teacher Education. A Joint ICMI/IASE Study, The 18th ICMI Study, Springer, 2011.
Gotelli, N.J. and Ellison, A.M. "Chapter 6: Designing Successful Field Studies." A Primer of Ecological Statistics. Sunderland, Massachusetts: Sinauer Associates, 2004.
Hurlbert, S.H. "Pseudoreplication and the Design of Ecological Field Experiments." Ecological Monographs 54.2 (1984): 187-211. ***Classic paper, worth a read if you plan on doing ecological surveys!***
Platt, J.R. "Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others." Science 146 (1964): 347-353. ***This is a classic paper, worth a read!***
The Research Methods Knowledge Base was created in 2006 by William M.K. Trochim, a Professor in the Department of Policy Analysis and Management at Cornell University. It includes helpful introductory information on social research methods including measurements, data analysis, sampling and more: http://www.socialresearchmethods.net
Qualitative Research: Studying how things work by Robert Stake. The Guilford Press, 2010. ***This book makes the bold (and refreshing) assertion that all research is both qualitative and quantitative, whether we admit it or not. It also gives one of the clearest explanations of the different approaches and the general concept of epistemology. There is also a chapter on Action Research. The great thing about this book, and many others, is that it is freely available as an ebook from Miami library. Just search for it at <http://www.lib.miamioh.edu/> ****
Patton, M. Chapter 3: Qualitative Research & Evaluation Methods. Sage Publications Inc., 2002.
Bateson, P. & Martin, P. (2007). Chapter 5. In Measuring Behaviour (pp. 48-61). New York: Cambridge University Press.
Chamberlin, T.C. "The Method of Multiple Working Hypotheses: With this method the dangers of parental affection for a favorite theory can be circumvented." Science 148 (1965): 754-759. ***Classic paper, worth a read!***
Cobb, P., Confrey, J., diSessa, A., Lehrer, R., and Schauble, L. "Design experiments in educational research." Educational Researcher 32.1 (2003): 9-13.
Cox, G.W. (1985a). Exercise 10. In Laboratory Manual of General Ecology (pp. 60-68). Dubuque, IA: Wm. C. Brown Publishers.
Cox, G.W. (1985c). Exercise 29. In Laboratory Manual of General Ecology (pp. 163-167). Dubuque, IA: Wm. C. Brown Publishers.
Henderson, P.A. (2003a). Chapter 6. In Practical Methods in Ecology (pp. 83-94). Malden, MA: Blackwell Science.
Henderson, P.A. (2003b). Chapter 7. In Practical Methods in Ecology (pp. 95-99). Malden, MA: Blackwell Science.
McDonald, J.H. (2014). Types of biological variables. In Handbook of Biological Statistics (3rd ed.) (pp. 6-13). Baltimore, MD: Sparky House Publishing. Retrieved from http://www.biostathandbook.com/variabletypes.html.
McDonald, J.H. (2014). Choosing a statistical test. In Handbook of Biological Statistics (3rd ed.) (pp. 293-296). Baltimore, MD: Sparky House Publishing. Retrieved from http://www.biostathandbook.com/testchoice.html.
Provenzo Jr., E., Ameen, E., Bengochea, A. "Photography and Oral History as a Means of Chronicling the Homeless in Miami: The StreetWays Project." Educational Studies 47.5 (2011): 419-435.
Thomson, G., J. Hoffman and S. Staniforth. Measuring the Success of Environmental Education Programs. Sierra Club of Canada, Canadian Parks and Wilderness Society, Global Environmental and Outdoor Education Council, 2010.
MEERA website (My Environmental Education Research Assistant), an initiative out of the University of Wisconsin-Stevens Point, which for quite a while was about the only environmental education organization even talking about evaluation. This site offers some great basics and leads you through conducting an evaluation. http://learningstore.uwex.edu/assets/pdfs/G36582.PDF
NAAEE, Measuring Environmental Education Outcomes: http://www.naaee.net/publications/MEEO