
Data Sources, Workshops, Coaching

As a student team, you can prepare for DataFest using a variety of open data sources, including:

  • Domestic Data lets you search and download data from the 2010 U.S. Census
  • International Programs allows you to choose specific countries from around the globe and form unique data sets which you can export as Excel or CSV files
  • Minnesota Population Center is one of the world's leading developers of worldwide demographic data resources. Note: You will need to register with MPC before extracting data; however, this is a free resource, with tutorials.
  • World Bank Open Data provides free and open access to data about development in countries around the globe
  • Safecar.gov provides zip files of up-to-date databases for complaints, defect investigations, recalls, and technical service bulletins (Note: the zipped data files are text documents, and the corresponding description files can be downloaded as separate text files)
  • KDnuggets
  • RDataMining

Workshops

StatHawks will be hosting a series of Data Science Nights this spring semester. To participate in the workshops, students will need to join the DataFest Canvas site (available soon).

Students may complete the first session through Canvas or in person. Sessions 2-5 will be held in a classroom-style setting. Please note: students should bring their own laptops to the workshops.

Session 1: Basic Tools and Packages of R

Online resources (through Canvas site): Available starting January 30th

(Optional) In-person meeting: February 2nd 6:00-7:00 PM, FSB 0026

This session will help students install R and learn the basics of the programming language. Students may complete this session online or in person. Additional office hours will be available if students have questions.

Session 2: Reading and Managing Large Data Sets with R

In-person meeting: February 9th 6:00-7:00 PM, Upham 316

Led by: Dr. Karsten Maurer: Department of Statistics

This session will cover the use of the data.table and dplyr packages in R for loading and working with a large, real-world data set. Students will work on loading the New York City cab CSV files and creating monthly summaries from them.
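As a rough preview of the kind of workflow this session describes, the sketch below reads a CSV file with data.table's fread() and builds a monthly summary with dplyr. It is only an illustration, not the session's actual material: the tiny file it writes and the column names (pickup_datetime, trip_distance, fare_amount) are assumptions standing in for the real cab files, whose layout may differ.

```r
library(data.table)
library(dplyr)

# Hypothetical stand-in for one cab CSV file; the column names are assumptions.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "pickup_datetime,trip_distance,fare_amount",
  "2015-01-03 08:15:00,1.2,7.5",
  "2015-01-17 22:40:00,3.4,14.0",
  "2015-02-05 12:05:00,0.8,5.5",
  "2015-02-20 18:30:00,5.1,19.0"
), csv_path)

# fread() is data.table's fast CSV reader; select= loads only the named columns,
# which helps keep memory use down on large files.
trips <- fread(csv_path,
               select = c("pickup_datetime", "trip_distance", "fare_amount"))

# Derive a year-month label from the pickup time, then summarise per month.
monthly <- trips %>%
  mutate(month = format(as.POSIXct(pickup_datetime, tz = "UTC"), "%Y-%m")) %>%
  group_by(month) %>%
  summarise(
    n_trips       = n(),
    mean_distance = mean(trip_distance),
    total_fare    = sum(fare_amount),
    .groups       = "drop"
  )

print(monthly)
```

With a real file, csv_path would point at the downloaded CSV and the select= vector would list whichever columns that file actually contains.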

Session 3: Data Visualization Using R

In-person meeting: February 16th 6:00-7:00 PM, Upham 316

Led by: Dr. Karsten Maurer: Department of Statistics

This session will cover the use of the ggplot2 and GGally packages in R for quick and simple exploratory plot making. Students will not get into advanced graphics, but will lay foundations for a plotting system that can extend to high-end, static and interactive graphics.
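For a sense of what quick exploratory plotting with these packages can look like, here is a minimal sketch. It uses R's built-in mtcars and iris data sets purely as placeholders (an assumption; the session may use different data) and shows one ggplot2 scatterplot and one GGally scatterplot matrix.

```r
library(ggplot2)
library(GGally)

# A basic ggplot2 scatterplot: map columns to aesthetics, then add a geom layer.
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# A GGally scatterplot matrix of the four numeric iris columns,
# coloured by species, for a quick look at pairwise relationships.
pair_plot <- ggpairs(iris, mapping = aes(colour = Species), columns = 1:4)

# Printing a plot object draws it on the active graphics device.
print(scatter)
print(pair_plot)
```

Both objects can be extended by adding further layers or options, which is the sense in which a simple plot like this serves as a foundation for more polished graphics.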

Session 4: Presentation and Insight Skills

In-person meeting: February 23rd 6:00-7:00 PM, FSB 0025

Led by: Dr. Debbie Coleman: Department of Marketing

This session will cover the skills needed to present insights at DataFest. Students will learn how to tell a story with their data, taking a marketing approach to presenting to an audience, faculty, and a panel of judges.

Session 5: DataFest Scenario (Part 1)

In-person meeting: March 2nd 6:00-7:00 PM, Upham 316

Led by: Dr. Tom Fisher: Department of Statistics

During this session, students will have the opportunity to try a test run of a DataFest scenario. A real-world data set will be given to the students. They will learn how to approach a large amount of data in a short amount of time.

Session 6: DataFest Scenario (Part 2)

In-person meeting: March 9th 6:00-7:00 PM, Upham 316

Led by: Dr. Tom Fisher: Department of Statistics

This session will allow students to regroup after working with the data given to them in the previous session. Students will have the opportunity to formally present their findings or simply discuss in a group setting what they did with their data.

Additional Resources

Miami students who do not have a coach in mind should feel free to contact any of the following faculty members.