· Pick a large data set (aim for more than 10,000 data points).
· Create 2-3 hypotheses about the data (what do you think the data will say before you run the analysis?).
· Pull 3 random samples from that data set. Run statistical analysis on each sample.
  o Outline your quality control methods for each sample.
  o One sample should be drawn by raw random number generation.
  o Sample sizing is at your discretion, but you need to defend your sample size.
  o You can “purposely” bias a sample to demonstrate a statistical point (i.e., you can purposefully select certain data points to show how bias can produce inconsistent results). Note that a biased sample does not count as one of your 3 samples (you might have 4-5 samples in total if you include one or two biased samples).
· Run the same statistical analysis on the entire data set.
  o What differences are there among the samples?
  o What could drive these inconsistencies?
  o Which sample most closely matched the “population”?
  o What did you learn?
  o Include some type of analysis (regression, time series, etc.).
· Provide a “report” (not an APA-formatted paper): a business report on the methodology of your sampling and analysis.
  o Include conclusions and recommendations in your report.
  o Include charts and graphs in your report.
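One possible way to set up the sampling and comparison steps is sketched below in Python, using only the standard library. Everything specific in it is an assumption made for illustration and is not part of the assignment: the synthetic population standing in for a real data set, the sample size of 400, the seeds, the choice of a systematic sample as the third sample, and all function names. The margin-of-error helper uses the usual normal approximation for a 95% interval on a mean, which is one way (not the only way) to defend a sample size.

```python
import math
import random
import statistics


def make_population(n=20_000, seed=0):
    """Synthetic stand-in for a real data set (hypothetical data)."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(n)]


def summarize(values):
    """Basic descriptive statistics used for every sample and the population."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def margin_of_error(stdev, n, z=1.96):
    """Approximate 95% margin of error for a sample mean (normal approximation)."""
    return z * stdev / math.sqrt(n)


def simple_random_sample(population, k, seed):
    """Draw k items without replacement using seeded random number generation."""
    return random.Random(seed).sample(population, k)


def systematic_sample(population, k, seed):
    """Take every step-th item, starting from a randomly chosen offset."""
    step = len(population) // k
    start = random.Random(seed).randrange(step)
    return population[start::step][:k]


def biased_sample(population, k):
    """Deliberately biased: keeps only the k largest values."""
    return sorted(population)[-k:]


population = make_population()
k = 400  # assumed sample size; defend your own choice in the report

samples = {
    "random_1": simple_random_sample(population, k, seed=1),
    "random_2": simple_random_sample(population, k, seed=2),
    "systematic": systematic_sample(population, k, seed=3),
    "biased_top": biased_sample(population, k),  # extra sample, not one of the 3
}

pop_stats = summarize(population)
report = {}
for name, values in samples.items():
    stats = summarize(values)
    # Gap between the sample mean and the full-data-set mean.
    stats["mean_gap"] = stats["mean"] - pop_stats["mean"]
    stats["moe"] = margin_of_error(stats["stdev"], stats["n"])
    report[name] = stats
    print(f"{name:>11}: mean={stats['mean']:.2f} "
          f"gap={stats['mean_gap']:+.2f} moe=±{stats['moe']:.2f}")

print(f"{'population':>11}: mean={pop_stats['mean']:.2f}")
```

Fixing the seeds makes each sample reproducible, which can double as a quality control step. With a real data set, `make_population` would be replaced by code that loads the data, and the same `summarize` call would be run on every sample and on the full data set so the results are directly comparable.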