- Why a statistics minor?
- What are the requirements?
- The Applied Statistics Path
- The Data Scientist Path
- The Theoretical Path
- Should I take STAT...
There are many reasons to take on the applied statistics minor! Interested in machine learning and data science? Interested in exploring data and building predictive models? Interested in learning how to infer information from a sample of data? Do you just love data? This minor is right for you! It consists of 2 required courses and 3 additional courses that let you go deep into an area of statistics.
The requirements for the applied stats minor are fairly simple. They consist of 2 required courses, as follows...
STAT 1000 (Applied Statistical Methods) or acceptable substitutes
- This will be your introduction to statistics, with topics including descriptive statistics, probability, random sampling, hypothesis testing, regression, and analysis of variance.
- This will complete your statistics/math portion of required gen eds for CS majors
- Acceptable substitutes include STAT 1100 (Statistics and Probability for Business Management), ENGR 0020 (Probability and Statistics for Engineers 1), and STAT 0200 (Basic Applied Statistics). However, I suggest taking STAT 1000 if you can, as it is a prerequisite for most of the later courses and the substitutes leave out some of its content.
STAT 1221 (Applied Regression)
- This course introduces you to the world of modeling, starting with simple linear regression and moving into more complicated regression models.
- This course will give you a lot of fundamentals that carry over directly to data science, such as linear regression as a basic ML model, or dummy coding to make categorical predictors work in regression models.
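As a rough illustration of the dummy coding mentioned above, here is a minimal Python sketch. It is not taken from the course; the function name `dummy_code` and the sample data are made up for this example. The idea is that a categorical predictor with k levels becomes k - 1 columns of 0s and 1s, with one level left out as the baseline.

```python
def dummy_code(values, baseline=None):
    """Turn a categorical column into 0/1 indicator columns.

    One level (the baseline) is dropped, so k levels give k - 1 columns.
    If no baseline is given, the alphabetically first level is used.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]
    kept = [level for level in levels if level != baseline]
    return {
        f"is_{level}": [1 if value == level else 0 for value in values]
        for level in kept
    }


# Hypothetical data: the major of five students.
majors = ["CS", "Math", "Stat", "CS", "Stat"]
encoded = dummy_code(majors)

for name, column in encoded.items():
    print(name, column)
```

Here "CS" is the baseline, so a row with zeros in every indicator column is a CS student. In a regression, each indicator's coefficient is then read as the difference from the baseline level.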
From there, take three additional statistics courses at the 1200 level or above, except for STAT 1223, which is a one-credit, writing-intensive class meant exclusively for statistics majors.
To help you choose your three additional statistics courses, I have laid out a few paths to explore.
If you are just in it for the love of statistics, then simply take any of the 1200-level or 1300-level courses that pique your interest. Do keep in mind that some classes aren't offered every year, and some almost always have just one section available. Below is a list of what typically gets offered each year.
STAT 1201 (Applied Nonparametric Statistics)
- This course teaches you about statistical tests that make as few assumptions about the data as possible (for example, they do not assume the data is normally distributed).
- Tests covered include Wilcoxon, Fisher (sign), Ansari-Bradley, Miller (jackknife), Kruskal-Wallis, Kendall, and Kolmogorov-Smirnov.
STAT 1211 (Applied Categorical Data Analysis)
- This course is aimed at techniques needed in social science, medical science, and various other sciences where you need to investigate relationships between more qualitative variables.
- The course deals with the chi-squared test and the standard 2x2 and RxC contingency tables.
STAT 1231 (Applied Experimental Design)
- This course is focused on how to design valid statistical experiments and how to analyze them.
STAT 1261 (Principles of Data Science)
- More details are in the Data Scientist Path
STAT 1301 (Statistical Packages)
- This course teaches you how to perform various statistical methods and tests using SAS and R.
STAT 1331 (Financial Econometrics)
- This course is focused on financial econometric models and how to use them to predict financial time series data (specifically looking at asset returns).
STAT 1361 (Statistical Learning and Data Science)
- More details are in the Data Scientist Path
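To give a feel for the kind of analysis listed under STAT 1211 above, here is a minimal Python sketch of the Pearson chi-squared statistic for a contingency table. It is an illustration only, not course material: the function name `chi_squared_statistic` and the counts are made up, and no continuity correction is applied.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for an R x C table of observed counts.

    Expected counts assume the row and column variables are independent:
    expected = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table: rows are two groups, columns are yes/no answers.
observed_counts = [[30, 10],
                   [20, 40]]
statistic = chi_squared_statistic(observed_counts)
print(round(statistic, 3))
```

A 2x2 table has 1 degree of freedom, so the statistic is compared against the chi-squared critical value of about 3.841 at the 5% level. A statistic above that suggests the two variables are related.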
This is probably what most CS majors are looking for. How does one make models and predict stuff? This sequence is fairly simple. It goes...
STAT 1261 (Principles of Data Science) -> STAT 1361 (Statistical Learning and Data Science)
These courses are typically offered in order, with STAT 1261 in the fall and STAT 1361 in the spring. STAT 1261 is basically an introduction to R, the tidyverse, and Edward Tufte's principles of making visualizations, with just a touch of machine learning. This course will get you very comfortable with manipulating data in R and creating really nice visualizations of your data in no time. In my personal view, R is a beast at doing all things data (seriously, I'd rather do data manipulation and visualization in R over Python any day). This is mostly thanks to Hadley Wickham; this course is basically learning how to use all the tools he made.
The next course is probably one of the most eye-opening, well-taught, and amazing statistics courses that exist at Pitt. STAT 1361 will get you into the world of all the different types of models, resampling methods, random forests, etc. The homework for this class is time-intensive but worth every moment, as it really gets you to understand how the models work and how to use them. You'll also be introduced to a fundamental concept behind all models, the bias-variance trade-off. This class will basically check off the part of any data science job application that asks whether you have experience with linear regression, LDA, PCA (kind of), random forests, etc. As a plus, the lecturer who teaches it, Lucas Mentch, is amazing. During his lectures he makes sure you understand the main concept of what's being taught, and he asks students questions in a way where it's OK to be wrong because he will guide you in the right direction.
And that's two classes done. The third is up to you, so look around and see which courses catch your eye.
This path is a little weird because it goes against the intentions of the minor. The minor is focused on applied statistics and is meant to be fairly practical. But, by how the requirements are defined, two theoretical classes can be taken, though you'll likely need permission to do so.
So if you are in a position where you need more theoretical knowledge, say for a more research-oriented career, here is what to take.
- STAT 1631 (Intermediate Probability)
- STAT 1632 (Intermediate Mathematical Statistics)
Take these two plus an additional class such as STAT 1361 and you'll have a very strong theoretical statistics background. There are definitely places in machine learning where a deep, fundamental understanding of statistics will make some of the material easier to understand and digest. Do keep in mind that you need to talk to the statistics advisor about taking these courses: you need permission, since they are more difficult versions of STAT 1151 (Introduction to Probability) and STAT 1152 (Mathematical Statistics) and you won't have the prerequisites for them.
So you can take STAT 1151 (Introduction to Probability) to complete your statistics/math portion of your gen eds. This won't complete any portion of the minor, but it does have the potential to open the door to more advanced stat courses that can help complete the minor. In some cases, you will also need to take STAT 1152 (Introduction to Mathematical Statistics).
So what are some of the courses that you can now take...
- STAT 1321 (Applied Time Series)
- STAT 1651 (Introduction to Bayesian Statistics)
- STAT 1731 (Stochastic Processes)
If any of these seem interesting to you, then I would highly suggest just becoming a statistics major (with or without your CS major). Getting to these classes and doing the required parts of the minor takes you far enough into the statistics major that you aren't far off from completing it.
But two things to keep in mind with this...
- As of the time I'm writing this, there is a professor who teaches two of these courses (especially STAT 1321) whom many in the major found to be a very poor lecturer. I'd suggest reading the Rate My Professors page to get context on what I mean by this.
- The theoretical courses (STAT 1151 and STAT 1152) are not for everyone and are quite difficult and different. Most of the fun of statistics lives in the applied portions.