Introductions
My name is Charlotte…
- I’m an Assistant Professor in the Statistics Department (half-time)
- I’m from New Zealand, but I’ve lived in the US for 12 years
- Last year, I taught only online classes
- I’m a cat person
- In my “free” time I’m renovating a 117-year-old house
Your turn
Introduce yourself to your neighbors:
- Where are you from?
- Why are you taking ST551?
- Two other interesting facts about yourself
What are we doing here?
ST551 Statistical Methods “Properties of t, chi-square and F tests; randomized experiments; sampling distributions and standard errors of estimators, delta method, comparison of several groups of measurements; two-way tables of measurements.”
- Understanding the fundamental concepts that allow statistical inference
- The subtleties of how study setup, choice of methods, and assumptions impact the kind of inferences we can make
Along the way:
- Learning a ton of statistical methods
- Practicing communicating findings correctly and clearly
- Building our R skills to analyse and explore data and use simulation to understand properties of methods.
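As a taste of what "using simulation to understand properties of methods" can look like, here is a minimal sketch. It is written in Python purely for illustration (in class we will use R), and the function name, sample size, number of repetitions, and seed are all arbitrary choices, not course material. It checks by simulation that the standard error of a sample mean is close to sigma / sqrt(n).

```python
import random
import statistics

def simulate_sample_means(n, reps, mu=0.0, sigma=1.0, seed=551):
    """Draw `reps` samples of size `n` from Normal(mu, sigma); return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

# Illustrative settings (assumptions): samples of size 25, repeated 5000 times
means = simulate_sample_means(n=25, reps=5000)

simulated_se = statistics.stdev(means)  # spread of the simulated sample means
theoretical_se = 1.0 / 25 ** 0.5        # sigma / sqrt(n) = 0.2

print(f"simulated SE: {simulated_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

The two numbers should agree closely; changing `n` and rerunning shows how the standard error shrinks as the sample size grows.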
Class logistics
Two important websites
Class webpage: http://st551.cwick.co.nz
- Syllabus
- Lecture slides
- Lab materials
- Homework assignments
Canvas: https://oregonstate.instructure.com/courses/1653112
- Announcements
- Discussion
- Homework submission
- Grades
If you have questions…
Class content, homework, class logistics, R:
- Come to office hours: mine (Mon or Wed 2-2:50pm, Weniger 255) or the TAs’ (TBA)
- Use the Canvas discussion board!
Email me only for situations of a personal nature.
Highlights from the syllabus
Grade = 40% homework + 25% midterm + 35% final exam
- Lowest homework score dropped
- Midterm: Friday, October 27th in class
- Final: Thursday, December 7th 9:30am-11:20am
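The grade formula can be sketched as a small calculation. This is an illustration only, written in Python: it assumes every score is on a 0-100 scale and that homeworks are weighted equally, neither of which is stated here, and the function name and scores are made up.

```python
def course_grade(homework, midterm, final):
    """Weighted course grade on a 0-100 scale; the lowest homework score is dropped.

    Assumes equally weighted homeworks, all scores out of 100.
    """
    if len(homework) < 2:
        raise ValueError("need at least two homework scores to drop one")
    kept = sorted(homework)[1:]  # drop the single lowest homework score
    homework_avg = sum(kept) / len(kept)
    return 0.40 * homework_avg + 0.25 * midterm + 0.35 * final

# Hypothetical scores: the 70 is dropped, so the homework average is 90
grade = course_grade(homework=[90, 70, 100, 80], midterm=80, final=90)
print(round(grade, 1))
```

With these made-up scores the pieces are 0.40 × 90 + 0.25 × 80 + 0.35 × 90.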
Getting started with R and RStudio
Two options for using R and RStudio
RStudio Server: https://rstudio.cosine.oregonstate.edu
- Log in with your ONID credentials
Install R and RStudio on your own machine.
R and RStudio are also installed on Lab computers and in Cordley 3003.
Never used R before? Use the server, and work through Chapter 1 of Hands-On Programming with R
Fill out data collection forms
Statistical inference
Components of a dataset
Observational units: The units on which measurements are made.
Variables: Any characteristic that can be measured or recorded for each observational unit.
- Quantitative (e.g. height, temperature), or
- Qualitative (e.g. hair color, shirt size).
Goals of Statistics
Summarize
- Succinctly describe a dataset (ideally using fewer numbers/words than it would take to describe the entire dataset)
- Provide plots/graphics to convey information about the dataset
Predict: Use the current dataset to predict values for new observations
Infer: Use the dataset to provide a probabilistic answer to a question in a broader context
Two common types of statistical inference
A.K.A. “scopes of inference”
Causal inference
The pattern seen in the data was caused by an intervention or treatment (or value of another variable).
Population inference
The pattern seen in the data can be generalized to a wider population.
Example
I compare the commute times for students in the class that walked versus those that drove.
Observational unit: student
Variables: commute type, commute time
Observed pattern: Students in this class who walk have a lower average commute time than those who drive.
A population inference: OSU students who walk have a lower average commute time than those who drive.
A causal inference: Walking decreases your commute time, for students in ST551 Fall 2017.
Both population and causal inference: Walking decreases commute time.
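To make the pieces of this example concrete, here is a minimal sketch of computing the observed pattern. The six records are invented for illustration (they are not class data), and the field and function names are arbitrary; it is in Python, but the same steps apply in R.

```python
from statistics import fmean

# Hypothetical data: one record per observational unit (student),
# with two variables: commute type (qualitative) and commute time (quantitative).
students = [
    {"commute_type": "walk",  "commute_time_min": 10},
    {"commute_type": "walk",  "commute_time_min": 14},
    {"commute_type": "walk",  "commute_time_min": 12},
    {"commute_type": "drive", "commute_time_min": 18},
    {"commute_type": "drive", "commute_time_min": 25},
    {"commute_type": "drive", "commute_time_min": 20},
]

def mean_time_by_type(records):
    """Average commute time (minutes) for each commute type."""
    times = {}
    for r in records:
        times.setdefault(r["commute_type"], []).append(r["commute_time_min"])
    return {kind: fmean(values) for kind, values in times.items()}

averages = mean_time_by_type(students)
print(averages)  # {'walk': 12.0, 'drive': 21.0}
```

This calculation only describes the students in the dataset. Whether the pattern says anything about all OSU students, or about the effect of walking, depends on how the data were collected, not on the arithmetic.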
Your turn: Fill in the blanks
I survey 50 OSU students with a GPA >= 3 and 50 OSU students with a GPA < 3.
Observation: The students surveyed with “high” GPA have, on average, fewer friends on Facebook than those surveyed with “low” GPA.
Observational unit: ______
Variables: _____
Population Inference: _____
Causal Inference: _____
Inferential language in the media
Get in groups of 3-4. I’ll provide you with an article.
Discuss then answer:
What type of inference is being implied by the headline?
If population inference, describe the population(s).
If causal inference, what is the treatment/intervention?
(It may be both or neither)
Rewrite the headline to clarify the inference you think is being implied.
If you have time, can you figure out from the body of the article: what the observational units were, what variables were measured, and what pattern was observed in the data?
Next time…
When are population and/or causal inferences justified?