Note: For PS 0, I did not grade your answers for correctness. Instead, we discussed these again in class on 9/2. Here are a few comments/reminders for that discussion.
Case: a person or thing in your study
Variable: an item of information about each case
Example: if we have a study where the cases are people (often called subjects), then variables might include height, age, sex, favorite color, etc.
Population: the people or things you want to know about
Sample: the people or things in your data (the ones you do know about).
Ideally, the sample is a representative subset of the population. When possible, we use a random process to select which individuals from the population go into our sample.
These words describe variables.
In a cause-and-effect situation, an explanatory variable is a (potential) cause, and a response variable measures the effect.
We can use these terms even when the relationship isn’t causal. We might use explanatory variables to predict the values of response variables, even if the association between them is not causal.
Both of these are numbers.
Parameter: A number that describes a feature of a population.
Statistic: A number that describes a feature of a sample.
We can calculate statistics from our data (we generally let computers do that work for us), but we usually don’t know the parameters.
Part of statistics is using sample statistics to estimate population parameters.
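To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of heights is simulated with made-up numbers (mean 170 cm, standard deviation 10 cm), so everything here is an assumption for illustration only. In a real study we would not have the whole population and could compute only the statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: heights (in cm) of 10,000 people (made-up numbers).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a number describing the population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: a number describing a sample, here a random sample of 100 cases.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two numbers should be close but not identical, which is why we treat the sample statistic as an estimate of the population parameter.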
Types of variables
Numerical: a variable measured on some scale, typically with units, or a count. Examples: number of siblings, height, weight, age, length, area, volume.
Categorical: a variable that puts cases into groups. (Each possible value is called a level.) Examples: sex, favorite color, handedness, smoker/non-smoker, etc.
Experiment: The researchers determine the values of one or more variables (usually by some random process).
Advantage to a randomized experiment: easier to establish causal relationships
Some kinds of experiments may not be ethical because it would not be right for the researcher to treat subjects in certain ways.
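One common way researchers use a random process in an experiment is to assign subjects to groups at random. The sketch below is one possible way to do that in Python; the subject labels and the two group names (treatment and control) are hypothetical and are not taken from the problem set.

```python
import random

random.seed(2)  # fixed seed so the sketch gives the same result each run

# Hypothetical subject labels for 20 people in a study.
subjects = [f"subject_{i}" for i in range(1, 21)]

# Shuffle a copy of the list, then split it in half.
shuffled = subjects[:]
random.shuffle(shuffled)

half = len(shuffled) // 2
treatment = shuffled[:half]   # the researcher sets the variable to "treatment"
control = shuffled[half:]     # the researcher sets the variable to "control"

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who lands in each group, the groups should be similar on average in every other respect, which is what makes causal conclusions easier to justify.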
Observational study: The researchers do not determine the values of any of the variables; they merely observe, measure, and record.
Harder to establish causal relationships
The setting may be less artificial than in an experiment. That may make it easier to generalize to the population of interest. (This depends on whether an experiment can be done in a way that isn’t unnatural.)
In statistics, the words bias or biased (and its opposite, unbiased) do not mean the same thing as prejudice. (Although prejudice may lead some to use a biased statistical method.)
A biased estimate in statistics is one that tends to be too high or too low. If statisticians can determine the amount of bias in an estimate, they will correct for it to make the estimate unbiased. But sometimes the amount of bias is unknown or difficult to estimate.
Example: Let's suppose a small school has five families. These families have 1, 2, 3, 4, and 5 kids. So the average number of kids in a family is 3.
But if we ask each kid how many kids are in their family, the answers will be
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
The average of these numbers is about 3.7 (55/15). This is an overestimate of the true average family size of 3.
The source of this overestimate is that there are more kids in large families, so if we sample kids, we are more likely to sample from large families than from small families. (And we will never sample from families with no kids.)
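The arithmetic in this example can be checked with a short Python sketch. The variable names are made up for illustration; the family sizes are the ones given above.

```python
from statistics import mean

# Number of kids in each of the five families.
family_sizes = [1, 2, 3, 4, 5]

# Asking each family once: every family counts one time.
per_family = mean(family_sizes)

# Asking each kid: a family with k kids gets reported k times.
kid_answers = [size for size in family_sizes for _ in range(size)]
per_kid = mean(kid_answers)

print(kid_answers)        # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
print(per_family)         # 3
print(round(per_kid, 2))  # 3.67
```

The only difference between the two averages is who gets asked, which is the sense in which the sampling method itself is biased.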