To make accurate inferences about population parameters, the sample must be representative of the population, meaning that it closely reflects the population's characteristics. A good, representative sample gives the researcher a "miniature mirror" in which to view the entire population.
There are two basic techniques for achieving representative samples: random sampling and stratified, or quota, sampling.
Random Sampling
Random sampling is one of the terms the media most often misuse. For example, newspapers claim to have selected a random sample of their readers, or television stations claim to have interviewed a random sample of city residents. In many such cases the selection was merely haphazard, done in an unorganized, careless way, which is not what random sampling means.
Random sampling requires that every member of the population has an equal chance of being included in the sample and that no member of the population is systematically excluded. A researcher who wants a random sample of the students at a college cannot simply select from the students who happen to be free in the afternoon to meet, because that would exclude every student who is not free at that time. Unless the entire population is available for selection, the sample cannot be random.
To obtain a random sample from the college, the researcher would have to go to the registrar's office, get a list of the names of the entire student population, and select the sample at random from that list. The researcher must then go out and find each student selected, because a sample can never be random if the subjects are allowed to select themselves. Suppose that, instead of finding each student, the researcher sends each selected student an e-mail asking them to fill out a survey, and only a few of them do so. This is not a random sample, because the subjects selected themselves by deciding whether or not they felt like filling out the survey. The subjects who complied may differ systematically on many other traits from the subjects who simply ignored the e-mail.
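The selection step described above can be sketched in a few lines of Python. This is only an illustration: the roster of 2,000 student IDs, the sample size of 50, and the helper name `simple_random_sample` are all assumptions made for the example, not part of any real registrar system.

```python
import random

def simple_random_sample(roster, n, seed=None):
    """Draw n distinct members from the full roster, each with an equal chance."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    return rng.sample(roster, n)   # sampling without replacement

# Hypothetical registrar list: every enrolled student appears exactly once.
roster = [f"student_{i:04d}" for i in range(1, 2001)]

selected = simple_random_sample(roster, 50, seed=1)
print(len(selected), len(set(selected)))
```

The code only covers drawing names from the complete list. The sample stays random only if every selected student is then actually measured; keeping just the students who choose to respond brings self-selection back in.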
Stratified, or Quota, Sampling
Another major technique for selecting a representative sample is known as stratified, or quota, sampling. This technique is sometimes combined with random sampling: once the strata have been identified, a random sample is selected from each stratum (subgroup). This is then called stratified random sampling.
To obtain a stratified sample, the researcher must know some of the major characteristics of the population and then deliberately select a sample that shares these characteristics in the same proportions. For example, if 35% of a student population are sophomores and 60% of those sophomores are majoring in business, then a quota sample of the population must have the same percentages.
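As a rough sketch of stratified random sampling with proportional quotas, the following Python groups a population by one characteristic and draws a random sample from each stratum. The population of 1,000 students (350 of them sophomores), the single stratifying variable `year`, and the function name `stratified_sample` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, n, seed=None):
    """Sample about n members so that each stratum keeps its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[key(member)].append(member)   # group members by stratum label

    sample = []
    for members in strata.values():
        # Quota proportional to the stratum's share of the population.
        # Rounding can make the total differ slightly from n in general.
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 35% sophomores, 65% everyone else.
population = [
    {"id": i, "year": "sophomore" if i < 350 else "other"}
    for i in range(1000)
]

sample = stratified_sample(population, key=lambda s: s["year"], n=100, seed=7)
sophomores = sum(1 for s in sample if s["year"] == "sophomore")
print(len(sample), sophomores)
```

To match a second characteristic within a stratum, such as the share of business majors among sophomores, the key function could return a pair like `(year, major)` so that each combination becomes its own stratum.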
Sampling Error
Whenever a sample is selected, it must be assumed that the sample measures will not exactly equal the measures that would be obtained from the entire population.
To distinguish it from the sample mean (M, or x̄), the mean of the population is symbolized by the Greek letter mu, μ.
The sampling error is then the difference between the sample mean and the population mean: sampling error = M − μ.
Note that the sample mean is expected to deviate somewhat from the population mean. Sampling error is not a mistake; it is a normal, expected amount of deviation. Sampling error should also be random, so that it can go in either direction: the sample mean is as likely to fall below the population mean as above it. Therefore, if the means of 100 random samples from a given population were calculated, about 50% of the resulting sampling errors would be expected to be positive (the sample mean overestimates the population mean) and about 50% would be expected to be negative (the sample mean underestimates the population mean). In terms of probability, the probability that the sampling error is positive is P = 0.50 and the probability that it is negative is P = 0.50.
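A small simulation can make the idea of random, two-directional sampling error concrete. The sketch below assumes a hypothetical, roughly normal population of 10,000 scores (mean near 100, standard deviation near 15) and draws 1,000 random samples of 30 rather than 100, so that the observed share is more stable. All of these numbers are illustrative choices.

```python
import random
import statistics

def sampling_errors(population, sample_size, n_samples, seed=None):
    """Return M - mu for each of n_samples simple random samples."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)   # population mean
    return [
        statistics.fmean(rng.sample(population, sample_size)) - mu
        for _ in range(n_samples)
    ]

# Hypothetical population of scores.
rng = random.Random(0)
population = [rng.gauss(100, 15) for _ in range(10_000)]

errors = sampling_errors(population, sample_size=30, n_samples=1000, seed=1)
share_positive = sum(e > 0 for e in errors) / len(errors)
mean_error = statistics.fmean(errors)
print(round(share_positive, 2), round(mean_error, 2))
```

With random sampling, the share of positive errors should land close to 0.50 and the average error close to zero, although any single run will not hit those values exactly.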
Outliers
When one or two data values in a large random sample fall very far from the mean, such as 5 or 10 standard deviation units away, these data values are called outliers.
Outliers indicate either that the distribution is not normal or that some measurement error has occurred. When outliers occur, they inflate the standard deviation, but they stretch the range even more, so the standard deviation falls below one-sixth of the range and the distribution looks leptokurtic. When it is clear that an outlier was not produced by bias, most researchers discard the outlier.
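One simple way to flag such values is to express each data value in standard deviation units and apply a cutoff. The sketch below assumes a cutoff of 5 standard deviations, taken from the lower figure mentioned above, and a hypothetical sample of 1,000 roughly normal scores with one extreme value of 200 appended. The function name `find_outliers` is illustrative.

```python
import random
import statistics

def find_outliers(values, threshold=5.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # SD of the data as given, outliers included
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical data: 1,000 roughly normal scores plus one extreme value.
rng = random.Random(0)
clean = [rng.gauss(50, 10) for _ in range(1000)]
contaminated = clean + [200.0]

flagged = find_outliers(contaminated)
sd_clean = statistics.pstdev(clean)
sd_contaminated = statistics.pstdev(contaminated)
print(flagged, round(sd_clean, 1), round(sd_contaminated, 1))
```

Because the mean and standard deviation are computed with the extreme value included, the outlier inflates the very yardstick used to detect it. In a large sample this matters little, but in a small one a median-based rule may be a safer choice.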
Bias
Whenever the sample differs systematically from the entire population it was taken from, bias has occurred. Since researchers usually deal in averages, bias is defined as a constant difference in one direction between the mean of the sample and the mean of the population.
For example, suppose the mean verbal S.A.T. score at a college is 500. If a researcher selects for the sample only students who performed poorly on their English placement test, the mean verbal S.A.T. score of the sample is likely to be lower than the mean verbal S.A.T. score of the population.
Bias occurs when most of the sampling error falls on one side, so that the sample means consistently either overestimate or underestimate the population mean. Bias is a constant sampling error in only one direction. When there is bias, the probability that the sample mean is higher than the population mean is no longer P = 0.50. The probability may now be P = 0.10, or P = 0.90, or P = 1.00, or P = 0.00.
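The S.A.T. example can be imitated with simulated data to show how bias shifts that probability. Everything in the sketch below is an assumption made for illustration: 5,000 hypothetical students, verbal scores centered near 500, and a placement score built to correlate with the verbal score. The biased procedure samples only from students whose placement score falls below the median.

```python
import random
import statistics

def share_above(mu, pool, sample_size, n_samples, rng):
    """Fraction of sample means, drawn from `pool`, that exceed mu."""
    above = sum(
        statistics.fmean(rng.sample(pool, sample_size)) > mu
        for _ in range(n_samples)
    )
    return above / n_samples

rng = random.Random(0)

# Hypothetical students: (verbal score, placement score), correlated by design.
students = []
for _ in range(5000):
    verbal = rng.gauss(500, 100)
    placement = verbal + rng.gauss(0, 50)
    students.append((verbal, placement))

all_scores = [v for v, _ in students]
mu = statistics.fmean(all_scores)   # population mean

# Biased pool: only students below the median placement score.
cutoff = statistics.median([p for _, p in students])
low_placement = [v for v, p in students if p < cutoff]

random_share = share_above(mu, all_scores, 40, 500, rng)
biased_share = share_above(mu, low_placement, 40, 500, rng)
print(round(random_share, 2), round(biased_share, 2))
```

Under random sampling the share of sample means above the population mean should sit near 0.50, while under the biased procedure it should collapse toward 0.00, which is the one-directional error described above.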