
STATS101
S1.21 STATS101/101G/108
1 BLOCK 1
These questions are worth one mark each.

Pick the option that correctly completes the statement.
Data from a categorical variable are:
group or category names for each entity.
measurements or counts taken on each entity.

Pick the option that correctly completes the statement.
A study which observes the same group of individuals or units over a long period of time is called a:
cross-sectional study.
longitudinal study.

Pick the option that correctly completes the statement.
Consider a well-designed experiment involving a group of volunteers.
A tail proportion of less than 5% in the randomisation test allows us to make:
sample-to-population inference.
experiment-to-causation inference.

Pick the option that correctly completes the statement.
Using random sampling:
allows for the calculation of the likely size of sampling errors.
will guarantee representative samples.

Pick the option that correctly completes the statement.
A bootstrap confidence interval may be interpreted as an interval:
of plausible values for the parameter.
within which the parameter is certain to lie.
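The bootstrap confidence interval mentioned in the question above can be sketched in a few lines of Python. This is a minimal percentile-bootstrap sketch that makes several assumptions. The function name `bootstrap_ci`, the sample values, the 1000 resamples and the seed are all hypothetical choices for illustration and are not data or methods from this paper. Each resample draws values from the original sample with replacement. The middle 95% of the resampled means is then taken as a range of plausible values for the population mean.

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from the original sample, with replacement.
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    # Drop the lowest and highest `tail` fraction of the resampled means.
    lo = boot_means[round(tail * n_resamples)]
    hi = boot_means[round((1 - tail) * n_resamples) - 1]
    return lo, hi

# Hypothetical measurements, invented for this illustration.
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 10.9, 9.5, 12.8, 11.1, 10.4]
print(bootstrap_ci(sample))
```

A different seed or a different sample would produce a slightly different interval. The method does not guarantee that any particular interval contains the true mean.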
STATS101/101G/108 – 1213
Pick the option that correctly completes the statement.
All other things being equal, bigger sample sizes give:
wider confidence intervals.
narrower confidence intervals.

Pick the option that correctly completes the statement.
The null hypothesis, $H_0$, is the:
hypothesis we test.
research hypothesis.

Pick the option that correctly completes the statement.
When conducting a t-test, a plot of the sample data is used to check for evidence of:
non-Normal features.
independence.

Pick the option that correctly completes the statement.
For a Chi-square test for independence, there will be evidence against the null hypothesis if there are relatively:
small differences between the observed and expected counts in one or more cells.
large differences between the observed and expected counts in one or more cells.

Pick the option that correctly completes the statement.
The sign (+ or -) of the sample correlation coefficient, r, is:
not necessarily the same as the sign of the slope of the least squares regression line.
always the same as the sign of the slope of the least squares regression line.

Note: For Questions 11 to 20, be careful which option you choose, because the order of the True/False options may change from question to question.

Decide whether this statement is True or False.
For highly skewed data the sample median is a more sensible measure of the centre than the
sample mean.

Decide whether this statement is True or False.

An observational study can be used to reliably establish the cause of an effect.

Decide whether this statement is True or False.

Under chance alone, when comparing two groups, the difference we observe would purely and
simply be due to which units just happened to have ended up in which group and nothing else.

Decide whether this statement is True or False.

Taking larger samples will not reduce the effects of selection bias and other nonsampling errors.

Decide whether this statement is True or False.

We can be certain that the true value of a population parameter is somewhere in a bootstrap
confidence interval for that parameter.

Decide whether this statement is True or False.

The level of confidence is the long-run success rate for a method which aims at producing
confidence intervals which contain the unknown value of the parameter.
False
True
True
False
True
False
True
False
False
True
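The statement about the level of confidence as a long-run success rate can be checked by simulation. The sketch below is hypothetical and makes several assumptions. It assumes a Normal population with a known mean and standard deviation, and the values `mu=50`, `sigma=10` and `n=30` and the seed are arbitrary choices. It also uses the known-sigma interval x̄ ± z·σ/√n rather than a t-interval. The sketch counts how often the interval captures the true mean, and for a 95% level that rate should come out near 0.95. Any single interval either contains the true mean or it does not.

```python
import random
import statistics
from statistics import NormalDist

def coverage_rate(n_intervals=2000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=7):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided Normal multiplier, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Assumption: sigma is known, so the margin of error is the same for every sample.
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / n_intervals

print(coverage_rate())
```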

Decide whether this statement is True or False.

Statistical significance implies practical significance.

Decide whether this statement is True or False.

If the P-value for an F-test for one-way analysis of variance is large then the differences we see
between the sample means could be due to chance alone.

Decide whether this statement is True or False.

The greater the value of the Chi-square test statistic, the weaker the evidence against the null
hypothesis.
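A small sketch of the Pearson Chi-square statistic may help with the Chi-square statements. Each cell contributes (observed − expected)²/expected. The expected counts come from the row and column totals under the null hypothesis of independence. The two tables below are hypothetical. They were invented only to contrast observed counts that are close to the expected counts with observed counts that are far from them.

```python
def chi_square_statistic(observed):
    """Pearson Chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence: row total x column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (count - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 tables of counts.
close_to_expected = [[50, 50], [48, 52]]
far_from_expected = [[80, 20], [20, 80]]
print(chi_square_statistic(close_to_expected))  # small, about 0.08
print(chi_square_statistic(far_from_expected))  # 72.0
```

Larger values of the statistic come from larger gaps between the observed and expected counts.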

Decide whether this statement is True or False.

The correlation coefficient measures the strength and the direction of a linear relationship between
two numeric variables.

Maximum marks: 20
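A short sketch can show how the sample correlation r relates to the slope of the least squares line. Both quantities share the same numerator, the centred cross-product sum Sxy. Their denominators, √(Sxx·Syy) and Sxx, are always positive. The helper names and the data below are hypothetical and are not taken from this paper.

```python
import math

def centred_sums(x, y):
    """Return (Sxy, Sxx, Syy), the sums of centred cross-products and squares."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy

def correlation(x, y):
    sxy, sxx, syy = centred_sums(x, y)
    return sxy / math.sqrt(sxx * syy)  # denominator is positive

def least_squares_slope(x, y):
    sxy, sxx, _ = centred_sums(x, y)
    return sxy / sxx  # denominator is positive

# Hypothetical data with a downward trend.
x = [1, 2, 3, 4, 5, 6]
y = [9.0, 8.1, 7.4, 5.9, 5.2, 3.8]
print(correlation(x, y), least_squares_slope(x, y))  # both negative
```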
2 BLOCK 2: Questions 21 to 24
These questions are worth two marks each.

Questions 21 to 24 refer to the information in Appendix A.

Which one of the following statements about the study is false?
This study is an experiment because the participants were randomly allocated to either the TimeRestriction group or the NoTimeRestriction group.
The response variable was ReportedNumber.
The researchers were blinded because they did not know what number the participants actually rolled.
The NoTimeRestriction group was the control group.
This study had a completely randomised design.

Refer to Figure 2.

Which one of the following statements could be false?
There were more participants in the NoTimeRestriction group who actually rolled a 1 than participants in the TimeRestriction group who actually rolled a 1.
The standard deviation of the ReportedNumber for the NoTimeRestriction group is higher than that of the TimeRestriction group.
The median ReportedNumber for the NoTimeRestriction group is less than that of the TimeRestriction group.
Numbers less than 3 were reported less often by participants in the TimeRestriction group than by those in the NoTimeRestriction group.
Participants in the TimeRestriction group tended to report higher numbers than participants in the NoTimeRestriction group did.

Refer to Figure 3.

Which one of the following statements is false?
We have evidence that the time restriction caused the participants in the TimeRestriction group to roll higher numbers.
The P-value for this randomisation test is less than 5%.
We have evidence that chance was not acting alone in the actual study.
We may claim that the time restriction had an effect on the mean ReportedNumber.
We have evidence that Group together with chance produced the observed result.

Suppose that the researchers were also interested in seeing whether the underlying mean time taken to report their number by those in the NoTimeRestriction group was different from the underlying mean time taken to report their number by those in the TimeRestriction group.

Let $\mu_{NTR} - \mu_{TR}$ be the difference between the underlying mean time taken to report their number by those in the NoTimeRestriction group and the underlying mean time taken to report their number by those in the TimeRestriction group.

Which one of the following is a correct pair of hypotheses for this test?
$H_0: \bar{x}_{NTR} - \bar{x}_{TR} \neq 8$;  $H_1: \bar{x}_{NTR} - \bar{x}_{TR} = 8$
$H_0: \mu_{NTR} - \mu_{TR} = 0$;  $H_1: \mu_{NTR} - \mu_{TR} \neq 0$
$H_0: \bar{x}_{NTR} - \bar{x}_{TR} = 0$;  $H_1: \bar{x}_{NTR} - \bar{x}_{TR} \neq 0$
$H_0: \mu_{NTR} - \mu_{TR} \neq 0$;  $H_1: \mu_{NTR} - \mu_{TR} = 0$
$H_0: \mu_{NTR} - \mu_{TR} = 8$;  $H_1: \mu_{NTR} - \mu_{TR} \neq 8$
Maximum marks: 8
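The randomisation test used in this block can be sketched as follows. The sketch is hypothetical. The function name, the group values and the number of re-randomisations are invented, and the Appendix A data are not reproduced here. Under chance alone the group labels are arbitrary. The sketch therefore repeatedly shuffles the pooled values into two groups of the original sizes. It records how often the re-randomised difference in means is at least as large as the observed difference, which gives a one-sided tail proportion.

```python
import random
import statistics

def tail_proportion(group_a, group_b, n_rerandomisations=5000, seed=3):
    """One-sided randomisation test for mean(group_a) - mean(group_b)."""
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_rerandomisations):
        # Re-randomise: shuffle the pooled values and split into groups of the original sizes.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_rerandomisations

# Hypothetical reported numbers for two groups (not the study's data).
treatment = [6, 5, 6, 4, 6, 5, 3, 6]
control = [3, 2, 5, 1, 4, 3, 6, 2]
print(tail_proportion(treatment, control))
```

A small tail proportion suggests that chance alone is an unlikely explanation for the observed difference.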
3 BLOCK 3: Questions 25 to 30
These questions are worth two marks each.

Questions 25 to 30 refer to the information in Appendix B.

Which one of the following could not be present in the data collected?
Nonresponse bias
Interviewer effects
Question effects
Behavioural considerations
Sampling error

Questions 26 and 27 refer to Figure 4 and the accompanying information.

Which one of the following statements is false?
The bootstrap confidence interval includes the difference in the sample proportions.
The smallest sample proportion is the proportion of those not interested in politics who said that they did not vote.
It’s a fairly safe bet that the proportion of those interested in politics who said that they had voted is somewhere between 11 and 19 percentage points higher than the proportion of those not interested in politics who said that they had voted.
The majority of respondents said that they had voted.
In every resample the difference in percentage points was more than five.

Suppose that it was decided to use t-procedures to calculate a 95% confidence interval for the difference between the proportion of those interested in politics who said that they had voted and the proportion of those not interested in politics who said that they had voted.

The sampling situation for calculating the standard error of the estimate is:
one sample of size 3207, several response categories.
one sample of size 3412, several response categories.
two independent samples of sizes 3207 and 205.
one sample of size 3412, many yes/no items.
two independent samples of sizes 385 and 3027.

Questions 28 to 30 refer to Figure 5 and the accompanying information.

The test statistic, $t_0$, for this t-test is approximately:
0.09
0.03
4.32
5.59
1.96

Which one of the following statements is not a correct interpretation of the P-value for this t-test?
At the 5% level of significance we can reject the null hypothesis.
At the 10% level of significance we can reject the alternative hypothesis.
At the 5% level of significance we can claim that $p_N$ is greater than $p_L$.
At the 1% level of significance we can claim that $p_N$ is greater than $p_L$.
The observed difference is a statistically significant result (at the 5% level).

A 95% confidence interval for $p_N - p_L$ is (0.06, 0.12). Suppose that we wish to calculate a 90% confidence interval using the same data.

Which one of the following statements is false?
The 90% confidence interval will:
be calculated using the same t-multiplier.
be narrower than the 95% confidence interval.
not include zero.
be calculated using the same standard error.
have a smaller margin of error.
Maximum marks: 12
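The link between the 95% and 90% intervals in the last question can be worked through numerically. This sketch assumes that the interval has the form estimate ± multiplier × standard error. As a labelled assumption, it also uses Normal (z) multipliers in place of the t-multipliers, which is a close approximation only when the samples are large. The function name `rescale_interval` is hypothetical. The standard error recovered from the 95% interval is reused unchanged, and only the multiplier changes.

```python
from statistics import NormalDist

def rescale_interval(lower, upper, new_level, old_level=0.95):
    """Re-express a symmetric confidence interval at a different level (sketch)."""
    estimate = (lower + upper) / 2
    # Assumption: Normal multipliers approximate the t-multipliers (large samples).
    old_multiplier = NormalDist().inv_cdf(1 - (1 - old_level) / 2)
    new_multiplier = NormalDist().inv_cdf(1 - (1 - new_level) / 2)
    standard_error = (upper - lower) / 2 / old_multiplier  # same data, same SE
    margin = new_multiplier * standard_error
    return estimate - margin, estimate + margin

# The 95% interval for p_N - p_L quoted in the question above.
print(rescale_interval(0.06, 0.12, new_level=0.90))  # roughly (0.065, 0.115)
```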
4 BLOCK 4
These questions are worth two marks each.

Questions 31 to 42 refer to the information in Appendix C.

Refer to Figure 6.
Which one of the following statements is false?