ETF2121 – ETF5912 Data Analysis in Business
Semester 2, 2021
Assignment 3
Due Date: Friday 22 October 2021 (Week 12)
Due by 11.55pm, Melbourne time
This is an individual assignment worth 15%. You can submit your assignment early if
you wish. Unless otherwise specified, feel free to report any number of decimal places when
presenting your answers. Please include EViews outputs in your answers.
You are required to complete your answers in a Word document and then upload your
completed file as a PDF through the Moodle submission page. Your assignment can be (i)
entirely handwritten, (ii) entirely typed, or (iii) a mixture of handwritten and typed answers.
You may take photos of handwritten answers and paste them into a Word document.
If you prefer not to use Microsoft Word, feel free to try using apps such as Camscanner
(available on iPhones and Android phones) to convert photos of your handwritten answers
directly into PDF. This app can combine multiple pages into one PDF document.
You can access Microsoft Word via the MoVE website https://move.monash.edu/
Please save your Word document as a PDF, using your name and student ID as the file name.
Special Consideration
If you are experiencing interferences with your studies that are outside of your control (e.g.
illness, carer responsibilities), you may be eligible for special consideration. You can request
a short extension of up to five calendar days by contacting Kew (Chief Examiner) via the
following email: hsein.kew@monash.edu
You can request a longer extension of more than five calendar days by contacting the
university via the following website:
https://www.monash.edu/studen…
Question 1 [2 marks]
An insurance company is thinking about offering discounts on its life insurance policies to
nonsmokers. As part of its analysis, it randomly selects 1600 men who are 60 years old
and asks them if they smoke at least one packet of cigarettes per day and if they have ever
suffered from heart disease.
The table below shows that 500 of those in the smokers group suffered from heart disease,
while 282 of those in the non-smokers group suffered from heart disease.
                 Suffer from heart disease
Group            Yes      No       Total
Smokers          500      500      1000
Non-smokers      282      318      600
Construct a 95% confidence interval for the difference of population proportions of men
suffering from heart disease for smokers versus non-smokers, and interpret.
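As a rough illustration of the calculation this question asks for, the following Python sketch computes a large-sample (Wald) confidence interval for a difference of two proportions from the counts in the table. The function name `diff_prop_ci` is hypothetical, and the use of the unpooled standard error is an assumption about the intended method; the formula given in lectures takes precedence.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x1, n1, x2, n2, level=0.95):
    """Wald confidence interval for p1 - p2 (hypothetical helper name)."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error of the difference in sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Counts from the table: smokers 500 of 1000, non-smokers 282 of 600
lower, upper = diff_prop_ci(500, 1000, 282, 600)
print(f"95% CI for p_smokers - p_nonsmokers: ({lower:.4f}, {upper:.4f})")
```

Whether the resulting interval contains zero is the natural starting point for the interpretation.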
Question 2
Please include EViews outputs in your answers.
For ETF2121 students, this question will be marked out of 15 and the result will be converted
to a mark out of 7.
For ETF5912 students, this question will be marked out of 15 and the result will be converted
to a mark out of 5.
It is widely believed that workers with more education have, on average, higher wages than
workers with less education. The data set comprises a random sample of 500 full-time workers
aged 25 to 60 and is stored in the EViews workfile wages.wf1. It includes the following
variables for the workers.
wage = hourly wages in dollars
educ = years of education
exper = years of job experience
Consider the following model:
\[ \text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \varepsilon_i. \]
(a) [1 mark] Use EViews to run a regression of wage on educ and exper. Write down the
sample regression line. Report the results to 4 decimal places.
(b) [0.5 marks] Jenny has 13 years of education and 13 years of job experience. What is
Jenny's predicted hourly wage?
(c) [0.5 marks] John has 14 years of education and 13 years of job experience. What is
John's predicted hourly wage?
(d) [0.5 marks] Calculate the difference in predicted hourly wage between John and Jenny,
i.e. John's predicted hourly wage minus Jenny's predicted hourly wage.
(e) [0.5 marks] In part (a), what is the value of \(\hat{\beta}_1\)?
(f) [2 marks] Compare your answers in parts (d) and (e). Are they the same? Explain why
or why not.
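The link between parts (d) and (e) can be checked numerically. The sketch below uses made-up coefficient values (`b0`, `b1` and `b2` are placeholders, not EViews estimates from wages.wf1) to show how a one-year gap in educ, with exper held fixed, feeds through a fitted line that is linear in both variables.

```python
# Placeholder coefficients for illustration only; NOT estimates from wages.wf1
b0, b1, b2 = -5.0, 2.0, 0.25


def predict_wage(educ, exper):
    """Predicted hourly wage from a fitted line b0 + b1*educ + b2*exper."""
    return b0 + b1 * educ + b2 * exper


jenny = predict_wage(educ=13, exper=13)
john = predict_wage(educ=14, exper=13)

# exper is held at 13 for both, so only the one-year gap in educ matters
print(john - jenny, b1)
```

Replacing the placeholders with the estimates from part (a) reproduces the comparison asked for in part (f).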
Consider the following non-linear regression model:
\[ \text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{exper}_i^2 + \varepsilon_i. \]
(g) [3 marks] Use EViews to run a regression of wage on educ, exper and exper^2. Is there
evidence that exper has a nonlinear effect on wage? Do the six steps of the test. Use
\(\alpha = 0.05\).
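One way to see what a "nonlinear effect" means in this model is to differentiate it with respect to exper, holding educ and the error term fixed. This is a standard calculus step rather than something taken from the assignment text:

```latex
\[
\frac{\partial\, \text{wage}_i}{\partial\, \text{exper}_i}
  = \beta_2 + 2\beta_3\, \text{exper}_i .
\]
```

The effect of an extra year of experience is therefore constant across workers only when \(\beta_3 = 0\), which suggests framing the test around \(H_0: \beta_3 = 0\) against \(H_A: \beta_3 \neq 0\). Check this framing against the six-step procedure from lectures.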
Consider the following regression model:
\[ \ln(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \varepsilon_i. \]
(h) [2 marks] Use EViews to run a regression of ln(wage) on educ and exper. Interpret the
coefficient \(\beta_2\).
(i) [2 marks] Using \(\alpha = 0.05\), test to determine whether each of the independent variables
is linearly related to ln(wage) by using the p-value approach, i.e. test the following
hypotheses: \(H_0: \beta_j = 0\) vs \(H_A: \beta_j \neq 0\) for \(j = 1, 2\).
(j) [2 marks] Test the overall utility of the model by using the p-value approach. Use
\(\alpha = 0.05\).
(k) [1 mark] When testing the overall utility of the model, why do we prefer to use only one
test for testing jointly all the slope parameters in part (j) rather than multiple tests
for testing each independent variable as conducted in part (i)?
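A common textbook argument relevant to part (k) can be sketched numerically. It relies on the simplifying assumption that the individual tests are independent, which rarely holds exactly for t-tests from the same regression. If each of m tests is run at level alpha and every null hypothesis is true, the chance of at least one false rejection is 1 - (1 - alpha)^m. The function name below is hypothetical.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


# Two slope coefficients tested separately at the 5% level, as in part (i)
print(round(familywise_error_rate(0.05, 2), 4))  # 0.0975

# The inflation grows with the number of separate tests
for m in (1, 2, 5, 10):
    print(m, round(familywise_error_rate(0.05, m), 4))
```

A single joint test, by contrast, keeps the overall Type I error rate at the chosen alpha.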
Question 3 [6 marks]
You are employed as an analyst in a consulting firm in Melbourne. Your consulting firm has
a consulting contract with a major housing construction company. You have been asked by
your manager to write a brief report that uses statistical techniques that you have learnt in
ETF2121-ETF5912 lecture material from Week 1 to Week 9 (inclusive) to characterise the
housing market in Melbourne.
The housing construction company wants to target its building plans. The company is
interested in knowing how housing prices (dependent variable) are affected by the size of
the house and the number of bedrooms. The company is interested to know whether
four-bedroom houses sell for more than three-bedroom houses. It also wants to know the
difference in price between a house that has a nice view and a house that does not.
The construction company has given you a data set in an Excel file (housing.xls) that contains
information about 88 randomly selected houses in Melbourne to undertake this assignment.
The Excel file contains the following variables:
- hprice is the selling price of the house in dollars
- hsize is the size of the house in square metres
- bdr is the number of bedrooms
- view = 1 if the house has a nice view
       = 0 if the house does not have a nice view
Please refer to Tutorial 1 (Week 2) Question B1 part (a) if you would like to revise how
to read the data in the Excel file into EViews. Remember that every tutorial is recorded.
Your brief report should contain all of the empirical results using the data provided. For
example, your brief report could include simple/multiple regression models, hypothesis
testing and interpretation of the empirical results.
The aim of this brief report is to allow students to undertake statistical analysis by using
the techniques taught in lectures to investigate a real-world problem. This question is
intentionally open ended, so there are not necessarily “right or wrong answers”. The
quality of your brief report counts. For example, writing “So many people wear heavy
coats during winter because they want to stay warm” would receive more marks than
writing “So many people wear heavy coats during winter because they are fashion-conscious”.
Remember you are an analyst writing a brief report for your boss :) Most important
is a correct justification of your empirical results. Feel free to use Excel functions to report
some of the empirical results if you wish. You can type your brief report, but feel free to
handwrite some or all of it if you like.
Your brief report, ideally, does not exceed 600 words, excluding tables and graphs.
Question 4. ETF5912 students ONLY
This question will be marked out of 10 and this mark will be converted to a mark out of 2
marks.
Absenteeism is a serious employment problem in most countries. It is estimated that
absenteeism reduces potential output by more than 10%. A management consulting firm
launched a project to learn more about the problem. They randomly selected 100 companies
to participate in a one-year study. For each company, they recorded the average number
of days employees were absent and several independent variables thought to affect absenteeism.
The dataset is stored in the EViews workfile 'absent.wf1'. The dataset has the following
variables.
ABSENT – average number of days employees were absent
WAGE – average employee wage (in dollars)
PT – percentage of part-time employees
SHIFT_no – availability of shiftwork (1 = no; 0 = yes)
The variable SHIFT_no is a dummy (indicator) variable. Shiftwork is an employment
practice designed to divide the day into day-shift and night-shift. Some companies only have
day-shift and hence no availability of shiftwork (i.e. SHIFT_no = 1). Some companies have
both day-shift and night-shift and hence availability of shiftwork (i.e. SHIFT_no = 0).
Consider the multiple regression model
\[ \text{ABSENT}_i = \beta_0 + \beta_1 \text{WAGE}_i + \beta_2 \text{PT}_i + \beta_3 \text{SHIFT\_no}_i + \varepsilon_i. \]
(a) [2 marks] Do you think \(\beta_1\) and \(\beta_3\) will have obvious anticipated signs? Justify your
answer.
(b) [1 mark] Use EViews to estimate the multiple regression model. Interpret the coefficient
\(\beta_3\).
(c) [2 marks] Explain whether the following statement is true or false: holding fixed all
other independent variables, increasing WAGE by $1000 is associated, on average,
with an increase in ABSENT of 2 days.
(d) [2 marks] Can we infer, at the 5% significance level, that SHIFT_no is related to
absenteeism? Do the six steps of the test.
(e) [3 marks] Test the hypothesis that neither PT nor SHIFT_no affects absenteeism. Use