Assignment 4
ECO4116
Due date: Wednesday – November 24, 2021 at 11:59pm
Note: Assignments are to be uploaded on Brightspace (assignments sent through email will not
be accepted). The uploading window will close as of 11:59 pm on November 24th, 2021, and no
further uploads will be possible beyond that point.
You must upload your assignment 4, and your STATA log file.
Background information: Faced with continued deficits (in part due to the advent of Covid-19),
the Canadian government has decided to re-evaluate its expenditures. More precisely, it wants
to reduce its overall expenditures and re-allocate resources to where they will do the most good.
Given the importance of education expenditures as a proportion of overall spending, the Ontario
government has decided to take a critical look at where it spends its education dollars.
You have been hired as a consultant to investigate the post-secondary education portfolio. As
part of the study, the government wants to better understand the labour market outcomes of
university-educated workers, as compared to, say, those who have only graduated from high school, to
see whether it is worth spending so much of the budget on post-secondary education.
This fourth assignment represents another step in the analysis. For this assignment you are to use
microdata to estimate the return to education, i.e. carry out regression analysis. You are also to
include components that are typically found in academic papers.
This assignment builds on assignment 3.
You are to load the August 2021 Canadian Labour Force Survey (LFS). It can be accessed
through the ODESI database that is available through the library website. You should download
both the LFS dataset and also the codebook. The codebook describes the variables that are
available in the LFS files.
A STATA do-file will be provided on Brightspace which you can modify for your analysis.
You must create a working dataset that consists of the hourly wage (in dollars), a gender dummy
(a female or male dummy, your choice), and three or four education dummies (your choice). Note
that your education dummies must be mutually exclusive and exhaust the sample of interest. You
are also to have two additional variables that you believe belong in the equation.
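The requirement that the education dummies be mutually exclusive and exhaust the sample can be checked mechanically: for every worker, exactly one education indicator (counting the reference group) should equal one. The following is a minimal Python sketch of that check, not a substitute for the STATA do-file. The group names and the helper functions `make_dummies` and `is_partition` are hypothetical and only for illustration; the actual categories come from the LFS codebook.

```python
# Hypothetical education groups; the real categories come from the LFS codebook.
EDUCATION_GROUPS = ("dropout", "highschool", "certificate", "bachelor", "graduate")


def make_dummies(group):
    """Return one 0/1 indicator per education group for a single worker."""
    if group not in EDUCATION_GROUPS:
        raise ValueError(f"unknown education group: {group!r}")
    return {name: int(name == group) for name in EDUCATION_GROUPS}


def is_partition(rows):
    """True if every worker has exactly one indicator equal to one.

    A row summing to 0 means the dummies do not exhaust the sample;
    a row summing to 2 or more means they are not mutually exclusive.
    """
    return all(sum(row.values()) == 1 for row in rows)


# Made-up sample of five workers.
sample = ["highschool", "bachelor", "dropout", "graduate", "certificate"]
dummies = [make_dummies(group) for group in sample]
print(is_partition(dummies))

# A worker flagged in two groups at once breaks mutual exclusivity.
overlap = dict(dummies[0], bachelor=1)
print(is_partition([overlap]))
```

In a regression, one of these groups (for example, high school graduates) is left out as the reference category, so only the remaining dummies enter the equation.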
For this assignment you are to carry out three regressions – see below for details.
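The three regressions, which the econometric model instructions later in this document describe as adding controls sequentially, can be written as nested specifications. This is only a sketch: it assumes the variable names of the example equation (1) and a wage in levels, and the symbols x_1, x_2, β6 and β7 are placeholders for your two chosen variables and their parameters. The estimated values of the parameters will generally differ from one specification to the next.

```latex
\begin{align}
% Specification 1: education dummies only (high school graduates are the reference group)
wage_i &= \beta_0 + \beta_2\, dropout_i + \beta_3\, certificate_i
          + \beta_4\, bachelor_i + \beta_5\, graduate_i + \varepsilon_i \\
% Specification 2: adds the gender dummy
wage_i &= \beta_0 + \beta_1\, female_i + \beta_2\, dropout_i + \beta_3\, certificate_i
          + \beta_4\, bachelor_i + \beta_5\, graduate_i + \varepsilon_i \\
% Specification 3 (full model): adds the two variables of your choice (placeholders x_1, x_2)
wage_i &= \beta_0 + \beta_1\, female_i + \beta_2\, dropout_i + \beta_3\, certificate_i
          + \beta_4\, bachelor_i + \beta_5\, graduate_i
          + \beta_6\, x_{1i} + \beta_7\, x_{2i} + \varepsilon_i
\end{align}
```

Only the last specification is presented as the equation in the paper; the first two are obtained by dropping controls from it.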
General Instructions
You must submit two documents.
The first document is your assignment 4. You must address all the points raised in your
third assignment.
The second document is the STATA log file in print format (that was generated through a
do-file).
Write-up Instructions for the fourth assignment
General
o The assignment must be typed. Handwritten assignments will not be accepted.
o You must also use Times New Roman (12-point) font.
o 2.54 cm margins (left, right, bottom, and top)
o Line spacing – one and a half spacing.
o The text must be justified (i.e., flush on both the left and right side).
Specific writing detail
o The text must follow a paragraph structure (no bullets), with full sentences.
o This is a formal piece of writing, and as such, you should not include any
abbreviations (e.g., approx.) or contractions (e.g., don’t).
o You should use your own words, and not directly quote other researchers.
o What I care about is the conclusion you can draw from your figures, and not what
others think or say.
o No reference section is needed for this preliminary report, as you are only to
report on what your figures can conclude.
o Paragraphs should not exceed a page.
Figures and tables
o In research papers, one refers to figures (and tables) by their figure (table)
number. For example, “Figure 1 shows life on Mars…”.
o Figures (and tables) should have a title (e.g., Figure 1: Life on Mars), and a
legend if more than one curve is shown. Notes at the bottom of the figure provide
any additional information required.
o A figure (table) should be self-explanatory. That means the reader does not need
to read the text to understand the figure (table).
Structure of the fourth assignment (first document)
o Title page – the title page must include an appropriate title (that reflects the work
carried out), the date, followed by your full name(s) and student number(s).
o Introduction – The introduction should not exceed a page and is to be made up of
four paragraphs. The first paragraph is a general introduction that is meant to grab
the reader’s attention. The second paragraph describes what the paper does, i.e.
that it estimates the return to education. You should also (briefly) tell the reader
the data you are using. The third paragraph presents the main findings (of the
regression analysis). The final paragraph provides an overview of the paper (e.g.
The rest of the paper is divided as follows. Section 2 provides a brief overview of
the human capital literature. Section 3 discusses the data used in the empirical
analysis and Section 4… … In Section 5, I discuss the results. Finally,
Section 6 concludes.)
o Literature review – this section cannot exceed a page in length. You are to have
an introductory paragraph, followed by three paragraphs that discuss three papers
that look at the return to education. These can be published papers or working
papers.
o Data – the data section should have three or four paragraphs and should not
exceed 2 pages in length. Your summary statistics (and regressions) must now be
weighted. You will need to tell the reader (in your text) that the means (and
regressions) are weighted. Some put it in the main text, whereas others like to put
that information in a footnote, your choice. You also need to tell the reader at the
bottom of your tables (in the notes) that you are using weights. In the summary
statistics table, you could have something like, “All means are weighted.” In the
regression table you could add something like, “All regressions are weighted.”
This means that you will need to adjust the numbers in your tables and may also
need to make adjustments to the text.
§ First paragraph – provides a brief overview of the dataset. The discussion
must include the name of the dataset, and the year and month you are
using. It must also include a short description of the LFS, i.e. its purpose.
For example,
For the empirical analysis of this paper, I rely on the public-use files of
the Labour Force Survey (LFS) data for August 2021. The LFS is a
Statistics Canada dataset whose purpose is to…[you add here an
overview of the dataset, i.e. what is its stated goal, and other information
that is relevant]
§ Second paragraph – provides a description of the sample restrictions, and
why the restrictions were imposed. You must restrict your sample to those
that are working, but not self-employed. You must also have a lower and
upper bound for age. For example,
The sample is restricted to employees who are 25 to 64 years [you do not
have to choose the same lower and upper bound, this is just an
illustration] of age. The focus is on employees only because the wage
question is only asked of those that are working and who are not self-
employed. The lower age restriction is imposed because…[you explain
why you chose that lower bound] The upper age limit reflects…[you
explain why you chose that upper bound]
§ Third and fourth paragraphs – These provide a brief discussion of your table of
summary statistics. For example,
Table 1 presents the summary statistics of the sample…[you discuss some
of your numbers. You discuss the means, not the standard deviations. So
if the mean for your female binary variable was 0.500, you would simply
state that females represent 50.0% of the sample.]
o Econometric model – no more than 2/3 of a page. This section presents the
econometric equation and provides a clear definition of all variables of the model.
You must also tell the reader that you will be estimating the econometric model
by sequentially adding controls. The first specification regresses the wage (or the
log of the wage, your choice) on the education dummies. In the second
specification you add the gender dummy, and in the last specification you also
add the two variables of your choice. The equation you present will be the full
specification. For example,
The econometric model takes the following form:1
wage_i = β0 + β1 female_i + β2 dropout_i + β3 certificate_i + β4 bachelor_i +
β5 graduate_i + …[include here the two variables with their respective
parameters]… + ε_i (1)
where wage_i is the hourly wage of individual i, and female_i is a binary variable that
equals one if female, and zero otherwise. The four education variables are
binary variables representing the highest educational attainment of the
individual. The certificate category includes any post-secondary certificate or
diploma below a bachelor’s degree. Finally, the graduate category includes all
graduate degrees, including master’s degrees and PhDs. High school graduates are the
reference group. …[you add here the description of your two
variables]… Equation (1) will be estimated where the controls are added
sequentially. In the first specification, I regress the wage on the education
dummies. I then add a gender control to the second specification. Finally, the last
specification also includes…[mention the two variables that you defined above]
Note: for the third specification you must include two additional variables that
you believe are relevant. In addition, the variable names and the i subscript in the text
should be in italics, e.g. “…where wage_i is the hourly wage of individual i.”
o Results – no more than 2/3 of a page. This section discusses regression findings.
You must discuss economic and statistical significance. Are your education
results robust to the addition of controls? Does it have the profile one would
expect?
o Conclusion – no more than ? page that discusses your main findings.
o References – you must have reference entries for each of the three papers
discussed in your literature review. You must use a reference format that is
commonly used in economics.
o Tables – the table of summary statistics and the regression results are to be on
separate pages following the references.
1 If you decide to have the log of the hourly wage as your dependent variable, you would have ln(wage)_i (or lwage_i)
on the LHS of your equation. In your discussion below you would have “…where ln(wage)_i is the log of the real
hourly wage…”
§ The table of summary statistics must include the mean and standard
deviation of each of your variables. The format of the table should follow one
observed in economics journals.
§ You must also include a regression results table that provides the
parameter estimates and the standard errors. You must have three columns of
results, one for each regression.
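The weighted means and standard deviations required for the table of summary statistics follow a simple arithmetic: each observation counts in proportion to its survey weight. The sketch below illustrates this in Python with made-up numbers; it is not the STATA code for the assignment. The function names are hypothetical, and the standard deviation shown uses no degrees-of-freedom correction, so a statistical package may report a slightly different value.

```python
import math


def weighted_mean(values, weights):
    """Mean in which each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def weighted_sd(values, weights):
    """Weighted standard deviation around the weighted mean.

    Uses the simple form without a degrees-of-freedom correction.
    """
    mean = weighted_mean(values, weights)
    total_weight = sum(weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_weight
    return math.sqrt(variance)


# Made-up illustration: three workers, their hourly wages and survey weights.
wages = [20.0, 30.0, 40.0]
female = [1, 0, 1]
weights = [100.0, 200.0, 100.0]

print(weighted_mean(wages, weights))   # weighted mean hourly wage
print(weighted_mean(female, weights))  # weighted share of females
print(weighted_sd(wages, weights))     # weighted standard deviation of the wage
```

The weighted mean of a binary variable is the weighted share of the sample in that category, which is why a mean of 0.500 for the female dummy is reported as 50.0% of the sample.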