ECON30130 Econometrics
R Project – Deadline April 10
Dr Benjamin Elsner
Rules & Guidelines
Ground rules
This assignment counts for 30% of your final grade. You have to work through a set of tasks using R and write
up your answers using Word, LaTeX, or R Markdown. The rules are as follows:
- Below you will find a set of tasks. Please answer all questions and work through all tasks. There is no
word or page limit, but please be concise.
- The deadline is April 10, 2022, at 11:59:59pm.
- For late submissions, UCD’s Late Submission of Coursework policy applies.
- Papers are to be submitted on Brightspace → Assessment → Assignments.
- Submissions should be in one PDF, and should include: 1) the write-up of the assignment, 2) the R code.
- Students are allowed to work in groups of up to five. If students work in a group, only one group
member should submit the paper on Brightspace. On the first page of the paper it should be clearly
stated that this was a group project and the names and student numbers of the group members should
be given.
- UCD’s Student Plagiarism Policy will apply. I reserve the right to run plagiarism checks on Brightspace.
- Questions should be posted on Brightspace.
- A solution will not be provided after the deadline.
Grading
Students will receive a letter grade for this assignment. Grading is based on the following criteria:
- Correctness of the analysis and interpretations
- Writing (clear and concise)
- Exposition: are graphs and tables done well? They don’t need to look fancy, but it has to be clear
what is shown. For regression tables, please use stargazer or alternative packages that give you nicely
formatted regression tables.
Bonus: a higher grade (1 notch, e.g. from B+ to A-) is given if all of the following are done:
- project written with R Markdown (can be done via RStudio); please indicate on the first page if
you do so; for an introduction, see here
- all graphs and tables have been programmed with R, i.e. no copy & paste anywhere
- all graphs done with ggplot (but not with the default grey background);
- tidyverse functions (especially the pipe operator) are frequently used.
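As a minimal sketch of what the last two bonus criteria could look like in practice, the following uses a made-up data frame (`toy_ads` and its columns are hypothetical, not the assignment data) to chain tidyverse verbs with the pipe operator and to draw a ggplot chart without the default grey background. `theme_minimal()` is only one of several built-in themes that would do this.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, invented for illustration only
toy_ads <- tibble(
  group = c("a", "a", "b", "b", "c", "c"),
  value = c(10, 14, 8, 12, 9, 15)
)

# Pipe operator: each step passes its result on to the next function
group_means <- toy_ads %>%
  group_by(group) %>%
  summarise(mean_value = mean(value), .groups = "drop")

# Bar chart with a white-background theme instead of the default grey one
p <- ggplot(group_means, aes(x = group, y = mean_value)) +
  geom_col() +
  labs(x = "Group", y = "Mean of value") +
  theme_minimal()

print(p)
```

Because the summary table and the plot are both built in code, rerunning the script regenerates them without any copy & paste.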
Some tips
The aim of this assignment is to get students to “figure things out.” In the tutorials, clear instructions and
coding examples were given along with a clean data set. However, this is far from the work data analysts
actually do. Their projects typically have a clear goal, but the data are often messy and it is unclear how to
reach the goal of the analysis. Simply put, the analyst has to “figure things out”: how best to clean the
data set, how best to visualise the data, how to bring the data into a format that is suitable for visualisation
and regression analysis, etc. If you’re working in a company, you can neither refuse to do a project because “we
haven’t learned about a certain procedure in class”, nor run to your manager with every little error
message you encounter. Ultimately, data analysts are paid for solving problems themselves or collaboratively
with team members. The sooner you get into that mindset, the better. This assignment is similar to a project
one would encounter in a data analytics job.
How to figure things out?
- Google is your friend. Get a strange error message? Type it into Google; chances are someone else
has had the problem before. You can also search StackOverflow, the forum for all things programming (R,
Python, C++, etc.).
- If one solution doesn’t work, try another one. Solving problems is often frustrating; it takes time and
a decent bit of grit. So if you encounter a problem, solve it or find a way around it. There is always a
solution!
Preparation
For some of the tasks below, you will need to know how to incorporate binary variables into a regression.
Once you know how regression works, this is pretty straightforward. Here are some sources you may want to
consult:
- When a regressor is a dummy: Chapter 5.3 in Stock & Watson; here is a good video
- When the dependent variable is a dummy (also called linear probability model): Chapter 11.1 in Stock
& Watson. See also this video, this video and this video. The latter video is based on Stock & Watson’s
materials.
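To make the two cases above concrete, here is a minimal sketch on simulated data. The variables `d`, `y` and `z` are invented for this illustration and have nothing to do with the assignment dataset. With a single dummy regressor, the OLS slope equals the difference in group means; when the dependent variable is itself a dummy, that difference is a difference in shares.

```r
set.seed(1)  # reproducible simulated data

# Hypothetical data: d is a binary regressor, y a continuous outcome,
# z a binary outcome (all names invented for this illustration)
n <- 200
d <- rbinom(n, size = 1, prob = 0.5)
y <- 5 + 2 * d + rnorm(n)
z <- rbinom(n, size = 1, prob = 0.3 + 0.2 * d)

# Dummy regressor: intercept = mean of y when d == 0,
# slope = difference in means between d == 1 and d == 0
fit_dummy <- lm(y ~ d)
coef(fit_dummy)
mean(y[d == 1]) - mean(y[d == 0])  # equals the slope on d

# Linear probability model: the dependent variable is a dummy,
# so the slope is a difference in shares between the two groups
fit_lpm <- lm(z ~ d)
coef(fit_lpm)
mean(z[d == 1]) - mean(z[d == 0])  # equals the slope on d
```

Comparing each `coef()` output with the difference in means printed below it shows that, in this simple setting, the regression is just another way of computing a group comparison.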
A. Theory Tasks
Suppose you want to quantify the extent of discrimination in an online market. You have data on all the
sales of a given product (say a smartphone) that took place on an online auction site in the U.S. in 2015.
You observe whether a product was sold, at what price, and whether the seller is a member of an ethnic
minority. On the auction site, consumers don’t directly observe minority status, but they can infer it from
the first names of the sellers.
- Suppose you want to estimate the effect of minority status (i.e. a dummy that equals one if a person
belongs to a minority and zero if the person is white) on the sale price. Write down a regression equation
that would allow you to estimate this effect.
- Explain what parameter you are interested in estimating and provide an interpretation of this parameter.
- Discuss the random sampling assumption and the conditional independence assumption (in the lecture
it was E(u|X) = 0). Are these assumptions fulfilled in this case (explain why or why not)? Explain
intuitively the likely consequences of these assumptions (not) being fulfilled for estimating the effect of
interest.
- If you could run an experiment (regardless of ethical considerations) to estimate the effect of interest,
what would this experiment look like and why? (N.B.: the ideal experiment asked for here is different
from an experiment described further below.)
B. Empirical Analysis
Introduction
Jennifer Doleac and Luke Stein ran an experiment on eBay small ads, a platform that lists classified ads in
local markets in the U.S. (similar to adverts.ie in Ireland). In their experiment, they put up ads for new iPod
nanos. Their goal was to study whether buyers discriminate between black and white sellers, i.e. whether
they are less likely to contact a black seller, make lower offers and are less friendly in correspondence. To
experimentally vary the race of the seller, they showed the ad listings with a photo in which the same iPod is
held by a white hand, a black hand, or a white hand with a wrist tattoo (which buyers may see as a sign
of lower social status). In addition, they experimentally varied the quality of the ad text, whether the
iPod was held in the right or left hand (such that not all ads look the same), and the asking price (between
three price points). Each ad was online for 12 hours. The authors collected information on the number of
responses, the number of offers, the friendliness of the responses, and the amount offered, among others.
Paper and data
You can find the paper here and on Brightspace:
- Doleac, J.L. and Stein, L.C. (2013), The Visible Hand: Race and Online Market Outcomes. Econ J,
123: F469-F492. https://doi.org/10.1111/ecoj….
Along with the assignment on Brightspace, you will find the dataset data_doleacstein.dta, which is in Stata
.dta format. We will use this dataset for the analysis to follow. Each observation is one email that was sent.
The main variables for our analysis are shown in Table 1.
Tasks
- Load the dataset into R and produce a table of summary statistics (number of observations, mean,
sd, median, min, max, number of missing observations) for all variables listed in Table 1
except ad and texttype. Interpret the mean of responses, offers, white, black, tattoo, and
polite.
- Generate a new dummy variable anyresponse that equals 1 if an ad received at least one response and 0 otherwise.
Table 1: Main Variables
Variable name   Content
ad              ad ID (for authors’ use only)
responses       number of responses received
price           asking price
offers          number of offers received
bestoffer       best offer received for iPod
meanoffer       mean offer if there were multiple offers
name            dummy: 1 if buyer signed response with name
polite          dummy: 1 if response was polite
texttype        indicator for quality of text; 0=high quality, 1=medium quality, 2=low quality
black           dummy: 1 if seller is black
tattoo          dummy: 1 if seller is white and has a wrist tattoo
white           dummy: 1 if seller is white (without a wrist tattoo)
- Produce a frequency table for the number of ads that were put up for each seller type (black, white,
tattoo). The table should include the number of ads per seller type (absolute numbers and shares,
i.e. the share of ads that were assigned to a particular seller type).
- Produce a frequency table with seller types on the horizontal and asking prices (90, 110 and 130 USD)
on the vertical axis. Each cell should show the share of all ads that were put up by a given seller type
for a given asking price (hint: search for cross tabulation). Do not show the absolute numbers, only the
shares. What does the result tell you about the quality of the randomisation in the experiment?
- Run t-tests comparing the difference in means between white and black sellers for the following variables:
anyresponse, bestoffer, meanoffer, polite. The results of the t-tests should be presented in a table
that shows the following: each row is a variable; columns: mean of the variable for Whites, mean of
variable for Blacks, difference in means between Whites and Blacks, p-value of t-test. Interpret your
findings regarding magnitude and statistical significance. (Hint: you can use t.test and assign the
result of each t-test to an object, which you can then see under “Environment”. You can then combine these
objects into a table.)
- Regress the dummy anyresponse on the dummies black and tattoo. Interpret the coefficients of the
slopes and intercept, comment on statistical significance, and compare your results to those in the
t-test table produced in the previous task.
- Another way of analysing the results of an experiment like this is through bar charts with error bars.
You plot the means for the treatment and control group and attach to each bar a so-called error bar
(y ± sd(y)). The error bars give an indication of the variation in each seller group. Produce such a
chart (separate bars for black sellers, white sellers, and sellers with a tattoo) for the following outcomes:
bestoffer, meanoffer.
- Not only did the researchers randomise whether the seller is black, but they also randomised the quality
of the ad text. Create dummies highquality (1 if text of high quality), and mediumquality (1 if text
of medium quality). Run a regression of black on highquality and mediumquality and interpret
your result. Comment on the meaning of this result for the experimental design.
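The t.test hint and the regression on seller-type dummies in the tasks above can be sketched as follows. This is an illustration on simulated data, not the real dataset: the data frame `sim`, the helper `ttest_row` and all simulated values are assumptions made for this example, and only the variable names are borrowed from Table 1. Note that `t.test` with two samples runs a Welch test (unequal variances) by default.

```r
set.seed(42)  # reproducible simulated data, NOT the real dataset

# Simulated stand-in for the ad-level data; variable names follow Table 1
n <- 300
seller <- sample(c("white", "black", "tattoo"), n, replace = TRUE)
sim <- data.frame(
  white       = as.integer(seller == "white"),
  black       = as.integer(seller == "black"),
  tattoo      = as.integer(seller == "tattoo"),
  anyresponse = rbinom(n, 1, 0.6),
  bestoffer   = rnorm(n, mean = 80, sd = 15)
)

# Keep only white and black sellers for the two-group comparison
wb <- subset(sim, white == 1 | black == 1)

# Hypothetical helper: run one t-test and return its key numbers as one row
ttest_row <- function(var) {
  tt <- t.test(wb[[var]][wb$white == 1], wb[[var]][wb$black == 1])
  data.frame(
    variable   = var,
    mean_white = unname(tt$estimate[1]),
    mean_black = unname(tt$estimate[2]),
    difference = unname(tt$estimate[1] - tt$estimate[2]),
    p_value    = tt$p.value
  )
}

# Stack the rows into one results table
ttest_table <- do.call(rbind, lapply(c("anyresponse", "bestoffer"), ttest_row))
print(ttest_table)

# Regression on the dummies: white sellers are the omitted category
fit <- lm(anyresponse ~ black + tattoo, data = sim)
coef(fit)
```

In this sketch the coefficient on `black` equals the black-minus-white difference in means, i.e. the `difference` column for anyresponse with the sign flipped, which is one way to check a regression result against a t-test table.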