2021/4/12 Assignment – Blog Post 1 – My PIC16B Blog – Awesome Python Student
Make enough progress that you can benefit
from your peers’ feedback.
This means that you should have “decent” versions of all the required components by the
time you’re submitting your draft. This is enough for full credit on the draft stage. In the
revised submission, I’ll also be grading on code quality, style, and expository writing.
Prompt
§1. Create a Database
First, create a database with three tables: temperatures, stations, and countries.
Information on how to access country names and relate them to temperature readings is in
this lecture. Rather than merging, as we did in the linked lecture, you should keep these as
three separate tables in your database.
Make sure to close the database connection after you are finished
constructing it.
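A minimal sketch of this step, assuming SQLite through Python's built-in sqlite3 module and pandas' DataFrame.to_sql(). The tiny dataframes, the column names, the station ID, and the in-memory database are illustrative placeholders only; your real tables should be built from the data described in the lecture.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-ins for the real data; column names and the
# station ID "IN001" are assumptions made for this sketch.
temperatures = pd.DataFrame({
    "ID": ["IN001", "IN001"],
    "Year": [1980, 1981],
    "Month": [1, 1],
    "Temp": [23.48, 24.57],
})
stations = pd.DataFrame({
    "ID": ["IN001"],
    "NAME": ["PBO_ANANTAPUR"],
    "LATITUDE": [14.583],
    "LONGITUDE": [77.633],
})
countries = pd.DataFrame({"Code": ["IN"], "Name": ["India"]})

# ":memory:" keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
for table_name, frame in [("temperatures", temperatures),
                          ("stations", stations),
                          ("countries", countries)]:
    frame.to_sql(table_name, conn, if_exists="replace", index=False)

# List the tables to confirm that all three were written separately.
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
table_names = [row[0] for row in cursor.fetchall()]
print(table_names)

conn.close()  # close the connection once the database is constructed
```

For data files too large to load at once, one option is to read them in chunks and append each chunk with if_exists="append".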
§2. Write a Query Function
Write a function called query_climate_database() which accepts four arguments:
country, a string giving the name of a country for which data should be returned.
year_begin and year_end, two integers giving the earliest and latest years for which
data should be returned.
month, an integer giving the month of the year for which data should be returned.
The return value of query_climate_database() is a Pandas dataframe of temperature
readings for the specified country, in the specified date range, in the specified month of the
year. This dataframe should have columns for:
The station name.
The latitude of the station.
The longitude of the station.
The name of the country in which the station is located.
The year in which the reading was taken.
The month in which the reading was taken.
The average temperature at the specified station during the specified year and month.
(Note: the temperatures in the raw data are already averages by month, so you don’t have to do
any aggregation at this stage.)
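One possible shape for such a function is sketched below. It is not the required solution: the db_file keyword is a hypothetical extra argument added so the sketch can run against a throwaway database, and the schema (including the assumption that the first two characters of a station ID match a country code) is made up for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def query_climate_database(country, year_begin, year_end, month,
                           db_file="temps.db"):
    """Return readings for one country, year range, and month.

    db_file is a hypothetical extra argument; the schema is assumed.
    """
    cmd = """
    SELECT S.NAME, S.LATITUDE, S.LONGITUDE, C.Name AS Country,
           T.Year, T.Month, T.Temp
    FROM temperatures T
    JOIN stations S ON T.ID = S.ID
    JOIN countries C ON SUBSTR(S.ID, 1, 2) = C.Code
    WHERE C.Name = ? AND T.Year BETWEEN ? AND ? AND T.Month = ?
    ORDER BY S.NAME, T.Year
    """
    conn = sqlite3.connect(db_file)
    try:
        # The ? placeholders keep user input out of the SQL string itself.
        return pd.read_sql_query(
            cmd, conn, params=(country, year_begin, year_end, month))
    finally:
        conn.close()


# Build a toy database (made-up rows) so the sketch runs end to end.
db_path = os.path.join(tempfile.mkdtemp(), "toy.db")
conn = sqlite3.connect(db_path)
pd.DataFrame({
    "ID": ["IN001", "IN001", "IN001", "XX001"],
    "Year": [1980, 1981, 1981, 1980],
    "Month": [1, 1, 2, 1],
    "Temp": [23.48, 24.57, 26.0, 10.0],
}).to_sql("temperatures", conn, index=False)
pd.DataFrame({
    "ID": ["IN001", "XX001"],
    "NAME": ["PBO_ANANTAPUR", "ELSEWHERE"],
    "LATITUDE": [14.583, 0.0],
    "LONGITUDE": [77.633, 0.0],
}).to_sql("stations", conn, index=False)
pd.DataFrame({
    "Code": ["IN", "XX"],
    "Name": ["India", "Elsewhere"],
}).to_sql("countries", conn, index=False)
conn.close()

df = query_climate_database("India", 1980, 2020, 1, db_file=db_path)
print(df)
```

Doing the joins and filtering in SQL means only the requested rows are loaded into Pandas, which matters when the temperatures table is large.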
For example:
query_climate_database(country = "India",
                       year_begin = 1980,
                       year_end = 2020,
                       month = 1)
      NAME           LATITUDE  LONGITUDE  Country  Year  Month  Temp
0     PBO_ANANTAPUR  14.583    77.633     India    1980  1      23.48
1     PBO_ANANTAPUR  14.583    77.633     India    1981  1      24.57
2     PBO_ANANTAPUR  14.583    77.633     India    1982  1      24.19
3     PBO_ANANTAPUR  14.583    77.633     India    1983  1      23.51
4     PBO_ANANTAPUR  14.583    77.633     India    1984  1      24.81
…     …              …         …          …        …     …      …
3147  DARJEELING     27.050    88.270     India    1983  1      5.10
3148  DARJEELING     27.050    88.270     India    1986  1      6.90
3152 rows × 7 columns
§3. Write a Geographic Scatter Function for Yearly Temperature Increases
In this part, you will write a function to create visualizations that address the following
question:
How does the average yearly change in temperature vary within a given
country?
Write a function called temperature_coefficient_plot(). This function should accept five
explicit arguments, and an undetermined number of keyword arguments.
country, year_begin, year_end, and month should be as in the previous part.
min_obs, the minimum required number of years of data for any given station. Only data
for stations with at least min_obs years’ worth of data in the specified month should be
plotted; the others should be filtered out. df.transform() plus filtering is a good way to
achieve this task.
**kwargs, additional keyword arguments passed to px.scatter_mapbox(). These can be
used to control the colormap used, the mapbox style, etc.
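The min_obs filtering described above can be sketched as follows. This is only an illustration on made-up data, assuming one row per station per year and a station identifier column called NAME.

```python
import pandas as pd

# Toy data: station A has 3 years of readings, station B has only 1.
df = pd.DataFrame({
    "NAME": ["A", "A", "A", "B"],
    "Year": [1980, 1981, 1982, 1980],
    "Temp": [23.48, 24.57, 24.19, 5.10],
})

min_obs = 2

# transform() returns one value per original row, so the per-station
# count lines up with df and can be used directly as a boolean mask.
n_obs = df.groupby("NAME")["Year"].transform("count")
filtered = df[n_obs >= min_obs]
print(filtered)
```

Unlike an aggregation, which would return one row per station, transform() keeps the original row layout, so no merge is needed before filtering.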
The output of this function should be an interactive geographic scatterplot, constructed
using Plotly Express, with a point for each station, such that the color of the point reflects an
estimate of the yearly change in temperature during the specified month and time period at
that station. A reasonable way to do this is to compute the first coefficient of a linear
regression model at that station, as illustrated in these lecture notes.
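As a rough sketch of the per-station estimate, the code below fits a straight line of temperature against year for each station and keeps the slope. It uses numpy.polyfit as a stand-in for the regression model in the lecture notes, and the data, the coef() helper, and the Estimate column name are made up for illustration.

```python
import numpy as np
import pandas as pd


def coef(data_group):
    """Slope of a least-squares line of Temp against Year (°C per year)."""
    slope, _intercept = np.polyfit(data_group["Year"], data_group["Temp"], deg=1)
    return slope


# Made-up readings for two hypothetical stations.
df = pd.DataFrame({
    "NAME": ["A"] * 3 + ["B"] * 3,
    "LATITUDE": [14.583] * 3 + [27.050] * 3,
    "LONGITUDE": [77.633] * 3 + [88.270] * 3,
    "Year": [1980, 1981, 1982] * 2,
    "Temp": [20.0, 20.5, 21.0, 6.0, 5.9, 5.8],
})

# One row per station: its coordinates plus the estimated yearly change.
rows = [
    {"NAME": name, "LATITUDE": lat, "LONGITUDE": lon, "Estimate": coef(group)}
    for (name, lat, lon), group in df.groupby(["NAME", "LATITUDE", "LONGITUDE"])
]
coefs = pd.DataFrame(rows)

# Round before plotting so that hover labels stay readable.
coefs["Estimate"] = coefs["Estimate"].round(4)
print(coefs)
```

A dataframe of this shape, with one row per station, is what you would then hand to Plotly Express, using the estimate column for the point color.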
For example, after writing your function, you should be able to create a plot of estimated
yearly increases in temperature during the month of January, in the interval 1980-2020, in
India, as follows:
# assumes you have imported necessary packages
color_map = px.colors.diverging.RdGy_r # choose a colormap
fig = temperature_coefficient_plot("India", 1980, 2020, 1,
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map)
fig.show()
[Figure: interactive map titled “Estimates of yearly increase in temperature in January for stations in India, years 1980 – 2020”, with a colorbar labeled “Estimated Yearly Increase (°C)”.]
Please pay attention to the following details:
The station name is shown when you hover over the corresponding point on the map.
The estimates shown in the hover are rounded to a sober number of significant figures.
The colorbar and overall plot have professional titles.
It’s not necessary for your plot to look exactly like mine, but please attend to details such as
these. Feel free to be creative about these labels, as well as the choice of colors, as long as
your result is polished overall.
You are free (and indeed encouraged) to define additional functions as needed.
§4. Create Two More Interesting Figures
Create at least two more complex and interesting interactive data visualizations using the
same data set. In each case, you should construct your visualization from data obtained by
querying the database that you created in §1. The code to construct each visualization
should be wrapped in functions, such that a user could create visualizations for different
parts of the data by calling these functions with different arguments.
Alongside the plots, you should clearly state a question that the plot addresses, similar to the
question that we posed in §3. The questions for your two additional plots should be
meaningfully different from each other and from the §3 question. You will likely want to
define different query functions for extracting data for these new visualizations.
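To illustrate what a different query function might look like, here is one hypothetical example that aggregates inside SQL rather than in Pandas. The question it serves (how the average temperature across a country's stations varies by month in a given year), the function name, the column names, and the two-character country prefix on station IDs are all assumptions for this sketch, not requirements.

```python
import sqlite3

import pandas as pd


def query_monthly_means(conn, country_code, year):
    """Hypothetical helper: mean temperature per month across stations."""
    cmd = """
    SELECT T.Month,
           AVG(T.Temp) AS MeanTemp,
           COUNT(DISTINCT T.ID) AS Stations
    FROM temperatures T
    WHERE SUBSTR(T.ID, 1, 2) = ? AND T.Year = ?
    GROUP BY T.Month
    ORDER BY T.Month
    """
    return pd.read_sql_query(cmd, conn, params=(country_code, year))


# Toy table with made-up readings from two hypothetical stations.
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "ID": ["IN001", "IN002", "IN001", "IN002"],
    "Year": [1980] * 4,
    "Month": [1, 1, 2, 2],
    "Temp": [20.0, 10.0, 22.0, 12.0],
}).to_sql("temperatures", conn, index=False)

monthly = query_monthly_means(conn, "IN", 1980)
conn.close()
print(monthly)
```

The result has one row per month, which suits a line plot or bar chart; your own questions will likely call for different groupings.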
It is not necessary to create geographic plots for this part. Scatterplots, histograms, and line
plots (among other choices) are all appropriate. Please make sure that they are complex,
engaging, professional, and targeted to the questions you posed. In other words, push
yourself! Don’t hesitate to ask your peers or talk to me if you’re having trouble coming up
with questions or identifying plots that might be suitable for addressing those questions.
Written on March 31, 2021