ETC5512: Assignment 1
Due date: 11.55pm, April 16, 2021
Learning objectives
This assignment is designed to assess whether you
have developed an understanding of the limitations of various types of data collection.
can utilise open data sources, by accessing two different formats, with the purpose of extracting data to solve a problem.
can write a reproducible report to communicate your solution to a problem, in an informative and readable manner.
👣 You are very early in your studies in the MBAt, and we don’t expect you to be a master of R, or a genius data analyst, yet. This is a first step in that journey. We would like you to think about the problem being tackled, communicate your fresh ideas for tackling it, and focus on a very simple analysis, primarily summary statistics and plots of the data. Revisiting the material from lectures and tutorials so far on data collection methods and extracting open data will help you get started.
📮 Turn-in
Please use assignment1_template.zip on Moodle as a template. Produce a reproducible report of at most 1000 words (about four pages) [1] and submit it as a “zip” file containing a single html file and a single Rmd file. The Rmd file must be self-contained and compile without error when placed in the right location in the project structure given in the template provided. The html file you submit should be the result of compiling your Rmd file. Your Rmd file should be named FamilyName-GivenName.Rmd, where FamilyName and GivenName are replaced with your family name and given name, respectively. If you have a middle name or preferred name that you would like to include, please add it between FamilyName and GivenName, separated by hyphens. You should include your data in the data directory, BUT this should be a subset of the full BTS data containing only the records for the two airports. The reason is that the full downloaded data is too big for easy upload to and download from Moodle. The ALA data set can also be reduced in size to contain just the information necessary for studying the assigned problem.
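As a rough illustration of what "only the records for the two airports" could mean, the sketch below keeps any flight that departs from or arrives at DFW or LAX. It is written in Python purely for illustration (your submission itself is an Rmd file), and the column names `ORIGIN`, `DEST` and `DEP_DELAY`, the helper `subset_rows`, and the sample rows are all assumptions; check the headers in your own download.

```python
import csv
import io

AIRPORTS = {"DFW", "LAX"}  # the two airports named in the task


def subset_rows(rows, origin_col="ORIGIN", dest_col="DEST", airports=AIRPORTS):
    """Keep flights that depart from or arrive at one of the airports.

    The column names are assumptions; adjust them to match your file.
    """
    return [r for r in rows if r[origin_col] in airports or r[dest_col] in airports]


# Tiny in-memory stand-in for the downloaded CSV (made-up rows, not real data).
sample_csv = io.StringIO(
    "ORIGIN,DEST,DEP_DELAY\n"
    "DFW,ORD,12\n"
    "JFK,LAX,-3\n"
    "SEA,DEN,40\n"
)
rows = list(csv.DictReader(sample_csv))
subset = subset_rows(rows)
print(len(subset))  # 2: the SEA-DEN flight touches neither airport
```

Whether to keep arrivals, departures, or both depends on how you decide to define efficiency, so treat the "either end" rule here as one possible choice.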
This assignment is worth 25 marks in total. The assignment is marked on the quality of the report and the quality of the analysis. This is an individual assignment and the report that you submit for assessment must be your own work.
🔨 Task
You are commissioned as an independent business analyst consultant for the chief data officer of Qantas [2] to write a report comparing the efficiency of using DFW or LAX as the primary airport into and out of the USA, and to assess two locations in Victoria, Tullamarine and Bendigo, for a new plane storage facility, identifying which has the least impact on local endangered species.
For the first task, you will use the Bureau of Transportation Statistics aviation on-time performance database (https://www.transtats.bts.gov). You should be able to use the data downloaded during tutorials. (There is no need to make your task more complicated by using data such as passenger numbers, fuel consumption or weather.)
For the second task, you need to download data from the Atlas of Living Australia, containing occurrence records within a 50 km radius of each of the airports, for records dating back to Jan 1, 2000. You are asked to focus on the species list provided; these species are critically endangered in Victoria. (Keep it simple: this is purely an impact statement on wildlife, not on the terrain or physical conditions of the sites, so you only need to extract occurrence data.)
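If your download covers a wider area than needed, the 50 km radius and the year-2000 cutoff can be applied afterwards. The sketch below, in Python for illustration only, uses the haversine great-circle distance. The site coordinates are approximate values assumed here, the field names `decimalLatitude`, `decimalLongitude` and `year` are assumed to match your download, and the helper names and sample records are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate airport coordinates (assumed; verify before relying on them).
SITES = {"Tullamarine": (-37.67, 144.84), "Bendigo": (-36.74, 144.33)}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(records, site, radius_km=50.0, min_year=2000):
    """Keep records from min_year onwards that lie within radius_km of a site."""
    lat0, lon0 = SITES[site]
    kept = []
    for r in records:
        if r.get("year") in (None, ""):
            continue  # no year recorded, so the date filter cannot be applied
        if int(r["year"]) < min_year:
            continue
        d = haversine_km(lat0, lon0,
                         float(r["decimalLatitude"]), float(r["decimalLongitude"]))
        if d <= radius_km:
            kept.append(r)
    return kept


# Made-up records for illustration only.
records = [
    {"decimalLatitude": "-37.70", "decimalLongitude": "144.90", "year": "2005"},
    {"decimalLatitude": "-37.70", "decimalLongitude": "144.90", "year": "1999"},
    {"decimalLatitude": "-36.74", "decimalLongitude": "144.33", "year": "2010"},
]
near_tullamarine = within_radius(records, "Tullamarine")
```

In this toy example only the first record survives for Tullamarine: the second is too old and the third sits at the Bendigo site, roughly 110 km away.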
Advice on downloading from ALA:
I found it easier to download directly from the website than to use the R package ALA4R
Go to “Search & Analyse”, then “Search and download records”, and select “Batch taxon records”
Cut and paste the list of taxa below
Species being considered
Anthochaera (Xanthomyza) phrygia
Thinornis cucullatus
Perameles gunnii
Petauroides volans
Petrogale penicillata
Neophema (Neonanodes) chrysogaster
Ornithorhynchus anatinus
Use “Education” as the reason
You will get an email, once your subset is ready. This is usually within 5 minutes. Click the link in the email to get the download.
Look at the files that have arrived with your data. There is a DOI [3]. There is also a citation file containing details of the sources of the data and how to cite them appropriately. There might be many for your subset, so for this assignment you can skip the citations.
Your report is written for the Qantas chief data officer. Although the report should not contain code, the analysis and any assumptions you make should be explained well enough to impress a reader with a technically proficient background.
Note that there is no single correct answer. It is more important to give a clear explanation justifying your answer.
🔍 Analysis
Task 1: Airport efficiency (total of 8 marks)
Your analysis should contain the following elements:
(3pts) Computed summary statistics for both airports to compare and contrast their operations.
(5pts) Several (at least two) plots, that compare and contrast the two airports.
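One possible shape for the summary statistics is sketched below: per-airport count, mean and median departure delay, and the share of flights at least 15 minutes late. It is in Python for illustration only, and the column names, the helper `summarise_delays`, the 15-minute threshold as a parameter, and the sample flights are all assumptions rather than a prescribed method.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_delays(rows, airport_col="ORIGIN", delay_col="DEP_DELAY", threshold=15.0):
    """Per-airport delay summary; rows with no recorded delay are skipped."""
    by_airport = defaultdict(list)
    for r in rows:
        value = r.get(delay_col)
        if value in (None, ""):
            continue  # e.g. a cancelled flight with no delay recorded
        by_airport[r[airport_col]].append(float(value))
    return {
        airport: {
            "n": len(delays),
            "mean": mean(delays),
            "median": median(delays),
            "share_late": sum(d >= threshold for d in delays) / len(delays),
        }
        for airport, delays in by_airport.items()
    }


# Made-up flights for illustration only.
flights = [
    {"ORIGIN": "DFW", "DEP_DELAY": "10"},
    {"ORIGIN": "DFW", "DEP_DELAY": "30"},
    {"ORIGIN": "LAX", "DEP_DELAY": "-5"},
    {"ORIGIN": "LAX", "DEP_DELAY": ""},
    {"ORIGIN": "LAX", "DEP_DELAY": "25"},
]
stats = summarise_delays(flights)
```

Silently skipping flights with no recorded delay is itself an analysis decision; if cancellations matter to your definition of efficiency, count them separately instead.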
Task 2: Impact on wildlife (total of 8 marks)
Your analysis should contain the following elements:
Summary statistics comparing both locations.
Two maps, showing occurrences at both locations
At least one other plot comparing and contrasting the two locations
Hint: You’ll need to think about the type of data collection that is used for the Atlas. For example, if there are no records at a particular site, does it mean that no species live at that location?
📄 Report (total of 9 marks)
The report should satisfy the following criteria:
Two main sections, each approximately two pages long, detailing your findings.
(3pts) A summary paragraph containing what you have learned about the problem: (1) the relative efficiency of the two airports, (2) impact on wildlife.
(1pt) A summary (possibly a table) of terms used in the analysis, and how different quantities were calculated. State clearly how you define efficiency (is it delays, is it the number of connecting flights into and out of the airport?) and impact (the number of one species, the variety of species?).
(2pt) A section describing the data, including (i) an overview of the database, (ii) why your analysis is an acceptable use of the open database, (iii) whether the samples you have used for the analysis allow you to make inferences more broadly (or not).
(1pt) Detailed and concise explanation of the methods used in the analysis, without showing the code in the report. (Note: code should be in the Rmd file, and of sufficient quality to reproduce your work.) Discuss any limitations of your analysis and/or interpretations, possibly based on the samples you are working with.
(1pt) Appropriate referencing to all literature, software, and data sources in an academic referencing style. (This won’t count in the word limit.)
(1pt) Appropriate spelling and grammar checks so that the report is of high quality.
You can add an appendix or supplementary material, containing tables and plots that you find interesting but not important enough to include in the main report. (Note that this may NOT be read during marking.)
Additional resources
If you have questions about writing a report, please consult the Q-manual.
Appendix
This is a glossary of the endangered species included in this subset from the Atlas of Living Australia.
Regent Honeyeater – Anthochaera (Xanthomyza) phrygia
Hooded Plover – Thinornis cucullatus
Eastern Barred Bandicoot – Perameles gunnii
Platypus – Ornithorhynchus anatinus
Greater Glider – Petauroides volans
Brush-Tailed Rock-Wallaby – Petrogale penicillata
Orange-Bellied Parrot – Neophema (Neonanodes) chrysogaster
[1] Note that you can use the word counter in RStudio to check this limit with your document.
[2] This is just a hypothetical scenario for your assignment. You are not really commissioned by Qantas.
[3] Check the lecture notes on DOI to know what this means. This is a permanent link to the subset that you created that can be used to share your data and analysis with others.