DAT 500S – Machine Learning – Project Guidelines
Final Goal: Optimize the portfolio of (experimental) varieties to be grown at the target farm. Information about the target farm is available in the evaluation dataset. The optimal portfolio can have at most 5 varieties of soybean. It is not necessary, but you are welcome to use the methods you learn in the prescriptive analytics class to construct the optimal portfolio. If you are not familiar with optimization, come up with a meaningful heuristic to construct the portfolio. An example heuristic approach was discussed in class on November 21, 2020.
You are encouraged to divide the project work into three components: Descriptive Analytics, Predictive Analytics, and Prescriptive Analytics.
I. Descriptive Analytics
Perform exploratory data analysis to uncover patterns in the given data and to familiarize yourself with it. For example:
1. Plot the latitudes and longitudes on a map to visualize the locations of the farms. Identify where the target/evaluation farm is located. Note that most of the farms are located in the US Midwest.
2. Generate a frequency distribution of the varieties. Decide whether you have enough data for each variety to build a dedicated prediction model for every variety.
3. Check whether there is any relationship between locations and varieties. Explore whether certain varieties are grown more often in some regions than in others.
4. Look for patterns in the weather variables. Explore relationships between locations and weather-related variables.
5. Plot the distribution of the yield variables. Based on the plot, what do you think is a realistic goal for the optimal portfolio at the target farm?
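A minimal sketch of items 2 and 5, assuming the training data has been reduced to hypothetical (variety, yield) pairs. The records and the MIN_OBS threshold below are made up for illustration; you would replace them with the real data and a cutoff you can justify. The code counts observations per variety, splits the varieties into those with enough data for a dedicated model and those to be pooled, and reports yield quartiles as a rough reference for a realistic target.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical (variety, yield) records; real data would come from the training file.
records = [
    ("V1", 61.2), ("V1", 58.4), ("V1", 64.0),
    ("V2", 55.1), ("V2", 57.9),
    ("V3", 66.3),
]
MIN_OBS = 2  # hypothetical minimum number of observations for a dedicated model


def split_by_frequency(records, min_obs):
    """Count observations per variety and split varieties into dedicated vs pooled."""
    counts = Counter(variety for variety, _ in records)
    dedicated = sorted(v for v, n in counts.items() if n >= min_obs)
    pooled = sorted(v for v, n in counts.items() if n < min_obs)
    return counts, dedicated, pooled


counts, dedicated, pooled = split_by_frequency(records, MIN_OBS)
print(counts.most_common())  # [('V1', 3), ('V2', 2), ('V3', 1)]
print("dedicated:", dedicated, "pooled:", pooled)  # dedicated: ['V1', 'V2'] pooled: ['V3']

# Quartiles of the yield variable give a rough reference for a realistic target.
yields = [y for _, y in records]
q1, median, q3 = quantiles(yields, n=4)
print(f"Q1={q1:.2f}  median={median:.2f}  Q3={q3:.2f}")
```

With real data, the same split directly feeds the decision in Part II about which varieties get their own model and which are combined.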
II. Predictive Analytics
Choose a target variable that helps you meet the project goal. Variety_Yield and Yield_Difference are good candidates for the target variable. Based on the frequency distribution generated in the descriptive analytics, decide which varieties will have their own prediction model and which varieties will be combined in the same model. Include an identifier for the varieties in a combined model so that predictions can be made for individual varieties. Generate models using the following algorithms (if your target variable is continuous):
1. Linear Regression
2. LASSO
3. Regression Tree
4. Bagging
5. Random Forest
6. Boosted Trees
7. Neural Network
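In a combined model, one common way to keep individual varieties distinguishable is to append one-hot indicator columns for the variety to the other features. The sketch below assumes a hypothetical pooled list (V3, V4, V5) and two hypothetical weather features; it is illustrative rather than a required encoding (tree-based models, for instance, can also work with a single categorical column).

```python
# Hypothetical list of low-frequency varieties that share one combined model.
POOLED = ["V3", "V4", "V5"]


def encode_row(variety, weather_features, pooled=POOLED):
    """Return the weather features followed by one-hot indicators for `variety`.

    Varieties outside `pooled` are assumed to have a dedicated model instead.
    """
    if variety not in pooled:
        raise ValueError(f"{variety} is not part of the combined model")
    indicators = [1.0 if variety == v else 0.0 for v in pooled]
    return list(weather_features) + indicators


# Two hypothetical weather features: mean temperature and total precipitation.
row = encode_row("V4", [23.5, 610.0])
print(row)  # [23.5, 610.0, 0.0, 1.0, 0.0]
```

To predict a pooled variety at the target farm, you would encode that variety together with each weather scenario and pass the resulting rows to the trained combined model.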
Generate models using the following algorithms (if your target variable is categorical):
1. Logistic Regression
2. Classification Tree
3. Bagging
4. Random Forest
5. Boosted Trees
6. Neural Network
7. Support Vector Machine
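The guidelines leave the choice of a categorical target open. One hypothetical option, assuming a positive Yield_Difference means the variety out-yielded its benchmark, is to bin Yield_Difference into classes. The tolerance band and the sample values below are assumptions. Checking the class balance first helps when choosing among the classifiers listed above.

```python
from collections import Counter


def to_label(yield_difference, tolerance=0.0):
    """Map a Yield_Difference value to a class label.

    Assumes a positive difference means the variety out-yielded its benchmark;
    differences within +/- tolerance (a hypothetical band) are treated as a tie.
    """
    if yield_difference > tolerance:
        return "above"
    if yield_difference < -tolerance:
        return "below"
    return "tie"


diffs = [3.2, -1.5, 0.4, -0.2, 5.0, -4.1]  # made-up values for illustration
labels = [to_label(d, tolerance=0.5) for d in diffs]
print(labels)  # ['above', 'below', 'tie', 'tie', 'above', 'below']

# Class balance: a strongly imbalanced target may call for resampling or class weights.
print(Counter(labels))
```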
Using these models, predict the yield or yield difference for every potential variety at the target/evaluation farm. Depending on your choice of target variable, these predictions need not be yield or yield difference. Make predictions under multiple weather-related scenarios to capture weather uncertainty. Ensure that the chosen weather scenarios are suitable for the location of the target/evaluation farm.
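One way to build weather scenarios that suit the target farm is to reuse historical weather observed at farms near it. The sketch below assumes hypothetical weather records, target coordinates, a 150 km radius, and a placeholder `predict` function with made-up coefficients standing in for a trained model. It filters records by great-circle (haversine) distance and produces one prediction per variety per scenario.

```python
import math

# Hypothetical weather records: (latitude, longitude, year, temperature_C, precipitation_mm).
WEATHER = [
    (41.6, -93.6, 2012, 24.8, 410.0),
    (41.9, -93.1, 2013, 22.1, 640.0),
    (42.3, -92.4, 2014, 21.7, 700.0),
    (38.6, -90.2, 2014, 25.9, 520.0),  # several hundred km away
]
TARGET = (41.8, -93.4)  # hypothetical target-farm coordinates (lat, lon)
RADIUS_KM = 150.0       # hypothetical neighborhood radius
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearby_scenarios(weather, target, radius_km):
    """Keep (temperature, precipitation) pairs observed within radius_km of target."""
    return [(temp, precip) for lat, lon, _year, temp, precip in weather
            if haversine_km((lat, lon), target) <= radius_km]


def predict(variety, temperature, precipitation):
    """Placeholder for a trained model; the coefficients are made up."""
    base = {"V1": 60.0, "V2": 58.0}[variety]
    return base - 0.8 * (temperature - 22.0) + 0.01 * (precipitation - 600.0)


scenarios = nearby_scenarios(WEATHER, TARGET, RADIUS_KM)
# One predicted yield per variety per scenario.
predictions = {v: [predict(v, t, p) for t, p in scenarios] for v in ("V1", "V2")}
print(len(scenarios), predictions)
```

The resulting variety-by-scenario table is the kind of input that the portfolio heuristics in Part III need.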
III. Prescriptive Analytics
Optimize the portfolio of (experimental) varieties to be grown at the target farm. Experimental varieties are in the column identified as ‘Varieties’. The optimal portfolio can have at most 5 varieties of soybean. It is not necessary, but you are welcome to use the methods from the prescriptive analytics class or other optimization classes to construct the optimal portfolio. If you are not familiar with optimization, you can invent your own heuristic to make the recommendation. There will be no grade penalty for not using optimization; a good heuristic is sufficient to earn a good score on this part of the project.
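If you choose optimization, one small illustrative formulation is to search every subset of at most 5 varieties, give each chosen variety an equal share of land, and maximize the portfolio's mean yield minus a risk-aversion weight times its standard deviation across weather scenarios. The predicted yields and the risk-aversion value below are hypothetical. Exhaustive search is only practical for a short candidate list, since the number of subsets grows combinatorially; a solver or a pre-filtered shortlist would be needed for the full set of varieties.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical predicted yields: variety -> yields under each weather scenario
# (the same scenario order for every variety).
PRED = {
    "V1": [62.0, 48.0, 60.0],
    "V2": [50.0, 61.0, 52.0],
    "V3": [56.0, 52.0, 57.0],
    "V4": [58.0, 50.0, 59.0],
}
RISK_AVERSION = 1.0  # hypothetical weight trading mean yield against risk
MAX_VARIETIES = 5


def portfolio_yields(varieties, pred):
    """Per-scenario yield of a portfolio that splits land equally among `varieties`."""
    per_scenario = zip(*(pred[v] for v in varieties))
    return [sum(ys) / len(varieties) for ys in per_scenario]


def best_equal_weight_portfolio(pred, max_k, lam):
    """Score every subset of up to max_k varieties by mean - lam * std; keep the best."""
    best, best_score = None, float("-inf")
    for k in range(1, min(max_k, len(pred)) + 1):
        for subset in combinations(sorted(pred), k):
            ys = portfolio_yields(subset, pred)
            score = mean(ys) - lam * pstdev(ys)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


best, score = best_equal_weight_portfolio(PRED, MAX_VARIETIES, RISK_AVERSION)
print(best, round(score, 2))  # ('V1', 'V2') 54.79
```

With these numbers, V1 and V2 are each risky on their own but do well in different scenarios, so the pair scores higher than any single variety. This diversification effect is something a pure ranking heuristic does not capture.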
Your recommendation should explicitly identify the varieties to be grown and the percentage of the farm land allocated to each of them. The percentages of farm land should add up to 100 percent. Here are two sample heuristics:
1. Naïve Heuristic
Based on the predictions, rank the varieties according to their yield potential and recommend the top 5 varieties to be grown at the farm. You could, for example, allocate 20 percent of the land to each variety.
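A minimal sketch of the naïve heuristic, assuming predictions are stored in a hypothetical dictionary that maps each variety to its predicted yields across weather scenarios (the values are made up):

```python
from statistics import mean

# Hypothetical predicted yields: variety -> predictions across weather scenarios.
PRED = {
    "V1": [60.0, 58.0], "V2": [55.0, 57.0], "V3": [62.0, 48.0],
    "V4": [59.0, 61.0], "V5": [52.0, 54.0], "V6": [57.0, 57.0],
}


def naive_portfolio(pred, k=5):
    """Rank varieties by mean predicted yield; split the land equally among the top k."""
    ranked = sorted(pred, key=lambda v: mean(pred[v]), reverse=True)
    top = ranked[:k]
    share = 100.0 / len(top)
    return {v: share for v in top}


allocation = naive_portfolio(PRED)
print(allocation)  # {'V4': 20.0, 'V1': 20.0, 'V6': 20.0, 'V2': 20.0, 'V3': 20.0}
```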
2. Mean-Risk Heuristic
Based on the predictions, rank the varieties by their mean yield and by the risk in their yield (for example, the spread of predicted yields across weather scenarios). Recommend the top 5 varieties in these rankings, and allocate land based on each variety's mean yield and yield risk.
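A sketch of one possible mean-risk variant, with its assumptions stated: risk is measured as the population standard deviation of predicted yields across scenarios, varieties are ranked by mean minus `lam` times that standard deviation (`lam` is a hypothetical risk-aversion weight), and land is allocated in proportion to that score. Because the report must give percentages that add up to 100, the second function rounds them with the largest-remainder method. The prediction values are made up.

```python
import math
from statistics import mean, pstdev

# Hypothetical predicted yields: variety -> predictions across weather scenarios.
PRED = {
    "V1": [60.0, 58.0, 62.0],
    "V2": [55.0, 57.0, 56.0],
    "V3": [70.0, 40.0, 64.0],
    "V4": [59.0, 61.0, 60.0],
    "V5": [52.0, 54.0, 53.0],
    "V6": [57.0, 57.0, 57.0],
}


def mean_risk_portfolio(pred, k=5, lam=1.0):
    """Rank by mean - lam * std and allocate land in proportion to that score.

    Assumes the scores of the selected varieties are positive.
    """
    scores = {v: mean(ys) - lam * pstdev(ys) for v, ys in pred.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[v] for v in top)
    return {v: 100.0 * scores[v] / total for v in top}


def round_to_100(shares):
    """Round percentages to integers summing to 100 (largest-remainder method)."""
    rounded = {v: math.floor(s) for v, s in shares.items()}
    leftover = 100 - sum(rounded.values())
    # Give the remaining points to the shares with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda v: shares[v] - rounded[v], reverse=True)
    for v in by_remainder[:leftover]:
        rounded[v] += 1
    return rounded


shares = mean_risk_portfolio(PRED)
percentages = round_to_100(shares)
print(percentages)  # {'V4': 21, 'V1': 21, 'V6': 20, 'V2': 20, 'V5': 18}
```

Here V3 has a higher mean than V6, V2, and V5 but is dropped because of its large spread, which is the main difference from the naïve ranking.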
Key things to remember while writing the report
Perform a literature search using library resources to identify journal publications relevant to your topic. Do you find interesting methods in the literature for making similar recommendations? What do you think of those methods? How does your approach differ from them? Does your project add incremental value to these existing publications? Note: Use at least six peer-reviewed journal articles (e.g., Management Science, Interfaces, Operations Research, Journal of Operations Management, Production and Operations Management, Journal of Portfolio Management, Journal of Finance) or conference articles to synthesize your arguments about the existing methods in the literature.
The final report should include Title of the project, Abstract, Keywords, Introduction, Literature Review, Methodology and Analysis, Conclusion, and References.
Submit your project as a PDF file on Canvas by Dec 19, 11:59 PM CST.
Remember to include the following components in your report:
1) Title. Convey a message in 12 words. Readers should understand the content of the entire report just by reading the title.
2) Abstract. Summarize your report in 300 words. Some readers will read only the abstract to decide whether to read the entire report, so write a captivating summary of the entire report here.
Note: As part of your literature review, read the titles and abstracts of many publications before deciding which articles to use in your project.
3) Keywords. Include three to five keywords relevant to your project.
4) Introduction. This section should introduce your project. Include discussions of the following: What is the motivation behind this project? What is the goal of this project? Which organization benefits from this study? What research question(s) does this project answer? What methods were used? What are the important results and conclusions?
5) Literature Review. Use library databases such as JSTOR, INFORMS, and PubMed to find studies (peer-reviewed articles) relevant to your topic. Do you find publications addressing the same problem? Did you add value to the existing literature by completing this project?
Note: Do not base your opinions or findings on articles that are not peer-reviewed; newspaper and magazine articles alone are not adequate.
6) Methodology and Analysis. Concisely describe the methods and analysis used in the project.
7) Conclusion. Summarize your methods, analysis, results, and recommendations. What is unique about your work? What are the findings? Are there any surprises? Are the findings beneficial to any organization?
8) References. Include references from your Literature Review.
9) Tables and figures should be numbered and titled. Table titles should appear at the top; figure titles should appear at the bottom. Every table and figure presented in the report should be discussed in the text.
10) Formatting: Submit a Word report with all of the components discussed above. Include your ID number on the first page (no title page) and page numbers on all pages. Your report should be at least 9 and at most 10 pages long; nothing may extend beyond page 10.
Font: Arial
Font Size: 12
Margins: 1 inch on all four sides
Spacing: 1.5 line spacing