On algorithms: COMP226, analysing algorithms in R

COMP226 Assignment 2: Strategy Development

Continuous Assessment Number: 2 (of 2)
Weighting: 15%
Assignment Circulated: Monday 17 April 2023 (week 9)
Deadline: Thursday 11 May 2023 (week 12)
Submission Mode: Submit up to two files to the CodeGrade assignment on Canvas: strategy.R (required to get marks) and results.yaml (optional).
Learning Outcomes Assessed: This assignment addresses the following learning outcomes:
  • Understand the spectrum of computer-based trading applications and techniques, from profit-seeking trading strategies to execution algorithms.
  • Be able to design trading strategies and evaluate critically their historical performance and robustness.
  • Understand the common pitfalls in developing trading strategies with historical data.
Summary of Assessment:
  • The goal is to implement and optimize a well-defined trading strategy within the backtester_2023 framework.
  • Marks are available for the correct implementation of 10 functions in strategy.R (70%).
  • Further marks (that require a correct implementation in strategy.R) are available for the results of a cross-validated optimisation that you can include in results.yaml (30%).
  • CodeGrade pre-deadline tests and offline example outputs are available to help you check the correctness of your work.
Submission necessary to pass module: No
Late Submission Penalty: Standard UoL policy; resubmissions after the deadline may not be considered.
Expected time taken: Roughly 8-12 hours

Before you move on and read more about the assignment and start working on it, please make sure you have worked through "backtester.pdf" (and the corresponding lectures if you want), which is an intro to the backtester_2023 framework. Only return to this document when you already have the framework up and running.

First, let's recall the contents of the backtester_2023.zip:

    backtester_2023
    [directory tree not preserved in this copy]
    10 directories, 28 files

In the above listing, the following files/directories are specifically there for assignment 2:
  • a2_main_template.R
  • a2_periods.R
  • a2_test_checks_and_getTMA.R
  • strategies/a2_strategy_template.R
  • a2_example_yamls
  • DATA/A2

The relevance of these files and directories will be explained below. The rest of the document is split into three parts:
  • Part 1 describes the 10 functions that should be implemented to fully complete strategy.R; you should start from a2_strategy_template.R;
  • Part 2 describes how to create (the optional) results.yaml;
  • Part
3 describes submission via CodeGrade and the available pre-deadline tests.

In addition to pre-deadline tests on CodeGrade, example outputs are provided (in this document and as files) so that you can test whether you have implemented things correctly.

As for assignment 1, the pre-deadline tests will determine your mark for the first part, corresponding to 70% of the overall marks that are available. Assuming that you have achieved full marks on the first part, the pre-deadline tests will check that the form of results.yaml is correct, and that it uses the expected student username (i.e., your one) and corresponding time periods; the pre-deadline tests do not check the correctness of the other fields in results.yaml, which will be checked post deadline only if you pass the pre-deadline test for results.yaml. For those other fields, you should use the examples provided (which are in the subdirectory a2_example_yamls).

Part 1: strategy implementation (70%)

The trading strategy that you should implement is a triple moving average (TMA) momentum strategy, which is described in slides 4.7. The specification of the strategy and the functions that it should comprise are given in full detail, so the correctness of your code can and will be checked automatically.

Two template files are provided to get you started:
  • strategies/a2_strategy_template.R, which should become the file strategy.R that you eventually submit;
  • a2_main_template.R, which uses DATA/A2 and strategies/a2_strategy_template.R.

If you source a2_main_template.R with no edits to these two files you will get an error:

    Error in if (store$iter > params$lookbacks$long) { : argument is of length zero

This is because the strategy requires a parameter called lookbacks that you will need to pass in from a2_main_template.R. Read on to see what form this parameter should take, and, more generally, how you should be editing these two files.

a2_strategy_template.R contains 10 incomplete functions that you need to complete. The first 6 functions (checkE01, ..., checkE06) are error checks for the inputs to getTMA. These error checks are all one-liners, worth 3% each. They are intentionally meant to be straightforward to implement. The next three functions compute the moving averages (getTMA), use them to compute the position sign (getPosSignFromTMA), and compute the position size (getPosSize). The final, tenth function, getOrders, combines the last three to implement the actual trading strategy. Recall that every strategy in the backtester framework has a getOrders function.

The TMA momentum strategy that you should implement uses three moving averages with different lookbacks (window lengths). The short lookback should be smaller than the medium one, which in turn should be smaller than the long lookback. In every trading period, the strategy will compute the value of these three moving averages (for the series that it trades on, which will be determined by params$series). You will achieve this by completing the implementation of the function getTMA.

The following table indicates the position that the strategy will take depending on the relative values of the three moving averages (MAs). You will compute this position (sign, but not size) by completing the function getPosSignFromTMA. The system is out of the market (i.e., flat) when the relationship between the short MA and the medium MA does not match the relationship between the medium MA and the long MA.

    MA             MA              MA         Position
    short MA   <   medium MA   <   long MA    short
    short MA   >   medium MA   >   long MA    long

The function getPosSignFromTMA takes the output of getTMA as input. The position size, i.e., the number of units to be long or short, is determined by getPosSize. As for all strategies in the backtester framework, the positions are given to the backtester by getOrders. Here are the detailed specification and marks available for these 10 functions.

    Function name: checkE01 ... checkE06
    Input parameters: prices; lookbacks.
    Expected behaviour: The behaviour of these checks is specified as comments in the template. Hints are given below.
    Marks available for a correct implementation: 3% for each of the 6 checks; 18% in total.

    Function name: getTMA
    Input parameters: prices; lookbacks. The specific form that these arguments should take is specified in the template code via the 6 checks that you need to implement.
    Expected behaviour: The function should return a list with three named elements, short, medium, and long. Each element should be equal to the value of a simple moving average with the respective window size as defined by lookbacks. The windows should all end in the same period, the final row of prices.
    Marks available for a correct implementation: 12%

    Function name: getPosSignFromTMA
    Input parameters: tma_list is a list with three named elements, short, medium, and long. These correspond to the simple moving averages as returned by getTMA.
    Expected behaviour: This function should return either 0, 1, or -1. If the short value of tma_list is less than the medium value, and the medium value is less than the long value, it should return -1 (indicating short). If the short value of tma_list is greater than the medium value, and the medium value is greater than the long value, it should return 1 (indicating long). Otherwise, the return value should be 0 (indicating flat).
    Marks available for a correct implementation: 10%

    Function name: getPosSize
    Input parameters: current_close: this is the current close for one of the series. constant: this argument should have a default value of 5000.
    Expected behaviour: The function should return (constant divided by current_close) rounded down to the nearest integer.
    Marks available for a correct implementation: 5%

    Function name: getOrders
    Input parameters: The arguments to this function are always the same for all strategies used in the backtester framework.
    Expected behaviour: This function should implement the strategy outlined below in "Strategy specification".
    Marks available for a correct implementation: 25%

All-or-nothing tests

Since the check functions and the getPosSignFromTMA function will only return a small number of possible correct values (2 for the check functions, and 3 for getPosSignFromTMA), these are implemented as "all-or-nothing" tests where you either get full marks for passing all tests or no marks if you fail at least one test. As a very simple function, getPosSize is also marked with all-or-nothing tests, so of the 10 functions, partial marks are only available for getTMA and getOrders.

Strategy specification

The strategy should apply the following logic independently to only the series in params$series (e.g., params$series could be c(1,3), which would mean trade only on series
1 and 3).

It does nothing until there have been params$lookbacks$long-many periods.

In the (params$lookbacks$long+1)-th period, and in every period after, the strategy computes three simple moving averages with window lengths equal to:
  • params$lookbacks$short
  • params$lookbacks$medium
  • params$lookbacks$long

The corresponding windows always end in the current period. The strategy should in this period send market orders to assume a position (make sure you take into account positions from earlier) according to getPosSignFromTMA and getPosSize. (Limit orders are not required at all, and can be left as all zero.)

Hints

You can develop the first 9 functions without running the backtester.

For the checks you may find the following functions useful:
  • The operator ! means not, and can be used to negate a boolean.
  • sapply allows one to apply a function element-wise to a vector or list (e.g., to c("short","medium","long")).
  • all is a function that checks if all elements of a vector are true (for example, it can be used on the result of sapply).
  • %in% can be used to check if an element exists inside a vector.

To compute the moving average in getTMA you can use SMA from the TTR package.

For getPosSize, you can use the function floor.

For getOrders some instructions are given as comments in a2_strategy_template.R.

If an error occurs within a function and you would like to inspect the contents of a variable that is local to the function, in addition to printing, you can also use global assignment (<<-) for debugging.

Example output for checkE01 ... checkE06 and getTMA

The file a2_test_checks_and_getTMA.R is provided to give you guidance on how you can test the six functions, checkE01 ... checkE06. For each one, two tests are provided: for a correct implementation, one test should produce TRUE and the other FALSE. (You don't need to use these tests, as you can also just rely on the tests on CodeGrade.)

To use these tests, first source a2_test_checks_and_getTMA.R and also source the implementations that you would like to test. The tests that should return TRUE are test_checkE01() ... test_checkE06(); for tests that should return FALSE, there is a single function, test_pass_all_checks, which takes the function to test as its only argument. Here's an example of both types of test for E01 (where a correct implementation of checkE01 has been sourced):

    > test_checkE01()
    [1] TRUE
    > test_pass_all_checks(checkE01)
    [1] FALSE

The way these tests work is clear from the source code in a2_test_checks_and_getTMA.R:

    ###############################################################################
    # Source the functions that you would like to test, e.g., with
    # source('strategies/a2_strategy_template.R') or source('strategies/strategy.R')
    ###############################################################################
    source('framework/data.R'); dataList <- getData(directory="A2")
    prices <- dataList[[1]]
    prices_19_rows <- dataList[[1]]$Close[1:19]
    prices_20_rows <- dataList[[1]]$Close[1:20]
    prices_20_rows_renamed <- prices_20_rows
    colnames(prices_20_rows_renamed) <- 'Closed'
    bad_prices <- c(1,2,3)
    lookbacks_no_names <- list(5,10,25) # list elements not named
    lookbacks_not_integer <- list(short=5,medium=as.integer(10),long=as.integer(20))
    lookbacks_wrong_order <- list(short=as.integer(15),medium=as.integer(10),long=as.integer(20))
    lookbacks <- list(short=as.integer(5),medium=as.integer(10),long=as.integer(20))
    test_checkE01 <- function() checkE01(prices,lookbacks_no_names)
    test_checkE02 <- function() checkE02(prices,lookbacks_not_integer)
    test_checkE03 <- function() checkE03(prices,lookbacks_wrong_order)
    test_checkE04 <- function() checkE04(bad_prices,lookbacks)
    test_checkE05 <- function() checkE05(prices_19_rows,lookbacks)
    test_checkE06 <- function() checkE06(prices_20_rows_renamed,lookbacks)
    test_pass_all_checks <- function(check_func) check_func(prices_20_rows,lookbacks)
    test_getTMA <- function() # same inputs as test_pass_all_checks()
        getTMA(prices_20_rows,lookbacks)

The final test function in this file is for getTMA, where you should get the following return values for a correct implementation:

    > test_getTMA()
    $short
    [1] 3081.5

    $medium
    [1] 3122.5

    $long
    [1] 3128.875

If you want to do further testing, you can use the pre-deadline tests on CodeGrade, which apply to all 10 functions, or you can extend a2_test_checks_and_getTMA.R by adding alternative examples yourself.

Example output for getPosSignFromTMA

Here is one example input for each of the three possible outputs:

    > getPosSignFromTMA(list(short=10,medium=20,long=30))
    [1] -1
    > getPosSignFromTMA(list(short=10,medium=30,long=20))
    [1] 0
    > getPosSignFromTMA(list(short=30,medium=20,long=10))
    [1] 1

Example output for getPosSize

Here are two examples of correct outputs:

    > current_close <- 100.5
    > getPosSize(current_close,constant=100.5)
    [1] 1
    > getPosSize(current_close,constant=100.4)
    [1] 0

Example output for getOrders

The following table gives the correct value of "profit" across
3 different time periods, using the "EXAMPLE" data, and the following parameters:

    params$lookbacks <- list(short=as.integer(5), medium=as.integer(50), long=as.integer(100))
    params$series <- 1:4

    start period    end period    profit
    1               250           2086.184
    1               1000          4103.204
    500             1500          -2179.298

The examples of results.yaml (details below) can also be used to further establish the correctness of getOrders, along with all the tests on CodeGrade.

Part 2: cross-validation (30%)

Warning

You can only access the final 30% of marks if you get 70% for the first part; otherwise CodeGrade will not process results.yaml.

In this part of the assignment you are asked to do a cross-validated parameter optimization of profit, where you will use an in-sample and an out-of-sample time period. Every student has their own in-sample and out-of-sample periods based on their MWS username (only the part before the @, e.g., for Vladimir Gusev, this username is gusev, rather than the full email form gusev@liverpool.ac.uk). Because different students have different time periods, there is not one single correct results.yaml.

To get your in-sample and out-of-sample periods, use a2_periods.R as follows. Source it and run the function getPeriods with your MWS username as per the following example (where we use the fake username "x1xxx"). Use startIn, endIn, startOut, and endOut as the start and end of the in-sample and out-of-sample periods respectively.

    > source('a2_periods.R')
    > getPeriods('x1xxx')
    $startIn
    [1] 1

    $endIn
    [1] 884

    $startOut
    [1] 885

    $endOut
    [1] 2000

You will do two parameter sweeps: one on your in-sample period, and one on your out-of-sample period (normally one doesn't do a sweep on the out-of-sample period in practice; we do it here to allow detailed cross-period performance analysis). The sweep will be over the following parameters: the short, medium, and long lookbacks, and the subset of series that are traded on.

    Parameter          Values
    short lookback     5, 10
    medium lookback    50, 100
    long lookback      200, 300
    series             All subsets of 1:4 that have at least two elements

The correct resulting number of parameter combinations is 88.

Hint

You can use expand.grid to create the relevant parameter combinations; alternatively you could use nested for loops.

The following information (full example below) is needed in results.yaml:
  1. Your username and the corresponding periods. This information is used for a pre-deadline check to give you confidence that you are using the right periods.
  2. The parameter combination that gives the best profit on the in-sample period (where the series parameter is encoded in binary, see below); the corresponding profit.
  3. The parameter combination that gives the best profit on the out-of-sample period (where the series parameter is encoded in binary, see below); the corresponding profit.
  4. rank_on_out: The rank (a possibly fractional number between 1 and 88) that describes where the parameter combination from 2. ranks on the out-of-sample period.
  5. rank_on_in: The rank (a possibly fractional number between 1 and 88) that describes where the parameter combination from 3. ranks on the in-sample period.

How to compute the rank

Use the function rank (package:base) with the argument ties.method='average'.

Interpretation

An ideal scenario is for the best in-sample parameter combination to also be the best out-of-sample parameter combination. In practice, this is often not the case, as we have seen in the slides. Here, as we did in the slides, we are exploring the difference between parameter combination performance on in-sample and out-of-sample periods, where "a good outcome" is for the rank_on_out and rank_on_in to both be close to
1 (where 1 is ideal).

NOTE: you will not submit the code used to do the optimisation, which takes a long time to run; you will only submit the results of the optimisation in results.yaml.

Example output for results.yaml

In the a2_example_yamls subdirectory, three examples of results.yaml are provided for the fake usernames "x1xxx", "x1yyy", and "x1zzz", using the "EXAMPLE" data. For "x1xxx", the yaml file contents are:

    username: x1xxx
    periods:
      startIn: 1
      endIn: 884
      startOut: 885
      endOut: 2000
    ins:
      short: 5.0
      medium: 50.0
      long: 300.0
      series1: 1
      series2: 1
      series3: 1
      series4: 1
      profit: 5963.63
      rank_on_out: 2.0
    out:
      short: 10.0
      medium: 50.0
      long: 300.0
      series1: 1
      series2: 1
      series3: 1
      series4: 0
      profit: 3072.562
      rank_on_in: 19.0

Note how the params$series parameter is represented in the yaml, as 4 binary variables (taking values 0 or 1): series1, series2, series3, and series4.

Once you have correctly completed part 1, and have also created the code to do the parameter sweep and ranking, you can use these three examples to test your output. These examples are done using the "EXAMPLE" data so that they do not leak information about the correct answers on the "A2" data.

Marks breakdown for results.yaml

The marks for results.yaml are only available if you have achieved 70% on the first part. Moreover, the yaml file must have the right format, and must show the correct username and periods; there is a pre-deadline test that checks all of this for you.

Here is an example blank results.yaml, shown with additional line numbers:
     1  username:
     2  periods:
     3    startIn:
     4    endIn:
     5    startOut:
     6    endOut:
     7  ins:
     8    short:
     9    medium:
    10    long:
    11    series1:
    12    series2:
    13    series3:
    14    series4:
    15    profit:
    16    rank_on_out:
    17  out:
    18    short:
    19    medium:
    20    long:
    21    series1:
    22    series2:
    23    series3:
    24    series4:
    25    profit:
    26    rank_on_in:

Note that the line numbers on the left are not part of the file; they are shown since they are used in the following tables.

Required for passing the pre-deadline check:

    Field(s)    Line numbers in example    Marks
    username    1                          0
    periods     3-6                        0

Assuming that your submitted yaml passed the pre-deadline check, which checks the username and periods fields and the format of the yaml file, the following marks are available for the remaining fields:

    Field(s)                          Line numbers in example    Marks
    In-sample best params (unique)    8-14                       5
    In-sample best profit             15                         5
    rank_on_out                       16                         5
    Out-of-sample best params         18-24                      5
    Out-of-sample best profit         25                         5
    rank_on_in                        26                         5

Part 3: submission and pre-deadline tests

To get marks the submission of strategy.R is required; the submission of results.yaml is optional. There are pre-deadline tests for all 10 functions in strategy.R that are needed for the first 70% of marks.

Randomisation

To reduce the incentive for trying to hardcode answers, tests involve randomness in the inputs. This does mean that there can be some (small) variance in the mark for wrong answers, but there is none for correct answers. This is a reasonable price to pay for being able to see all the tests that were run openly.

For the functions checkE01, ..., checkE06, these are "all-or-nothing" tests (to prevent always returning TRUE or always returning FALSE from getting marks). If you do not pass all tests, you will be shown only the tests that you failed. For example, here's what happens for checkE01 if it just returns TRUE:

When the output says "expected [1] FALSE" that means that the input arguments should have passed this check.

One only sees tests where FALSE was expected but TRUE was returned. Here is what happens for checkE02 if it always returns FALSE (one sees only tests where TRUE was expected):

For getTMA, partial marks are possible. Here's an example of the tests for a getTMA implementation that passes some but not all tests:

The way this submission was created was to break a correct implementation for the short TMA for certain lookback values. Note that the errors on CodeGrade show that the problem is only with the short TMA and only for certain values of the lookback; this type of information may be useful in debugging. Note also that a broken getTMA (or getPosSignFromTMA or getPosSize) should also break getOrders, because it should use them. For example, here's the output for getOrders when the same submission as used for getTMA is used:

As for getTMA, partial marks are possible for getOrders. The tests for getOrders use the resulting profit for comparison.

Since getPosSignFromTMA should only return 0, 1, or -1, it is an "all-or-nothing" test. Here's the output when a wrong implementation that always returns 1 is submitted (only failed tests, where the expected output was 0 or -1, are shown):

For getPosSize, since it is very simple and there should not be lots of "edge cases", we again implement it as an "all-or-nothing" test. Here's the output when the flooring of the position size has been omitted:

So for getTMA and getOrders partial marks are possible; for the other 8 functions it is all or nothing, and only failed tests are shown if the test is not passed.

For results.yaml, the pre-deadline test checks that the username and periods are correct and that the format of the yaml is correct. Here is an example of using the wrong username:

Errors can also arise for the wrong periods or for a badly formatted yaml or one with the wrong fields. When the submitted yaml passes all pre-deadline tests, you will see the following:

Only in this case will your results.yaml submission be marked post-deadline.

Warning

Your code will be put through the department's automatic plagiarism and collusion detection system. Students found to have plagiarized or colluded will likely receive a mark of zero. Do not show your work to other students, and do not search for answers online.

Good luck with the assignment.
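
The figure of 88 parameter combinations quoted in Part 2 can be checked with expand.grid, as the hint there suggests. The following is only a sketch under stated assumptions: the column names mirror the fields of results.yaml, while the variable names grid and n_traded are illustrative and are not part of the backtester framework.

```r
# Sketch: enumerate the Part 2 parameter grid and confirm its size.
# Column names follow results.yaml; `grid` and `n_traded` are illustrative names.
grid <- expand.grid(short   = c(5, 10),
                    medium  = c(50, 100),
                    long    = c(200, 300),
                    series1 = 0:1, series2 = 0:1,
                    series3 = 0:1, series4 = 0:1)

# Keep only subsets of 1:4 with at least two traded series.
series_cols <- paste0("series", 1:4)
n_traded <- rowSums(grid[, series_cols])
grid <- grid[n_traded >= 2, ]

# 2 * 2 * 2 lookback choices times 11 subsets (6 pairs + 4 triples + 1 full set).
nrow(grid)  # 88

# Recover the indices of the traded series for one row of the grid.
which(unlist(grid[1, series_cols]) == 1)
```

Encoding the series subset as four 0/1 columns keeps the grid in the same shape as the yaml fields, so a winning row can be written out without further conversion.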

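
The ranking step from "How to compute the rank" in Part 2 can be illustrated on made-up numbers. This sketch assumes that rank 1 should correspond to the highest profit, which is why it ranks the negated profits (base rank orders ascending); the profit values and the names profits and ranks are invented for illustration only.

```r
# Toy profits for four hypothetical parameter combinations (invented values).
profits <- c(10, 30, 30, 5)

# rank() orders ascending, so negate to give the highest profit rank 1
# (an assumption about the intended direction of the ranking).
# Tied values share the average of the ranks they span.
ranks <- rank(-profits, ties.method = "average")
ranks  # 3.0 1.5 1.5 4.0

# Rank of one chosen combination (here the second) within this period.
ranks[2]  # 1.5
```

The tie between the second and third combinations is what produces a fractional rank, which is why rank_on_out and rank_on_in are described as possibly fractional.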