关于后端:CIS-240-Stack-Attack

6次阅读

共计 12082 个字符,预计需要花费 31 分钟才能阅读完成。

CIS 240 Fall 2021
Homework #9: Stack Attack {150 pts}
“A good programmer is someone who always looks both ways before
crossing a one-way street.”(Doug Linder)
Preamble:
In this assignment you are going to write a small compiler that will transform code
written in a new stack-oriented language, called J, into LC4 assembly code much the
way that the lcc compiler converter converts C code into assembly. You are NOT
supposed to simulate the LC4 output. The compiler should only generate the
corresponding LC4 assembly into a file. Your compiler will be designed to use
the exact same calling convention that lcc does that way your J programs can
call subroutines compiled by lcc and vice versa.
Once again you are expected to do all of your work on the virtual machine provided
using the clang compiler. That is where your code will be tested and graded. You are
expected to provide all of the C source code and a makefile for producing the final
executable program, jc. This program will act much like lcc does, when provided
with a properly formatted source file in the J language it will produce the
corresponding assembly code. That is

./jc foo.j
will produce a new file called foo.asm if foo.j contains an acceptable program
otherwise it will print out helpful error messages.
The J Language
The J language is a stack-oriented language loosely inspired by Forth. You may want
to look at the following Wikipedia article for a quick description of stack oriented
languages.
http://en.wikipedia.org/wiki/…
In a stack oriented language all operands are passed on the stack and most
operations end up manipulating the stack. Here are a few quick examples of J
program fragments.
Egs. 1:
7 5 4 + *
This program pushes the values 7, 5 and 4 onto the stack (stack state: 7 5 4 – note
that the rightmost element in this list, 4, is viewed as being at the top of the stack)
then adds the top two elements (stack state: 7 9) then multiplies the results (stack
state: 63). This type of operational syntax is sometimes referred to as Reverse Polish
Notation (RPN) and you see it used on many revered HP calculators.
Egs 2:
5 7 gt if 12 else 25 endif ; this is a comment
This program pushes 5 and 7 on the stack (stack state: 5 7) then compares the top
two elements to determine if the topmost element is greater than the next element
(stack state: true/1). The if statement then queries the top element of the stack and
if it is non-zero, which it is in this case, it pushes the value 12 on the stack otherwise
it would push the value 25. So at the end of this program the stack state would be
(stack state: 12). Note that the true value is consumed by the if operator. Note also
the comment at the end of the line – everything from the semi-colon to the end of
the line should be disregarded by the compiler.
Parsing J source files:
As you can see from the examples a J source file can be thought of as a sequence of
tokens (egs. 7, -9, if, eq, +, defun) separated by whitespace characters (spaces, tabs,
newlines). Tokens can denote literal values or operations. You will want to proceed
by writing a program that attempts to read in the tokens from the file one at a time
and generates the appropriate assembly code in response to those tokens. For the
sake of simplicity, you can assume that no token will contain more than 200
characters. Remember that you do need to handle comments that follow semicolons.
Note that blank lines, tabs and spaces can be used to format the J code so
that it is more readable. You may want to write a specific function for reading the
next token from the source file. You may find the routines in <string.h> and
<ctype.h> useful.
Elements of the J language:
Operands:
For simplicity, all of the operands in the J language are 16 bit words. There are no
other data types.
Literals:
Tokens consisting entirely of digits optionally preceded by a minus sign are
interpreted as decimal numbers and should be pushed on the stack in the order they
are encountered. Additionally, you should implement hexadecimal literals which
begin with the string‘0x’followed by a sequence of hexadecimal characters. You
should read these hex values and then push them on the stack just as you would a
decimal literal.
Arithmetic Operations:
The J language has tokens that denote the basic arithmetic operations provided by
the LC4 assembly language.

    • : ADD
    • : SUB
    • : MUL
  • / : DIV
  • % : MOD
    In each case the value at the top of the stack is used as the first operand, Rs, and the
    next item on the stack is used as the second operand, Rt. These two values are
    popped from the stack and the value of the operation is pushed onto the stack.
    Comparisons:
    Let a denote the element at the top of the stack and b denote the next element.
    There are 5 comparison operators in the J language and they are listed below.
  • lt : a < b
  • le : a <= b
  • eq : a == b
  • ge : a >= b
  • gt : a > b
    Note that the top two values are popped off the stack and replaced by the result of
    the comparison which is either a 1 if the comparison produces a true value or 0 if it
    is false. Note that the values a and b are viewed as 2C values for the purposes of
    these comparisons.
    Logical Operations:
    The J language supports the 3 basic binary Boolean operations through the tokens
    and, or and not. The and and the or operations take the two top values on the stack,
    compute their respective operations in a bitwise manner using the corresponding
    LC4 instructions and then place the result back on the top of the stack. The not
    operation does much the same thing but it only consumes the top element of the
    stack.
    Stack Operations:
    Stack oriented languages always have operations to manipulate the state of the
    stack. The J language provides many of the basic stack operations offered by Forth
    and these are listed below.
  • drop : drops the top element of the stack
    o egs:
     stack state before drop: 7 -6 3
     stack state after drop : 7 -6
  • dup : Duplicates the top element of the stack
    o egs
     stack state before dup: 7 -6 3
     stack state after dup : 7 -6 3 3
  • swap : swaps the top two entries of the stack
    o egs:
     stack state before swap: 7 -6 3
     stack state after swap : 7 3 -6
  • rot : rotates the top 3 elements of the stack
    o egs:
     stack state before rot: 7 -6 3
     stack state after rot : -6 3 7
  • argN : This command copies an entry from one of the argument slots of the
    function to the top of the stack. Egs arg1 copies the first argument, the one
    right below the RV slot to the top of the stack arg2 copies the second
    argument to the top of the stack, arg3 would copy the third argument etc.
    The value N can range from 1 to 20 ie arg1, arg2, arg3, …, arg19, arg20 are all
    legal commands.
    If statements:
    The J language supports if .. else .. endif constructs as mentioned earlier. As in other
    languages the else clause is optional. The syntax is as follows;
    if < block A > else < block B> endif
    When the if statement is executed the system consumes the top element of the stack
    if this value is non-zero, ie true, then the statements in block A are executed,
    otherwise the statements in block B are executed. You may want to look at how lcc
    compiles if statements into LC4 assembly. You will notice that this is done by
    branching to appropriate unique labels. Note that a given J procedure or program
    may have many if statements so you must be sure to generate unique labels in the
    assembly code for each one. One way to keep track of what is going on is to keep a
    count of the number of if, and endif tokens that you have encountered as you parse
    the file from start to finish. If you think about this carefully you will be able to
    correctly deal with multiple if statements as well as nested if statements. You will
    probably want to use the file name or procedure name in the labels to make sure
    that the labels are unique even when the code is spread over many files.
    Comments:
    As we mentioned earlier we will use comments in the J code to document the design.
    If you encounter a semi-colon in the file all content between that point and the end
    of the line is regarded as a comment and is ignored by the compiler.
    Egs;
  • 2 3 4 + – ; This is a comment – everything up to the end of the line
    Functions:
    Like C, J is actually intended to operate as a procedural language which means that
    all of the code needs to be packaged in functions which will be turned into blocks of
    code just the way lcc does. Function definitions are a bit simpler in J since we can
    assume that the calling context has already pushed any required arguments onto the
    stack before calling the function, hence there is no need for an argument list.
    Function definitions are started with the defun token as illustrated in the following
    example:
    defun square ; a function to compute the square of the value at the top of the stack
    arg1 ; fetch the input argument, n from below the FP, RA and RV slots
    dup ; stack state : nn
    return ; copy n*n into RV slot and return
    Once the function has been defined we can invoke it by its name egs;
  • square ; computes the square of 4 – stack state after call: 4 16
    Function names must start with an alphabetical character and can consist of
    alphabetical characters, underscores‘_’and digits only. These identifiers are case
    sensitive.
    The return token is used to copy the value at the top of the stack into the RV slot
    associated with the current frame. It also invokes the function epilogue to reset the
    stack and frame pointer and to return from the function following the lcc
    convention. Note that every function should terminate with a return at some point.
    Functions are invoked by simply typing their name as shown above. Once again you
    are expected to use exactly the same calling convention that the lcc compiler does
    where R5 acts as the frame pointer and R6 acts as the stack pointer. The only small
    difference is that when you generate code to call a subroutine you should ensure
    that once the function has returned the return value of that function is included at
    the top of the stack instead of being just beyond it as is the case with lcc. This is
    easily accomplished by modifying the stack pointer after the subroutine has
    returned. Remember that it is the calling functions job to clean up arguments and
    return values as it sees fit once the subroutine has returned. In the J language we
    will need to handle this explicitly in the code – that is the jc compiler should not
    automatically generate code to pop the return value or the function arguments.
    Note that since we are adopting the lcc calling convention we should be able to call
    functions written in C and compiled with the lcc compiler from within our J
    programs and vice versa. This will be a useful way to add functionality to our
    system.
    Getting Started:
    In order to give you a couple of ideas about how to get started in thinking about and
    implementing this program we have provided you with a header file called
    token.h. This file defines a type called token which is intended to represent the
    kinds of tokens you will encounter in a J program. It also declares a function called
    read_token which reads the next available token from the file. It should return a 0
    if it was successful and a non-zero value if it encountered an error, you can imagine
    returning different values for different kinds of errors. The way the compiler
    proceeds is that it repeatedly reads tokens from the file and for each token it
    generates the requisite assembly. You can look at what the LCC compiler does with
    your C files for inspiration. Here is the pseudo code for the compiler – note how
    short it is, at least conceptually.
    Pseudo code for J compiler
    While read_token returns 0
    Consider the token you have just read and emit corresponding assembly into
    the output ASM file. (To do this consider each type of token and ask what
    assembly you need to emit to translate that element)
    Error Handling
    Your program should be able to handle any well formed J program so that’s what
    you should focus on. We will not be doing a lot of testing with malformed J programs
    so we are not expecting extensive error handling to catch every possible problem.
    That being said you probably want to do some basic error handling for your own
    sanity like checking for illegal tokens that you can’t process. Similarly, you may want
    to check for missing return statements and if statements that aren’t properly
    delimited with endifs.

Testing:
As usual you can’t really tell if your code works if you don’t test it. To this end we are
providing you with a variety of J programs of varying levels of complexity. We also
provide you with an os.asm a libc.asm that contain the operating system and a set of
I/O functions you can call and printnum.c that contains a C function for printing
function that will be called by many of the sample J programs. We also provide
script files that can be used in PennSim to run the code. (see the associated zip file
on Canvas)
You would be well advised to write your own test cases as well to test your code
since we will be testing on more than just the examples that are handed out.
Your Code.
Once again we expect you to use at least 3 source files for this assignment and you
must provide a Makefile that produces the final executable, jc, that is if we type
make jc in your source directory it should build the executable jc from scratch.
You can certainly split your code over more than 3 source files if you prefer. In
addition you should have a target called clean such that make clean removes the
jc program if it exists along with any other compiler outputs. Here is an example of
what the requisite lines in your Makefile should be
clean :
rm jc *.o
Extra Credit Challenge {20 pts}
You can score up to 20 extra credit points by writing a program that, when run on
PennSim produces a graphical animation of a solution to the Towers of Hanoi
problem using at least 5 discs. i.e. When your program runs it should produce a nice
animation on the PennSim graphics display of discs moving about in an orderly
manner to solve this problem. The constraints are as follows:
1) You must use a recursive algorithm to solve this – (this is the easiest way anyway)
2) Your program must be a combination of C and J. You will have to write C routines
to do things like drawing the discs but the part of the code that decides how to move
the discs must be in J. That is you can’t simply write the whole thing in C and then
call the solver from J with a single function call.

On the bright side the libc and os code that was provided contains implementations
of useful functions like lc4_get_event, lc4_draw_rect and lc4_reset_vmem.

Submission instructions.
When you’re done with your code, please submit the assignment on Gradescope.
Both your executable and your make target MUST be named“jc”. Submit it by
uploading each individual file (not in a folder) to the Gradescope submission portal.

正文完
 0