On the Backend: EECS 370 C Linker Solution

Project 2, EECS 370 (Fall 2023)
Worth: 100 Points
Assigned: Saturday, September 16th
Point Allocation:
Part 2A Due: 11:55 PM ET, Thursday, October 5th (40 Points)
Part 2L Due: 11:55 PM ET, Thursday, October 26th (60 Points)

0. Starter Code

For Project 2A, the assembler, you have 2 choices: build off your Project 1a assembler OR start with the starter code, which will be updated after all Project 1a submissions have been collected.

For Project 2L, the LC2K linker starter code is meant to help you read in and parse object files. It is probably a good idea to break it up into different functions, but it is a good place to get started.

1. Purpose

The purpose of this project is to help you understand the assembling and linking process, which we can utilize to create multi-file LC2K programs. In order to do this, we will first create a new assembler (P2A) which will take an assembly file as input and output an intermediate object file. Our linker (P2L) will take object file(s) as input and create the final machine code.

In Project 1a, you wrote an assembler which took an LC2K assembly file as input and produced an executable file as output. This approach is fine if all the code needed is contained in one file, but what happens if we want to use other pieces of code? Libraries contain functions that make coding easier, and are often written in assembly and stored as object files. Splitting code into multiple files encourages modularity and organization. Multiple files are also important for large projects: if you modify one source file, you only need to recompile and reassemble that one and then link everything together, greatly reducing the total time to create an executable. Now that we have a better understanding of translation software, we can create a separate assembler and linker.

Here is an example that will help explain the purpose of the linker:

[Listings: main.as, subOne.as]

Here, we have two basic programs. main.as loads $1 with 5. We then call the subOne function. We then decrement the value of 5 and return to main.as until our result is 0. Even though main.as tries to call subOne.as without knowing at assemble time where it is defined (since it is in a separate file), our program will still work.

The linked result would look like this:

[Listing: linked result]

Note: The linker will take in object files and produce a machine code file with the linking process. Assembly files are neither an input nor output. This is just an example of how linking works.

2. Problem

This project has two parts. In the first part, you will create a program that assembles an assembly file into an object file. The P2A assembler is an extension of the P1A assembler. The key distinction for P2A is that instead of outputting a machine code (mc) file, you will output an object (obj) file which contains additional information to assist in the linking process: a header, a symbol table, and a relocation table. In the second part, you will write a program to link object files into a single executable consisting of machine code, which your Project 1 simulator will be able to run.
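To make the three extra pieces of an object file more concrete, here is a minimal C sketch of how a linker might hold them in memory. Everything in it is an illustrative assumption: the type names, the field names, the 'T'/'D'/'U' location codes, and the 6-character label limit are hypothetical, and the authoritative object file layout is the one given in section 3.2 of the spec.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a label fits in 6 characters plus a terminating NUL. */
#define MAX_LABEL_LEN 7

/* Header: sizes of the parts that follow (hypothetical field names). */
typedef struct {
    int textSize;         /* number of instructions in the Text section */
    int dataSize;         /* number of .fill words in the Data section  */
    int symbolTableSize;  /* number of symbol table entries             */
    int relocTableSize;   /* number of relocation table entries         */
} ObjHeader;

/* Symbol table entry: one global label seen in this file. */
typedef struct {
    char label[MAX_LABEL_LEN];
    char location;        /* assumed codes: 'T' text, 'D' data, 'U' undefined */
    int offset;           /* offset inside the section named by location      */
} SymbolEntry;

/* Relocation table entry: one word whose address field must be patched. */
typedef struct {
    int offset;                 /* offset of the word to patch         */
    char label[MAX_LABEL_LEN];  /* symbolic address used by that word  */
} RelocEntry;

/* Number of machine-code words this file contributes to the executable. */
static int obj_word_count(const ObjHeader *h) {
    assert(h != NULL);
    return h->textSize + h->dataSize;
}

/* A symbol is defined in this file unless it is marked undefined. */
static int symbol_is_defined(const SymbolEntry *s) {
    assert(s != NULL);
    return s->location != 'U';
}
```

Keeping the header counts separate from the tables lets a reader of the file allocate each table before parsing it.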

3. Assembler

Your new assembler will take in a single assembly file (see section 3.1) as input and output a single object file (see section 3.2).

So far we have created an assembler which can translate assembly language into machine code. However, let’s consider helloWorld.c (listed below), a basic program that prints “Hello World”. If we were to compile this into assembly, we would need to branch to the printf() function and execute the code at that memory location. This is great because we don’t need to rewrite printf every time we create a new project, we can just [...]. However, our current assembler can’t handle undefined references. To fix this, we are going to create an assembler that allows for external references (i.e. references to labels that are NOT defined in the file) and a program called the linker to resolve those undefined external references.


#include <stdio.h>

int main() {
    printf("Hello World");
    return 0;
}

Assembly language programs will be of the same format as those from Project 1, with a few extra restrictions.

The first part of the assembly file must contain only assembly instructions. The second part should contain only .fill assembler directives. For example, suppose an assembly file is composed of M instructions and N .fills. Lines 0 to (M-1) contain actual instructions, and lines M to (M+N-1) contain .fills, with no mixing between them. We refer to all of our instructions as belonging to the Text section of our program. Moreover, everything that contains a .fill statement is considered to be in the Data section of our program. It is important that all of your test cases separate these two sections such that no .fill directives are in the Text section and no instructions are in the Data section. Below the data section is the Stack, which is initially empty; for an instruction to access the stack, e.g. load a word from the stack, we will use the label Stack to denote the start of the stack section.

3.1.2 Local and Global Labels

LC2K files may now use global symbolic addresses, which means we must now distinguish between local and global labels. The scope of a local label is the file the label is defined in. (This is analogous to a variable or function with the static keyword in C. The scope of a local variable in C is at most a function.) The scope of a global label is all object files linked together (more on this in part 2l). Because of this, different object files can use local labels with the same name and still be linked together. Local labels will start with a lowercase letter [a, b, …, z] while global labels start with a capital letter [A, B, …, Z]. This is unique to LC2K as a way to distinguish between local and global labels. For example, staddr is a local label whereas Staddr is a global label.

Local symbolic addresses must be defined at assembly time. However, a global symbolic address can be undefined at assembly time. It is assumed that undefined global labels are defined in another file to be resolved at link time, so they should be temporarily resolved as address 0 in the text and data segments. Defined symbolic addresses should be resolved exactly as they were in Project 1. That is, it is entirely possible that a global label is defined and referenced in the same file; if this is the case, the label should be resolved just like a local label. The Stack label should be treated as an undefined global label for the purposes of the assembler.

Just like P1A, you can assume assembly files max out at 65536 total instructions and data, although we’ll test you on much, much less than that. As suggested in the starter code, you may assume that no input LC2K file is more than 1000 lines.

3.1.3 LC2K Peculiarities Part 1

Firstly, if a beq instruction contains a symbolic address, the label it refers to must be a locally defined label. This label can be either a local or global label. A beq should not branch to another file, and a programmer should use jalr in this case.

Secondly, in LC2K, loading or storing to an absolute address no longer makes much sense. The locations of data and text within the final executable file will likely be different than in the original object file, leading to unintended execution. While this isn’t something we will enforce with error checking, it is recommended that labels are used when dealing with loads and stores. In reality, there are reasons to use absolute addressing: memory mapped IO for example (if you’re curious about this, take EECS 373; shameless plug) or cache analysis (see Project 4). If you come across a label with a constant offset, assemble as in Project 1.

Thirdly, local labels should not be included in the symbol table. However, a local symbolic address does need a relocation table entry as the address of the local label might change. These addresses can be fixed by calculating the new local label location during linking.

3.1.4 Summary

In summary, assembly file formatting rules are:

1. Do not mix instructions with directives (.fills)
2. Instructions come first
3. Directives (.fills) come second
4. Defined symbolic addresses (defined local and global labels) are resolved exactly as theywere in the Project 1 assembler
5. Undefined global symbolic addresses are temporarily resolved as address 0
6. Local labels start with a…z and must be defined at assembly
7. Global labels start with A…Z and can be undefined at assembly
8. Branches cannot use undefined global symbolic addresses
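
The label rules above (4 through 8) can be sketched as one lookup routine. This is only a sketch under stated assumptions: the names Label, ResolveStatus, is_global_label and resolve_label are hypothetical, not from the starter code, and the routine returns the label's address, leaving any further encoding (such as turning a beq target into an offset, as in Project 1) to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for a label defined in the current file. */
typedef struct {
    const char *name;
    int address;
} Label;

typedef enum {
    RESOLVE_OK,        /* label is defined in this file                 */
    RESOLVE_EXTERNAL,  /* undefined global: address 0 until link time   */
    RESOLVE_ERROR      /* undefined local, or beq to an undefined label */
} ResolveStatus;

/* Global labels start with A..Z, local labels with a..z. */
static int is_global_label(const char *label) {
    assert(label != NULL);
    return isupper((unsigned char)label[0]) != 0;
}

/* Look up name among the labels defined in this file.
 * is_branch is nonzero when the label is used by a beq instruction. */
static ResolveStatus resolve_label(const Label *defined, int count,
                                   const char *name, int is_branch,
                                   int *address_out) {
    assert(name != NULL && address_out != NULL);
    for (int i = 0; i < count; i++) {
        if (strcmp(defined[i].name, name) == 0) {
            /* Defined labels, local or global, resolve as in Project 1. */
            *address_out = defined[i].address;
            return RESOLVE_OK;
        }
    }
    if (is_global_label(name) && !is_branch) {
        /* Undefined global: placeholder 0, to be fixed by the linker. */
        *address_out = 0;
        return RESOLVE_EXTERNAL;
    }
    /* Undefined local label, or a branch to an undefined global label. */
    return RESOLVE_ERROR;
}
```

Because Stack starts with a capital letter and is never defined in an assembly file, it falls into the RESOLVE_EXTERNAL case without special handling in this sketch.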

7. Turning in the Project

Use autograder.io to submit your files. You have been added as a student to the class, so you should see EECS 370 listed as a class.

Here are the files you should submit for each project part:

1. assembler (part 2a)
   a. C program for your assembler called “assembler.c”
   b. Suite of test cases. (Each test case is an assembly-language program in a separate file, ending in .as, .s, or .lc2k. Test case names should only include letters, numbers, underscores, and periods.)
2. linker (part 2l)
   a. C program for your linker called “linker.c”
   b. Suite of test cases (each test case is a set of assembly-language programs using the naming scheme specified in [section 4.9]).
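
The naming rule for assembler test cases (letters, numbers, underscores and periods only, ending in .as, .s, or .lc2k) is easy to check before submitting. The helper below is a hypothetical convenience, not part of the project or the autograder, and it does not cover the linker naming scheme from section 4.9.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if s ends with suffix. */
static int has_suffix(const char *s, const char *suffix) {
    size_t n = strlen(s);
    size_t m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Return nonzero if name is an acceptable part 2a test case file name. */
static int is_valid_testcase_name(const char *name) {
    assert(name != NULL);
    if (name[0] == '\0') {
        return 0;  /* an empty name is never valid */
    }
    for (const char *p = name; *p != '\0'; p++) {
        /* Only letters, digits, underscores and periods are allowed. */
        if (!isalnum((unsigned char)*p) && *p != '_' && *p != '.') {
            return 0;
        }
    }
    return has_suffix(name, ".as") || has_suffix(name, ".s") ||
           has_suffix(name, ".lc2k");
}
```

For example, is_valid_testcase_name("count_5.as") is accepted, while "test-1.as" is rejected because of the hyphen.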

8. Sample Test Cases

Example 2a

Here is a multi-file assembly-language program that counts down from 5, stopping when it hits 0, and then halts:

[Listings: assembly files and their object files]

WARNING: Text within parentheses SHOULD NOT be included in your assembler’s or linker’s output. There also should not be any trailing spaces or tabs. This text is added here only to help students identify the different sections of the object files in these examples.

Example 2l

This example uses the object files from example 2a. Here is the machine code produced after the linking process:

[Listing: count5.mc]

This code can be simulated using your Project 1 simulator.

WARNING: Be careful when copying and editing these examples!

./linker main.obj subone.obj count5.mc
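
The command ./linker main.obj subone.obj count5.mc suggests that every argument except the last names an input object file and the last names the machine code output. Under that assumption, which is inferred from this one example, argument handling could be sketched as follows; LinkerArgs and parse_linker_args are hypothetical names, not taken from the starter code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the linker's command line. */
typedef struct {
    char **inputs;       /* object file names, in command-line order */
    int inputCount;      /* number of object files                   */
    const char *output;  /* machine code file to write               */
} LinkerArgs;

/* Split argv into input object files and one output file.
 * Returns 0 on success, -1 if there is not at least one input
 * file plus an output file. */
static int parse_linker_args(int argc, char **argv, LinkerArgs *out) {
    assert(argv != NULL && out != NULL);
    if (argc < 3) {
        return -1;
    }
    out->inputs = &argv[1];        /* argv[1] .. argv[argc - 2] */
    out->inputCount = argc - 2;
    out->output = argv[argc - 1];  /* last argument             */
    return 0;
}
```

A main function would call parse_linker_args(argc, argv, &args) first and exit with an error if it returns -1, then open each args.inputs[i] for reading and args.output for writing.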
