原文地址:PPM 图像处理器
Introduction
用 C 语言代写一个类似 Photoshop 的图像处理软件,能够对 PPM 格式对图像文件进行处理。
Requirements
This is a programming assignment, meaning you will write and submit the source code for a program. All programs must compile (and run) on the ugrad machines using our standard compiler options.
From this point forward in the semester, your programs will need to not only compile and run cleanly (i.e. with no warnings, errors, or crashes), but also not have any memory leaks (as reported by valgrind).
NOTE: For this assignment, you will be working with images. This will mean that you will want to view images, which will that if you want to work remotely on the ugrad machines, you will need to set up X-tunnelling using Xming or XQuartz (see the XWindows section of the tools reference page on Piazza).
Also, PPM images can be quite large; be aware of file sizes, and try not to fill up your hard drive (or your disk quota on ugrad) with thousands of cat pictures…
Program Description
This program will be an image-manipulation program, in the vein of Photoshop. Yours will have a command-line-based user interface (UI), so there will be no graphical interface, and the range of operations will be limited, but the algorithms you will use are the same ones used in programs like Photoshop or GIMP.
At a basic level, the program will be able to read image files from disk, perform one of a variety of image manipulation tasks, and then write the result back to disk as a new image file. Since your program will not have a GUI, we will use external programs to view the images. If you are on ugrad (either locally, or remotely with X-tunnelling), you can use the program feh, which is a very simple command-line image viewer; simply run the program with the name of an image file as a command-line argument, and it will display the image on your screen. If you are using a different platform, you are welcome to use an image viewer of your choice; feh is easy to install using most linux package managers, but there are other open source image viewing programs, as well as alternatives for Windows and OSX.
While there are many formats for storing image files, your program will only need to read and write one, the PPM format. This is essentially the simplest and easiest format to read and write, which is why it was chosen; its main drawback is that it does not use any kind of compression, so images stored in this format tend to be on the large side when compared to formats like JPEG or PNG or GIF.
Your program will be a command line tool, always run with the name of the executable file project followed by the name of an input PPM file, the name of a desired output PPM file, and the specific lower-case name of an operation, as listed below. Some operations require additional arguments, which will also be supplied at the command line by the user, at the end of the line. There is no input entered by the user interactively.
Note that regardless of the desired operation, the first two arguments after the executable name project are always interpreted as the input file name followed by the output file name. The next argument is interpreted as the operation name, and the operation’s arguments (if any) come after that.
The operations your program will be able to recognize and perform are all of the following (the bolded words are the operation names to be entered at the command line by the user):
- swap – swap color channels (i.e. what was red becomes blue, what was blue becomes green, etc.)
- bright – change brightness (up or down) by the given amount
- invert – invert the colors (i.e. black becomes white, white becomes black, etc.)
- gray – convert to grayscale (i.e. a full-color image becomes shades of gray)
- crop – crop the image given corner pixel locations
- blur – blur the image (smooth and soften its edges) by a specified amount
- edges – detect edges with intensity gradient above given threshold
For example, at the command prompt, a user of your program might type
./project nika.ppm nika-swap_once.ppm swap
to take as input a PPM file named nika.ppm and create an output file named nika-swap_once.ppm which contains a swapped version of the nika image. Or, if the desire is to crop the portion of the nika image with top left corner (col=60, row=15) and bottom right corner (col=600, row=565), then the user could type
./project nika.ppm nika-crop.ppm crop 60 15 600 565
to generate the output file named nika-crop.ppm.
When you first begin this assignment, we suggest that you aim to implement PPM reading and writing first. As an initial test to be sure you’re on the right track, read in a PPM file and write it out unchanged, and use the Unix diff tool and visual inspection using the feh tool to confirm that your code works. You’ll note when using diff that your output file doesn’t contain a comment line that the original nika.ppm file does; but when viewing the image, you should see no differences at all. Once this works well, begin successively working through the other commands as listed.
Development Plan
For this assignment, the development plan is largely in your hands. You are required to create and submit a Makefile for this project, and the executable it generates should be named project. You should also rigorously test your code, but you are not required to hand in specific tests with your submission. Beyond that, it is largely up to you how to structure your program, and what order to implement features in, but you must take a modular approach.
We recommend that you break your code into several files, and have as little code as possible in main(). Here is a suggested (but not required) breakdown of features into files:
- Makefile – you must include a Makefile that can build your program; it’s how the graders will compile your submission. You are required to build a target whose name is project.
- project.c – main program. The main() function should be extremely simple; it might literally call a single function, then return 0.
- ppm_io.c – contains implementations of functions for reading, writing, creating, destroying, copying, etc. images (using the PPM format) – see scaffolding folder in public repo.
- ppm_io.h – header for ppm i/o stuff (struct and function declarations) – see scaffolding folder in public repo.
- imageManip.c – contains implementations of all image manipulation algorithms
- imageManip.h – header for imageManip stuff (struct and function declarations, possibly some #defines)
Note that this is not a list of ALL the files that should be included in your submission; it is only a listing of possible source code files.
We recommend that before you start coding, you make a development plan. This means that you sit down and plan out (on paper) how you will break the program down into modules (e.g. functions or groups of functions), what each module will do, and how they will interact. Aim to write small, clean helper functions for better readability and easier testing, as well as greater reusability. Then, make a plan for what order you will implement the modules in. You will also want to test your modules; it’s a good idea to use test-driven design, which means that you will design tests for your functions before you actually start trying to write the functions themselves.
For each module, it’s important to think about precisely what it should do, and also how you can test it to be sure it’s doing what you want. There are lots of ways of testing your code; for this assignment, a lot of your tests will likely involve the visual inspection of output images to see if they look the way they’re supposed to. Still, having an idea of how you’ll test each piece before you start writing it (and then testing/fixing it before you move on to the next one) will make your life a lot easier.
Scaffolding Folder
The scaffolding (i.e. starter code) folder for this project (available in the public class repository) provides you with ppm_io.h, a header file containing typedefs for structs to hold a pixel and an image, respectively. It also supplies ppm_io.c, which contains code for writing an image to a PPM file, and a Makefile for the project. Finally, it contains nika.ppm, an example PPM image to work with (but you can use convert on ugrad to create more from your own jpg files). In a subfolder named results, the starter folder includes the PPM files which resulted in calling the operations displayed on this page (this page is actually displaying the png versions). We encourage you to store the provided nika.ppm image and all created images in a subfolder of your own repository named data, to keep your images separate from your source code files. You don’t need to submit any PPM files to us; keeping them in a separate folder will help you avoid accidentally including them. TIP: if you’re using the data subfolder, we suggest you execute your code from within the data folder by typing ../project, so you can refer to input filenames while the program is executing directly as nika.ppm, rather than data/ nika.ppm, saving yourself a lot of typing while testing.
Error Reporting
The approach your program will take for error reporting is to have your main method return a 0 value indicating success or a positive value indicating failure. Which positive value your program should return is indicated in the table below. If more than one error condition occurs, your program should return the error code listed earliest in the table below.
In addition to returning the specified value, your program should also output an informative error message to stderr. (The text of the error messages will not be specified; we’ll leave the exact wording up to you.)
Return value | Error condition it signifies |
---|---|
0 | No errors detected |
1 | Failed to supply input filename or output filename, or both |
2 | Specified input file could not be opened |
3 | Specified input file is not a properly-formatted PPM file, or reading input somehow fails |
4 | No operation name was specified, or operation name specified was invalid |
5 | Incorrect number of arguments or kind of arguments specified for the specified operation |
6 | Arguments for the specified operation were out of range for the given input image, or otherwise senseless |
7 | Specified output file could not be opened for writing, or writing output somehow fails |
8 | Any other error condition not specified above |
Implementation Details
This section contains detailed descriptions of formats, algorithms, and the like that will be necessary for this assignment.
PPM image format
For this assignment, we will use a very simple image-file format called PPM. It’s an uncompressed format, meaning that the images will take up a lot of disk space (compared to JPG or PNG files), but it’s very easy to read and write from C code (which is why we’re using it).
NOTE: you can use a unix program called convert
to convert between image formats; e.g. to convert an existing file called “selfie.jpg” into a PPM, you would type:
$ convert selfie.jpg selfie.ppm
This works for most image format file extensions; it converts to/from most known image formats, including .jpg, .gif, .png, .ppm, .tiff, and .pdf, and is installed on the ugrad machines. If it’s not installed on your local machine (or virtual machine), most linux package managers can install it (or can install ImageMagick, which is the suite of tools that convert
is part of). Additionally, most image manipulation programs (e.g. Photoshop, GIMP, etc.) can export to PPM format.
If you haven’t used GIMP, it’s a free, open-source image manipulation program with good cross- platform support; try it out! It may be helpful to have access to either GIMP or Photoshop in order to try things out to see what kind of output you should expect from your program for various kinds of input images, but keep in mind that these programs may not use exactly the same algorithms we will here.
The PPM format itself is pretty simple (compared to most other image formats). Basically, at the top of the file will be a special “tag” that marks the file as a PPM; this should be P6. Then, there are three numbers, separated by whitespace; these numbers represent the size of the image as columns, rows, and colors. Columns and rows specify the width and height of the image (in pixels) respectively. (BEWARE: columns come before rows in this format!) Colors encodes how many different shades of each color a pixel can take on; for this assignment, this number must always be 255 (you must reject any image that uses a different value, but you’re unlikely to encounter one). Immediately after the 255, the binary data encoding pixels will begin.
Optionally, there may be lines starting with a #, which are comments and should be ignored; these may be intermixed with the above information. You don’t need to store these; if you read a file and then re-write it, it’s fine if the comments get lost. The files we test your code with, however, will have either 0 or 1 comment lines just after the P6 tag, but no comment lines between the other header values (see nika.ppm in the course public repo for an example).
All of this will be ANSI text, so you can use the normal text functions (e.g. fgetc(), fscanf(), fprintf() etc.) to read/write them.
After the color size specification, there will be a single whitespace character (usually a newline, but that’s not guaranteed), after which the remainder of the file will be the actual pixel values. Basically, each “pixel” consists of three values; the first value is the “red” channel, the second value is the “green” channel, and the third value is the “blue” value. Taken together, these three values specify a single color, which is the color of that pixel. Since the max color value is 255, each of these values will be in the range 0-255, which fits exactly in one byte of memory.
The easiest way to read the pixel values is to create a struct that contains three unsigned char variables, one per color channel. Then, create an array of your pixel structs with rows * cols elements. At that point, you can just use fread() to read the entire array of pixels from the file in one go. Similarly, you can use fwrite() to write the whole pixel array with a single function call. We’ve started this off for you in the provided ppm_io.h and ppm_io.c files.
For the formal “official” description of the PPM format, see the netpbm site.
The starter folder includes one primary example image, called nika.ppm. We’ve included a copy on this page (below), so you know what it should look like; you should also be able to open it using feh or any other image program. Again, you are welcome to work with this image, but you are encouraged to convert images of your own to the PPM format so you have more than one test image to work with.
Also note that the images embedded in this page are not actually PPM formatted, since most web browsers don’t know how to read PPMs (and also PPMs are kind of large for web use).
Swapping color channels
Swapping color channels is very simple; for each pixel, you just need to copy each component’s value to the next component (note that this will require a “temporary” value to store one of them). So, the old value for the green component becomes the new value for the red component, the old value for the blue component becomes the new value for the green component, and the old value for the red component becomes the new value for the blue component.
If you apply swap to the nika.ppm image, the result should look like:
Adjusting brightness
Adjusting brightness is another per-pixel operation; in this case, you will simply add some value to each channel of each pixel (if the number is positive, this will brighten the image, if it’s negative, it will darken the image). By adding the same amount to each channel, you won’t change the color, just the brightness (i.e. dark green may become light green, but it won’t become purple).
The only trick here is that you need to make sure you do “saturating” math, meaning you don’t accidentally overflow (or underflow) your unsigned chars. So, when you do your addition, you need to check the result and clamp the result to the range [0, 255].
You will likely want to write a helper function to do the saturation; it would likely take a double, clamp it to the appropriate range, and return the value as an unsigned char. You can use this helper function for several of the later operations as well.
If you apply the brightness transform to the nika.ppm image, the result should look like:
Inverting colors
Inverting color values is very straightforward; simply take the value of each component of each pixel, and calculate its “inverse” by subtracting its value from 255. If you apply the invert transform to the nika.ppm image, the result should be as shown below. If you invert that resulting photo, you should get the original photo back.
Grayscale conversion
This is another fairly simple operation, but it does require a little math. Basically, for each pixel, you will calculate a single grayscale value based on the three color values, and then assign that same value to all three color channels (if all three color channels have the same value, you know the pixel will show up as some shade of gray).
For this program, we will use the NTSC standard conversion formula:
intensity = 0.30*red + 0.59*green + 0.11*blue;
If you look around online, you will discover that there are actually several different formulas that can be used, which result in slightly different results; please use the NTSC version for this assignment (see Wikipedia on grayscale conversion). Note that once the value is calculated, you will need to cast it back to an unsigned char before you assign it to each channel.
If you apply the grayscale transform to the nika.ppm image, the result should look like:
Crop
Cropping an image is pretty simple; you just need to specify the two corners of the section you want to crop. That will mean 4 integer values: the column and then row of the upper-left corner, and the column and then row of the lower-right corner. By looking at the differences between those values, you can calculate the size of the new image; this will let you allocate the correct amount of space for the pixel array. Once you’ve done that, you can just use a loop to go through the pixels of the specified region in the original image, and copy each component of each pixel to the new image.
If you crop the nika.ppm image from (top col=60, top row=15) to (bottom col=600, bottom row=565), the result should look like:
Blurring an image
The blur operation is more complex than the previous operations, because you need to consider more than one pixel at a time. At the simplest level, a blur works by taking each pixel, and setting its value to some kind of average of all the pixels in a small neighborhood around it. The simplest blur possible would just set each pixel to the average of itself and the pixels adjacent to itself (computed for each color channel separately). However, this kind of blur isn’t as pretty as we might like. What we will do for this assignment is similar, but more clever.
For the kind of blurring effect that a program like Photoshop would give you, you will need to weight the importance of neighboring pixels according to a Gaussian distribution. This is referred to as a “Gaussian blur” (Wikipedia), and this is what you will implement for this assignment.
First, we need to create an NxN matrix that holds the values of a 2D (symmetric) Gaussian distribution with a given variance (we assume 0-mean). N should be big enough to be at least 10*sigma positions wide (to span approximately 5*sigma positions in each direction), and N should always be an odd number (so there’s an equal number of rows/columns on either side of the center. If dx and dy store the two coordinates as offsets from the center (i.e. delta-from-mean), then the Gaussian value can be calculated as:
In code, this might look like:
double g = (1.0 / (2.0 * PI * sq(sigma))) * exp(-(sq(dx) + sq(dy)) / (2 * sq(sigma)));
Note that you will need to include math.h and link with -lm to get the exp() function. You will have to define sq() yourself (to square the argument, either as a function or as a #defined macro), and PI (likely as a #defined constant).
As an example, if sigma is 0.5, then we would get a 5×5 matrix like this:
0.000000 0.000029 0.000214 0.000029 0.000000
0.000029 0.011660 0.086157 0.011660 0.000029
0.000214 0.086157 0.636620 0.086157 0.000214
0.000029 0.011660 0.086157 0.011660 0.000029
0.000000 0.000029 0.000214 0.000029 0.000000
If sigma was 1.0, we would get an 11×11 matrix:
0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000001 0.000007 0.000032 0.000053 0.000032 0.000007 0.000001 0.000000 0.000000
0.000000 0.000001 0.000020 0.000239 0.001072 0.001768 0.001072 0.000239 0.000020 0.000001 0.000000
0.000000 0.000007 0.000239 0.002915 0.013064 0.021539 0.013064 0.002915 0.000239 0.000007 0.000000
0.000000 0.000032 0.001072 0.013064 0.058550 0.096532 0.058550 0.013064 0.001072 0.000032 0.000000
0.000001 0.000053 0.001768 0.021539 0.096532 0.159155 0.096532 0.021539 0.001768 0.000053 0.000001
0.000000 0.000032 0.001072 0.013064 0.058550 0.096532 0.058550 0.013064 0.001072 0.000032 0.000000
0.000000 0.000007 0.000239 0.002915 0.013064 0.021539 0.013064 0.002915 0.000239 0.000007 0.000000
0.000000 0.000001 0.000020 0.000239 0.001072 0.001768 0.001072 0.000239 0.000020 0.000001 0.000000
0.000000 0.000000 0.000001 0.000007 0.000032 0.000053 0.000032 0.000007 0.000001 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000000 0.000000 0.000000 0.000000 0.000000
Basically, the sigma parameter lets you control how strong the blur effect is; the larger your sigma, the more blurry your image will become.
Once you have this Gaussian “filter” matrix, you’ll need to “convolve” it with your image. That’s fancy math terminology that basically means you loop over the pixels of your input image, and for each pixel, you place your filter on your image centered at that pixel, and then for each element of the filter, multiply it by the pixel value underneath that element. Then, you set the value for the pixel of the output image to be the normalized sum of those values. Basically, this is a fancy way of describing a weighted average; the values in the matrix are the weights that get applied to the corresponding pixels of the input image. To normalize, you just sum up the values in the filter matrix, and divide the sum of the weighted pixel values by it; that ensures that you’re not brightening or darkening the image when you blur it.
You will also need to be careful near the edges of the image, since the parts of filter may extend past the border of the image. In your calculations, just skip the filter positions that hang off the edge. (This is why the normalization is important even if the values in your matrix sum to 1).
You will blur all three color channels in this fashion.
You will likely want to implement this in several functions. For instance, you may want a function that generates a Gaussian matrix of a given size and variance, a function to find the filter response for one pixel of the input image, and finally a function that calls the first function, then loops over the pixels of the input image and calls the second function for each pixel in order to generate the output image.
If you apply the blur transform with different sigma values to the nika.ppm image, the result should look like the images shown below. (Note that blur may be a bit slow, and the larger a radius you use, the slower it will be.)
Edge Detection
While edge detection algorithms can get quite sophisticated, it turns out that one of the best edge detectors today, the “Canny edge detector”, (Wikipedia) follows a relatively simple algorithm. Here, you will implement a few stages of the original Canny edge detector.
Since for edge detection we only really care about the intensity change of an image as it shifts from the boundary of one object to another; we need to first convert our image to grayscale. Next, in order to reduce the number of false positives our edge detector picks up, we need to denoise the image by applying a Gaussian filter. This, in fact, is exactly the same as the blur function you are asked to implement for this assignment! Once we’ve denoised the image we can go ahead and compute the intensity gradient for each interior point (i.e. points not on the boundary) of the image in both the horizontal (x) and vertical (y) directions.
Once you’ve calculated the magnitude of the intensity gradient for each pixel; you can perform the final step of thresholding each pixel and classifying it as an edge or not an edge. Specifically, you should set the value of all channels to 0 (black) if the magnitude exceeds the threshold, else set the values to 255 (white). You can ignore the boundary points and leave them as they are.
If you apply the edge detection transform to the nika.ppm image with the sigma and threshold arguments shown below, the results should look as shown below:
Submission
Remember that programs which do not compile (with standard compiler flags on the ugrad machines) will not receive credit. Additionally, points will be deducted for any compiler warnings. Points will also be deducted for any warnings, errors, or memory leaks reported by valgrind. All executables should be buildable using a Makefile, and should build and run cleanly.
Submit your project via Gradescope. Your submission should contain all source code and files necessary to compile your program (including a Makefile) as well as a README file (which includes both partner names and JHEDs) and a git log file from your shared midterm project repo. The log should indicate that all team members were contributing code and pushing their contributions to the repository. Your submission should not contain any compiled binaries (executables or object files), or any testing-related files (in particular, please do not submit any image files).
The requirements for your git log are the same as in previous assignments, except note that we expect both members of your team to be contributing commits to your shared midterm project repo.
Only one team member should submit the project on Gradescope. The same team member should submit all versions of the project in his/her account. Be sure that source file headers include the names and JHEDs of all team members, so that each student gets credit for this work.
Formative and Summative Assessment Goals
This project builds on all the previous assignments, as well as in-class work, and it will make use of several new features, including pointers and dynamic memory allocation. It is significantly larger in size and complexity than earlier programming homework assignments, and will be completed with a partner over a period of nearly three weeks.
The summative assessment goal is to evaluate your growing skill as a C programmer, and your ability to program in a principled way as program complexity grows. As always, this involves both C-specific skills like code-writing and debugging, and broader programming skills like problem solving, planning, design, algorithmic thinking, testing, teamwork, etc.
As program size and complexity continues to grow, this will also have the effect of assessing your time management and planning skills; while these skills are not directly related to your understanding of CS concepts, they are important skills in their own right. Most assignments for most classes build and assess these skills to some extent, but the longer duration assignments for this course will do so more rigorously than most.
The formative assessment goal is to build your programming skills, particularly relating to pointers and dynamic allocation. This assignment is also the first assignment designed to build skill with maintaining and using a codebase over a longer period of time; this skill will become increasingly important over the second half of the course.
This assignment also gives you a bit more freedom than the earlier ones, in that you are allowed (and expected) to come up with your own tests. This means you have greater control, but it also means that it is up to you to ensure that your functions work correctly and robustly; you cannot merely program to the tests that have been provided.
As with all programming assignments in this course, this assignment will both build and test your knowledge of the C programming language, your ability to work in a team, your knowledge of sound coding practices and stylistic conventions, your ability to plan, design, and carry out an incremental development plan, your ability to test and validate a program, and your ability to debug a program.
(本文出自 csprojectedu.com,转载请注明出处)