CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 1Geographic Information SystemGeographic information systems organize information pertaining to geographic features and provide various kinds of accessto the information. A geographic feature may possess many attributes (see below). In particular, a geographic feature has aspecific location. There are a number of ways to specify location. For this project, we will use latitude and longitude, whichwill allow us to deal with geographic features at any location on Earth. A reasonably detailed tutorial on latitude andlongitude can be found in the Wikipedia at en.wikipedia.org/wiki/Latitude and en.wikipedia.org/wiki/Longitude.The GIS record files were obtained from the website for the USGS Board on Geographic Names (www.usgs.gov/corescience-systems/ngp/board-on-geographic-names/download-gnis-data).The file begins with a descriptive header line,followed by a sequence of GIS records, one per line, which contain the following fields in the indicated order:Figure 1: Geographic Data Record FormatName Type Length/Decimals Short DescriptionFeature ID Integer 10Permanent, unique feature record identifier and official feature name FeatureName String 120FeatureClass String 50 See Figure 3 later in this specificationState Alpha String 2State The unique two letter alphabetic code and the unique two number code for a US StateNumeric String 2CountyName String 100The name and unique three number code for a county or county equivalentCountyNumeric String 3PrimaryLatitudeDMSString 7The official feature locationDMS-degrees/minutes/secondsDEC-decimal degrees.Note: Records showing "Unknown" and zeros for the latitude and longitude DMS anddecimal fields, respectively, indicate that the coordinates of the feature are unknown.They are recorded in the database as zeros to satisfy the format requirements of anumerical data type. They are not errors and do not reference the actual geographiccoordinates at 0 latitude, 0 longitude.PrimaryLongitudeDMSString 8PrimaryLatitudeDECRealNumber 11/7PrimaryLongitudeDECRealNumber 12/7SourceLatitudeDMSString 7Source coordinates of linear feature only (Class = Stream, Valley, Arroyo)DMS-degrees/minutes/secondsDEC-decimal degrees.Note: Records showing "Unknown" and zeros for the latitude and longitude DMS anddecimal fields, respectively, indicate that the coordinates of the feature are unknown.They are recorded in the database as zeros to satisfy the format requirements of anumerical data type. They are not errors and do not reference the actual geographiccoordinates at 0 latitude, 0 longitude.SourceLongitudeDMSString 8SourceLatitudeDECRealNumber 11/7SourceLongitudeDECRealNumber 12/7CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 2Elevation(meters) Integer 5 Elevation in meters above (-below) sea level of the surface at the primary coordinatesElevation(feet) Integer 6 Elevation in feet above (-below) sea level of the surface at the primary coordinatesMap Name String 100 Name of USGS base series topographic map containing the primary coordinates.DateCreated String The date the feature was initially committed to the database.Date Edited String The date any attribute of an existing feature was last edited.Notes: See https://geonames.usgs.gov/dom... for the full field descriptions. The type specifications used here have been modified from the source (URL above) to better reflect the realities ofyour programming environment. Latitude and longitude may be expressed in DMS (degrees/minutes/seconds, 0820830W) format, or DEC (realnumber, -82.1417975) format. In DMS format, latitude will always be expressed using 6 digits followed by a singlecharacter specifying the hemisphere, and longitude will always be expressed using 7 digits followed by ahemisphere designator. Although some fields are mandatory, some may be omitted altogether. Best practice is to treat every field as if itmay be left unspecified. Certain fields are necessary in order to index a record: the feature name and the primarylatitude and primary longitude. If a record omits any of those fields, you may discard the record, or index it as far aspossible.In the GIS record file, each record will occur on a single line, and the fields will be separated by pipe ('|') symbols. Emptyfields will be indicated by a pair of pipe symbols with no characters between them. See the posted VA_Monterey.txt filefor many examples.GIS record files are guaranteed to conform to this syntax, so there is no explicit requirement that you validate the files. Onthe other hand, some error-checking during parsing may help you detect errors in your parsing logic.The file can be thought of as a sequence of bytes, each at a unique offset from the beginning of the file, just like the cells ofan array. So, each GIS record begins at a unique offset from the beginning of the file.Line TerminationEach line of a text file ends with a particular marker (known as the line terminator). In MS-DOS/Windows file systems, theline terminator is a sequence of two ASCII characters (CR + LF, 0X0D0A). In Unix systems, the line terminator is a singleASCII character (LF). Other systems may use other line termination conventions.Why should you care? Which line termination is used has an effect on the file offsets for all but the first record in the datafile. As long as we’re all testing with files that use the same line termination, we should all get the same file offsets. But ifyou change the file format (of the posted data files) to use different line termination, you will get different file offsets than areshown in the posted log files. Most good text editors will tell you what line termination is used in an opened file, and also letyou change the line termination scheme.All that being said, this project is not auto-graded, and the grading of correctness will depend on whether you report thecorrect search results, not on the file offsets you report.CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 3Figure 2: Sample Geographic DataRecordsNote that some record fields are optional, and thatwhen there is no given value for a field, there arestill delimiter symbols for it.Also, some of the lines are "wrapped" to fit intothe text box; lines are never "wrapped" in theactual data files.FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|STATE_ALPHA|STATE_NUMERIC|COUNTY_NAME|COUNTY_NUMERIC|PRIMARY_LAT_DMS|PRIM_LONG_DMS|PRIM_LAT_DEC|PRIM_LONG_DEC|SOURCE_LAT_DMS|SOURCE_LONG_DMS|SOURCE_LAT_DEC|SOURCE_LONG_DEC|ELEV_IN_M|ELEV_IN_FT|MAP_NAME|DATE_CREATED|DATE_EDITED1479116|Monterey Elementary School|School|VA|51|Roanoke (city)|770|371906N|0795608W|37.3183753|- 79.9355857|||||323|1060|Roanoke|09/28/1979|09/15/20101481345|Asbury Church|Church|VA|51|Highland|091|382607N|0793312W|38.4353981|-79.5533807|||||818|2684|Monterey|09/28/1979|1481852|Blue Grass|Populated Place|VA|51|Highland|091|383000N|0793259W|38.5001188|-79.5497702|||||777|2549|Monterey|09/28/1979|1481878|Bluegrass Valley|Valley|VA|51|Highland|091|382953N|0793222W|38.4981745|-79.539492|382601N|0793800W|38.4337309|- 79.6333833|759|2490|Monterey|09/28/1979|1482110|Buck Hill|Summit|VA|51|Highland|091|381902N|0793358W|38.3173452|-79.5661577|||||1003|3291|Monterey SE|09/28/1979|1482176|Burners Run|Stream|VA|51|Highland|091|382509N|0793409W|38.4192873|-79.5692144|382531N|0793538W|38.4252778|- 79.5938889|848|2782|Monterey|09/28/1979|1482324|Mount Carlyle|Summit|VA|51|Highland|091|381556N|0793353W|38.2656799|-79.5647682|||||698|2290|Monterey SE|09/28/1979|1482434|Central Church|Church|VA|51|Highland|091|382953N|0793323W|38.4981744|-79.5564371|||||773|2536|Monterey|09/28/1979|1482557|Claylick Hollow|Valley|VA|51|Highland|091|381613N|0793238W|38.2704021|-79.5439343|381733N|0793324W|38.2925|- 79.5566667|573|1880|Monterey SE|09/28/1979|1482785|Crab Run|Stream|VA|51|Highland|091|381707N|0793144W|38.2854018|-79.528934|381903N|0793415W|38.3175|-79.5708333|579|1900|MontereySE|09/28/1979|1482950|Davis Run|Stream|VA|51|Highland|091|381824N|0793053W|38.3067903|-79.5147671|382057N|0793505W|38.3491667|-79.5847222|601|1972|MontereySE|09/28/1979|1483281|Elk Run|Stream|VA|51|Highland|091|382936N|0793153W|38.4934524|-79.5314362|383121N|0793056W|38.5226185|- 79.5156027|757|2484|Monterey|09/28/1979|1483492|Forks of Waters|Locale|VA|51|Highland|091|382856N|0793031W|38.4823417|-79.5086575|||||705|2313|Monterey|09/28/1979|1483527|Frank Run|Stream|VA|51|Highland|091|382953N|0793310W|38.4981744|-79.5528258|383304N|0793341W|38.5512285|- 79.5614381|780|2559|Monterey|09/28/1979|1483647|Ginseng Mountain|Summit|VA|51|Highland|091|382850N|0793139W|38.480675|-79.527547|||||978|3209|Monterey|09/28/1979|1483860|Gulf Mountain|Summit|VA|51|Highland|091|382940N|0793103W|38.4945636|-79.5175468|||||1006|3300|Monterey|09/28/1979|1483916|Hamilton Chapel|Church|VA|51|Highland|091|381740N|0793707W|38.2945677|-79.6186591|||||823|2700|Monterey SE|09/28/1979|1484097|Highland High School|School|VA|51|Highland|091|382426N|0793444W|38.4071387|-79.5789333|||||879|2884|Monterey|09/28/1979|09/15/20101484099|Highland Wildlife Management Area|Park|VA|51|Highland|091|381905N|0793439W|38.3181785|-79.577547|||||954|3130|Monterey SE|09/28/1979|. . .CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 4AssignmentYou will implement a system that indexes and provides search features for a file of GIS records, as described above.Your system will build and maintain several in-memory index data structures to support these operations: Importing new GIS records into the database file Retrieving data for all GIS records matching given geographic coordinates Retrieving data for all GIS records matching a given feature name and state Retrieving data for all GIS records that fall within a given (rectangular) geographic region Displaying the in-memory indices in a human-readable mannerYou will implement a single software system in Java to perform all system functions.Program InvocationThe program will take the names of three files from the command line, like this:java GIS <database file name> <command script file name> <log file name>Note that this implies your main class must be named GIS, and must be in the default Java package.The database file should be created as an empty file; note that the specified database file may already exist, in which case theexisting file should be truncated or deleted and recreated. If the command script file is not found the program should write anerror message to the console and exit. The log file should be rewritten every time the program is run, so if the file alreadyexists it should be truncated or deleted and recreated.System OverviewThe system will create and maintain a GIS database file that contains all the records that are imported as the program runs.The GIS database file will be empty initially. All the indexing of records will be done relative to this file.There is no guarantee that the GIS record file will not contain two or more distinct records that have the same geographiccoordinates. In fact, this is natural since the coordinates are expressed in the usual DMS system. So, we cannot treatgeographic coordinates as a primary (unique) key.The GIS records will be indexed by the Feature Name and State (abbreviation) fields. This name index will supportfinding offsets of GIS records that match a given feature name and state abbreviation.The GIS records will also be indexed by geographic coordinate. This coordinate index will support finding offsets of GISrecords that match a given primary latitude and primary longitude.The system will include a buffer pool, as a front end for the GIS database file, to improve search speed. See the discussion ofthe buffer pool below for detailed requirements. When performing searches, retrieving a GIS record from the database filemust be managed through the buffer pool. During an import operation, when records are written to the database file, thebuffer pool will be bypassed, since the buffer pool would not improve performance during imports.When searches are performed, complete GIS records will be retrieved from the GIS database file that your programmaintains. The only complete GIS records that are stored in memory at any time are those that have just been retrieved tosatisfy the current search, or individual GIS records created while importing data or GIS records stored in the buffer pool.Aside from where specific data structures are required, you may use any suitable Java library containers you like.Each index should have the ability to write a nicely-formatted display of itself to an output stream. CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 5Name Index InternalsThe name index will use a hash table for its physical organization. Each hash table entry will store a feature name and stateabbreviation (separately or concatenated, as you like) and the file offset(s) of the matching record(s). Since each GIS recordoccupies one line in the file, it is a trivial matter to locate and read a record given nothing but the file offset at which therecord begins.Your table will use chaining to resolve collisions, with a contiguous physical structure (physical array) to store the chains.The initial size of the table will be 1024, and the table will resize itself automatically, by doubling its size whenever the tablebecomes 70% full. Obviously, you should base this on a slight modification of the hash table you implemented for J3.You will use the same hash function that was supplied for J2, and apply it to a concatenation of the feature name and stateabbreviation field of the data records. Precisely how you form the concatenation is up to you.You must be able to display the contents of the hash table in a readable manner.Coordinate Index InternalsThe coordinate index will use a bucket PR quadtree for the physical organization. In a bucket PR quadtree, each leaf storesup to K data objects (for some fixed value of K). Upon insertion, if the added value would fall into a leaf that is already full,then the region corresponding to the leaf will be partitioned into quadrants and the K+1 data objects will be inserted intothose quadrants as appropriate. As is the case with the regular PR quadtree, this may lead to a sequence of partitioning steps,extending the relevant branch of the quadtree by multiple levels. In this project, K will probably equal 4, but I reserve theright to specify a different bucket size with little notice, so this should be easy to modify.The index entries held in the quadtree will store a geographic coordinate and a collection of the file offsets of the matchingGIS records in the database file. Obviously, this is derived directly from your solution to J2.Note: do not confuse the bucket size with any limit on the number of GIS records that may be associated with a singlegeographic coordinate. A quadtree node can contain index objects for up to K different geographic coordinates. Each suchindex object can contain references to an unlimited number of different GIS records.The PR quadtree implementation should follow good design practices, and its interface should be somewhat similar to that ofthe BST. You are expected to implement different types for the leaf and internal nodes, with appropriate data membershipfor each, and an abstract base type from which they are both derived. Of course, these were all requirements for the relatedminor project.You must be able to display the PR quadtree in a readable manner. PR quadtree display code is given in the course notes.The display must clearly indicate the structure of the tree, the relationships between its nodes, and the data objects in the leafnodes.Buffer Pool DetailsThe buffer pool for the database file should be capable of buffering up to 15 records, and will use LRU replacement. Youmay use any structure you like to organize the pool slots; however, since the pool will have to deal with record replacements,some structures will be more efficient (and simpler) to use. You may use any classes from the Java library you think areappropriate.It is up to you to decide whether your buffer pool stores interpreted or raw data; i.e., whether the buffer pool stores GISrecord objects or just strings.You must be able to display the contents of the buffer pool, listed from MRU to LRU entry, in a readable manner. The orderin which you retrieve records when servicing a multi-match search is not specified, so such searches may result in differentorderings of the records within the buffer pool. That is OK.CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 6A Note on Coordinates and Spatial RegionsIt is important to remember that there are fundamental differences between the notion that a geographic feature has specificcoordinates (which may be thought of as a point) and the notion that each node of the PR quadtree corresponds to a particularsub-region of the coordinate space (which may contain many geographic features).In this assignment, coordinates of geographic features are specified as latitude/longitude pairs, and the minimum resolution isone second of arc. Thus, you may think of the geographic coordinates as being specified by a pair of integer values.On the other hand, the boundaries of the sub-regions are determined by performing arithmetic operations, including division,starting with the values that define the boundaries of the “world”. Unless the dimensions of the world happen to be powersof 2, this can quickly lead to regions whose boundaries cannot be expressed exactly as integer values. You should usefloating-point values to represent region boundaries, when carrying out splitting operations and quadtree traversals.Your implementation should view the boundary between regions as belonging to one of those regions. The choice of aparticular rule for handling this situation is left to you. The specification for the PR quadtree project describes how I madethat decision, but there is absolutely no requirement that you follow the same approach.When carrying out a region search, you must determine whether the search region overlaps with the region corresponding toa subtree node before descending into that subtree. The Java libraries include a Rectangle class which could be (too) useful.You may make use of the Rectangle class, but you will be penalized 10% if your submitted solution makes use of any of thefollowing Rectangle methods: contains(), intersection(), and intersects(). Note though, that it is acceptable to make useof those methods during development, but you must implement your own versions of them in your final submission.Other System ElementsThere should be an overall controller that validates the command line arguments and manages the initialization of the varioussystem components. The controller should hand off execution to a command processor that manages retrieving commandsfrom the script file, and making the necessary calls to other components in order to carry out those commands.Naturally, there should be a data type that models a GIS record.There may well be additional system elements, whether data types or data structures, or system components that are notmentioned here. The fact no additional elements are explicitly identified here does not imply that you will not be expected toanalyze the design issues carefully, and to perhaps include such elements.Aside from the command-line interface, there are no specific requirements for interfaces of any of the classes that will makeup your GIS; it is up to you to analyze the specification and come up with an appropriate set of classes, and to design theirinterfaces to facilitate the necessary interactions. It is probably worth pointing out that an index (e.g., a geographiccoordinate index) should not simply be a naked container object (e.g, quadtree); if that's not clear to you, think more carefullyabout what sort of interface would be appropriate for an index, as opposed to a container.Command FileThe execution of the program will be driven by a script file. Lines beginning with a semicolon character (';') are commentsand should be ignored. Blank lines are possible. Each line in the command file consists of a sequence of tokens, which willbe separated by single tab characters. A line terminator will immediately follow the final token on each line. The commandfile is guaranteed to conform to this specification, so you do not need to worry about error-checking when reading it.The first non-comment line will specify the world boundaries to be used:world<tab><westLong><tab><eastLong><tab><southLat><tab><northLat>This will be the first command in the file, and will occur once. It specifies the boundaries of the coordinate space to bemodeled. The four parameters will be longitude and latitudes expressed in DMS format, representing the vertical andhorizontal boundaries of the coordinate space.It is certainly possible that the GIS record file will contain records for features that lie outside the specified coordinatespace. Such records should be ignored; i.e., they will not be indexed.CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 7Each subsequent non-comment line of the command file will specify one of the commands described below. One commandis used to load records into your database from external files:import<tab><GIS record file>Add all the valid GIS records in the specified file to the database file. This means that the records will be appended tothe existing database file, and that those records will be indexed in the manner described earlier. When the import iscompleted, log the number of entries added to each index, and the longest probe sequence that was needed wheninserting to the hash table. (A valid record is one that lies within the specified world boundaries.)Another command requires producing a human-friendly display of the contents of an index structure:debug<tab>[ quad | hash | pool ]Log the contents of the specified index structure in a fashion that makes the internal structure and contents of the indexclear. It is not necessary to be overly verbose here, but you should include information like key values and file offsetswhere appropriate.Another simply terminates execution, which is handy if you want to process only part of a command file:quit<tab>Terminate program execution.The other commands involve searches of the indexed records:what_is_at<tab><geographic coordinate>For every GIS record in the database file that matches the given <geographic coordinate>, log the offset atwhich the record was found, and the feature name, county name, and state abbreviation. Do not log any other datafrom the matching records.what_is<tab><feature name><tab><state abbreviation>For every GIS record in the database file that matches the given <feature name> and <stateabbreviation>, log the offset at which the record was found, and the county name, the primary latitude, and theprimary longitude. Do not log any other data from the matching records.what_is_in<tab><geographic coordinate><tab><half-height><tab><half-width>For every GIS record in the database file whose coordinates fall within the closed rectangle with the specified heightand width, centered at the <geographic coordinate>, log the offset at which the record was found, and thefeature name, the state name, and the primary latitude and primary longitude. Do not log any other data from thematching records. The half-height and half-width are specified as seconds.If a <geographic coordinate> is specified for a command, it will be expressed as a pair of latitude/longitude values,expressed in the same DMS format that is used in the GIS record files.For all the commands, if a search results in displaying information about multiple records, you may display that data in anyorder that is natural to your design.Sample command scripts, and corresponding log files, will be provided on the website. As a general rule, every commandshould result in some output. In particular, a descriptive message should be logged if a search yields no matching records.CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 8Log File DescriptionSince this assignment will be graded by TAs, rather than script automation, the format of the output is left up to you. Ofcourse, your output should be clear, concise, well labeled, and correct. You should begin the log with a few lines identifyingyourself, and listing the names of the input files that are being used. No one will object if you use the posted log files as amodel for formatting your own output, but you are certainly not expected to match the formatting of those exactly.The remainder of the log file output should come directly from your processing of the command file. You are required toecho each comment line, and each command that you process to the log file so that it's easy to determine which commandeach section of your output corresponds to. Each command (except for "world") should be numbered, starting with 1, andthe output from each command should be well formatted, and delimited from the output resulting from processing othercommands. A complete sample log will be posted shortly on the course website.A zip file is posted, containing scripts, GIS data files, and reference log files you can use in testing. The grading will be doneusing files in exactly the same format (perhaps even including some of the posted files).Administrative IssuesSubmissionYou will submit this assignment to the Curator System (read the Student Guide), where it will be archived for grading by acourse TA.For this assignment, you must submit a single zip file containing all the Java source code files for your implementation (i.e.,.java files). Submit only the Java source files. Do not submit Java bytecode (class) files. If you use packages in yourimplementation (and that's good practice), your zip file must include the correct directory structure for those packages, andyour GIS.java file must be in the top directory when the tar file is unpacked. That's easy to verify by running "unzip –l" on the tar file you are planning to submit.In order to correct submission errors and late-breaking implementation errors, you will be allowed up to five submissions forthis assignment. By default, we will grade the last submission you make.The Student Guide and link to the submission client can be found at: http://www.cs.vt.edu/curator/EvaluationThe quality of the OO design in your solution will be evaluated; you are expected to identify and implement a soundcollection of classes, whether the specification mentions them or not. The quality of your internal documentation will beevaluated. Finally, the correctness of your solution will be evaluated by executing your solution on a collection of test datafiles. Be sure to test your solution with all of the data sets that will be posted, since we will use a variety of data sets,including at least one very large data one (perhaps hundreds of thousands of records) in our evaluation.The quality of your design and internal documentation will determine approximately 30% of your grade for this assignment.Pay attention to the design discussion in class for J1.We will compile and test your solution on CentOS 8, using version 11.0.8 of the JDK. The choice of operating systemshould not matter with a Java program (write once, run everywhere, right?), but you should compile and test your solution onrlogin before submitting it.Late submissions will be penalized at a rate of 10% per day or fraction thereof. There will also be a final deadline, afterwhich we will not accept submissions.CS 3114 Data Structures & Algorithms J4: Geographic Information SystemVersion 6.00 This is a purely individual assignment! 9Pedagogic PointsThe goals of this assignment include, but are not limited to: implementation of a hash table generic in Java (as in J3) implementation of a bucket PR quadtree generic in Java (as in J2) implementation of a buffer pool in Java design of appropriate index classes to wrap the containers design of appropriate data objects to store in each index understanding how to navigate a file in Java understanding how to parse delimited strings in Java (based on your solutions to J1 and J3) creation of a sensible OO design for the overall system, including the identification of a number of useful classes notexplicitly named in this specification implementation of such an OO design into a working system incremental testing of the basic components of the system in isolation satisfaction when the entire system comes together in good working orderPledgeEach of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically,you must include the pledge statement provided below. The pledge statement should be in the file that contains your mainclass (which must be named GIS.java).// On my honor://// - I have not discussed the Java language code in my program with// anyone other than my instructor or the teaching assistants// assigned to this course.//// - I have not used Java language code obtained from another student,// or any other unauthorized source, including the Internet, either// modified or unmodified.//// - If any Java language code or documentation used in my program// was obtained from another source, such as a text book or course// notes, that has been clearly noted with a proper citation in// the comments of my program.//// - I have not designed this program in such a way as to defeat or// interfere with the normal operation of the grading code.//// <Student's Name>// <Student's VT email PID>We reserve the option of assigning a score of zero to any submission that is undocumentedor does not contain this statement.Change LogVersion Date Page Change(s)6.00 Mar 8 Base document6.10 Mar 11 5 Changes to the hash table requirements to bring it in line with J3; other minor changes
...