3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 1/8DeadlinePre-requisites:Late Penalty:Marks:Submission:COMP9315 22T1 Assignment 1Adding a PersonName Data Type toPostgreSQLDBMS ImplementationLast updated: Tuesday 22nd February 2:16pmMost recent changes are shown in red ... older changes are shown in brown.AimsThis assignment aims to give youan understanding of how data is treated inside a DBMSpractice in adding a new base type to PostgreSQLThe goal is to implement a new data type for PostgreSQL, complete with input/output functions,comparison operators, formatting functions, and the ability to build indexes on values of the type.SummaryFriday 18 March, 9:00pmbefore starting this assignment, it would be useful to complete Prac Work P040.04 marks off the ceiling mark for each hour lateThis assignment contributes 15 marks toward your total mark for this course.Webcms3 > Assignments > Ass1 Submission > Make Submissionor, on CSEmachines, give cs9315 ass1 pname.c pname.sourceMake sure that you read this assignment specification carefully and completely before starting workon the assignment.Questions which indicate that you haven't done this will simply get the response "Please read thespec".We use the following names in the discussion belowPG_CODE ... the directory where your PostgreSQL source code is located (typically/localstorage/YOU/postgresql-14.1/)PG_HOME ... the directory where you have installed the PostgreSQL binaries (typically/localstorage/YOU/pgsql/bin/)PG_DATA ... the directory where you have placed PostgreSQL's data (typically/localstorage/YOU/pgsql/data/)PG_LOG ... the file where you send PostgreSQL's log output (typically/localstorage/YOU/pgsql/data/log)IntroductionPostgreSQL has an extensibility model which, among other things, provides a well-defined processfor adding new data types into a PostgreSQL server. This capability has led to the development by3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 2/8PostgreSQL users of a number of types (such as polygons) which have become part of the standarddistribution. It also means that PostgreSQL is the database of choice in research projects which aimto push the boundaries of what kind of data a DBMS can manage.In this assignment, we will be adding a new data type for dealing with people's names. "Hmmm", yousay, "but aren't they just text strings, typically implemented as two attributes, one for family nameand one for given names?". That may be true, but making names into a separate base data typeallows us to explore how we store and manipulate them.One common way of writing names (e.g. used in UNSW student systems) isShepherd,John AndrewLin,XueminEilish,BillieMartin, Eric AndreLakshminarasimhan,Venkateswaran ChandrasekaraMarshall-Martin, Sally AngelaFeatherstone,Albert Basil Ernest George Harold Randolph Williami.e.FamilyName,GivenNamesWe give a more precise description of what text strings are valid PersonNames below.Adding Data Types in PostgreSQLThe process for adding new base data types in PostgreSQL is described in the following sections ofthe PostgreSQL documentation:37.13 User-defined Types37.10 C-Language Functions37.14 User-defined OperatorsSQL: CREATE TYPESQL: CREATE OPERATORSQL: CREATE OPERATOR CLASSSection 37.13 uses an example of a complex number type, which you can use as a starting point fordefining your PersonName data type (see below). There are other examples of new data types underthe directories:PG_CODE/contrib/chkpass/ ... an auto-encrypted password datatypePG_CODE/contrib/citext/ ... a case-insensitive character string datatypePG_CODE/contrib/seg/ ... a confidence-interval datatypeThese may or may not give you some useful ideas on how to implement the PersonName data type.For example, many of these data types are fixed-size, while PersonNames are variable-sized. Apotentially useful example of implementing variable-sized types can be found in:PG_CODE/src/tutorial/funcs.c ... implementation of several data typesSetting Up3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 3/8You ought to start this assignment with a fresh copy of PostgreSQL, without any changes that youmight have made for the Prac exercises (unless these changes are trivial). Note that you only needto configure, compile and install your PostgreSQL server once for this assignment. All subsequentcompilation takes place in the src/tutorial directory, and only requires modification of the filesthere.Once you have re-installed your PostgreSQL server, you should run the following commands:$ cd PG_CODE/src/tutorial$ cp complex.c pname.c$ cp complex.source pname.sourceNote the pname.* files will contain many references to complex; I do not want to see any remainingoccurences of the word complex in the files that you eventually submit. These files simply provide atemplate in which you create your PersonName type.Once you've made the pname.* files, you should also edit the Makefile in this directory and add thegreen text to the following lines:MODULES = complex funcs pnameDATA_built = advanced.sql basics.sql complex.sql funcs.sql syscat.sql pname.sqlThe rest of the work for this assignment involves editing only the pname.c and pname.source files.In order for the Makefile to work properly, you must use the identifier OBJWD in the pname.sourcefile to refer to the directory holding the compiled library. You should never modify directly thepname.sql file produced by the Makefile. Place all of your C code in the pname.c file; do notcreate any other *.c files.Note that your submitted versions of pname.c and pname.source should not contain any referencesto the complex type. Make sure that the documentation (comments in program) describes the codethat you wrote. Leaving the word complex anywhere in a pname.* file will cost 1 mark.The Person Name Data TypeWe wish to define a new base type PersonName to represent people's names, in the formatFamilyName,GivenNames. We also aim to define a useful set of operations on values of typePersonName and wish to be able to create indexes on PersonName attributes. How you representPersonName values internally, and how you implement the functions to manipulate them internally, isup to you. However, they must satisfy the requirements below.Once implemented correctly, you should be able to use your PostgreSQL server to build thefollowing kind of SQL applications:create table Students (zid integer primary key,name PersonName not null,degree text,-- etc. etc.);insert into Students(zid,name,degree) values(9300035,'Shepherd, John Andrew', 'BSc(Computer Science)'),3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 4/8(5012345,'Smith, Stephen', 'BE(Hons)(Software Engineering)');create index on Students using hash (name);select a.zid, a.name, b.zidfrom Students a join Students b on (a.name = b.name);select family(name), given(name), show(name)from Students;select name,count(*)from Studentsgroup by name;Having defined a hash-based file structure, we would expect that the queries would make use of it.You can check this by adding the keyword EXPLAIN before the query, e.g.db=# explain analyze select * from Students where name='Smith,John';which should, once you have correctly implemented the data type and loaded sufficient data, showthat an index-based scan of the data is being used. Note that this will only be evident if you use alarge amount of data (e.g. one of the larger test data samples to be provided).Person Name valuesValid PersonNames will have the above format with the following qualifications:there may be a single space after the commathere will be no people with just one name (e.g. no Prince, Jesus, Aristotle, etc.)there will be no numbers (e.g. noGates, William 3rd)there will be no titles (e.g. no Dr, Prof, Mr, Ms)there will be no initials (e.g. no Shepherd,John A)In other words, you can ignore the possibility of certain types of names while implementing yourinput and output functions.A more precise definition can be given using a BNF grammar:PersonName ::= Family','Given | Family', 'GivenFamily ::= NameListGiven ::= NameListNameList ::= Name | Name' 'NameListName ::= Upper LettersLetter ::= Upper | Lower | PuncLetters ::= Letter | Letter LettersUpper ::= 'A' | 'B' | ... | 'Z'3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 5/8Lower ::= 'a' | 'b' | ... | 'z'Punc ::= '-' | "'"You should not make any assumptions about the maximum length of a PersonName.Under this syntax, the following are valid names:Smith,JohnSmith, JohnO'Brien, Patrick SeanMahagedara Patabendige,Minosha Mitsuaki SenakasiriI-Sun, Chen WangClifton-Everest,Charles EdwardThe following names are not valid in our system:Jesus # no single-word namesSmith , Harold # space before the ","Gates, William H., III # no initials, too many commasA,B C # names must at least 2 lettersSmith, john # names begin with an upper-case letterThink about why each of the above is invalid in terms of the syntax definition.Important: for this assignment, we define an ordering on names as follows:the ordering is determined initially by the ordering on the Family Nameif the Family Names are equal, then the ordering is determined by the Given Namesordering of parts is determined lexicallyThere are examples of how this works in the section on Operations on PersonNames below.Representing Person NamesThe first thing you need to do is to decide on an internal representation for your PersonName datatype. You should do this, however, after you have looked at the description of the operators below,since what they require may affect how you decide to structure your internal PersonName values.When you read strings representing PersonName values, they are converted into your internal form,stored in the database in this form, and operations on PersonName values are carried out using thisdata structure. It is useful to define a canonical form for names, which may be slightly different to theform in which they are read (e.g. "Smith, John" might be rendered as "Smith,John"). When youdisplay PersonName values, you should show them in canonical form, regardless of how they wereentered or how they are stored.The first functions you need to write are ones to read and display values of type PersonName. Youshould write analogues of the functions complex_in(), complex_out that are defined in the filecomplex.c. Call them, e.g., pname_in() and pname_out(). Make sure that you use the V1 stylefunction interface (as is done in complex.c).Note that the two input/output functions should be complementary, meaning that any string displayedby the output function must be able to be read using the input function. There is no requirement foryou to retain the precise string that was used for input (e.g. you could store the PersonName value3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 6/8internally in a different form such as splitting it into two strings: one for the family name(s), and onefor the given name(s)).One thing that pname_in() must do is determine whether the name has the correct structure(according to the grammar above). Your pname_out() should display each name in a format thatcan be read by pname_in().Note that you are not required to define binary input/output functions, called receive_function andsend_function in the PostgreSQL documentation, and called complex_send and complex_recv inthe complex.cfile.As noted above, you cannot assume anything about the maximum length of names. If your solutionuses two fixed-size buffers (one for family, one for given) then your mark is limited to 6/10.Operations on person namesYou must implement all of the following operations for the PersonName type:PersonName = PersonName ... two names are equalTwo PersonNames are equivalent if, they have the same family name(s) and the same givenname(s).PersonName : Smith,JohnPersonName : Smith, JohnPersonName : Smith, John DavidPersonName : Smith, James(PersonName = PersonName ) is true(PersonName = PersonName ) is true(PersonName = PersonName ) is true (commutative)(PersonName = PersonName ) is false(PersonName = PersonName ) is falsePersonName > PersonName ... the first PersonName is greater than the secondPersonName is greater than PersonName if the Family part of PersonName is lexicallygreater than the Family part of PersonName . If the Family parts are equal, then PersonNameis greater than PersonName if the Given part of PersonName is lexically greater than theGiven part of PersonName .PersonName : Smith,JamesPersonName : Smith,JohnPersonName : Smith,John DavidPersonName : Zimmerman, Trent(PersonName > PersonName ) is false(PersonName > PersonName ) is false(PersonName > PersonName ) is true(PersonName > PersonName ) is false(PersonName > PersonName ) is trueOther operations: <>, >=, <, <=https://cgi.cse.unsw.edu.au/~... 7/8You should also implement the above operations, whose semantics is hopefully obvious fromthe descriptions above. The operators can typically be implemented quite simply in terms of thefirst two operators.family(PersonName) returns just the Family part of a namePersonName : Smith,JamesPersonName : O'Brien,Patrick SeanPersonName : Mahagedara Patabendige,Minosha Mitsuaki SenakasirPersonName : Clifton-Everest,David Ewanfamily(PersonName ) returns "Smith"family(PersonName ) returns "O'Brien"family(PersonName ) returns "Mahagedara Patabendige"family(PersonName ) returns "Clifton-Everest"given(PersonName) returns just the Given part of a namePersonName : Smith,JamesPersonName : O'Brien,Patrick SeanPersonName : Mahagedara Patabendige,Minosha Mitsuaki SenakasirPersonName : Clifton-Everest,David Ewangiven(PersonName ) returns "James"given(PersonName ) returns "Patrick Sean"given(PersonName ) returns "Minosha Mitsuaki Senakasir"given(PersonName ) returns "David Ewan"show(PersonName) returns a displayable version of the nameIt appends the entire Family name to the first Given name (everything before the first space, ifany), separated by a single space.PersonName : Smith,JamesPersonName : O'Brien,Patrick SeanPersonName : Mahagedara Patabendige,Minosha Mitsuaki SenakasirPersonName : Clifton-Everest,David EwanPersonName : Bronte,Greta-Anna Maryanneshow(PersonName ) returns "James Smith"show(PersonName ) returns "Patrick O'Brien"show(PersonName ) returns "Minosha Mahagedara Patabendige"show(PersonName ) returns "David Clifton-Everest"show(PersonName ) returns "Greta-Anna Bronte"Hint: test out as many of your C functions as you can outside PostgreSQL (e.g. write a simple testdriver) before you try to install them in PostgreSQL. This will make debugging much easier.You should ensure that your definitions capture the full semantics of the operators (e.g. specifycommutativity if the operator is commutative). You should also ensure that you provide sufficientdefinitions so that users of the PersonName type can create hash-based indexes on an attribute oftype PersonName.3/15/22, 2:09 PM COMP9315 22T1 - Assignment 1https://cgi.cse.unsw.edu.au/~... 8/8SubmissionYou need to submit two files: pname.c containing the C functions that implement the internals of thePersonName data type, and pname.source containing the template SQL commands to install thePersonName data type into a PostgreSQL server. Do not submit the pname.sql file, since it containsabsolute file names which are not helpful in our test environment.Have fun, jas
...