关于数据挖掘:解析CSC1002-Data-Visualization

38次阅读

共计 8160 个字符,预计需要花费 21 分钟才能阅读完成。

CSC1002 – Computational Laboratory
Data Visualization – Part I
(Interactive Query)

OVERVIEW
In this assignment, you are asked to design and develop a Data Visualization application to provide
school administrators and students with a web-enabled interface to perform online query on school
information, such information as course description, student GPA, course location, instructor name and
so on; based on the responses from the users, the application then generates a SQL statement to fetch
the data needed; based on the data returned from SQL server, the screen will be refreshed with the
requested information shown in a tabular format or as graphical plot. The application will be composed
of 3 primary components: Graphical User Interface (GUI), SQL Interface and Graphical Tool depicted as
follows:

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
The GUI component will be composed of 2 web pages, one for online course query and the other for
statistical information on student GPA distribution. The format of the query page is shown as follows:
Whereas the format of the statistics page is shown as follows:
CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
SPECIAL SKILLS
Data Visualization: using“Bokeh”to construct the web-based GUI to interface with the end
users for online query
Relationship Database: constructing simple SQL statements to extract the relevant information
from a database.
Using data visualization tool to turn raw data into interactive 2-D graphics such as lines, scatter
plots, bar graphs, stacked bars and so on.
Using pymssql to interface the SQL server
Using widgets to capture user responses: text input, radio buttons, button groups, tables,
dropdown list and so on.
SCOPE – GENERAL
NOTE:
Keep your entire source code in ONE SINGLE file.
Use only standard python modules
Use pymssql to connect SQL server
Use bokeh to design the GUI
In your design stick ONLY to functions, in other words, no class objects of your own.

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
SCOPE – TAB PANEL

  1. Using bokeh tool to construct a multi-tab panel for the home page:
    a. First panel as“Course Info”
    b. Second panel as“Statistics”
  2. The panel“Course Info”holds the online course query page, refer to corresponding section for
    more details.
  3. The panel“Statistics”holds the statistics page, refer to corresponding section for more details.

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
SCOPE – COURSE INFO

  1. Using bokeh widgets & layout to construct the web-based interactive query screen.
    a. refer to the given template file (ui.py) for information.
  2. The A-Z button group, when clicked, will display all courses with title beginning with the letter of
    the clicked button, the result will be displayed in the data table shown on the bottom of the
    screen.
  3. Two sets of“begin with ….”-“…contains…”-“…ends with”button-groups, one for the course title
    and the other for department name. They behave as radio buttons. They are used together
    with the two corresponding text inputs, namely“Title”and“Department”respectively.
    a. These 3 buttons indicate whether or not the given string will be used as prefix, infix or
    suffix match respectively. If“begins with…”is selected, courses will be matched with
    title beginning with the given text; if“…contains…”is selected, title containing the given
    text will be matched.
    b. Ex:“begin with…”with input“bio”, courses such as“biology”,“bioinformatics”are
    shown.
  4. The“and”-“or”radio button-group indicates whether or not to search on title and department
    as a group (“and”) or separately (“or”).
  5. When the button“Refresh”is clicked, the actual search will take place and the result will be
    displayed in the data table. The search is based on a database table called“lgu.course”.
    a. sp_help“lgu.course”to view its structure (refer to session“Database – Info”)
  6. It is required that the program execute“efficient”SQL statement to fetch only the data needed.
    It is NOT supposed to cache the entire course table in memory and avoid further database
    request. Though the size of the course table for this exercise is small in size, however, in reality,
    it can be potentially large in size; it’s therefore best practice to use SQL query effectively to fetch
    only the information needed. You should also take advantages of SQL group-by function to
    CSC1002 – Computational Laboratory
    CSC1002 – 2019 Term 2 By Kinley Lam
    reduce as much as data possible to minimize the network traffic between the server and your
    program.
    Examples:
    “begins with…”with“bio”as Title
    “…contains…”with“bio”as Title
    “…ends with”with“bio”as Title
    CSC1002 – Computational Laboratory
    CSC1002 – 2019 Term 2 By Kinley Lam
    “…contains…”with“bio”as Title and“finance”as Department, with“and”option
    “…contains…”with“bio”as Title and“finance”as Department, with“or”option

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
DATABASE – INFO
Using Visual Studio Code, create a file of type“sql”, make a database connection to“csc1002”and
perform a query execution with the following statement:
sp_help“lgu.course”
SQL connection:
CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
SCOPE – STATISTICS

  1. Using bokeh widgets & layout to construct the web-based interactive query screen.
    a. refer to the lab materials for more information
  2. On the right side of the page is a stacked bar chart, showing the GPA distribution by
    department, further categorized by class year.
  3. On the left side of the page is a selection list displaying the list of all departments
    a. It is used to select the department for which the chart is created
  4. When a new department is selected, the chart will be updated to refresh the corresponding
    information.
  5. The stacked bar chart shows student GPA distribution based on the selected department.
    a. Students are from class of 2015, 2016 & 2017 (year)
    b. As illustrated, GPA counts are stacked in chronological order, each with a unique color
    c. Legend (upper right) is shown to display the color of the classes
  6. The database table to use is“lgu.student”and the structure is shown as follows:
    CSC1002 – Computational Laboratory
    CSC1002 – 2019 Term 2 By Kinley Lam
    STARTUP OPTIONS
    “bokeh serve –show <a2.py>”

SKILLS – PYTHON
In this assignment, you will be trained on the use of the followings:
Standard Python Object:
Standard objects (strings, numbers & lists)
Control statements to interact with users
Variable Scope
String formatting (method style)
Functions for program structure and decomposition
DELIVERABLES

  1. Design documentation (A2_School_StudentID_Design.doc/pdf)
  2. Program source code (A2_School_StudentID_Source.py)
    where School is SSE, SME, HSS, or FE and StudentID is your 9-digit student ID.

Zip all files above in a single file (A1_School_StudentID.zip) and submit the zip file by due date to the
corresponding assignment folder under“Assignment (submission)”
For instances, a SME student with student ID“119010001”:
A2_SME_119010001.zip:
o A2_SME_119010001_Design.doc/pdf
o A2_SME_119010001_Source.py
5% will be deducted if any files are incorrectly named!!!
For the design document kindly refer to section“Design Documentation”for details.
CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
DESIGN DOCUMENTATION
For the design document provide write-up for the following sections:

  1. Program structure and flow
    a. Describe the main process in turns of the overall program flow, that is, your solution to
    the problem presented by this assignment.
  2. Python objects (global variables)
    a. Usage of core python objects (purposes)
  3. Functions
    a. Describe usage of all your newly defined functions, including details of parameter(s)
  4. Output
    a. Show samples of output from your program
    TIPS & HINTS
    For executing and debugging“bokeh”, refer to the uploaded file“launch.json”and“tasks.json”.
    For general information on“Bokeh”, refer to the following link:
    https://bokeh.pydata.org/en/l…
    For adding widgets (text input, list box, button groups, tables ..etc), refer to the following link:
    https://bokeh.pydata.org/en/l…
    For reference guide, refer to the following link:
    https://bokeh.pydata.org/en/l…

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
DEBUGGING

VSC TASKS
Note: replace IP (20.20.12.68) with your own local IP. 5006 is the http port, do not need to change it.
CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
SAMPLE OUTPUT

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam

CSC1002 – Computational Laboratory
CSC1002 – 2019 Term 2 By Kinley Lam
MARKING CRITERIA
Coding Styles – layout, comments, white spaces, naming convention, variables, indentation.
Documentation – Design + Test Plan
Program Correctness – logic, program structure, functions with appropriate parameters
User Interaction – how informative and accurate information is exchanged between game
player and host.
Readability counts – programs that are well structured and easy-to-follow using functions to
breakdown complex problems into smaller cleaner generalized functions are preferred over a
function embracing a complex logic with nested conditions and sub-functions! In other words, a
design with clean architecture with high readability is the predilection for the course objectives
over efficiency.
KISS approach – Keep It Simple and Straightforward.
Balance approach – you are not required to come up a very optimized solution. However, take a
balance between readability and efficiency with good use of program constructs.
CHALLENGES
Completed by due date
DUE DATE

WX:codehelp

正文完
 0