About data: Data Modeling

Basic Concepts

  • Data Subjects / Entities: Something that “exists”, such as a student or a grade. Entities are not the same as database tables, which are sometimes artificial or duplicative (for example, aggregates).

    • Strong Entity: has a primary key.
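    • Weak Entity: has no primary key of its own, only a partial key that acts as a discriminator among the entities of a weak entity set. It is fully identified only together with the key of its owning strong entity.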
  • Data Attributes of the Data Subjects: field / database column, like ID, name, etc.
  • Relationships between Data Subjects: for example, an instructor teaches a class, and a class is taught by an instructor.

    • Gerund: A relationship that also exhibits characteristics of an entity, and can have attributes attached to it.
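  • Business Rules applied to our data: cardinality (how many instances of one entity can relate to an instance of another; the term is also used for the number of distinct values in a column), mandatory or optional relationships, permissible attribute values (such as whether NULL is allowed), and data change dynamics.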
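
The concepts above can be sketched in a few lines of Python. This is only an illustration under assumed names: `Instructor`, `Course`, `Section`, `Teaches`, and the "one instructor per section per semester" rule are hypothetical and do not come from any particular schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instructor:
    """Strong entity: identified by its own primary key."""
    instructor_id: int
    name: str


@dataclass(frozen=True)
class Course:
    """Strong entity: identified by its own primary key."""
    course_id: int
    title: str


@dataclass(frozen=True)
class Section:
    """Weak entity: section_no is only a partial key (discriminator)."""
    course: Course
    section_no: int

    @property
    def key(self) -> Tuple[int, int]:
        # Full identity = owner's key + discriminator.
        return (self.course.course_id, self.section_no)


@dataclass(frozen=True)
class Teaches:
    """Gerund: a relationship that carries its own attribute (semester)."""
    instructor: Instructor
    section: Section
    semester: str


def one_instructor_per_section(assignments: List[Teaches]) -> bool:
    """Assumed business rule: at most one instructor per section per semester."""
    seen = set()
    for t in assignments:
        slot = (t.section.key, t.semester)
        if slot in seen:
            return False
        seen.add(slot)
    return True


alice = Instructor(1, "Alice")
databases = Course(101, "Databases")
s1 = Section(databases, 1)
s2 = Section(databases, 2)
assignments = [Teaches(alice, s1, "2024-fall"), Teaches(alice, s2, "2024-fall")]
```

Here the two sections share the discriminator space of one course, so `section_no` alone would not distinguish sections of different courses; the `key` property combines it with the owner's key.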

Modeling

  • Systems Modeling

    • Data Modeling

      • Classic ER (Entity–Relationship)
      • Post Classic ER
    • System Modeling

      • Semantic
      • UML: Unified Modeling Language

Database Design vs Data Modeling

Database Design

  • Specific DBMS (Database Management System) model (e.g. relational)
  • Goes below schema to physical storage
  • Implementation/product specific restrictions from the very beginning

Data Modeling

  • Conceptual / Semantic level
  • Unconstrained by RDBMS (Relational Database Management System) or other implementation rules
  • Closer to real world

Data Modeling Life Cycle

Conceptual Modeling
Logical Modeling
Physical Modeling

Data Modeling Methodologies

Transactional:

  • Conceptual level: mirror real world
  • Logical level:

    • Relational: data normalization with deliberate denormalization
    • Non-relational: NoSQL, OODBMS (Object-Oriented Database Management System) constructs, etc.
  • Physical level: blocks/tracks, MPP (Massively Parallel Processing) distribution, etc.
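
As a rough sketch of "normalization with deliberate denormalization" at the relational logical level, the snippet below uses Python's built-in `sqlite3` module. The table names (`instructor`, `course`, `course_listing`) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: each instructor name is stored exactly once.
CREATE TABLE instructor (
    instructor_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE course (
    course_id     INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    instructor_id INTEGER NOT NULL REFERENCES instructor(instructor_id)
);
-- Deliberately denormalized read table: repeats the instructor name
-- so that listings need no join.
CREATE TABLE course_listing (
    course_id       INTEGER PRIMARY KEY,
    title           TEXT NOT NULL,
    instructor_name TEXT NOT NULL
);
""")
conn.execute("INSERT INTO instructor VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(10, "Databases", 1), (11, "Modeling", 1)],
)
# Populate the denormalized table from the normalized ones.
conn.execute("""
INSERT INTO course_listing
SELECT c.course_id, c.title, i.name
FROM course c JOIN instructor i ON i.instructor_id = c.instructor_id
""")
listing = conn.execute(
    "SELECT title, instructor_name FROM course_listing ORDER BY course_id"
).fetchall()
```

The trade-off is the usual one: the listing table reads faster but must be refreshed whenever an instructor's name changes in the normalized source.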

Analytical (DW):

  • Conceptual level: dimensional
  • Logical level:

    • Relational: fact and dimension tables
    • Non-relational: cubes, columnar databases, etc.
  • Physical level: blocks/tracks, MPP (Massively Parallel Processing) distribution, AWS S3 buckets, HDFS NameNodes and DataNodes, etc.
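
A minimal sketch of fact and dimension tables, again using Python's built-in `sqlite3` module. The star schema here (`fact_sales`, `dim_date`, `dim_product`) and its sample rows are hypothetical, chosen only to show how a fact table is aggregated by a dimension attribute.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
-- Dimension tables: descriptive context.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- Fact table: numeric measures plus a key into each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
""")
dw.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
               [(20240101, 2024, 1), (20240201, 2024, 2)])
dw.executemany("INSERT INTO dim_product VALUES (?, ?)",
               [(1, "books"), (2, "games")])
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240101, 1, 2, 20.0),
    (20240101, 2, 1, 45.0),
    (20240201, 1, 3, 30.0),
])
# Typical analytical query: aggregate a measure by a dimension attribute.
totals = dw.execute("""
SELECT p.category, SUM(f.amount)
FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```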

Classic ER Notation / Chen Notation

Multi-Valued Attribute (MVA): an attribute that can hold several values for one entity, e.g. one person can have multiple email addresses.
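
One way to picture an MVA, sketched in Python under assumed names (`Person`, `emails`, `to_email_rows` are hypothetical): the conceptual model keeps the values as a collection on the entity, while a relational mapping would typically store one row per value in a child table.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Person:
    person_id: int
    name: str
    # The multi-valued attribute: zero or more email addresses.
    emails: List[str] = field(default_factory=list)


def to_email_rows(person: Person) -> List[Tuple[int, str]]:
    """Flatten the MVA into (person_id, email) rows, one per value."""
    return [(person.person_id, email) for email in person.emails]


alice = Person(1, "Alice", ["alice@example.com", "a.smith@example.org"])
email_rows = to_email_rows(alice)
```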

Crow’s Foot Notation

More closely aligned with logical modeling
