The data science roadmap for medical students
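Since the start of term, the Intelligent Medicine Society has run several practice sessions, but the students' completion rate has been less than ideal. I therefore plan to lay out a structured curriculum for medical students, so that those who want to learn can work ahead and those who have fallen behind can see which stage and which step they are at.
Medical students are not professional coders or engineers, so comparable roadmaps written for those audiences are not very practical for them.
A medical student who wants to get into data science must keep the following points in mind: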
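- De-emphasize mathematics and statistics
- Keep it short and fast-paced, and easy to master
- Focus on inspiring thinking, and encourage code reuse
- Fit medical thinking and the needs of medical and pharmaceutical data
- Practice comes first: solve real problems in medical science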
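First, the figure: the data science pyramid. If you follow my knowledge framework, you can climb to the top of the pyramid as quickly as possible.
The programming languages we use are Python and R; the reasons hardly need explaining. I prefer R, but walking on two legs never hurts. Most of the curriculum relies on existing high-quality tutorials; where none will do, I will write my own and take you to the summit step by step. One last warning for everyone: it is cold at the top!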
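Week 1 / Git, R, and Python basics: a warm-up
This week, a few small exercises and introductory tutorials cover simple programming syntax and version control (git).
- Python beginner's course; it includes some material on git
- git/GitHub: authenticating with a token, which resolves some common git problems
- "20 R video lessons: learn R from scratch!", column course, 医咖会 (mediecogroup.com); a classic introduction, and watching the first 2 chapters is enough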
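Week 2 / Environment setup: R and Python side by side
Anaconda is the king of data science environments, but I find it too bloated, so I prefer to build my own.
- Essential tools
- Python data science environment setup (jupyter-lab)
- R environment setup: installing and configuring R & RStudio
- Local data science environment setup for Python & R (jupyter lab | Rstudio), video
For all tools, use the English interface where possible, and make sure that no installation path contains Chinese characters.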
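The rule about installation paths can be checked with a few lines of Python. This is only a minimal sketch: the helper name `has_non_ascii` is hypothetical, and it flags any non-ASCII character, which is a broader test than "Chinese characters" but catches the same problem.

```python
import sys


def has_non_ascii(path) -> bool:
    """Return True if the path contains any character outside the ASCII range."""
    # str.isascii() is available from Python 3.7 onward.
    return not str(path).isascii()


# Check the interpreter that is currently running.
# sys.executable is the full path of the Python binary (it may be empty in
# some embedded setups, in which case nothing is flagged).
if has_non_ascii(sys.executable):
    print("Warning: the Python installation path contains non-ASCII characters:")
    print(sys.executable)
else:
    print("Python installation path looks fine:", sys.executable)
```

The same helper can be applied to any other directory you plan to install a tool into, for example `has_non_ascii("C:/tools/R")`.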
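Week 3 / A primer on the basics
Concepts every data science learner must know: machine learning, deep learning, neural networks, and the classic machine learning models (algorithms).
None of the videos is long; treat them as an introductory overview.
The "What is ...?" series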
https://www.bilibili.com/vide…
https://www.bilibili.com/vide…
https://www.bilibili.com/vide…
https://www.bilibili.com/vide…
https://www.bilibili.com/vide…
Classic algorithms in 5 minutes
https://space.bilibili.com/10…
Data! Data! Data!
Data types in programming languages
Data types in statistics
Numeric
Data that are expressed on a numeric scale.
- Continuous: Data that can take on any value in an interval. (Synonyms: interval, float, numeric)
- Discrete: Data that can take on only integer values, such as counts. (Synonyms: integer, count)
Categorical
Data that can take on only a specific set of values representing a set of possible categories. (Synonyms: enums, enumerated, factors, nominal)
- Binary: A special case of categorical data with just two categories of values, e.g., 0/1, true/false. (Synonyms: dichotomous, logical, indicator, boolean)
- Ordinal: Categorical data that has an explicit ordering. (Synonym: ordered factor)
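To connect the statistical data types with programming-language data types, here is a small sketch using pandas. The column names and all values are made up for illustration; the mapping shown (float, integer, bool, and `pd.Categorical` with or without `ordered=True`) is one reasonable choice, not the only one.

```python
import pandas as pd

# Toy patient table; every value here is invented for illustration.
patients = pd.DataFrame({
    # Numeric, continuous: any value in an interval -> float
    "systolic_bp": [118.5, 142.0, 131.2],
    # Numeric, discrete: counts -> integer
    "n_admissions": [0, 2, 1],
    # Categorical (nominal): fixed set of categories, no order
    "blood_type": pd.Categorical(["A", "O", "AB"]),
    # Binary: a categorical with exactly two values -> bool
    "smoker": [False, True, False],
    # Ordinal: categorical with an explicit ordering (an "ordered factor")
    "pain_level": pd.Categorical(
        ["mild", "severe", "moderate"],
        categories=["mild", "moderate", "severe"],
        ordered=True,
    ),
})

# Show which storage type pandas chose for each column.
print(patients.dtypes)

# Only the ordinal column supports order comparisons between categories.
worse_than_mild = (patients["pain_level"] > "mild").tolist()
print(worse_than_mild)
```

In R, the closest equivalents would be `numeric`, `integer`, `factor`, `logical`, and an ordered `factor`.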
Rectangular Data
The typical frame of reference for an analysis in data science is a rectangular data object, like a spreadsheet or database table.
xml, json, csv, ...
Data frame
Rectangular data (like a spreadsheet) is the basic data structure for statistical and machine learning models.
Feature
A column within a table is commonly referred to as a feature.
Synonyms: attribute, input, predictor, (independent) variable, regressor, covariate
Outcome measurement Y
Many data science projects involve predicting an outcome Y.
Synonyms: dependent variable, response, target, output
- In the regression problem, Y is quantitative (e.g., price, blood pressure).
- In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
Records
A row within a table is commonly referred to as a record.
Synonyms: case, example, instance, observation, pattern, sample
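The vocabulary above (rectangular data, records, features, outcome Y) can be made concrete with a tiny CSV table. The sketch below uses only the Python standard library; the column names and values are invented, and `survived` is arbitrarily chosen as the outcome, which makes this a classification setting because Y takes values in a finite set.

```python
import csv
import io

# A small rectangular data set in CSV form; all values are made up.
RAW_CSV = """age,systolic_bp,smoker,survived
63,142.0,yes,1
48,118.5,no,1
71,155.3,yes,0
"""

reader = csv.DictReader(io.StringIO(RAW_CSV))

# Each row of the table is one record (case, observation, sample).
records = list(reader)

# Every column except the outcome is a feature (predictor, covariate).
OUTCOME = "survived"  # assumed outcome column for this example
feature_names = [name for name in reader.fieldnames if name != OUTCOME]

# X holds the feature values per record; y holds the outcome per record.
# Note: the csv module returns every value as a string, so real analyses
# would convert the columns to proper numeric or categorical types first.
X = [[row[name] for name in feature_names] for row in records]
y = [int(row[OUTCOME]) for row in records]

print("features:", feature_names)
print("number of records:", len(records))
print("outcome Y:", y)
```

If the outcome column were a quantity such as `systolic_bp` instead, the same split would describe a regression problem.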