Machine Learning: The Data Science Path for Medical Students, Part I (Fundamentals)

The data science roadmap for medical students
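Since the semester started, the Intelligent Medicine Society has run several practice sessions, but students' completion rates have been disappointing. Today I want to lay out a structured curriculum for medical students, so that those who are keen can study ahead, and those who have fallen behind know which stage and which step they are at.
Medical students are not professional coders or engineers, so the usual roadmaps don't really suit them.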

For medical students moving into data science, the following principles matter:

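De-emphasize mathematics and statistics
Fast to learn, easy to master
Build intuition first; reuse code wherever possible
Fit medical reasoning and the needs of medical data
Practice comes first: solve real problems in medical science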

First, the data science pyramid: follow my curriculum and you can climb to the top by the fastest route.

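We will use Python and R; the reasons hardly need explaining. I personally prefer R, but walking on two legs (using both languages) never hurts. The curriculum is built mostly on existing high-quality tutorials; where nothing suitable exists, I will write my own and take you step by step to the summit. One last warning: it's cold at the top!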

Week 1 / Git, R, and Python basics: a warm-up

This week, through small exercises and introductory tutorials, you will learn basic programming syntax and version control (git).

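Python beginner course (also covers some git)
Authenticating to git/GitHub with a token (fixes some common git problems)
"20 R video lessons: learn R from scratch!" (column course, 医咖会, mediecogroup.com): a classic introductory course; the first two chapters are enough.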

Week 2 / Setting up the environment: R and Python side by side

Anaconda is the king of data science distributions, but I find it rather bloated, so I prefer to set things up myself.

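Essential tools
Setting up a Python data science environment (jupyter-lab)
Setting up R: installing and configuring R & RStudio
Video: setting up a local data science environment with Python & R (jupyter lab | RStudio)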

Use the English interface for every tool, and make sure no installation path contains Chinese characters.
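The path rule above is easy to break by accident (for example, a Windows user folder with a Chinese name). A minimal Python sketch can flag it. `has_non_ascii` is a hypothetical helper, and the script only checks the path of the Python interpreter it runs under; it does not scan R, RStudio, or other tools.

```python
import sys
from pathlib import Path


def has_non_ascii(path: str) -> bool:
    """Return True if the path contains any character outside ASCII (e.g. Chinese)."""
    return not path.isascii()


if __name__ == "__main__":
    # Check where the current Python interpreter is installed.
    interpreter = str(Path(sys.executable).resolve())
    if has_non_ascii(interpreter):
        print(f"Warning: non-ASCII characters in {interpreter}")
    else:
        print(f"OK: {interpreter}")
```

The same function can be pointed at any other install directory you want to verify.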

Week 3 / Concept primer

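Key data science concepts you must understand: machine learning, deep learning, neural networks, and classic machine learning models (algorithms).
None of the videos is long; treat them as introductory overviews.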

The "What is ..." series

https://www.bilibili.com/vide...
https://www.bilibili.com/vide...
https://www.bilibili.com/vide...
https://www.bilibili.com/vide...
https://www.bilibili.com/vide...

Classic algorithms in 5 minutes

https://space.bilibili.com/10...

Data! Data! Data!

Data types in programming languages

Data types in statistics

Numeric
Data that are expressed on a numeric scale.
- Continuous
  Data that can take on any value in an interval. (Synonyms: interval, float, numeric)
- Discrete
  Data that can take on only integer values, such as counts. (Synonyms: integer, count)
Categorical
Data that can take on only a specific set of values representing a set of possible categories. (Synonyms: enums, enumerated, factors, nominal)
- Binary
  A special case of categorical data with just two categories of values, e.g., 0/1, true/false. (Synonyms: dichotomous, logical, indicator, boolean)
- Ordinal
  Categorical data that has an explicit ordering. (Synonym: ordered factor)
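To connect these statistical types to programming-language types, here is one possible mapping in plain Python. The variables (weight, admissions, blood type, smoking, tumor grade) are illustrative assumptions; in practice, pandas `Categorical` columns (with `ordered=True` for ordinal data) or R factors serve the same purpose.

```python
from enum import Enum, IntEnum


class BloodType(Enum):
    # Nominal categorical: a fixed set of categories with no order.
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


class TumorGrade(IntEnum):
    # Ordinal categorical: categories with an explicit order.
    LOW = 1
    INTERMEDIATE = 2
    HIGH = 3


patient = {
    "weight_kg": 68.5,           # numeric, continuous -> float
    "num_admissions": 2,         # numeric, discrete (count) -> int
    "blood_type": BloodType.AB,  # categorical (nominal) -> Enum
    "smoker": False,             # binary -> bool
    "grade": TumorGrade.HIGH,    # ordinal -> IntEnum (supports <, >)
}

for name, value in patient.items():
    print(f"{name}: {type(value).__name__}")
```

Comparing two `TumorGrade` members with `<` works, while comparing two `BloodType` members with `<` raises `TypeError`, which mirrors the difference between ordinal and nominal data.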

Rectangular Data

The typical frame of reference for an analysis in data science is a rectangular data object, like a spreadsheet or database table.

Such data often arrives in formats like XML, JSON, and CSV.
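A minimal sketch, using only the standard library, of how CSV and JSON sources can end up as the same rectangular table (a list of rows sharing the same columns). The column names `patient_id`, `age`, and `sbp` (systolic blood pressure) are made up, and the code assumes flat, all-numeric data; nested JSON or XML would need extra flattening.

```python
import csv
import io
import json

# Hypothetical sample data: the same two patient records in two formats.
CSV_TEXT = """patient_id,age,sbp
1,54,132
2,61,145
"""

JSON_TEXT = '[{"patient_id": 1, "age": 54, "sbp": 132}, {"patient_id": 2, "age": 61, "sbp": 145}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts; csv yields strings, so convert to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


def rows_from_json(text):
    """Parse a JSON array of flat objects into a list of dicts."""
    return json.loads(text)


csv_rows = rows_from_csv(CSV_TEXT)
json_rows = rows_from_json(JSON_TEXT)
print(csv_rows == json_rows)  # True: both become the same rectangular table
```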

Data frame

Rectangular data (like a spreadsheet) is the basic data structure for statistical and machine learning models.
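In Python this structure is usually a pandas `DataFrame`, and in R a `data.frame`. As a rough, library-free sketch of the idea, a data frame can be viewed as named columns of equal length. The helper `to_columns` is hypothetical and only performs the rectangularity check; it provides none of the indexing or type handling of a real data frame.

```python
def to_columns(rows):
    """Convert a list of row dicts into a dict of equal-length columns.

    Raises ValueError if the rows do not share the same set of keys,
    i.e. if the data are not rectangular.
    """
    if not rows:
        return {}
    columns = list(rows[0].keys())
    for i, row in enumerate(rows):
        if set(row) != set(columns):
            raise ValueError(f"row {i} has columns {sorted(row)}, expected {sorted(columns)}")
    return {col: [row[col] for row in rows] for col in columns}


rows = [
    {"age": 54, "sbp": 132, "diabetes": True},
    {"age": 61, "sbp": 145, "diabetes": False},
    {"age": 47, "sbp": 118, "diabetes": False},
]
frame = to_columns(rows)
print(frame["sbp"])  # [132, 145, 118]
```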

Feature

A column within a table is commonly referred to as a feature.

Synonyms: attribute, input, predictor, (independent) variable, regressor, covariate

Outcome measurement Y

Many data science projects involve predicting an outcome Y.

Synonyms: dependent variable, response, target, output

In the regression problem, Y is quantitative (e.g., price, blood pressure).
In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
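To make the regression/classification distinction concrete, here is a sketch using k-nearest neighbours, a technique chosen purely for illustration (the text does not prescribe an algorithm). The same neighbour search serves both tasks; only the last step differs: averaging for a quantitative Y, majority vote for a categorical Y. The data are invented, and the features are left unscaled for brevity, which a real analysis should not do.

```python
import math
from collections import Counter


def k_nearest(train, x, k):
    """Return the k training pairs (features, y) closest to x by Euclidean distance."""
    return sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]


def knn_regress(train, x, k=3):
    """Regression: Y is quantitative, so average the neighbours' values."""
    neighbours = k_nearest(train, x, k)
    return sum(y for _, y in neighbours) / len(neighbours)


def knn_classify(train, x, k=3):
    """Classification: Y comes from a finite set, so take a majority vote."""
    neighbours = k_nearest(train, x, k)
    return Counter(y for _, y in neighbours).most_common(1)[0][0]


# Made-up records: features are (age, BMI).
bp_train = [((45, 22.0), 118), ((52, 27.5), 130), ((60, 31.0), 145), ((38, 20.5), 112)]
outcome_train = [((45, 22.0), "survived"), ((52, 27.5), "survived"),
                 ((60, 31.0), "died"), ((70, 29.0), "died")]

print(knn_regress(bp_train, (50, 26.0)))         # 131.0 (mean of 118, 130, 145)
print(knn_classify(outcome_train, (65, 30.0)))   # died (2 of 3 neighbours)
```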

Records

A row within a table is commonly referred to as a record.

Synonyms: case, example, instance, observation, pattern, sample
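Putting features, outcome, and records together, a minimal sketch in plain Python: each record is a row, the feature columns form X, and the outcome column forms Y. The column names and values are invented, and treating `outcome` as the target column is an assumption for this example.

```python
# A tiny, made-up table: each dict is one record (row); keys are columns.
records = [
    {"age": 54, "sbp": 132, "smoker": True,  "outcome": "died"},
    {"age": 61, "sbp": 145, "smoker": False, "outcome": "survived"},
    {"age": 47, "sbp": 118, "smoker": False, "outcome": "survived"},
]

OUTCOME = "outcome"  # the column we want to predict (Y)
feature_names = [name for name in records[0] if name != OUTCOME]

# X: one list of feature values per record; y: one outcome per record.
X = [[rec[name] for name in feature_names] for rec in records]
y = [rec[OUTCOME] for rec in records]

print(feature_names)  # ['age', 'sbp', 'smoker']
print(X[0], y[0])     # [54, 132, True] died
```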