About machine learning: Machine-Learning-with-Python

Machine Learning with Python (2021 Fall semester)
Programming Assignment: Classification of Titanic Data Set

  1. Benchmark Dataset: The task is to predict survival from information about the passengers on
    board the Titanic. Evaluate the performance of each machine learning model specified in this
    assignment on this dataset. The dataset can be downloaded from the following
    website: https://www.kaggle.com/c/titanic
    In this assignment, both model training and testing use the train.csv file.
    When performing the task, be careful NOT to use the following features for model training:
    PassengerId, Name, Ticket, Cabin
  2. Preprocessing
    2-1 Some rows in train.csv contain missing values. Remove these rows.
    2-2 Split the samples in train.csv into training data and test data at a 7:3 ratio.
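
The preprocessing steps above can be sketched as follows. This is a minimal sketch, not a reference solution: the names `prepare`, `EXCLUDED`, and `seed` are hypothetical, the one-hot encoding of `Sex` and `Embarked` is an assumption (the assignment does not name an encoding), and dropping the excluded columns before removing rows is also an assumption about the intended order. That order matters because `Cabin` is missing for most passengers, so removing rows first would discard most of the data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Features the assignment says must not be used for training.
EXCLUDED = ["PassengerId", "Name", "Ticket", "Cabin"]


def prepare(df, seed=0):
    """Return X_train, X_test, y_train, y_test from a Titanic-style DataFrame."""
    # Assumption: drop the excluded columns first, then rows with missing values.
    df = df.drop(columns=EXCLUDED).dropna()
    y = df["Survived"]
    # Assumption: one-hot encode the two string-valued columns.
    X = pd.get_dummies(df.drop(columns="Survived"), columns=["Sex", "Embarked"])
    # test_size=0.3 gives the 7:3 split; seed is fixed only for repeatability.
    return train_test_split(X, y, test_size=0.3, random_state=seed)
```

A typical call would be `X_train, X_test, y_train, y_test = prepare(pd.read_csv("train.csv"))`, with train.csv in the working directory.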
  3. Machine Learning Models: Use scikit-learn to implement the following three machine learning
    models and evaluate their performance.
    3-1 K-Nearest Neighbors (KNN) (sklearn.neighbors.KNeighborsClassifier): Analyze how the results
    on the test data change as the number of neighbors K is varied over [3-5].
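
One way to run the KNN comparison in 3-1 is sketched below. The function name `knn_sweep` and its arguments are hypothetical; it assumes the four arrays come from a 7:3 split such as the one described in the preprocessing step.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neighbors import KNeighborsClassifier


def knn_sweep(X_train, X_test, y_train, y_test, ks=(3, 4, 5)):
    """Return {k: (accuracy, f1)} measured on the test split."""
    results = {}
    for k in ks:
        # n_neighbors is scikit-learn's name for K.
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        pred = model.predict(X_test)
        results[k] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    return results
```

Because KNN is distance-based, features on large scales (such as Fare) can dominate the neighbor search. The assignment does not ask for feature scaling, so the sketch leaves the features as they are.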
    3-2 Logistic Regression (sklearn.linear_model.LogisticRegression): Analyze how the results on
    the test data change as the number of iterations (max_iter) is varied in steps of 20 over the
    range [0-100]. Then fix the number of iterations at 100, vary the regularization parameter (C in
    scikit-learn, the inverse of the regularization strength) in steps of 1 over the range [1-5], and
    analyze how the results on the test data change.
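
The two logistic regression sweeps in 3-2 might look like the sketch below. The names `logreg_sweep` and `evaluate` are hypothetical. As an assumption, the max_iter sweep starts at 20 rather than 0, since max_iter=0 would allow no optimizer steps and some scikit-learn versions may not accept it.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score


def logreg_sweep(X_train, X_test, y_train, y_test):
    """Return ({max_iter: scores}, {C: scores}); scores are (accuracy, f1)."""

    def evaluate(**params):
        model = LogisticRegression(**params).fit(X_train, y_train)
        pred = model.predict(X_test)
        return accuracy_score(y_test, pred), f1_score(y_test, pred)

    # Sweep 1: vary the iteration cap in steps of 20 (assumed start: 20).
    by_iter = {n: evaluate(max_iter=n) for n in range(20, 101, 20)}
    # Sweep 2: fix max_iter=100 and vary C, the inverse regularization strength.
    by_c = {c: evaluate(max_iter=100, C=float(c)) for c in range(1, 6)}
    return by_iter, by_c
```

With small max_iter values on unscaled features, the solver may emit a ConvergenceWarning. That is expected here and is part of what the iteration sweep is meant to expose.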
    3-3 Decision Tree (sklearn.tree.DecisionTreeClassifier): Analyze the split criteria at the first and
    second depths of the decision tree in terms of information gain. Also, with max_depth=None, use an
    appropriate tool to visualize the tree so that the split condition and gain value at each depth are
    visible. Analyze how the results on the test data change as max_depth is varied over [1~3, None].
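
For 3-3, a possible approach is sketched below. It assumes `criterion="entropy"`, so that the impurity decrease at a split is an information gain, and it uses `export_text` as one possible visualization tool; `sklearn.tree.plot_tree` is a graphical alternative. The helper names `split_gains`, `describe_tree`, and `depth_sweep` are hypothetical, and the root is counted as depth 1, which is an assumption about the assignment's wording.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.tree import DecisionTreeClassifier, export_text


def split_gains(model, feature_names, max_level=2):
    """List (depth, feature, threshold, gain) for splits down to max_level."""
    t = model.tree_
    n = t.weighted_n_node_samples
    rows, stack = [], [(0, 1)]  # (node id, depth); the root is depth 1
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1 or depth > max_level:  # leaf, or deeper than requested
            continue
        # Gain = parent entropy minus the sample-weighted child entropies.
        children = (n[left] * t.impurity[left] + n[right] * t.impurity[right]) / n[node]
        gain = float(t.impurity[node] - children)
        rows.append((depth, feature_names[t.feature[node]], float(t.threshold[node]), gain))
        stack += [(right, depth + 1), (left, depth + 1)]
    return rows


def describe_tree(X_train, y_train, feature_names, max_depth=None, seed=0):
    """Fit a tree and return (model, text rendering, gains of the top splits)."""
    names = list(feature_names)
    model = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth,
                                   random_state=seed).fit(X_train, y_train)
    return model, export_text(model, feature_names=names), split_gains(model, names)


def depth_sweep(X_train, X_test, y_train, y_test, depths=(1, 2, 3, None)):
    """Return {max_depth: (accuracy, f1)} measured on the test split."""
    results = {}
    for d in depths:
        model = DecisionTreeClassifier(criterion="entropy", max_depth=d,
                                       random_state=0).fit(X_train, y_train)
        pred = model.predict(X_test)
        results[d] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    return results
```

With a pandas training frame, `describe_tree(X_train, y_train, X_train.columns)` would return the fitted tree, a printable text rendering, and the gains of the first- and second-depth splits.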
  4. Evaluation Methods: Report the performance of each model in terms of Accuracy and F1-Score.
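
To make the two metrics concrete, the sketch below computes them by hand from a confusion matrix and compares the result against scikit-learn's own functions. The label vectors are made-up toy values for illustration only, not Titanic results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical toy labels (1 = survived, 0 = did not survive).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

# For binary labels 0/1, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of correct predictions
precision = tp / (tp + fp)                  # correct among predicted positives
recall = tp / (tp + fn)                     # found among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# The hand-computed values agree with the library functions.
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Here 3 of 5 predictions are correct, so accuracy is 0.6, while precision and recall are both 2/3, which makes the F1-Score 2/3 as well.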
  5. Submission Form: Three files must be submitted: the csv file, the report, and the python file,
    together in one zip file. The file name must follow the student number_name.zip format (e.g.,
    2020714950_Hong_Gil-dong.zip). When the python file is executed with the csv file in the same
    directory, it must clearly show the results of each machine learning model. This is to check
    whether the performance in the report is similar to the performance in actual execution. An
    ipynb file may be submitted instead of a python file.