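1. Overview

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
    PassengerId    891 non-null int64
    Survived       891 non-null int64
    Pclass         891 non-null int64
    Name           891 non-null object
    Sex            891 non-null object
    Age            714 non-null float64
    SibSp          891 non-null int64
    Parch          891 non-null int64
    Ticket         891 non-null object
    Fare           891 non-null float64
    Cabin          204 non-null object
    Embarked       889 non-null object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.6+ KB

The training set has 891 rows. Three features have missing values: Age (714 non-null), Cabin (204 non-null), and Embarked (889 non-null). Cabin is missing for most passengers.

2. Data visualization

Split by sex alone, clearly more women survived than men.

Pclass is the cabin class (1 is the highest class). The higher the class, the greater the probability of survival.

SibSp is the number of siblings and spouses aboard. Passengers with exactly one sibling or spouse had the highest survival rate.

Parch is the number of immediate family members aboard, that is, parents and children.

Embarked is the port of embarkation. Passengers who boarded at port C (Cherbourg) had the highest survival probability.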
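
The missing-value counts and per-group survival figures discussed above can be checked numerically as well as read off plots. The following is a minimal sketch, not the author's original code: the helper names `missing_summary` and `survival_by` are hypothetical, the file name `train.csv` is an assumption, and the small `toy` frame is invented for illustration and is not Titanic data.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


def survival_by(df: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Passenger count, survivor count and survival rate per value of `feature`.

    Rows where `feature` is missing are dropped by groupby's default behaviour.
    """
    grouped = df.groupby(feature)["Survived"]
    return pd.DataFrame({
        "passengers": grouped.count(),
        "survivors": grouped.sum(),
        "survival_rate": grouped.mean(),
    })


# Toy rows, invented purely for illustration; they are NOT Titanic data.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Age": [29.0, None, 35.0, 22.0, None, 40.0],
    "Cabin": ["C85", None, None, None, "E46", None],
})

print(missing_summary(toy))
print(survival_by(toy, "Sex"))

# With the real training data (the file name is an assumption):
# train = pd.read_csv("train.csv")
# train.info()
# for col in ["Sex", "Pclass", "SibSp", "Parch", "Embarked"]:
#     print(survival_by(train, col))
```

Reporting the survivor count next to the survival rate is deliberate: a group can have the most survivors simply because it has the most passengers, so the two columns can rank the groups differently.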