
Pandas: An In-Depth Summary of Resetting the Index

Today we look at the reset_index() method in pandas: why we need to reset the index of a DataFrame, and how to use the method to do it.

Throughout this article we use a sample of the Animal Shelter Analytics dataset from Kaggle as our test data.

What Is reset_index() in Pandas?

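If we read a CSV file with pandas' read_csv() without specifying an index, the resulting DataFrame has the default integer-based index, which starts at 0 for the first row and increases by 1 for each subsequent row: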

import pandas as pd
import numpy as np

df = pd.read_csv('Austin_Animal_Center_Intakes.csv').head()
df

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray
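To check the default index without downloading the dataset, here is a minimal sketch. It assumes a tiny in-memory CSV (the variable names csv_text and sample are ours, and only three of the columns are reproduced) in place of the real file:

```python
import io

import pandas as pd

# A small in-memory CSV standing in for the real file (illustrative subset of columns).
csv_text = """Animal ID,Name,Animal Type
A786884,*Brock,Dog
A706918,Belle,Dog
A724273,Runster,Dog
"""

sample = pd.read_csv(io.StringIO(csv_text))

# With no index_col, pandas assigns a RangeIndex that counts up from 0.
print(type(sample.index).__name__)  # RangeIndex
print(list(sample.index))           # [0, 1, 2]
```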

In some cases we would like more meaningful row labels, so we choose one of the DataFrame's columns as its index. We can do this directly with the index_col parameter of read_csv():

df = pd.read_csv('Austin_Animal_Center_Intakes.csv', index_col='Animal ID').head()
df

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Alternatively, we can use the set_index() method to make any column of the DataFrame its index:

df = pd.read_csv('Austin_Animal_Center_Intakes.csv').head()
df.set_index('Animal ID', inplace=True)
df

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray
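The two routes above should give the same result. The following sketch checks this on a small in-memory CSV; the data is an illustrative subset, and the names via_index_col and via_set_index are ours:

```python
import io

import pandas as pd

# Illustrative stand-in for the real CSV file.
csv_text = """Animal ID,Name,Animal Type
A786884,*Brock,Dog
A706918,Belle,Dog
"""

# Route 1: choose the index while reading.
via_index_col = pd.read_csv(io.StringIO(csv_text), index_col="Animal ID")

# Route 2: read with the default index, then promote a column to the index.
# Without inplace=True, set_index() returns a new DataFrame.
via_set_index = pd.read_csv(io.StringIO(csv_text)).set_index("Animal ID")

print(via_index_col.equals(via_set_index))
print(via_set_index.index.name)
```

Which one to use is mostly a matter of timing: index_col is convenient when the index is known up front, while set_index() works on a DataFrame that already exists.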

What if at some point we need to restore the default numeric index? This is where the reset_index() method comes in:

df.reset_index()

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

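By default, this method replaces the existing DataFrame index with the default integer-based index and converts the old index into a new column that carries the old index's name (or is named index if the old index had no name). Also by default, reset_index() removes all levels of a MultiIndex, and it does not modify the original DataFrame; it returns a new one instead.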
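The naming rule described above is easy to miss, so here is a small sketch of it. The two DataFrames (named and unnamed) are made-up stand-ins, not the article's dataset:

```python
import pandas as pd

# Same data twice: once with a named index, once with an unnamed one.
named = pd.DataFrame(
    {"Name": ["*Brock", "Belle"]},
    index=pd.Index(["A786884", "A706918"], name="Animal ID"),
)
unnamed = pd.DataFrame({"Name": ["*Brock", "Belle"]}, index=["A786884", "A706918"])

# A named index becomes a column with that name ...
print(list(named.reset_index().columns))    # ['Animal ID', 'Name']
# ... while an unnamed index becomes a column called 'index'.
print(list(unnamed.reset_index().columns))  # ['index', 'Name']

# reset_index() returned new objects; the original still has its old index.
print(named.index.name)  # Animal ID
```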

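When to Use reset_index()

The reset_index() method resets the DataFrame index to the default numeric index. It is especially useful in the following cases:

  • During data wrangling, particularly preprocessing steps such as filtering rows or dropping missing values, which leave a smaller DataFrame whose numeric index is no longer contiguous
  • When the index should be treated as a regular DataFrame column
  • When the index labels carry no valuable information about the data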
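The first case in the list can be sketched as follows. The animals DataFrame is a made-up miniature of the dataset, and drop=True (covered in the drop section of this article) is used so the old labels are discarded:

```python
import pandas as pd

# Illustrative miniature of the dataset.
animals = pd.DataFrame(
    {
        "Name": ["*Brock", "Belle", "Runster", None, "Rio"],
        "Animal Type": ["Dog", "Dog", "Dog", "Cat", "Dog"],
    }
)

# Filtering keeps the original row labels, so a gap appears at label 3.
filtered = animals[animals["Animal Type"] == "Dog"]
print(list(filtered.index))  # [0, 1, 2, 4]

# Resetting the index makes the labels contiguous again.
dogs = filtered.reset_index(drop=True)
print(list(dogs.index))  # [0, 1, 2, 3]
```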

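How to Adjust reset_index()

So far we have seen how reset_index() works when we pass it no arguments. When needed, we can change this default behavior by adjusting the method's parameters. Let's look at the three most useful ones: level, drop, and inplace.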

level

This parameter accepts an integer, a string, a tuple, or a list, and it is mainly relevant for a DataFrame with a MultiIndex, such as the following:

df_multiindex = pd.read_csv('Austin_Animal_Center_Intakes.csv', index_col=['Animal ID', 'Name']).head()
df_multiindex

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

In fact, if we inspect the index of this DataFrame, we see that it is not a regular DataFrame index but a MultiIndex object:

df_multiindex.index

Output:

MultiIndex([('A786884',  '*Brock'),
            ('A706918',   'Belle'),
            ('A724273', 'Runster'),
            ('A665644',       nan),
            ('A682524',     'Rio')],
           names=['Animal ID', 'Name'])

By default (level=None), reset_index() removes all levels of the MultiIndex:

df_multiindex.reset_index()

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Both index levels were converted to regular DataFrame columns, and the index was reset to the default integer-based index.

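In contrast, if we pass a value for level explicitly, the method removes only the selected levels from the DataFrame index and returns them as regular DataFrame columns (unless we use the drop parameter to discard that information from the DataFrame entirely). Compare the following operations: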

df_multiindex.reset_index(level='Animal ID')

Output:

Name    Animal ID    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
*Brock    A786884    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
Belle    A706918    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
Runster    A724273    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
NaN    A665644    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
Rio    A682524    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Animal ID was originally one of the DataFrame's index levels. After we pass it to level, it is removed from the index and inserted into the DataFrame as a regular column named Animal ID.

df_multiindex.reset_index(level='Name')

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Here, Name was originally one of the DataFrame's index levels. After we pass it to level, it becomes a regular column named Name.
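Since level also accepts an integer position or a list, the following sketch shows those forms on a small hand-built MultiIndex. The multi DataFrame is an illustrative stand-in for df_multiindex:

```python
import pandas as pd

# Illustrative two-level index mirroring the ('Animal ID', 'Name') layout.
multi = pd.DataFrame(
    {"Animal Type": ["Dog", "Dog", "Dog"]},
    index=pd.MultiIndex.from_tuples(
        [("A786884", "*Brock"), ("A706918", "Belle"), ("A724273", "Runster")],
        names=["Animal ID", "Name"],
    ),
)

# A level can be selected by name or by position (0 is the outermost level).
by_name = multi.reset_index(level="Animal ID")
by_position = multi.reset_index(level=0)
print(by_name.equals(by_position))  # True
print(by_position.index.name)       # Name

# A list selects several levels; listing all of them matches the default call.
both = multi.reset_index(level=["Animal ID", "Name"])
print(list(both.columns))  # ['Animal ID', 'Name', 'Animal Type']
```

Selecting by name tends to be more readable and survives a reordering of the levels, while selecting by position is handy when the levels are unnamed.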

drop

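This parameter determines whether the old index is kept as a regular DataFrame column after the reset or discarded from the DataFrame entirely. By default (drop=False) it is kept, as in all the examples so far. If we do not want to keep the old index as a column, we can discard it during the reset with drop=True: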

df

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Now add the drop parameter:

df.reset_index(drop=True)

Output:

    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

In the DataFrame above, the information held in the old index has been removed from the DataFrame entirely.

The drop parameter also works on a DataFrame with a MultiIndex, like the one we created earlier:

df_multiindex

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Now add the drop parameter:

df_multiindex.reset_index(drop=True)

Output:

    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Both old index levels were removed from the DataFrame entirely, and the index was reset to the default.

Of course, we can combine the drop and level parameters to specify which old index levels to remove from the DataFrame entirely:

df_multiindex.reset_index(level='Animal ID', drop=True)

Output:

    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
Name                                        
*Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

The old index level Animal ID was removed from both the index and the DataFrame itself, while the other level, Name, was kept as the DataFrame's current index.
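Because a dropped level cannot be recovered from the result, it can be worth checking what is left. This is a minimal sketch on a hand-built stand-in for df_multiindex (the names multi and kept are ours):

```python
import pandas as pd

# Illustrative two-level index mirroring the ('Animal ID', 'Name') layout.
multi = pd.DataFrame(
    {"Animal Type": ["Dog", "Dog"]},
    index=pd.MultiIndex.from_tuples(
        [("A786884", "*Brock"), ("A706918", "Belle")],
        names=["Animal ID", "Name"],
    ),
)

kept = multi.reset_index(level="Animal ID", drop=True)

# 'Name' is now the only index level ...
print(kept.index.name)              # Name
# ... and 'Animal ID' is gone from both the index and the columns.
print("Animal ID" in kept.columns)  # False

# The original is untouched, so the dropped level is still available there.
print(list(multi.index.names))      # ['Animal ID', 'Name']
```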

inplace

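This parameter determines whether to modify the original DataFrame directly or to create a new DataFrame object. By default (inplace=False), the method creates a new DataFrame with the new index and leaves the original unchanged. Let's run reset_index() with the default arguments again and compare the result with the original DataFrame: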

df.reset_index()

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

df

Output:

Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

Even though the first snippet returned a DataFrame with the default numeric index, the original DataFrame is unchanged. To make the change stick, we can either assign the result back (df = df.reset_index()) or pass inplace=True to the method:

df.reset_index(inplace=True)
df

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

This time the change was applied directly to the original DataFrame.
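One pitfall with inplace=True is that the call returns None, so its result should not be assigned back. A short sketch with a made-up DataFrame (df_demo is our name, not the article's):

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["*Brock", "Belle"]},
    index=pd.Index(["A786884", "A706918"], name="Animal ID"),
)

# With inplace=True the DataFrame is modified and the call returns None.
result = df_demo.reset_index(inplace=True)
print(result)                 # None
print(list(df_demo.columns))  # ['Animal ID', 'Name']

# Writing `df_demo = df_demo.reset_index(inplace=True)` would therefore
# replace the DataFrame with None.
```

Reassignment (df = df.reset_index()) and inplace=True lead to the same DataFrame contents; the reassignment form also fits method chains, which inplace=True does not.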

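Worked Example: Resetting the Index After Dropping Missing Values

Let's put everything discussed so far into practice and see how resetting the index helps after we drop missing values from a DataFrame.

First, let's restore the very first DataFrame we created, the one with the default numeric index: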

df = pd.read_csv('Austin_Animal_Center_Intakes.csv').head()
df

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A665644    NaN    10/21/2013 07:59:00 AM    10/21/2013 07:59:00 AM    Austin (TX)    Stray    Sick    Cat    Intact Female    4 weeks    Domestic Shorthair Mix    Calico
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

The DataFrame contains one missing value. Let's use the dropna() method to remove the entire row that contains it:

df.dropna(inplace=True)
df

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

The row was removed from the DataFrame, but the index is no longer contiguous: 0, 1, 2, 4. Let's reset it:

df.reset_index()

Output:

    index    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    4    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

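The index is contiguous again, but because we did not pass the drop parameter, the old index was converted to a column with the default name index. Let's discard the old index from the DataFrame entirely: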

df.reset_index(drop=True)

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray

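Now the meaningless old index is gone and the current index is contiguous. The final step is to use the inplace parameter to save these changes to our original DataFrame: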

df.reset_index(drop=True, inplace=True)
df

Output:

    Animal ID    Name    DateTime    MonthYear    Found Location    Intake Type    Intake Condition    Animal Type    Sex upon Intake    Age upon Intake    Breed    Color
0    A786884    *Brock    01/03/2019 04:19:00 PM    01/03/2019 04:19:00 PM    2501 Magin Meadow Dr in Austin (TX)    Stray    Normal    Dog    Neutered Male    2 years    Beagle Mix    Tricolor
1    A706918    Belle    07/05/2015 12:59:00 PM    07/05/2015 12:59:00 PM    9409 Bluegrass Dr in Austin (TX)    Stray    Normal    Dog    Spayed Female    8 years    English Springer Spaniel    White/Liver
2    A724273    Runster    04/14/2016 06:43:00 PM    04/14/2016 06:43:00 PM    2818 Palomino Trail in Austin (TX)    Stray    Normal    Dog    Intact Male    11 months    Basenji Mix    Sable/White
3    A682524    Rio    06/29/2014 10:38:00 AM    06/29/2014 10:38:00 AM    800 Grove Blvd in Austin (TX)    Stray    Normal    Dog    Neutered Male    4 years    Doberman Pinsch/Australian Cattle Dog    Tan/Gray
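The steps above can also be written as a single chain that avoids inplace altogether. This is a sketch on a made-up miniature of the data (raw and clean are our names), not the article's exact DataFrame:

```python
import pandas as pd

# Illustrative miniature of the dataset with one missing Name.
raw = pd.DataFrame(
    {
        "Animal ID": ["A786884", "A706918", "A724273", "A665644", "A682524"],
        "Name": ["*Brock", "Belle", "Runster", None, "Rio"],
    }
)

# Drop rows with missing values, then discard the gapped labels in one expression.
clean = raw.dropna().reset_index(drop=True)

print(list(clean.index))  # [0, 1, 2, 3]
print(len(raw))           # 5 (the original is left untouched)
```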

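Summary

Today we looked at the reset_index() method from several angles:

  • The default behavior of reset_index()
  • How to restore the default numeric index of a DataFrame
  • When to use reset_index()
  • The method's most important parameters
  • How to work with a MultiIndex
  • How to discard the old index from a DataFrame entirely
  • How to save the changes directly to the original DataFrame

Finally, we worked through a complete practical example of resetting a DataFrame's index after dropping missing values.

That's all for today. If you enjoyed it, give it a like!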
