Contents | Previous (3.1 Scripts) | Next (3.3 Error Checking)

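3.2 More on Functions

Although functions were introduced earlier, few details were given about how they work at a deeper level. This section fills in those gaps and covers calling conventions, scoping rules, and related topics.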

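Calling a Function

Consider this function:

def read_prices(filename, debug):
    ...

You can call the function with positional arguments:

prices = read_prices('prices.csv', True)

Or you can call the function with keyword arguments:

prices = read_prices(filename='prices.csv', debug=True)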

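Default Arguments

Sometimes you want an argument to be optional. If so, assign a default value in the function definition.

def read_prices(filename, debug=False):
    ...

If a default value is assigned, the argument is optional in function calls.

d = read_prices('prices.csv')
e = read_prices('prices.dat', True)

Note: Arguments with default values must appear at the end of the parameter list (all non-optional arguments go first).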
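The ordering rule is enforced when the function definition is compiled, not when it is called. The sketch below checks two definitions with the built-in compile(); the helper name is_valid and both source strings are made up for this illustration.

```python
# Two hypothetical definitions, kept as strings so the invalid one
# does not stop this script from running.
good = "def read_prices(filename, debug=False): pass"
bad = "def read_prices(debug=False, filename): pass"

def is_valid(source):
    """Return True if the source text compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(is_valid(good))  # True
print(is_valid(bad))   # False: a required parameter follows an optional one
```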

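Prefer Keyword Arguments for Optional Arguments

Compare these two calling styles:

parse_data(data, False, True) # ?????

parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)

In most cases, keyword arguments improve code clarity, especially for arguments that serve as flags or that relate to optional features.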
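As a rough illustration, the sketch below assumes a signature for parse_data() (the text does not give one) and shows that the positional and keyword calls are equivalent, although only the keyword call says what the flags mean. It also shows one way to require the keyword style: a bare * in the parameter list makes the parameters after it keyword-only. The name parse_data_strict is hypothetical.

```python
def parse_data(data, debug=False, ignore_errors=False):
    # Hypothetical stand-in: just report which options are active
    return {'debug': debug, 'ignore_errors': ignore_errors}

def parse_data_strict(data, *, debug=False, ignore_errors=False):
    # The bare * makes debug and ignore_errors keyword-only
    return parse_data(data, debug, ignore_errors)

data = ['a', 'b']
positional = parse_data(data, False, True)       # Which flag is which?
keyword = parse_data(data, ignore_errors=True)   # Self-describing
print(positional == keyword)                     # True

try:
    parse_data_strict(data, False, True)         # Positional flags are refused
    rejected = False
except TypeError:
    rejected = True
print(rejected)                                  # True
```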

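Design Best Practices

Always give function arguments short but meaningful names.

Someone using the function may want to use the keyword calling style.

d = read_prices('prices.csv', debug=True)

Python development tools show these names in their help features and documentation.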

Return Values

The return statement returns a value:

def square(x):
    return x * x

If no return value is given, or the return statement is missing, None is returned:

def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None

Multiple Return Values

A function can only return one value. However, it can return multiple values by placing them in a tuple:

def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple

Usage example:

x, y = divide(37,5) # x = 7, y = 2

x = divide(37, 5)   # x = (7, 2)
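To confirm that a single tuple object is what comes back, the short check below repeats divide() so that it runs on its own, then compares the result with the built-in divmod(), which returns the same quotient and remainder pair.

```python
def divide(a, b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # The comma builds one tuple

result = divide(37, 5)
print(type(result).__name__)     # tuple
q, r = result                    # Unpacking splits the tuple into two names
print(q, r)                      # 7 2
print(divmod(37, 5) == result)   # True
```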

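Variable Scope

Programs assign values to variables:

x = value # Global variable

def foo():
    y = value # Local variable

Variable assignments occur both outside and inside function definitions. Variables defined outside a function are "global". Variables defined inside a function are "local".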

Local Variables

Variables assigned inside a function are private.

def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split(',')
        s = (fields[0], int(fields[1]), float(fields[2]))
        portfolio.append(s)
    return portfolio

In this example, filename, portfolio, line, fields and s are local variables. They are not retained or accessible after the function call.

>>> stocks = read_portfolio('portfolio.csv')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'fields' is not defined
>>>

Local variables also cannot conflict with variables defined elsewhere.

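Global Variables

Functions can freely access the values of global variables defined in the same file.

name = 'Dave'

def greeting():
    print('Hello', name)  # Using `name` global variable

However, functions can't modify global variables by plain assignment:

name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'

Remember: All assignments in functions are local.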
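One consequence of this rule often surprises people: an assignment anywhere in a function makes that name local for the whole function body, so reading the name before the assignment fails. A minimal sketch, using made-up names (counter, read_counter, broken_increment):

```python
counter = 0

def read_counter():
    # No assignment to counter here, so the global is looked up
    return counter

def broken_increment():
    # The assignment makes counter local to this function, so the
    # right-hand side reads a local that has no value yet
    counter = counter + 1
    return counter

print(read_counter())            # 0

try:
    broken_increment()
    failed = False
except UnboundLocalError:
    failed = True

print(failed)                    # True
print(counter)                   # 0, the global is untouched
```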

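Modifying Globals

If you must modify a global variable, declare it like this:

name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above

The global declaration must appear before its use, and the corresponding variable must live in the same file as the function. Having seen this, know that it is considered poor form. In fact, try to avoid global entirely if you can. If you need a function to modify some kind of state outside of the function, it's better to use a class instead (more on this later).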
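As a preview of the class-based alternative, here is one possible shape. The class name Greeter and its methods are invented for this sketch and are not taken from the course material.

```python
class Greeter:
    """Holds the name as instance state instead of a module-level global."""

    def __init__(self, name):
        self.name = name

    def rename(self, name):
        # Changes state on this object only; no `global` statement needed
        self.name = name

    def greeting(self):
        return 'Hello ' + self.name

g = Greeter('Dave')
g.rename('Guido')
print(g.greeting())     # Hello Guido
```

Each Greeter instance carries its own name, so two parts of a program can keep different state without interfering with each other.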

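Argument Passing

When you call a function, the argument variables are names that refer to the objects the caller passed in. The values are not copied (see Section 2.7). If mutable data types are passed (e.g. lists, dicts), they can be modified in place.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

Key point: Functions don't receive a copy of the input arguments.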

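Reassignment vs. Modifying

Make sure you understand the subtle difference between modifying a value and reassigning a variable name.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]    # Changes local `items` variable to point to a different object

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]

Reminder: Variable assignment never overwrites memory. The name is merely bound to a new value.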
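The `is` operator makes the difference visible, because it tests whether two names refer to the same object. The function names below (append_item, rebind) are made up for this sketch. The last lines show one common way to protect the caller's list: pass a shallow copy.

```python
def append_item(items):
    items.append(42)        # Mutates the object the caller passed
    return items

def rebind(items):
    items = [4, 5, 6]       # Binds the local name to a new list
    return items

a = [1, 2, 3]
same = append_item(a)
print(same is a)            # True: one list, seen through two names
print(a)                    # [1, 2, 3, 42]

b = [1, 2, 3]
other = rebind(b)
print(other is b)           # False: the function built a separate list
print(b)                    # [1, 2, 3]

c = [1, 2, 3]
append_item(list(c))        # The function mutates the copy, not c
print(c)                    # [1, 2, 3]
```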

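Exercises

This set of exercises has you implement what is, perhaps, the most powerful and difficult part of the course. There are a lot of steps, and many concepts from past exercises are put together all at once. The final solution is only about 25 lines of code, but take your time and make sure you understand each part.

A central part of your report.py program focuses on reading CSV files. For example, the read_portfolio() function reads a file containing portfolio data, and the read_prices() function reads a file containing price data. Both functions contain a lot of low-level "fiddly" bits, and they share similar features. For example, both open a file, process it with the csv module, and convert various fields into new types.

If you were parsing a lot of files for real, you'd probably want to clean some of this up and make it more general purpose. That's our goal.

Start this exercise by opening the file Work/fileparse.py. This is where we will be writing our code.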

Exercise 3.3: Reading CSV Files

To start, let's just focus on the problem of reading a CSV file into a list of dictionaries. In the file fileparse.py, define a function that looks like this:

# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records

This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, processing it with the csv module, ignoring blank lines, and so forth.

Try it out:

Hint: python3 -i fileparse.py.

>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>

This is good, except that you can't do any kind of useful calculation with the data because everything is represented as a string. We'll fix this shortly, but let's keep building on it.

Exercise 3.4: Building a Column Selector

In many cases, you're only interested in selected columns from a CSV file, not all of the data. Modify the parse_csv() function so that it optionally lets the user pick out specific columns, as follows:

>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

>>> # Read only some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>

An example of a column selector was given in Exercise 2.23.

However, here's one way to do it:

# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records

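There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices. For example, suppose the input file had the following headers:

>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>

Now, suppose the selected columns were as follows:

>>> select = ['name', 'shares']
>>>

To perform the proper selection, you have to map the selected column names to column indices in the file. That's what this step is doing:

>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>

In other words, "name" is column 0 and "shares" is column 3.

When you read a row of data from the file, the indices are used to filter it:

>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>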
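The three steps above can be tried end to end without a file on disk. This sketch feeds csv.reader() from an in-memory io.StringIO buffer; the first data row is the one used above, and the second row is made up purely for illustration. Note that headers.index() raises ValueError if a selected column name is not present in the headers.

```python
import csv
import io

# In-memory stand-in for a file. The second data row is invented.
text = (
    'name,date,time,shares,price\n'
    'AA,6/11/2007,9:50am,100,32.20\n'
    'IBM,5/13/2007,4:20pm,50,91.10\n'
)

rows = csv.reader(io.StringIO(text))
headers = next(rows)                        # First row holds the column names

select = ['name', 'shares']
indices = [headers.index(colname) for colname in select]
print(indices)                              # [0, 3]

records = []
for row in rows:
    if not row:                             # Skip rows with no data
        continue
    row = [row[index] for index in indices] # Keep only the selected columns
    records.append(dict(zip(select, row)))

print(records)
# [{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}]
```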

Exercise 3.5: Performing Type Conversion

Modify the parse_csv() function so that it optionally allows type conversions to be applied to the returned data. For example:

>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>

You already explored this in Exercise 2.24. You'll need to insert the following fragment of code into your solution:

...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
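The fragment works because types such as str, int and float are callable, so each one can be paired with a value and applied to it. A small standalone check on one sample row follows. One detail to keep in mind is that zip() stops at the shorter input, so if types has fewer entries than the row, the extra columns are dropped rather than left unconverted.

```python
types = [str, int, float]
row = ['AA', '100', '32.20']

# Pair each conversion function with the value in the same position
converted = [func(val) for func, val in zip(types, row)]
print(converted)        # ['AA', 100, 32.2]

# With only two types, zip() stops early and the third column disappears
shorter = [func(val) for func, val in zip([str, int], row)]
print(shorter)          # ['AA', 100]
```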

Exercise 3.6: Working Without Headers

Some CSV files don't include any header information. For example, the file prices.csv looks like this:

"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...

Modify the parse_csv() function so that it can work with such files by creating a list of tuples instead. For example:

>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>

To make this change, you'll need to modify the code so that the first line of data isn't interpreted as a header line. Also, you'll need to make sure you don't create dictionaries, as there are no longer any column names to use for keys.
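One possible shape for the branching logic is sketched below. It is not the course's reference solution: parse_rows is a hypothetical helper that takes an iterable of lines instead of a filename so the example can run from an in-memory buffer, and it leaves out column selection. The sample text is the first four lines of prices.csv shown above.

```python
import csv
import io

def parse_rows(lines, types, has_headers=True):
    """Hypothetical helper: dicts when headers exist, tuples otherwise."""
    rows = csv.reader(lines)
    # Only consume the first row as headers when the caller says it is one
    headers = next(rows) if has_headers else None
    records = []
    for row in rows:
        if not row:                 # Skip rows with no data
            continue
        row = [func(val) for func, val in zip(types, row)]
        if headers:
            records.append(dict(zip(headers, row)))
        else:
            records.append(tuple(row))
    return records

text = '"AA",9.22\n"AXP",24.85\n"BA",44.85\n"BAC",11.27\n'
prices = parse_rows(io.StringIO(text), [str, float], has_headers=False)
print(prices)
# [('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27)]
```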

Exercise 3.7: Picking a Different Column Delimiter

Although CSV files are pretty common, you may also encounter files that use a different column separator, such as a tab or a space. For example, the file Data/portfolio.dat looks like this:

name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44

The csv.reader() function allows a different column delimiter to be given as follows:

rows = csv.reader(f, delimiter=' ')

Modify your parse_csv() function so that it also allows the delimiter to be changed.

For example:

>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
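To see what the delimiter argument does on its own, the sketch below runs csv.reader() over the first lines of the portfolio.dat sample, held in an in-memory buffer. The last lines show a caveat with space-delimited data: by default each delimiter character separates fields, so two spaces in a row produce an empty field.

```python
import csv
import io

text = 'name shares price\n"AA" 100 32.20\n"IBM" 50 91.10\n'

rows = csv.reader(io.StringIO(text), delimiter=' ')
headers = next(rows)
first = next(rows)
print(headers)      # ['name', 'shares', 'price']
print(first)        # ['AA', '100', '32.20']

# Two consecutive spaces yield an empty field between them
doubled = next(csv.reader(['a  b'], delimiter=' '))
print(doubled)      # ['a', '', 'b']
```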

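Commentary

If you've made it this far, you've created a genuinely useful library function. You can use it to parse arbitrary CSV files, select columns of interest, and perform type conversions, all without having to worry too much about the inner workings of files or the csv module.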


Note: The complete Chinese translation is available at https://github.com/codists/practical-python-zh