[Translation] An explanation of yield in Python


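Source: a Stack Overflow question and answer. Original link: Here

I came across this on SN and translated it on the spot. It is my first translation, so a lot of passages are not rendered very well 🙂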

Question:

What is the yield keyword used for in Python? What does it do?

For example, I am trying to understand the following code:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

And this is the caller:

result, candidates = list(), [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

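What happens when the method _get_child_candidates is called? Is a list returned? Or a single element? Is it then called again? When do the calls stop?

This code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. Link to the complete source: here

To understand what yield does, you must first understand what generators are. And before that, you need to understand iterables.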

Iteration

You can create a list and read its items one by one. Reading the items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)

Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as many times as you wish.

However, all of their values are stored in memory, and that is not always what you want when you have a lot of values.

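Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly: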

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)

The only difference from the list comprehension is that you use () instead of [].
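To make the memory difference concrete, here is a minimal sketch comparing a list comprehension with the equivalent generator expression. It assumes CPython, where sys.getsizeof reports only the size of the container object itself, so the exact numbers vary by version and platform. The names squares_list and squares_gen are made up for this illustration.

```python
import sys

# A list comprehension materializes every value up front.
squares_list = [x * x for x in range(100_000)]

# A generator expression only keeps its own small internal state.
squares_gen = (x * x for x in range(100_000))

list_size = sys.getsizeof(squares_list)  # grows with the number of items
gen_size = sys.getsizeof(squares_gen)    # stays small regardless of the range

print(list_size > gen_size)

# Both produce the same values when consumed.
same_total = sum(squares_list) == sum(squares_gen)
print(same_total)
```

The list pays for its convenience in memory, while the generator trades re-readability for a small, constant footprint.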

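Note that you cannot run for i in mygenerator a second time, because a generator can only be used once: it computes 0, then forgets about it and computes 1, and ends after computing 4, one value at a time.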
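The single-use behaviour can be checked directly. This short sketch consumes the same generator twice; first_pass and second_pass are names chosen only for the illustration.

```python
mygenerator = (x * x for x in range(3))

first_pass = list(mygenerator)   # consumes every value
second_pass = list(mygenerator)  # nothing is left to produce

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []

# To iterate again, build a fresh generator (or keep a list instead).
mygenerator = (x * x for x in range(3))
third_pass = list(mygenerator)
print(third_pass)   # [0, 1, 4]
```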

yield

yield is a keyword that is used like return, except that the function containing it returns a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)

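This example is not very useful in itself, but the technique is handy when you know your function will produce a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you wrote in the function body does not run. The function only returns the generator object.

This is a bit tricky 🙂

Then your code runs each time the for loop uses the generator.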
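A minimal sketch of that point follows. It records when the function body actually runs; the events list and the create_generator name are hypothetical helpers introduced only for this illustration.

```python
events = []  # hypothetical log, used only to show when the body runs


def create_generator():
    events.append("body started")
    for i in range(3):
        yield i * i
    events.append("body finished")


gen = create_generator()
events_after_call = list(events)   # still empty: calling ran nothing

first = next(gen)                  # runs the body up to the first yield
events_after_first = list(events)  # now contains "body started"

rest = list(gen)                   # resumes after the yield until the end
events_at_end = list(events)

print(events_after_call, first, events_after_first, rest, events_at_end)
```

Calling the function only builds the generator object; the body starts on the first request for a value and resumes from the last yield on each later request.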

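Now the hard part:

The first time the for loop asks the generator object for a value, the function runs from the beginning until it hits yield, and then returns the first value of the loop. Each subsequent request runs one more iteration of the loop you wrote in the function and returns the next value, until there is no value left to return.

The generator is considered empty once the function runs without hitting yield. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.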
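What the for loop does behind the scenes can be sketched with the built-in next(). The functions squares and only_if_positive are made-up examples: the first is exhausted when its loop ends, the second never reaches yield when the condition fails.

```python
def squares(limit):
    for i in range(limit):
        yield i * i


gen = squares(2)
a = next(gen)  # runs until the first yield
b = next(gen)  # resumes and runs until the second yield

try:
    next(gen)  # the loop is over, so no yield is reached
    exhausted = False
except StopIteration:
    exhausted = True


def only_if_positive(n):
    if n > 0:
        yield n


# The condition fails, yield is never hit, the generator is empty.
empty = list(only_if_positive(-1))
non_empty = list(only_if_positive(5))

print(a, b, exhausted, empty, non_empty)
```

A for loop catches StopIteration for you, which is why an exhausted generator simply ends the loop.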

Your code explained

Generator:

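# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # This code runs each time you use the generator object

    # If the node has a left child and the distance is ok,
    # return that child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # Likewise, return the right child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator is considered empty:
    # it produces at most two values, the left and the right children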

Caller:

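# Create an empty list and a list holding a reference to the current object
result, candidates = list(), [self]

# Loop while there are candidates (at the start there is only one element)
while candidates:

    # Take the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If the distance is ok, add the node's values to the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate to the candidates list, then go back
    # to the top of the loop. This is where the generator is used: extend()
    # keeps pulling values until the generator is exhausted
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result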
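To try the two pieces together, here is a runnable sketch. The Node class, its constructor, the one-dimensional _get_dist, and the search method are all assumptions invented for this illustration; they are not the library's real implementation. Only the two if conditions and the caller loop are taken from the question.

```python
import types


class Node:
    """Hypothetical stand-in for the library's node type (1-D values only)."""

    def __init__(self, median, values, left=None, right=None):
        self._median = median
        self._values = values
        self._leftchild = left
        self._rightchild = right

    def _get_dist(self, obj):
        # Assumption: distance is the absolute difference to the first value.
        return abs(obj - self._values[0])

    def _get_child_candidates(self, distance, min_dist, max_dist):
        if self._leftchild and distance - max_dist < self._median:
            yield self._leftchild
        if self._rightchild and distance + max_dist >= self._median:
            yield self._rightchild

    def search(self, obj, min_dist, max_dist):
        # Same loop as the caller in the question, wrapped in a method.
        result, candidates = [], [self]
        while candidates:
            node = candidates.pop()
            distance = node._get_dist(obj)
            if min_dist <= distance <= max_dist:
                result.extend(node._values)
            candidates.extend(
                node._get_child_candidates(distance, min_dist, max_dist)
            )
        return result


root = Node(5, [10], left=Node(3, [7]), right=Node(8, [14]))

# Calling the method only creates a generator; nothing is evaluated yet.
pending = root._get_child_candidates(1, 0, 2)
is_generator = isinstance(pending, types.GeneratorType)

found = root.search(9, 0, 2)
print(is_generator, found)
```

With these made-up numbers, the root yields only its left child, so the right subtree is never visited.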

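This code contains several smart parts:

1. The loop iterates over the candidates list, but the list expands while the loop is running :-)

It is a concise way to go through all this nested data, even if it is a bit dangerous, since you can end up with an infinite loop.

In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, while the while loop keeps creating new generator objects. Each of them is applied to a different node, so each one produces different values.

2. The extend() method is a list method that expects an iterable and adds all of its values to the list.

Usually we pass a list to it: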

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

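But in your code, it gets a generator, which is good because:

1. You don't need to read the values twice.

2. You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care whether the argument of a method is a list or not.

Python expects iterables, so it works with strings, lists, tuples, and generators!

This is called duck typing, and it is one of the reasons why Python is so cool. But that is another story, for another question...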
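As a quick check of that claim, this sketch feeds several kinds of iterable to the same extend() call. The items list is just an illustrative name.

```python
items = []

items.extend([1, 2])                   # a list
items.extend((3, 4))                   # a tuple
items.extend("ab")                     # a string: one element per character
items.extend(x * x for x in range(3))  # a generator expression

print(items)  # [1, 2, 3, 4, 'a', 'b', 0, 1, 4]
```

extend() never checks the type of its argument; it only asks for the values one by one.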

You can stop here, or read a little further to see an advanced use of a generator:

Controlling the exhaustion of a generator

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
...
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

This can be useful in many situations, such as controlling access to a resource.

Itertools

A great tool

The itertools module contains special functions for manipulating iterables. For example:

Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example: the possible orders of arrival in a four-horse race.

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
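Some of the other wishes mentioned earlier can be sketched the same way. The functions used here (tee, chain, chain.from_iterable, islice, count) all exist in the standard itertools module; the squares helper is made up for the example, and flattening is only one possible reading of "group values in a nested list".

```python
import itertools


def squares(n):
    for i in range(n):
        yield i * i


# Duplicate a generator: tee returns independent iterators over one source.
first, second = itertools.tee(squares(3))
duplicated = (list(first), list(second))

# Chain two generators into a single stream.
chained = list(itertools.chain(squares(2), squares(3)))

# Flatten a nested list in one line.
flat = list(itertools.chain.from_iterable([[1, 2], [3], [4, 5]]))

# Take only the first few values of an endless iterator.
first_three = list(itertools.islice(itertools.count(10), 3))

print(duplicated, chained, flat, first_three)
```

Each of these returns a lazy iterator, so no intermediate list is built until list() is called.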

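Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method).

Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.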
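The two roles can be written out by hand. In this sketch, Countdown and CountdownIterator are hypothetical classes created only to show the protocol, and the while loop is a rough approximation of what a for loop does, not its exact implementation.

```python
class Countdown:
    """Iterable: every call to __iter__ returns a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers its position and produces one value per call."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Roughly what "for n in Countdown(3)" does behind the scenes.
iterator = iter(Countdown(3))
seen = []
while True:
    try:
        seen.append(next(iterator))
    except StopIteration:
        break

# The iterable can be traversed again; the used-up iterator cannot.
again = list(Countdown(3))
leftover = list(iterator)

print(seen, again, leftover)
```

A generator plays the iterator role automatically: Python writes the __iter__() and __next__() methods for you.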

There is more on this topic in: how does the for loop work

If you like this answer, you may also like my explanations of decorators and metaclasses.

