Iterators and Simple Generators in Python

Python 2.2 introduced a new construct accompanied by a new keyword. The construct is the generator; the keyword is yield. Generators make several new, powerful, and expressive programming idioms possible, but at first glance it is not easy to understand how they work. This article offers a gentle explanation of generators, along with the related concept of iterators.



What is Python?


Python is a freely available, high-level interpreted programming language developed by Guido van Rossum. It combines a clear syntax with powerful (but optional) object-oriented semantics. Python can be installed on a wide range of platforms, and programs port well from one platform to another.



Introduction


Welcome to the world of exotic flow control. With Python 2.2, programmers get options that were unavailable, or at least not as convenient, in earlier versions of the language.


Although what Python 2.2 offers is not as mind-bending as the full continuations and microthreads available in Stackless Python, generators and iterators do behave in a way that sets them apart from traditional functions and classes.


Let's look at iterators first, since they are easier to understand. Essentially, an iterator is an object that has a .next() method. That is not quite the whole definition, but it is close enough. In practice, most contexts want an object that will produce an iterator when the new built-in function iter() is applied to it. To have a user-defined class (one that has the required .next() method) return an iterator, you only need an __iter__() method that returns self. The examples below will make this clear. An iterator's .next() method may raise a StopIteration exception when the iteration has reached its logical end.
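As a minimal sketch of the protocol just described, the hypothetical Countdown class below acts as its own iterator. It is written for current Python 3, where the method this article calls .next() is spelled __next__() and is normally reached through the built-in next(). The class name and its start parameter are illustrative assumptions, not part of Python itself.

```python
class Countdown:
    """Hypothetical example class: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator simply returns itself from __iter__()
        return self

    def __next__(self):
        # Python 3 spelling of the .next() method discussed in the text
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first = next(iter(countdown))   # iter() hands back the same object
rest = list(countdown)          # the values that are still left
```

Here first is 3 and rest is [2, 1]; once StopIteration has been raised, the object has nothing more to give.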


A generator is a little more complicated, and more general. The most typical use of generators, however, is to define iterators, so the subtleties do not always need to be kept in mind. A generator is a function that remembers the point in the function body from which it last returned. Calling a generator a second (or nth) time jumps into the middle of the function, with all local variables unchanged since the last call.
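The "remembered position" can be seen in a small sketch. The running_total generator below is a hypothetical example, written in Python 3 spelling (next(g) rather than g.next()); its local variable total and its place in the loop both survive between requests.

```python
def running_total(values):
    """Hypothetical generator: yields the cumulative sum of values."""
    total = 0
    for value in values:
        total += value
        # Execution pauses here; total and the loop position are kept
        yield total


totals = running_total([5, 10, 20])
a = next(totals)   # runs the body up to the first yield
b = next(totals)   # resumes inside the loop, with total still 5
c = next(totals)
```

The three requests produce 5, 15, and 35: each one continues from the yield where the previous one stopped.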


In some ways, generators resemble closures, which were discussed in earlier articles on functional programming. Like a closure, a generator "remembers" the state of its data. But a generator goes a bit further: it also "remembers" its position within flow-control constructs (which, in imperative programming, is something more than just data values). Continuations are more general still, since they let you jump arbitrarily between stack frames rather than always returning to the context of the calling function (as a generator does).
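To make the comparison concrete, here is a rough side-by-side sketch with hypothetical names. The closure keeps only data between calls, while the generator also keeps its place in the control flow. The nonlocal statement is a Python 3 feature and did not exist in Python 2.2.

```python
def make_accumulator():
    """Closure: remembers data (total) between calls."""
    total = 0

    def add(amount):
        nonlocal total      # Python 3 only
        total += amount
        return total

    return add


def phases():
    """Generator: also remembers where in the control flow it stopped."""
    yield "setup"
    for step in range(2):
        yield "step %d" % step
    yield "teardown"


add = make_accumulator()
closure_results = [add(1), add(1)]
generator_results = list(phases())
```

Every call to add() starts at the top of its body, whereas each request to phases() continues from the previous yield, first outside the loop, then inside it, then after it.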


Fortunately, working with a generator is much easier than understanding all the conceptual aspects of program flow and state. In fact, with very little practice, generators become as clear as ordinary functions.



Taking a random walk


For the sake of explanation, let me pose a fairly simple problem that can be solved in several ways, both new and old. Suppose we want a stream of random numbers less than one that obey a backward-looking constraint. Specifically, we want each successive number to be at least 0.4 more or less than the previous one. Moreover, the stream is not infinite; it ends after a random number of steps. For example, we will terminate it as soon as a number less than 0.1 appears. These constraints are a bit like what you might find in a "random walk" algorithm, and the termination condition resembles a "local minimum", but the requirements are certainly milder than those of real-world problems.


Python 2.1 and earlier versions offer several ways to solve this problem. One is simply to build a list of the numbers in the stream and return it. It might look like this:



* RandomWalk_List.py *

import random

def randomwalk_list():
    # Initialize candidate elements
    last, rand = 1, random.random()
    # Start with an empty list
    nums = []
    # Termination condition
    while rand > 0.1:
        # Accept the number
        if abs(last - rand) >= 0.4:
            last = rand
            # Append the accepted element to nums
            nums.append(rand)
        else:
            # Display a rejection
            print '*',
        # New candidate element
        rand = random.random()
    # Append the final small element
    nums.append(rand)
    return nums


Using this function is very simple:



* Iterate over Random Walk List *

for num in randomwalk_list():
    print num,


However, this approach has notable limitations. The example is extremely unlikely to produce a long list; but by making the termination condition more stringent, we could create arbitrarily long streams (their exact size would be random, but the order of magnitude could be predicted). At a certain point, memory and performance concerns can make this approach undesirable and unnecessary. The same concern is what led to xrange() and xreadlines() being added to earlier versions of Python. More significantly, many streams depend on external events and yet should be processed as each element becomes available. For example, a stream might listen on a port or wait for user input. In such cases, building a complete list from the stream is simply not an option.

Python 2.1 and earlier versions offered one more technique: using a "static" function-local variable to remember information about the last call to the function. Global variables could obviously do the same job, but they cause familiar problems: they pollute the global namespace and allow errors caused by non-locality. It may surprise you, if you are not familiar with the trick, that Python has no "official" static scope declaration. However, if function parameters are given mutable default values, those values can serve as persistent storage between calls. Lists, in particular, are handy mutable objects that can even hold multiple values.
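The mutable-default trick can be shown in isolation with a small sketch. The call_counter function is a hypothetical example; the syntax is the same in Python 2 and Python 3.

```python
def call_counter(state=[0]):
    """Hypothetical example: counts how many times it has been called."""
    # The default list is created once, when the def statement runs,
    # and is then shared by every call that does not pass its own state.
    state[0] += 1
    return state[0]


counts = [call_counter(), call_counter(), call_counter()]
```

The list counts ends up as [1, 2, 3]: the single default list survives from one call to the next, which is exactly the property the "static" approach relies on.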


Using the "static" approach we can write the following function:



* RandomWalk_Static.py *

import random

# Initialize the "static" variable
def randomwalk_static(last=[1]):
    # Initialize the candidate value
    rand = random.random()
    # Termination condition
    if last[0] < 0.1:
        # End-of-stream flag
        return None
    # Look for an acceptable candidate
    while abs(last[0] - rand) < 0.4:
        # Display a rejection
        print '*',
        # New candidate
        rand = random.random()
    # Update the "static" variable
    last[0] = rand
    return rand


This function is quite memory-friendly. It only needs to remember one previous value, and it returns a single number (rather than a long list of numbers). A similar function could return successive values that depend (partly or wholly) on external events. The downside is that using this function is somewhat less concise and much less elegant:



* Iterate over Static Random Walk *

num = randomwalk_static()
while num is not None:
    print num,
    num = randomwalk_static()



A new way of walking

"Under the hood", all Python 2.2 sequences support iteration through iterators. The familiar Python idiom 'for elem in lst:' now actually asks lst to produce an iterator. The for loop then repeatedly calls that iterator's .next() method until it encounters a StopIteration exception. Fortunately, Python programmers do not need to know what is happening here, since all the built-in types produce their iterators automatically. In fact, dictionaries now have the methods .iterkeys(), .iteritems(), and .itervalues() for producing iterators; the first corresponds to the new idiom 'for key in dct:'. Similarly, the new idiom 'for line in file:' is supported by an iterator that calls .readline().

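What the for loop does behind the scenes can be approximated by hand. The manual_for helper below is a hypothetical sketch in Python 3 spelling (next(iterator) instead of iterator.next()); it is not how the interpreter is actually implemented, only an equivalent in plain Python.

```python
def manual_for(iterable):
    """Roughly what 'for elem in iterable' does, collecting the elements."""
    seen = []
    iterator = iter(iterable)        # ask the object for an iterator
    while True:
        try:
            elem = next(iterator)    # Python 3 spelling of iterator.next()
        except StopIteration:
            break                    # the loop ends quietly
        seen.append(elem)
    return seen


from_list = manual_for([10, 20, 30])
from_dict = manual_for({"a": 1, "b": 2})   # iterating a dict yields its keys
```

The list comes back as [10, 20, 30], and the dictionary yields its keys, just as 'for key in dct:' would.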

But once you know what actually happens inside the Python interpreter, it becomes natural to use custom classes that produce their own iterators, rather than relying only on the iterators of built-in types. A custom class that allows the direct usage of randomwalk_list(), along with the element-at-a-time economy of randomwalk_static(), is shown below:



* RandomWalk_Iter.py *

import random

class randomwalk_iter:
    def __init__(self):
        # Initialize the previous value
        self.last = 1
        # Initialize the candidate
        self.rand = random.random()
    def __iter__(self):
        # Simplest iterator creation: return self
        return self
    def next(self):
        # Termination condition
        if self.rand < 0.1:
            # End of iteration
            raise StopIteration
        # Look for an acceptable candidate
        else:
            while abs(self.last - self.rand) < 0.4:
                # Display a rejection
                print '*',
                # New candidate
                self.rand = random.random()
            # Update the previous value
            self.last = self.rand
            return self.rand


Using this custom iterator looks exactly like using a true list generated by a function:



* Iterate with Random Walk Class *

for num in randomwalk_iter():
    print num,


In fact, even the idiom 'if elem in iterator' is supported, and it examines only as many elements of the iterator as are needed to determine the truth value (of course, if the result turns out to be false, it has to examine all the elements).
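The lazy behavior of the membership test can be observed directly. In this small Python 3 sketch (variable names are illustrative), the elements that the test never looked at are still available afterwards.

```python
numbers = iter([1, 2, 3, 4, 5])

found = 3 in numbers          # stops as soon as 3 is seen
left_over = list(numbers)     # elements the test never examined

letters = iter("abc")
missing = "z" in letters      # has to look at every element
exhausted = list(letters)     # nothing is left afterwards
```

After the successful test, left_over is [4, 5]; after the failed one, the iterator is completely used up.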



Leaving a trail of crumbs


The approaches above are fine for the problem at hand. However, none of them works well when a routine creates a large number of local variables along the way and its code is a web of nested loops and conditionals. When an iterator class or a function with static (or global) variables depends on the state of many variables, two problems arise. The first is the mundane chore of creating multiple instance attributes or static list elements to hold each data value. The more important problem is working out exactly how to get back to the relevant part of the flow logic that corresponds to the data state. It is very easy to forget about the interactions and interdependencies among the different data.


Generators simply sidestep the whole problem. A generator "returns control" with the new keyword yield, but "remembers" the exact point of execution where the return occurred. The next time the generator is called, it picks up where it left off, both in terms of function flow and in terms of variable values.


In Python 2.2, you do not write a generator directly. Instead, you write a function that returns a generator when called. That may seem odd, but since "function factories" are a familiar feature of Python, "generator factories" seem like an obvious conceptual extension. The presence of one or more yield statements in a function body is what turns the function into a generator factory. If a yield statement occurs in the body, a return statement may occur only without a return value. A better choice, however, is to arrange the function body so that execution simply "falls off the end" after all the yield statements have run. If a return statement is encountered, it causes the generator to raise a StopIteration exception rather than yield further values.
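The effect of a bare return inside a generator can be sketched as follows. The first_negatives function is a hypothetical example in Python 3 spelling; Python 3 additionally permits return with a value inside a generator, which this sketch deliberately does not use.

```python
def first_negatives(values):
    """Hypothetical generator: yields values until a non-negative one."""
    for value in values:
        if value >= 0:
            return          # bare return: the generator is finished
        yield value


gen = first_negatives([-3, -1, 4, -2])
collected = [next(gen), next(gen)]
try:
    next(gen)               # reaches the return statement
    stopped = False
except StopIteration:
    stopped = True
```

The first two requests give -3 and -1; the third hits the return, so the caller sees StopIteration and the trailing -2 is never produced.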


In my opinion, the syntax chosen for generator factories is not entirely justified. A yield statement can easily sit deep inside the function body, so it may be impossible to tell from the first N lines of a function whether it is a generator factory. Of course, the same is true of function factories, but being a function factory does not change the actual *syntax* of a function body (and a function body is allowed to return a plain value sometimes, though probably not as a matter of good design). To my mind, a new keyword, such as generator in place of def, would have been a better choice.


Setting aside the debate over syntax, note that generators automatically act as iterators when called upon to do so. Nothing like the .__iter__() method of classes is needed for this. Each value passed to a yield statement becomes a return value of the generator's .next() method. The simplest possible example makes this clear:



* Simplest Possible Python 2.2 Generator *

>>> from __future__ import generators
>>> def gen():
...     yield 1
...
>>> g = gen()
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
  File " ", line 1, in ?
    g.next()
StopIteration


Let's put a generator to work on the problem discussed above:



* RandomWalk_Generator.py *

# Needed only in Python 2.2
from __future__ import generators
import random

def randomwalk_generator():
    # Initialize candidate elements
    last, rand = 1, random.random()
    # Termination condition
    while rand > 0.1:
        # Print a marker for every candidate examined
        print '*',
        # Accept the number
        if abs(last - rand) >= 0.4:
            # Update the previous value
            last = rand
            # Execution resumes AT THIS POINT
            yield rand
        # New candidate element
        rand = random.random()
    # Yield the final small element
    yield rand


The simplicity of this definition is appealing. You can use the generator either manually or as an iterator. In the manual case, the generator can be passed around a program and called wherever and whenever it is needed (which is quite flexible). A simple example of the manual case is shown below:



* Manual use of Random Walk Generator *

gen = randomwalk_generator()
try:
    while 1: print gen.next(),
except StopIteration:
    pass


Most of the time, however, you are likely to use a generator as an iterator, which is even more concise (and again looks just like a good old-fashioned sequence):



* Random Walk Generator as Iterator *

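for num in randomwalk_generator():
    # print_short() is not defined in this article; it is presumably
    # a helper that prints num in abbreviated form
    print_short(num)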
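For readers on current Python, the random walk generator might be rendered roughly as follows. This is a sketch under stated assumptions rather than the article's own listing: the function takes a hypothetical rng parameter (any random.Random instance) so that a run can be repeated, the '*' markers are omitted, and the seed 2002 is an arbitrary choice.

```python
import random


def randomwalk(rng):
    """Python 3 rendering of the walk; rng is any random.Random instance."""
    last, rand = 1, rng.random()
    while rand > 0.1:
        if abs(last - rand) >= 0.4:
            last = rand
            yield rand          # execution resumes here on the next request
        rand = rng.random()
    yield rand                  # the final small element


# The generator is used as an iterator; list() drains it completely
walk = list(randomwalk(random.Random(2002)))
```

Whatever the seed, every element except the last is greater than 0.1 and differs from its predecessor by at least 0.4, and the last element is the small value that ended the walk.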



Conclusions


It will take some time for Python programmers to become familiar with all the ins and outs of generators. The power contained in such a simple construct is surprising, and I suspect that even experienced programmers (such as the Python developers themselves) will keep discovering subtle new uses for generators.


To close, let me present one more example, which can be found in the module test_generators.py of the Python 2.2 distribution. Suppose you have a tree and want to visit all its nodes from left to right. Using traditional state variables, it is fairly difficult to get such a class or function right. Using a generator makes the task laughably simple:



>>> # A recursive generator that generates
... # tree leaves in in-order.
>>> def inorder(t):
...     if t:
...         for x in inorder(t.left):
...             yield x
...         yield t.label
...         for x in inorder(t.right):
...             yield x
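
To try the traversal outside the interactive prompt, some tree type is needed. The Node class below is a hypothetical stand-in assumed purely for illustration; the generator itself has the same shape as the listing and also runs unchanged on Python 3.

```python
class Node:
    """Hypothetical tree node with a label and optional children."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    # Same shape as the listing: left subtree, this node, right subtree
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


#       D
#      / \
#     B   E
#    / \
#   A   C
tree = Node("D", Node("B", Node("A"), Node("C")), Node("E"))
labels = list(inorder(tree))
```

For this tree the labels come out as A, B, C, D, E. No explicit stack or state variable is needed, because each suspended generator keeps its own place in the recursion.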