Understanding “yield” in Python
I work as a consultant in an analytics firm, mainly in the Computer Vision domain, and writing Python scripts and creating modules is a day-to-day thing for me. I've written a plethora of code, participated in a few competitions, and heck, I even teach Python to newcomers inside the firm. But, and I'm not so happy to admit it, I had never used the “yield” keyword inside a function. Not until recently, that is 😃 (more about this later).
Now, some of the folks who end up reading this article might think, “What an idiot!”, while some of you might be like me: “Why even use it?” Others might be along the lines of “Umm, how or when do I even use it?”, and I'm not ignoring the fact that a few of us might ask, “What's a yield?”
NOTE: Before we move ahead, whenever you see a passage in blockquotes (like the one you are currently reading), it's something I'm quoting from either a website or a person. Just above it you'll see something underlined and marked in bold, which leads to the original source, where you can find a lot more related content. I don't want to take credit for someone else's work, so I'm doing my due diligence 🙌.
So now, rather than making this a long boring story, let me just come straight to the point. But before I do that, here’s a small meme I made 😅
If you took any hint from that meme, you already have an idea of what “yield” is actually useful for. Here's a StackOverflow answer that sums it up nicely:
yield is just like return - it returns whatever you tell it to (as a generator). The difference is that the next time you call the generator, execution starts from the last call to the yield statement. Unlike return, the stack frame is not cleaned up when a yield occurs, however control is transferred back to the caller, so its state will resume the next time the function is called.
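To see what the quote means in practice, here's a minimal sketch built around a made-up countdown(...) generator (the name is only illustrative). Each next(...) call runs the function up to the next yield and no further. When execution resumes, the local variable n still holds its old value.

```python
def countdown(n):
    # Hypothetical example: counts down from n, pausing at every yield
    print("starting")
    while n > 0:
        yield n      # hand n back to the caller and pause right here
        n -= 1       # the next next() call resumes on this line, n intact
    print("done")    # only reached once the loop finishes


gen = countdown(3)   # nothing runs yet; we just get a generator object
print(next(gen))     # prints "starting", then 3
print(next(gen))     # resumes after the yield: prints 2
print(next(gen))     # prints 1
```

Notice that "done" is never printed here, because we stopped asking for values before the loop ran out.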
Get it? Well, if someone showed me that same text while explaining something, I'd end up standing there with a long face, clueless about what it meant. But it does explain “yield” in a nutshell.
Let’s break it down…
In Python, a function is a block of code that only runs when it is called. You can pass data, known as parameters, into a function, and a function can return data as a result. With me so far? Good!
Now replace the return keyword with yield, and what you get is a generator function: calling it gives you back a generator instead of a finished result. (As the StatQuest guy says: BAM!)
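Here's a quick side-by-side using two hypothetical functions, squares_list(...) and squares_gen(...). The only real change is swapping the list-building return for yield. Calling the second function gives back a generator object rather than the values themselves.

```python
def squares_list(n):
    # Ordinary function: builds the whole list, then returns it in one go
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    # Same loop, but yield instead of collecting: a generator function
    for i in range(n):
        yield i * i


print(squares_list(5))       # [0, 1, 4, 9, 16]
print(squares_gen(5))        # <generator object squares_gen at 0x...>
print(list(squares_gen(5)))  # [0, 1, 4, 9, 16]
```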
Wait, what’s a Generator?
If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.
It is fairly simple to create a generator in Python. It is as easy as defining a normal function, but with a yield statement instead of a return statement.
The difference is that while a return statement terminates a function entirely, a yield statement pauses the function saving all its states and later continues from there on successive calls.
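Below is a small sketch of a generator function that has more than one yield plus a return. The name first_evens(...) is only illustrative. Once the return is reached, the generator stops for good, and asking it for another value raises StopIteration. A for loop or list(...) catches that exception for you.

```python
def first_evens(limit):
    # Hypothetical example: a generator function with two yields and a return
    yield 0                  # first pause point
    n = 2
    while True:
        if n > limit:
            return           # ends the generator for good
        yield n              # second pause point; n is remembered
        n += 2


gen = first_evens(6)
print(next(gen), next(gen), next(gen), next(gen))  # 0 2 4 6
try:
    next(gen)
except StopIteration:
    print("exhausted")       # the return above surfaced as StopIteration
```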
Ok, so I hope this explained it a bit better. I won't go into full depth and copy-paste what a generator is from a website that already explains it well. Let's just take what is useful for our current discussion. We learned that when return is used, the function terminates entirely. With yield, the function pauses and picks up where it left off each time it is called again, until it is exhausted.
Okay, so what?
Well, my friend, this allows us to make our code more memory efficient. Still no clue? Bear with me a little longer.
At times we use a function that performs some action and gives back a result (not always, but usually). It can be a numeric value, a string, or a data structure (list, tuple, array, etc.). We then perform some other operation on this returned list, or whatever it is. These iterables are handy because you can read them as many times as you wish. The catch is that all of their values are stored in memory at once, which is not ideal when you have a lot of values.
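To put rough numbers on that, here's a sketch that uses sys.getsizeof to compare a list and a generator expression over the same million values. Exact sizes vary between Python versions and platforms, so treat the comments as ballpark figures. Also note that sys.getsizeof is shallow. For the list, it counts only the internal array of references, not the integer objects themselves.

```python
import sys

n = 1_000_000
as_list = [i * 2 for i in range(n)]   # all values built and stored up front
as_gen = (i * 2 for i in range(n))    # values produced one at a time, on demand

print(sys.getsizeof(as_list))  # roughly 8 MB on a 64-bit build
print(sys.getsizeof(as_gen))   # a small, fixed size that does not grow with n
print(sum(as_gen) == sum(as_list))  # True: same values either way
```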
At this point, you might again say, “Okay, so what?” Well, to be honest, usually nothing. Memory is rarely much of an issue. In fact, I never had any such issue, or never had a list so big that I had to resort to using “yield”. That changed when a friend of mine recently asked me a problem. It's a really cute one, by the way.
The problem statement!
The date 01–03–05 (1st March 2005) has three consecutive odd numbers. This is the first day in the 21st Century with this property. How many days with this property are there in total in the 21st Century?
At first look, the problem seems fairly simple and doable by hand, but you can never be sure the answer you end up with is correct. When she first shared it with me, I just jotted down some obvious ones:
01–03–05
03–05–07
05–07–09
07–09–11
09–11–13
I couldn't think of any other such date, so I asked: is this the answer? Her response was, “I was trying to confirm it with you; I don't know the actual answer.” Well, that's a bummer 😒. But wait, I know Python, so why not just code this?
The solution
So I wrote the code below, and while writing it I had an epiphany. Do I really need to create a list of all the dates in the 21st century and then iterate over it? No, right? All I wanted was to get a date, check whether it has three consecutive odd numbers, and move on. All of a sudden, “yield” came to mind. I had taught newcomers what yield is and what it does plenty of times, but had never used it myself. So I thought, why not try it this time? And so I did:
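Here's a minimal sketch of how such a give_date(...) generator could look. Its start/end parameters and the has_consecutive_odds(...) helper are assumptions for illustration, not necessarily how the original was written. The check also assumes the date is read as DD-MM-YY with the three numbers increasing, as in the 01-03-05 example. It deliberately stops short of printing the final count, so the puzzle is still yours to solve.

```python
from datetime import date, timedelta


def give_date(start, end):
    # Hypothetical signature: yield every date from start to end, inclusive,
    # one at a time instead of building a list of ~36,500 dates
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def has_consecutive_odds(d):
    # Assumed reading: DD, MM, YY are consecutive odd numbers, e.g. 01-03-05
    a, b, c = d.day, d.month, d.year % 100
    return a % 2 == 1 and b == a + 2 and c == b + 2


# Only the matching dates are kept; the full range is never stored
matches = [d for d in give_date(date(2001, 1, 1), date(2100, 12, 31))
           if has_consecutive_odds(d)]
print(matches[0].strftime("%d-%m-%y"))  # 01-03-05, as in the problem statement
```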
So in the above code, calling give_date(...) creates a generator. Each time it is asked for the next value, it gives back one date and then pauses. This way I am not holding all the dates in a list, which would consume a lot of memory. For anyone interested, I also tried building a list inside the function, appending all the dates to it first, and then returning it. The execution time is almost exactly the same, but the generator takes up only 120 bytes, while the returned list takes 321,104 bytes!
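If you want to reproduce that kind of comparison, here's a sketch that calls sys.getsizeof on a list-building function and on a generator over the same 100 years of dates. The names dates_list(...) and dates_gen(...) are hypothetical. The exact numbers depend on the Python version, so they may not match the figures quoted above. And because sys.getsizeof is shallow, the list figure doesn't even include the date objects themselves.

```python
import sys
from datetime import date, timedelta


def dates_list(start, end):
    # List version: every date is collected before anything is returned
    out = []
    current = start
    while current <= end:
        out.append(current)
        current += timedelta(days=1)
    return out


def dates_gen(start, end):
    # Generator version: hands out one date at a time
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


start, end = date(2001, 1, 1), date(2100, 12, 31)
print(sys.getsizeof(dates_list(start, end)))  # hundreds of kilobytes
print(sys.getsizeof(dates_gen(start, end)))   # a small, fixed-size object
```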
Now, before anyone goes bonkers on me: yes, that's just 0.3 MB. But memory efficiency, remember? Using yield, we created a generator roughly 2,675 times smaller than the list. Now that's significant. For a list of all the dates in 100 years (which actually isn't that small), it's not a big deal. But many of us often deal with far larger amounts of data, and storing all of it in a list is not ideal at all. And if you don't delete such a temporary variable after it's used, which I don't think most of us usually do, that memory stays in use. A generator, on the other hand, never holds all the values at once, and once its work is over it releases the state it was keeping. Yeah, not kidding. I can show you; take a look:
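Here's a sketch of that behavior. It uses a hypothetical my_range(...) generator together with the standard inspect.getgeneratorstate(...) helper. Once the generator has been run to the end, it reports GEN_CLOSED, and its gi_frame (the saved state, including its local variables) is None.

```python
import inspect


def my_range(n):
    # Hypothetical stand-in for range(), written as a generator
    i = 0
    while i < n:
        yield i
        i += 1


gen = my_range(3)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: frame and locals exist
print(list(gen))                       # [0, 1, 2]
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
print(gen.gi_frame)                    # None: the saved state has been released
```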
Remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call my_range(...) again. If you need to use the result twice, you should first convert the result to a list and store it in a variable like x = list(my_range(10)).
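Here's a small sketch of that one-time-use behavior, again with a hypothetical my_range(...) written as a generator.

```python
def my_range(n):
    # Hypothetical generator version of range(), as in the quote
    i = 0
    while i < n:
        yield i
        i += 1


gen = my_range(5)
print(sum(gen))          # 10
print(sum(gen))          # 0: the same generator is already used up

print(sum(my_range(5)))  # 10: calling my_range(...) again gives a fresh one

x = list(my_range(5))    # store the values once if you need them twice
print(sum(x), max(x))    # 10 4
```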
Now, I know that as smart coders, a lot of you will be like, “This is such a load of crap! There wasn't even a need for a function at all.” Well, I know, I know. The same thing can be done like this too:
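Here's one way to do it without defining any function, as a sketch. Chain two generator expressions: one produces each day of the 21st century, and the other filters those days. It assumes the same DD-MM-YY, increasing-order reading of the property as before. Again, the final count is left to you.

```python
from datetime import date, timedelta

start = date(2001, 1, 1)
n_days = (date(2100, 12, 31) - start).days + 1  # every day of the 21st century

# Generator expressions: no function and no list of all dates
dates = (start + timedelta(days=i) for i in range(n_days))
matches = (
    d for d in dates
    if d.day % 2 == 1 and d.month == d.day + 2 and d.year % 100 == d.month + 2
)

print(next(matches).strftime("%d-%m-%y"))  # 01-03-05
```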
Again, memory-efficient, right? And I believe there are many other, better ways too; after all, there's always scope for improvement. Well, folks, I'm used to writing code in the form of functions. Honestly, I'm kind of glad I didn't think of solving this problem the other way, because then I wouldn't have seen a really good use of “yield” for the first time in my life.
And hey, the actual answer to that question? Well, why don't you try it on your own 😄. And if anyone is new to all this and to coding, feel free to ask me in the comments.
I hope this article was a good read for anyone who ended up reading it. My sincere thanks to everyone who did 😃. This is my first time trying Medium, and, to be honest, I'm just trying it out on the recommendation of a colleague and fellow Medium user, Sreekiran. Check out his posts too; they're pretty informative.
Thanks again, and if this article goes well, I might write a lot more, maybe something Computer Vision related next time 👐.