How an AI Concept Is Reshaping My Reality
Reinforcement Learning: Choosing to Explore
The 'Exploit' Loop and the 'Aha!' Moment
Ever wonder how AI models like ChatGPT got so uncannily good at conversation? A huge part of their secret sauce is a process called Reinforcement Learning (RL). In one line, it’s like teaching an AI to play a video game: it gets points for good moves and loses points for bad ones, and over millions of attempts, it learns the best strategy to win.
But for that AI to find the best strategy, it has to constantly wrestle with a core dilemma: Explore vs. Exploit. Does it exploit a path where it knows it can get 10 points? Or does it explore a new, unknown tunnel that might have a 100-point jackpot... or nothing at all?
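To make the dilemma concrete, here is a minimal sketch of epsilon-greedy, one simple and common way an agent can balance the two: with probability epsilon it picks a random option (explore), otherwise it picks the option with the best average so far (exploit). The payouts are hypothetical, chosen to mirror the example above: a "known path" that always pays 10 points and an "unknown tunnel" that pays 100 points 20% of the time and nothing otherwise, so the tunnel's true average (20) beats the safe path. This is a toy two-armed bandit, not how models like ChatGPT are actually trained.

```python
import random


def pull(arm: int, rng: random.Random) -> float:
    """Hypothetical payouts: arm 0 is the known path, arm 1 is the unknown tunnel."""
    if arm == 0:
        return 10.0  # safe option: always 10 points
    return 100.0 if rng.random() < 0.2 else 0.0  # risky option: 100-point jackpot 20% of the time


def epsilon_greedy(epsilon: float, steps: int = 10_000, seed: int = 0) -> tuple[float, list[float]]:
    """Run an epsilon-greedy agent; return its average reward and per-arm value estimates."""
    rng = random.Random(seed)
    counts = [0, 0]
    estimates = [0.0, 0.0]
    # The agent starts having tried only the known path once,
    # so a purely exploiting agent (epsilon = 0) never leaves it.
    counts[0], estimates[0] = 1, pull(0, rng)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore: pick any option at random
        else:
            arm = max(range(2), key=lambda a: estimates[a])  # exploit: best average so far
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental mean update: new_avg = old_avg + (reward - old_avg) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps, estimates
```

With `epsilon_greedy(0.0)` the agent never leaves the 10-point path, so its average reward stays at exactly 10 and it never learns what the tunnel is worth. With `epsilon_greedy(0.1)` it occasionally tries the tunnel, discovers its higher average, and ends up averaging well above 10 points per step.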
And you’re probably wondering why I’m telling you all this. It’s because a friend brought up this exact concept recently, and it hit me.
It hit me that I'd been living mostly in 'exploit' mode. So I started to introspect: was I really making the best of things? I spend most of my time working, but is that just pseudo-productivity? I realized that choosing to 'explore' doesn't mean having fun 24/7. It just means being open to trying more things, like not letting an extra 5 km or ₹200 stop me from a new experience of any kind.
Running the Experiments
It’s been exactly a month since I resigned from my job. I spent the first few weeks back home, exploring Varanasi with my parents: learning about the ghats, the history of the Ram Mandir, and the Kashi Vishwanath Mandir case. The boat ride there was serene. After that much-needed reset, I came back to Noida with a clear goal: to analytically explore my own past programming. I gave Claude, Gemini, GPT, and Grok six months of my 2025 notes and asked them to plot the best and worst paths for me. It’s a powerful practice I’d recommend to anyone trying to understand their own patterns.
Even explorations that 'fail' can have good side effects. Star Labs and I (remember Hemingway AI?) came up with an accountability product but scrapped it because it didn’t get enough traction. But the real reward wasn't the product; it was the process. Through that exploration, I got pretty good at n8n, one of the most powerful automation tools out there.
My new 'explore' mindset isn't just for big projects; it's also about running small, weekly experiments. For instance, I've been using Boardy, an AI calling tool that calls me every Tuesday afternoon, takes my work updates, and connects me with other people. So far it has connected me with an Italian guy and a Chinese guy. It's a fun little accountability tool; I'd recommend it.
The Takeaway
I recently heard a quote about fitness that you can apply to anything in your life:
“If you are not trying to be fit right now, you’re making a conscious decision to never be fit in your life as responsibilities are always gonna pile up more and more.”
Yup, this is a bit of a privileged take, so take it with a grain of salt. And who knows, maybe this is all just my brain's way of building a narrative to support a recent decision. A classic case of confirmation bias.
Ultimately, even with that doubt, it’s about making a conscious choice to gather new data on yourself. So, I’ll leave you with a question: where in your life could you use a little less 'exploit' and a little more 'explore'?