A Fellow Struggler

@BotanicBinary

Just another guy trying to make machines learn to eliminate us. Currently at @microsoft building "another Copilot"

Joined May 2025

239 Posts · 65 Followers · 128 Following

Pinned

A Fellow Struggler

@BotanicBinary

Jun 24

Thanks for this, @cneuralnetwork. I will take this up personally. Broad aims: - end-to-end understanding of the ML aspect of modern AI, mostly on the post-training and RL side of things. Includes detailed equations, paper readings and code implementations. - get into the…

neural nets.

@cneuralnetwork

Jun 23

i want to start something small but powerful a movement called "180 Days of Whatever" here’s the idea: for the next 180 days, you’ll do two things: - set one goal you’re determined to achieve in these 6 months, big or small, personal or professional - show up daily:…

A Fellow Struggler

@BotanicBinary

Nov 5

You know winter is here when your US meetings are pushed 1 hr later than usual. Daylight saving time has ended.

A Fellow Struggler

@BotanicBinary

Nov 4

Ok, learnt about value function approximation using parameterised functions. Moved away from tabular RL here. For now, basic MC and TD control work well with linear approximators only. Since we directly update w, there is a ripple effect across states and hence convergence is tricky. Came to know of…
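
A rough sketch of the "ripple effect" mentioned above, assuming semi-gradient TD(0) with a linear value estimate v(s) = w · x(s). The feature vectors, step size and discount here are made up for illustration and are not from the lecture.

```python
def v_hat(w, x):
    """Linear value estimate: dot product of weights and state features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def semi_gradient_td0_step(w, x, r, x2, done, alpha=0.1, gamma=0.9):
    """One semi-gradient TD(0) update; returns the new weight vector.

    x and x2 are the feature vectors of the current and next state.
    The gradient of a linear v_hat with respect to w is just x.
    """
    target = r if done else r + gamma * v_hat(w, x2)
    td_error = target - v_hat(w, x)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]

# Hypothetical features: states A and B share the second feature.
x_a, x_b = [1.0, 1.0], [0.0, 1.0]
w = [0.0, 0.0]

before = v_hat(w, x_b)
w = semi_gradient_td0_step(w, x_a, r=1.0, x2=None, done=True)
after = v_hat(w, x_b)
print(before, after)  # B's estimate moves even though only A was updated
```

Because the weights are shared, a single update for state A also shifts the estimate for state B, which is the part that makes convergence harder than in the tabular case.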

A Fellow Struggler

@BotanicBinary

Nov 2

At least get a blue tick before ragebaiting. Here’s my two cents: every fandom (football, tennis, F1) looks “elite” from a distance. Dig deeper and it's the same shit everywhere. It's just that cricket has more Indian shit and people just find it cool to stray away from the…

nikita

@aareyyyyyyy

Oct 31

the difference between cricket and football fans is that football fans are mature while most cricket fans are just uncouth boors and villagers

A Fellow Struggler

@BotanicBinary

Oct 31

Was surprised YouTube even recommended this. Air quality in India is so bad… oh sorry, it's Dubai. Would have got millions of views if it was the other title

A Fellow Struggler

@BotanicBinary

Oct 24

Good lecture, shows the SARSA and Q-learning algorithms. Also clarified the on-policy and off-policy distinction. Really need to implement all these algorithms this weekend: Monte Carlo prediction and control (usually on-policy since off-policy is super unstable), TD(0) prediction and…
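
A minimal sketch of how the two updates differ, assuming a tabular Q stored in a dict keyed by (state, action). The step size, discount and two-action set are arbitrary choices for illustration. SARSA (on-policy) bootstraps from the action actually taken next, while Q-learning (off-policy) bootstraps from the greedy action.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9   # illustrative step size and discount
ACTIONS = [0, 1]          # hypothetical action set

def sarsa_update(Q, s, a, r, s2, a2, done):
    """On-policy: the target uses Q(s2, a2) for the action a2 the agent will take."""
    target = r if done else r + GAMMA * Q[(s2, a2)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, done):
    """Off-policy: the target uses the max over next actions, whatever is taken."""
    target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Same transition, different targets when the next action is not greedy.
Q1 = defaultdict(float, {(1, 0): 1.0, (1, 1): 3.0})
Q2 = defaultdict(float, {(1, 0): 1.0, (1, 1): 3.0})
sarsa_update(Q1, s=0, a=0, r=1.0, s2=1, a2=0, done=False)
q_learning_update(Q2, s=0, a=0, r=1.0, s2=1, done=False)
print(Q1[(0, 0)], Q2[(0, 0)])
```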

A Fellow Struggler

@BotanicBinary

Oct 24

Model-free control!

A Fellow Struggler

@BotanicBinary

Oct 24

Isn’t this like a 3-month-old trend at this point?

Cheeni

@ChaiChaahiManne

Oct 23

Wait how do I fit them in proportion

A Fellow Struggler

@BotanicBinary

Oct 21

Ok, quite a solid lecture. Prediction is definitely a slightly easier problem to solve imo. Two techniques covered and then an intermediate way. MC methods depend on episodic tasks and have lower bias but can vary wildly. TD(0) is better on variance as well as working for…
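
A small sketch of the two prediction updates, assuming a tabular V in a dict and a constant step size. The states, rewards and constants are invented for illustration. MC waits for the episode to finish and uses the full return, while TD(0) updates after every step using the current estimate of the next state.

```python
GAMMA = 1.0   # illustrative discount
ALPHA = 0.1   # illustrative step size

def mc_update(V, episode):
    """Every-visit Monte Carlo update from one finished episode.

    episode is a list of (state, reward) pairs, where reward is the
    reward received on leaving that state.
    """
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + GAMMA * G              # full return from this state
        V[state] += ALPHA * (G - V[state])

def td0_update(V, s, r, s2, done):
    """TD(0) update from a single transition (bootstraps from V[s2])."""
    target = r if done else r + GAMMA * V[s2]
    V[s] += ALPHA * (target - V[s])

# Hypothetical episode: A -> B -> terminal, with reward 1 only at the end.
V_mc = {"A": 0.0, "B": 0.0}
mc_update(V_mc, [("A", 0.0), ("B", 1.0)])

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B", done=False)
td0_update(V_td, "B", 1.0, None, done=True)
print(V_mc, V_td)
```

After this one episode MC has already moved A's estimate, while TD(0) has only moved B's, since A's target was built from the old estimate of B.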

A Fellow Struggler

@BotanicBinary

Oct 21

Model-free RL begins

A Fellow Struggler

@BotanicBinary

Oct 13

Ok, this was definitely more fun. Value iteration, policy iteration and truncated policy iteration. Good algorithms and would like to implement them as well: - Value iteration looked simplest: start with a value function, update the policy greedily based on action values and get new…
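
A minimal sketch of value iteration, assuming the model is known and small enough to enumerate. The two-state MDP, its rewards and the discount are made up for illustration and are not from the lecture.

```python
# Hypothetical 2-state MDP, invented for illustration only.
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9

def q_value(V, s, a):
    """Action value of (s, a) under the current value estimate V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best                      # in-place sweep
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy

V, policy = value_iteration()
print(V, policy)
```

For this toy MDP the values should settle near 19 for state 0 and 20 for state 1, with the greedy policy going from state 0 to state 1 and then staying there.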

A Fellow Struggler

@BotanicBinary

Oct 13

Back to RL again

A Fellow Struggler

@BotanicBinary

Oct 5

Not learning much today. Mostly revision of Bellman equations and resolving some questions. Also got some idea of what's next: model-based methods like value and policy iteration. Will try to do it next.

A Fellow Struggler

@BotanicBinary

Oct 4

Ok, kinda get the concept. The key takeaway is understanding 2 things: - Evaluate a policy using value functions. These are the Bellman equations. - Find the optimal policy by finding the optimal value functions. These are covered by the Bellman optimality equations. Solving…
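
For reference, the two equations named above, written in the common textbook notation. The symbols (policy π, dynamics p, discount γ) are an assumption about notation, since the post does not spell them out.

```latex
% Bellman expectation equation: value of following policy \pi from state s
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
           \left[ r + \gamma \, v_\pi(s') \right]

% Bellman optimality equation: value of acting optimally from state s
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)
         \left[ r + \gamma \, v_*(s') \right]
```

The first is linear in the values for a fixed policy, while the max in the second makes it nonlinear, which is why it is usually solved iteratively.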

A Fellow Struggler

@BotanicBinary

Oct 4

Already got wrecked at the MDP stage itself 😢

A Fellow Struggler

@BotanicBinary

Oct 3

Completed watching this and understood the basics: agent, environment, action, reward, history, state, Markov states, policy, value function, model. TL;DR: RL tries to find a way of choosing actions (policy) to maximise expected cumulative reward. Can be done using: value function only methods, policy…