
A Fellow Struggler

@BotanicBinary

Just another guy trying to make machines learn to eliminate us. Currently @microsoft to build "another Copilot"

Pinned

Thanks for this, @cneuralnetwork. I will take this up personally. Broad aims:
- end-to-end understanding of the ML side of modern AI, mostly post-training and RL. Includes detailed equations, paper readings and code implementations.
- get into the…

I want to start something small but powerful: a movement called "180 Days of Whatever". Here’s the idea: for the next 180 days, you’ll do two things:
- set one goal you’re determined to achieve in these 6 months, big or small, personal or professional
- show up daily:…



You know winter is here when your US meetings are now 1 hr later than usual. Daylight saving time is over.


OK, learnt about value function approximation. Moved away from tabular RL here. Basic MC and TD control now only work well with linear approximators. Since we directly update the weights w, every update ripples across states, hence convergence is tricky. Came to know of…
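A minimal sketch of what the weight update looks like, assuming semi-gradient TD(0) prediction on a made-up 5-state random walk with hand-picked linear features (the environment, the `features` choice, the step size and the helper names are illustrative assumptions, not from the lecture). Because every state shares the same `w`, one update moves all the estimates at once, which is the ripple effect described above.

```python
import random

# Hypothetical 5-state random walk: non-terminal states 1..5, episodes start in 3.
# Stepping left from 1 ends the episode with reward 0; stepping right from 5 ends it with reward 1.
N = 5

def features(s):
    # Assumed linear features: scaled state index plus a bias term.
    return [s / N, 1.0]

def v_hat(s, w):
    # Linear value estimate v_hat(s, w) = w . x(s)
    return sum(wi * xi for wi, xi in zip(w, features(s)))

def semi_gradient_td0(episodes=20000, alpha=0.002, gamma=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        s = 3
        while True:
            s_next = s + rng.choice([-1, 1])
            terminal = s_next == 0 or s_next == N + 1
            r = 1.0 if s_next == N + 1 else 0.0
            # "Semi-gradient": the bootstrapped target is treated as a constant.
            target = r if terminal else r + gamma * v_hat(s_next, w)
            delta = target - v_hat(s, w)
            # The gradient of a linear v_hat w.r.t. w is just x(s). Since w is shared,
            # this single update also shifts the estimates of every other state.
            w = [wi + alpha * delta * xi for wi, xi in zip(w, features(s))]
            if terminal:
                break
            s = s_next
    return w

w = semi_gradient_td0()
estimates = [v_hat(s, w) for s in range(1, N + 1)]
# The true values for this walk are s / 6, which these features can represent exactly.
```

With features that can represent the true values, the estimates should end up close to s/6; with a poorer feature set, TD only converges to an approximation of them.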


At least get a blue tick before ragebaiting. Here’s my two cents: every fandom (football, tennis, F1) looks “elite” from a distance. Delve deep and it’s the same shit everywhere. It’s just that cricket has more Indian shit and people just find it cool to stray away from the…

the difference between cricket and football fans is that football fans are mature while most cricket fans are just full of gawaars and dehatis (uncouth country bumpkins)



Was surprised YouTube even recommended this. Air quality in India is so bad… oh sorry, it’s Dubai. Would have got millions of views if it had the other title.


Good lecture; shows the SARSA and Q-learning algorithms and clarified the on-policy vs off-policy distinction. Really need to implement all these algorithms over the weekend:
- Monte Carlo prediction and control (usually on-policy, since off-policy is super unstable)
- TD(0) prediction and…

Model-free control!

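A rough sketch of the one place SARSA and Q-learning differ, assuming a tabular `Q` stored as nested dicts (`epsilon_greedy`, `sarsa_update`, `q_learning_update` and the two-state example are hypothetical names for illustration). SARSA bootstraps from the action the behaviour policy actually takes next (on-policy); Q-learning bootstraps from the greedy action, whatever the behaviour policy does (off-policy).

```python
import random

def epsilon_greedy(Q, s, epsilon, rng):
    # Behaviour policy: explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return rng.choice(list(Q[s]))
    return max(Q[s], key=Q[s].get)

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy TD control: target uses Q(s', a') for the action actually chosen in s'.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy TD control: target uses max_a' Q(s', a'), the greedy action in s'.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# One made-up transition: in "A" take "right", get reward 0 and land in "B",
# where the epsilon-greedy policy happened to explore "left".
Q_sarsa = {"A": {"left": 0.0, "right": 0.0}, "B": {"left": 1.0, "right": 2.0}}
Q_qlearn = {s: dict(acts) for s, acts in Q_sarsa.items()}  # independent copy
sarsa_update(Q_sarsa, "A", "right", 0.0, "B", "left", alpha=0.5, gamma=0.9)
q_learning_update(Q_qlearn, "A", "right", 0.0, "B", alpha=0.5, gamma=0.9)
# Q_sarsa["A"]["right"] ≈ 0.45 (bootstraps from Q(B, left) = 1.0)
# Q_qlearn["A"]["right"] ≈ 0.9 (bootstraps from max_a Q(B, a) = 2.0)
```

The same transition gives different updates because only SARSA "pays" for the exploratory move in `B`.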


Isn’t this like a 3-month-old trend at this point?

Wait, how do I fit them in proportion?




OK, quite a solid lecture. Prediction is definitely the slightly easier problem to solve, imo. Two techniques covered, plus an intermediate approach. MC methods depend on episodic tasks and have lower bias, but their estimates can vary wildly (high variance). TD(0) is better on variance as well as working for…

Model-free RL begins

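A small sketch contrasting the two prediction targets, assuming a recorded episode stored as `(state, reward)` pairs where each reward is the one received on leaving that state (this format and the helper names `mc_update`, `td0_update` are assumptions). MC waits for the full return G_t, so the episode has to finish; TD(0) bootstraps from the current estimate of the next state.

```python
def mc_update(V, episode, alpha, gamma):
    # Every-visit, constant-alpha Monte Carlo: the target is the actual return G_t,
    # accumulated backwards from the end of the finished episode.
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])

def td0_update(V, episode, alpha, gamma):
    # TD(0): the target r + gamma * V(s') only needs the next state,
    # so each of these updates could be applied online, one step at a time.
    for t, (s, r) in enumerate(episode):
        v_next = V[episode[t + 1][0]] if t + 1 < len(episode) else 0.0  # terminal value is 0
        V[s] += alpha * (r + gamma * v_next - V[s])

episode = [("A", 0.0), ("B", 1.0)]  # made-up episode: A -> B -> terminal, reward 1 at the end
V_mc = {"A": 0.0, "B": 0.0}
V_td = {"A": 0.0, "B": 0.0}
mc_update(V_mc, episode, alpha=0.5, gamma=1.0)
td0_update(V_td, episode, alpha=0.5, gamma=1.0)
# V_mc == {"A": 0.5, "B": 0.5}: both states see the final reward through G.
# V_td == {"A": 0.0, "B": 0.5}: A bootstrapped from V(B), which was still 0.
```

On this single episode MC already pushes the final reward back to `A`, while TD(0) needs more episodes for it to propagate; in exchange, each TD target depends on one transition instead of the whole rest of the episode.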



OK, this was definitely more fun: value iteration, policy iteration and truncated policy iteration. Good algorithms that I'd like to implement as well:
- Value iteration looked simplest: start with a value function, update the policy greedily based on action values and get a new…
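A minimal value iteration sketch on a hypothetical 3-state chain (the model `P`, its `(probability, next_state, reward)` format, `gamma = 0.9` and the helper names are assumptions for illustration). Each sweep applies the Bellman optimality backup, which is the same as acting greedily on the current action values; the greedy policy is read off at the end.

```python
# Hypothetical chain: 0 -> 1 -> 2 (terminal). Moving from 1 into 2 pays reward 1.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "move": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions, value stays 0
}

def q_value(P, V, s, a, gamma):
    # One-step lookahead: expected reward plus discounted value of the successor.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, theta=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        # Synchronous sweep of the Bellman optimality backup: V(s) <- max_a q(s, a).
        V_new = {s: max((q_value(P, V, s, a, gamma) for a in P[s]), default=0.0) for s in P}
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < theta:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P if P[s]}
    return V, policy

V, policy = value_iteration(P)
# V ≈ {0: 0.9, 1: 1.0, 2: 0.0}, policy == {0: "move", 1: "move"}
```

Policy iteration would instead alternate a full policy evaluation with a greedy improvement step; value iteration collapses both into the single max.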


Not learning much today. Mostly revised the Bellman equations and resolved some questions. Also got some idea of what's next: model-based methods like value and policy iteration. Will try to do those next.


OK, kinda get the concept. The key takeaway is understanding 2 things:
- Evaluate a policy using value functions. These are the Bellman equations.
- Find the optimal policy by finding the optimal value functions. These are covered by the Bellman optimality equations.
Solving…

Already got wrecked by MDPs 😢

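For the first point (evaluating a policy), a minimal iterative policy evaluation sketch, assuming a hypothetical two-state MDP and a uniform random policy (`P`, `PI`, `policy_evaluation` and `gamma = 0.9` are illustrative assumptions). Each sweep applies the Bellman expectation equation v_π(s) = Σ_a π(a|s) Σ_{s', r} p(s', r | s, a) [r + γ v_π(s')] until the values stop changing.

```python
# Hypothetical model: P[s][a] = list of (probability, next_state, reward).
P = {
    "A": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "B", 0.0)]},
    "B": {"left": [(1.0, "A", 0.0)], "right": [(1.0, "T", 1.0)]},
    "T": {},  # terminal: no actions, value fixed at 0
}
# Uniform random policy pi(a|s) over each non-terminal state's actions.
PI = {s: {a: 1.0 / len(acts) for a in acts} for s, acts in P.items() if acts}

def policy_evaluation(P, pi, gamma=0.9, theta=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in pi:
            # Bellman expectation backup, applied in place.
            v_new = sum(
                pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

V = policy_evaluation(P, PI)
```

For this MDP the two Bellman equations can also be solved by hand as a linear system: v(A) = 90/139 ≈ 0.647 and v(B) = 110/139 ≈ 0.791, which the sweeps should approach. Replacing the average over actions with a max turns this backup into the Bellman optimality one.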



Finished watching this and understood the basics: agent, environment, action, reward, history, state, Markov states, policy, value function, model. TL;DR: RL tries to find a policy (which action to take in each state) that maximises expected cumulative reward. Can be done using: value-function-only methods, policy…

Starting RL with this, wish me luck!




Ignorance is bliss


Downfall incoming after promo 😔


Is it just me, or is the PL broadcast quality on @HotstarReality just shit this season?

