A Fellow Struggler
@BotanicBinary
Just another guy trying to make machines learn to eliminate us. Currently @microsoft to build "another Copilot"
Thanks for this @cneuralnetwork, I will take this up personally. Broad aims:
- end-to-end understanding of the ML side of modern AI, mostly the post-training and RL side of things. Includes detailed equations, paper readings and code implementations.
- get into the…
I want to start something small but powerful: a movement called "180 Days of Whatever". Here’s the idea: for the next 180 days, you’ll do two things:
- set one goal you’re determined to achieve in these 6 months, big or small, personal or professional
- show up daily:…
You know winter is here when your US meetings are now postponed by 1 hr from the usual. Daylight saving time has arrived
Ok, learnt about value function approximation with parameterized functions. Moved away from tabular RL here. Now basic MC and TD control work well with linear approximators only. Since we directly update w, there is a ripple effect across states and hence convergence is tricky. Came to know of…
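Sketching it for myself: semi-gradient TD(0) with a linear value function on a toy 5-state random walk. The environment, features, and hyperparameters are my own made-up example, not from the lecture; the point is just that every update to w moves the estimate of every state sharing those features, the ripple effect mentioned above.

```python
import numpy as np

# Semi-gradient TD(0) with linear v(s) = w . x(s) on a 5-state random
# walk: states 0..4, step left/right at random, reward 1 for exiting
# right, 0 for exiting left. True values are (s + 1) / 6.
np.random.seed(0)
n_states = 5
x = lambda s: np.array([1.0, s / (n_states - 1)])  # bias + position feature
w = np.zeros(2)
alpha, gamma = 0.05, 1.0

for _ in range(2000):
    s = 2                           # every episode starts in the middle
    while True:
        s2 = s + np.random.choice([-1, 1])
        if s2 < 0:                  # left exit: reward 0, terminal
            r, v2, done = 0.0, 0.0, True
        elif s2 >= n_states:        # right exit: reward 1, terminal
            r, v2, done = 1.0, 0.0, True
        else:
            r, v2, done = 0.0, w @ x(s2), False
        # semi-gradient TD(0): w += alpha * td_error * grad_w v(s),
        # and grad_w v(s) is just x(s) for a linear approximator
        w += alpha * (r + gamma * v2 - w @ x(s)) * x(s)
        if done:
            break
        s = s2

v = lambda s: w @ x(s)
print([round(float(v(s)), 2) for s in range(n_states)])
```

Since every state's value is read off the same two weights, one TD update from state 2 also shifts the estimates for states 0 and 4, which is exactly why convergence analysis gets harder than in the tabular case.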
At least take a blue tick before ragebaiting. Here’s my two cents: every fandom (football, tennis, F1) looks “elite” from a distance. Delve deep and it’s the same shit everywhere. It’s just that cricket has more Indian shit and people just find it cool to stray away from the…
the difference between cricket and football fans is that football fans are mature while most cricket fans are just boors and country bumpkins
Was surprised YouTube even recommended this. Air quality in India is so bad… oh sorry, it’s Dubai. Would have got millions of views with the other title
Good lecture, shows the SARSA and Q-learning algorithms. Also clarified the on-policy vs off-policy distinction. Really need to implement all these algorithms this weekend. Monte Carlo prediction and control (usually on-policy since off-policy is super unstable), TD(0) prediction and…
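A first pass at the weekend implementation: tabular SARSA vs Q-learning side by side on a tiny chain MDP of my own (states 0..3, move left/right, reward 1 for reaching state 3). The only line that differs is the backup target, which is the whole on-policy/off-policy distinction.

```python
import random

random.seed(0)
N, GOAL = 4, 3
alpha, gamma, eps = 0.5, 0.9, 0.1
A = [1, -1]                          # actions: right, left

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def eps_greedy(Q, s):
    if random.random() < eps:
        return random.choice(A)
    return max(A, key=lambda a: Q[(s, a)])

def train(rule):
    Q = {(s, a): 0.0 for s in range(N) for a in A}
    for _ in range(500):
        s, a, done = 0, eps_greedy(Q, 0), False
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2)
            if rule == "sarsa":      # on-policy: bootstrap from the action actually taken
                target = r + (0.0 if done else gamma * Q[(s2, a2)])
            else:                    # q-learning, off-policy: bootstrap from the greedy action
                target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in A))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

for rule in ("sarsa", "qlearning"):
    Q = train(rule)
    print(rule, round(Q[(0, 1)], 2))  # value of heading right from the start state
```

On this toy problem both land near the optimal Q(0, right) = gamma² = 0.81; SARSA sits slightly below it because its targets include the eps-greedy exploration steps.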
Isn’t this trend like 3 months old at this point?
Ok, quite a solid lecture. Prediction is definitely the slightly easier problem to solve imo. Two techniques covered and then an intermediate way. MC methods depend on episodic tasks and have lower bias but can vary wildly. TD(0) is better on variance as well as working for…
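The two prediction updates, written out on a toy 3-state chain of my own (s0 → s1 → s2 → terminal, reward +1 each step, gamma = 0.9, so the true values are 2.71, 1.9, 1.0). The chain is deterministic here, so this only shows the update rules; the bias/variance contrast shows up once returns are stochastic.

```python
gamma, alpha, episodes = 0.9, 0.05, 3000

def run_episode():
    # list of (state, reward) transitions; episode terminates after s2
    return [(0, 1.0), (1, 1.0), (2, 1.0)]

# Monte Carlo (every-visit): update toward the full observed return G.
# Unbiased, but G sums the noise of the whole episode -> high variance.
V_mc = [0.0, 0.0, 0.0]
for _ in range(episodes):
    G = 0.0
    for s, r in reversed(run_episode()):
        G = r + gamma * G
        V_mc[s] += alpha * (G - V_mc[s])

# TD(0): update toward r + gamma * V(next state).
# Bootstrapping biases the target but uses only one step of noise.
V_td = [0.0, 0.0, 0.0]
for _ in range(episodes):
    traj = run_episode()
    for i, (s, r) in enumerate(traj):
        v_next = V_td[traj[i + 1][0]] if i + 1 < len(traj) else 0.0
        V_td[s] += alpha * (r + gamma * v_next - V_td[s])

print([round(v, 2) for v in V_mc], [round(v, 2) for v in V_td])
```

Note TD(0) never needs the episode to finish before updating, which is why it also works for continuing tasks where MC can't.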
Ok, this was definitely more fun. Value iteration, policy iteration and truncated policy iteration. Good algorithms and I would like to implement them as well:
- Value iteration looked simplest: start with a value function, update the policy greedily based on action values and get a new…
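Value iteration really is short once written down. A sketch on a made-up 4-state chain MDP (states 0..3, move left/right, reward 1 for entering the terminal state 3, gamma = 0.9): sweep the Bellman optimality backup until the values stop changing, then read the greedy policy off the one-step lookahead.

```python
import numpy as np

gamma, N = 0.9, 4
V = np.zeros(N)                       # V[3] stays 0 (terminal)

def step(s, a):                       # deterministic model of the chain
    s2 = min(max(s + a, 0), N - 1)
    r = 1.0 if (s2 == N - 1 and s != N - 1) else 0.0
    return s2, r

def q(s, a):                          # one-step lookahead under the current V
    s2, r = step(s, a)
    return r + gamma * (0.0 if s2 == N - 1 else V[s2])

for _ in range(100):                  # Bellman optimality backup, full sweeps
    V = np.array([max(q(s, a) for a in (-1, 1)) if s < N - 1 else 0.0
                  for s in range(N)])

policy = [max((-1, 1), key=lambda a: q(s, a)) for s in range(N - 1)]
print(np.round(V, 2), policy)
```

Here it converges exactly to V = [0.81, 0.9, 1.0, 0.0] with the greedy policy heading right everywhere; on this toy the 100 sweeps are overkill, and a real implementation would stop once the largest value change in a sweep falls below a threshold.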
Not learning much today. Mostly revision of Bellman equations and resolving some questions. Also got some idea of what’s next: model-based methods like value and policy iteration. Will try to do that next
Ok, kinda get the concept. The key takeaway is being able to do 2 things:
- Evaluate a policy using value functions. These are the Bellman equations.
- Find the optimal policy by finding the optimal value functions. These are covered by the Bellman optimality equations.
Solving…
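One thing that made the first point click for me: for a fixed policy the Bellman expectation equation is linear, v = R + gamma·P·v, so a small MDP can be evaluated exactly as v = (I − gamma·P)⁻¹R. The 3-state MRP below is my own toy, not from the lecture.

```python
import numpy as np

gamma = 0.9
# State-to-state transition matrix under the fixed policy;
# state 2 is absorbing with zero reward.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 1.0, 0.0])   # expected immediate reward per state

# Bellman expectation equation v = R + gamma P v  <=>  (I - gamma P) v = R
v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(np.round(v, 2))  # [1.9, 1.0, 0.0]: v(0) = 1 + 0.9 * v(1), etc.
```

The Bellman optimality equations swap the expectation for a max over actions, which makes them nonlinear; that's why they need iterative methods (value/policy iteration) instead of one linear solve.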
Completed watching this and understood the basics: agent, environment, action, reward, history, state, Markov states, policy, value function, model. TL;DR: RL tries to find a set of actions (a policy) that maximises expected reward. Can be done using: value-function-only methods, policy…