
Program Counter

@program_counter

all things toward agi

Program Counter reposted

I enjoy Sam Schillace's weekly AI-positive posts like this one: open.substack.com/pub/sundaylett…


Program Counter reposted

I'm teaching a new "Intro to Modern AI" course at CMU this Spring: modernaicourse.org. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people – "AI" usually means something much broader in academic…
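
As a rough illustration of the "from PyTorch" starting point (my own minimal sketch, not the course's actual code), a tiny causal Transformer language model fits in a few dozen lines:

```python
# Minimal character-level causal LM in PyTorch -- a sketch of the kind of
# "chatbot from scratch" starting point the course description suggests.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (batch, seq)
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))  # causal mask
        return self.head(self.blocks(x, mask=mask))           # next-token logits

model = TinyLM(vocab_size=256)
tokens = torch.randint(0, 256, (1, 16))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256), tokens[:, 1:].reshape(-1))
```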


Program Counter reposted

looked it up because Anthropic has it in their job postings


really enjoying reading the trio tutorial trio.readthedocs.io/en/stable/tuto…
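
The core idea in that tutorial is trio's structured concurrency: tasks live inside a nursery and cannot outlive it. A minimal example in that spirit (my own, condensed from the pattern the tutorial teaches):

```python
import trio

async def child(name: str, delay: float) -> None:
    await trio.sleep(delay)                      # cooperative, non-blocking sleep
    print(f"{name} finished after {delay}s")

async def main() -> None:
    # The nursery scopes concurrency: main() can't return until both children finish,
    # and an exception in either child cancels the other.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, "task-a", 1.0)
        nursery.start_soon(child, "task-b", 0.5)

trio.run(main)
```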



Program Counter reposted

Check out our new work AdaptiveNN, an active visual reasoning framework. It learns where to look via self-rewarding RL (no external rewards!) and integrates evidence across sequential fixations. Up to 28× lower inference cost and more human-like vision. nature.com/articles/s4225…
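
The paper is the reference for the actual method; purely to make "sequential fixations with early stopping" concrete, here is a hypothetical skeleton of that kind of inference loop (all names and the stopping rule are illustrative, not AdaptiveNN's):

```python
import torch

def fixation_loop(image, glimpse_encoder, where_policy, classifier,
                  max_fixations: int = 8, confidence: float = 0.9, glimpse: int = 64):
    """Illustrative sequential-fixation inference: look, integrate, decide, maybe stop early.

    NOT the AdaptiveNN implementation -- just a sketch of the idea in the abstract above.
    """
    state = None
    y, x = 0, 0                                            # initial fixation (arbitrary)
    for _ in range(max_fixations):
        patch = image[..., y:y + glimpse, x:x + glimpse]   # crop a glimpse around the fixation
        state = glimpse_encoder(patch, state)              # integrate evidence across fixations
        probs = classifier(state).softmax(-1)
        if probs.max() > confidence:                       # stop once confident -> cheaper inference
            break
        y, x = where_policy(state)                         # decide where to look next
    return probs
```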


Program Counter reposted

Who Invented Transformer Neural Networks (the T in ChatGPT)? Timeline of Transformer evolution people.idsia.ch/~juergen/who-i… ★ 1991. Original tech report on what's now called the unnormalized linear Transformer (ULTRA)[FWP0][ULTRA]. KEY/VALUE was called FROM/TO. ULTRA uses outer…
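
For readers who haven't seen the fast-weight view: unnormalized linear attention can be written as an outer-product update to a "fast weight" matrix, which each query then reads out. A minimal sketch of that formulation (mine, not code from the linked article):

```python
import torch

def unnormalized_linear_attention(Q, K, V):
    """Causal linear attention without softmax, written as fast-weight outer-product updates.

    Q, K, V: (seq, dim). Equivalent to out_t = sum_{s<=t} (k_s . q_t) * v_s.
    """
    W = torch.zeros(V.size(1), K.size(1))     # the "fast weight" matrix, starts empty
    outputs = []
    for q, k, v in zip(Q, K, V):
        W = W + torch.outer(v, k)             # write: add the outer product of value and key
        outputs.append(W @ q)                 # read: the query retrieves a mix of stored values
    return torch.stack(outputs)

Q = K = V = torch.randn(5, 8)
print(unnormalized_linear_attention(Q, K, V).shape)   # torch.Size([5, 8])
```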


Program Counter reposted

the kimi k2 creative writing RL rubric clamps down on qualifiers and justifications, making it “confident and assertive, even in contexts involving ambiguity or subjectivity.” helps explain k2's muscular writing style / aesthetic risk taking @dbreunig on subjective rubrics:


Program Counter reposted

Congrats to Yanshu!🥳 He is also looking for PhD positions!

Excited to share that I have 2 first-authored papers accepted at @RealAAAI ! And both are selected as Oral presentations!🎉🎉🎉 My research focuses on enhancing the reasoning abilities of MLLMs in complex scenarios, especially in vision-language tasks. I would like to bridge the…



Program Counter reposted

RL LEARNING WITH LORA: A DIVERSE DEEP DIVE


Program Counter reposted

The 2006 RBM paper convinced Quoc Le (and other visionaries) to ditch kernel methods and go deep. Back in the old days, the deep learning interview question was how to train a DBM, and people wrote backprop and wake-sleep by hand (I did, in Matlab and then cudamat). SVM, Gaussian processes and LDA still…

People often take deep learning as synonymous with backprop, but deep networks were originally trained with probabilistic energy-based methods! Found this great talk by Hinton from 2012 about EBMs, Boltzmann machines, and deep belief nets at the start of the deep learning era.
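
For concreteness, this is roughly what "training with an energy-based method" meant in practice: a hand-written contrastive-divergence update for a binary RBM. A minimal CD-1 sketch (mine, in PyTorch rather than the Matlab/cudamat of the era):

```python
import torch

def cd1_step(v0, W, b_vis, b_hid, lr: float = 0.01):
    """One contrastive-divergence (CD-1) update for a binary RBM. v0: (batch, n_vis) in {0,1}."""
    # Positive phase: infer hidden units from the data.
    p_h0 = torch.sigmoid(v0 @ W + b_hid)
    h0 = torch.bernoulli(p_h0)
    # Negative phase: one Gibbs step -- reconstruct the visibles, re-infer the hiddens.
    v1 = torch.bernoulli(torch.sigmoid(h0 @ W.t() + b_vis))
    p_h1 = torch.sigmoid(v1 @ W + b_hid)
    # Gradient estimate: data correlations minus model correlations.
    batch = v0.size(0)
    W += lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / batch
    b_vis += lr * (v0 - v1).mean(0)
    b_hid += lr * (p_h0 - p_h1).mean(0)
```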



Program Counter reposted

Your program writes to a file. `write()` succeeds. `close()` returns 0. So your data is safely on disk, right? 🤨 Not so fast. This is one of the most common and dangerous traps in Linux I/O. A thread on silent data loss and the `close()` syscall (#3 on x86_64). 🧵👇
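
The thread has the details; the short version is that a successful write() and close() only hand your bytes to the kernel's page cache. Sketched with Python's os module (which maps straight onto those syscalls), the durable pattern looks roughly like this:

```python
import os

fd = os.open("data.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    buf = b"important bytes\n"
    assert os.write(fd, buf) == len(buf)   # "success" here only means the page cache accepted it
    os.fsync(fd)                           # ask the kernel to reach stable storage; raises OSError on failure
finally:
    os.close(fd)                           # close() itself can also surface deferred write errors
```

For newly created files, making the file name durable additionally requires an fsync on the containing directory.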


Program Counter reposted

The science of learning has advanced significantly over the past century. Numerous effective cognitive learning strategies have been identified and researched extensively since the early to mid-1900s, with key findings being successfully reproduced over and over again.


Program Counter reposted

WOW that's VERY COOL!!! cc @Jianlin_S -- we need a blog on steepest descent tricks like truncation

How did we improve the sensitivity to learning rates? MuonAdam/MuonMax are steepest descent methods, thus we can import tricks such as truncation. Truncation changes the steepest descent model, by making use of a known lower bound on the loss. Scaling laws give us a lower bound
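
The tweet is about MuonAdam/MuonMax specifically; as a generic illustration of the truncation idea (not their method), the classical Polyak-style rule caps the step using a known lower bound on the loss:

```python
import torch

def truncated_gd_step(params, loss, f_low: float, lr_max: float = 1.0):
    """Plain gradient descent with a Polyak-style truncated step size.

    step = min(lr_max, (f(x) - f_low) / ||grad||^2), so knowing a lower bound f_low
    on the loss keeps the step from overshooting, whatever the base learning rate is.
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    step = torch.clamp((loss.detach() - f_low) / (grad_sq + 1e-12), max=lr_max)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= step * g
```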



Program Counter reposted

Quantization is the wild west of large scale deep learning. Every library supports different combinations of options:
- Activations quantized or not
- Integer or floating point
- Precisions: 2, 4, 6, 8, etc
- Block / group / tensor scaling
- Data-dependent / independent quants…
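
As a tiny, library-agnostic illustration of just one point in that design space (my own sketch, not any particular library's scheme): symmetric int8 weight quantization with one scale per block.

```python
import torch

def quantize_int8_blockwise(w: torch.Tensor, block: int = 64):
    """Symmetric int8 quantization of a 1-D tensor with a per-block scale (weight-only)."""
    w = w.reshape(-1, block)                                    # assumes numel % block == 0
    scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q.float() * scale).reshape(-1)

w = torch.randn(4096)
q, scale = quantize_int8_blockwise(w)
print((w - dequantize(q, scale)).abs().max())                   # max quantization error
```

Swap any of the options in the list above (quantized activations, floating-point formats, group sizes, data-dependent scales) and you get a different, usually incompatible, scheme -- which is exactly the "wild west" problem.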


Program Counter reposted

The highest-performing developers I worked with at Amazon asked better questions than everyone else. After 18 years in tech, here's what I learned: while average engineers jump to solutions, exceptional ones pause to ask the right questions first. The 6 questions that separated…


Program Counter reposted

Most robotics companies do not design their own 🧠, except Tesla.

Designing chips in-house unlocks absolute efficiency that no off-the-shelf part can match

AI5 has potential to be 50x more performant than AI4 (our current hardware) – working toward mass production in 2027

It will be used in vehicles, robotics, training & data centers



Program Counter reposted

Designing an inference chip for robots is actually very difficult. In data centers, each chip is bathed in a jacuzzi and babysat by nannies; if one dies, it gets hot-swapped by one of its clones. The fault rate of GPUs in data centers is actually quite high. Industry average…


Program Counter reposted

New weekend blogpost. Some light PTX exploration, and a simple Top-K kernel.


Program Counter reposted

I'm hiring a student researcher for next summer at the intersection of MARL x LLM. If you have a strong background and research experience in MARL algorithms, please apply and drop me an email (so that I know you've applied!) google.com/about/careers/…


Program Counter reposted

When we began applying diffusion to language in my lab at Stanford, many doubted it could work. That research became Mercury diffusion LLM: 10X faster, more efficient, and now the foundation of @_inception_ai. Proud to raise $50M with support from top investors.

Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in…
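
Purely to illustrate what "generate text in parallel" can mean (a generic toy sketch, not Mercury's actual method): a masked-diffusion-style decoding loop predicts every position at once and commits the most confident ones each refinement step.

```python
import torch

MASK = 0  # illustrative mask token id

def parallel_unmask_decode(model, length: int = 16, steps: int = 4):
    """Toy non-autoregressive decoding: fill all positions over a few parallel refinement steps."""
    tokens = torch.full((1, length), MASK)
    for _ in range(steps):
        probs = model(tokens).softmax(-1)              # (1, length, vocab): predict every position at once
        conf, pred = probs.max(-1)
        conf = conf.masked_fill(tokens != MASK, -1.0)  # never revisit already-committed tokens
        commit = conf.topk(length // steps, dim=-1).indices
        tokens.scatter_(1, commit, pred.gather(1, commit))
    return tokens
```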


