
Program Counter

@program_counter

all things toward agi

Program Counter reposted

I enjoy Sam Schillace's weekly AI-positive posts like this one: open.substack.com/pub/sundaylett…


Program Counter reposted

I'm teaching a new "Intro to Modern AI" course at CMU this Spring: modernaicourse.org. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people – "AI" usually means something much broader in academic…
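
As a rough illustration of the "from PyTorch" starting point (my own minimal sketch, not the course's actual code), a tiny causal Transformer language model fits in a few dozen lines:

```python
# Minimal character-level causal LM in PyTorch -- a sketch of the kind of
# "chatbot from scratch" starting point the course description suggests.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128, n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (batch, seq)
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))  # causal mask
        return self.head(self.blocks(x, mask=mask))           # next-token logits

model = TinyLM(vocab_size=256)
tokens = torch.randint(0, 256, (1, 16))
logits = model(tokens)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 256), tokens[:, 1:].reshape(-1))
```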


Program Counter reposted

looked it up because Anthropic has it in their job postings


really enjoying reading the trio tutorial trio.readthedocs.io/en/stable/tuto…
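
The core idea in that tutorial is trio's structured concurrency: tasks live inside a nursery and cannot outlive it. A minimal example in that spirit (my own, condensed from the pattern the tutorial teaches):

```python
import trio

async def child(name: str, delay: float) -> None:
    await trio.sleep(delay)                      # cooperative, non-blocking sleep
    print(f"{name} finished after {delay}s")

async def main() -> None:
    # The nursery scopes concurrency: main() can't return until both children finish,
    # and an exception in either child cancels the other.
    async with trio.open_nursery() as nursery:
        nursery.start_soon(child, "task-a", 1.0)
        nursery.start_soon(child, "task-b", 0.5)

trio.run(main)
```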



Program Counter reposted

Check out our new work AdaptiveNN, an active visual reasoning framework. It learns where to look via self-rewarding RL (no external rewards!) and integrates evidence across sequential fixations. Up to 28× lower inference cost and more human-like vision. nature.com/articles/s4225…
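
The paper is the reference for the actual method; purely to make "sequential fixations with early stopping" concrete, here is a hypothetical skeleton of that kind of inference loop (all names and the stopping rule are illustrative, not AdaptiveNN's):

```python
import torch

def fixation_loop(image, glimpse_encoder, where_policy, classifier,
                  max_fixations: int = 8, confidence: float = 0.9, glimpse: int = 64):
    """Illustrative sequential-fixation inference: look, integrate, decide, maybe stop early.

    NOT the AdaptiveNN implementation -- just a sketch of the idea in the abstract above.
    """
    state = None
    y, x = 0, 0                                            # initial fixation (arbitrary)
    for _ in range(max_fixations):
        patch = image[..., y:y + glimpse, x:x + glimpse]   # crop a glimpse around the fixation
        state = glimpse_encoder(patch, state)              # integrate evidence across fixations
        probs = classifier(state).softmax(-1)
        if probs.max() > confidence:                       # stop once confident -> cheaper inference
            break
        y, x = where_policy(state)                         # decide where to look next
    return probs
```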


Program Counter reposted

Who Invented Transformer Neural Networks (the T in ChatGPT)? Timeline of Transformer evolution people.idsia.ch/~juergen/who-i… ★ 1991. Original tech report on what's now called the unnormalized linear Transformer (ULTRA)[FWP0][ULTRA]. KEY/VALUE was called FROM/TO. ULTRA uses outer…
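
For readers who haven't seen the fast-weight view: unnormalized linear attention can be written as an outer-product update to a "fast weight" matrix, which each query then reads out. A minimal sketch of that formulation (mine, not code from the linked article):

```python
import torch

def unnormalized_linear_attention(Q, K, V):
    """Causal linear attention without softmax, written as fast-weight outer-product updates.

    Q, K, V: (seq, dim). Equivalent to out_t = sum_{s<=t} (k_s . q_t) * v_s.
    """
    W = torch.zeros(V.size(1), K.size(1))     # the "fast weight" matrix, starts empty
    outputs = []
    for q, k, v in zip(Q, K, V):
        W = W + torch.outer(v, k)             # write: add the outer product of value and key
        outputs.append(W @ q)                 # read: the query retrieves a mix of stored values
    return torch.stack(outputs)

Q = K = V = torch.randn(5, 8)
print(unnormalized_linear_attention(Q, K, V).shape)   # torch.Size([5, 8])
```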


Program Counter reposted

the kimi k2 creative writing RL rubric clamps down on qualifiers and justifications, making it “confident and assertive, even in contexts involving ambiguity or subjectivity.” helps explain k2's muscular writing style / aesthetic risk taking @dbreunig on subjective rubrics:


Program Counter reposted

Congrats to Yanshu!🥳 He is also looking for PhD positions!

Excited to share that I have 2 first-authored papers accepted at @RealAAAI ! And both are selected as Oral presentations!🎉🎉🎉 My research focuses on enhancing the reasoning abilities of MLLMs in complex scenarios, especially in vision-language tasks. I would like to bridge the…



Program Counter reposted

RL LEARNING WITH LORA: A DIVERSE DEEP DIVE


Program Counter reposted

The 2006 RBM paper convinced Quoc Le (and other visionaries) to ditch kernel methods and go deep. Back in the old days, the deep learning interview question was how to train a DBM, and people wrote backprop and wake-sleep by hand (I did, in Matlab and then cudamat). SVM, Gaussian processes and LDA still…

People often take deep learning as synonymous with backprop, but deep networks were originally trained with probabilistic energy-based methods! Found this great talk by Hinton from 2012 about EBMs, Boltzmann machines, and deep belief nets at the start of the deep learning era.
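
For concreteness, this is roughly what "training with an energy-based method" meant in practice: a hand-written contrastive-divergence update for a binary RBM. A minimal CD-1 sketch (mine, in PyTorch rather than the Matlab/cudamat of the era):

```python
import torch

def cd1_step(v0, W, b_vis, b_hid, lr: float = 0.01):
    """One contrastive-divergence (CD-1) update for a binary RBM. v0: (batch, n_vis) in {0,1}."""
    # Positive phase: infer hidden units from the data.
    p_h0 = torch.sigmoid(v0 @ W + b_hid)
    h0 = torch.bernoulli(p_h0)
    # Negative phase: one Gibbs step -- reconstruct the visibles, re-infer the hiddens.
    v1 = torch.bernoulli(torch.sigmoid(h0 @ W.t() + b_vis))
    p_h1 = torch.sigmoid(v1 @ W + b_hid)
    # Gradient estimate: data correlations minus model correlations.
    batch = v0.size(0)
    W += lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / batch
    b_vis += lr * (v0 - v1).mean(0)
    b_hid += lr * (p_h0 - p_h1).mean(0)
```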



Program Counter reposted

Your program writes to a file. `write()` succeeds. `close()` returns 0. So your data is safely on disk, right? 🤨 Not so fast. This is one of the most common and dangerous traps in Linux I/O. A thread on silent data loss and the `close()` syscall (#3 on x86_64). 🧵👇
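
The thread has the details; the short version is that a successful write() and close() only hand your bytes to the kernel's page cache. Sketched with Python's os module (which maps straight onto those syscalls), the durable pattern looks roughly like this:

```python
import os

fd = os.open("data.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    buf = b"important bytes\n"
    assert os.write(fd, buf) == len(buf)   # "success" here only means the page cache accepted it
    os.fsync(fd)                           # ask the kernel to reach stable storage; raises OSError on failure
finally:
    os.close(fd)                           # close() itself can also surface deferred write errors
```

For newly created files, making the file name durable additionally requires an fsync on the containing directory.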


Program Counter reposted

The science of learning has advanced significantly over the past century. Numerous effective cognitive learning strategies have been identified and researched extensively since the early to mid-1900s, with key findings being successfully reproduced over and over again.


Program Counter reposted

WOW that's VERY COOL!!! cc @Jianlin_S -- we need a blog on steepest descent tricks like truncation

How did we improve the sensitivity to learning rates? MuonAdam/MuonMax are steepest descent methods, thus we can import tricks such as truncation. Truncation changes the steepest descent model, by making use of a known lower bound on the loss. Scaling laws give us a lower bound
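
The tweet is about MuonAdam/MuonMax specifically; as a generic illustration of the truncation idea (not their method), the classical Polyak-style rule caps the step using a known lower bound on the loss:

```python
import torch

def truncated_gd_step(params, loss, f_low: float, lr_max: float = 1.0):
    """Plain gradient descent with a Polyak-style truncated step size.

    step = min(lr_max, (f(x) - f_low) / ||grad||^2), so knowing a lower bound f_low
    on the loss keeps the step from overshooting, whatever the base learning rate is.
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    step = torch.clamp((loss.detach() - f_low) / (grad_sq + 1e-12), max=lr_max)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= step * g
```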



Program Counter reposted

Quantization is the wild west of large scale deep learning. Every library supports different combinations of options:
- Activations quantized or not
- Integer or floating point
- Precisions: 2, 4, 6, 8, etc
- Block / group / tensor scaling
- Data-dependent / independent quants…
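
As a tiny, library-agnostic illustration of just one point in that design space (my own sketch, not any particular library's scheme): symmetric int8 weight quantization with one scale per block.

```python
import torch

def quantize_int8_blockwise(w: torch.Tensor, block: int = 64):
    """Symmetric int8 quantization of a 1-D tensor with a per-block scale (weight-only)."""
    w = w.reshape(-1, block)                                    # assumes numel % block == 0
    scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q.float() * scale).reshape(-1)

w = torch.randn(4096)
q, scale = quantize_int8_blockwise(w)
print((w - dequantize(q, scale)).abs().max())                   # max quantization error
```

Swap any of the options in the list above (quantized activations, floating-point formats, group sizes, data-dependent scales) and you get a different, usually incompatible, scheme -- which is exactly the "wild west" problem.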


Program Counter reposted

The highest-performing developers I worked with at Amazon asked better questions than everyone else. After 18 years in tech, here's what I learned: while average engineers jump to solutions, exceptional ones pause to ask the right questions first. The 6 questions that separated…


Program Counter reposted

Most robotics companies do not design their own 🧠, except Tesla.

Designing chips in-house unlocks absolute efficiency that no off-the-shelf part can match

AI5 has potential to be 50x more performant than AI4 (our current hardware) – working toward mass production in 2027

It will be used in vehicles, robotics, training & data centers



Program Counter reposted

Designing an inference chip for robots is actually very difficult. In data centers, each chip is bathed in a jacuzzi and babysat by nannies; if one dies, it gets hot-swapped by one of its clones. The fault rate of GPUs in data centers is actually quite high. Industry average…


Program Counter reposted

New weekend blogpost. Some light PTX exploration, and a simple Top-K kernel.


Program Counter reposted

I'm hiring a student researcher for next summer at the intersection of MARL x LLM. If you have a strong background and research experience in MARL algorithms, please apply and drop me an email (so that I know you've applied!) google.com/about/careers/…


Program Counter reposted

When we began applying diffusion to language in my lab at Stanford, many doubted it could work. That research became Mercury diffusion LLM: 10X faster, more efficient, and now the foundation of @_inception_ai. Proud to raise $50M with support from top investors.

Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in…
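
Purely to illustrate what "generate text in parallel" can mean (a generic toy sketch, not Mercury's actual method): a masked-diffusion-style decoding loop predicts every position at once and commits the most confident ones each refinement step.

```python
import torch

MASK = 0  # illustrative mask token id

def parallel_unmask_decode(model, length: int = 16, steps: int = 4):
    """Toy non-autoregressive decoding: fill all positions over a few parallel refinement steps."""
    tokens = torch.full((1, length), MASK)
    for _ in range(steps):
        probs = model(tokens).softmax(-1)              # (1, length, vocab): predict every position at once
        conf, pred = probs.max(-1)
        conf = conf.masked_fill(tokens != MASK, -1.0)  # never revisit already-committed tokens
        commit = conf.topk(length // steps, dim=-1).indices
        tokens.scatter_(1, commit, pred.gather(1, commit))
    return tokens
```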


