
Quan Steven

@robot_steven_07

Quan Steven reposted

@TheAITimeline: 🚨This week’s top AI/ML research papers (big week!):

LM/Transformers
- Mixture of A Million Experts
- Vision language models are blind
- Learning to (Learn at Test Time)
- PaliGemma
- Arena Learning (WizardLM 2)
- FBI-LLM (first Large bit-based LLM!)
- Understanding Transformers…

Quan Steven reposted

@LeoTZ03: ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
- Use Protein Chain of Thought (ProCoT) to simulate signaling pathways with ProtTrans embeddings and step-by-step reasoning chains.
- Convert the Mol dataset into prompt-answer pairs for…
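
The conversion step in the second bullet lends itself to a small illustration. Below is a hedged sketch of turning one interaction record into a prompt-answer pair; the field names, prompt wording, and chain-of-thought field are hypothetical, not ProLLM's actual format.

```python
# Hypothetical sketch of the "convert into prompt-answer pairs" step.
# Field names and prompt wording are assumptions, not ProLLM's format.
def to_prompt_answer(record: dict) -> dict:
    prompt = (
        f"Protein A: {record['protein_a']}\n"
        f"Protein B: {record['protein_b']}\n"
        "Reason step by step through the signaling pathway, "
        "then state whether the two proteins interact."
    )
    answer = f"{record['pathway_cot']}\nFinal answer: {'yes' if record['interacts'] else 'no'}"
    return {"prompt": prompt, "answer": answer}

example = {"protein_a": "MKT...", "protein_b": "GSS...",
           "pathway_cot": "A phosphorylates B via ...", "interacts": True}
pair = to_prompt_answer(example)
```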

Quan Steven reposted

@LeoTZ03: Pseudo-perplexity in One Fell Swoop for Protein Fitness Estimation
- Use ESM-2-650M embeddings from an unmasked sequence to predict masked one-at-a-time probability vectors (by MLP), reducing the need for multiple forward passes
- Combine OFS pseudo-perplexity technique within an…
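
The single-pass trick is worth spelling out: standard pseudo-perplexity masks each position in turn, needing L forward passes for a length-L sequence, while here one unmasked pass plus a light MLP head approximates all L masked distributions at once. A minimal sketch, with the head architecture and dimensions assumed:

```python
import torch
import torch.nn as nn

# Sketch of the one-fell-swoop idea: one unmasked forward pass, then an MLP
# head predicts, per position, the probabilities a masked-LM pass would have
# produced. The head itself and its dimensions are assumptions.
class OFSHead(nn.Module):
    def __init__(self, d_model=1280, n_tokens=33):  # ESM-2-650M hidden size, vocab
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Linear(d_model, n_tokens),
        )

    def forward(self, embeddings):               # (L, d_model), one unmasked pass
        return self.mlp(embeddings).log_softmax(dim=-1)  # (L, n_tokens) log-probs

def pseudo_perplexity(log_probs, token_ids):
    # Mean negative log-likelihood of each residue under its predicted
    # masked distribution, exponentiated: lower suggests a fitter sequence.
    nll = -log_probs[torch.arange(len(token_ids)), token_ids]
    return nll.mean().exp()
```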

Quan Steven reposted

@fly51fly: [CL] Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
arxiv.org/abs/2407.07035
- Vision-and-language navigation (VLN) is gaining increasing research attention with the rise of foundation models like BERT, GPT-3, and CLIP. This…

Quan Steven reposted

@HeyAbhishekk: 100 AI Tools to replace your tedious work:

1. Research
- ChatGPT
- YouChat
- Abacus
- Perplexity
- Copilot
- Gemini

2. Image
- Fotor
- Stability AI
- Midjourney
- Microsoft Designer

3. Copywriting
- Rytr
- Copy AI
- Writesonic
- Adcreative AI

4. Writing
- Jasper
- HIX AI…

Quan Steven reposted

🚨RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
🌟Proj: robo-point.github.io
🚀Abs: arxiv.org/abs/2406.18915
Introduces an automatic synthetic data generation pipeline that instruction-tunes VLMs to robotic domains and needs…


Quan Steven reposted

Scraping web data for AI agents sucks. @firecrawl_dev is fixing that. Live demo of Firecrawl turning entire websites into LLM-ready data in seconds w/ @CalebPeffer


Quan Steven reposted

@omarsar0: Cool paper proposing a graph-based agent system to enhance the long-context abilities of LLMs.

It first structures long text into a graph (elements and facts) and employs an agent to explore the graph using predefined functions guided by a step-by-step rational plan. The agent…
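
To make the structure-then-explore idea concrete, here is a hedged sketch of the two stages; the triple schema and the breadth-first walk are stand-ins for the paper's predefined functions and plan-guided exploration, which the excerpt does not specify.

```python
import networkx as nx

def build_fact_graph(facts):
    # facts: iterable of (subject, relation, object) triples extracted
    # upstream from the long text.
    g = nx.DiGraph()
    for subj, rel, obj in facts:
        g.add_edge(subj, obj, relation=rel)
    return g

def explore(graph, seed, max_hops=3):
    # Walk outward from a seed node, collecting the facts the agent would
    # hand to the LLM alongside the question before answering. A plan-guided
    # agent would choose edges selectively; breadth-first is a simplification.
    frontier, seen, notes = [seed], {seed}, []
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for _, obj, data in graph.out_edges(node, data=True):
                notes.append(f"{node} -[{data['relation']}]-> {obj}")
                if obj not in seen:
                    seen.add(obj)
                    nxt.append(obj)
        frontier = nxt
    return notes
```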

Quan Steven reposted

@rohanpaul_ai: Gemma-2 paper is out. Shows the power of knowledge distillation.

📌 Knowledge distillation replaces next-token prediction with learning from a larger teacher model's output distribution. This simulates training beyond available tokens, providing richer gradients at each step.…
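
A minimal sketch of that objective, assuming the standard KL formulation (temperature, weighting, and reduction are illustrative choices, not the report's exact recipe):

```python
import torch.nn.functional as F

# Distillation loss: the student matches the teacher's full next-token
# distribution instead of a one-hot target, with Hinton-style temperature.
def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student); every vocabulary entry carries gradient signal,
    # which is the "richer gradients at each step" point above.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t
```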

Quan Steven reposted

@_vztu: 🚨Towards Semantic Equivalence of Tokenization in Multimodal LLM
🌟Proj: chocowu.github.io/SeTok-web/
🚀Abs: arxiv.org/abs/2406.05127

A novel vision tokenizer (SeTok) groups visual features into semantic units via dynamic clustering
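
One plausible reading of "dynamic clustering" is that the number of semantic units is chosen per image rather than fixed. A hedged sketch under that assumption; SeTok's actual grouping rule may differ, and silhouette-based selection is only a stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def semantic_tokens(patch_feats, k_range=range(2, 9)):
    # patch_feats: (num_patches, dim) features from a vision encoder.
    # Pick the cluster count with the best silhouette score.
    best_k, best_score = 2, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(patch_feats)
        score = silhouette_score(patch_feats, labels)
        if score > best_score:
            best_k, best_score = k, score
    labels = KMeans(n_clusters=best_k, n_init=10).fit_predict(patch_feats)
    # One token per semantic unit: mean-pool the patches in each cluster.
    return np.stack([patch_feats[labels == c].mean(axis=0) for c in range(best_k)])
```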

Quan Steven reposted

@_philschmid: Gemini and Gemma are bringing back knowledge distillation to language models! Gemini and Gemma used “online” distillation for different pre- and post-training steps. But what is it? 🤔

In online or on-policy knowledge distillation, a student learns from a teacher during training…
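
Sketching one training step of that loop, reusing distillation_loss from the Gemma-2 sketch above; the model interfaces follow Hugging Face conventions here and are assumptions, not the reported setup:

```python
import torch

def on_policy_distill_step(student, teacher, prompts, optimizer):
    # The student generates its own continuations, the teacher grades those
    # exact tokens, and the student minimizes the KL to the teacher.
    with torch.no_grad():
        samples = student.generate(prompts)        # student's own outputs
        teacher_logits = teacher(samples).logits   # teacher scores them
    student_logits = student(samples).logits
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```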

Quan Steven reposted

@Sumanth_077: Microsoft launched the best course on Generative AI!

The free 18-lesson course is available on GitHub and will teach you everything you need to know to start building Generative AI applications.

Quan Steven reposted

@_lewtun: One neat thing I learned from the Gemma 2 report is their use of "on policy distillation" to refine the SFT models before RLHF.

The motivation is as follows:

- suppose you fine-tune a student model on synthetic data from a larger, more capable teacher like GPT-4, Gemini…

Quan Steven reposted

@Dafidofff: 🥠 Excited to introduce our latest work on Equivariant Neural Fields (ENFs)! Grounding conditioning variables in geometry 🚀

Paper: arxiv.org/abs/2406.05753
Github: github.com/dafidofff/enf-…
Project Page: dafidofff.github.io/enf-jax/

Details below 👇👇

Quan Steven reposted

@chautmpham: Long-form text generation with multiple stylistic and semantic constraints remains largely unexplored.

We present Suri 🦙: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.

📎 arxiv.org/abs/2406.19371

Quan Steven reposted

@fly51fly: [CL] A Closer Look into Mixture-of-Experts in Large Language Models
arxiv.org/abs/2406.18219
- Experts act like fine-grained neurons. The gate embedding determines expert selection while the gate projection matrix controls neuron activation. Their heatmaps are correlated,…
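
The gate-embedding role described in that bullet is easy to see in code. A hedged sketch of a standard top-k MoE router of the kind the paper analyzes; the shapes, top-k, and softmax placement are assumptions about a typical implementation:

```python
import torch
import torch.nn.functional as F

def route(hidden, gate_embedding, k=2):
    # hidden: (tokens, d_model); gate_embedding: (n_experts, d_model).
    # The gate embedding alone decides which experts a token is sent to;
    # each expert's own projection matrices then control neuron activation.
    scores = hidden @ gate_embedding.T           # (tokens, n_experts)
    weights, experts = scores.topk(k, dim=-1)    # k experts per token
    return F.softmax(weights, dim=-1), experts   # mixing weights + indices
```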

Quan Steven reposted

@liu_yu_lu: Do NLP benchmark measurements provide meaningful insights about the evaluated models? How valid are these measurements? To help practitioners answer these questions, we introduce ECBD - a conceptual framework that formalizes the process of benchmark design 🧵. #ACL2024 #NLProc

Quan Steven reposted

@Wenpeng_Yin: Our latest work, "LLMs Assist NLP Researchers," analyzes LLMs as i) paper reviewers and ii) area chairs. Both quantitative and qualitative comparisons show that LLMs are still far from satisfactory in these expertise-demanding roles. Link: arxiv.org/pdf/2406.16253 👏

Quan Steven reposted

@jd92wang: A collection of our recent work on "LLM agents + X". We are trying to use agents to help research in other domains. We are new and still learning new stuff.

Quan Steven reposted

@XiuyingWei966: Curious about making big FFNs in Transformers more efficient? 🧠 We explored training structured matrices up to 1.3B models and found they work well in pre-training too! 🚀 They’re more efficient and have steeper scaling curves, aiding in better architecture design! 🧵1/9
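
As one concrete instance of a structured matrix, here is a hedged low-rank FFN sketch; low rank is only one of the structure families such work considers, and every dimension below is illustrative:

```python
import torch.nn as nn

class LowRankFFN(nn.Module):
    # Dense up/down projections replaced by two thin factors:
    # W (d_model x d_ff) ~ U V drops parameters from d_model*d_ff
    # to rank*(d_model + d_ff) per projection.
    def __init__(self, d_model=2048, d_ff=8192, rank=256):
        super().__init__()
        self.up = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                                nn.Linear(rank, d_ff, bias=False))
        self.down = nn.Sequential(nn.Linear(d_ff, rank, bias=False),
                                  nn.Linear(rank, d_model, bias=False))
        self.act = nn.GELU()

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```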
