
Quan Steven

@robot_steven_07

Quan Steven reposted

@TheAITimeline: 🚨This week’s top AI/ML research papers (big week!):

LM/Transformers
- Mixture of A Million Experts
- Vision language models are blind
- Learning to (Learn at Test Time)
- PaliGemma
- Arena Learning (WizardLM 2)
- FBI-LLM (first Large bit-based LLM!)
- Understanding Transformers…

Quan Steven reposted

@LeoTZ03: ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
- Use Protein Chain of Thought (ProCoT) to simulate signaling pathways with ProtTrans embeddings and step-by-step reasoning chains.
- Convert the Mol dataset into prompt-answer pairs for…
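
The conversion step in the second bullet lends itself to a small illustration. Below is a hedged sketch of turning one interaction record into a prompt-answer pair; the field names, prompt wording, and chain-of-thought field are hypothetical, not ProLLM's actual format.

```python
# Hypothetical sketch of the "convert into prompt-answer pairs" step.
# Field names and prompt wording are assumptions, not ProLLM's format.
def to_prompt_answer(record: dict) -> dict:
    prompt = (
        f"Protein A: {record['protein_a']}\n"
        f"Protein B: {record['protein_b']}\n"
        "Reason step by step through the signaling pathway, "
        "then state whether the two proteins interact."
    )
    answer = f"{record['pathway_cot']}\nFinal answer: {'yes' if record['interacts'] else 'no'}"
    return {"prompt": prompt, "answer": answer}

example = {"protein_a": "MKT...", "protein_b": "GSS...",
           "pathway_cot": "A phosphorylates B via ...", "interacts": True}
pair = to_prompt_answer(example)
```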

Quan Steven reposted

@LeoTZ03: Pseudo-perplexity in One Fell Swoop for Protein Fitness Estimation
- Use ESM-2-650M embeddings from an unmasked sequence to predict masked one-at-a-time probability vectors (by MLP), reducing the need for multiple forward passes
- Combine OFS pseudo-perplexity technique within an…
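
The single-pass trick is worth spelling out: standard pseudo-perplexity masks each position in turn, needing L forward passes for a length-L sequence, while here one unmasked pass plus a light MLP head approximates all L masked distributions at once. A minimal sketch, with the head architecture and dimensions assumed:

```python
import torch
import torch.nn as nn

# Sketch of the one-fell-swoop idea: one unmasked forward pass, then an MLP
# head predicts, per position, the probabilities a masked-LM pass would have
# produced. The head itself and its dimensions are assumptions.
class OFSHead(nn.Module):
    def __init__(self, d_model=1280, n_tokens=33):  # ESM-2-650M hidden size, vocab
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Linear(d_model, n_tokens),
        )

    def forward(self, embeddings):               # (L, d_model), one unmasked pass
        return self.mlp(embeddings).log_softmax(dim=-1)  # (L, n_tokens) log-probs

def pseudo_perplexity(log_probs, token_ids):
    # Mean negative log-likelihood of each residue under its predicted
    # masked distribution, exponentiated: lower suggests a fitter sequence.
    nll = -log_probs[torch.arange(len(token_ids)), token_ids]
    return nll.mean().exp()
```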

Quan Steven reposted

@fly51fly: [CL] Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
arxiv.org/abs/2407.07035
- Vision-and-language navigation (VLN) is gaining increasing research attention with the rise of foundation models like BERT, GPT-3, and CLIP. This…

Quan Steven reposted

@HeyAbhishekk: 100 AI Tools to replace your tedious work:

1. Research
- ChatGPT
- YouChat
- Abacus
- Perplexity
- Copilot
- Gemini

2. Image
- Fotor
- Stability AI
- Midjourney
- Microsoft Designer

3. Copywriting
- Rytr
- Copy AI
- Writesonic
- Adcreative AI

4. Writing
- Jasper
- HIX AI…

Quan Steven reposted

🚨RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics
🌟Proj: robo-point.github.io
🚀Abs: arxiv.org/abs/2406.18915
Introduces an automatic synthetic data generation pipeline that instruction-tunes VLMs to robotic domains and needs…


Quan Steven reposted

Scraping web data for AI agents sucks. @firecrawl_dev is fixing that. Live demo of Firecrawl turning entire websites into LLM-ready data in seconds w/ @CalebPeffer


Quan Steven reposted

@omarsar0: Cool paper proposing a graph-based agent system to enhance the long-context abilities of LLMs.

It first structures long text into a graph (elements and facts) and employs an agent to explore the graph using predefined functions guided by a step-by-step rational plan. The agent…
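
To make the structure-then-explore idea concrete, here is a hedged sketch of the two stages; the triple schema and the breadth-first walk are stand-ins for the paper's predefined functions and plan-guided exploration, which the excerpt does not specify.

```python
import networkx as nx

def build_fact_graph(facts):
    # facts: iterable of (subject, relation, object) triples extracted
    # upstream from the long text.
    g = nx.DiGraph()
    for subj, rel, obj in facts:
        g.add_edge(subj, obj, relation=rel)
    return g

def explore(graph, seed, max_hops=3):
    # Walk outward from a seed node, collecting the facts the agent would
    # hand to the LLM alongside the question before answering. A plan-guided
    # agent would choose edges selectively; breadth-first is a simplification.
    frontier, seen, notes = [seed], {seed}, []
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for _, obj, data in graph.out_edges(node, data=True):
                notes.append(f"{node} -[{data['relation']}]-> {obj}")
                if obj not in seen:
                    seen.add(obj)
                    nxt.append(obj)
        frontier = nxt
    return notes
```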

Quan Steven reposted

@rohanpaul_ai: Gemma-2 paper is out. Shows the power of knowledge distillation.

📌 Knowledge distillation replaces next-token prediction with learning from a larger teacher model's output distribution. This simulates training beyond available tokens, providing richer gradients at each step.…
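
A minimal sketch of that objective, assuming the standard KL formulation (temperature, weighting, and reduction are illustrative choices, not the report's exact recipe):

```python
import torch.nn.functional as F

# Distillation loss: the student matches the teacher's full next-token
# distribution instead of a one-hot target, with Hinton-style temperature.
def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student); every vocabulary entry carries gradient signal,
    # which is the "richer gradients at each step" point above.
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t
```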

Quan Steven reposted

@_vztu: 🚨Towards Semantic Equivalence of Tokenization in Multimodal LLM
🌟Proj: chocowu.github.io/SeTok-web/
🚀Abs: arxiv.org/abs/2406.05127

A novel vision tokenizer (SeTok) groups visual features into semantic units via dynamic clustering
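
One plausible reading of "dynamic clustering" is that the number of semantic units is chosen per image rather than fixed. A hedged sketch under that assumption; SeTok's actual grouping rule may differ, and silhouette-based selection is only a stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def semantic_tokens(patch_feats, k_range=range(2, 9)):
    # patch_feats: (num_patches, dim) features from a vision encoder.
    # Pick the cluster count with the best silhouette score.
    best_k, best_score = 2, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(patch_feats)
        score = silhouette_score(patch_feats, labels)
        if score > best_score:
            best_k, best_score = k, score
    labels = KMeans(n_clusters=best_k, n_init=10).fit_predict(patch_feats)
    # One token per semantic unit: mean-pool the patches in each cluster.
    return np.stack([patch_feats[labels == c].mean(axis=0) for c in range(best_k)])
```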

Quan Steven reposted

@_philschmid: Gemini and Gemma are bringing back knowledge distillation to language models! Gemini and Gemma used “online” distillation for different pre- and post-training steps. But what is it? 🤔

In online or on-policy knowledge distillation, a student learns from a teacher during training…
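
Sketching one training step of that loop, reusing distillation_loss from the Gemma-2 sketch above; the model interfaces follow Hugging Face conventions here and are assumptions, not the reported setup:

```python
import torch

def on_policy_distill_step(student, teacher, prompts, optimizer):
    # The student generates its own continuations, the teacher grades those
    # exact tokens, and the student minimizes the KL to the teacher.
    with torch.no_grad():
        samples = student.generate(prompts)        # student's own outputs
        teacher_logits = teacher(samples).logits   # teacher scores them
    student_logits = student(samples).logits
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```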

Quan Steven reposted

@Sumanth_077: Microsoft launched the best course on Generative AI!

The free 18-lesson course is available on GitHub and will teach you everything you need to know to start building Generative AI applications.

Quan Steven reposted

@_lewtun: One neat thing I learned from the Gemma 2 report is their use of "on policy distillation" to refine the SFT models before RLHF.

The motivation is as follows:

- suppose you fine-tune a student model on synthetic data from a larger, more capable teacher like GPT-4, Gemini…

Quan Steven reposted

@Dafidofff: 🥠 Excited to introduce our latest work on Equivariant Neural Fields (ENFs)! Grounding conditioning variables in geometry 🚀

Paper: arxiv.org/abs/2406.05753
Github: github.com/dafidofff/enf-…
Project Page: dafidofff.github.io/enf-jax/

Details below 👇👇

Quan Steven reposted

@chautmpham: Long-form text generation with multiple stylistic and semantic constraints remains largely unexplored.

We present Suri 🦙: a dataset of 20K long-form texts & LLM-generated, backtranslated instructions with complex constraints.

📎 arxiv.org/abs/2406.19371

Quan Steven reposted

@fly51fly: [CL] A Closer Look into Mixture-of-Experts in Large Language Models
arxiv.org/abs/2406.18219
- Experts act like fine-grained neurons. The gate embedding determines expert selection while the gate projection matrix controls neuron activation. Their heatmaps are correlated,…
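
The gate-embedding role described in that bullet is easy to see in code. A hedged sketch of a standard top-k MoE router of the kind the paper analyzes; the shapes, top-k, and softmax placement are assumptions about a typical implementation:

```python
import torch
import torch.nn.functional as F

def route(hidden, gate_embedding, k=2):
    # hidden: (tokens, d_model); gate_embedding: (n_experts, d_model).
    # The gate embedding alone decides which experts a token is sent to;
    # each expert's own projection matrices then control neuron activation.
    scores = hidden @ gate_embedding.T           # (tokens, n_experts)
    weights, experts = scores.topk(k, dim=-1)    # k experts per token
    return F.softmax(weights, dim=-1), experts   # mixing weights + indices
```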

Quan Steven reposted

@liu_yu_lu: Do NLP benchmark measurements provide meaningful insights about the evaluated models? How valid are these measurements? To help practitioners answer these questions, we introduce ECBD - a conceptual framework that formalizes the process of benchmark design 🧵. #ACL2024 #NLProc

Quan Steven reposted

@Wenpeng_Yin: Our latest work, "LLMs Assist NLP Researchers," analyzes LLMs as i) paper reviewers and ii) area chairs. Both quantitative and qualitative comparisons show that LLMs are still far from satisfactory in these expertise-demanding roles. Link: arxiv.org/pdf/2406.16253 👏

Quan Steven reposted

@jd92wang: A collection of our recent work on "LLM agents + X". We are trying to use agents to help research in other domains. We are new and still learning new stuff.

Quan Steven reposted

@XiuyingWei966: Curious about making big FFNs in Transformers more efficient? 🧠 We explored training structured matrices up to 1.3B models and found they work well in pre-training too! 🚀 They’re more efficient and have steeper scaling curves, aiding in better architecture design! 🧵1/9
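
As one concrete instance of a structured matrix, here is a hedged low-rank FFN sketch; low rank is only one of the structure families such work considers, and every dimension below is illustrative:

```python
import torch.nn as nn

class LowRankFFN(nn.Module):
    # Dense up/down projections replaced by two thin factors:
    # W (d_model x d_ff) ~ U V drops parameters from d_model*d_ff
    # to rank*(d_model + d_ff) per projection.
    def __init__(self, d_model=2048, d_ff=8192, rank=256):
        super().__init__()
        self.up = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                                nn.Linear(rank, d_ff, bias=False))
        self.down = nn.Sequential(nn.Linear(d_ff, rank, bias=False),
                                  nn.Linear(rank, d_model, bias=False))
        self.act = nn.GELU()

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```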
