wjzheng_nlp's profile picture. PhD student | Interested in Multimodal Learning | Feel free to connect me. 🧸

Wenjie Zheng

@wjzheng_nlp

PhD student | Interested in Multimodal Learning | Feel free to connect me. 🧸

Wenjie Zheng reposted

🚨New paper!🚨 Still worried about the low quality of your rule-cleaned pre-training corpora? Try 🫐 ProX! 1. Dramatically boosts pre-training corpus quality with a language model that generates executable programs. 2. A 1.7B model, trained on corpus refined by 🫐 ProX with…

SinclairWang1's tweet image. 🚨New paper!🚨
Still worried about the low quality of your rule-cleaned pre-training corpora?

Try 🫐 ProX!

1. Dramatically boosts pre-training corpus quality with a language model that generates executable programs.

2. A 1.7B model, trained on corpus refined by 🫐 ProX with…

🚀 Still relying on human-crafted rules to improve pretraining data? Time to try Programming Every Example(ProX)! Our latest efforts use LMs to refine data with unprecedented accuracy, and brings up to 20x faster training in general and math domain! 👇 Curious about the details?



Wenjie Zheng reposted

[CV] CinePile: A Long Video Question Answering Dataset and Benchmark arxiv.org/abs/2405.08813 - The paper introduces CinePile, a large-scale video question answering dataset with ~305k questions covering temporal comprehension, human-object interactions, reasoning about…

fly51fly's tweet image. [CV] CinePile: A Long Video Question Answering Dataset and Benchmark  
arxiv.org/abs/2405.08813     
- The paper introduces CinePile, a large-scale video question answering dataset with ~305k questions covering temporal comprehension, human-object interactions, reasoning about…
fly51fly's tweet image. [CV] CinePile: A Long Video Question Answering Dataset and Benchmark  
arxiv.org/abs/2405.08813     
- The paper introduces CinePile, a large-scale video question answering dataset with ~305k questions covering temporal comprehension, human-object interactions, reasoning about…
fly51fly's tweet image. [CV] CinePile: A Long Video Question Answering Dataset and Benchmark  
arxiv.org/abs/2405.08813     
- The paper introduces CinePile, a large-scale video question answering dataset with ~305k questions covering temporal comprehension, human-object interactions, reasoning about…

work full of sincerity, welcome everyone to follow. 🥳🥳🥳

🎉🎉🎉So happy to announce that our paper "Ask Again, Then Fail: Large Language Models' Vacillations in Judgement"(w/@SinclairWang1) arxiv.org/abs/2310.02174 was accepted to #ACL2024 main! Here's a quick overview: 👇🧵



Wenjie Zheng reposted

Excited to share our #ACL2024 Findings paper "EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences" 🧵(1/7) Dataset request: mitmedialab.github.io/empathic-stori…

jocelynjshen's tweet image. Excited to share our #ACL2024 Findings paper "EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences" 🧵(1/7)

Dataset request: mitmedialab.github.io/empathic-stori…

Wenjie Zheng reposted

Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin". Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models. Data Interpreter is an…

MetaGPT_'s tweet image. Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin".
 
Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models.

Data Interpreter is an…
MetaGPT_'s tweet image. Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin".
 
Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models.

Data Interpreter is an…
MetaGPT_'s tweet image. Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin".
 
Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models.

Data Interpreter is an…
MetaGPT_'s tweet image. Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin".
 
Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models.

Data Interpreter is an…

Wenjie Zheng reposted

A Survey on Data Selection for Language Models Presents a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches arxiv.org/abs/2402.16827

arankomatsuzaki's tweet image. A Survey on Data Selection for Language Models

Presents a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches

arxiv.org/abs/2402.16827

Wenjie Zheng reposted

🔴 stream: YOLO-World Q&A + coding in less than 15 minutes, I start my first YT stream; I'll be talking about YOLO-World and answering your questions that you left under my last YT video stop by to say hello link: youtube.com/live/lF1BtQL16… ↓ some of the topics we will cover


Wenjie Zheng reposted

Thrilled to release our new version (v0.2) of MathPile, a cleaner version through our efforts to fix some issues!🥳 More importantly, we also released commercial-use version, namely MathPile_Commercial(huggingface.co/datasets/GAIR/…)🥳🥳🥳

Generative AI for Math: MathPile Presents a diverse and high-quality math-centric corpus comprising about 9.5B tokens proj: gair-nlp.github.io/MathPile/ repo: github.com/GAIR-NLP/MathP… abs: arxiv.org/abs/2312.17120

arankomatsuzaki's tweet image. Generative AI for Math: MathPile

Presents a diverse and high-quality math-centric corpus comprising about 9.5B tokens

proj: gair-nlp.github.io/MathPile/
repo: github.com/GAIR-NLP/MathP…
abs: arxiv.org/abs/2312.17120


Wenjie Zheng reposted

Quadratic attention has been indispensable for information-dense modalities such as language... until now. Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried. With @tri_dao 1/

_albertgu's tweet image. Quadratic attention has been indispensable for information-dense modalities such as language... until now.

Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried.

With @tri_dao 1/

Wenjie Zheng reposted

A team just made OpenAI Whisper 6x faster, 49% smaller, while keeping 99% of the accuracy. The model is already available on the HuggingFace Transformers library: model_id = "distil-whisper/distil-large-v2" You can also use their web UI to transcribe from URLs, files, or…


Wenjie Zheng reposted

How to #RLHF for LLMs: #PPO or #DPO? Introducing #BPO (black-box prompt optimization) to align LLMs without model training. 1) ChatGPT + BPO > ChatGPT 2) GPT-4 + BPO > GPT-4 3) Vicuna + BPO > Vicuna + PPO/DPO 4) Vicuna + DPO + BPO > Vicuna + DPO arxiv.org/pdf/2311.04155…

thukeg's tweet image. How to #RLHF for LLMs: #PPO or #DPO? 
Introducing #BPO (black-box prompt optimization) to align LLMs without model training. 

1) ChatGPT + BPO > ChatGPT
2) GPT-4 + BPO > GPT-4
3) Vicuna + BPO > Vicuna + PPO/DPO
4) Vicuna + DPO + BPO > Vicuna + DPO

arxiv.org/pdf/2311.04155…

Wenjie Zheng reposted

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration paper page: huggingface.co/papers/2311.04… Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods…

_akhaliq's tweet image. mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

paper page: huggingface.co/papers/2311.04…

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods…

Wenjie Zheng reposted

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models paper page: huggingface.co/papers/2311.04… Despite Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they are still struggling to efficiently model the interactions among multi-modal…

_akhaliq's tweet image. TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

paper page: huggingface.co/papers/2311.04…

Despite Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they are still struggling to efficiently model the interactions among multi-modal…

Wenjie Zheng reposted

🚨 Unveiling GPT-4V(ision)'s mind! We're breaking down how even the brightest Visual Language Models get it wrong! With our new 'Bingo' benchmark, we shed light on the two common types of hallucinations in GPT-4V(ision): bias and interference. Led by @cuichenhang @AiYiyangZ

HuaxiuYaoML's tweet image. 🚨 Unveiling GPT-4V(ision)'s mind! We're breaking down how even the brightest Visual Language Models get it wrong!

With our new 'Bingo' benchmark, we shed light on the two common types of hallucinations in GPT-4V(ision): bias and interference.

Led by @cuichenhang @AiYiyangZ

Wenjie Zheng reposted

CogVLM: Visual Expert for Pretrained Language Models paper page: huggingface.co/papers/2311.03… introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language…

_akhaliq's tweet image. CogVLM: Visual Expert for Pretrained Language Models

paper page: huggingface.co/papers/2311.03…

introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language…

Wenjie Zheng reposted

OtterHD: A High-Resolution Multi-modality Model paper page: huggingface.co/papers/2311.04… present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models…

_akhaliq's tweet image. OtterHD: A High-Resolution Multi-modality Model

paper page: huggingface.co/papers/2311.04…

present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models…

Wenjie Zheng reposted

We're rolling out new features and improvements that developers have been asking for: 1. Our new model GPT-4 Turbo supports 128K context and has fresher knowledge than GPT-4. Its input and output tokens are respectively 3× and 2× less expensive than GPT-4. It’s available now to…


Wenjie Zheng reposted

FlashDecoding++: Faster Large Language Model Inference on GPUs paper page: huggingface.co/papers/2311.01… As the Large Language Model (LLM) becomes increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1)…

_akhaliq's tweet image. FlashDecoding++: Faster Large Language Model Inference on GPUs

paper page: huggingface.co/papers/2311.01…

As the Large Language Model (LLM) becomes increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1)…

Wenjie Zheng reposted

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing paper page: huggingface.co/papers/2311.00… LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking…

_akhaliq's tweet image. LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

paper page: huggingface.co/papers/2311.00…

LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking…

Wenjie Zheng reposted

MM-VID: Advancing Video Understanding with GPT-4V(ision) paper page: huggingface.co/papers/2310.19… present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.…

_akhaliq's tweet image. MM-VID: Advancing Video Understanding with GPT-4V(ision)

paper page: huggingface.co/papers/2310.19…

present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.…

Loading...

Something went wrong.


Something went wrong.