Wenjie Zheng

@wjzheng_nlp

PhD student | Interested in Multimodal Learning | Feel free to connect with me. 🧸

scholar.google.com/citations?user…

Joined October 2022

44 Posts · 34 Followers · 98 Following

Wenjie Zheng reposted

Zengzhi Wang

@SinclairWang1

Sep 26, 2024

🚨New paper!🚨
Still worried about the low quality of your rule-cleaned pre-training corpora?

Try 🫐 ProX!

1. Dramatically boosts pre-training corpus quality with a language model that generates executable programs.
2. A 1.7B model, trained on corpus refined by 🫐 ProX with…

Fan Zhou

@FaZhou_998

Sep 26, 2024

🚀 Still relying on human-crafted rules to improve pretraining data? Time to try Programming Every Example (ProX)! Our latest efforts use LMs to refine data with unprecedented accuracy and bring up to 20x faster training in the general and math domains! 👇 Curious about the details?

Wenjie Zheng reposted

fly51fly

@fly51fly

May 15, 2024

[CV] CinePile: A Long Video Question Answering Dataset and Benchmark
arxiv.org/abs/2405.08813
- The paper introduces CinePile, a large-scale video question answering dataset with ~305k questions covering temporal comprehension, human-object interactions, reasoning about…

Wenjie Zheng

@wjzheng_nlp

May 17, 2024

A work full of sincerity; everyone is welcome to follow. 🥳🥳🥳

Qiming Xie

@grayground_x

May 17, 2024

🎉🎉🎉So happy to announce that our paper "Ask Again, Then Fail: Large Language Models' Vacillations in Judgement"(w/@SinclairWang1) arxiv.org/abs/2310.02174 was accepted to #ACL2024 main! Here's a quick overview: 👇🧵

Wenjie Zheng reposted

Jocelyn Shen

@jocelynjshen

May 16, 2024

Excited to share our #ACL2024 Findings paper "EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences" 🧵(1/7)

Dataset request: mitmedialab.github.io/empathic-stori…

Wenjie Zheng reposted

MetaGPT

@MetaGPT_

Mar 13, 2024

Introducing MetaGPT's Data Interpreter: Open Source and Better "Devin".

Data Interpreter has achieved state-of-the-art scores in machine learning, mathematical reasoning, and open-ended tasks, and can analyze stocks, imitate websites, and train models.

Data Interpreter is an…

Wenjie Zheng reposted

Aran Komatsuzaki

@arankomatsuzaki

Feb 27, 2024

A Survey on Data Selection for Language Models

Presents a comprehensive review of existing literature on data selection methods and related research areas, providing a taxonomy of existing approaches

arxiv.org/abs/2402.16827

Wenjie Zheng reposted

SkalskiP

@skalskip92

Feb 27, 2024

🔴 Stream: YOLO-World Q&A + coding. In less than 15 minutes, I start my first YT stream! I'll be talking about YOLO-World and answering the questions you left under my last YT video. Stop by to say hello. Link: youtube.com/live/lF1BtQL16… ↓ Some of the topics we will cover

Wenjie Zheng reposted

Zengzhi Wang

@SinclairWang1

Jan 6, 2024

Thrilled to release the new version (v0.2) of MathPile, a cleaner version in which we fixed several issues! 🥳 More importantly, we also released a commercial-use version, MathPile_Commercial (huggingface.co/datasets/GAIR/…) 🥳🥳🥳

Aran Komatsuzaki

@arankomatsuzaki

Dec 29, 2023

Generative AI for Math: MathPile

Presents a diverse and high-quality math-centric corpus comprising about 9.5B tokens

proj: gair-nlp.github.io/MathPile/
repo: github.com/GAIR-NLP/MathP…
abs: arxiv.org/abs/2312.17120

Wenjie Zheng reposted

Albert Gu

@_albertgu

Dec 4, 2023

Quadratic attention has been indispensable for information-dense modalities such as language... until now.

Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried.

With @tri_dao 1/

Wenjie Zheng reposted

Lior Alexander

@LiorOnAI

Nov 14, 2023

A team just made OpenAI Whisper 6x faster and 49% smaller while keeping 99% of the accuracy. The model is already available in the Hugging Face Transformers library:
model_id = "distil-whisper/distil-large-v2"
You can also use their web UI to transcribe from URLs, files, or…

Wenjie Zheng reposted

Tsinghua KEG (THUDM)

@thukeg

Nov 8, 2023

How to #RLHF for LLMs: #PPO or #DPO?
Introducing #BPO (black-box prompt optimization) to align LLMs without model training.

1) ChatGPT + BPO > ChatGPT
2) GPT-4 + BPO > GPT-4
3) Vicuna + BPO > Vicuna + PPO/DPO
4) Vicuna + DPO + BPO > Vicuna + DPO

arxiv.org/pdf/2311.04155…

Wenjie Zheng reposted

AK

@_akhaliq

Nov 9, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

paper page: huggingface.co/papers/2311.04…

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods…

Wenjie Zheng reposted

AK

@_akhaliq

Nov 9, 2023

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

paper page: huggingface.co/papers/2311.04…

Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal…

Wenjie Zheng reposted

Huaxiu Yao

@HuaxiuYaoML

Nov 7, 2023

🚨 Unveiling GPT-4V(ision)'s mind! We're breaking down how even the brightest Visual Language Models get it wrong!

With our new 'Bingo' benchmark, we shed light on the two common types of hallucinations in GPT-4V(ision): bias and interference.

Led by @cuichenhang @AiYiyangZ

Wenjie Zheng reposted

AK

@_akhaliq

Nov 7, 2023

CogVLM: Visual Expert for Pretrained Language Models

paper page: huggingface.co/papers/2311.03…

We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language…

Wenjie Zheng reposted

AK

@_akhaliq

Nov 8, 2023

OtterHD: A High-Resolution Multi-modality Model

paper page: huggingface.co/papers/2311.04…

We present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models…

Wenjie Zheng reposted

OpenAI

@OpenAI

Nov 6, 2023

We're rolling out new features and improvements that developers have been asking for:

1. Our new model GPT-4 Turbo supports 128K context and has fresher knowledge than GPT-4. Its input and output tokens are respectively 3× and 2× less expensive than GPT-4. It’s available now to…

Wenjie Zheng reposted

AK

@_akhaliq

Nov 3, 2023

FlashDecoding++: Faster Large Language Model Inference on GPUs

paper page: huggingface.co/papers/2311.01…

As Large Language Models (LLMs) become increasingly important in various domains, the following challenges still remain unsolved in accelerating LLM inference: (1)…

Wenjie Zheng reposted

AK

@_akhaliq

Nov 2, 2023

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

paper page: huggingface.co/papers/2311.00…

LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking…

Wenjie Zheng reposted

AK

@_akhaliq

Oct 31, 2023

MM-VID: Advancing Video Understanding with GPT-4V(ision)

paper page: huggingface.co/papers/2310.19…

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.…