Minseong Bae

@kylebae1017

M.S. Student in KAIST @MLVLab. Interested in #AI4Science.

kylebae1017.github.io

Joined June 2024

22Posts 7Followers 149Following

Minseong Bae reposted

Jia-Bin Huang

@jbhuang0604

Nov 6

Diffusion language models are making a splash (again)! To learn more about this fascinating topic, check out ⏩ my video tutorial (and references within): youtu.be/8BTOoc0yDVA ⏩discrete diffusion reading group: @diffusion_llms

jbhuang0604's tweet card. Diffusion Language Models: The Next Big Shift in GenAI

youtube.com

YouTube

Diffusion Language Models: The Next Big Shift in GenAI

Source: youtube.com

Minseong Bae reposted

Jorge Bravo Abad

@bravo_abad

Nov 6

Atomically accurate de novo antibody design with diffusion models Antibodies recognize specific molecular “patches” on proteins—epitopes. Modern therapeutics depend heavily on finding antibodies that bind exactly the right patch. But today, we still obtain them mostly through…

bravo_abad's tweet image. Atomically accurate de novo antibody design with diffusion models

Antibodies recognize specific molecular “patches” on proteins—epitopes. Modern therapeutics depend heavily on finding antibodies that bind exactly the right patch. But today, we still obtain them mostly through…

Minseong Bae reposted

GLADIA Research Lab

@GladiaLab

Oct 27

LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)

GladiaLab's tweet image. LLMs are injective and invertible.

In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space.

(1/6)

Minseong Bae reposted

Rishabh Anand

@rishabh16_

Oct 27

Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔 The talk contains some…

rishabh16_'s tweet image. Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔

The talk contains some…

Minseong Bae reposted

Hannes Stärk

@HannesStaerk

Oct 26

Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..

HannesStaerk's tweet image. Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..

Minseong Bae reposted

Andrej Karpathy

@karpathy

Oct 20

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've…

Nathan Barry

@nathanbarrydev

Oct 20

BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first…

Minseong Bae reposted

Diego del Alamo

@DdelAlamo

Aug 17

Anyone know what program & settings was used to make these figs? I assume it isn't pymol or chimera

Asimov Press

@AsimovPress

Aug 16

Protein designers have long dreamed of building new enzymes from scratch. Unfortunately, enzymes are highly dynamic; they move around a lot and have intermediate states that are difficult to design computationally. But a recent paper from David Baker’s group at the Institute…

AsimovPress's tweet image. Protein designers have long dreamed of building new enzymes from scratch.

Unfortunately, enzymes are highly dynamic; they move around a lot and have intermediate states that are difficult to design computationally.

But a recent paper from David Baker’s group at the Institute…

Minseong Bae reposted

Sander Dieleman

@sedielem

Oct 10

In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: arxiv.org/abs/2506.10892 CADD: arxiv.org/abs/2510.01329 CCDD: arxiv.org/abs/2510.03206

Sander Dieleman

@sedielem

Aug 19

New survey on diffusion language models: arxiv.org/abs/2508.10875 (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations. I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲

sedielem's tweet image. New survey on diffusion language models: arxiv.org/abs/2508.10875 (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations.

I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲

Minseong Bae reposted

Biology+AI Daily

@BiologyAIDaily

Sep 30

A Genetic Algorithm for Navigating Synthesizable Molecular Spaces 1. This paper introduces SynGA, a genetic algorithm designed to operate directly over synthesis routes, ensuring that all generated molecules are synthesizable. This is a significant step forward in addressing the…

BiologyAIDaily's tweet image. A Genetic Algorithm for Navigating Synthesizable Molecular Spaces

1. This paper introduces SynGA, a genetic algorithm designed to operate directly over synthesis routes, ensuring that all generated molecules are synthesizable. This is a significant step forward in addressing the…

Minseong Bae reposted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

Sep 24

SimpleFold: Folding Proteins is Simpler than You Think "we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving…

iScienceLuvr's tweet image. SimpleFold: Folding Proteins is Simpler than You Think

"we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving…

Minseong Bae reposted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

Aug 22

Intern-S1: A Scientific Multimodal Foundation Model "Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the…

iScienceLuvr's tweet image. Intern-S1: A Scientific Multimodal Foundation Model

"Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the…

Minseong Bae reposted

Kyle Tretina, Ph.D.

@AllThingsApx

Jul 31

TempRe reframes retrosynthesis as template generation: a seq-to-seq Transformer predicts SMARTS templates token-by-token (P2T) and even autowrites full multi-step routes (Direct TempRe) #GenAI #DrugDiscovery

AllThingsApx's tweet image. TempRe reframes retrosynthesis as template generation:

a seq-to-seq Transformer predicts SMARTS templates token-by-token (P2T) and even autowrites full multi-step routes (Direct TempRe) #GenAI #DrugDiscovery

Minseong Bae reposted

Biology+AI Daily

@BiologyAIDaily

Jun 3

MolTextNet: A Two-Million Molecule-Text Dataset for Multimodal Molecular Learning １．MolTextNet introduces a dataset of 2.5 million molecule-text pairs, designed to support large-scale multimodal learning between molecular graphs and natural language. Each entry provides…

BiologyAIDaily's tweet image. MolTextNet: A Two-Million Molecule-Text Dataset for Multimodal Molecular Learning

１．MolTextNet introduces a dataset of 2.5 million molecule-text pairs, designed to support large-scale multimodal learning between molecular graphs and natural language. Each entry provides…

Minseong Bae reposted

Microsoft Research

@MSFTResearch

Jun 19

Chris Bishop, Technical Fellow and Director of Microsoft Research AI for Science, explains how Microsoft researchers recently achieved a milestone in solving a grand challenge that has hampered scientists and slowed innovation for decades. Watch the video: msft.it/6019SQtUX

MSFTResearch's tweet card. In this video, Microsoft’s Chris Bishop, Technical Fellow and Director of Microsoft Research AI for Science, explains how Microsoft researchers achieved a breakthrough in the accuracy of density...

What is Density Functional Theory (DFT)? - Microsoft Research

Source: microsoft.com

Minseong Bae reposted

Greg Brockman

@gdb

May 23

ChatGPT now can analyze, manipulate, and visualize molecules and chemical information via the RDKit library. Useful for scientific work across health, biology, and chemistry.

gdb's tweet image. ChatGPT now can analyze, manipulate, and visualize molecules and chemical information via the RDKit library.

Useful for scientific work across health, biology, and chemistry.

Minseong Bae reposted

Nathan C. Frey

@nc_frey

May 19

The two most common questions I get via cold email are 1) what should I work on, and 2) how do I get a job doing research, engineering, etc.? I wrote a new post summarizing my advice for people trying to enter Biology+ML, from undergrads to early career. 👇 (link below)

nc_frey's tweet image. The two most common questions I get via cold email are 1) what should I work on, and 2) how do I get a job doing research, engineering, etc.?

I wrote a new post summarizing my advice for people trying to enter Biology+ML, from undergrads to early career.

👇 (link below)

Minseong Bae reposted

Peter Holderrieth

@peholderrieth

Apr 25

Come to our oral presentation on Generator Matching at ICLR 2025 tomorrow (Saturday). Learn about a generative model that works for any data type and Markov process! Oral: 3:30pm (Peridot 202-203, session 6E) Poster: 10am-12:30pm #172 (Hall 3 + Hall 2B) arxiv.org/abs/2410.20587

peholderrieth's tweet image. Come to our oral presentation on Generator Matching at ICLR 2025 tomorrow (Saturday). Learn about a generative model that works for any data type and Markov process!

Oral: 3:30pm (Peridot 202-203, session 6E)
Poster: 10am-12:30pm #172 (Hall 3 + Hall 2B)

arxiv.org/abs/2410.20587

Minseong Bae reposted

Slater Stich

@slaterstich

Apr 14

Very excited to share our interview with @DrYangSong. This is Part 2 of our history of diffusion series — score matching, the SDE/ODE interpretation, consistency models, and more. Enjoy!

Minseong Bae reposted

Rafeeque Mavoor

@rafeequemavoor

Apr 13

Free alternative for ChemDraw! Meet MolDraw. 100% free to use!

Minseong Bae reposted

Sam Rodriques

@SGRodriques

Jan 6

Today in the @timmermanreport -- why we need natural language, and not just DNA, RNA, and protein, to represent complex concepts in biology: timmermanreport.com/2025/01/ai-nee… @ldtimmerman has the best blog in the world for biotech. Amazing to be publishing something with him!