kylebae1017's profile picture. M.S. Student in KAIST @MLVLab. Interested in #AI4Science.

Minseong Bae

@kylebae1017

M.S. Student in KAIST @MLVLab. Interested in #AI4Science.

Minseong Bae reposted

Diffusion language models are making a splash (again)! To learn more about this fascinating topic, check out ⏩ my video tutorial (and references within): youtu.be/8BTOoc0yDVA ⏩discrete diffusion reading group: @diffusion_llms

jbhuang0604's tweet card. Diffusion Language Models: The Next Big Shift in GenAI

youtube.com

YouTube

Diffusion Language Models: The Next Big Shift in GenAI


Minseong Bae reposted

Atomically accurate de novo antibody design with diffusion models Antibodies recognize specific molecular “patches” on proteins—epitopes. Modern therapeutics depend heavily on finding antibodies that bind exactly the right patch. But today, we still obtain them mostly through…

bravo_abad's tweet image. Atomically accurate de novo antibody design with diffusion models

Antibodies recognize specific molecular “patches” on proteins—epitopes. Modern therapeutics depend heavily on finding antibodies that bind exactly the right patch. But today, we still obtain them mostly through…

Minseong Bae reposted

LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)

GladiaLab's tweet image. LLMs are injective and invertible.

In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space.

(1/6)

Minseong Bae reposted

Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔 The talk contains some…

rishabh16_'s tweet image. Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔

The talk contains some…
rishabh16_'s tweet image. Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔

The talk contains some…
rishabh16_'s tweet image. Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔

The talk contains some…
rishabh16_'s tweet image. Recently gave a talk at @FlagshipPioneer on how far we've come in AI methods for RNA design, and where the field is potentially heading. Shared a similar (shorter) version with Rhiju Das' group at Stanford BioE this summer. Lots of fun convos thereafter 🤔

The talk contains some…

Minseong Bae reposted

Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..

HannesStaerk's tweet image. Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..

Minseong Bae reposted

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've…

BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first…



Minseong Bae reposted

Anyone know what program & settings was used to make these figs? I assume it isn't pymol or chimera

Protein designers have long dreamed of building new enzymes from scratch. Unfortunately, enzymes are highly dynamic; they move around a lot and have intermediate states that are difficult to design computationally. But a recent paper from David Baker’s group at the Institute…

AsimovPress's tweet image. Protein designers have long dreamed of building new enzymes from scratch.

Unfortunately, enzymes are highly dynamic; they move around a lot and have intermediate states that are difficult to design computationally.

But a recent paper from David Baker’s group at the Institute…


Minseong Bae reposted

In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: arxiv.org/abs/2506.10892 CADD: arxiv.org/abs/2510.01329 CCDD: arxiv.org/abs/2510.03206

New survey on diffusion language models: arxiv.org/abs/2508.10875 (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations. I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲

sedielem's tweet image. New survey on diffusion language models: arxiv.org/abs/2508.10875 (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations.

I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲


Minseong Bae reposted

A Genetic Algorithm for Navigating Synthesizable Molecular Spaces 1. This paper introduces SynGA, a genetic algorithm designed to operate directly over synthesis routes, ensuring that all generated molecules are synthesizable. This is a significant step forward in addressing the…

BiologyAIDaily's tweet image. A Genetic Algorithm for Navigating Synthesizable Molecular Spaces

1. This paper introduces SynGA, a genetic algorithm designed to operate directly over synthesis routes, ensuring that all generated molecules are synthesizable. This is a significant step forward in addressing the…

Minseong Bae reposted

SimpleFold: Folding Proteins is Simpler than You Think "we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving…

iScienceLuvr's tweet image. SimpleFold: Folding Proteins is Simpler than You Think

"we introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer blocks. Protein folding models typically employ computationally expensive modules involving…

Minseong Bae reposted

Intern-S1: A Scientific Multimodal Foundation Model "Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the…

iScienceLuvr's tweet image. Intern-S1: A Scientific Multimodal Foundation Model

"Intern-S1 is a multimodal Mixture-of-Experts (MoE) model with 28 billion activated parameters and 241 billion total parameters, continually pre-trained on 5T tokens, including over 2.5T tokens from scientific domains. In the…

Minseong Bae reposted

TempRe reframes retrosynthesis as template generation: a seq-to-seq Transformer predicts SMARTS templates token-by-token (P2T) and even autowrites full multi-step routes (Direct TempRe) #GenAI #DrugDiscovery

AllThingsApx's tweet image. TempRe reframes retrosynthesis as template generation:

a seq-to-seq Transformer predicts SMARTS templates token-by-token (P2T) and even autowrites full multi-step routes (Direct TempRe) #GenAI #DrugDiscovery

Minseong Bae reposted

MolTextNet: A Two-Million Molecule-Text Dataset for Multimodal Molecular Learning 1.MolTextNet introduces a dataset of 2.5 million molecule-text pairs, designed to support large-scale multimodal learning between molecular graphs and natural language. Each entry provides…

BiologyAIDaily's tweet image. MolTextNet: A Two-Million Molecule-Text Dataset for Multimodal Molecular Learning

1.MolTextNet introduces a dataset of 2.5 million molecule-text pairs, designed to support large-scale multimodal learning between molecular graphs and natural language. Each entry provides…

Minseong Bae reposted

Chris Bishop, Technical Fellow and Director of Microsoft Research AI for Science, explains how Microsoft researchers recently achieved a milestone in solving a grand challenge that has hampered scientists and slowed innovation for decades. Watch the video: msft.it/6019SQtUX


Minseong Bae reposted

ChatGPT now can analyze, manipulate, and visualize molecules and chemical information via the RDKit library. Useful for scientific work across health, biology, and chemistry.

gdb's tweet image. ChatGPT now can analyze, manipulate, and visualize molecules and chemical information via the RDKit library.

Useful for scientific work across health, biology, and chemistry.
gdb's tweet image. ChatGPT now can analyze, manipulate, and visualize molecules and chemical information via the RDKit library.

Useful for scientific work across health, biology, and chemistry.

Minseong Bae reposted

The two most common questions I get via cold email are 1) what should I work on, and 2) how do I get a job doing research, engineering, etc.? I wrote a new post summarizing my advice for people trying to enter Biology+ML, from undergrads to early career. 👇 (link below)

nc_frey's tweet image. The two most common questions I get via cold email are 1) what should I work on, and 2) how do I get a job doing research, engineering, etc.? 

I wrote a new post summarizing my advice for people trying to enter Biology+ML, from undergrads to early career. 

👇 (link below)

Minseong Bae reposted

Come to our oral presentation on Generator Matching at ICLR 2025 tomorrow (Saturday). Learn about a generative model that works for any data type and Markov process! Oral: 3:30pm (Peridot 202-203, session 6E) Poster: 10am-12:30pm #172 (Hall 3 + Hall 2B) arxiv.org/abs/2410.20587

peholderrieth's tweet image. Come to our oral presentation on Generator Matching at ICLR 2025 tomorrow (Saturday). Learn about a generative model that works for any data type and Markov process!

Oral: 3:30pm (Peridot 202-203, session 6E)
Poster: 10am-12:30pm #172 (Hall 3 + Hall 2B)

arxiv.org/abs/2410.20587

Minseong Bae reposted

Very excited to share our interview with @DrYangSong. This is Part 2 of our history of diffusion series — score matching, the SDE/ODE interpretation, consistency models, and more. Enjoy!


Minseong Bae reposted

Free alternative for ChemDraw! Meet MolDraw. 100% free to use!


Minseong Bae reposted

Today in the @timmermanreport -- why we need natural language, and not just DNA, RNA, and protein, to represent complex concepts in biology: timmermanreport.com/2025/01/ai-nee… @ldtimmerman has the best blog in the world for biotech. Amazing to be publishing something with him!


United States Trends

Loading...

Something went wrong.


Something went wrong.