
Ethan

@torchcompiled

trying to feel the magic. cofounder at @leonardoai_, directing research at @canva

Pinned

personally I feel like the inflection point was early 2022. The sweet spot where CLIP-guided diffusion was just taking off, forcing unconditional models to be conditional through a strange patchwork of CLIP evaluating slices of the canvas at a time. It was like improv, always…

[four images attached]
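For context, the "strange patchwork" refers to the cutout trick from that era: score random crops of the canvas with CLIP and follow the gradient of text-image similarity to steer an unconditional model. A minimal sketch, assuming an OpenAI-style clip_model with an encode_image method, a precomputed CLIP text embedding, and a canvas larger than CLIP's input size (all names and hyperparameters here are illustrative):

```python
import torch
import torch.nn.functional as F

def clip_guidance_grad(x, text_emb, clip_model, n_cuts=16, cut_size=224):
    """Estimate the gradient of CLIP text-image similarity w.r.t. the canvas
    by scoring random square cutouts (the 'patchwork' trick)."""
    x = x.detach().requires_grad_(True)
    _, _, h, w = x.shape
    cuts = []
    for _ in range(n_cuts):
        # sample a random slice of the canvas and resize it to CLIP's input size
        size = int(torch.randint(cut_size // 2, min(h, w), ()))
        top = int(torch.randint(0, h - size + 1, ()))
        left = int(torch.randint(0, w - size + 1, ()))
        patch = x[:, :, top:top + size, left:left + size]
        cuts.append(F.interpolate(patch, size=cut_size, mode="bilinear"))
    cuts = torch.cat(cuts)
    img_emb = F.normalize(clip_model.encode_image(cuts), dim=-1)
    sim = (img_emb @ F.normalize(text_emb, dim=-1).T).mean()
    return torch.autograd.grad(sim, x)[0]

# inside an unconditional sampler's loop, nudge the sample toward the prompt:
# x = x + guidance_scale * clip_guidance_grad(x, text_emb, clip_model)
```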

@eprombeats: Image synthesis used to look so good. These are from 2021. I feel like this was an inflection point, and the space has metastasized into something abhorrent today (Grok, etc). Even with no legible representational forms, there was so much possibility in these images.

[four images attached]


I think this aged well. There's been quite a bit of work on training VAEs to have favorable representations, aligning them with embedding models. Why not just use the embedding models themselves as the latent space?


@sainingxie: three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)

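The recipe, as the thread describes it at a high level, is to freeze a pretrained representation encoder and train only a decoder to invert it, so the diffusion transformer denoises directly in the representation space. A minimal sketch of that structure, assuming a DINOv2-style frozen encoder and a hypothetical decoder module (the plain MSE reconstruction loss and all architecture details here are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationAutoencoder(nn.Module):
    """Frozen pretrained encoder + trained decoder: the diffusion model
    operates in the encoder's representation space instead of a VAE latent."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():  # encoder stays frozen
            p.requires_grad_(False)
        self.decoder = decoder               # only the decoder is trained

    @torch.no_grad()
    def encode(self, images):   # pixels -> frozen embeddings
        return self.encoder(images)

    def decode(self, latents):  # embeddings -> pixels
        return self.decoder(latents)

def reconstruction_step(rae, images, opt):
    # train the decoder to invert the frozen representation
    # (decoder is assumed to output images of the same shape)
    recon = rae.decode(rae.encode(images))
    loss = F.mse_loss(recon, images)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```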


In getting the outcomes we want from models, it all comes down to search. There are two strategies here:
- reducing the size of the search space
- searching efficiently
Finetuning, RL, and prompt engineering all tighten the generative distribution around the outputs we want. Searching…
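The tweet is truncated, but one concrete instance of the "searching efficiently" strategy is best-of-N sampling: draw several candidates and keep the one a scorer prefers. A minimal sketch, where generate and reward are hypothetical stand-ins for a generative model and a reward function:

```python
def best_of_n(generate, reward, prompt, n=16):
    """Search the model's output distribution: sample n candidates
    and keep the one the reward function scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```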


Ethan reposted

New post! As opposed to building reward models over human ratings and using them for RL, can a model develop its own reward function? Humans seem to develop their own aesthetic preferences through exploration and socializing. How can we mimic this for generative models?


Two solutions raised here. The first captures the social aspect while leaving out the difficulty of exploration: given that one's taste develops by learning about others' tastes, we could imagine training a generative model over a dataset of many reward models, and sampling new plausible…

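As a speculative sketch of that first solution (everything here, names and shapes included, is an assumption, matching the tweet's "we could imagine"): treat each reward model's flattened weights as one training example, fit a small VAE over that dataset, and sample new plausible "tastes" from the prior:

```python
import torch
import torch.nn as nn

class RewardModelVAE(nn.Module):
    """Generative model over a dataset of reward models, represented
    as flattened weight vectors of dimension weight_dim."""

    def __init__(self, weight_dim, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(weight_dim, 512), nn.SiLU(),
                                 nn.Linear(512, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 512), nn.SiLU(),
                                 nn.Linear(512, weight_dim))
        self.latent_dim = latent_dim

    def forward(self, w):
        mu, logvar = self.enc(w).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return self.dec(z), kl

    @torch.no_grad()
    def sample_reward_weights(self, n=1):
        # draw a new plausible reward function from the learned distribution
        return self.dec(torch.randn(n, self.latent_dim))
```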



Ethan reposted

New post! I believe we can think of ourselves through two different lenses: an exact point of experience, and the history of our patterns of behavior. Though the two are deeply interconnected.

