anshuman

@athleticKoder

ml @zomato; prev: ai consultant @google

anshuman reposted

Non-natural image gen and editing are difficult tasks. We tested the state of the art at the time — including Nano Banana 1.0 & GPT-image — all performed quite poorly on StructBench. Nano Banana 2 (NB2) just dropped, and its improvements strongly validate a direction we studied…

You're in an ML Engineering interview at Google, and the interviewer asks: "What salary will you take?" Here's how you answer:


anshuman reposted

Gemini 3 Pro System prompt

FLUX.2, Create a hand drawn isometric schematic diagram of this street.

Nano Banana Pro, Create a hand drawn isometric schematic diagram of this street.

Current OCR Models landscape
1. HunyuanOCR
2. PaddleOCR-VL
3. MinerU2.5
4. Qwen3-VL-235B-Instruct
5. MonkeyOCR-pro-3B
6. dots.ocr
7. Gemini-2.5-Pro
8. Deepseek-OCR
9. olmOCR
10. Mistral-OCR
11. GPT-4o
12. Dolphin BaiduOCR
13. PaddleOCR
14. Qwen3-VL-2B-Instruct
15.…

We are thrilled to open-source HunyuanOCR, an expert, end-to-end OCR model built on Hunyuan's native multimodal architecture and training strategy. This model achieves SOTA performance with only 1 billion parameters, significantly reducing deployment costs. ⚡️Benchmark Leader:…

The cycle never ends. Anthropic is sooo back.

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
