Logic & Lemon

@logicandlemon

Thinks a lot about AI.

Gemini 3.0 Pro on RadLE v1 has achieved an impressive 51% accuracy, beating out radiology residents for the first time! Radiology residents scored 45%, and board-certified radiologists were around 83%. Gemini 3 Pro showed its thinking clearly in tough situations, like figuring…

🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️ 🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning…

NotebookLM has introduced Infographics and Slide Deck artifacts, which are now accessible to all Pro users. With this new feature, you can now generate high-quality infographics from a wide range of diverse sources.

Infographics & Slide Decks are 100% rolled out to all Pro users! One of our favorite examples that came out of early testing was uploading your resume/LinkedIn and creating a custom, visual representation of your career. Here is our social media manager's: Share yours too!

Perplexity has introduced a new feature in its finance function that displays real-time price movements for stocks, ETFs, indices, and cryptocurrencies directly within its web answers. Users can tap on any asset to access a detailed page featuring charts, news, and analysis. This…

Finance Plexgiving starts now: We're dropping a new Perplexity Finance feature every day through the end of November. Free for everyone. 1️⃣ Day 1/10: Live price tracking in answers Realtime price movement for stocks, ETFs, indices, and crypto now shows up inline in Perplexity…



Perplexity has launched the Comet browser for Android, which includes agentic task support, voice mode, and tab control. Can’t wait for the iOS version.

The most powerful AI browser now goes wherever you do. Ask it to handle tasks as you would on Comet for desktop. See exactly what actions the assistant is taking while you remain in full control.



If Nano Banana can make complex diagrams for consultants, then they are onto something.

Why spend hours tweaking slides when you could just… ask for what you want? Create and edit images and infographics with Nano Banana Pro in Slides or quickly liven up any slide with the “Beautify my slide” button. Experience the upgrade for yourself at slides.new!



Group chats finally launched. I don’t think a lot of people will use them, but for the people who do, they will be quite useful.

Rolling out group chats to ChatGPT Plus, Pro, Go, and Free users starting today. Last week we piloted in Japan, Korea, New Zealand, and Taiwan, and early feedback has been great. Now anyone can bring friends, family, or coworkers and ChatGPT into the same conversation. Excited…



Locus is a long-running “artificial scientist” that keeps improving for days, orchestrating thousands of experiments and beating human experts on tough AI R&D benchmarks. It matters because it lets us scale serious research the same way we scale compute.

Introducing Locus: the first AI system to outperform human experts at AI R&D Locus conducts research autonomously over multiple days and achieves superhuman results on RE-Bench given the same resources as humans, as well as SOTA performance on GPU kernel & ML engineering tasks.…



OpenAI released GPT-5.1-Codex-Max on Codex with better performance on coding tasks. Just in time to steal attention from Google. The models are getting good enough now that they can truly be a force multiplier for development teams.

With compaction, GPT-5.1-Codex-Max can work independently for hours. In the Codex agent harness in the CLI, IDE extension, or cloud, it can work across multiple context windows, automatically pruning the session history to only retain context most relevant to the task at hand.



ChatGPT Atlas ramping up feature releases - Vertical tabs - extensions import - icloud passkeys - new downloads ui - setting to use control + tab to cycle to most recently used tab - select multiple tabs at once (shift + click) - ability to set google as default search -…

New ChatGPT Atlas release out! Just hit "update" in the top right. chatgpt.com/atlas - extensions import - icloud passkeys - new downloads ui - setting to use control + tab to cycle to most recently used tab - select multiple tabs at once (shift + click) - ability to set…



You can now create slides and sheets from Perplexity. All the AI slide creators still have a long way to go.

Perplexity Pro and Max users can now create and edit slides, sheets and docs directly from your prompt sessions on Perplexity



Gemini 3 is far ahead on the ARC-AGI-2 eval. GPT-5 Pro doesn’t come close.

Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval) Gemini 3 Pro: 31.11%, $0.81/task Gemini 3 Deep Think (Preview): 45.14%, $77.16/task

Hitting limits on Gemini 3 Pro High on @antigravity pretty fast.


Google on fire 🔥. Yet to test, but it looks very interesting. While vibe coding platforms exist, Google has taken the IDE and shifted its focus to be agent-first while still keeping traditional IDE functions. If you create a UI on Nano Banana you can essentially ask Antigravity to…

Meet Google Antigravity, your new agentic development platform. An evolution of the IDE, it's built to help you: - Orchestrate agents operating at a higher, task-oriented level - Run parallel tasks with agents across workspaces - Build anything with Gemini 3 Pro.



Manus launched an extension that enables any browser to become an agentic browser.

Today we're launching Manus Browser Operator. Any browser can now become an AI browser. One extension. No download. No new setup. Your browser already works. Your logins. Your sessions. Your habits. Now with the full power of Manus.



Seeing this Deep Think chart, Gemini 3 looks built for hard-mode reasoning and knowledge work, hitting ~41% on Humanity’s Last Exam and 93.8% on GPQA Diamond, ahead of GPT-5.1 and Claude, and jumping to 45.1% on ARC-AGI-2 with tools on. Benchmarks are only one lens, though, so I…

Gemini 3 Pro landing ahead of GPT-5.1 on the Artificial Analysis Intelligence Index feels like a genuine shift, not just a tiny bump. It leads in 5 of the 10 evals, with big jumps on GPQA Diamond, MMLU-Pro, HLE, LiveCodeBench and SciCode, plus a strong showing on Humanity’s Last…

Gemini 3 Pro is the new leader in AI. Google has the leading language model for the first time, with Gemini 3 Pro debuting +3 points above GPT-5.1 in our Artificial Analysis Intelligence Index @GoogleDeepMind gave us pre-release access to Gemini 3 Pro Preview. The model…

Grok 4.1 is out! A surprisingly decent model when I tested a few prompts.

BREAKING 🚨: Grok 4.1 Beta is rolling out on the Grok web! It is available as a standalone option, next to the existing Grok 4 modes. Testing time 👀
