OpsConfig's profile picture.

OpsConfig

@OpsConfig

OpsConfig heeft deze post opnieuw geplaatst

LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique. But despite how widely this is used, almost all reported results are highly biased. Excited to share our…

Kangwook_Lee's tweet image. LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique.

But despite how widely this is used, almost all reported results are highly biased.

Excited to share our…
Kangwook_Lee's tweet image. LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique.

But despite how widely this is used, almost all reported results are highly biased.

Excited to share our…
Kangwook_Lee's tweet image. LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique.

But despite how widely this is used, almost all reported results are highly biased.

Excited to share our…

OpsConfig heeft deze post opnieuw geplaatst

your post challenged me. every one of your points is wrong but i had to think about each for a while :)


OpsConfig heeft deze post opnieuw geplaatst

their 'superhuman' ai cleverly assigned all the work to non-default streams, which means the correctness test (which waits on all streams) passes, while the profiling timer (which only waits on the default stream) is tricked into reporting a huge speedup

miru_why's tweet image. their 'superhuman' ai cleverly assigned all the work to non-default streams, which means the correctness test (which waits on all streams) passes, while the profiling timer (which only waits on the default stream) is tricked into reporting a huge speedup

OpsConfig heeft deze post opnieuw geplaatst

GPT was actually based on ULMFiT, written by a person with no PhD (me).


OpsConfig heeft deze post opnieuw geplaatst

Johns Hopkins University is excited to announce its new tuition promise program will offer free tuition for undergraduate students from families earning up to $200,000 and tuition plus living expenses for families earning up to $100,000. As a result of the change, students from…

JohnsHopkins's tweet image. Johns Hopkins University is excited to announce its new tuition promise program will offer free tuition for undergraduate students from families earning up to $200,000 and tuition plus living expenses for families earning up to $100,000. 

As a result of the change, students from…

OpsConfig heeft deze post opnieuw geplaatst

The comments on software patents made me chuckle. After selling both Id Software and Oculus, my continued employment contracts included “will not participate in software patents” clauses, and yet I got asked to reconsider in both cases. It wasn’t a lot of pressure, so I don’t…


OpsConfig heeft deze post opnieuw geplaatst

The vast gulf between all of our actual daily experience with LLM improvements plateauing, vs what some researchers claim is going on, is pretty interesting. You can't properly understand AI progress without using the models for real work every day.

Are the sigmoids in the room with us right now?

TolgaBilge_'s tweet image. Are the sigmoids in the room with us right now?


OpsConfig heeft deze post opnieuw geplaatst

First Law of @matt_levine: the funniest thing is the thing that will happen

DOGE trying to vibe-refactor COBOL code with AI tools would be the funniest way for the complete collapse of the global economy to happen



OpsConfig heeft deze post opnieuw geplaatst

My student's comment on that LLMs generating more novel research ideas than humans paper making the rounds I think this says more about NLP researchers than about LLMs Ouch..


OpsConfig heeft deze post opnieuw geplaatst

To be fair, he has no fucking idea what he is talking about. NVIDIA is a strong WFH culture, and we're doing just fine.


OpsConfig heeft deze post opnieuw geplaatst

Australia lost one of our medical legends yesterday, with the death of Nobel Laureate Dr Robin Warren. Together with Dr Barry Marshall, they discovered the link between the bacteria Helicobacter Pylori & Peptic Ulcer Disease. It fundamentally changed how we treat ulcer disease.

SParnis's tweet image. Australia lost one of our medical legends yesterday, with the death of Nobel Laureate Dr Robin Warren. Together with Dr Barry Marshall, they discovered the link between the bacteria Helicobacter Pylori & Peptic Ulcer Disease. It fundamentally changed how we treat ulcer disease.

I am sorry to report that Dr (John) Robin Warren (Nobel Prize in Medicine 2005) died yesterday evening (July 23 rd).  Robin was cared for at Brightwater in Perth Western Australia. He had become very frail in recent years and passed away peacefully in the company of his family.



OpsConfig heeft deze post opnieuw geplaatst

The first rule of data breaches: if it exists in a database on the Internet, it will be stolen. The second rule of data breaches: the service that lost your data will be incredibly vague about exactly what the hackers took, because it’s way worse than you imagine.

NEW: Hackers say they stole 33 million cell phone numbers of users of two-factor app Authy. Twilio (owner of Authy) confirmed "threat actors were able to identify" phone numbers, but didn't say how many. The risk is better tailored phishing attacks. techcrunch.com/2024/07/03/twi…



OpsConfig heeft deze post opnieuw geplaatst

please stop its physically painful

nearcyan's tweet image. please stop its physically painful

OpsConfig heeft deze post opnieuw geplaatst

This entire paper boils down to “clamp importance ratios to 1.0”, which could have been nicely communicated in the abstract. I can only assume that academic publishing has a lot of motivating forces I am happily ignorant of. arxiv.org/abs/1606.02647


It is a very strange feeling to see an aspect of something myself and a bunch of other people were randomly posting about in August show up in a Google DeepMind Paper in November: not-just-memorization.github.io/extracting-tra…

OpsConfig's tweet image. It is a very strange feeling to see an aspect of something myself and a bunch of other people were randomly posting about in August show up in a Google DeepMind Paper in November: not-just-memorization.github.io/extracting-tra…
OpsConfig's tweet image. It is a very strange feeling to see an aspect of something myself and a bunch of other people were randomly posting about in August show up in a Google DeepMind Paper in November: not-just-memorization.github.io/extracting-tra…

OpsConfig heeft deze post opnieuw geplaatst

Don't listen to me. I don't understand language model fine tuning. I'm merely the 1st author of the paper "Universal Language Model Fine Tuning", which explained 5 years ago how to fine tune universal language models.

jeremyphoward's tweet image. Don't listen to me. I don't understand language model fine tuning.

I'm merely the 1st author of the paper "Universal Language Model Fine Tuning", which explained 5 years ago how to fine tune universal language models.

OpsConfig heeft deze post opnieuw geplaatst

I recommend talking to the model to explore what it can help you best with. Try out how it works for your use case and probe it adversarially. Think of edge cases. Don't rush to hook it up to important infrastructure before you're familiar with how it behaves for your use case.


Loading...

Something went wrong.


Something went wrong.