tsuki

@tensorcore

Hiroyuki Ootomo. High-precision GEMM emulation on Tensor Cores. Work at #76B900. Ph.D. from @TokyoTech_en. Cooking @cp_async. Hai-to-Yoka: https://x.enp1s0.dev

Pinned

emulation is all you need


I really don't see the point of analyzing code that an LLM generated.


I want to believe in the manifold hypothesis about as firmly as I believe in the polynomial hierarchy.


tsuki reposted

Good news. The discounted SCAsia/HPCAsia2026 early-bird registration has been extended to Jan 5! This is to accommodate late posters and visa issues, but it applies to all participants. Japanese…


I feel more relaxed working on weekends than on weekdays, and late at night rather than during the day...


Attached an ADS-B receiver to a Raspberry Pi Zero W for Flightradar24. I still need to update the OS, though...


Feeding meat to students, forcing ice cream on professors at research meetings... I think I probably have that kind of harassing streak.


I’d like to try making a picture in the style of 黃文勇. I saw his work at his solo exhibition in Kaohsiung last year (相無所相) and really liked it.


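I thought my boat license was useless, but it means I can go photograph the pier piles and guidance lights of Haneda Airport's Runway D from the sea. Wonderful.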


I want to take beautiful low-light photos, but of course the lenses are expensive... ugh...


Nago City Hall in Okinawa. I love the architecture. It feels somewhat brutalist to me.


Visited the Okinawa Institute of Science and Technology. It's in Japan, but it doesn't feel like Japan at all. I want to be a Ph.D. student here.


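A while ago I wrote a program (autopray) that detects shooting stars in camera footage and automatically recites a wish to standard output. Now that I have a good lens, I feel it's finally time to run it. Unlimited wishes granted during the Geminids. Though I'm not sure whether prayer-wheel-style wishing actually works.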
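A hypothetical sketch of the detect-then-wish loop described above, not the author's autopray. It assumes frames arrive as grayscale 2-D lists of brightness values, approximates a "shooting star" as a group of newly brightened pixels whose spread is concentrated along one axis, and treats the wish text, the function names, and every threshold as placeholders.

```python
import math
import sys


def new_bright_pixels(prev, curr, jump=60):
    """(row, col) of pixels that brightened by at least `jump` since the last frame."""
    return [
        (y, x)
        for y, (row_p, row_c) in enumerate(zip(prev, curr))
        for x, (p, c) in enumerate(zip(row_p, row_c))
        if c - p >= jump
    ]


def looks_like_streak(pixels, min_pixels=4, min_ratio=9.0):
    """Crude meteor test: enough new pixels, with variance concentrated on one axis."""
    if len(pixels) < min_pixels:
        return False
    n = len(pixels)
    my = sum(y for y, _ in pixels) / n
    mx = sum(x for _, x in pixels) / n
    syy = sum((y - my) ** 2 for y, _ in pixels) / n
    sxx = sum((x - mx) ** 2 for _, x in pixels) / n
    sxy = sum((y - my) * (x - mx) for y, x in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix = variance along the principal axes.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    major, minor = half_trace + disc, half_trace - disc
    return major > 0 and major >= min_ratio * max(minor, 1e-12)


def autopray_sketch(frames, wish, out=sys.stdout):
    """Print `wish` to `out` for every consecutive frame pair that shows a streak."""
    wishes = 0
    for prev, curr in zip(frames, frames[1:]):
        if looks_like_streak(new_bright_pixels(prev, curr)):
            print(wish, file=out)
            wishes += 1
    return wishes
```

The principal-axis test, rather than a bounding-box aspect ratio, is a design choice so that diagonal streaks still count while compact blobs (a blinking light, a passing cloud edge) do not.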


tsuki reposted

gemm and gmem are now being typed interchangeably by tired stupid me


ChatGPT, Perplexity, Gemini, etc. somehow manage to include every possible lie in the world when answering a question about Fortran+CUDA libs. That's a waste of energy.


tsuki reposted

=> "Is Mixed Precision Computing really the Top Priority?", Hartwig Anzt, TUM, WS on Approx Comp in NLA, Oct 8, 2025 sdrive.cnrs.fr/s/djQWs8W6gcdY…
Bandwidth & latency cannot keep up with the growth in compute power
Ginkgo github.com/ginkgo-project…
AI WH, Rio Yokota x.com/ogawa_tter/sta…

"Emulating High-precision Matrix Operations on Low-precision Matrix Engines" Rio Yokota, WS on Approx Comp in NLA, Oct 8, 2025 sdrive.cnrs.fr/s/djQWs8W6gcdY…
<= M. Fasi, Oct 8 x.com/ogawa_tter/sta…
NVIDIA, Oct 8 x.com/ogawa_tter/sta…
Next-gen TPU? x.com/ogawa_tter/sta…
Block Data


tsuki reposted

=> "Emulation of Complex Matrix Multiplication based on the Chinese Remainder Theorem", @uchino_error (RIKEN Kobe), et al., arXiv, Dec 9, 2025 arxiv.org/abs/2512.08321
Ozaki-II scheme
ScalAH 2025 (SC25 WS) dl.acm.org/doi/10.1145/37…
K. Ozaki, Jul 2 x.com/ogawa_tter/sta…


=> "Emulating Matrix Multiplication Using Mixed-Precision Computation", K. Ozaki, NGT - Openlab "Optimising Floating Point Precision" WS, Jul 2
(MP4) indico.cern.ch/event/1538409/…
indico.cern.ch/event/1538409/…
Ozaki Scheme II, Apr 27 (10) arxiv.org/abs/2504.08009
Aug 8 x.com/ogawa_tter/sta…

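The Ozaki-II scheme named in these posts rests on the Chinese Remainder Theorem. A minimal sketch of only that CRT step, assuming the inputs have already been scaled and rounded to integer matrices (that conversion, and the complex-valued case from the paper, are left out): each product modulo a small modulus is the kind of small-integer GEMM a low-precision engine can do exactly, and the true integer product is rebuilt afterwards. The moduli and the names `sym_mod` and `crt_matmul` are illustrative choices, not taken from the paper.

```python
import math


def sym_mod(x, mod):
    """Residue of x modulo mod, mapped into [-(mod // 2), (mod - 1) // 2]."""
    r = x % mod
    return r - mod if r > (mod - 1) // 2 else r


def crt_matmul(a, b, moduli):
    """Exact integer product a @ b rebuilt from products modulo each modulus.

    Correct when the moduli are pairwise coprime and every |entry of a @ b|
    is at most (prod(moduli) - 1) // 2.
    """
    big_m = math.prod(moduli)
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    for mod in moduli:
        # Residue matrices; entries fit in int8 when mod <= 256.
        ra = [[sym_mod(x, mod) for x in row] for row in a]
        rb = [[sym_mod(x, mod) for x in row] for row in b]
        # CRT basis element: congruent to 1 modulo mod, 0 modulo the others.
        weight = (big_m // mod) * pow(big_m // mod, -1, mod)
        for i in range(m):
            for j in range(n):
                # One low-precision GEMM entry, reduced modulo mod.
                r = sum(ra[i][l] * rb[l][j] for l in range(k)) % mod
                acc[i][j] += r * weight
    return [[sym_mod(v, big_m) for v in row] for row in acc]
```

With the moduli 251, 253, 255 and 256 (pairwise coprime), their product is about 4.1 × 10^9, so reconstruction is exact as long as every entry of the true product stays below roughly 2.07 × 10^9 in magnitude. Adding moduli widens that range at the cost of one more small GEMM each.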


> The next major change in hardware design will be shared exponents

"Emulating High-precision Matrix Operations on Low-precision Matrix Engines" Rio Yokota, WS on Approx Comp in NLA, Oct 8, 2025 sdrive.cnrs.fr/s/djQWs8W6gcdY…
<= M. Fasi, Oct 8 x.com/ogawa_tter/sta…
NVIDIA, Oct 8 x.com/ogawa_tter/sta…
Next-gen TPU? x.com/ogawa_tter/sta…
Block Data


I got up inhumanely early for a health checkup, which was unhealthy. Now that it's over, I'll live unhealthily for a while.


tsuki reposted

=> "Floating-Point Matrix Multiply with Integer Arithmetic", M. Fasi, U of Leeds, with A. Abdelfattah, J. Dongarra, M. Mikaitis & F. Tisseur, WS on Approx Comp in NLA, Oct 8, 2025 sdrive.cnrs.fr/s/djQWs8W6gcdY…
arXiv, Jun 12 arxiv.org/abs/2506.11277
Sep 5, 2023 x.com/ogawa_tter/sta…


=> "DGEMM on Integer Tensor Cores", @tensorcore, NHR PerfLab Seminar, Sep 5, 2023 youtube.com/watch?v=ouK0gw…
hpc.fau.de/files/2023/09/…
Can DL processors be used for HPC applications?
Can we emulate DGEMM in the same manner?
We can!
Ozaki scheme
arXiv, Jun 22 arxiv.org/abs/2306.11975

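A minimal pure-Python sketch of the slicing idea behind emulating DGEMM on integer units with the Ozaki scheme, under stated assumptions: each row of A (and each column of B) gets its own power-of-two scale, entries are peeled into 7-bit signed slices that fit in int8, and only slice pairs with s + t < num_slices are multiplied. Exact Python integer dot products stand in for int8 Tensor Core GEMMs with int32 accumulation (in hardware the inner dimension must stay small enough that int32 does not overflow). The names `split_rows` and `ozaki_gemm` and the default of 8 slices are hypothetical; this is not the author's implementation.

```python
import math


def split_rows(mat, num_slices):
    """Split each row of a float matrix into int8-range integer slices.

    Row i gets a power-of-two scale 2**exps[i] with every |entry| < 2**exps[i];
    its entries are then peeled 7 bits at a time, so that
    entry ~= 2**exps[i] * sum_s slices[s][i][col] * 128**-(s + 1).
    """
    exps = []
    slices = [[[0] * len(mat[0]) for _ in mat] for _ in range(num_slices)]
    for i, row in enumerate(mat):
        amax = max(abs(x) for x in row)
        e = math.frexp(amax)[1] if amax > 0 else 0  # amax < 2**e
        exps.append(e)
        for col, x in enumerate(row):
            r = math.ldexp(x, -e)  # |r| < 1; exact power-of-two scaling
            for s in range(num_slices):
                r *= 128.0         # exact
                d = int(r)         # truncate toward zero, so |d| <= 127
                slices[s][i][col] = d
                r -= d             # exact, and |r| < 1 again
    return exps, slices


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def ozaki_gemm(a, b, num_slices=8):
    """Approximate a @ b from exact integer products of int8-range slices."""
    ea, sa = split_rows(a, num_slices)
    eb, sb = split_rows(transpose(b), num_slices)  # slice b column by column
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for s in range(num_slices):
        for t in range(num_slices - s):  # keep only pairs with s + t < num_slices
            for i in range(m):
                for j in range(n):
                    # Exact integer dot product: the part an int8 Tensor Core GEMM
                    # with int32 accumulation would compute.
                    p = sum(x * y for x, y in zip(sa[s][i], sb[t][j]))
                    # Undo both scales and both slice weights, accumulate in FP64.
                    c[i][j] += math.ldexp(p, ea[i] + eb[j] - 7 * (s + t + 2))
    return c
```

With 8 slices, each entry keeps roughly 55 bits relative to the largest entry in its row (or column), and the triangular pair selection needs 36 integer GEMMs instead of 64. Entries much smaller than their row maximum keep fewer significant bits, a known limitation of this kind of per-row scaling.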

