Dataocean AI

@DataOceanAI

AI Data Resource & Data Service Provider

100 N Howard St Ste R, WA

dataoceanai.com

Joined May 2018

173 Posts · 103 Followers · 8 Following

Dataocean AI

@DataOceanAI

Nov 3

🌍 Unlock the Power of Multilingual OCR Datasets with @DataOceanAI! From natural scenes to handwritten documents, Dataocean AI provides diverse, high-quality OCR datasets to accelerate model training and expand global application coverage. #multilingualOCR #dataset #documentOCR

Dataocean AI

@DataOceanAI

Oct 15

GITEX GLOBAL 2025 Day 3 — The excitement continues! Visit us at Booth H14-A60! Our ASR, TTS, and Multimodal Datasets attracted strong interest from visitors eager to advance AI innovation through better data. #GITEX2025 #ASR #TTS #Multimodal #AIDatasets #AIInnovation

Dataocean AI

@DataOceanAI

Sep 11

💡 What if your AI could interrupt you naturally—just like a real conversation? 🔹 Train with Dataocean AI’s 9,000-Hour Chinese Full-Duplex Corpus — powering the next generation of real-time, interruptible AI. 👉 Explore the full story here dataoceanai.com/can-you-interr… #Datasets

dataoceanai.com

"Can You Interrupt AI Mid-Response?" Discover the Full-Duplex Power Behind GPT Realtime × Gemini —...

Currently, most speech training datasets consist of continuous recordings with complete conversational turns, lacking the naturally occurring, hard-to-model

Source: dataoceanai.com

Dataocean AI

@DataOceanAI

Aug 28

🔥 Level Up Your Mandarin ASR! 🔊 9,000 Hours Chinese Mandarin Full Duplex Speech Recognition Corpus (Mobile & Desktop) — our most popular dataset for building smarter, more natural conversational AI. #AI #SpeechData #ConversationalAI #Dataset #FullDuplex #SpeechRecognition

Dataocean AI

@DataOceanAI

Aug 19

🚀 Day 2 at #Interspeech2025! @DataOceanAI is showing how our Data Services, DOTS Platform, and curated ASR & TTS Datasets are driving breakthroughs in generative AI applications. ✨ Swing by our booth for a chance to win a LEGO set! #speechtechnology #generativeAI #AIdatasolutions

Dataocean AI

@DataOceanAI

Aug 12

#Interspeech2025 kicks off on August 17, in Rotterdam, the Netherlands! @DataOceanAI will be there showcasing our latest speech datasets! 👋 Come meet our experts to explore collaboration and accelerate your AI projects! #Interspeech2025 #SpeechAI #ASR #TTS #SpeechDatasets

Dataocean AI

@DataOceanAI

Jul 29

✨ It’s Day 2 at #ACL2025 and we’re still going strong in Vienna! Stop by Booth #4 to connect with the Dataocean AI team. 📊 Dive into our NLP datasets — from CoT and MT to OCR and beyond. 💬 Chat with our team about real-world AI applications. 🎁 Giveaways are waiting! #NLP #AI

Dataocean AI

@DataOceanAI

Jul 28

🚀 High-Quality Speech AI Datasets Released! Get Access & Get in Touch! These datasets are now available for licensing or collaboration. Please feel free to reach out to request access, download samples, or learn how they integrate with your AI pipeline. #SpeechRecognition

Dataocean AI

@DataOceanAI

Jul 21

#ACL2025NLP kicks off next week! Come and visit Dataocean AI at Booth #4 from July 27-30. 💡 We’re showcasing high-quality NLP datasets — including CoT, MT, OCR, and more. 🎁 Drop by for expert insights and fun giveaways. 🚀We look forward to seeing you! #NLPCommunity

Dataocean AI

@DataOceanAI

Jul 1

🎉 The #ICME2025 Audio Encoder Capability Challenge Workshop kicked off! Congratulations to all the winning teams for their outstanding solutions in audio encoder multi-task learning and real-world applications! Thank you to all our speakers and participants. #AudioEncoder

Dataocean AI

@DataOceanAI

Jun 26

The #ICME2025 Audio Encoder Capability Challenge Workshop is coming soon! ✅ Time: July 1st, 10:15 AM – 11:30 AM ✅ Location: Room 450, Cité Nantes Congress Centre, Nantes, France Join us at #ICME2025 to hear winning teams present their solutions and insights. #AudioEncoder

Dataocean AI

@DataOceanAI

Apr 13

🔥 Open Source! #Dolphin🐬 achieves a WER reduction of up to 68% relative to OpenAI Whisper. Dolphin is a new large-scale automatic speech recognition model from Dataocean AI & Tsinghua University, supporting 40 Eastern languages and 22 Chinese dialects. GitHub: lnkd.in/gyvBVuKg

Dataocean AI

@DataOceanAI

Apr 2

Meet #Dolphin🐬, a SOTA speech recognition model for 40 Eastern languages and 22 Chinese dialects, from Dataocean AI & Tsinghua University. Dolphin achieves a WER reduction of up to 68% relative to OpenAI Whisper. Paper: arxiv.org/abs/2503.2021

Dataocean AI reposted

Tiezhen WANG

@Xianbao_QIAN

Mar 28

huggingface.co/DataoceanAI/do…

DataoceanAI/dolphin-base · Hugging Face

Source: huggingface.co

Dataocean AI reposted

Tiezhen WANG

@Xianbao_QIAN

Mar 28

A new ASR model from Tsinghua University that focuses on Eastern languages, available on @huggingface

Dataocean AI

@DataOceanAI

Mar 21

It is so exciting to have you at #TechAD! ✅ Schedule a meeting with our data experts to get the latest In-Cabin and Autonomous Driving Data Solutions from Dataocean AI. Learn more about our data solutions: dataoceanai.com/industry-solut…

Dataocean AI

@DataOceanAI

Mar 12

We’re excited to be part of Tech.AD Europe 2025! Join us in Berlin on March 17-18 at Booth #11. Learn more about our auto-labeling platform, DOTS-AD, specifically tailored for OCC, BEV, and 2D/3D/4D labeling. Stop by Booth #11 and chat with our experts.

Dataocean AI

@DataOceanAI

Feb 27

🎉 We are thrilled to announce that Dataocean AI has been named "Best AI-Powered Data Solutions Company 2025" by Acquisition International Magazine in its Global Excellence Awards Programme. Learn more about our high-quality corpora and one-stop data solutions: dataoceanai.com

Dataocean AI

@DataOceanAI

Feb 6

🔥 The IEEE ICME 2025 Audio Encoder Capability Challenge is now open! The Challenge is hosted by Xiaomi Technology, University of Surrey, and Dataocean AI. Website: dataoceanai.github.io/ICME2025-Audio… Register 👇 before April 1: forms.gle/VGgRQdPLs9f72U…