
ai4j

@ai4java

Supercharge your Java application with the power of AI

ai4j reposted

🚀 Exciting News! 🚀 Introducing LocalAI v1.24.0!! This is a hot one! 🔥 🧵 Let's see what's new 👇


ai4j reposted

Okay, this paper is wild: transformers *can* be taught arithmetic. Simply reversing the format in which you present the answer improves accuracy to 100% in ~5k steps (compared to non-convergence by 100k steps). This had been demonstrated previously in LSTMs. arxiv.org/abs/2307.03381

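A toy illustration of the data format in question (my own sketch, not the paper's code): writing the answer least-significant digit first lets the model emit each output digit from digits it has already seen.

    # Hypothetical formatting helper, assuming plain "a+b=answer" training
    # strings; only the answer's digit order changes between the two setups.
    def format_example(a: int, b: int, reverse_answer: bool) -> str:
        ans = str(a + b)
        return f"{a}+{b}={ans[::-1] if reverse_answer else ans}"

    print(format_example(357, 486, reverse_answer=False))  # 357+486=843
    print(format_example(357, 486, reverse_answer=True))   # 357+486=348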

ai4j reposted

Starting today, you can set custom instructions in ChatGPT that will persist from conversation to conversation. 👀 📌 You can enable custom instructions in the beta panel from the settings.


ai4j reposted

Confirmed. 70B LLaMA 2 easily training on a single GPU with 48GB. Green light on 70B 4-bit QLoRA & A6000. Go wild.

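For context, a minimal sketch of what "70B 4-bit QLoRA" loading looks like with Hugging Face transformers + bitsandbytes + peft; the checkpoint name and LoRA hyperparameters here are illustrative assumptions, not from the tweet.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",          # NormalFloat4, as in the QLoRA paper
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-70b-hf",        # gated repo; requires license acceptance
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Attach small trainable LoRA adapters; the frozen base stays 4-bit.
    lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()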

ai4j reposted

This is huge: Llama-v2 is open source, with a license that authorizes commercial use! This is going to change the landscape of the LLM market. Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face, and other providers. Pretrained and fine-tuned…


ai4j reposted

Announcing FlashAttention-2! We released FlashAttention a year ago, making attn 2-4x faster; it is now widely used in most LLM libraries. Recently I’ve been working on the next version: 2x faster than v1, 5-9x vs standard attn, reaching 225 TFLOPs/s training speed on A100. 1/

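A minimal usage sketch of the released flash-attn package (my assumption of the public API, not the author's benchmark code); it requires a CUDA GPU and half-precision tensors shaped (batch, seqlen, nheads, headdim).

    import torch
    from flash_attn import flash_attn_func

    # Random fp16 Q/K/V on GPU: batch=2, seqlen=1024, 16 heads of dim 64.
    q, k, v = (torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    # Drop-in replacement for standard softmax attention (here causal).
    out = flash_attn_func(q, k, v, causal=True)
    print(out.shape)  # torch.Size([2, 1024, 16, 64])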

ai4j reposted

We are excited to support research into safety and societal impacts via our Researcher Access Program. We aim to provide subsidized API access to our frontier models, including Claude 2, to as many researchers and academics as we can.


ai4j reposted

This is wild: kNN using a gzip-based distance metric outperforms BERT and other neural methods for OOD sentence classification. Intuition: two texts are similar if concatenating one onto the other barely increases the gzip size. No training, no tuning, no params. This is the entire algorithm:

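The algorithm screenshot isn't preserved here; below is a minimal sketch of the method as the paper describes it (k-NN over the gzip-based Normalized Compression Distance), with placeholder training data.

    import gzip

    def clen(s: str) -> int:
        # Compressed length of a string under gzip.
        return len(gzip.compress(s.encode()))

    def ncd(x: str, y: str) -> float:
        # Normalized Compression Distance: small if x and y share structure.
        cx, cy = clen(x), clen(y)
        return (clen(x + " " + y) - min(cx, cy)) / max(cx, cy)

    def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
        # train is a list of (text, label); vote among the k nearest neighbors.
        nearest = sorted(train, key=lambda t: ncd(query, t[0]))[:k]
        labels = [lbl for _, lbl in nearest]
        return max(set(labels), key=labels.count)

    train = [("the striker scored a late goal", "sports"),
             ("parliament passed the new budget bill", "politics")]
    print(classify("the midfielder scored twice", train))  # expected: sports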

This paper's nuts. For sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by… gzip. aclanthology.org/2023.findings-…



ai4j reposted

❓💬We heard that converting documents into Q&A pairs before vectorizing them would yield better results for queries phrased as questions, so we created a basic benchmark using @LangChainAI and FAISS to prove it. Notebook: github.com/psychic-api/do… tl;dr: It works! But only for…

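A minimal sketch of the idea being benchmarked, under my own assumptions (the linked notebook may differ): embed one generated question per passage and answer queries by similarity search, using 2023-era LangChain with FAISS and OpenAI embeddings.

    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    # Hypothetical Q&A pairs; the question-generation step is assumed done.
    qa_pairs = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "How do I reset my password?": "Use the 'Forgot password' link at login.",
    }
    store = FAISS.from_texts(
        list(qa_pairs.keys()),
        OpenAIEmbeddings(),  # requires OPENAI_API_KEY in the environment
        metadatas=[{"answer": a} for a in qa_pairs.values()],
    )

    # A question-phrased query should land near a stored question.
    hit = store.similarity_search("what's the period for refunds?", k=1)[0]
    print(hit.metadata["answer"])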

ai4j reposted

Practical lessons from building an enterprise LLM-based assistant: 1/
There's a common misconception that you can just finetune an LLM on your company's data. However:
- Finetuning is better suited to teaching specialized tasks than to injecting new knowledge
- Finetuning is…


ai4j reposted

I went through many open-source models lately. Here are my current top models that I suggest you test for yourself:
- Nous-Hermes: Still the best in my opinion for day-to-day use cases. It follows your instructions flawlessly nearly all the time. [Especially if you use beam…


ai4j reposted

Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at claude.ai in the US and UK.


ai4j reposted

The current state of open-source coding models. A tweet for the "Let's reach GPT-4's coding abilities open-source" guys 👇
---
Why do I care so much about coding models?
1. Because (probably) coding also boosts the reasoning abilities of LLMs.
2. Because in the future we can use…

Releasing 🚀 CodeGen2.5 🚀, a small but mighty LLM for code.
- On par with models twice its size
- Trained on 1.5T tokens
- Features fast infill sampling
Blog: blog.salesforceairesearch.com/codegen25
Paper: arxiv.org/abs/2305.02309
Code: github.com/salesforce/Cod…
Model: huggingface.co/Salesforce/cod…



ai4j reposted

How do you contrast results like this with papers like: arxiv.org/abs/2307.03172


ai4j reposted

🤔Which words in your prompt are most helpful to language models? In our #ACL2023NLP paper, we explore which parts of task instructions are most important for model performance. 🔗 arxiv.org/abs/2306.01150 Code: github.com/fanyin3639/Ret…


ai4j reposted

Adding Memory is important to help Chains and Agents remember previous interactions. Below are a few memory types you can use for your @LangChainAI apps.
1) ConversationBufferMemory: Keeps a list of the interactions and can extract the messages into a variable. A minimal sketch using the (2023-era) LangChain Python API follows; the sample exchanges are mine.
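    from langchain.memory import ConversationBufferMemory

    memory = ConversationBufferMemory()
    memory.save_context({"input": "Hi, I'm Ada."}, {"output": "Hello Ada!"})
    memory.save_context({"input": "What's my name?"}, {"output": "Your name is Ada."})

    # The buffered interactions come back as a single "history" variable,
    # ready to be injected into the next prompt by a Chain or Agent.
    print(memory.load_memory_variables({})["history"])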


ai4j reposted

I wrote a bit of a guide to ChatGPT’s Code Interpreter, which I have found to be the most useful and powerful mode of AI. It is, like every product made by OpenAI so far, terribly named. It is less a tool for coders and more a coder who works for you. oneusefulthing.org/p/what-ai-can-…


ai4j reposted

LangChain: Chat with Your Data, a new free short course created with @hwchase17, is now available! deeplearning.ai/short-courses/… In this 1-hour course, you’ll learn how to build one of the most requested LLM-based applications: answering questions using information from a document or…


ai4j reposted

OpenAI Functions are very powerful! Check this conversation between a customer and a customer support agent (AI):
[User]: Hi, I forgot when my booking is.
[Agent]: Sure, I can help you with that. Can you please provide me with your booking number, customer name, and surname? 1/6
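For reference, a minimal sketch of the 2023-era OpenAI function-calling setup behind such an agent; the function name and its fields are my assumptions for the booking scenario.

    import openai  # pre-1.0 SDK, where ChatCompletion.create takes `functions`

    functions = [{
        "name": "get_booking",  # hypothetical backend function
        "description": "Look up when a customer's booking is",
        "parameters": {
            "type": "object",
            "properties": {
                "booking_number": {"type": "string"},
                "customer_name": {"type": "string"},
                "customer_surname": {"type": "string"},
            },
            "required": ["booking_number", "customer_name", "customer_surname"],
        },
    }]

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "Hi, I forgot when my booking is."}],
        functions=functions,
    )
    # With required arguments missing, the model asks for them (as above);
    # once provided, it replies with a function_call carrying JSON arguments.
    print(response.choices[0].message)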

