LocalAI: A Drop-In Replacement for OpenAI’s REST API
LocalAI is a drop-in replacement REST API for local inference that is consistent with OpenAI API requirements. It enables models to be run locally or on-prem using consumer-grade hardware and supports different model families that are compatible with the ggml format.
LocalAI is still in its early stages, but it has the potential to be a useful tool for developers and researchers who need to run AI models locally.
Usage
LocalAI comes by default as a container image.
The simplest way to start LocalAI is with docker-compose:
git clone https://github.com/go-skynet/LocalAI

cd LocalAI

# (optional) Checkout a specific LocalAI tag
# git checkout -b build <TAG>

# copy your models to models/
cp your-model.bin models/

# (optional) Edit the .env file to set things like context size and threads
# vim .env

# start with docker-compose
docker-compose up -d --pull always
# or you can build the images with:
# docker-compose up -d --build

# Now API is accessible at localhost:8080
curl http://localhost:8080/v1/models
# {"object":"list","data":[{"id":"your-model.bin","object":"model"}]}

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "your-model.bin",
  "prompt": "A long time ago in a galaxy far, far away",
  "temperature": 0.7
}'
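The .env file controls runtime settings such as the number of threads, the context size, and the models path. The variable names in the sketch below are assumptions based on those settings, not a verified listing; check them against the .env file shipped in the repository you cloned:

# .env (illustrative values; verify the keys against the repo's .env)
THREADS=4
CONTEXT_SIZE=512
MODELS_PATH=/models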
Docker
Example of starting the API with `docker`:
docker run -p 8080:8080 -ti --rm quay.io/go-skynet/local-ai:latest --models-path /path/to/models --context-size 700 --threads 4
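If your models live on the host, you will typically also want to mount that directory into the container so that --models-path points at something that exists inside it. A minimal sketch, assuming the models sit in ./models on the host:

docker run -p 8080:8080 -ti --rm -v $PWD/models:/models quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4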
Build locally:
To build the LocalAI container image locally, you can use docker:
# build the image (Docker image names must be lowercase)
docker build -t localai .
docker run localai
Or you can build the binary with make:
make build
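Once the binary is built, it can be started directly. A minimal sketch, assuming your models are in ./models/:

./local-ai --models-path ./models/ --context-size 700 --threads 4

The API is then reachable at localhost:8080, just as in the docker-compose example above.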
Build on Mac
Building on Mac (M1 or M2) works, but you may need to install some prerequisites using brew.
The steps below have been tested by one Mac user and found to work. Note that this doesn't use Docker to run the server:
# install build dependencies
brew install cmake
brew install go

# clone the repo
git clone https://github.com/go-skynet/LocalAI.git

cd LocalAI

# build the binary
make build

# Download gpt4all-j to models/
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j

# Use a template from the examples
cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/

# Run LocalAI
./local-ai --models-path ./models/ --debug

# Now API is accessible at localhost:8080
curl http://localhost:8080/v1/models

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "ggml-gpt4all-j",
  "messages": [{"role": "user", "content": "How are you?"}],
  "temperature": 0.9
}'
Run LocalAI in Kubernetes
LocalAI can be installed inside Kubernetes with helm.
Add the helm repo:
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
Create a values.yaml file with your settings:
deployment:
  image: quay.io/go-skynet/local-ai:latest
  env:
    threads: 4
    contextSize: 1024
    modelsPath: "/models"
# Optionally create a PVC, mount the PV to the LocalAI Deployment,
# and download a model to prepopulate the models directory
modelsVolume:
  enabled: true
  url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
  pvc:
    size: 6Gi
    accessModes:
      - ReadWriteOnce
  auth:
    # Optional value for HTTP basic access authentication header
    basic: "" # 'username:password' base64 encoded
service:
  type: ClusterIP
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"
Install the helm chart:
helm repo update
helm install local-ai go-skynet/local-ai -f values.yaml
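With the default ClusterIP service, the API is not exposed outside the cluster, so one simple way to test it from your workstation is a port-forward. The service name below assumes the chart was installed with the release name local-ai; adjust it to whatever kubectl get svc reports:

kubectl port-forward svc/local-ai 8080:8080
curl http://localhost:8080/v1/models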
Benefits of using LocalAI
- Reduced latency: Because requests never leave the machine, LocalAI avoids the round trip to a remote server. This is particularly useful for applications that need real-time responses, such as gaming and robotics.
- Improved privacy: Because data stays on the local device, LocalAI reduces the exposure of sensitive information. This is particularly useful for applications that handle sensitive data, such as healthcare and banking.
- Lower cost: By removing the need to pay for cloud computing resources, LocalAI can reduce running costs. This is especially useful for applications that are used only occasionally or process small amounts of data.
If you want to run AI models locally, LocalAI is an excellent choice that can improve the performance, privacy, and cost-effectiveness of your applications.
Features of LocalAI
- Simple to use: LocalAI is simple to use, even for novices. The documentation is straightforward and concise, and there is a strong user community eager to assist.
- Powerful: LocalAI can be used to build complex AI applications. It is still under active development, but it has the potential to change how AI applications are built.
- Flexible: LocalAI is adaptable and can be used to build AI applications in a wide range of languages and frameworks.
LocalAI is an excellent choice if you need a powerful and flexible tool to run AI models locally. It is simple to use and has an active community ready to help.
This article was written to help you get started with LocalAI. We hope it has been helpful. Please feel free to share your thoughts and feedback in the comment section below.