LocalGPT is a project that lets you use GPT models to chat with your documents on your local device. No data ever leaves your device, so it is completely private. Using the power of LLMs, you can ask questions about your documents without an internet connection. LocalGPT is built on LangChain, Vicuna-7B, and Instructor Embeddings.
As businesses generate more data, the need for secure, scalable, and user-friendly document management grows. LocalGPT is an intriguing new technology that can help businesses meet these challenges. In this article, we provide a step-by-step tutorial on LocalGPT.
Prerequisites
- Python 3.10 or later is required to run LocalGPT; it is incompatible with earlier versions of Python.
- A C++ compiler may be required if pip reports an error while building a wheel during installation.
- To install a C++ compiler on Windows 10/11, do the following:
- Install Microsoft Visual Studio 2022.
- Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
- Download the MinGW installer from the MinGW website.
- Run the installer and select the “gcc” component.
Environment Configuration
To run the code provided, you must first install the following prerequisites:
pip install -r requirements.txt
Test dataset
To add your own dataset:
Put all of your .txt, .pdf, or .csv files into the SOURCE_DOCUMENTS directory. In the load_documents() function, replace docs_path with the absolute path of your source_documents directory.
The currently supported default file types are .txt, .pdf, .csv, and .xlsx; if you want to use a file of another type, you must first convert it to one of the defaults.
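For illustration, here is a minimal sketch of what such a loading step could look like with LangChain's classic document loaders. The loader classes are real LangChain classes, but the function body, the extension-to-loader mapping, and the directory path are assumptions based on the description above, not the project's exact code.
# Sketch: load .txt, .pdf, and .csv files from a source directory (assumed layout)
import os
from langchain.document_loaders import TextLoader, PDFMinerLoader, CSVLoader

# Assumed mapping from file extension to LangChain loader
LOADER_MAP = {".txt": TextLoader, ".pdf": PDFMinerLoader, ".csv": CSVLoader}

def load_documents(source_dir):
    documents = []
    for file_name in os.listdir(source_dir):
        ext = os.path.splitext(file_name)[1].lower()
        loader_cls = LOADER_MAP.get(ext)
        if loader_cls is None:
            continue  # unsupported file type; convert it to a default type first
        loader = loader_cls(os.path.join(source_dir, file_name))
        documents.extend(loader.load())
    return documents

docs = load_documents("/absolute/path/to/source_documents")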
To ingest all of the data, execute the following command.
python ingest.py # defaults to cuda
To run on a specific device, use the --device_type option.
python ingest.py --device_type cpu
For a complete list of supported devices, use help.
python ingest.py --help
This will generate an index containing the local vector store. Depending on the size of your documents, it will take some time. You can ingest as many documents as you wish; they all accumulate in the local embeddings database. If you want to start from an empty database, delete the index.
Note: The first time you run this, it will take longer because the embedding model must be downloaded. After that, it will run locally, without the need for an internet connection.
Asking questions about your documents
To ask a question, use the following command:
python run_localGPT.py
And wait for the script to ask for your input.
> Enter a query:
Type your query and press Enter. The LLM will analyze the prompt and produce an answer, along with the four source chunks from your documents that it used as context. You can ask further questions without restarting the script; simply wait for the prompt to appear again.
Note: When you run this script for the first time, it will download the Vicuna-7B model from the internet. After that, you can disconnect from the internet and the script will still run inference. Your data never leaves your local environment.
To finish the script, type exit.
Running the scripts on a CPU
The ingest.py and run_localGPT.py scripts in localGPT use your GPU by default, which makes them run faster. If you only have a CPU, you can still run them, but they will be slower. To do so, add --device_type cpu to both scripts.
For ingestion, run the following:
python ingest.py --device_type cpu
To ask a question, use the following command:
python run_localGPT.py --device_type cpu
How it works
With the right local models and the power of LangChain, you can run the entire pipeline locally, without any data leaving your environment, and with respectable performance.
ingest.py parses your documents with LangChain tools and creates local embeddings with InstructorEmbeddings. It then saves the result in a local vector database using Chroma vector storage.
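In outline, the ingestion step looks roughly like the following. This is a hedged sketch built on the classic LangChain API; the chunk sizes, embedding model ID, and persist directory are illustrative assumptions, not the script's exact settings.
# Sketch: split documents, embed them locally, and persist a Chroma vector store
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load documents (or reuse the load_documents() sketch shown earlier)
docs = TextLoader("SOURCE_DOCUMENTS/example.txt").load()

# Split the documents into overlapping chunks for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = splitter.split_documents(docs)

# Create local Instructor embeddings (the model is downloaded on the first run)
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large",
                                           model_kwargs={"device": "cuda"})

# Persist the vectors in a local Chroma database (assumed "DB" directory)
db = Chroma.from_documents(texts, embeddings, persist_directory="DB")
db.persist()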
run_localGPT.py interprets queries and generates replies using a local LLM (Vicuna-7B in this example). The context for the replies is gathered from the local vector store via a similarity search, which retrieves the relevant pieces of information from your documents.
This local LLM can be swapped for any other LLM from Hugging Face. Just make sure the LLM you select is in HF format.
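Put together, the answering side can be sketched roughly as follows, again using the classic LangChain API. The model ID, directory name, and generation parameters are illustrative assumptions; any HF-format causal LLM could be substituted.
# Sketch: reopen the local vector store and answer queries with a local HF LLM
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Reopen the vector store created by ingest.py (assumed "DB" directory)
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

# Load a local HF-format LLM (illustrative Vicuna-7B checkpoint)
model_id = "TheBloke/vicuna-7B-1.1-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
llm = HuggingFacePipeline(pipeline=pipeline("text-generation", model=model,
                                            tokenizer=tokenizer, max_new_tokens=512))

# Answer queries using context retrieved by similarity search over the documents
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
                                 retriever=db.as_retriever(),
                                 return_source_documents=True)
result = qa("What does the contract say about termination?")
print(result["result"])                  # the generated answer
for doc in result["source_documents"]:   # the document chunks used as context
    print(doc.metadata)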
Benefits of Using LocalGPT
LocalGPT offers several advantages for document management: complete data privacy (no data leaves your device), fully offline operation after the initial model downloads, and rapid retrieval of information from your own documents.
Conclusion
In conclusion, LocalGPT’s advanced natural language processing capabilities are poised to transform document management. It empowers users across disciplines by providing rapid information retrieval, improving collaboration, and ensuring data privacy. Embrace LocalGPT to realize the full potential of your document repositories in the digital age. Please feel free to share your thoughts and feedback in the comments below.