InternGPT is a new way to interact with ChatGPT. It allows you to use pointing and language to control ChatGPT. This makes it a more powerful and flexible tool for visual communication with chatbots.
What is InternGPT
InternGPT (iGPT for short) is a pointing-language-driven visual interactive system that lets you engage with ChatGPT by using a pointing device to click, drag, and draw. The name InternGPT stands for interaction, nonverbal, and ChatGPT.
InternGPT is an open-source project created by researchers at OpenGVLab, a research group at Shanghai AI Laboratory. It is built on OpenAI’s ChatGPT model, which is a large language model chatbot.

InternGPT may be used to generate text, translate languages, create other types of creative material, and provide helpful answers to your questions. It may also communicate with other programs, such as image editing software.
InternGPT is still under development, but it has the potential to be a powerful tool for a wide range of tasks.
How InternGPT works
InternGPT interacts with ChatGPT by combining pointing instructions with natural language commands. When a user clicks, drags, or draws on an image or video, InternGPT creates a pointing instruction that describes the action. This instruction is then sent to ChatGPT, which uses it to construct a response.
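As a rough illustration of this idea, the sketch below bundles a pointer gesture and a text command into a single request string. The `PointingInstruction` class, its fields, and `build_request` are hypothetical names invented for this example; they are not InternGPT's actual data structures or message format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointingInstruction:
    """Hypothetical record of a pointer gesture on an image (not InternGPT's real type)."""
    action: str                    # "click", "drag", or "draw"
    points: List[Tuple[int, int]]  # pixel coordinates touched by the gesture


def build_request(instruction: PointingInstruction, command: str) -> str:
    """Combine the gesture and the text command into one prompt string."""
    coords = ", ".join(f"({x}, {y})" for x, y in instruction.points)
    return f"[{instruction.action} at {coords}] {command}"


request = build_request(
    PointingInstruction("click", [(120, 80)]),
    "Remove the masked region.",
)
print(request)
```

The point of the sketch is only that the gesture supplies the "where" and the sentence supplies the "what"; how the real system encodes either part is not covered here.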
Installation
1. Basic Requirements:
Check that your system has the following minimum prerequisites installed:
- Linux
- Python 3.8+
- PyTorch 1.12+
- CUDA 11.6+
- GCC & G++ 5.4+
- GPU Memory >= 17G for loading basic tools (HuskyVQA, SegmentAnything, ImageOCRRecognition)
Please make sure you have the appropriate versions of these dependencies installed before proceeding.
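If you would rather check some of these requirements from Python than by hand, a minimal sketch could look like the following. It is not part of InternGPT; the helper names are made up for this example, and it only covers the operating system, Python, PyTorch, and CUDA visibility. GCC and GPU memory are not checked.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a string such as '1.12.1+cu116' into a tuple of ints, e.g. (1, 12, 1)."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the minimum tuple."""
    return parse_version(version) >= minimum


def check_environment():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append("Linux is required")
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if not meets_minimum(torch.__version__, (1, 12)):
            problems.append("PyTorch 1.12+ is required")
        if not torch.cuda.is_available():
            problems.append("No CUDA device is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["All checked requirements are satisfied"]:
        print(problem)
```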
2. Create and Activate Python Environment:
To create and activate a Python environment for iChat, open your terminal and run the following commands:
conda create -n ichat python=3.8
conda activate ichat
3. Install Python Dependencies:
Run the following command inside the ichat environment to install the necessary Python dependencies using pip:
pip install -r requirements.txt
This command will install all of the Python packages indicated in requirements.txt.
Running the iChat Gradio Service
Please follow the procedures below to get started with the iChat system and launch the Gradio service:
1. Starting the Gradio Service:
Run the following command in your terminal:
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456
This command launches the Gradio service for the iChat system. It loads the required components (HuskyVQA, SegmentAnything, and ImageOCRRecognition) onto the given CUDA device (cuda:0) and listens on port 3456.
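The load string pairs each tool name with the device it should run on, separated by an underscore. The sketch below shows one way such a string could be split into a tool-to-device mapping. `parse_load_spec` is a hypothetical helper written for this article, not the parser that app.py actually uses.

```python
def parse_load_spec(spec):
    """Split 'Tool_device,Tool_device,...' into a {tool: device} dictionary."""
    tools = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last underscore so the device part ('cuda:0') stays whole.
        name, _, device = entry.rpartition("_")
        if not name or not device:
            raise ValueError(f"expected 'ToolName_device', got {entry!r}")
        tools[name] = device
    return tools


spec = "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0"
print(parse_load_spec(spec))
```

Under this reading, moving one tool to a second GPU would only mean changing its suffix, for example to cuda:1.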
2. Enabling Voice Assistant (Optional):
Follow these extra steps to enable the voice assistant feature:
- Create a directory named “certificate” using the command: mkdir certificate.
- Generate the certificate using OpenSSL by running the following command:
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
To start the Gradio service with HTTPS, use the following command after generating the certificate:
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 --https
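Before launching with HTTPS, it can help to confirm that both files produced by the OpenSSL command exist. The sketch below does only that. `missing_certificate_files` is a hypothetical helper, and it assumes the certificate/key.pem and certificate/cert.pem paths used above, relative to the directory you run it from.

```python
from pathlib import Path


def missing_certificate_files(directory="certificate"):
    """Return the names of the expected PEM files that are absent from the directory."""
    expected = ("key.pem", "cert.pem")
    return [name for name in expected if not (Path(directory) / name).is_file()]


missing = missing_certificate_files()
if missing:
    print("Missing before HTTPS launch:", ", ".join(missing))
else:
    print("Both certificate files are present")
```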
User Manual Update
System Features:
- GPT (Generative Pre-trained Transformer): A GPT model, which is a language model capable of creating human-like text depending on specified prompts or inputs, powers the system.
Supported Features:
- InternGPT supports DragGAN:
- To begin the DragGAN process, click the “New Image” button.
- Click on the image to set the start and end points. Start points are blue, and end points are red.
- Make sure the number of blue points equals the number of red points.
- To begin the editing process, click the “Drag It” button.
- After processing, you will receive the edited image as well as a video that shows the editing process.
- InternGPT supports ImageBind:
- To create a new image from a single audio file:
- Send a message like: “generate a real image from this audio.”
- To create a new image with audio and text:
- Send a message like: “create a real image from this audio and {your prompt}.”
- To create a new image using audio and an existing image:
- Upload an image and then send a message like: “generate a new image from the above image and audio.”
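The DragGAN rule that the number of blue start points must equal the number of red end points can be expressed as a small check. The sketch below is an illustration only: `pair_drag_points` is a hypothetical helper, and pairing points by the order in which they were clicked is an assumption, not something the InternGPT documentation states.

```python
def pair_drag_points(start_points, end_points):
    """Pair each blue start point with the red end point at the same index."""
    if len(start_points) != len(end_points):
        raise ValueError(
            f"{len(start_points)} start point(s) but {len(end_points)} end point(s)"
        )
    return list(zip(start_points, end_points))


blue = [(50, 60), (200, 140)]  # start positions
red = [(80, 60), (200, 100)]   # end positions
for start, end in pair_drag_points(blue, red):
    print(f"drag {start} -> {end}")
```

Rejecting unequal counts before editing starts mirrors the instruction to balance the points before clicking "Drag It".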
Main Features:
- Multi-Modal Dialogue:
- After you’ve uploaded an image, you can hold a multi-modal dialogue by sending image-related messages.
- For example, you can ask questions about the image with prompts like “What is it in the image?” or “What color is the background of the image?”
- Interactive Image Operations:
- You can see the segmented section by clicking the image and then pressing the “Pick” button.
- The “OCR” button runs text recognition on the selected region.
- Image Editing:
- To remove the masked region from an image, send a message that says “Remove the masked region.”
- Send a message like “Replace the masked region with {your prompt}.” to replace the masked region in the image.
- These commands allow you to change the picture based on the locations you provide.
- Image Generation:
- Send a message that says something like, “Generate a new image based on its segmentation describing {your prompt}.”
- This command creates a new picture based on the image’s segmentation and the prompt supplied.
- Scribble-Based Image Creation:
- To access the drawing board, use the “Whiteboard” button.
- Draw the image you want on the board.
- To save the drawn picture, click the “Save” button.
- To generate a new image from the scribble and a prompt, send a message like “Generate a new image based on this scribble describing {your prompt}.”
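Several of the messages above share a fixed sentence with a {your prompt} slot. If you send them often, a small helper can fill in the slot for you. The sketch below copies the message wording from the feature list; the dictionary keys and `build_message` are hypothetical names chosen for this example.

```python
# Message wording copied from the feature list; the keys are hypothetical labels.
TEMPLATES = {
    "replace": "Replace the masked region with {prompt}.",
    "scribble": "Generate a new image based on this scribble describing {prompt}.",
}


def build_message(kind, prompt):
    """Fill the prompt slot of the chosen template."""
    return TEMPLATES[kind].format(prompt=prompt)


print(build_message("replace", "a red sofa"))
print(build_message("scribble", "a house by a lake"))
```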
InternGPT: A New Way to Interact with ChatGPT