What is Stable Diffusion
In recent years, a significant breakthrough in the realm of Artificial Intelligence has reshaped the landscape of digital art: AI-generated images. Since this development, various image generation technologies have emerged, captivating audiences and making headlines globally. Among these pioneering technologies, one open-source image generation model stands out – “Stable Diffusion.”
Stable Diffusion quickly gained traction due to its impressive capabilities and openness, inspiring a new generation of models. With its ability to generate a wide variety of styles from short, human-readable prompts, Stable Diffusion has significantly lowered the barriers to creating AI art.
But what sets Stable Diffusion apart? It offers features like “inpainting” and “outpainting.” Inpainting lets users edit a region inside the image, enabling precise alterations and adjustments. Outpainting, on the other hand, lets users extend the image beyond its original boundaries, which is perfect for creating panoramic views or expansive scenes. Stable Diffusion also supports “image-to-image prompting,” a feature that lets users create a new image based on a source image. It’s like having a conversation with your AI, where the source image is your prompt, and the AI responds with a completely new image.
What are Chroma and Embeddings
Let’s start with an exciting piece of technology called Chroma. Chroma is an open-source database designed specifically for handling embeddings – a type of data representation widely used in the realm of AI, and especially in the context of Large Language Models (LLMs). An LLM is an AI model that understands and generates human-like text based on the input it receives.
Chroma is like a playground for these AI models. It facilitates the development of AI applications by providing a platform for storing, querying, and analyzing media embeddings. Media could range from text to images, and in future releases, audio and video.
In Chroma, each piece of media (a text document, an image, etc.) is transformed into a mathematical representation known as an embedding. Chroma can store these embeddings along with their associated metadata, essentially turning media into a format that AI models can readily understand and interact with. By storing the embeddings, Chroma lets you easily find similar media items, analyze your media collection, and much more.
So, what are embeddings? In the simplest terms, embeddings are a way of converting words or images into numbers, specifically vectors in a multi-dimensional space. The beauty of this technique is that “similar” items are placed close together in this space. For example, in word embeddings, words with similar meanings have similar embeddings. And yes, it’s not limited to just words! You can have embeddings for sentences, paragraphs, documents, or even images.
For instance, in the context of image embeddings, similar images (like pictures of cats) would have similar embeddings and therefore be close together in the multi-dimensional embedding space. This characteristic makes embeddings a powerful tool for tasks like image recognition or recommendation systems. Now, imagine coupling this power with the image generation capabilities of Stable Diffusion – the possibilities are endless!
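To make "close together" a bit more concrete, here is a minimal sketch using made-up three-dimensional vectors. Real embedding models produce vectors with hundreds or thousands of dimensions, and the numbers below are purely illustrative assumptions, not output from any model.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def distance(a, b):
    """Euclidean distance between the embeddings of two words."""
    return math.dist(embeddings[a], embeddings[b])


def nearest(word):
    """Return the other word whose embedding lies closest to `word`."""
    others = [w for w in embeddings if w != word]
    return min(others, key=lambda w: distance(word, w))


print(nearest("cat"))
```

With these toy values, "kitten" sits much closer to "cat" than "car" does, which is the property a similarity search relies on.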
What is the Flask HTTP Framework
In the rapidly evolving landscape of web development, one framework that consistently stands out is Flask. This Python-based web framework is known far and wide for its simplicity and lightness, yet its power and flexibility make it a top choice for seasoned developers as well as beginners.
Flask is celebrated for its minimalist, pragmatic approach. It doesn’t dictate which libraries or patterns you should use, instead providing a lean framework that allows you to freely choose what fits your project best. This openness doesn’t mean it’s lacking in functionality, though. In fact, Flask comes with a rich set of features right out of the box.
For example, Flask supports routing to elegantly handle URLs, empowering you to guide your users through your site. It also offers templates that make it easy to create dynamic HTML pages, breathing life into your web application. Plus, with its support for cookies and sessions, Flask has got you covered when it comes to storing user data.
What’s truly amazing is how Flask combines all these powerful features with such a straightforward and clean design. It’s not an overstatement to say that with just a basic understanding of Python, you can quickly have a Flask web server up and running. It’s this combination of power, flexibility, and ease-of-use that makes Flask a standout choice in the world of web development frameworks.
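As a rough illustration of how little code a Flask server needs, here is a minimal sketch with two routes. The route paths and function names are placeholders chosen for this example and are not part of the gallery app we build later.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # Returning a string sends it back as an HTML response.
    return "Hello from Flask!"


@app.route("/greet/<name>")
def greet(name):
    # The <name> part of the URL is passed in as a function argument.
    return f"Hello, {name}!"
```

Saved as app.py, a file like this can be started with the flask run command.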
Prerequisites
- Basic knowledge of Python and Flask
- Access to the Stability.ai API
- A Chroma database set up
Outline
- Initializing the Project
- Setting Up the Required Libraries
- Writing the Main File
- Testing the Basic Chatbot
- Setting Up Chroma Database
- Testing the Enhanced Chatbot
Initializing the Project
Let’s dive into the code! Our first step is to set up our project directory, which we’ll name chroma-sd. Open your favorite terminal and navigate to your projects directory. Then, create and move into the project directory with the following commands:
As responsible Python developers, we’ll create a new virtual environment for this project. This practice keeps our project’s dependencies separate from our global Python environment, which is essential when working on multiple projects with different dependencies. Furthermore, a virtual environment allows us to “freeze” our dependencies into a requirements.txt file, effectively documenting them for future reference.
Let’s go ahead and create our virtual environment:
With our virtual environment created, the next step is to activate it. The command for this varies depending on your operating system:
- For Windows users, enter the following command:
- For Linux or MacOS users, use this command instead:
After running the appropriate command, the name of your environment (env) should appear in parentheses at the start of your terminal prompt. This indicates that the environment is active and you’re ready to start developing!
Here’s an example of what an activated virtual environment looks like in the terminal:
Congratulations, you’ve successfully set up and activated your virtual environment! Let’s proceed to the next step.
Setting Up the Required Libraries
Before we dive into the coding part, let’s ensure we have all the necessary libraries installed. Our application will primarily use Flask and ChromaDB:
Flask: A lightweight and flexible Python web framework. We’ll use Flask to create a user interface for our application, handling user inputs and displaying the results.
ChromaDB: A powerful database for storing and querying embeddings. In our application, we’ll use ChromaDB to store the embeddings of images generated by Stable Diffusion, which will enable us to perform similarity searches.
Make sure you’re working with Python 3, as Python 2 has officially reached its end-of-life. You can check your Python version by typing python --version in your terminal.
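If you would rather check from inside Python itself, a small sketch like the following works too. The helper name is_python3 is just an illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


if not is_python3():
    raise SystemExit("This tutorial requires Python 3.")

# Print only the version number, without the build details.
print("Python", sys.version.split()[0])
```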
Now, let’s install these libraries. We’ll use Python’s built-in package manager, pip, for this task. If you’ve set up and activated your virtual environment as discussed in the previous section, the libraries will be installed within the environment, keeping your global Python setup clean and organized. Here’s how to install Flask and ChromaDB:
Great! Now that we have our required libraries, we’re ready to start building our application.
Writing the Project Files
Now it’s time to dive back into coding! Before we start, ensure you’re in the correct directory – the root of our project.
Next, open your preferred IDE or code editor and create a new file. Since we’re working with Flask, it’s conventional to name the main file as app.py. This is because, by default, the flask run command looks for an application in a file called app.py in the current directory.
However, remember that if your main application file is named differently or located elsewhere, you can specify its location using the FLASK_APP environment variable.
In the next sections, we’ll be populating our app.py with the necessary code to set up our Flask server and define the routes for our web application.
- Importing necessary modules:
We start by importing all the necessary modules that our script will require. This includes:
- logging for error logging and debugging.
- os for interacting with the operating system, particularly for accessing environment variables and file paths.
- flask for creating and managing our web application.
- requests for making HTTP requests to the image generation API.
- dotenv for loading environment variables from our .env file.
- Setting up the logging and Flask app:
In these lines, we’re setting up logging with a level of DEBUG, which means it will capture and print all logging messages, helping us debug any issues. Next, we initialize our Flask application with Flask(__name__).
- Loading environment variables:
Here, we’re using the load_dotenv() function to load environment variables from a .env file. This file usually stores sensitive data like API keys, which should be kept out of your script to avoid unintentional exposure.
- Defining API endpoints:
Next, we define several API endpoints that handle different tasks:
Each function decorated with @app.route is associated with a specific URL path. The search_images function handles search requests from the client as well as returning a list of all image generation requests, generate handles image generation requests, and home renders the home page.
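Putting the pieces described so far together, a skeleton of app.py might look roughly like this. It is a sketch under assumptions: the "/" path, the "input" query parameter, and the placeholder return values are illustrative, and the endpoint bodies are stubs that get filled in later.

```python
import logging

from flask import Flask, jsonify, request

# Capture every log message while developing.
logging.basicConfig(level=logging.DEBUG)
app = Flask(__name__)

# In-memory list of {"prompt": ..., "image": ...} entries (placeholder storage).
user_requests = []


@app.route("/")
def home():
    # Stub: the finished app renders templates/index.html here.
    return "home page placeholder"


@app.route("/api/search")
def search_images():
    # Stub: log the search term and return every stored request as JSON.
    term = request.args.get("input", "")
    logging.debug("search term: %s", term)
    return jsonify(user_requests)


@app.route("/api/generate")
def generate():
    # Stub: the real implementation calls the image generation API.
    return jsonify({"error": "not implemented yet"}), 501
```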
- Running the Flask app:
Finally, we ensure the Flask app runs only if the script is executed directly (not imported as a module) with if __name__ == '__main__'. When running in development mode with debug=True, the server automatically reloads when it detects code changes and provides detailed error messages in case of failures.
In the next steps, we will delve deeper into the individual endpoint functions and explain how they handle their respective tasks.
Finally, please note that this example uses the CDN version of Tailwind CSS, loaded with a <script> tag. In a more complex production application, you might consider setting up Tailwind CSS as part of your build process to enable additional features and optimizations.
This script begins by setting the onload event handler, which triggers once all assets for the web page are fully loaded and rendered. In this event handler, we add click listeners to the search (searchBtn) and generate (generateBtn) buttons. These listeners trigger the sendInput and generateImages functions, respectively.
The sendInput function uses the Fetch API to send a GET request to the /api/search endpoint, appending the user’s input as a query parameter. When the returned promise resolves, the response body is parsed as JSON, and the data is handled as required. If there’s an error, it is caught and logged to the console.
The generateImages function does a similar job but sends the request to the /api/generate endpoint. The function fetches the list of images after a new one is generated and then iterates over the list, creating a new img element for each image and appending it to the serverResponse div.
In both functions, if an error occurs, it is caught and logged to the console.
Our .env file serves to store various API keys and other settings as environment variables. This approach allows us to keep sensitive information, like API keys, out of our source code, which is a best practice for security and allows us to manage and change this information easily and safely.
This file contains four environment variables that hold the API keys for the Stability AI service and OpenAI’s embedding function, the ID of the Stability AI engine being used, and the name of the OpenAI model used for text embeddings.
Keep in mind that .env files should never be committed to public repositories because they often contain sensitive data. If you’re using version control like git, make sure to add .env to your .gitignore file to prevent it from being tracked. Always remember that keeping your sensitive data secure is crucial when developing any software application.
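Once load_dotenv() has run, the four values described above are available through os.environ. The sketch below shows one way to read them and fail early when something is missing. The variable names are assumptions made for this example, so match them to the keys in your own .env file.

```python
import os

# Hypothetical variable names; adjust them to the keys in your .env file.
REQUIRED_VARS = (
    "STABILITY_API_KEY",
    "STABILITY_ENGINE_ID",
    "OPENAI_API_KEY",
    "OPENAI_MODEL_NAME",
)


def load_settings(environ=os.environ):
    """Collect the required settings, failing early if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing at startup with a clear message tends to be easier to debug than an authentication error that only shows up on the first API call.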
One important aspect of software development, especially in Python, is managing dependencies. Dependencies are external Python libraries your project relies on. One popular way to manage these dependencies is by creating a requirements.txt file.
This requirements.txt file, while optional, is considered best practice as it provides an organized list of dependencies. This allows other developers to easily install the required libraries with a single command.
To create the requirements.txt file, ensure you have activated your Python virtual environment and your current working directory is your project’s directory. Once you’re all set, run the following command:
This command lists all the libraries (and their specific versions) your virtual environment is using and redirects (“>”) this list to a file called requirements.txt.
Following this, anyone who wants to run your project will be able to install all the required dependencies using the following command:
This command reads the requirements.txt file and installs each library listed using pip, Python’s package installer. This step is typically done by someone who wants to run your project after cloning it from a repository.
Including instructions for using the requirements.txt file in your project’s README.md file, typically under a section titled “Installation” or “Getting Started”, is a good practice. It helps new users or contributors get your project up and running more quickly and with less potential for confusion or errors.
The Project’s Structure
After following the steps above, our project structure should look like this:
Each file and directory has its specific purpose in the project:
- app.py: This is where our Flask application is located.
- .env: Stores our environment variables, such as API keys.
- requirements.txt: Lists all the Python dependencies our project needs to run.
- templates/: Holds all HTML templates that our Flask application will render.
- .gitignore: This file is used to tell git which files or patterns it should ignore.
The project structure gives us a clear understanding of the project at a glance. It shows us what files and folders exist and where our main application is located. Understanding how to structure a project is an essential skill in software development, making it easier for others (and for future you) to navigate and understand your project.
In the next steps, we will revisit our app.py file to complete the backend part of our image generation app.
Completing the Endpoint Functions
We will revisit our app.py file to finalize the endpoint functions. We wrote four functions to handle requests to four different endpoints. In this section, we will complete the images() and generate() functions. The home() function will render our index.html template.
Next, the images() function returns the list of all image generation requests in JSON format.
The generate() function is the key to our application. It handles image generation requests, sends a request to the Stability.ai API, and writes the image received to a file. The function then appends the user’s prompt and the image file path to the user_requests list. Below is the finalized generate() function:
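As a hedged illustration of the two core steps inside generate(), building the request and writing the returned image to disk, here is a sketch split into two helpers. The helper names are hypothetical, and the endpoint URL and the "artifacts"/"base64" response shape reflect our understanding of Stability's v1 REST API, so please check the official reference before relying on them.

```python
import base64
import os


def build_generation_request(prompt, engine_id, api_key):
    """Return (url, headers, payload) for a text-to-image request."""
    # Assumed v1 endpoint layout; verify against the Stability.ai docs.
    url = f"https://api.stability.ai/v1/generation/{engine_id}/text-to-image"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {"text_prompts": [{"text": prompt}], "samples": 1}
    return url, headers, payload


def save_first_image(response_json, out_dir, index):
    """Decode the first base64 artifact and write it to out_dir as a PNG."""
    image_b64 = response_json["artifacts"][0]["base64"]
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"image_{index}.png")
    with open(path, "wb") as image_file:
        image_file.write(base64.b64decode(image_b64))
    return path
```

Inside the endpoint, the pieces would be combined by sending the request with requests.post(url, headers=headers, json=payload), passing response.json() to save_first_image, and appending the prompt and the returned path to user_requests.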
We are using HTTP status codes to represent different states of the application, providing more information to the client. We’ve also included extensive error handling and logging to debug issues.
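To show the idea of status codes and error handling in isolation, here is a small sketch. The create_image function is a hypothetical stand-in for the real API call, and the "input" query parameter is an assumption made for this example.

```python
import logging

from flask import Flask, jsonify, request

app = Flask(__name__)


def create_image(prompt):
    """Hypothetical stand-in for the call to the image generation API."""
    if prompt == "fail":
        raise RuntimeError("upstream API error")
    return f"static/{prompt}.png"


@app.route("/api/generate")
def generate():
    prompt = request.args.get("input", "").strip()
    if not prompt:
        # 400: the client sent a request we cannot act on.
        return jsonify({"error": "prompt is required"}), 400
    try:
        path = create_image(prompt)
    except Exception:
        # 500: something went wrong on our side; log the full traceback.
        logging.exception("image generation failed")
        return jsonify({"error": "image generation failed"}), 500
    # 200: success, return the prompt and where the image was stored.
    return jsonify({"prompt": prompt, "image": path}), 200
```

The client can then branch on the status code instead of inspecting the response body to find out whether the request succeeded.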
Finally, the home() function renders our home page. This function uses the render_template function to render our index.html file and passes the user_requests list to it.
Test Running the Image Generation App
Now, let’s test our web app. In your terminal, type the following command and press enter:
If the application is configured correctly, you’ll see this output in your terminal:
Navigate to localhost:5000 in your web browser to see the user interface of our image generation app:
Let’s generate some images. Type your desired text into the input field and click the “Generate” button. The loading indicator will appear while the request is sent to the Stability.ai API. Once the generated image is received, it will display under the input field:
Hover over the generated image to see the text prompt used to create it:
If you face any issues while running the application, try checking your .env configuration and ensure all dependencies are installed properly. If the problem persists, refer to Flask and Stability.ai’s documentation for further troubleshooting.
Adding Search By Similar Term Feature
In this part of our tutorial, we’re going to add a “Search” feature. Unlike traditional search functionality, which finds results based on exact matches, we’re going to build a search feature that finds similar terms. For example, if we’ve generated images for “unicorn” and “mushroom”, we could find the unicorn image by searching for “horse”.
We’ll accomplish this using ChromaDB, a database designed for storing and querying embedding vectors. Embeddings are a type of representation that can capture the semantic meaning of data. By using ChromaDB, we can perform similarity searches in a more sophisticated and efficient manner.
Now, let’s begin by initializing ChromaDB and our embedding function:
Next, let’s integrate ChromaDB into our images() function. We’ll update the function to query ChromaDB for results based on the user’s input:
In the above code, we’ve replaced user_requests with result_list, a list populated by query results from ChromaDB. We retrieve the documents (image generation prompts) and metadata (image paths) from the query results and combine them into a dictionary for each result.
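The query-and-combine step can be sketched as a small helper. It accepts any object that offers a Chroma-style query() method, and the "image" metadata key is an assumption made for this example.

```python
def search_prompts(collection, term, n_results=3):
    """Query a Chroma-style collection and flatten the result into dicts."""
    result = collection.query(query_texts=[term], n_results=n_results)
    # Chroma returns one inner list per query text; we sent a single query.
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    return [
        {"prompt": doc, "image": meta["image"]}
        for doc, meta in zip(documents, metadatas)
    ]
```

Depending on the Chroma version, asking for more results than the collection holds may raise an error or log a warning, so capping n_results with collection.count() can be worthwhile.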
Finally, let’s update our generate() function to store the image generation prompt and the image path as metadata in ChromaDB:
In this updated code, we’re now using collection.count() to determine the next available index for our ChromaDB entries. The collection.add() method is used to insert the image generation prompts into the ChromaDB collection as embeddings, and the image paths as metadata.
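The storage step described above can be sketched like this. The helper works with any object that exposes Chroma-style count() and add() methods; the function name and the "image" metadata key are assumptions for illustration.

```python
def store_prompt(collection, prompt, image_path):
    """Add a prompt and its image path to a Chroma-style collection."""
    # Use the current entry count as the next ID, as described above.
    next_id = str(collection.count())
    collection.add(
        documents=[prompt],                 # embedded by the collection's embedding function
        metadatas=[{"image": image_path}],  # stored alongside the embedding
        ids=[next_id],                      # Chroma expects string IDs
    )
    return next_id
```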
This concludes our update for integrating ChromaDB. These changes provide a more sophisticated and efficient way of storing and retrieving image generation prompts and paths. We’ve also added the capability to perform similarity searches, enhancing the functionality of our application.
Please note, in a production-grade application, it would be important to add error handling around database operations. This simplified example does not include that detail.
Also, it’s worth noting that while the approach used here for indexing is appropriate for a tutorial, in a production system, you’d want to implement a more scalable solution for indexing, as querying all entries for each insertion could become slow with a large number of entries.
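One possible alternative, shown here only as a sketch, is to generate IDs that never require asking the database anything. Random UUIDs from Python's standard library are a common choice; the helper name is illustrative.

```python
import uuid


def new_entry_id():
    """Return a random, collision-resistant ID that needs no database lookup."""
    return uuid.uuid4().hex


first, second = new_entry_id(), new_entry_id()
print(first != second)
```

The trade-off is that the IDs no longer reflect insertion order, so if ordering matters it would need to be stored separately, for example as a timestamp in the metadata.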
Testing the Search Capabilities of the Image Gallery App
To conclude this tutorial, let’s test the search capabilities of our image gallery application. Assume we’ve generated three distinct images using some imaginative prompts: a unicorn, a mushroom, and butterflies, as shown below.
Suppose a visitor, upon seeing our vibrant unicorn image, decides to search for more horse-related images in our growing gallery. They might enter “horse” into the search bar. What do you think will happen?
Amazingly, the application returns the unicorn image as the top result! This happens even though the term “horse” wasn’t in the original prompt, which was “an unicorn on a green field”. It also returns images of butterflies and a glowing mushroom. These results might seem surprising at first. However, they demonstrate how embeddings capture the semantic meaning of words, leading to nuanced search results that take into account more than just exact matches.
From the application logs, you can see how the search operation works. The search term and the prompts are each converted into embeddings, that is, vector representations. Then, the application calculates the cosine similarity between these vectors. This measure indicates how close the vectors, and thus the semantic meanings of the search term and the prompts, are to each other. By ordering results by cosine similarity, the application ensures that the most semantically relevant results are returned first.
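To illustrate the ranking step, here is a sketch that orders three prompts by cosine similarity to a search vector. The three-dimensional vectors are invented for this example; in the real app, the embeddings come from the embedding model and the comparison happens inside Chroma, whose distance metric depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, prompt_vecs):
    """Return prompts sorted from most to least similar to the query."""
    return sorted(
        prompt_vecs,
        key=lambda prompt: cosine_similarity(query_vec, prompt_vecs[prompt]),
        reverse=True,
    )


# Made-up vectors standing in for real embeddings.
prompt_vecs = {
    "an unicorn on a green field": [0.9, 0.3, 0.1],
    "a glowing mushroom": [0.1, 0.9, 0.2],
    "butterflies": [0.3, 0.2, 0.9],
}
horse = [0.8, 0.2, 0.2]

print(rank_by_similarity(horse, prompt_vecs))
```

With these toy numbers, the unicorn prompt ranks first, followed by the butterflies and then the mushroom.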
We’ve reached the end of our tutorial on creating an image generation gallery app. This application leverages the power of Stable Diffusion AI for image generation and Chroma database for embedding storage. While our use case might seem simplistic, it serves to demonstrate the potency of embedding storage when combined with image generation. This combination allows us to create an image gallery capable of semantic searches, as opposed to searching by exact keyword match.
In our current app, we’ve stuck to the basics with a text input, two buttons, and a container element for displaying images. However, the possibilities for expansion are vast. For instance, to further explore the capabilities of embedding storage and image generation, we could store the embeddings of the images themselves. This approach would allow for search functionality based on uploading similar images.
Additionally, we could introduce an “inpainting” feature, which is a form of image manipulation where you fill in part of an image using the information from the surrounding area. This would allow for creative transformations of existing images!
I want to thank you for reading this tutorial. I hope it was as educational and fun for you as it was for me to create. You can view the completed project on my GitHub page.