Deploy Stable Diffusion Inference Service for Realistic Image Generation for Marketing

In today's digital age, marketing strategies increasingly rely on dynamic, visually engaging content to capture consumer attention. Using artificial intelligence to create realistic images can change how brands produce advertisements, social media posts, and other marketing materials. This tutorial guides you through setting up and deploying an AI-based image generation model on Covalent Cloud, focusing on generating high-quality, photorealistic images.

The major benefit of using Covalent Cloud is that it lets us deploy a production-ready image generation backend with little effort, which can then be integrated into various marketing workflows to enhance visual content dynamically.

Before you start, ensure you have the latest version of covalent-cloud installed. You can update or install it using:

pip install covalent_cloud -U
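
To confirm which version ended up installed, a small check like the one below may help. It is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `covalent-cloud` is taken from the sentence above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version("covalent-cloud"))
```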

Environment Configuration

First, let's set up an environment on Covalent Cloud specifically designed for running our image generation model. This environment will include all necessary libraries, such as PyTorch and Hugging Face's diffusers and transformers. To learn more, see the Covalent Cloud documentation. Here's how you can create an environment in Covalent Cloud:

import covalent_cloud as cc

cc.create_env(
    name="stable-diffusion-env",
    pip=["torch", "diffusers", "transformers", "peft", "huggingface_hub"],
)

Environment Already Exists.

Executor Configuration

To ensure our model runs smoothly, we will configure a cloud executor with the appropriate resources. Other GPU types can be selected as well; see the Covalent Cloud documentation. Here's how to configure it:

service_executor = cc.CloudExecutor(
    env="stable-diffusion-env", 
    num_cpus=12, 
    memory="100GB", 
    num_gpus=1, 
    gpu_type=cc.cloud_executor.GPU_TYPE.A100, 
    time_limit="30 minutes"
)

Model Deployment

Let's define a service that hosts the image generation model. This service will be used to generate images based on the input parameters. You can learn more about defining services in the Covalent Cloud documentation. Here's how you can deploy the model:

Note:

Since we cannot JSON serialize a PIL Image, we will convert the image to a base64 string before returning it and then decode it on the client side.
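
The round trip described in the note can be sketched with the standard library alone. The example below uses a few stand-in bytes rather than a real image (an assumption made so that it runs without PIL); the same encode and decode calls apply to real PNG data.

```python
import base64
import json

# Stand-in for PNG data: the 8-byte PNG signature plus some arbitrary bytes.
raw_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

# Raw bytes are not JSON serializable ...
try:
    json.dumps({"image": raw_bytes})
    bytes_ok = True
except TypeError:
    bytes_ok = False

# ... but a base64 string is, and decoding restores the exact bytes.
payload = json.dumps({"image": base64.b64encode(raw_bytes).decode("utf-8")})
restored = base64.b64decode(json.loads(payload)["image"])

print(bytes_ok, restored == raw_bytes)  # False True
```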

@cc.service(executor=service_executor, auth=False, name="RealVis-XL")
def image_model(model="SG161222/RealVisXL_V4.0"):
    import torch
    from diffusers import StableDiffusionXLPipeline
    from diffusers.models import AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix",
        torch_dtype=torch.float16,
    )
    pipe = StableDiffusionXLPipeline.from_pretrained(
        model,
        vae=vae,
        torch_dtype=torch.float16,
        custom_pipeline="lpw_stable_diffusion_xl",
        use_safetensors=True,
        add_watermarker=False,
        use_auth_token=None,
        variant="fp16",
    )
    return {"pipe": pipe}


@image_model.endpoint("/generate_image")
def generate_image(
    pipe,
    prompt: str,
    negative_prompt: str = "",
    seed: int = 0,
    guidance_scale: float = 7.0,
    num_inference_steps: int = 30,
    use_upscaler: bool = False,
    upscaler_strength: float = 0.55,
    upscale_by: float = 1.5,
):
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    import base64
    import io

    pipe.to("cuda")

    def seed_everything(seed):
        import random
        import numpy as np
        torch.manual_seed(seed)
        np.random.seed(seed)
        random.seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        return torch.Generator().manual_seed(seed)

    generator = seed_everything(seed)

    if use_upscaler:
        # Reuse the already loaded components for the img2img refinement pass
        upscaler_pipe = StableDiffusionXLImg2ImgPipeline(**pipe.components)

        latents = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            guidance_scale=guidance_scale,
            num_inference_steps=num_inference_steps,
            generator=generator,
            output_type="latent",
        ).images
        
        upscaled_latents = torch.nn.functional.interpolate(latents, scale_factor=upscale_by, mode="nearest")
        
        images = upscaler_pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=upscaled_latents,
            guidance_scale=guidance_scale,
            num_inference_steps=num_inference_steps,
            strength=upscaler_strength,
            generator=generator,
            output_type="pil",
        ).images
    else:
        images = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            guidance_scale=guidance_scale,
            num_inference_steps=num_inference_steps,
            generator=generator,
            output_type="pil",
        ).images

    # Convert image to base64 string
    image = images[0]
    bytes_io = io.BytesIO()
    image.save(bytes_io, format='PNG')
    image_as_str = base64.b64encode(bytes_io.getvalue()).decode('utf-8')
    
    return image_as_str
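
The `upscale_by` parameter determines the final image size when the upscaler is enabled. As a rough worked example, the sketch below estimates the output edge length under three assumptions that are not stated in the tutorial: the base generation is 1024 pixels wide (the usual SDXL default when no size is passed), the VAE downsamples by a factor of 8, and the interpolated size is the floor of the latent size times the scale factor. The helper name `upscaled_resolution` is hypothetical.

```python
import math


def upscaled_resolution(base_pixels: int = 1024, upscale_by: float = 1.5, vae_scale: int = 8) -> int:
    """Estimate the output edge length in pixels after latent upscaling."""
    latent_size = base_pixels // vae_scale  # e.g. 1024 px -> 128 latent cells
    upscaled_latent = math.floor(latent_size * upscale_by)
    return upscaled_latent * vae_scale


print(upscaled_resolution(upscale_by=1.5))   # 1536
print(upscaled_resolution(upscale_by=1.55))  # 1584
```

Under these assumptions, the default `upscale_by=1.5` gives roughly 1536 x 1536 pixels, so larger factors increase both memory use and generation time.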

For realistic image generation, we will use the RealVisXL model from the Hugging Face model hub.

# Deploy the function service
client = cc.deploy(image_model)(model="SG161222/RealVisXL_V4.0")
image_generator = cc.get_deployment(client, wait=True)
print(image_generator)

╭────────────────────────────── Deployment Information ──────────────────────────────╮
│  Name          RealVis-XL                                                          │
│  Description   Add a docstring to your service function to populate this section.  │
│  Function ID   664c31d9f7d37dbf2a468a6a                                            │
│  Address       https://fn.prod.covalent.xyz/1664c31d9f7d37dbf2a468a6a              │
│  Status        ACTIVE                                                              │
│  Tags                                                                              │
│  Auth Enabled  No                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                              POST /generate_image                                               │
│  Streaming    No                                                                                                │
│  Description  Either add a docstring to your endpoint function or use the endpoint's 'description' parameter    │
│               to populate this section.                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Let us quickly check the quality of the generated images using the deployed model:

Note: Here we use the client directly to interact with the deployed model. You can also call the REST API endpoint instead. To learn more, see the Covalent Cloud documentation.
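
For the REST route, a request could be assembled along the following lines. This is only a sketch: it assumes the endpoint path is appended to the deployment address and that the keyword arguments are sent as a JSON object, neither of which is confirmed by this tutorial, so check the Covalent Cloud documentation before relying on it. The helper name `build_generate_request` and the short prompt are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def build_generate_request(address: str, prompt: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate_image endpoint."""
    body = json.dumps({"prompt": prompt, **params}).encode("utf-8")
    return urllib.request.Request(
        url=address.rstrip("/") + "/generate_image",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "https://fn.prod.covalent.xyz/1664c31d9f7d37dbf2a468a6a",  # address from the deployment info
    prompt="photorealistic lighthouse at dusk",  # illustrative prompt
    seed=24,
    num_inference_steps=20,
)
print(request.get_method(), request.full_url)

# Sending is left commented out because it needs a live deployment,
# and the response format (assumed here to be a JSON string) may differ:
# with urllib.request.urlopen(request) as response:
#     image_str = json.loads(response.read())
```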

import io
import base64
from PIL import Image

prompt = "closeup portrait view of an american cow boy with cinematic lighting, photorealistic,canon mark 5"

negative_prompt = "(octane render, render, drawing, anime, bad photo, bad photography:1.3), (worst quality, low quality, blurry:1.2), (bad teeth, deformed teeth, deformed lips), (bad anatomy, bad proportions:1.1), (deformed iris, deformed pupils), (deformed eyes, bad eyes), (deformed face, ugly face, bad face), (deformed hands, bad hands, fused fingers), morbid, mutilated, mutation, disfigured ,unrealistic, cartoonish, CGI, 3D render, sketch, painting, illustration, low quality, blurry, grainy, pixelated, distorted, deformed, disfigured, out of focus, overexposed, underexposed, oversaturated, washed out, bad anatomy, bad proportions, extra limbs, missing limbs, floating limbs, disconnected limbs, mutated hands, mutated feet, fused fingers, elongated fingers, text, watermark, signature, logo, frame, border"

seed = 24
guidance_scale = 6.5
num_inference_steps = 20
use_upscaler = True
upscaler_strength = 0.52
upscale_by = 1.55


image_str = image_generator.generate_image(
    prompt=prompt,
    negative_prompt=negative_prompt,
    seed=seed,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
    use_upscaler=use_upscaler,
    upscaler_strength=upscaler_strength,
    upscale_by=upscale_by,
)
image_arr = io.BytesIO(base64.b64decode(image_str))
raw_image = Image.open(image_arr)


import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(raw_image)
ax.axis("off")
plt.show()

CowBoy
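
If the image should be kept as a file rather than only displayed, the base64 string can be decoded and written to disk directly. The sketch below is one way to do that with the standard library; the helper name `save_base64_png` is hypothetical, and the demo payload contains only the PNG signature, so it is not a viewable image.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_base64_png(image_str: str, path: Path) -> Path:
    """Decode a base64 string and write the resulting bytes to `path`."""
    data = base64.b64decode(image_str)
    # Sanity check: every PNG file starts with this 8-byte signature.
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload does not look like a PNG file")
    path.write_bytes(data)
    return path


# Demo with a stand-in payload (signature only).
demo_str = base64.b64encode(PNG_SIGNATURE).decode("utf-8")
out_path = save_base64_png(demo_str, Path(tempfile.gettempdir()) / "demo_signature.png")
print(out_path.name)
```

With the real response, the call would look like `save_base64_png(image_str, Path("cowboy.png"))`.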

Marketing Use Cases: Generating Images for Tourism Campaigns

With our realistic image generation model deployed on Covalent Cloud, we now turn to practical marketing applications. In this section, we explore the model's utility in generating high-quality images for various tourism campaigns. Such campaigns often demand captivating visuals that not only highlight the destination but also evoke a sense of adventure and allure.

Creating Images for Different Tourism Themes

We'll demonstrate how our model can be employed to create tailored images for different tourism sectors—each with unique themes like adventure, luxury, and culture. Here’s how you can utilize the model to generate images that adhere to specific marketing narratives:

travel_prompts = {
    "Adventure Tourism": "National Geographic style photo of sun-drenched canyon carved through red rock, lone hiker silhouetted against dramatic sky, weathered boots and map resting on sun-warmed boulder. (realistic, epic, dramatic, adventure)",
    "Luxury Tourism": "Architectural Digest style photo of a sprawling overwater bungalow perched above crystal-clear turquoise water. Hammock gently swaying on private deck, sunlight dappling the pristine wood floor. (luxurious, aspirational, paradise, serene)",
    "Cultural Tourism": "Golden hour photo, vibrant mosaic wall with local folklore, partially hidden by blooming flowers. Cobblestone street bathed in warm light, long shadows hinting at rich history. (cultural immersion, vibrant, timeless, storytelling)",
    "Wildlife Tourism": "Award-winning wildlife photo, majestic snow leopard perched on rocky outcrop, gaze fixed on snow-capped peaks. Faint trail of footprints in pristine snow, hinting at the rare encounter. (endangered species, raw beauty, conservation, awe-inspiring)",
    "Sustainable Tourism": "Morning light, rustic wooden lodge nestled in lush greenery, solar panels gleaming. Bicycle leaning against weathered porch, inviting exploration of surrounding nature. (eco-friendly, responsible travel, tranquil, harmonious)",
    "Culinary Tourism": "Food magazine style photo, top-down view of rustic wooden table laden with local delicacies: crusty bread, ripe fruits, steaming pot of stew. Single empty plate inviting participation. (farm-to-table, authentic cuisine, vibrant colors, shared experience)"
}


def generate_image(prompt):
    prompt += "- correct texture, realistic, photorealistic, detailed, high quality, high resolution, natural,lifelike, sharp, crisp, Canon Mark 5"
    seed = 24
    guidance_scale = 5
    num_inference_steps = 30
    use_upscaler = True
    upscaler_strength = 0.52
    upscale_by = 1.55
    
    image_str = image_generator.generate_image(
        prompt=prompt,
        negative_prompt=negative_prompt,
        seed=seed,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        use_upscaler=use_upscaler,
        upscaler_strength=upscaler_strength,
        upscale_by=upscale_by,
    )
    image_arr = io.BytesIO(base64.b64decode(image_str))
    return Image.open(image_arr)

images = []

# Generate one image per tourism theme, in the order the prompts are defined
for category, prompt in travel_prompts.items():
    images.append(generate_image(prompt))
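
Because the list is filled in the same order as the prompt dictionary, the images can also be looked up by category name instead of by position. The sketch below uses stand-in values (`demo_prompts`, `demo_images`) so that it runs on its own; in the tutorial these would be `travel_prompts` and `images`. The helper `filename_for` is hypothetical.

```python
# Stand-ins so the sketch runs on its own; replace with the real
# `travel_prompts` dict and the list of generated images.
demo_prompts = {
    "Adventure Tourism": "prompt A",
    "Luxury Tourism": "prompt B",
    "Cultural Tourism": "prompt C",
}
demo_images = ["image-0", "image-1", "image-2"]

# Dicts preserve insertion order, so zip pairs each category with its image.
images_by_category = dict(zip(demo_prompts, demo_images))


def filename_for(category: str) -> str:
    """Turn a category title into a simple file name."""
    return category.lower().replace(" ", "_") + ".png"


print(images_by_category["Luxury Tourism"])  # image-1
print(filename_for("Adventure Tourism"))     # adventure_tourism.png
```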

Shutdown Deployment

Before looking at the images, let's make sure to shut down the deployment to avoid any unnecessary charges.

image_generator.teardown()

'Teardown initiated asynchronously.'
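
If image generation fails partway through, the teardown call above might never be reached. One possible safeguard is a small context manager that always calls `teardown()` on exit. The sketch below is an illustration rather than part of the Covalent Cloud API: `torn_down_afterwards` and `FakeDeployment` are hypothetical names, and the fake class stands in for the real deployment client so the example runs without a live service.

```python
from contextlib import contextmanager


@contextmanager
def torn_down_afterwards(deployment):
    """Yield the deployment and call its teardown() on exit, even after an error."""
    try:
        yield deployment
    finally:
        deployment.teardown()


class FakeDeployment:
    """Hypothetical stand-in for the deployment client, used only for this demo."""

    def __init__(self):
        self.torn_down = False

    def teardown(self):
        self.torn_down = True
        return "Teardown initiated asynchronously."


fake = FakeDeployment()
try:
    with torn_down_afterwards(fake):
        raise RuntimeError("simulated failure during image generation")
except RuntimeError:
    pass

print(fake.torn_down)  # True
```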

Adventure Tourism

National Geographic style photo of sun-drenched canyon carved through red rock, lone hiker silhouetted against dramatic sky, weathered boots and map resting on sun-warmed boulder. (realistic, epic, dramatic, adventure)

images[0]

Adventure

Luxury Tourism

Architectural Digest style photo of a sprawling overwater bungalow perched above crystal-clear turquoise water. Hammock gently swaying on private deck, sunlight dappling the pristine wood floor. (luxurious, aspirational, paradise, serene)

images[1]

Luxury

Cultural Tourism

Golden hour photo, vibrant mosaic wall with local folklore, partially hidden by blooming flowers. Cobblestone street bathed in warm light, long shadows hinting at rich history. (cultural immersion, vibrant, timeless, storytelling)

images[2]

Cultural

Wildlife Tourism

Award-winning wildlife photo, majestic snow leopard perched on rocky outcrop, gaze fixed on snow-capped peaks. Faint trail of footprints in pristine snow, hinting at the rare encounter. (endangered species, raw beauty, conservation, awe-inspiring)

images[3]

Wildlife

Sustainable Tourism

Morning light, rustic wooden lodge nestled in lush greenery, solar panels gleaming. Bicycle leaning against weathered porch, inviting exploration of surrounding nature. (eco-friendly, responsible travel, tranquil, harmonious)

images[4]

Sustainable

Culinary Tourism

Food magazine style photo, top-down view of rustic wooden table laden with local delicacies: crusty bread, ripe fruits, steaming pot of stew. Single empty plate inviting participation. (farm-to-table, authentic cuisine, vibrant colors, shared experience)

images[5]

Culinary

Conclusion

This tutorial demonstrated how to deploy and use an AI-powered image generation model on Covalent Cloud to create marketing visuals. Experiment with different prompts and configurations to explore what realistic image generation can do for your campaigns. This is only a starting point: you can customize the model further to suit your specific requirements. With Covalent Cloud, you can also fine-tune and train the model on specific datasets so that the generated images align with your brand's vision and marketing goals, right from Python and without worrying about the infrastructure.
