Deploy AI Apps with Cloudflare
Learn how to leverage Cloudflare's powerful edge computing and security features to deploy AI applications with high availability and low latency.
Introduction
Artificial Intelligence (AI) is transforming the way developers build and deploy applications. As AI capabilities continue to grow, businesses are increasingly adopting machine learning models, natural language processing (NLP), and other AI features in their applications. These features enable the creation of smarter, more intuitive apps, whether it's for chatbots, recommendation systems, or automated content generation. However, while the potential is vast, deploying AI apps can often be complex, requiring substantial computational resources and infrastructure management.
This is where Cloudflare steps in, offering a suite of tools and services that make it easier for developers to deploy AI apps globally, while leveraging its fast, secure, and highly scalable infrastructure. With Cloudflare Workers, you can build serverless applications that run at the edge of the network, reducing latency and enhancing performance. Cloudflare Workers AI enables you to incorporate powerful AI models into your applications without worrying about managing servers or extensive infrastructure.
In this article, we will explore how Cloudflare's Full-Stack AI Building Blocks can help developers deploy scalable AI applications with ease. We will walk through the process of building a simple "Hello World" AI application using the Hono framework and deploying it to Cloudflare Workers.
What are Full-Stack AI Building Blocks on Cloudflare?
Cloudflare provides a robust platform for deploying AI-powered applications by combining serverless computing with advanced AI capabilities. This infrastructure simplifies and scales the integration of AI into applications.
Key components of Cloudflare's AI ecosystem include:
Cloudflare Workers AI: Enables developers to deploy AI models directly on Cloudflare's global network. It allows for inference tasks with pre-trained models, such as natural language processing (NLP) and image recognition.
AI Models: Cloudflare offers a catalog of popular models like Llama-2, Whisper, and ResNet50, which developers can integrate into their apps for advanced AI functionalities.
Vectorize: A globally distributed vector database for generating and storing embeddings. It’s ideal for AI tasks like search, recommendations, and anomaly detection.
AI Gateway: Provides control over AI applications with features like caching, rate limiting, and analytics, ensuring reliability, scalability, and cost efficiency.
R2 Storage: Offers cost-effective, egress-free storage for large datasets, making it perfect for training custom AI models or moving data between cloud environments.
Create “Hello World” Hono App
Before we dive into deploying our AI, let’s first set up a local development environment. To do this, we'll use Hono, a lightweight, fast, and easy-to-use web framework that is perfect for serverless environments like Cloudflare Workers. Hono is designed to provide developers with a simple way to build APIs and applications that are optimized for the edge, making it a great choice for creating our "Hello World" AI app.
To get started, open your terminal and run the following command to create a new Cloudflare project:
npm create cloudflare@latest
Upon running the command, you'll be prompted with a few questions like:
Directory Setup: Choose the directory where you want to create the app. For this example, we'll call it ai-demo:
dir ./ai-demo
Framework Selection: Select Hono from the list of frameworks. You will see a list of available frameworks, and Hono is one of the options. It’s a popular framework for serverless applications due to its speed and simplicity.
Select from the most popular full-stack web frameworks
Which development framework do you want to use? → Hono
Complete the remaining steps and Cloudflare will generate a new directory called ai-demo, where all your project files will reside. The project is now set up with Hono as your framework of choice.
After the setup, navigate to the newly created ai-demo directory and open it in your preferred code editor.
# Navigate to the "ai-demo" directory
cd ai-demo
# Open the current directory in Visual Studio Code
code .
You’ll notice the index.ts file, where the core application logic will live. Here’s the simple code for our "Hello World" app using the Hono framework:
import { Hono } from 'hono'
const app = new Hono()
app.get('/', (c) => c.text('Hello World!'))
export default app
You can run the local server with the following command:
npm run dev
Head over to http://localhost:8787 in your browser, and you should see your "Hello World" app running locally. This step demonstrates how easy it is to get started with the Hono framework in Cloudflare Workers.
Integrate AI with Cloudflare
Now that we’ve set up the basic app, it’s time to add some AI power to it! Cloudflare Workers AI allows us to run inference on models like Llama-3.1-8b-instruct, hosted on Cloudflare’s global network. You don’t need to manage the infrastructure—Cloudflare takes care of it for you. It also provides a generous free tier, making it accessible for developers to experiment and build with minimal cost.
To enable AI features, you need to configure your project’s wrangler.toml file. Uncomment the AI section to connect your Worker to Cloudflare’s AI models:
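After uncommenting, the AI section of wrangler.toml should look roughly like this (the binding name AI is what the application code will reference through c.env.AI):

```toml
[ai]
binding = "AI"
```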
Next, let’s add the AI-powered feature to the app. We’ll create an asynchronous function that interacts with a pre-trained model from Cloudflare’s AI catalog (e.g., Llama-3.1).
Here’s how you can add the function in index.ts:
import { Hono } from 'hono'

const app = new Hono<{ Bindings: CloudflareBindings }>()

// Basic route from the starter template
app.get('/', (c) => {
  return c.text('Hello Hono!')
})

// AI-powered route: runs inference on Workers AI
app.get('/hello-ai', async (c) => {
  const result = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      { role: 'user', content: 'Say Hello world in five different languages: English, Spanish, French, German, Chinese' }
    ]
  })
  return c.json(result)
})

export default app
This code defines two routes: one at / that returns a simple "Hello Hono!" message, and another at /hello-ai that queries the AI model and returns the output in JSON format. Based on the input message, the AI model will respond with "Hello World" in five different languages.
If you're using VSCode and see an error with the c.env.AI.run method, you may need to generate type definitions for Cloudflare bindings. To do this, run the following command in your terminal:
npm run cf-typegen
This command will generate the necessary type definitions for your project, which should resolve the error.
With your application code updated, it's time to deploy the app to Cloudflare Workers. Run the following command to deploy:
npm run deploy
If your Cloudflare account isn’t connected yet, running this command will trigger a web-based authentication process. You will be redirected to log in to Cloudflare, where you will authorize wrangler to access your account. Once authentication is complete, you will see the message: "You have granted authorization to Wrangler!" This confirms that your Cloudflare account is now connected and authorized for deployments.
After a successful deployment, you will receive a URL for your deployed application. This URL will look something like this:
https://workers-ai.your-subdomain.workers.dev
You can visit this URL in your browser to access your "Hello World" app with integrated AI functionality. When you visit /hello-ai, you will get the response generated by the Llama AI model, saying "Hello World" in five different languages.
Congratulations! You’ve successfully deployed your AI application.
With just a few lines of code, Cloudflare Workers AI and the Hono framework allow you to easily integrate powerful AI models into your applications. Whether you are building a simple "Hello World" app or a complex AI-powered solution, Cloudflare’s infrastructure makes it easy to deploy and scale applications globally.
Vectorize: Simplifying AI with a Global Vector Database
Cloudflare’s Vectorize is a powerful tool designed to enable developers to build AI applications using vector embeddings—numerical representations of objects such as text, images, and audio. By integrating Vectorize into your AI application, you can perform tasks such as semantic search, recommendation systems, anomaly detection, and more.
Vector embeddings play a crucial role in AI workflows by allowing models to compare and understand relationships between different pieces of data. For example, you could generate embeddings using Workers AI or third-party services like OpenAI, and store them in Cloudflare's globally distributed vector database. These embeddings could then be used for sophisticated queries to fetch results, such as retrieving similar documents or recommending items to users.
The beauty of Vectorize lies in its seamless integration with Cloudflare's infrastructure. By storing your embeddings in Cloudflare R2 storage, KV, or D1, you can create end-to-end AI applications that leverage powerful search and data management functionalities—without needing additional infrastructure.
R2 Storage: Cost-Effective, Scalable Storage for AI Workflows
Cloudflare's R2 Storage offers a unique advantage for developers dealing with large amounts of unstructured data, such as AI model training datasets or results from machine learning models. Unlike traditional cloud storage providers, R2 eliminates egress fees, making it a cost-effective solution for AI workloads that require extensive data movement and storage.
The S3-compatible API ensures you have the flexibility to use existing tools, libraries, and extensions, making it easier to integrate into your current infrastructure. For AI models that require access to vast datasets, R2’s egress-free policy is a significant cost-saving feature.
In addition, R2 helps optimize the storage and delivery of dynamic content, enhancing AI applications that rely on real-time data. Whether you're building AI for image classification, NLP, or recommendation systems, R2 provides the storage backbone necessary for seamless, scalable performance.
AI Gateway: Empowering AI Apps with Visibility and Control
Cloudflare's AI Gateway gives developers the tools to observe and control their AI applications, enabling them to optimize performance, reduce costs, and improve scalability. By integrating AI Gateway into your AI app, you can gain visibility through analytics and logging, helping you understand how users are interacting with your models.
AI Gateway also includes features like caching, rate limiting, and model fallback to improve resilience and reduce unnecessary costs. For example, you can cache frequent queries to reduce model usage, control how many requests each user can make, and define fallback models to ensure that your AI app remains available even if one model experiences an issue.
AI Gateway also supports a variety of popular AI providers, including Workers AI, OpenAI, Azure OpenAI, and Hugging Face, allowing you to easily integrate multiple AI models and manage their scaling with minimal configuration.
AI Firewall: Protecting Your AI Apps from Malicious Attacks
As AI models become more integrated into applications, security becomes increasingly important. Cloudflare is developing Firewall for AI, a specialized layer of protection for Large Language Models (LLMs). This firewall aims to identify and block potential abuses and attacks before they reach your AI models. The growing use of LLMs in connected applications exposes new vulnerabilities, such as prompt injections and model denial-of-service attacks. Firewall for AI works by analyzing incoming prompts for harmful content, blocking malicious requests, and preventing unauthorized access to sensitive information stored within models.
Moreover, Firewall for AI can protect against the unintended leakage of sensitive data, a concern when LLMs are used in public-facing applications. Cloudflare’s ability to offer this protection at the edge of its global network ensures that these security measures are applied quickly and efficiently, reducing the risk of data breaches and misuse.
Bringing It All Together: A Scalable and Secure AI Ecosystem
Cloudflare's ecosystem of tools — Workers AI, Vectorize, R2 Storage, AI Gateway, and AI Firewall — offers a powerful, integrated solution for building, deploying, and managing AI-powered applications at scale.
Cloudflare’s AI tools give developers a reliable, cost-effective, and scalable platform for integrating cutting-edge AI functionality into their applications — all while maintaining performance, security, and compliance with data privacy standards. With these tools at your disposal, you can quickly and easily build AI-driven apps that are ready for deployment and growth.
Codegiant’s One-Click SaaS Template is a full-stack solution designed to rapidly deploy and scale applications. By integrating Cloudflare AI, serverless hosting, and caching, it optimizes performance and scalability. The centralized monorepo setup simplifies managing front-end, back-end, and resources, while integrated tools for authentication, billing, analytics, and SEO save you time. A ready-to-use Shadcn/UI template with Tailwind CSS lets you focus on growth, not setup.
Deploy in minutes with one click, ensuring your app scales as your SaaS grows. Plus, Codegiant is also working on integrating Hono natively, further enhancing the template’s capabilities. With Codegiant's GenIE, a coding assistant similar to Copilot, you can build even faster, streamlining development and accelerating your product's growth.