What you’ll learn
In this tutorial you’ll learn how to:

- Create a FastAPI application to serve your API endpoints.
- Implement proper health checks for your workers.
- Deploy your application as a load balancing Serverless endpoint.
- Test and interact with your custom APIs.
Requirements
Before you begin you’ll need:

- A Runpod account.
- Basic familiarity with Python and REST APIs.
- Docker installed on your local machine.
Step 1: Create a basic FastAPI application
First, let’s create a simple FastAPI application that will serve as our API. Create a file named app.py that implements:

- A health check endpoint at /ping.
- A text generation endpoint at /generate.
- A statistics endpoint at /stats.
Step 2: Create a Dockerfile
Now, let’s create a Dockerfile to package our application:
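One possible Dockerfile, assuming the app is served with uvicorn on port 8000 (both are assumptions; adjust to match your setup):

```dockerfile
# Sketch of a Dockerfile; assumes app.py and requirements.txt sit beside it.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so they cache across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Port the app listens on (assumption: 8000).
EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```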
You’ll also need a requirements.txt file listing the application’s dependencies:
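A minimal set of dependencies for a FastAPI app served with uvicorn might be (versions unpinned here; pin them for reproducible builds):

```text
fastapi
uvicorn[standard]
pydantic
```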
Step 3: Build and push the Docker image
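The commands below are a sketch of this step, assuming Docker Hub as the registry; DOCKER_USERNAME and the image name fastapi-worker are placeholders to replace with your own.

```shell
# Build for linux/amd64, the architecture Runpod workers run on.
docker build --platform linux/amd64 -t DOCKER_USERNAME/fastapi-worker:latest .

# Push the image so Runpod can pull it.
docker push DOCKER_USERNAME/fastapi-worker:latest
```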
Build and push your Docker image to a container registry.

Step 4: Deploy to Runpod
Now, let’s deploy our application to a Serverless endpoint:

- Go to the Serverless page in the Runpod console.
- Click New Endpoint.
- Click Import from Docker Registry.
- In the Container Image field, enter your Docker image URL, then click Next.
- Give your endpoint a name.
- Under Endpoint Type, select Load Balancer.
- Under GPU Configuration, select at least one GPU type (16 GB or 24 GB GPUs are fine for this example).
- Leave all other settings at their defaults.
- Click Create Endpoint.
Step 5: Access your custom API
Once your endpoint is created, you can access your custom APIs at:

- Health check: https://ENDPOINT_ID.api.runpod.ai/ping
- Generate text: https://ENDPOINT_ID.api.runpod.ai/generate
- Get request count: https://ENDPOINT_ID.api.runpod.ai/stats
You can test each endpoint with curl, replacing ENDPOINT_ID and RUNPOD_API_KEY with your actual endpoint ID and API key:
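As a sketch, the requests might look like the following; the JSON body with a prompt field is an assumption that should match whatever your /generate handler actually expects.

```shell
# Health check
curl https://ENDPOINT_ID.api.runpod.ai/ping \
  -H "Authorization: Bearer RUNPOD_API_KEY"

# Generate text (body shape is an assumption; match your app's schema)
curl -X POST https://ENDPOINT_ID.api.runpod.ai/generate \
  -H "Authorization: Bearer RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'

# Get request count
curl https://ENDPOINT_ID.api.runpod.ai/stats \
  -H "Authorization: Bearer RUNPOD_API_KEY"
```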
(Optional) Advanced endpoint definitions
For a more complex API, you can define multiple endpoints and organize them logically.

Troubleshooting
Here are some common issues and methods for troubleshooting:

- No workers available: If your request returns {"error":"no workers available"}, your workers did not initialize in time to process the request. Running the request again will usually fix this issue.
- Worker unhealthy: Check your health endpoint implementation and ensure it’s returning proper status codes.
- API not accessible: If your request returns {"error":"not allowed for QB API"}, verify that your endpoint type is set to “Load Balancer”.
- Port issues: Make sure the PORT environment variable matches the port your application is listening on, and that the PORT_HEALTH variable is set to a different port.
- Model errors: Check your model’s requirements and whether it’s compatible with your GPU.