Why is audio transcription useful?
Common use-cases for transcribing audio include a bot that summarises customer complaints during a Zoom call, collects negative product feedback from YouTube reviews, or generates a set of timestamps for YouTube videos, which are later attached via API. You could even take traditional voice or VoIP recordings from a customer service center, and transcribe each one to look for training issues or high-performing telephone agents. If you listen to podcasts on a regular basis and have ever read the show notes, they may well have been generated by a transcription model.
GPU is generally faster than CPU, but CPU can also be very effective if you are able to batch up requests via the OpenFaaS Asynchronous invocations system, and collect the results later on. To collect results from async invocations, you can supply a callback URL to the initial request, or have the function store its result in S3. We have some tutorials in the conclusion that show this approach for other use-cases like PDF generation.
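For illustration, here's a minimal sketch of how a producer might submit work to the asynchronous endpoint from Python. It assumes a transcription function named whisper, like the one built later in this post, deployed behind a local gateway, and uses a hypothetical callback receiver URL:

```python
import requests

# Assumptions: the gateway is reachable on 127.0.0.1:8080 and a function
# called "whisper" is deployed; the callback URL is a hypothetical receiver.
gateway = "http://127.0.0.1:8080"
audio_url = "https://example.com/track.mp3"

resp = requests.post(
    f"{gateway}/async-function/whisper",
    data=audio_url,
    headers={"X-Callback-Url": "https://example.com/webhooks/transcripts"},
)

# Async invocations are queued and return 202 Accepted straight away.
# The transcript is POSTed to the callback URL when the work completes.
print(resp.status_code, resp.headers.get("X-Call-Id"))
```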
Here’s what we’ll cover:
Kubernetes has support for managing GPUs across different nodes using device plugins. The setup in your cluster will depend on your platform and GPU vendor. We will be setting up a k3s cluster with NVIDIA container runtime support.
k3sup is a light-weight CLI utility that lets you quickly set up k3s on any local or remote VM. If you already have a k3s cluster, you can also use k3sup to join an additional agent to your cluster.
You can use our article on how to setup a production-ready Kubernetes cluster with k3s on Akamai cloud computing as an additional reference.
I would suggest setting up the cluster first, then, once that is done, SSH into any agent or server with a GPU to prepare the host OS by installing the Nvidia drivers and container runtime package.
Install the Nvidia drivers, for example: apt install -y cuda-drivers-fabricmanager-515 nvidia-headless-515-server
This example uses driver version 515, but you should select the appropriate driver version for your hardware.
Make sure the GPU is detected on the system by running the nvidia-smi command.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GT 1030 On | 00000000:01:00.0 Off | N/A |
| 35% 19C P8 N/A / 19W | 92MiB / 2048MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
Install the Nvidia container runtime packages.
Add the NVIDIA Container Toolkit package repository by following the instructions at: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt
Install the NVIDIA container runtime: apt install -y nvidia-container-runtime
Install k3s on the host, or join it to your existing cluster:

curl -ksL get.k3s.io | sh -

k3s automatically detects the NVIDIA container runtime and adds it to the containerd configuration it generates. Verify this by running:

grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
Once the hosts have been prepared and your cluster is running, apply the NVIDIA runtime class in the cluster:
cat > nvidia-runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
kubectl apply -f nvidia-runtime.yaml
Next install OpenFaaS in your cluster. GPU support is a feature that is only available in the commercial version of OpenFaaS.
Follow the installation instructions in the docs to install OpenFaaS using the official Helm chart.

Function deployments that require a GPU will need to have the nvidia runtimeClass set. OpenFaaS uses Profiles to support adding additional Kubernetes-specific configuration to function deployments.
Create a new OpenFaaS Profile to set the runtimeClass:
cat > gpu-profile.yaml <<EOF
kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: gpu
  namespace: openfaas
spec:
  runtimeClassName: nvidia
EOF
kubectl apply -f gpu-profile.yaml
Profiles can be applied to a function through annotations. To apply the gpu profile to a function, add the annotation com.openfaas.profile: gpu to the function configuration.
In this section we will create a function that runs the Whisper speech recognition model to transcribe an audio file.
Every OpenFaaS function is built into Open Container Initiative (OCI) format container image and published into a container registry, then when it’s deployed a fully qualified image reference is sent to the Kubernetes node. Kubernetes will then pull down that image and start a Pod from it for the function.
OpenFaaS supports various different languages through the use of its own templates concept. The job of a template is to help you create a container image, whilst abstracting away most of the boiler-plate code and implementation details.
The Whisper model is available as a Python package. We will be using a slightly adapted version of the python3-http template called python3-http-cuda to scaffold our function. To provide the CUDA Toolkit from NVIDIA, the python3-http-cuda template uses nvidia/cuda instead of Debian as the base image.

Create a new function with the OpenFaaS CLI, then rename its YAML file to stack.yaml. We do this so we don't need to specify the name using --yaml or -f on every command.
# Change this line to your own registry
export OPENFAAS_PREFIX="ttl.sh/of-whisper"
# Pull the python templates
faas-cli template pull https://github.com/skatolo/python-flask-template
# Scaffold a new function using the python3-http-cuda template
faas-cli new whisper --lang python3-http-cuda
# Rename the function configuration file to stack.yaml
mv whisper.yaml stack.yaml
The function handler whisper/handler.py is where we write our custom code. In this case the function retrieves an audio file from a URL that is passed in through the request body. Next, the Whisper model transcribes the audio file and the transcript is returned in the response.
import tempfile
from urllib.request import urlretrieve

import whisper

def handle(event, context):
    models_cache = '/tmp/models'
    model_size = "tiny.en"

    url = str(event.body, "UTF-8")

    audio = tempfile.NamedTemporaryFile(suffix=".mp3", delete=True)
    urlretrieve(url, audio.name)

    model = whisper.load_model(name=model_size, download_root=models_cache)
    result = model.transcribe(audio.name)

    return (result["text"], 200, {'Content-Type': 'text/plain'})
The first time the function is invoked it will download the model and save it to the location set in the models_cache variable, /tmp/models. Subsequent invocations of the function will not need to refetch the model.
It is good practice to make your function write only to the /tmp folder. This way you can make the function's file system read-only. OpenFaaS supports this by setting readonly_root_filesystem: true in the stack.yaml file. Only the temporary /tmp folder will still be writable. This prevents the function from writing to or modifying the filesystem and provides tighter security for your functions.
Before we can build, deploy and run the function there are a couple of configuration settings that we need to run through.
Add runtime dependencies
Our function handler uses the openai-whisper Python package. Edit the whisper/requirements.txt file and add the following line:
openai-whisper
The whisper package also requires the command-line tool ffmpeg for audio transcoding, which needs to be installed in the function container. The OpenFaaS python3 templates support specifying additional packages that will be installed with apt through the ADDITIONAL_PACKAGE build argument.

Update the stack.yaml file:
functions:
  whisper:
    lang: python3-http-cuda
    handler: ./whisper
    image: whisper:0.0.1
+   build_args:
+     ADDITIONAL_PACKAGE: "ffmpeg"
Apply profiles
The function will need to use the alternative nvidia runtime class in order to use the GPU. This can be applied by using the OpenFaaS gpu profile created earlier. Add the com.openfaas.profile: gpu annotation to the stack.yaml file:
functions:
  whisper:
    lang: python3-http-cuda
    handler: ./whisper
    image: whisper:0.0.1
+   annotations:
+     com.openfaas.profile: gpu
Configure timeouts
It is common for inference or other machine learning workloads to be long running jobs. In this example transcribing the audio file can take some time depending on the size of the file and the GPU speed. To ensure the function can run to completion timeouts for the function and OpenFaaS components need to be configured correctly.
For more info see: Expanding timeouts.
functions:
  whisper:
    lang: python3-http-cuda
    handler: ./whisper
    image: whisper:0.0.1
+   environment:
+     write_timeout: 5m5s
+     exec_timeout: 5m
Once the function is configured you can deploy it straight to the Kubernetes cluster using the faas-cli:
faas-cli up whisper
Then, invoke the function when ready.
curl -i http://127.0.0.1:8080/function/whisper -d https://example.com/track.mp3
Depending on the number of GPUs available in your cluster and the available memory for each GPU you might want to limit the amount of requests that can go to the whisper function at once. Kubernetes doesn’t implement any kind of request limiting for applications, but OpenFaaS can help here.
To prevent overloading the Pod and GPU, we can set a hard limit on the number of concurrent requests the function can handle. This is done by setting the max_inflight environment variable on the function.

For example, if your GPU has enough memory to handle 6 concurrent requests, you can set max_inflight: 6. Any subsequent requests would be dropped and receive a 429 response. This assumes the producer can buffer the requests to retry them later on; a minimal client-side retry sketch follows the configuration below. Fortunately, when using async in OpenFaaS, the queue-worker does just that. You can learn how here: How to process your data the resilient way with back pressure
functions:
  whisper:
    lang: python3-http-cuda
    handler: ./whisper
    image: ttl.sh/of-whisper:0.0.1
    environment:
      write_timeout: 5m5s
      exec_timeout: 5m
+     max_inflight: 6
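As a hedged illustration of what such a producer could look like, here's a minimal client-side retry sketch in Python. It assumes the function is invoked synchronously through a local gateway; when using async invocations, the OpenFaaS queue-worker handles the retries for you.

```python
import time
import requests

def transcribe(audio_url, retries=5, backoff=10):
    # Hypothetical synchronous invocation of the whisper function via a local gateway.
    for attempt in range(retries):
        r = requests.post("http://127.0.0.1:8080/function/whisper", data=audio_url)
        if r.status_code != 429:
            r.raise_for_status()
            return r.text
        # 429 means the function is at its max_inflight capacity - back off and retry.
        time.sleep(backoff * (attempt + 1))
    raise RuntimeError("function still busy after {} attempts".format(retries))

print(transcribe("https://example.com/track.mp3"))
```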
You can still try out the Whisper inference function even if you don’t have a GPU available or when you don’t have the commercial version of OpenFaaS. With only a couple of changes the function can run with CPU inference.
The function handler does not need to change. The openai-whisper package automatically detects whether a GPU is available and will fall back to using CPU as a default.

Change the template of the function in the stack.yaml file to python3-http and remove the gpu profile annotation.
  whisper:
-   lang: python3-http-cuda
+   lang: python3-http
    handler: ./whisper
    image: ttl.sh/of-whisper:0.0.1
-   annotations:
-     com.openfaas.profile: gpu
Pull the python3-http template:
faas-cli template store pull python3-http
Deploy the function and invoke it with curl as shown in the previous section. The function will now run the inference on CPU instead. Depending on your hardware this will probably increase the execution time compared to running on a GPU, so make sure to adjust your timeouts as required.
Take a look at some other patterns that can be useful for running ML workflows and pipelines with OpenFaaS.
In this tutorial we showed how a K3s cluster can be configured with NVIDIA container runtime support to run GPU-enabled containers. OpenFaaS was installed in the cluster with an additional gpu Profile, which is required to run functions with the alternative nvidia runtimeClass. Using a custom Python template that includes the CUDA Toolkit from NVIDIA, we created a function to transcribe audio files with the OpenAI Whisper model.

We ran through several configuration steps for the function to set appropriate timeouts and applied the OpenFaaS gpu profile to make the GPU available in the function container. Additionally, we discussed how OpenFaaS features like async invocations and retries can be used together with concurrency limiting to prevent overloading your GPU while still making sure all requests can run to completion.
For people who don’t have a GPU available or that are running the Community Edition of OpenFaaS, we showed how the same function can be deployed to run with CPU inference.
We showed you how to apply concurrency limiting to make sure the GPU wasn’t overwhelmed with requests, however Kubernetes does have a very basic way of scheduling Pods to GPUs. The approach taken is to exclusively dedicate at least 1 GPU to a Pod, so if you wanted the function to scale, you’d need several nodes each with at least one GPU.
In Kubernetes this is done by passing in an additional value to the Pod under the requests/limits section i.e.
resources:
  limits:
    nvidia.com/gpu: 1
We’re looking into the best way to add this for OpenFaaS functions - either directly for each Function Custom Resource, or via a Profile, so feel free to reach out if that’s of interest to you.
With the latest versions of the OpenFaaS helm charts, watchdog and python-flask template, you can now stream responses using Server Sent Events (SSE) directly from your functions. Prior to these changes, if a chat completion was going to take 10 seconds to emit several paragraphs of text, the user would have had to wait that long to see the first word.
Now, the first word will be displayed as soon as it’s available from the OpenAI API. This is a great way to improve the user experience of your OpenAI-powered applications and was requested by one of our customers building a chat-driven experience for DevOps.
Server Sent Events (SSE) are a way to stream data from a server to a client. They are a simple way to push data from a server to a client, and are used in a variety of applications, including chat applications, real-time analytics, and more.
An alternative to SSE is long polling, where the client makes a request to the server and waits for a response. This is a less efficient way to stream data, as it requires the client to make a new request every time it wants to receive new data.
SSEs only work in one direction, so the client cannot send data back to the server. If two-way communication is required, then websockets are a better option.
If we use the python3-flask template, it has built-in support for returning a streaming response from Flask, using the stream_with_context() helper. This is a generator function that yields data to the client as it becomes available.

You can pull down the Python template using faas-cli template store pull python3-flask-debian, then create a new function with: faas-cli new --lang python3-flask-debian stream.
We're using the debian variant instead of the normal, smaller alpine variant of the image because it contains everything required to build the dependencies we'll need. On balance, the Debian image is still smaller than the Alpine one when all the build tools have been added in.
To learn more about the Python template, see the docs.
Example handler.py:
from flask import stream_with_context, request, Response
import requests
from langchain_community.chat_models import ChatOpenAI
from os import environ

environ["OPENAI_API_KEY"] = "Bearer foo"

chat_model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_base="https://openai.inlets.dev/v1",
)

def handle(req):
    prompt = "You are a helpful AI assistant, try your best to help, respond with truthful answers, but if you don't know the correct answer, just say sorry I can't help. Answer this question: {}".format(req)
    print("Prompt: {}".format(prompt))

    def stream():
        for chunk in chat_model.stream(prompt):
            print(chunk.content + "\n", flush=True, end="")
            yield f'data: {chunk.content}\n\n'

    return Response(stream(), mimetype='text/event-stream')
Example requirements.txt:
requests
langchain_community
openai
Next, in your stack.yaml file, set buffer_body: true under the environment: section. This reads all of the request input into memory, then sends it to the function, so there's no streaming input, just a streaming output.
I set up a self-hosted API endpoint that is compatible with OpenAI for this testing, but you can use the official API endpoint too. Just make sure you pass in your OpenAI token using an OpenFaaS secret and not an environment variable. Definitely don’t hard-code it into your function’s source code because it will be readable by anyone with the image.
curl -i http://127.0.0.1:8080/function/stream \
-H "Content-Type: text/plain" \
-H "Accept: text/event-stream" \
-d "What are some calorie dense foods?"
Example output:
HTTP/1.1 200 OK
Content-Type: text/event-stream; charset=utf-8
Date: Thu, 11 Jan 2024 13:33:04 GMT
Server: waitress
Transfer-Encoding: chunked
data: Some
data: cal
data: orie
data: dense
data: food
data: s
data: include
data: n
data: uts
data: ,
data: se
data: eds
data: ,
data: av
data: oc
data: ados
data: ,
data: che
data: ese
data: ,
data: pe
data: an
data: ut
data: ut
data: but
data: ter
data: ,
data: dark
data: ch
data: oc
data: olate
...
I trimmed the response, but you get the idea. This gave me text quite quickly, but if we’d had to wait for the full text it would have taken up to 30 seconds.
As a quick note, you’ll need to pay attention to your timeout values as the default timeouts for your function and installation may not be enough to stream a complete response from the remote API.
The prompt could probably do with some tuning, just edit handler.py and let me know what you come up with.
I used c0sogi/llama-api to set up a local OpenAI REST API endpoint using a free model. The answers are not the same caliber as gpt-3.5, however it is a good way to test the SSE functionality.
You can learn more about the official OpenAI Python SDK here.
Functions can also be called asynchronously, but if you’re going down this route, it probably doesn’t make sense to use server sent events. You can learn more about asynchronous functions in the OpenFaaS docs.
In a short period of time, we were able to add support to the various OpenFaaS components and Python template in order to support SSE for OpenAI. You could also use a generator to stream back your own data to a client, just remember that the response is text-based. So to stream back binary data like an image, you’d need to base64 encode each chunk.
From here, you can now consume the streaming function in a front-end built with React, Vue.js, or Nuxt.js, etc, or from a CLI application.
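As an example of consuming the stream from a CLI application, here's a minimal Python sketch using the requests library. It assumes the function is exposed on a local gateway, as in the earlier curl example:

```python
import requests

url = "http://127.0.0.1:8080/function/stream"

with requests.post(url,
                   data="What are some calorie dense foods?",
                   headers={"Accept": "text/event-stream"},
                   stream=True) as r:
    for line in r.iter_lines(decode_unicode=True):
        # SSE frames are newline-separated; data lines start with "data: "
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)

print()
```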
So what do you need to try out OpenFaaS?
If you’re familiar with Kubernetes, you can get started with the Helm chart.
If Kubernetes is your antithesis, then you might like faasd instead, which can run on a single VM, and I’ve written a comprehensive manual for you to get started with it.
If you’d like to learn more about OpenFaaS, we have a weekly call every Wednesday and we’d love to see you there to hear how you’re using functions.
All the component parts are readily available to take user-supplied source code, produce an OpenFaaS function, and deploy it with its own custom HTTPS URL. The target user for this kind of workflow is a SaaS company, or an internal platform team who want to offer a “code to URL” experience for their users.
Learn more about multi-tenant use of OpenFaaS here: Build a Multi-Tenant Functions Platform with OpenFaaS
If you follow all of the steps in this guide, then you’ll be able to take code like this from a user:
"use strict"
module.exports = async (event, context) => {
const result = {
status: "Received input: " + JSON.stringify(event.body)
};
return context
.status(200)
.succeed(result);
}
And turn it into an HTTPS URL like this one: https://helloworld.webhooks.example.com
At a conceptual level, here’s what’s involved:
Conceptual diagram showing overview of flow
Most of the steps in this tutorial will be shown using manual HTTP calls. This is so you can understand the role of each component, however when it comes to building your own integration, you could make these calls from your own code, or even write an OpenFaaS function to do it.
You’ll need a retail or trial license for OpenFaaS for Enterprises. Reach out if you’d like to try this tutorial and let us know what you’re building.
You’ll also need:
- An OpenFaaS for Enterprises installation with clusterRole: true set in the Helm chart values
- The OpenFaaS CLI (faas-cli)

You can create a separate Kubernetes namespace for a tenant via the HTTP REST API.
).You can create a separate Kubernetes namespace for a tenant via the HTTP REST API.
The name must conform to DNS naming rules, and must also be unique. You could use a GUID, and record a mapping in your application or use a human-readable name. Names must not begin with a number.
faas-cli namespace create tenant-1
I won’t repeat the HTTP API call here, however you can view it in the OpenFaaS REST API docs.
If your users only have one namespace, you may name it after them, i.e. tenant-1, but if they can have multiple, you'll want to add some annotations so that you can identify them later.
faas-cli namespace create webhooks \
--annotation "tenant=tenant-1" \
--annotation "email=alex@example.com"
Here we call the namespace “webhooks”, then add annotations to map some of our own custom data.
You'll now see the extra namespace via faas-cli namespace list.
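If you'd rather work with the REST API from code, here's a minimal sketch in Python that lists the namespaces the gateway manages. The gateway URL and admin credentials are placeholders, and it assumes basic auth is enabled on the gateway:

```python
import requests

# Placeholder gateway URL and admin credentials - replace with your own.
gateway = "http://127.0.0.1:8080"
auth = ("admin", "your-gateway-password")

# Returns the list of namespaces, which should now include "webhooks".
r = requests.get(f"{gateway}/system/namespaces", auth=auth)
r.raise_for_status()
print(r.json())
```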
There are two ways you could go about taking in source code:
Typically, we see our customers going for option 1, with interpreted languages such as: Node.js, PHP, Python, etc.
There are three ways to try out the Function Builder API, including:

1. curl and bash commands to build a package and invoke the endpoint
2. faas-cli publish --remote-builder, which uses faas-cli to do all of the steps in 1.

A conceptual diagram showing how to make a call to the Function Builder API:
Conceptual diagram showing user code being shrink-wrapped using a template, and submitted to the Function Builder.
Assuming you’ve deployed the Function Builder API, port-forward it to your local machine:
kubectl port-forward -n openfaas \
deploy/pro-builder 8081:8080
We'll try the curl and bash method, because it shows each step that's required.
Obtain the payload secret required to sign the request:
export PAYLOAD=$(kubectl get secret -n openfaas payload-secret -o jsonpath='{.data.payload-secret}' | base64 --decode)
echo $PAYLOAD > $HOME/.openfaas/payload.txt
Prepare a temporary directory
rm -rf /tmp/functions
mkdir -p /tmp/functions
cd /tmp/functions
Create a new function
faas-cli new --lang node18 hello-world
The --shrinkwrap flag performs templating without actually invoking docker or buildx to build or publish an image. The Function Builder API will do that for us instead.
faas-cli build --shrinkwrap -f hello-world.yml
If you look in the ./build/hello-world folder you'll see a build context that can be built with Docker.

Now rename "hello-world" to "context", since that's the folder name expected by the builder:
cd build
rm -rf context
mv hello-world context
Then, create a config file with the registry and the image name that you want to use for publishing the function.
Build-args can also be specified here for proxies, or enabling/disabling Go modules for instance.
export DOCKER_USER=alexellis2
echo -n '{"image": "ttl.sh/'$DOCKER_USER'/test-image-hello:0.1.0"}' > com.openfaas.docker.config
The test image will be published to the ttl.sh public and ephemeral registry which does not require authentication.
You can follow detailed instructions to set up authentication for the Docker Hub, AWS ECR, GCP GCR, or a self-hosted registry like CNCF Harbor.
Now we can invoke the Function Builder API to build and publish the function:
Create a tar of the build context:
tar cvf req.tar --exclude=req.tar .
Sign the payload:
PAYLOAD=$(kubectl get secret -n openfaas payload-secret -o jsonpath='{.data.payload-secret}' | base64 --decode)
HMAC=$(cat req.tar | openssl dgst -sha256 -hmac $PAYLOAD | sed -e 's/^.* //')
Invoke the build with the following:
curl -H "X-Build-Signature: sha256=$HMAC" -s http://127.0.0.1:8081/build -X POST --data-binary @req.tar | jq
{
  "logs": [
    ....
    "v: 2021-10-20T16:48:34Z exporting to image 8.01s"
  ],
  "image": "ttl.sh/alexellis2/test-image-hello:0.1.0",
  "status": "success"
}
If it was successful, you'll get a "status": "success" returned along with the image name you passed in. If it failed, you can return the logs element to the user, which will show any failed build or unit testing steps.
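If you'd prefer to make the same call from your own code instead of bash, here's a minimal Python sketch that signs the tarball and invokes the builder. It assumes req.tar was created as shown above, the payload secret was saved to $HOME/.openfaas/payload.txt earlier, and the pro-builder is still port-forwarded to 127.0.0.1:8081:

```python
import hashlib
import hmac
import os
import requests

# Read the payload secret and the tarred build context created earlier.
with open(os.path.expanduser("~/.openfaas/payload.txt"), "rb") as f:
    payload_secret = f.read().strip()

with open("req.tar", "rb") as f:
    tarball = f.read()

# Sign the tarball with HMAC-SHA256, just like the openssl command above.
digest = hmac.new(payload_secret, tarball, hashlib.sha256).hexdigest()

r = requests.post(
    "http://127.0.0.1:8081/build",
    data=tarball,
    headers={"X-Build-Signature": "sha256=" + digest},
)

result = r.json()
print(result.get("status"), result.get("image"))
```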
Now we can make a HTTP call to deploy the function.
Like before, there are several ways you can do this:

1. A direct HTTP call to the OpenFaaS REST API
2. faas-cli deploy

Let's use 2. with faas-cli deploy.
faas-cli deploy \
--image ttl.sh/alexellis2/test-image-hello:0.1.0 \
--name hello-world \
--namespace webhooks
We should also consider what additional settings we may want for the function at this time.
Here’s a fuller example, but by no means completely exhaustive:
faas-cli deploy \
--image ttl.sh/alexellis2/test-image-hello:0.1.0 \
--name hello-world \
--namespace webhooks \
--annotation "com.example.tenant=tenant-1" \
--annotation "com.example.plan=free" \
--label com.openfaas.scale.zero=true \
--label com.openfaas.scale.zero-duration=3m \
--label com.openfaas.scale.min=1 \
--label com.openfaas.scale.max=10 \
--label com.openfaas.scale.type=rps \
--label com.openfaas.scale.target=500 \
--memory-request=64Mi \
--memory-limit=128Mi \
  --cpu-request=50m
The name, image, namespace settings configure the function’s deployment, then we have additional metadata supplied via annotations, and labels for autoscaling.
We enabled scale to zero, with an idle period of 3 minutes, a minimum of 1 replica of the function, and a maximum of 10. We then set the autoscaler to scale based upon a target of 500 Requests Per Second (RPS), set a request and limit for RAM, and a requested value for CPU.
If you'd like to see the equivalent HTTP REST call, you can prefix the command with FAAS_DEBUG=1.
PUT http://127.0.0.1:8080/system/functions
Content-Type: [application/json]
User-Agent: [faas-cli/dev]
Authorization: [Basic **********]

{
  "service": "hello-world",
  "image": "ttl.sh/alexellis2/test-image-hello:0.1.0",
  "namespace": "webhooks",
  "labels": {
    "com.openfaas.scale.max": "10",
    "com.openfaas.scale.min": "1",
    "com.openfaas.scale.target": "500",
    "com.openfaas.scale.type": "rps",
    "com.openfaas.scale.zero": "true",
    "com.openfaas.scale.zero-duration": "3m"
  },
  "annotations": {
    "com.example.plan": "free",
    "com.example.tenant": "tenant-1"
  },
  "limits": {
    "memory": "128Mi"
  },
  "requests": {
    "memory": "64Mi",
    "cpu": "50m"
  }
}
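For completeness, here's a minimal Python sketch of making that deployment call yourself with the requests library. The gateway URL and basic-auth credentials are placeholders, and the body mirrors a subset of the captured request above:

```python
import requests

# Placeholder gateway URL and admin credentials - replace with your own.
gateway = "http://127.0.0.1:8080"
auth = ("admin", "your-gateway-password")

function = {
    "service": "hello-world",
    "image": "ttl.sh/alexellis2/test-image-hello:0.1.0",
    "namespace": "webhooks",
    "labels": {
        "com.openfaas.scale.zero": "true",
        "com.openfaas.scale.zero-duration": "3m",
    },
    "annotations": {"com.example.tenant": "tenant-1"},
    "requests": {"memory": "64Mi"},
    "limits": {"memory": "128Mi"},
}

# The captured request above used PUT; the REST API also accepts POST
# to /system/functions for a first-time deployment.
r = requests.put(f"{gateway}/system/functions", json=function, auth=auth)
print(r.status_code)
```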
The fourth part of this tutorial is to create a custom domain and TLS certificate for the function so that the user can access it via a custom URL.
The Ingress Operator is an abstraction over Kubernetes Ingress that makes it quick and easy to create custom Ingress records configured to expose a function over a custom HTTP path or domain, or both. Access to functions is defined using a “FunctionIngress” Custom Resource Definition (CRD).
You’ll find examples of FunctionIngress in the documentation.
It’s not a compulsory component, and you could work with Ingress directly, or even Istio if you wished.
You’ll need to have the Ingress Operator enabled in the values.yaml file for OpenFaaS.
Then, create a DNS record for the user’s function.
On AWS EKS, LoadBalancers have DNS names, so you'd create a CNAME, but everywhere else they tend to have IP addresses, so you create an A record. These can be created via CLI tools, or via an API/SDK from your DNS provider such as AWS Route 53, Google Cloud DNS, DigitalOcean, Cloudflare, etc.
If the external IP was 176.58.106.241, then you'd create a DNS A record such as:
176.58.106.241 hello-world.webhooks.example.com
I’ve used the format of function.namespace.domain, but you can use any format that you like. Perhaps one sub-domain per tenant, or one sub-domain per namespace. In any case, having a unique sub-domain is important if user functions need to store cookies.
You’ll need to create a Let’s Encrypt issuer. Again, we recommend doing this either per tenant via a ClusterIssuer, or via an Issuer in each namespace. Details can be found in the README.
Now, create a FunctionIngress record, either using the Kubernetes clientset for the ingress-operator or the Custom Resource Definition (CRD) and kubectl.
The below assumes the ingress-controller is ingress-nginx, using a ClusterIssuer called “tenant1”:
apiVersion: openfaas.com/v1
kind: FunctionIngress
metadata:
  name: helloworld-tls
  namespace: openfaas
spec:
  domain: "helloworld.webhooks.example.com"
  function: "helloworld"
  functionNamespace: "webhooks"
  ingressType: "nginx"
  tls:
    enabled: true
    issuerRef:
      name: "letsencrypt-tenant1"
      kind: "ClusterIssuer"
All FunctionIngress records are created in the “openfaas” namespace, then you reference the function’s namespace via the “functionNamespace” field.
Whenever traffic hits https://helloworld.webhooks.example.com, it will be re-routed to https://gateway.openfaas:8080/function/helloworld.webhooks.
Every Function will need its own FunctionIngress record, however Issuers and ClusterIssuers for cert-manager can be shared across multiple functions.
Now, if you expect to see many functions created each with their own domains, you may want to use a DNS01 challenge, and register a sub-domain per namespace, or per tenant instead of using individual A records and HTTP01 challenges.
With HTTP01 challenges and individual DNS records, you’d have 20 DNS records and 20 disparate TLS certificates. On the plus side, this is the simplest configuration, and on the downside you may run into Let’s Encrypt rate limits.
function1.tenant1.example.com, function2.tenant1.example.com, function3.tenant1.example.com, etc.
With a DNS01 challenge and a sub-domain, you'd have a single wildcard DNS record with a single TLS certificate, used for as many functions as you wanted.
*.tenant1.example.com
You can learn the differences here: ACME Challenge Types
The goal of this tutorial was to show you how you could accept source code from your users, build it, deploy it, and then create a custom URL for them.
We tend to see individual teams having very strong opinions on how to tie together these various steps, which is why I’ve given you the building blocks to do with them as you wish.
If I were to automate all of the steps in this guide, then I'd write an OpenFaaS function using Go which could take advantage of the Go SDK for the OpenFaaS REST API, the Go example for the Function Builder, and the Go Kubernetes clientset for the Ingress Operator. You'd need to trigger the whole workflow from your user's dashboard or the rest of your application, and we have some examples of people who are doing that in the blog post Build a Multi-Tenant Functions Platform with OpenFaaS.
You can monitor the OpenFaaS REST API and the Function Builder API, along with the functions using separate dashboards available in the Customer Community.
If you wish to give your users API access to OpenFaaS via the CLI and UI Dashboard, then you’ll need to configure an Identity Provider (IdP) and SSO/IAM for OpenFaaS.
If you have comments, questions, and suggestions or would just like to talk to us about this guide, you can reach us here.
You may also like:
AWS Lambda was introduced by Amazon in 2014 and popularized the Functions As a Service (FaaS) model. OpenFaaS was introduced in 2016 and is one of the most popular FaaS platforms for Kubernetes.
Here’s what we hear users value in OpenFaaS over a hosted service:
In this article we will take a look at the differences and similarities between OpenFaaS and AWS Lambda and show you what it takes to migrate a Python function across.
We will work through migrating a real-world function that uses the Extract, Transform, and Load (ETL) pattern to take an input video and provide a short video preview. We want to say thank you to Akamai for providing credits for the cluster, we’ll be using their managed Kubernetes service called “LKE”. You could just as easily follow this guide on your laptop with KinD, on AWS EKS, or on-premises with K3s.
Example video generated by the function
The following source video was used as the input to generate an 8 second video summary: https://www.youtube.com/watch?v=l9VuQZ4a5pc
Example video generated by the ETL workflow
In Lambda, when you want to write a function in a certain language, you pick a “runtime”. There are a set of supported runtimes, plus the ability to write your own custom ones. The equivalent for OpenFaaS is a “template” - we provide a number of supported templates, then there are around a dozen more in the function store provided by the community.
In addition to the templates, OpenFaaS can also run CLIs as functions such as networking tools like curl or nmap, and can support running existing containers serving HTTP traffic. Find out more about workloads.
Below, we compare the python3-http OpenFaaS template using Python 3.11 to the Lambda Python 3.10 runtime.
Let’s start by looking at a simple Lambda function that reads request parameters like the headers and the HTTP method.
def lambda_handler(event, context):
    # Get request data from the event
    method = event['requestContext']['http']['method']
    requestBody = event['body']
    contentType = event['headers'].get('content-type')

    return {
        "statusCode": 200,
        "body": {
            "method": method,
            "request-body": requestBody,
            "content-type": contentType
        }
    }
Lambda functions can be triggered from different sources:
The event object will be different depending on the source that triggered the function. In this example we are handling an event triggered by a direct API call to the Lambda function.
Now let's compare this Lambda function to an equivalent OpenFaaS function. In this case the OpenFaaS function uses the python3-http template, which is our recommended template for Python users.
def handle(event, context):
    # Get request data from the event
    method = event.method
    requestBody = str(event.body, 'utf-8')
    contentType = event.headers.get('content-type')

    return {
        "statusCode": 200,
        "body": {
            "method": method,
            "request-body": requestBody,
            "content-type": contentType
        }
    }
In this example, the handler along with the request and response are all very similar, whilst not 100% equivalent. That means porting a function should be relatively straight-forward.
Now, while the structure and type of the event payload in Lambda will vary depending on the trigger, OpenFaaS functions always have the same payload format. The event always contains the same data about the request: body, headers, method, query and path.
In OpenFaaS, every invocation will happen over HTTP, whether it was triggered by an event in Apache Kafka, or a direct HTTP call through the gateway. Just like AWS Lambda, OpenFaaS supports different event sources. The concept we use is a connector. Some of our popular connectors include: Apache Kafka, AWS SQS or cron if a function needs to be invoked on a schedule.
To sum up: Python code looks very similar in Lambda and OpenFaaS. To migrate simple functions you would only need to update your code to handle the different format of the event and context objects. For more complex code, where you rely on a trigger like an SQS event, you will also need to configure and deploy a connector. In Lambda, a function can assume a role and access other services without using credentials. There is similar support available for OpenFaaS functions when running on AWS EKS, however if you’re running on-premises, you may need to obtain credentials and provide them to your function to access databases, object storage, and other managed services.
See also: AWS EKS: Configuring Pods to use a Kubernetes service account
Building and deploying
To deploy a Lambda function, compiled code or scripts and their dependencies need to be built into a deployment package. Lambda supports two types of deployment packages: a zip file, or a container image derived from a supported base image. A deployment package has to be uploaded to S3 or ECR and can then be used to deploy a function.
This usually involves a number of tools like the Lambda console, AWS CLI and some scripts to create the deployment package.
There are CLIs and tools available that automate some of these steps and provide a way to define and manage your Lambda functions e.g. AWS SAM CLI
The main tool to interact with OpenFaaS and build and deploy functions is the faas-cli. The CLI uses Docker to build functions from a set of supported language templates. You can also create your own templates or build functions from a custom Dockerfile. This means that you can use any programming language or toolchain that can be packaged into a container image.
A quick comparison of the developer experience between AWS Lambda and OpenFaaS.
AWS Lambda:
OpenFaaS:

- Functions are built and deployed with the faas-cli. Build and deployment configuration is provided through a stack.yml configuration file.
- Functions can be run locally with faas-cli local-run. It is easy to spin up an OpenFaaS cluster locally or use faasd to test functions.

Want to see a demo of how faas-cli local-run works? See: The faster way to iterate on your OpenFaaS functions
See also: OpenFaaS YAML reference for stack.yaml
In the next section we will show you how to deploy an OpenFaaS cluster and walk through the steps required to migrate a real-world workflow from Lambda to OpenFaaS.
Let's start by setting up an OpenFaaS cluster. We will be using a managed LKE cluster on Akamai Cloud to deploy OpenFaaS.

A new cluster can be created from the Linode dashboard. Follow their getting started guide to set up the Kubernetes cluster.
Did you know? Linode was acquired by Akamai, and is now being branded as “Akamai Cloud Computing”. The rebranding is still in-progress, so we’ll be referring to Linode throughout this article.
For this tutorial I created a 3x node cluster where each worker had 4GB RAM and 2vCPUs.
Once you have a cluster deployed and verified that you are able to access it, move on to the next section to install OpenFaaS.
The version of OpenFaaS you install will depend on your needs, but both will work for this tutorial.
The first option is the Community Edition (CE), which is intended for enthusiasts, experimentation, and for building a Proof of Concept (PoC). You'll get a taste of what the platform is like, with some limits on scalability and intended use.
For commercial use and for production, we recommend OpenFaaS Standard. OpenFaaS for Enterprises is an alternative distribution which is best suited to people wanting to build a white-label functions service, or for regulated companies.
Whichever version you use, installation is quick with the OpenFaaS Helm chart, or arkade which provides a simpler interface to the Helm chart.
arkade install openfaas
You can now run arkade info openfaas to get the instructions to log in with the CLI and obtain the password to access the UI.
To follow along with this tutorial you can use the suggested port-forwarding instructions returned by the info command to access the OpenFaaS gateway.
For more information:
In a previous article, we showed how to build a Highly Available cluster using VMs and K3s. If you’re an Akamai customer or k3s user, you can find out more here: How to set up production-ready K3s with OpenFaaS with Akamai Cloud Computing.
Most of the time your Lambda functions will probably use IAM to access other AWS services and have additional dependencies like Python packages or other native dependencies. That is still possible with OpenFaaS if you deploy to AWS EKS, but it requires some additional configuration.
Since AWS IAM is not available on LKE, we will not include the steps to have an OpenFaaS function assume an IAM role. Instead we’ll create credentials and pass them to the function, this makes the example portable to any cloud and to on-premises environments.
The ETL workflow we will be migrating is a video transformation pipeline using FFmpeg. A function fetches a video and creates a short preview of the input by sampling frames throughout the video and stitching them back together to create the output video.
Video transformation workflow
Let’s start by taking a look at the Lambda function.
import os
import logging
import tempfile
import urllib
import ffmpeg
import boto3

from .preview import generate_video_preview, calculate_sample_seconds

s3_client = boto3.client('s3')

samples = os.getenv("samples", 4)
sample_duration = os.getenv("sample_duration", 2)
scale = os.getenv("scale")
format = os.getenv("format", "mp4")
s3_output_prefix = os.getenv("s3_output_prefix", "output")
debug = os.getenv("debug", "false").lower() == "true"

def handler(event, context):
    s3_bucket_name = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    file_name, _ = os.path.basename(key).split(".")
    output_key = os.path.join(s3_output_prefix, file_name + "." + format)

    out_file = tempfile.NamedTemporaryFile(delete=True)

    try:
        input_url = s3_client.generate_presigned_url('get_object', Params={'Bucket': s3_bucket_name, 'Key': key}, ExpiresIn=60 * 60)
    except Exception as e:
        logging.error("failed to get presigned video url")
        raise e

    try:
        probe = ffmpeg.probe(input_url)
        video_duration = float(probe["format"]["duration"])
    except ffmpeg.Error as e:
        logging.error("failed to get video info")
        logging.error(e.stderr)
        raise e

    # Calculate sample_seconds based on the video duration, sample_duration and number of samples
    sample_seconds = calculate_sample_seconds(video_duration, samples, sample_duration)

    # Generate video preview
    try:
        generate_video_preview(input_url, out_file.name, sample_duration, sample_seconds, scale, format, quiet=not debug)
    except Exception as e:
        logging.error("failed to generate video preview")
        raise e

    # Upload video file to S3 bucket.
    try:
        s3_client.upload_file(out_file.name, s3_bucket_name, output_key)
    except Exception as e:
        logging.error("failed to upload video preview")
        raise e
Source: video_preview/lambda_function.py
An Amazon S3 trigger will invoke the function each time a source video is uploaded to the bucket. The function will look up the bucket name and key of the source video from the event parameters it receives from S3. Next, it will use ffmpeg to generate a video preview from the input video. It does this by taking short samples spread throughout the input video and stitching them back together to create a new video. The ffmpeg output is saved to a temporary file that is then uploaded back to S3.
This is what the video generation code looks like:
def sample_video(stream, sample_duration, sample_seconds=[]):
    samples = []
    for t in sample_seconds:
        sample = stream.video.trim(start=t, duration=sample_duration).setpts('PTS-STARTPTS')
        samples.append(sample)

    return samples

def generate_video_preview(in_filename, out_filename, sample_duration, sample_seconds, scale, format, quiet):
    stream = ffmpeg.input(in_filename)
    samples = sample_video(stream, sample_duration=sample_duration, sample_seconds=sample_seconds)
    stream = ffmpeg.concat(*samples)

    if scale is not None:
        width, height = scale.split(':')
        stream = ffmpeg.filter(stream, 'scale', width=width, height=height, force_original_aspect_ratio='decrease')

    (
        ffmpeg
        .output(stream, out_filename, format=format)
        .overwrite_output()
        .run(quiet=quiet)
    )
Source: video_preview/preview.py
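The calculate_sample_seconds helper imported above isn't reproduced in this post. A minimal sketch of what such a function might look like, assuming the sample start points are simply spread evenly across the video, is:

```python
def calculate_sample_seconds(video_duration, samples, sample_duration):
    # Hypothetical implementation: space the sample start times evenly,
    # leaving room for the final sample to finish before the video ends.
    samples = int(samples)
    sample_duration = float(sample_duration)
    interval = (video_duration - sample_duration) / max(samples, 1)
    return [round(i * interval, 2) for i in range(samples)]
```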
The FFmpeg bindings package, ffmpeg-python, is used to interact with FFmpeg. This means our Lambda function requires the ffmpeg-python package and ffmpeg as a runtime dependency. In the previous section we talked about the different methods to include runtime dependencies: either by including them in the .zip file archive for your Lambda function or by creating a container image.

Check out the AWS docs for more info on how to add runtime dependencies to Python Lambda functions.
The function can be configured through env variables. Some of the parameters include:
- samples - The number of samples to take from the source video.
- sample_duration - The duration of each sample.
- scale - Resize the output video to this scale, width:height.
- format - The output video format, e.g. mp4, webm, flv.
A couple of things to note for this function:

- The S3 client is initialized with the default configuration: s3_client = boto3.client('s3'). AWS maps the execution environment to the account and IAM role of the Lambda function. This will allow it to access your AWS S3 bucket.

For a detailed overview on how to create a Lambda function that is triggered by S3 bucket events and how to configure the required IAM roles and permissions, take a look at this tutorial: Using an Amazon S3 trigger to invoke a Lambda function
In this section we will take the function code from our Lambda function and walk through the steps required to run it as an OpenFaaS function. Since our function will be deployed to a Linode LKE cluster, we will also be migrating from AWS S3 to Linode Object Storage at the same time.
To migrate the function we will need to:
Setup Linode Object Storage
You can follow the official getting started guide to enable Object Storage on Linode and create a new bucket. For this demo we created a bucket named video-preview.
Make sure to save the access key and access secret for the bucket at the following paths:

.secrets/video-preview-s3-key
.secrets/video-preview-s3-secret

faas-cli local-run uses the .secrets folder to look for secrets files when running the function locally for development. We will also be using these files to create the required secrets in our OpenFaaS cluster before we deploy the function.
Create the OpenFaaS function
Scaffold a new Python function using the faas-cli:
# Pull the python3-http template from the store
faas-cli template store pull python3-http
# Scaffold the function.
faas-cli new video-preview --lang python3-http
# Rename the function configuration file.
mv video-preview.yml stack.yml
We are using the python3-http template to scaffold the function. This template creates a minimal function image based on Alpine Linux. If your function depends on modules or packages that require a native build toolchain, such as Pandas, Kafka, SQL etc., we recommend using the python3-http-debian template instead.

Once the video-preview function is created from the template you can copy over the code from the Lambda function. We will start refactoring it step by step. Make sure to also copy the preview.py file from the Lambda function.
Initialize the S3 client
The S3 client in the boto3 SDK can be used with any S3-compatible object storage, so we won't have to swap out the client to make it work with Linode Object Storage. However, we will need to configure the client with access credentials and the correct endpoint URL.

For more info check out the guide: Using the AWS SDK for Python (boto3) with Linode Object Storage

Even if you opted to keep using AWS S3 for object storage, your OpenFaaS functions will not be able to automatically map an IAM role to access S3. You will need to create an AWS user with the same role the Lambda function was using so that you can get the appropriate access keys for the S3 client.
Instead of initializing the boto3 client with the defaults we will create a separate function, init_s3, to configure the client with the required parameters. This function can be used in the function handler to initialize the S3 client the first time the function runs. After initialization the client is assigned to a global variable so that it can be reused on subsequent calls.

Update handler.py of the video-preview function:
import os
import logging
import tempfile
import urllib
import ffmpeg
import boto3
+ from botocore.config import Config

from .preview import generate_video_preview, calculate_sample_seconds

- s3_client = boto3.client('s3')
+ s3_client = None

s3_output_prefix = os.getenv("s3_output_prefix", "output")
debug = os.getenv("debug", "false").lower() == "true"
samples = os.getenv("samples", 4)
sample_duration = os.getenv("sample_duration", 2)
scale = os.getenv("scale")
format = os.getenv("format", "mp4")

def handle(event, context):
+     global s3_client

+     # Initialise an S3 client upon first invocation
+     if s3_client == None:
+         s3_client = init_s3()
Let's take a look at the init_s3 function:
def init_s3():
    with open('/var/openfaas/secrets/video-preview-s3-key', 'r') as s:
        s3Key = s.read()
    with open('/var/openfaas/secrets/video-preview-s3-secret', 'r') as s:
        s3Secret = s.read()

    s3_endpoint_url = os.getenv("s3_endpoint_url")

    session = boto3.Session(
        aws_access_key_id=s3Key,
        aws_secret_access_key=s3Secret,
    )

    return session.client('s3', config=Config(signature_version='s3v4'), endpoint_url=s3_endpoint_url)
The S3 credentials are provided to the function as secrets. Confidential configuration like API tokens, connection strings and passwords should never be made available in the function through environment variables. Secrets can be read from the following location in the function container: /var/openfaas/secrets/<secret-name>.

The init_s3 function reads the S3 key and secret from the file system. The S3 endpoint URL is read from an environment variable. Next, these parameters are used to initialize the client.
The function configuration in the stack.yml file needs to be updated. To tell OpenFaaS which secrets to mount for a function, add the secret names to the secrets section. Also include the s3_endpoint_url for your Linode region in the environment section.
functions:
  video-preview:
    lang: python3-http
    handler: ./video-preview
    image: welteki/video-preview:0.0.2
+   environment:
+     s3_endpoint_url: https://fr-par-1.linodeobjects.com
+   secrets:
+     - video-preview-s3-key
+     - video-preview-s3-secret
Make sure the secrets are created in the OpenFaaS cluster before deploying the functions. Secrets can be created in several ways, either through the REST API or using the faas-cli. In this example we will use the faas-cli to create the secrets.
faas-cli secret create video-preview-s3-key \
--from-file .secrets/video-preview-s3-key
faas-cli secret create video-preview-s3-secret \
--from-file .secrets/video-preview-s3-secret
You can check out the documentation for more info on how to use secrets within your functions.
Add code dependencies
With AWS Lambda, extra binaries, packages and modules that the function code depends on need to be included in the deployment package. For Lambda this deployment package can either be a .zip file archive or a container image.
OpenFaaS functions are always built into a container image. Our official templates support including dependencies in the function image without having to create your own template and Dockerfile.
For Python functions, modules and packages can be added by including them in requirements.txt. Additional packages can be installed in the function image through build arguments.
The function handler folder includes a requirements.txt file that was created while scaffolding the video-preview function from the python3-http template. All Python packages the function code depends on need to be added here.

The video-preview function uses the official AWS SDK for Python, boto3, to upload files to any S3-compatible object storage. The ffmpeg-python package provides bindings to FFmpeg and is used to process the input video. Make sure both are included in the requirements.txt file:
boto3
ffmpeg-python
You have to make sure all additional binaries the code depends on are installed in the function image. In this case our code depends on FFmpeg. Additional packages can be installed in the function image through build arguments.

With the official python-http template the build argument ADDITIONAL_PACKAGE can be used to specify additional apk or apt packages that need to be installed.
Update the function's stack.yml configuration to include FFmpeg as an additional package:
functions:
  video-preview:
    lang: python3-http
+   build_args:
+     ADDITIONAL_PACKAGE: "ffmpeg"
See the docs for more details on adding native dependencies to OpenFaaS Python functions.
Refactor the function handler
The handle function will need to be updated to handle the different format and type of the event parameter.

Our Lambda function used an S3 trigger that invoked the function each time a new video was uploaded to the AWS S3 bucket. At the moment of writing Linode Object Storage does not have support for bucket notifications, so we will update our function handler to accept a JSON payload with a download link instead.
If you want to copy the AWS workflow and trigger the function on bucket notifications you could add Ceph storage to your cluster with Rook. It has support for setting up S3 compatible Object storage and sending bucket notifications over HTTP. Minio is another option that also supports sending bucket notifications over HTTP.
Configuring any of these is outside the scope of this post.
import os
+ import json
import logging
import tempfile
import ffmpeg
import boto3
from botocore.config import Config

from .preview import generate_video_preview, calculate_sample_seconds

s3_client = None

samples = os.getenv("samples", 4)
sample_duration = os.getenv("sample_duration", 2)
scale = os.getenv("scale")
format = os.getenv("format", "mp4")
s3_output_prefix = os.getenv("s3_output_prefix", "output")
+ s3_bucket_name = os.getenv('s3_bucket')
debug = os.getenv("debug", "false").lower() == "true"

def handle(event, context):
    global s3_client, s3_endpoint

    # Initialise an S3 client upon first invocation
    if s3_client == None:
        s3_client = init_s3()

+     data = json.loads(event.body)
+     input_url = data["url"]

-     file_name, _ = os.path.basename(key).split(".")
+     file_name, _ = os.path.basename(input_url).split(".")
    output_key = os.path.join(s3_output_prefix, file_name + "." + format)

    out_file = tempfile.NamedTemporaryFile(delete=True)

-     try:
-         input_url = s3_client.generate_presigned_url('get_object', Params={'Bucket': s3_bucket_name, 'Key': key}, ExpiresIn=60 * 60)
-     except Exception as e:
-         logging.error("failed to get presigned video url")
-         raise e

    try:
        probe = ffmpeg.probe(input_url)
        video_duration = float(probe["format"]["duration"])
    except ffmpeg.Error as e:
        logging.error("failed to get video info")
        logging.error(e.stderr)
        raise e

    # Calculate sample_seconds based on the video duration, sample_duration and number of samples
    sample_seconds = calculate_sample_seconds(video_duration, samples, sample_duration)

    # Generate video preview
    try:
        generate_video_preview(input_url, out_file.name, sample_duration, sample_seconds, scale, format, quiet=not debug)
    except Exception as e:
        logging.error("failed to generate video preview")
        raise e

    # Upload video file to S3 bucket.
    try:
        s3_client.upload_file(out_file.name, s3_bucket_name, output_key, ExtraArgs={'ACL': 'public-read'})
    except Exception as e:
        logging.error("failed to upload video preview")
        raise e
Source: video-preview/handler.py
Changes made to the handler function:
- The bucket name, assigned to s3_bucket_name, is now read from an environment variable. Make sure to add s3_bucket to the environment section in the stack.yml file.

These are the minimal changes required to run our code as an OpenFaaS function.
Deploy the OpenFaaS function
Before you go ahead and deploy the function to the OpenFaaS cluster make sure to check the stack.yml file. After adding all the configuration options from the previous steps it should look something like this:
version: 1.0
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080
functions:
  video-preview:
    lang: python3-http
    build_args:
      ADDITIONAL_PACKAGE: "ffmpeg"
    handler: ./video-preview
    image: welteki/video-preview:0.0.2
    environment:
      s3_bucket: video-preview
      s3_endpoint_url: https://fr-par-1.linodeobjects.com
      write_timeout: 10m2s
      read_timeout: 10m2s
      exec_timeout: 10m
    secrets:
      - video-preview-s3-key
      - video-preview-s3-secret
Source: stack.yml
Note that we included three additional environment variables to configure the function’s timeouts. Transforming and transcoding videos can take some time depending on the size of the source video. If you have long running functions make sure the timeouts are configured properly so your functions can finish their work.
For quick iterations and testing during development, OpenFaaS functions can be run locally with Docker using the faas-cli local-run command. We show how to use this feature in our blog post: The faster way to iterate on your OpenFaaS functions.
To deploy the function run:
# URL to the OpenFaaS gateway
export OPENFAAS_URL="https://openfaas.example.com"
faas-cli up
This will build the function, push the resulting image and deploy the function to your OpenFaaS cluster.
Invoke the function with curl to test it:
curl -i https://openfaas.example.com/function/video-preview \
-H 'Content-Type: application/json' \
-d '{"url": "https://video-preview.fr-par-1.linodeobjects.com/input/openfaas-homepage-vid.webm"}'
If the invocation was successful you should be able to find the processed video in the output folder in your S3 bucket. You can log in to the Linode Cloud Manager to check the content of the bucket.
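If you'd rather check the bucket from code than from the Cloud Manager, here's a minimal boto3 sketch that lists objects under the output prefix. The endpoint URL, bucket name and secret file paths reuse the values from this tutorial:

```python
import boto3
from botocore.config import Config

# Reuses the credentials saved earlier in the .secrets folder.
with open(".secrets/video-preview-s3-key") as f:
    key = f.read().strip()
with open(".secrets/video-preview-s3-secret") as f:
    secret = f.read().strip()

s3 = boto3.client(
    "s3",
    aws_access_key_id=key,
    aws_secret_access_key=secret,
    endpoint_url="https://fr-par-1.linodeobjects.com",
    config=Config(signature_version="s3v4"),
)

# List the generated previews under the "output/" prefix of the bucket.
for obj in s3.list_objects_v2(Bucket="video-preview", Prefix="output/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```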
We refactored our video-preview function to accept a url in a JSON payload. You can improve the function by accepting a trigger from S3.
We migrated a long running video transformation function that can be resource intensive. To prevent overloading the function you cloud set limits and configure autoscaling. Checkout these blog post to learn how this can be done:
We saw how to deploy OpenFaaS in a managed Kubernetes cluster with LKE. Alternatively you can create a cluster yourself. Check out our tutorial: How to set up production-ready K3s with OpenFaaS with Akamai Cloud Computing
We migrated our ETL pipeline from being deployable only on AWS Lambda infrastructure to being completely portable by using OpenFaaS.
Additionally developers are able to test their functions locally by either using the faas-cli local-run
command or deploying an OpenFaaS cluster locally.
If you are using Node.js or JavaScript with AWS Lambda, then there’s a similar guide you can follow which also touches on how to use IAM Service Accounts in AWS EKS: Migrate Your AWS Lambda Functions to Kubernetes with OpenFaaS
You may also like:
]]>In this post I’ll give an overview of what we learned spending a week investigating a customer issue with scaling beyond 3500 functions. Whilst navigating this issue, we also implemented several optimisations and built new tooling for testing Kubernetes at scale.
If you’ve ever written software, which other people install and support, then you’ll know how difficult and time-consuming it can be to debug and diagnose problems remotely. In this case it was no different, with our team spending over a week of R&D trying to reproduce the problem, pin-point the cause, and remediate it.
We had a support request from a customer that was running more functions than the typical team, so we decided to take a look at the problem and see what we could do to help them out.
In this post, you’ll see my thought process, and input from the OpenFaaS and Kubernetes community. Thanks to everyone who made suggestions or talked the problem over with me.
How many functions is a normal amount?
You may be tempted to think of “serverless” through the lens of a cloud hosting company like AWS, Google Cloud or Azure, and their large scale multi-tenant products. With that perspective, it’s tempting to think that every other functions platform should work in exactly the same way, and at the same scale, going up to millions of functions per installation.
We should perhaps pause and understand the target user for OpenFaaS, which is largely self-selecting and made up of individual product teams.
We are aware of a handful of users running thousands of functions in production, usually as part of a system to provide customers with a sandbox for custom code. More recently, we’ve started to work with ISPs around the world who want to bring a functions experience to their platform, but even there, uptake takes time, and may never quite hit the numbers of a large cloud company.
Most users adopt OpenFaaS for portability, to be able to install on different clouds or directly onto customer hardware, for cost optimisation vs hosted functions, or in the case of Surge, because of all the effort that’s been spent on the developer experience. You can build, test and debug functions on your laptop with the same experience as you’d get in production.
“OpenFaaS has been insanely productive for our business. The ability to run the whole stack locally is really important for our developers, and the async system based upon NATS JetStream means we can fire off our jobs and forget about them. We’re now turning to Kubernetes deployments and converting them into functions.”
So it’s not that we discourage large scale, or function hosting, we are seeing growing customer interest there, it’s just that OpenFaaS is popular with individual project teams who have a few dozen functions.
What does it cost to test at scale?
In addition, with OpenFaaS running on top of Kubernetes, we have to provision a whole node for every 100-110 containers, including the control-plane, service mesh and networking components. So for 3000 functions, you need at least 30 nodes.
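A quick back-of-the-envelope check of that claim, assuming a practical limit of around 100 Pods per node once system Pods are taken into account:

import math

functions = 3000
pods_per_node = 100  # Kubernetes defaults to a limit of 110 Pods per node; leave headroom for system Pods

print(math.ceil(functions / pods_per_node))  # 30 nodes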
We all know that clusters are slow to create on a platform like AWS Elastic Kubernetes Service (EKS), and then adding nodes can take a good 3-5 minutes each. I did a calculation with the AWS cost estimator and to get to 3500 functions, you’d probably be looking at a spend of 1500 USD / mo in infrastructure costs alone.
How did we find the problem?
The problem was finally found after spending over a week building optimizations, and it was frustratingly obvious.
I started off by looking at hardware that I already owned. My workstation runs Linux and has an AMD Ryzen 9 5950x with 16C/32T and 128GB of RAM. Then, behind me sits the Ampere Developer Platform with 64C and 64GB of RAM. I paid an additional 500 USD to upgrade the Ampere machine to 128GB of RAM in order to recreate the customer issue.
The container limit of 110 per Kubernetes node means that even if you have a bare-metal machine like this, it’s largely wasted, unless you are running a few very large Pods.
Could existing testing solutions help?
A friend mentioned the community project Kubernetes WithOut Kubelet (KWOK) as a potential option.
I did some initial testing here and showed my work, but for numerous reasons it did not work for this use-case.
You can read the thread here, if interested: Testing an Operator with KWOK.
Could we slice up the bare-metal?
So what to do? My first instinct was to use multipass, a tool that we’ve been using for faasd development and for testing K3s, to create 30 VMs on each machine, combining them to get a 60 node cluster, which would allow for going up to at least 6000 functions, 2x over where the customer’s cluster was stalling.
the Ampere Dev Platform by ADLINK was used for initial testing
Multipass created 10 nodes in about 15 minutes, then when I ran the command to list VMs, it took about 60s to return the results. I knew that this was not a direction I wanted to go in.
Having written actuated over a year ago, to launch Firecracker VMs for GitLab CI and GitHub Actions jobs, I knew that I could get a workable platform together in 2 days to convert the bare-metal machines into dozens of VMs. So that’s what I did.
From the outside, it looked like the code had locked up or got stuck. After reproducing the issue, I decided to add a Prometheus histogram metric to see how often the reconciliation code was being called, and to see how long it took for each call.
The duration of each call was effectively 0ms, so it wasn’t hanging. Then I noticed that the count of invocations was increasing at 10 Queries Per Second (QPS).
It turned out that the samples provided by the Kubernetes community use an internal rate-limiter with a value of 10 QPS. It sounds so obvious when you find it, but it took a week to get there.
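The real limiter lives in client-go and is written in Go, but the effect is easy to illustrate with a naive sketch: at 10 QPS it doesn’t matter how many functions are queued, only 10 reconciliations can start per second, so thousands of objects take minutes to drain and the controller looks like it has locked up.

import time

class SimpleRateLimiter:
    # Naive fixed-rate limiter for illustration only, not client-go's implementation
    def __init__(self, qps):
        self.interval = 1.0 / qps
        self.next_free = time.monotonic()

    def wait(self):
        now = time.monotonic()
        if now < self.next_free:
            time.sleep(self.next_free - now)
        self.next_free = max(now, self.next_free) + self.interval

limiter = SimpleRateLimiter(qps=10)

# Draining a work-queue of 3500 items at 10 QPS takes ~350s at best:
# for item in queue: limiter.wait(); reconcile(item)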
I left the tests running overnight and saw that for 6500 functions, which had not changed, the reconciliation function had been called 1.1M times. This again was due to a faulty piece of code inherited from the original community sample called “sample-controller”. I spoke to Dims, who works on Kubernetes full-time, and he sent a PR to resolve the issue so it won’t affect others who are following the sample to build their own controllers.
After my initial testing of 4000 functions on a cluster built with my slicer tool across my workstation and the Ampere Dev Platform, I wanted to go large, and show that we’d fixed the issue. I set up 3x servers on Equinix Metal with an AMD EPYC with 32C/64T and 256GB of RAM.
My configuration was for 5x servers using 16GB of RAM and 8vCPU each, and then the rest of that machine, and the other three were split up into ~ 30 nodes of 2x vCPU and 8GB of RAM.
Put some K3sup (ketchup) on it
As the author and maintainer of K3sup, I knew that K3s would be a really quick way to build out a High Availability (HA) Kubernetes cluster with multiple servers and agents. K3sup is a CLI with install and join commands which are expected to be used against a single host at a time. There needed to be a way to make this more practical for 60-120 VMs.
That’s where k3sup plan
came into being. My slicer tool can emit a JSON file with the hostname and IP address of its VMs. I took that file from all four servers, combined it into a single file, then ran the new command. The command generates a bash script against each of the VMs, allocating a set number of them to act as servers via a --servers
flag. The output can then be run using the existing k3sup commands.
A new k3sup plan command for creating huge clusters
This command may make it into the k3sup repository, but I’m still iterating on it. Let me know if it’s something you’d be interested in.
Load balancing the server VMs
The 5x server VMs will load balance the API server, but are only accessible within a private network on the Equinix Metal servers, so I used a TCP load-balancer (mixctl) to expose the private IPs via the server’s public IP:
rules.yaml:
version: 0.1
- name: k3s-api
from: 147.28.187.251:6443
to:
- 192.168.1.19:6443
- 192.168.1.20:6443
- 192.168.1.21:6443
The public IP of the server was then used in the k3sup plan
command via the --tls-san
flag.
There was one other change that I made to k3sup: whenever you join a new node into the cluster, the command first makes an SSH connection to the server to download the “node join token”, then keeps it in memory and uses it to run an SSH command on the new node.
That overwhelmed the server when I ran all 120 k3sup join
commands at once, so now k3sup node-token
will get the token, either into a file or into memory, and can then be passed in via k3sup join --node-token
.
Leader election
I was in two minds about implementing lease-based Leader Election, because it’s a divisive topic. Some people haven’t had any issues, but others have had significant downtime and have experienced extra load on the API server due to using it.
Lease-based leader election
When three replicas of the Gateway Pod start up, each starts a REST API which can serve invocations, and the REST API for configuring Functions. However, only one of the three replicas should be performing reconciliation of Functions into Deployments and Services, so whichever takes the lease will act as a leader, and the others will just stand by. If the leader gives up the lease due to a graceful shutdown, another will take over. If the leader crashes or a spot instance is terminated, then the lease will expire after 60s and another replica will take over.
Leader Election is optional and disabled by default, but if you are running more than one replica of the gateway, it’s recommended, and prevents noise from conflicting writes or updates, which must in turn be evaluated by the Kubernetes API server and OpenFaaS.
operator:
  # For when you are running more than one replica of the gateway
  leaderElection:
    enabled: true
See also: client-go leader-election sample
QPS and Burst values available in the chart
We’ve made the QPS and Burst values for accessing the Kubernetes API, and for the internal work-queue configurable by Helm, so people with very large clusters or very small ones can tune these values accordingly. We’ve also upped the defaults to sensible numbers.
operator:
  # For accessing the Kubernetes API
  kubeClientQPS: 100
  kubeClientBurst: 250

  # For tuning the work-queue for Function events
  reconcileQPS: 100
  reconcileBurst: 250
Endpoints are replaced with EndpointSlices
EndpointSlices were introduced into Kubernetes to reduce the load generated by service meshes and IngressControllers. Instead of querying a single item, a set of items can be returned for endpoints for a given service.
We’ve switched over. You’ll see a benefit if you run lots of replicas of a function, but it won’t have much effect when there is a large number of functions with only one replica.
Here’s how you can compare how the two structures look:
# Deploy a function
$ faas-cli store deploy nodeinfo
# Scale it to 5/5 replicas
$ kubectl scale deploy/nodeinfo -n openfaas-fn --replicas=5
# View the endpoints
$ kubectl get endpoints/nodeinfo -n openfaas-fn -o yaml
# View the slices, you should see one:
$ kubectl get endpointslice -n openfaas-fn | grep nodeinfo
nodeinfo-9ngtv IPv4 8080 10.244.0.165 8s
# Then view the structure of the slice to understand the differences
$ kubectl get endpointslice/nodeinfo-9ngtv -n openfaas-fn -o yaml
In this case, there were 5x endpoints that would have to be fetched from the API, but only one EndpointSlice, making it more efficient to keep in sync as functions scale up and down.
API reads are now cached
There were 2-3 other places where direct API calls were being made during the reconciliation loop or in the HTTP API. For read operations, using a cached informer is more efficient. So we’ve done that and it means already reconciled functions pass through the sync handler in 0ms.
You’ll see a new log message upon start-up such as:
Waiting for caches to sync for faas-netes:EndpointSlices
Waiting for caches to sync for faas-netes:Service
Once the initial cache is filled, a Kubernetes watch picks up any changes and keeps the cache up to date.
Many log messages have been removed
We took the verbosity down a notch or two.
With previous versions, if a Function CR had been created, but not yet reconciled, then the logs for the operator would have printed a message saying the Deployment was not available, whenever a component tried to list functions. That was noise that we just didn’t need so it was taken away.
The same is the case for when a function is deployed via REST API, we used to print out a message saying “Deploying function X”. Well, that’s very noisy when you are trying to create 15000 functions in a short period of time.
Lastly, whenever a function was invoked, we printed out the duration of the execution. We removed this too, because printing a log statement for each invocation only adds noise for log aggregators like Loki or Elasticsearch. Imagine how many useless log lines you would have seen from a load test over 5 minutes with 100 concurrent callers?
After having got to 6500 functions without any issues on my own hardware at home, I decided to go large for the weekly Community Call where we deployed 15k functions across 3 different namespaces, with 5000 in each.
The video recording includes a short 4 minute introduction to explain what viewers are going to see, who may not have already read this blog post.
See also: Multiple namespaces
Not only have we fixed the customer issue where the operator seemed to “lock-up” at 3500 functions, but with the knowledge gained by writing actuated, we were able to test 15000 functions in a cost efficient manner using bare-metal hosts on Equinix Metal.
The updated operator has already been released for OpenFaaS Standard and OpenFaaS for Enterprise customers. You don’t have to be running at massive scale to update and get these enhancements.
Kevin, mentioned earlier, runs OpenFaaS with far fewer functions, but with a heavier load.
When he saw the video on the community call he remarked:
“The amount of work that has gone into OpenFaaS over the years to support customers is incredible. Good job, really well done.”
Just upgrade your Helm chart to get the latest changes, and if you’d like to use leader election, see the notes earlier in this post or in the values.yaml file under the operator
section.
You may also like:
Do you also need to test at scale - efficiently?
How are you testing your Kubernetes software at massive scale? Do you just run up a 2-3k USD / mo bill and hope that your boss won’t mind? Maybe you are the boss, wouldn’t it be nice to have a long term large test environment always on hand?
If you think you’d benefit from the “slicer” tool I built as part of this support case, please feel free to reach out to me directly.
Here’s a separate video explaining how the slicer works with k3sup plan.
Example slicer config for 3x servers and 10x workers on a machine with 128GB of RAM and 64 threads.
config:
  # Total RAM = 128GB
  # Total threads/vCPU = 32
  host_groups:
  - name: servers
    count: 3
    vcpu: 4
    ram_gb: 8
    # RAM = 24, vCPU = 12
  - name: workers
    count: 10
    vcpu: 2
    ram_gb: 8
    # RAM = 80, vCPU = 20
Disclosure: Ampere Computing provided me with the Ampere Developer Platform at no cost, for evaluation and for open source enablement. Ampere Computing is a customer of our secure CI/CD platform actuated.dev.
]]>In this walk-through, we’ll set up a development account on Confluent Cloud for free access to an Apache Kafka cluster suitable for testing and development. We’ll then set up the Kafka Connector which is bundled with OpenFaaS Standard to trigger functions on new messages.
Most of the time we see people publishing JSON, however binary and text data are also supported. So your function will receive a payload in the HTTP body, along with other metadata like the topic name, additional headers, partition and offset.
How it works
The conceptual overview shows subscriptions being managed by the Kafka Connector, rather than replicas of functions.
Instead of managing dozens or hundreds of individual subscriptions between the various replicas of each function, this is managed in the long-lived Kafka connector. This pattern is common across the various connectors including Postgres, AWS SNS/SQS, Cron, etc. In addition to helping manage the number of subscribers per partition, having the subscription managed in a connector means that all functions can be scaled to zero safely.
An OpenFaaS event connector can subscribe to one or many topics, and then invoke functions based upon the messages it receives. The connector is stateless, and can be scaled up or down to match the number of partitions in the topic. If it happens to crash, it’ll pick up again from the last offset that was committed to the partition.
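The connector itself is not written in Python, but the consume-then-commit pattern it relies on is easy to illustrate with the confluent-kafka client. The broker address, group id and invoke_function call below are placeholders for illustration:

from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': 'broker:9092',   # placeholder address
    'group.id': 'openfaas-demo',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
}

consumer = Consumer(conf)
consumer.subscribe(['faas-request'])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue

    invoke_function(msg.value())   # i.e. an HTTP POST to the function (placeholder)

    # Commit the offset only after the message has been handled, so a crash
    # means re-processing from the last committed offset rather than losing messages
    consumer.commit(message=msg)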
Once a connector is deployed and subscribed to one or more topics, then all you need to do is to update a function with an extra annotation. So if you added the topic: payment.created
annotation to the new-payment
function, from there on it would be invoked with the payload of every message that was published to the payment.created
topic.
There are several common ways to configure Apache Kafka, then a few more esoteric options used by some enterprise companies.
In order of complexity:

1. No encryption and no authentication, typically only used within a private network
2. TLS encryption, without authentication
3. TLS encryption with a client certificate for authentication
4. TLS encryption with SASL (a username and password) for authentication
In general, hosted providers will always enable TLS, then use either 3) a client certificate or 4) SASL for authentication.
Confluent Cloud uses TLS and SASL. SASL is a username and password. This is the option we’ll be using here, and means creating two secrets in Kubernetes, one for the username, and one for the password.
Aiven uses TLS plus a client certificate using their own self-signed CA, which means creating three secrets, one for the CA, one for the client certificate and one for the private key.
The argument that we tend to hear for 1 or 2 is that a team may be running their stack within a private network or VPC. This model only provides the illusion of security, and is not recommended. It can mean that an eavesdropper or malware running within the private environment could potentially gain access to the Kafka cluster.
Head over to Confluent Cloud and sign up as a new customer.
Click on Environments and Default, if Default is not displayed, create it.
Click Clusters or “Add Cluster”
At time of writing, the Basic tier of cluster is free, and more than suitable for testing out the Kafka Connector to see how it works.
If you are concerned about being charged by Confluent for your testing, then pay close attention to any limits or quotas that you may exceed.
Pick a cloud from AWS, Google Cloud or Azure. If you already use one of these vendors, use that one so that you can keep your data in the same region.
Click Cluster Overview, then API Keys and Create Key
For testing, Confluent recommend using the Global access key.
Save the Key as kafka-broker-username.txt
Save the Secret as kafka-broker-password.txt
Add a description such as “openfaas” or “kafka-connector”
Next, click Cluster Settings, and under Endpoints copy the Bootstrap server value.
This will look like pkc-l6wr6.europe-west2.gcp.confluent.cloud:9092
.
Note this down for later use.
Create a topic for testing and name it faas-request
:
For testing, you could set the partition count to a smaller value like 3.
Next we’ll configure the Helm chart, and use the topic name in the configuration.
Add the OpenFaaS Helm chart repository to Helm, then update your repositories:
helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update
The Helm Chart for the Kafka Connector can be installed in two ways:
All settings are configured through a values.yaml file, and the end of the README file has a reference explaining all the various options.
Now create a values.yaml file, and add each of the below sections:
For the topic or topics, provide either a single topic, or a comma-separated list of topics.
topics: faas-request
You should also set the content type that you expect messages to be encoded in:
contentType: text/plain
The most common option is application/json
, but you can also use text/plain
or application/octet-stream
for binary data.
Your function’s handler will receive the message as the body of the HTTP request, and a number of additional headers defined in the docs.
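As a minimal sketch, a Python handler for JSON messages might look like the following. The X-Topic header name is shown as an example; check the Kafka Connector docs for the exact header names and values:

import json

def handle(event, context):
    # Header names are documented in the Kafka Connector docs
    topic = event.headers.get("X-Topic", "unknown")

    # With contentType: application/json the body is the raw message as JSON text
    payload = json.loads(event.body)

    print("Received message on topic {}: {}".format(topic, payload))

    return {
        "statusCode": 200,
        "body": "Accepted"
    }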
Next add the address of the Kafka Cluster or Broker:
brokerHosts: pkc-l6wr6.europe-west2.gcp.confluent.cloud:9092
If you have more than one bootstrap server in the cluster, you can specify each in a comma-separated list.
Make sure you change this value; do not use the value above, which is from our own test cluster.
Since we know that both TLS and SASL are enabled on Confluent Cloud, we should now add:
tls: true
saslAuth: true
Finally, add a secret for the username and password that we saved earlier:
kubectl create secret generic \
  kafka-broker-username \
  -n openfaas \
  --from-file broker-username=$HOME/kafka-broker-username.txt

kubectl create secret generic \
  kafka-broker-password \
  -n openfaas \
  --from-file broker-password=$HOME/kafka-broker-password.txt
Depending on how many times you want to install the Kafka connector, you may wish to change the name of the installation in Helm (kafka-connector
), or if you are using a single topic, you can leave it as-is.
helm upgrade kafka-connector openfaas/kafka-connector \
--install \
--namespace openfaas \
-f values.yaml
Check the logs of the connector:
kubectl logs -n openfaas deploy/kafka-connector
If it is not loading, then you may have missed a secret, check this by running kubectl describe -n openfaas deploy/kafka-connector
followed by kubectl get events -n openfaas --sort-by=.metadata.creationTimestamp
.
There are several other ways to troubleshoot the connector, by turning on verbose and debug logging.
Update values.yaml and then run the helm upgrade
command again:
logs:
  # Log debug messages
  debug: true

  # Print the data read from the Kafka topic before invoking functions
  printRequestBody: true

  # Print the data received from invoked functions
  printResponseBody: true
It’s recommended to turn off those settings for production, when you’ve resolved any issue that you’re facing.
The Kafka topic we created was called faas-request, we can have a function subscribe to this topic by adding a topic
annotation.
The printer function from the OpenFaaS Store will show the message that it received in its logs along with any additional headers.
Create a stack.yml file:
provider:
  name: openfaas

functions:
  printer:
    skip_build: true
    image: ghcr.io/openfaas/printer:latest
    annotations:
      topic: faas-request
Now run faas-cli deploy
to deploy the function.
Alternatively, you can deploy without a YAML file: faas-cli store deploy printer --annotation topic=faas-request
Now over on the Confluent Dashboard, navigate to the Topics, then faas-request, Messages then Produce new message to this topic.
Produce a message on the topic
Navigate to the OpenFaaS Standard Dashboard, and click on the printer function, then Logs.
Viewing the logs of the invocation
In addition to the body, you’ll also note a number of headers; these are explained in more detail in the Kafka Connector docs.
If you have existing systems that publish messages to Apache Kafka, you’ll be able to configure the connector to start sending those messages to functions.
However, if you do not currently have any message producers, then you can publish messages from a function by using a Kafka client library, such as confluent-kafka for Python or Sarama for Golang, etc.
Bear in mind that most Python libraries for Kafka will use librdkafka, which is a native C/C++ library, and will likely build from source when building your function. For that reason, you should use an OpenFaaS template based upon Debian Linux, which includes a C++ toolchain.
Code samples for producing messages in different languages
By clicking “Add Client” in the Confluent Dashboard, you can see the code samples for producing messages in different languages, and also discover different SDKs.
To produce messages on the faas-request
topic, run the following:
# Replace the below value with your own registry and username
export OPENFAAS_PREFIX=ttl.sh/openfaas-tutorial
faas-cli template store pull python3-http-debian
faas-cli new --lang python3-http-debian producer
echo "confluent-kafka" > producer/requirements.txt
Create two secrets for the function:
faas-cli secret create kafka-broker-username \
--from-file ~/kafka-broker-username.txt
faas-cli secret create kafka-broker-password \
--from-file ~/kafka-broker-password.txt
Update producer.yml
:
functions:
  producer:
    lang: python3-http-debian
    handler: ./producer
    image: ttl.sh/openfaas-tutorial/producer:latest

    ### Add/customise the below
    environment:
      kafka_broker: "pkc-l6wr6.europe-west2.gcp.confluent.cloud:9092"
    secrets:
    - kafka-broker-username
    - kafka-broker-password
Then write a handler:
from confluent_kafka import Producer
import socket, os

def handle(event, context):
    username = get_secret('kafka-broker-username')
    password = get_secret('kafka-broker-password')

    broker = os.getenv("kafka_broker")

    conf = {
        'bootstrap.servers': broker,
        'security.protocol': 'SASL_SSL',
        'sasl.mechanism': 'PLAIN',
        'sasl.username': username,
        'sasl.password': password,
        'client.id': socket.gethostname()
    }

    producer = Producer(conf)

    topic = 'faas-request'
    producer.produce(topic, value=event.body)
    producer.flush()

    return {
        "statusCode": 200,
        "body": "Message produced"
    }

def get_secret(name):
    ret = None
    with open("/var/openfaas/secrets/" + name, "r") as file:
        ret = file.read().strip()

    return ret
Run faas-cli up
to deploy the function. Whatever body you use to make a HTTP POST will be published to the topic.
Example messages published via the function:
Example messages showing up on the topic in the Confluent Dashboard
We’ve now configured a development Kafka cluster on Confluent Cloud, which should be free to keep running for low usage and testing. We then configured the Kafka connector with TLS and SASL, then deployed a function to receive messages from the topic, and viewed its logs in the OpenFaaS Dashboard.
Scaling and retries
“The rule in Kafka is a maximum of 1 consumer per partition (as each partition must only be allocated to 1 consumer), so you can only have as many consumers (in a single consumer group) as there are partitions for a topic, but you can also have less.” Instaclustr by NetApp
We recommend installing the connector once per topic for production use, changing the name of the Helm installation so that you can have multiple instances of the connector running in the same cluster. Then scale the Kafka connector deployment to match the number of partitions in the topic.
So if the faas-request
topic has 3 partitions, then you should have 3 replicas of the Kafka connector running. The replicas can be set in the values.yaml file or by running kubectl scale -n openfaas deploy/kafka-connector --replicas=3
.
If the connector crashes for some reason, or the Pod is scheduled to a different node, then Kubernetes will automatically restart it, and it’ll pick up from the last message it processed.
For retries, set the asyncInvoke
option to true
, so that consumed messages get put into the NATS JetStream queue and retried according to the policy you’ve defined. There are more advanced options covered in the docs and Helm chart, but what we’ve covered today should cover 80% of the use-cases for triggering functions from Kafka.
If you have any further questions, please feel free to get in touch with us.
You may also like:
]]>We’ll start by showing how testing functions on your own machine can help you iterate much more quickly than deploying each change to Kubernetes. That’s where faas-cli local-run
comes in. Then, we’ll show the new --watch
functionality in faas-cli up
, for when it makes more sense to test and iterate within a cluster due to dependencies on other services.
This is what the typical development lifecycle of an OpenFaaS function looks like:
We try to minimise the amount of manual steps by bundling some of these actions into a single command. For example, running faas-cli up
will build, push and deploy functions.
The disadvantage of this workflow is that it can introduce some delay as you will have to wait for the function image to be pushed to a registry, pulled into a node and started up each time you make changes to your code.
Thanks to some recent work from the community a couple of new features were added to the faas-cli to further improve the development experience.
For fast local iteration on functions, a new command faas-cli local-run
was added. The command runs a function as a Docker container directly on your machine that spins up pretty much instantly. You won’t have to wait for the function to be deployed to OpenFaaS and become ready before you can invoke it.
A second new feature is the addition of the --watch
flag. It can be used with both the faas-cli up
as well as the faas-cli local-run
command and tells the CLI to watch the filesystem so it can automatically build and redeploy functions as you edit and save your code.
The OpenFaaS CLI has a command, local-run
, that allows users to test functions without deploying. It builds a function into your local image library and starts a container locally with Docker.
This has the advantage that you won’t have to wait for the image to be pushed to a registry, then pulled into a node and started up.
Create a function or use an existing one and try to run it locally with faas-cli local-run
.
We will create and run a simple Node.js function:
# Create a new function using the node18 template
faas-cli new greeter --lang node18
# Rename the function's yaml definition to stack.yml
mv greeter.yml stack.yml
Update greeter/handler.js
so that the function returns a nice greeting message.
'use strict'

module.exports = async (event, context) => {
  return context
    .status(200)
    .succeed(`Greetings from OpenFaaS!!!`)
}
Run the function locally:
faas-cli local-run
The command will first build the function and next run it locally with docker.
The output should look something like this:
#23 exporting to image
#23 exporting layers done
#23 writing image sha256:c384939cb1d69d510c6f1237e371aac21deb2e4ac3f9bc863852084dfda7b20a done
#23 naming to docker.io/library/echo:latest done
#23 DONE 0.0s
Image: echo:latest built.
[0] < Building echo done in 0.66s.
[0] Worker done.
Total build time: 0.66s
Image: echo:latest
Starting local-run for: echo on: http://0.0.0.0:8080
2023/09/05 15:58:55 Version: 0.9.11 SHA: ae2f5089ae66f81a1475c4664cb8f5edb6c096bf
2023/09/05 15:58:55 Forking: node, arguments: [index.js]
2023/09/05 15:58:55 Started logging: stderr from function.
2023/09/05 15:58:55 Started logging: stdout from function.
2023/09/05 15:58:55 Watchdog mode: http fprocess: "node index.js"
2023/09/05 15:58:55 Timeouts: read: 15s write: 15s hard: 10s health: 15s
2023/09/05 15:58:55 Listening on port: 8080
2023/09/05 15:58:55 Writing lock-file to: /tmp/.lock
2023/09/05 15:58:55 Metrics listening on port: 8081
node18 listening on port: 3000
Once the container is running, curl
can be used to invoke the function:
curl http://127.0.0.1:8080
You should see your greeting message in the response printed to the console.
Function logs for each invocation can also be inspected in the console:
2023/09/05 15:58:55 Metrics listening on port: 8081
node18 listening on port: 3000
2023/09/05 16:00:07 POST / - 200 OK - ContentLength: 96B (0.0353s)
By default the function container publishes port 8080. The --port
flag can be used to change the port in case you are already port-forwarding the OpenFaaS gateway or when port 8080 is not available for another reason.
faas-cli local-run greeter --port 3001
This will run the greeter function and make it available on port 3001
.
curl -i http://127.0.0.1:3001
The local-run command is great for running and testing individual OpenFaaS functions but it can only run a single function at a time.
If your stack.yaml file only contains a single function, local-run will run that function by default. When there are multiple functions you need to add the name of the function you want to run as an extra argument to the command.
Create a second function, echo, and append it to the stack.yml file, then pass the function name to local-run:
faas-cli new echo --lang node18 --append stack.yml
faas-cli local-run greeter
Since functions run as individual Docker containers with local-run, they can not talk to other functions or the OpenFaaS gateway.
If you are building function pipelines where you need to talk to other functions or if you need to call other services in your cluster, local-run might not be the best option.
While there are some workarounds like port-forwarding the gateway first and making the gateway url configurable in your function through an environment variable you might want to use faas-cli up --watch
instead.
Running faas-cli up
will build, push and deploy all functions in the stack.yml
file. The --watch
flag will tell the faas-cli to monitor the function source files for any changes and automatically rebuild and redeploy functions as you edit and save your code.
We take a more detailed look into the watch functionality later in this article.
All functions can consume secrets in the same way, by reading a file from: /var/openfaas/secrets/NAME
To mount a secret in a function the secret name has to be added to the list of secrets in the stack YAML file.
As an example we will add a secret named api-key
to the echo function:
functions:
  echo:
    lang: node18
    handler: ./echo
    image: ttl.sh/openfaas/echo:latest
    secrets:
    - api-key
The local-run command looks for secret files in the .secrets
folder. You will need to create any secrets you want in this location.
All secrets included in a function's stack.yaml will be mounted into the function container so they can be read from their usual location, /var/openfaas/secrets/NAME, and used within the function.
Create the .secrets
folder in your local directory and add a file named api-key.
mkdir .secrets
echo "secret-access-token" > .secrets/api-key
We will use the secret to add authorization to the echo function. Update ./echo/handler.js
to read the secret file from /var/openfaas/secrets/api-key
and validate the Authorization header against the api-key:
'use strict'

const { readFile } = require('fs').promises

module.exports = async (event, context) => {
  // Read the api-key from an OpenFaaS secret.
  let token = (await readFile("/var/openfaas/secrets/api-key", "utf8")).trim()

  const result = {
    'body': JSON.stringify(event.body),
    'content-type': event.headers["content-type"]
  }

  let authHeader = event.headers["authorization"]
  let authToken = ""

  // Get the bearer token from the authorization header.
  if (authHeader && authHeader.startsWith("Bearer ")) {
    let parts = authHeader.split(" ")
    authToken = parts[1]
  } else {
    return context
      .status(400)
      .succeed("Invalid authorization header")
  }

  // Verify the bearer token matches the api-key.
  if (authToken != token) {
    return context
      .status(401)
      .succeed("Invalid api key")
  }

  return context
    .status(200)
    .succeed(result)
}
Run the function with local-run and see that the secret is mounted in the container and can be read for use within the function. A HTTP 401 status code should be returned if you invoke the function with an invalid api-key.
Run the echo function:
faas-cli local-run echo
Invoke the function with an invalid api-key:
$ curl -i -s http://127.0.0.1:8080 \
  -H "Authorization: Bearer invalid" \
  -H "Content-Type: text/plain" \
  -d "Greetings from OpenFaaS"
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Length: 15
Content-Type: text/html; charset=utf-8
Date: Thu, 07 Sep 2023 11:20:50 GMT
Etag: W/"f-DQI+tV0HvT0NM/9khJk6PQXU+K4"
Keep-Alive: timeout=5
X-Duration-Seconds: 0.003185
Invalid api key%
Invoking the function with the correct api-key that we saved in the secret should return a HTTP 200 status code.
$ curl -i -s http://127.0.0.1:8080 \
  -H "Authorization: Bearer secret-access-token" \
  -H "Content-Type: text/plain" \
  -d "Greetings from OpenFaaS"
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 66
Content-Type: text/html; charset=utf-8
Date: Thu, 07 Sep 2023 11:15:30 GMT
Etag: W/"42-ql9Q2lRIiMOJhodGVWvMjGR/Ezw"
Keep-Alive: timeout=5
X-Duration-Seconds: 0.004418
{"body":"\"Greetings from OpenFaaS\"","content-type":"text/plain"}%
Another feature that can be helpful to iterate on functions quickly during development is the built in watch functionality of the CLI.
The --watch
flag can be used with both the local-run
and up
command. Adding the flag will tell the cli to watch the filesystem for any changes to the function source files and automatically re-build and deploy functions on save.
When using --watch
flag with faas-cli up
it is recommended to also set --tag=digest
. This ensures unique image tags are generated for each build. The next section goes into more detail about the --tag
flag.
We find it convenient to use the temporary registry ttl.sh instead of the Docker Hub for quick testing and prototyping. It’s a little slower, however at the same time it doesn’t require you to log in and any images you push get deleted after 24 hours.
To configure a registry when you create a new function, set OPENFAAS_PREFIX
.
To use ttl.sh, without authentication, and temporary images:
OPENFAAS_PREFIX=ttl.sh/my-project
To use the Docker Hub:
OPENFAAS_PREFIX=docker.io/my-user
All OpenFaaS functions are built into container images. By default if no image tag is included for a function in the stack.yml file the :latest
tag is used. When iterating over functions and pushing them to an image registry it is a best practice to organise different image versions using tags instead of always pushing to :latest
.
There are two options to set tags for function images.
The --tag
option can be used with the build
, push
and deploy
sub-commands of the faas-cli. If this flag is provided, image tags for functions will automatically be generated based on available metadata. This can be either Git metadata like the commit sha or branch name or digest of the function handler content.
The generated tag is always suffixed to any tag defined in the stack.yml file or latest
if no tag is defined.
Some examples:
When using the flag --tag=sha
the image tag used in the stack.yml file is suffixed with the short Git SHA. e.g
functions:
  echo:
    image: ttl.sh/openfaas/echo:0.2
For this stack.yml file the resulting image name will be echo:0.2-cf59cfc
If no tag is set in the stack.yml file the suffix is appended to latest.
image: echo => image: echo:latest-cf59cfc
If you are using faas-cli up
with the --watch
flag we recommend also setting --tag=digest
. The digest is calculated from the function source code and will ensure a unique image tag is generated and pushed for every code change.
Find an overview of all the available tag versions in the docs.
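To give an intuition for the digest mode, here’s a rough sketch of deriving a short, content-based tag from the files in a handler folder. It is not faas-cli’s actual algorithm, just an illustration of why the tag changes whenever the source changes:

import hashlib
import pathlib

def handler_digest(path, length=12):
    # Hash every file under the handler folder so any edit produces a new tag.
    # Illustration only: faas-cli's own digest calculation may differ.
    h = hashlib.sha256()
    for p in sorted(pathlib.Path(path).rglob("*")):
        if p.is_file():
            h.update(p.name.encode())
            h.update(p.read_bytes())
    return h.hexdigest()[:length]

print("ttl.sh/openfaas/echo:latest-" + handler_digest("./echo"))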
Alternatively environment variable substitution can be used to set the image.
Here is an example of a stack.yml file:
functions:
  echo:
    image: ttl.sh/openfaas/echo:${FN_VERSION:-latest}
The value of FN_VERSION
can be set through an environment variable when running commands like faas-cli up
, build
or publish
:
FN_VERSION="0.2" faas-cli build
Environment variable substitution and the --tag
flag can also be used together.
We set out to show you how the new local-run
command worked for faster iteration, the --watch
flag for live-reloading as you edit, and the --tag=digest
flag to generate dynamic tags as you edit.
When it comes to iterating on functions, doing a full deployment with faas-cli up is the easiest and most realistic way to test your work, however, faas-cli local-run can speed things up if your function doesn’t have a lot of dependencies.
You may also like:
Feel free to tweet to @openfaas with your comments, questions and suggestions.
]]>Did you know? Linode was acquired by Akamai, and is now being branded as “Akamai Cloud Computing”. The rebranding is still in-progress, so we’ll be referring to Linode throughout this article.
K3s is a production-ready distribution of Kubernetes that was originally developed by Darren Shepherd at Rancher Labs, before donating it to the Cloud Native Computing Foundation (CNCF). It’s become one of the most popular ways to run Kubernetes on-premises, at the edge, and on IoT devices. So why would you run it on Linode when Linode already offers its own Linode Kubernetes Engine (LKE)?
Both K3s and LKE can be used on Linode to run Kubernetes, but they have different use-cases. LKE is a managed service, so Linode is responsible for maintaining the control plane and upgrading it for you. K3s is a lightweight distribution of Kubernetes that is designed to be easy to install and maintain, and is ideal for running on smaller hosts. Using K3s also means that whatever we setup on Linode, can be set up on-premises or even in our homelab too.
OpenFaaS is one of the earliest Functions As a Service (FaaS) frameworks for Kubernetes, is listed on the CNCF Landscape, and has many open source and commercial adopters running in production.
When you write a function, you focus on a HTTP handler, rather than on boiler-plate coding. You tend to get functions triggered by event sources like Cron, HTTPS requests, asynchronous queues and message buses like Apache Kafka or RabbitMQ.
A quick example function in Python, which reads all rows from a Postgresql users table:
import psycopg2

def handle(event, context):
    password = get_secret("db_password")

    try:
        conn = psycopg2.connect("dbname='main' user='postgres' port=5432 host='192.168.1.35' password='{}'".format(password))
    except Exception as e:
        print("DB error {}".format(e))
        return {
            "statusCode": 500,
            "body": str(e)
        }

    cur = conn.cursor()
    cur.execute("""SELECT * from users;""")

    rows = cur.fetchall()
    return {
        "statusCode": 200,
        "body": rows
    }
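The handler calls a get_secret helper which isn’t shown in the snippet. OpenFaaS mounts each secret as a file under /var/openfaas/secrets, so a minimal version (assuming a secret named db_password has been created with faas-cli secret create and listed in the function’s stack.yml) looks like this:

def get_secret(name):
    # Read an OpenFaaS secret mounted at /var/openfaas/secrets/<name>
    with open("/var/openfaas/secrets/" + name, "r") as f:
        return f.read().strip()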
Here’s what people tend to value in OpenFaaS over a hosted functions service:
Finally, we often hear that teams can both get into production with OpenFaaS in a very short period of time (days) and that they often save costs. In one case, a US-based satellite company saved 180k USD over three years after switching away from AWS Lambda.
You can find a list of companies and their use-cases in the ADOPTERS file, however this is only a very small sub-set of users.
Disclosure: at the time of writing, Linode sponsors the OpenFaaS homepage and provides credits for testing the OpenFaaS project. This article was commissioned by Linode/Akamai.
New customers can get free credit with Linode to try out this tutorial.
There are many knobs and dials to configure Kubernetes or K3s for production. We won’t be covering each and every option, because each team’s requirements will vary so much. Instead we’ll focus on creating a High Availability (HA) cluster, secure Ingress with TLS encryption, and then we’ll deploy OpenFaaS to it.
Highly Available K3s cluster, with a Load Balancer
For a HA control-plane, K3s supports using a database or an embedded etcd cluster.
We’ll go through the following steps:
From there it’s up to you to decide which parts you may want to automate with a GitOps or IaC tool such as Flux for the Helm charts, or Terraform for the VMs themselves.
Before we get started, I’d advise using my arkade tool to download all the various CLIs that we’re going to need.
curl -sLS https://get.arkade.dev | sh
Follow the command to move arkade to /usr/local/bin/
using sudo
.
Then:
arkade get \
terraform \
faas-cli \
kubectl \
helm
arkade is a time-saver for both downloading developer tools, but also for installing Helm charts, which we will see in the later steps, when we’ll run commands like arkade install cert-manager
. If you look carefully at the output, you’ll see that it’s a wrapper for the Helm command itself.
See also: Use Terraform to Provision Infrastructure on Linode
On Linode, VMs are called “Linodes”, but we will be referring to them as VMs to avoid ambiguity.
We will need to configure both private and public networking for the VMs, so that K3s itself doesn’t send all of its control-plane traffic over the public internet. I didn’t do this with my initial testing and saw over 250GB of traffic between the three VMs over the course of a week. This is normal for Kubernetes, but it needs to run over a private network which is free and unmetered.
I didn’t realise this initially, but if you use a private IP address for your VMs on Linode, they end up being exposed to every other VM in that region, but hidden from the Internet. So what we actually want is a VLAN, along with a private IP address, that way they’re private within our own account.
Linode VLANs operate at Layer 2 of the OSI model, and you can have up to 10 of them per region. Each VM can belong to a maximum of three separate VLANs.
The Terraform to create the VMs is rather verbose and complicated, however here’s the gist of it:
g6-dedicated-2 plan for 2x dedicated vCPUs and 4GB of RAM

The complete Terraform script is available here: alexellis/k3s-linode
See also:
You can find more detailed documentation on Linode’s interface configuration here: Guides - Create a Private Network with VLANs Using Linode’s API
You’ll also want to create a main.tfvars
file with the token created from within the Linode dashboard:
api_token = "xyz"
It doesn’t seem possible to create a VLAN via Terraform, so you’ll need to create an instance, attach a VLAN, and then delete the instance. The VLAN will remain, and can then be referenced by Terraform. If the Linode team is listening, it’d be nice to have an API or CLI command for this in the future.
“VLANs can be configured when creating new instances or by modifying the network interfaces on the Configuration Profile of an existing instance” (source)
To create the VMs, run:
terraform apply -var-file ./main.tfvars
You’ll get the server IPs printed out as follows - bear in mind that the values may not be ordered alphabetically, so pay extra attention when copying and pasting values.
Outputs:
nodebalancer = "139.144.247.125"
servers = {
  "48521666" = {
    "label" = "k3s-server-3"
    "public_ip" = "139.162.250.98"
    "vlan_ip" = "192.168.3.3"
  }
  "48521667" = {
    "label" = "k3s-server-2"
    "public_ip" = "176.58.106.122"
    "vlan_ip" = "192.168.3.2"
  }
  "48521668" = {
    "label" = "k3s-server-1"
    "public_ip" = "176.58.106.241"
    "vlan_ip" = "192.168.3.1"
  }
}
Now that you have the IP addresses for the VMs, you can build the k3sup commands to perform the installation.
K3sup is an open-source tool I wrote to install K3s over SSH. It makes managing all the configuration much simpler, and within a very short period of time, you can have a HA cluster up and running, with a Load Balancer providing a stable IP address for accessing the cluster via kubectl.
With k3sup, there is no need to log into your VMs, or to run any commands on them. K3sup does everything, including fetching a kubeconfig file and merging it into your existing one, so that you can access the cluster with kubectl.
Example installation of K3s with K3sup
Setup the first server:
export CHANNEL="latest"
export USER=root
export TLS_SAN="139.144.247.125"
export SERVER_IP="176.58.106.241"
export SERVER_VLAN_IP="192.168.3.1"
k3sup install \
--cluster \
--ip $SERVER_IP \
--user $USER \
--k3s-channel $CHANNEL \
--merge \
--local-path $HOME/.kube/config \
--context k3s-openfaas \
--k3s-extra-args "--node-ip $SERVER_VLAN_IP --node-external-ip $SERVER_IP --flannel-iface eth1 --disable=traefik" \
--tls-san $TLS_SAN
We specify additional arguments for K3s including:

- --node-ip to set the node's internal IP to its VLAN address
- --node-external-ip to set the node's public IP
- --flannel-iface eth1 so that pod networking traffic goes over the private VLAN interface
- --disable=traefik because we will be installing ingress-nginx instead
This creates your KUBECONFIG and merges the cluster under a new context name:
kubectx k3s-openfaas
If you get anything wrong, log in with SSH and remove k3s using sudo /usr/local/bin/k3s-uninstall.sh
. You shouldn’t need to reboot, but it may help if things are not working as expected.
Running sudo systemctl cat k3s
is also useful for checking that the server IP and node local IP addresses are set correctly.
Confirm that the INTERNAL-IP and EXTERNAL-IP fields are populated with the VLAN IP and Public IP respectively:
kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,etcd,master 6s v1.27.4+k3s1 192.168.3.1 176.58.106.241 Ubuntu 22.04.2 LTS 5.15.0-73-generic containerd://1.7.1-k3s1
Then install the second server:
export EXTRA_SERVER_IP="176.58.106.122"
export EXTRA_SERVER_VLAN_IP="192.168.3.2"
k3sup join \
--server \
--server-ip $SERVER_IP \
--ip $EXTRA_SERVER_IP \
--user $USER \
--k3s-channel $CHANNEL \
--k3s-extra-args "--node-ip $EXTRA_SERVER_VLAN_IP --node-external-ip $EXTRA_SERVER_IP --flannel-iface eth1 --disable=traefik" \
--tls-san $TLS_SAN
Verify that the server was added as expected with: kubectl get node -o wide --watch
.
Confirm that the IP addresses are correct and that the second server is in a Ready status:
kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,etcd,master 6m57s v1.27.4+k3s1 192.168.3.1 176.58.106.241 Ubuntu 22.04.2 LTS 5.15.0-73-generic containerd://1.7.1-k3s1
k3s-server-2 Ready control-plane,etcd,master 10s v1.27.4+k3s1 192.168.3.2 176.58.106.122 Ubuntu 22.04.2 LTS 5.15.0-73-generic containerd://1.7.1-k3s1
Now, finally add the third server:
export EXTRA_SERVER_IP="139.162.250.98"
export EXTRA_SERVER_VLAN_IP="192.168.3.3"
k3sup join \
--server \
--server-ip $SERVER_IP \
--ip $EXTRA_SERVER_IP \
--user $USER \
--k3s-channel $CHANNEL \
--k3s-extra-args "--node-ip $EXTRA_SERVER_VLAN_IP --node-external-ip $EXTRA_SERVER_IP --flannel-iface eth1 --disable=traefik" \
--tls-san $TLS_SAN
As before, verify that the third server has been added.
With K3s, the costs can be kept quite low because the servers running the control-plane can also run user workloads. However, if you expect very heavy use or I/O intensive applications, then you could also add some agents to the cluster.
This exercise is left for the reader, you could either duplicate the terraform, and replace the word “server” for “agent”, or adapt it so that you input the number of servers and the number of agents separately. Another option is to create an agent via the Linode CLI or UI.
Once your VM is created, use the server IP of any of the three machines under the --server-ip
flag. After it joins the cluster, K3s will tell it about the other server IPs in the case that one of them goes down.
export SERVER_IP="176.58.106.241"
export AGENT_IP="109.74.199.152"
export AGENT_VLAN_IP="192.168.3.4"
export USER=root
export CHANNEL="latest"
k3sup join \
--server-ip $SERVER_IP \
--ip $AGENT_IP \
--user $USER \
--k3s-channel $CHANNEL \
--k3s-extra-args "--node-ip $AGENT_VLAN_IP --node-external-ip $AGENT_IP --flannel-iface eth1"
The agent will show up on the output from kubectl get node
:
kubectl get node
NAME STATUS ROLES AGE VERSION
k3s-agent-1 Ready <none> 18s v1.27.4+k3s1
k3s-server-1 Ready control-plane,etcd,master 8d v1.27.4+k3s1
k3s-server-2 Ready control-plane,etcd,master 8d v1.27.4+k3s1
k3s-server-3 Ready control-plane,etcd,master 8d v1.27.4+k3s1
In this section we’ll install the control-plane components, and OpenFaaS.
Then we’ll deploy a function in the following section.
Conceptual architecture for OpenFaaS control-plane
OpenFaaS will deploy several other components that are not pictured above:
We’ll use ingress-nginx for our Ingress Controller and cert-manager to obtain and renew Let’s Encrypt TLS certificates for our Ingress Controller. This will allow us to access our functions over HTTPS, along with anything else we may want to deploy to the cluster.
arkade install ingress-nginx
Follow this up with:
arkade install cert-manager
This next step is important due to the way that cert-manager performs its self-checks for ACME HTTP01 challenges.
Edit the service for Ingress Nginx, then add the following to the spec:
kubectl edit svc/ingress-nginx-controller
spec:
+ externalIPs:
+ - 139.144.247.125
Replace 139.144.247.125
with the IP address of the NodeBalancer.
cert-manager will be used in the next stage to obtain a TLS certificate for the OpenFaaS Gateway and UI.
Next install OpenFaaS with either the Community Edition (CE) or one of the versions designed for production and commercial use: OpenFaaS Standard or OpenFaaS for Enterprises.
For commercial versions of OpenFaaS, we recommend installing via the OpenFaaS Helm chart, and keeping a copy of your values.yaml file safe for future upgrades.
OpenFaaS CE can also be installed very quickly with the arkade tool. arkade is a wrapper for the Helm chart which reduces all the steps down to a single command:
arkade install openfaas
Now, create a DNS A record for the NodeBalancer’s IP address i.e. openfaas.example.com
.
Next, you can create a TLS certificate for the OpenFaaS Gateway and UI:
export DOMAIN=example.com
arkade install openfaas-ingress \
--email webmaster@$DOMAIN \
--domain openfaas.$DOMAIN
If you want to create Kubernetes YAML files for the Ingress, instead of using arkade, then see these instructions: TLS for OpenFaaS.
You can now run arkade info openfaas
to get the instructions to log in with the CLI and to how to get the password to access the UI.
Instead of using the suggested port-forwarding, you’ll be able to use your TLS-enabled URL to access the UI and CLI.
echo Access the UI at: https://openfaas.$DOMAIN
echo Login in with:
PASSWORD=$(kubectl get secret -n openfaas basic-auth -o jsonpath="{.data.basic-auth-password}" | base64 --decode; echo)
echo $PASSWORD | faas-cli login --password-stdin --gateway https://openfaas.$DOMAIN
Check it worked by deploying the nodeinfo function from the store:
export OPENFAAS_URL=https://openfaas.example.com
faas-cli store deploy nodeinfo
faas-cli describe nodeinfo
echo | faas-cli invoke nodeinfo
You should see the invocation count increase when running the following:
faas-cli list
Function Invocations Replicas
nodeinfo 1 1
The aim of this tutorial is to focus on the infrastructure, however since it’s relatively quick, we’ll also create a custom Python function and deploy it to the cluster.
Every function will be built into a container image and published to a container registry. When it's deployed, a fully qualified image reference is sent to the Kubernetes cluster. Kubernetes will then pull down that image and start a Pod from it for the function.
In production, you’re going to need to use a private registry, or a public registry with authentication enabled.
Follow the steps here to set it up: Configure a private registry
Next, pull down the Python HTTP templates from the store:
faas-cli template store pull python3-http
Create a new function, then rename its YAML file to stack.yml. We do this so we don’t need to specify the name using --yaml
or -f
on every command. A stack.yml file can contain multiple functions, but we’ll only be using one right now.
See also: stack.yaml reference
# Change this line to your own registry:
export OPENFAAS_PREFIX="docker.io/alexellis2"
faas-cli new --lang python3-http \
ping-url
mv ping-url.yml stack.yml
We’ll use the requests library to make a HTTP request to any URL passed in to the function.
Edit ping-url/requirements.txt
and add the following line:
requests
Next, edit ping-url/handler.py
and replace the contents with the following:
import requests
import sys

def handle(event, context):
    url = event.body.decode("utf-8")

    if not url:
        return {
            "statusCode": 400,
            "body": "Please provide a URL to ping"
        }

    body = ""
    statusCode = 200

    try:
        res = requests.get(url)
        body = res.text
        statusCode = res.status_code
    except Exception as e:
        sys.stderr.write("Error reaching remote server {}".format(str(e)))
        sys.stderr.flush()
        return {
            "statusCode": 500,
            "body": "Error: " + str(e)
        }

    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
        },
        "body": {
            "remoteBody": body,
            "remoteStatusCode": statusCode,
        }
    }
Run the following to test on your own machine:
faas-cli local-run
This is a convenient way to test functions without deploying them into the cluster. Any secrets that you add to a function should be written into a .secrets folder, and most other things will work, apart from when you are connecting to services within the remote cluster itself. When using this mode, trim off the “/function/” prefix that is used to invoke OpenFaaS functions.
Or you can deploy it straight to the Kubernetes cluster using faas-cli:
faas-cli up
Then, invoke the function when ready.
Every time I change the function, I like to use a new image tag, to make sure Kubernetes will definitely update the function. You can do this by editing the image field in the YAML file, or by passing the --tag digest flag to faas-cli. If you're making a git commit between each change, you can also use --tag sha to replace the tag with the commit SHA.
Here's an example of the image name generated with --tag digest: docker.io/alexellis2/ping-url:d5f20526c2685e92bad718f54a74f338
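For instance, assuming the function's folder is inside a git repository with at least one commit, the following should build, push and deploy the function tagged with the current commit SHA:

faas-cli up --tag sha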
We can access any website such as Wikipedia:
$ curl -i -s https://openfaas.example.com/function/ping-url -d "https://wikipedia.org"|head -c 500
HTTP/1.1 200 OK
Content-Length: 97541
Content-Type: application/json
Date: Fri, 18 Aug 2023 09:51:41 GMT
Server: waitress
X-Duration-Seconds: 0.226038
{"remoteBody":"<!DOCTYPE html>\n<html lang=\"en\" class=\"no-js\">\n<head>\n<meta charset=\"utf-8\">\n<title>Wikipedia</title>\n<meta name=\"description\" content=\"Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.\">\n<script>\ndocument.documentElement.classN
Or, we can even access the built-in health check of the function itself:
$ curl -i https://openfaas.example.com/function/ping-url -d "http://127.0.0.1:8080/_/health"
HTTP/1.1 200 OK
Content-Length: 43
Content-Type: application/json
Date: Fri, 18 Aug 2023 09:49:25 GMT
Server: waitress
X-Duration-Seconds: 0.002562
{"remoteBody":"OK","remoteStatusCode":200}
This template uses Flask under the hood for efficiency, and you can get more instructions on how to use a database and so forth in its repository: openfaas/python-flask-template
You can find out how many invocations the function has had by running faas-cli list, or faas-cli describe ping-url.
There's also a Grafana dashboard available for the Community Edition, and 4 extra ones available for OpenFaaS Standard and For Enterprises. We find these essential for finding out whether functions have issues with CPU/memory usage, are running for too long, or are producing errors.
Dashboard for OpenFaaS Standard
See also: OpenFaaS Grafana dashboards
I have written two eBooks that cover writing functions for OpenFaaS: one specialises in Node.js / JavaScript and is called Serverless For Everyone Else, and the second uses primarily Golang (Go) and is called "Everyday Golang". You can buy either or both in the OpenFaaS Store, and GitHub sponsors on certain tiers get a 20% discount on them.
In a relatively short period of time, we built a production-grade K3s cluster, with a High Availability control-plane, and an IP address that would balance traffic between each of the three servers. We then installed an Ingress Controller and obtained a TLS certificate for it, before finally installing OpenFaaS and deploying a custom function.
If you want to trigger a function on a timed basis, such as with Cron, you should check out the cron-connector, which is covered in detail in my eBook Serverless For Everyone Else.
As further work for the reader, you could adapt the Terraform script to also create a number of workers, or agents as K3s calls them. Do this either by adding a new section or by making a copy of the file, and replacing the word “server” with “agent”.
Today we only scratched the surface; there are many different event triggers, language templates and ways to run functions - both synchronously, or out of band in a queue with the highly parallelised async mode.
It's also worth noting that if you plan on serving traffic in a bursty fashion, where there may be millions of requests per minute followed by periods of almost no traffic, then Linode's LKE service may be a better fit than K3s, because it can automatically scale the number of VMs which make up the cluster. More nodes means more capacity to serve traffic.
New Linode customers can get free credit to try out this tutorial with K3s or LKE.
Learn more:
Around 2017, Kubernetes adopted a new pattern called the Custom Resource Definition (CRD), a graduation of prior work called Third Party Resources. Before either of these efforts, any vendor wanting to integrate with Kubernetes had to work with the standard APIs and somehow decorate them with metadata to show that they belonged to a certain integration.
With the CRD came the Operator pattern. You create a Custom Resource (an instance of a CRD) and get to have a descriptive name like "Function" or "Tunnel" (in the case of inlets). You can then build a so-called Operator whose sole purpose is to watch for instances of this Custom Resource, and then to create or update native Kubernetes APIs.
When I saw that Kubernetes was dominating the market, and that Docker Swarm was unfortunately on the way out, I wrote the first version of “faas-netes”. faas-netes was the cornerstone of the Kubernetes support, and because everything had been built in such a modular way, it was really the only thing we had to change, that and creating a Helm chart.
The first version of faas-netes had a REST API that would create a Deployment and a Service for every function a user deployed. The list-functions HTTP handler would search for Deployment objects with a certain label, i.e. faas_function, and then filter on that. We call this version of the code "the controller" and it has an imperative API - you tell it what to do, and it has to do it right then and there.
When we built a Function Custom Resource Definition, we had to revisit the code and build an operator that would watch for instances of the Function CRD and then create a Deployment and Service, just like the "controller" mode did.
Conceptual diagram - the Operator mode for faas-netes with the Function CRD.
Long story short, there are a number of benefits of migrating to the Function CRD. One of them is that you can export all of your deployed functions as plain YAML, for backups or for a GitOps-style workflow:
kubectl get function -n openfaas-fn -o yaml > functions.yaml
Then, the final benefit is that you can take advantage of the kubectl CLI to explore functions, in addition to faas-cli, with kubectl get/edit/describe/delete function.
If you’re performing a brand new installation, then just set the following in the Helm chart to enable the CRD and the Operator:
openfaasPro: true
operator: true
That’s it. From there onwards, you can deploy functions via the REST API or using kubectl.
For instance:
$ faas-cli generate --from-store nodeinfo
---
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: nodeinfo
  namespace: openfaas-fn
spec:
  name: nodeinfo
  image: ghcr.io/openfaas/nodeinfo:latest
  labels: {}
  annotations: {}
You can save this into a file and apply it with kubectl apply -f nodeinfo.yaml.
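Putting those steps together, a minimal workflow looks like this:

faas-cli generate --from-store nodeinfo > nodeinfo.yaml
kubectl apply -f nodeinfo.yaml

# Then list the functions via the CRD
kubectl get function -n openfaas-fn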
You can still deploy functions using faas-cli up or faas-cli deploy, and you'll get the benefits of the Operator that we talked about above on top.
Prior to our latest release, you'd have had to use our migration tool to back up all the deployed functions into the Function CRD format, upgrade the OpenFaaS installation, then delete the old functions and re-create them using the backup file.
Fortunately with our latest change, a one-time migration is performed automatically if you’re running in the Operator mode.
Finally, it writes a ConfigMap into the openfaas namespace to prevent the operation from running again.
So whether you’re an existing user coming up from the Community Edition (CE), or have been running OpenFaaS Standard or Enterprise for a while, the only thing you need in your Helm chart is:
openfaasPro: true
operator: true
Then update the installation just as you always would with helm upgrade --install.
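As a sketch, assuming the official chart repository and an existing values.yaml that contains the two settings above:

helm repo add openfaas https://openfaas.github.io/faas-netes/
helm repo update

helm upgrade --install openfaas openfaas/openfaas \
  --namespace openfaas \
  -f values.yaml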
When we started out, customers had to delete functions when upgrading to the Operator and then deploy them again. Then we built a backup tool, and now we've gone one step further to improve the developer experience with an automated migration, built into the faas-netes code. There's nothing for you to do - it just kicks in when you turn on the Operator and does the most obvious thing.
So with this latest improvement, we’d like to see all customers moved onto the Function CRD, for the features and benefits it provides - both for us as maintainers and for you, as users.
We’d recommend running the migration on a backup or temporary cluster first, to make sure all your functions convert and come up as expected. This is what dev and staging are for after all, right?
You may also like:
You may be familiar with the original UI that I built for OpenFaaS in 2017. It was focused on both invoking functions and deploying new ones from the Function Store. It used the original 1.x version of Angular.js and was built within the first year of the project. It served us well, but the underlying UI frameworks have moved on considerably since its inception.
The classic UI, which is part of faasd and OpenFaaS Community Edition (CE) is now in code-freeze which means it isn’t receiving changes. That’s generally not an issue for its intended audience of personal users and hobbyists. If you’re using OpenFaaS CE at work, it may be time to check out what kind of value we’ve been building over the past few years.
I think it was a key part of the adoption and developer love that we saw around the project. People like to see things, but many developers who are attracted to the kind of back-end coding that FaaS frameworks or Kubernetes controllers involve have an aversion to, or lack of skill in, front-end development.
So that's where we come in, as the stewards and full-time team behind OpenFaaS. We decided to release a new dashboard, with a fresh approach and a fresh UI framework (React).
The new dashboard is geared around running OpenFaaS in production - so it’s for commercial teams who want more visibility and control over their functions.
OpenFaaS has its own REST API and Grafana dashboards, there’s also lots of monitoring options for Kubernetes itself, so what value can we add?
In a word - consolidation - bringing together the most important things you need within one UI for proactive use. For passive monitoring of throughput, scaling, errors and the duration of functions, we’d recommend using the Grafana dashboards we supply to customers.
Let’s take a quick tour of the new features, why we think you’ll find them useful, and how you can try each of them out.
We built a brand new experience for invoking functions using a code editor to format the input and output, and a dedicated tab for the response headers. This is a big improvement over the old UI which had a single text box for input and output.
Here’s the view showing the response headers:
You can also supply your own list of headers for the invoke, the method such as GET or POST, an additional path or a query-string. All of this is new.
One of the biggest concerns I've had is watching commercial teams share a single password for their whole OpenFaaS installation. We've offered SSO with various OpenID Connect (OIDC) providers for several years, but with our new IAM for OpenFaaS feature it's really well integrated.
Here’s an example of the redirect to Keycloak, a popular open source project hosted by the Cloud Native Computing Foundation (CNCF):
Logging in with Keycloak, which can be federated to OIDC, LDAP and SAML providers.
When combined with the new IAM feature, you can also restrict access to read-only roles, or even to specific namespaces, or remove access all together from other company employees outside of your group.
SSO is fine-grained, so not everyone with a company email can just log in and manage your functions.
Authorized users will gain access to their own namespaces, which is a useful way to do multi-tenancy or just to organise internal teams with OpenFaaS.
Above: You can use a single Kubernetes cluster for multiple stages of your application, like production and staging, or for multiple tenants.
Learn more:
For those of you who don’t use an Identity Provider (IdP) in your organisation, we’ve gone one better over the previous Basic Authentication approach.
A login form is used with TLS encryption instead of more rudimentary browser-based Basic Authentication.
Your password can now be remembered by a password manager or the browser itself, which makes it easy to manage multiple environments like dev, staging and production.
We know you don’t want to deploy functions through the UI in production, so we simply don’t offer it. Instead, you can use Helm, ArgoCD, Flux, kubectl along with the Function CRD, the CLI via a CI/CD job, or even the REST API to deploy functions.
With the popularity of Infrastructure as Code (IaC) and GitOps, we are sure that 90% of you will be releasing code from a git repository, with an associated SHA and URL.
With the growing understanding of the dangers of Common Vulnerabilities and Exposures (CVEs) in containers, it’s important to know when a function was last deployed.
Above: Git SHA, link to the repository, and the date of the last deployment.
You can now hot-link directly to a code diff to see what changed in production, if a function is behaving unexpectedly.
The direct hot-link into GitHub or GitLab will show you the precise change that has made its way into production.
Add the metadata by clicking “Set metadata” on the details page, or by adding the labels and annotations specified in the documentation.
The old dashboard had a list of functions, where you'd need to click into each one to see what was a very limited set of data.
Now, the new dashboard shows: replica count, RAM, CPU, 1hr and 24 hr success vs. error rates, and metadata from CI/CD.
A much richer overview of what you need to know about a function at a glance.
You can now use the new dashboard to view the logs of a function without needing kubectl or faas-cli installed on your machine. That means you can use your iPad, phone, or a more restrictive environment to debug a problem with a function too.
I wrote a very simple function in Go using the golang-middleware template to show you how it works.
package function

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func Handle(w http.ResponseWriter, r *http.Request) {
	var input []byte

	if r.Body != nil {
		defer r.Body.Close()

		body, _ := io.ReadAll(r.Body)
		input = body
	}

	// Collect the request headers into a single string, then log them
	// so that they show up on the dashboard's Logs page.
	headers := ""
	for k, v := range r.Header {
		headers += fmt.Sprintf("%s=%s, ", k, v)
	}
	headers = strings.TrimRight(headers, ", ")

	log.Printf("Input headers: %v", headers)

	w.WriteHeader(http.StatusOK)
	w.Write([]byte(fmt.Sprintf("Body: %s", string(input))))
}
The headers from the request will be logged to stderr and shown on the “Logs” page.
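For instance, you could invoke the function with a custom header and then read the logs back (a sketch; "log-headers" is a hypothetical function name, so use whatever name you deployed it with):

curl -i https://openfaas.example.com/function/log-headers \
  -H "X-Trace-Id: 1234" \
  -d "Hello OpenFaaS"

# The same logs shown in the dashboard can also be fetched with the CLI
faas-cli logs log-headers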
The headers being inputted into the invocation
The resulting logs in the dashboard:
The headers printed into the logs
If you’ve used an earlier version of this page, then you may have noticed that we’ve added a new drop-down where you can pick the age of the logs to show going back up to one day.
That was a quick tour of the new OpenFaaS dashboard designed for use in production and in commercial settings. You can try it out with OpenFaaS Standard and OpenFaaS for Enterprises by enabling the dashboard in the Helm chart.
Coming up next, we’re looking at combining some of the recommendations from the OpenFaaS config-checker tool with the dashboard to show you how to get the most value out of the platform.
The chances are that if you’re running in production, you may also benefit from: multiple namespaces, fine-grained permissions, parallelism with JetStream for OpenFaaS, Scale to Zero, the Kafka event connector, and our set of Grafana dashboards for observability.
We find that the OpenFaaS Dashboard is useful for immediate feedback, and the Grafana Dashboards provide us with more proactive monitoring.
Understand if a function has a memory leak, or is consuming excessive CPU, how many replicas are running, how long requests take to process, and how many errors are being generated.
In one recent case, a customer was about to promote a Go function to production, and the dashboard showed him a memory leak which he was unaware of - that swiftly got fixed before it could cause an outage.
In a second case, we noticed on a support call that a function was using 6 vCPU at idle - not just requesting, but actually consuming that amount. The customer was completely unaware, and the UI dashboard helped them to identify the problem.
If you’d like to try out the dashboard for your team, or want to talk to us, get in touch here.