Last weekend I spent some time playing with queue-driven Azure Functions to learn more about how serverless functions scale under load. This post is a write-up of some of my findings.

Azure Functions

Azure Functions are part of the serverless offering in Azure. When I first heard about “serverless”, the whole thing seemed a bit of a misnomer. To be clear, “serverless” definitely doesn’t mean that your code runs on fewer servers. In fact, your code may run on many more servers; the point is that you don’t have to manage anything related to hardware provisioning, or even the hosting process. Instead you focus on writing a standalone function and leave the rest to Azure.

The most common Azure function types are queue-triggered and HTTP-triggered functions. In my experiment I went with queue-triggered functions backed by Azure Service Bus queues. There are other alternatives as well (e.g. storage queues), but I chose Service Bus mainly for its built-in session support. Session support is nice because it allows me to correlate the messages on the result queue back to the original request. In a way a session-enabled queue is a queue of queues, since each session represents a sub-queue within the main queue.
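To make the session idea concrete, here is a minimal sketch using the Microsoft.Azure.ServiceBus client (the "results" queue name and the helper are placeholders, not code from my experiment): the caller owns a session id, the function stamps it onto every reply, and the caller then accepts only that session on the result queue.

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class SessionCorrelationSketch
{
    public static async Task ReceiveMyResultsAsync(string connectionString)
    {
        // The result queue must be created with sessions enabled.
        var sessionClient = new SessionClient(connectionString, "results");

        // Each caller owns a session id; the function copies it onto every reply.
        var sessionId = Guid.NewGuid().ToString();

        // Accepting the session gives us a receiver scoped to "our" sub-queue,
        // so we only ever see replies belonging to this request.
        var session = await sessionClient.AcceptMessageSessionAsync(sessionId);

        var reply = await session.ReceiveAsync(TimeSpan.FromSeconds(30));
        if (reply != null)
        {
            Console.WriteLine(Encoding.UTF8.GetString(reply.Body));
            await session.CompleteAsync(reply.SystemProperties.LockToken);
        }

        await session.CloseAsync();
        await sessionClient.CloseAsync();
    }
}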

Scalability

One of the key features of Azure Functions is automatic scaling to deal with load spikes. In the case of queue-triggered functions this means adding compute to work through a large number of queue messages. Scalability varies with the selected Azure pricing tier. In my experiment I opted for the so-called Consumption tier, which offers a theoretical limit of 200 concurrent instances of the function app. Basically this just means that Azure will examine the throughput of your queue and determine how many function instances to “serverlessly” spin up to keep the queue moving.

This all sounds great! In practice though, scalability is a bit more nuanced. The algorithm for spinning up more instances is not public knowledge, but has something to do with the number of items on the queue in combination with message age. Despite being “serverless”, spinning up another server is not instant, so there is overhead every time Azure has to provision another instance. Azure will keep adding instances to keep the queue moving, but there is no official SLA on queue processing.

My experience is that Azure is actually quite reluctant to add more instances once you get to a certain point. In my case my functions seemed to plateau around 20 instances, which is quite a bit lower than the theoretical limit of 200 in the consumption tier. I tried to lure more functions out of hiding by flooding the queue with up to 200,000 messages, but despite my efforts, Azure was not feeling generous past the allocation of 20 server instances.

Experiment

My initial experiment consisted of a single queue-triggered Azure function that ran a simple CPU-bound task and put the result on a result queue. The calculation was a naive recursive Fibonacci(35), which executed in 1-2 seconds under load. I queued up 10,000 input messages, so a full run amounted to 10,000 function executions followed by 10,000 messages written to the result queue.
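The load generator itself isn't part of the post, but it amounted to roughly the sketch below (the helper name and the batch size of 100 are my own choices): it serializes the same payload shape the function expects and pushes 10,000 messages to the input queue in batches.

using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;
using Newtonsoft.Json;

public static class LoadGenerator
{
    public static async Task FloodQueueAsync(string connectionString, string inputQueueName, string sessionId)
    {
        var sender = new MessageSender(connectionString, inputQueueName);
        var batch = new List<Message>();

        for (var i = 0; i < 10000; i++)
        {
            // Same shape as the FibonacciMessage the function deserializes.
            var body = JsonConvert.SerializeObject(new { SessionId = sessionId, n = 35 });
            batch.Add(new Message(Encoding.UTF8.GetBytes(body)));

            // Send in batches of 100 so the producer side doesn't become the bottleneck.
            if (batch.Count == 100)
            {
                await sender.SendAsync(batch);
                batch.Clear();
            }
        }

        if (batch.Count > 0)
        {
            await sender.SendAsync(batch);
        }

        await sender.CloseAsync();
    }
}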

The code for my Azure function can be found below:

using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

namespace Math
{
    // Shape of the JSON payload on the input queue.
    public class FibonacciMessage
    {
        public string SessionId { get; set; }
        public int n { get; set; }
    }

    public static class Fibonacci
    {
        private static readonly string ServiceBusConnectionString;
        private static readonly string ResultQueueName;
        private static readonly MessageSender sender;

        static Fibonacci()
        {
            // Connection string and result queue name come from app settings.
            ServiceBusConnectionString = Environment.GetEnvironmentVariable("Connection");
            ResultQueueName = Environment.GetEnvironmentVariable("ResultQueueName");
            sender = new MessageSender(ServiceBusConnectionString, ResultQueueName);
        }

        [FunctionName("Fibonacci")]
        public static async Task Run(
            [ServiceBusTrigger("%InputQueueName%", Connection = "Connection")] string queueItem,
            ILogger log)
        {
            // Deserialize the incoming queue message into a typed object.
            var m = JsonConvert.DeserializeObject<FibonacciMessage>(queueItem);

            var sw = new Stopwatch();
            sw.Start();
            var result = CalculateFibonacci(m.n);
            sw.Stop();

            // Stamp the reply with the original session id so it can be
            // correlated with the request, then write it to the result queue.
            var resultMessage = new Message(Encoding.UTF8.GetBytes(
                $"Fibonacci sequence of {m.n} = {result}. Runtime: {sw.Elapsed}"))
            {
                SessionId = m.SessionId
            };

            await sender.SendAsync(resultMessage);
        }

        public static long CalculateFibonacci(int n)
        {
            if (n == 0 || n == 1)
            {
                return 1;
            }
            return CalculateFibonacci(n - 2) + CalculateFibonacci(n - 1);
        }
    }
}

Cold Start

The first time you hit your functions you pay a start-up tax, since no instances are running yet and Azure has to provision them as part of the scaling algorithm. This is often referred to as a cold start. Cold starting my demo app resulted in a total runtime of about 5 minutes, which includes receiving all the messages from the result queue. Running the functions a second or third time shaved about a minute off the total runtime since the functions were already running. Still, as I mentioned earlier, the scaling did not get close to 200 instances. I was probably naive, but I was kind of hoping for a divide-and-conquer effect with 200 Azure function instances processing my 10,000 queue items in parallel. Instead I had to settle for around 20.

Cold starts can be mitigated by moving to the Premium pricing tier, which allows you to keep a certain number of instances always on. Basically, Premium is a hybrid between the truly serverless Consumption tier and static server allocation. The trade-off is that you pay a fixed fee for the statically allocated servers.

Manual Scaling

To create the desired divide-and-conquer effect I had to scale out manually by defining multiple function apps instead of a single one. In theory this should scale much better, since you now have several independent apps processing messages in parallel, each scaling on its own. As you would expect, this made for a solid performance improvement.

Queue Overhead

Multiple apps made for a big improvement, but I still wasn’t satisfied with the overall performance. I noticed considerable lag in clearing the result queue after the functions were done processing the input messages. As a next step, based on advice from this article, I ended up creating dedicated queues for each function app, which drastically reduced the number of messages per queue. It also enabled me to receive messages from multiple queues in parallel. In my final setup I created 15 unique function apps with dedicated input and output queues. As a result I was able to reduce the total runtime to just under 30 seconds after function warm-up! The code sample above shows how the queue names are bound dynamically from config for the different function apps.
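To give an idea of what the receive side looked like in the final setup, here is a rough sketch, assuming one session-enabled result queue per function app and a known number of expected messages per queue (the class, its parameters, and the batch size of 100 are placeholders): each result queue gets its own receiver task and all of them are awaited in parallel.

using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class ResultCollector
{
    // One receiver task per dedicated result queue, all drained in parallel.
    public static Task ReceiveAllAsync(string connectionString, string[] resultQueues, string sessionId, int expectedPerQueue)
    {
        var tasks = resultQueues.Select(queue =>
            DrainQueueAsync(connectionString, queue, sessionId, expectedPerQueue));
        return Task.WhenAll(tasks);
    }

    private static async Task DrainQueueAsync(string connectionString, string queueName, string sessionId, int expected)
    {
        var sessionClient = new SessionClient(connectionString, queueName);
        var session = await sessionClient.AcceptMessageSessionAsync(sessionId);

        var received = 0;
        while (received < expected)
        {
            // Pull up to 100 messages per call instead of one at a time.
            var messages = await session.ReceiveAsync(maxMessageCount: 100);
            if (messages == null) continue;

            foreach (var message in messages)
            {
                await session.CompleteAsync(message.SystemProperties.LockToken);
            }

            received += messages.Count;
        }

        await session.CloseAsync();
        await sessionClient.CloseAsync();
    }
}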

A few things to note: despite having multiple result queues, receiving from a single queue is still slow if you process one message at a time. It’s important to leverage the part of the client API that lets you receive multiple messages per call. Message write performance was fast in my example since the payload was tiny, but writes can become a bottleneck with large messages. For my writes I used a round-robin algorithm to distribute the messages evenly across my queues.
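The write side can be sketched roughly like this (the helper and its parameters are placeholders): each message is dealt out to the next input queue in turn, one MessageSender per queue, while the batch ReceiveAsync overload used in the receive sketch above is what avoids the one-message-at-a-time penalty on the read side.

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

public static class RoundRobinWriter
{
    // Deal messages out evenly across the dedicated input queues.
    public static async Task DistributeAsync(IReadOnlyList<MessageSender> senders, IReadOnlyList<Message> messages)
    {
        for (var i = 0; i < messages.Count; i++)
        {
            await senders[i % senders.Count].SendAsync(messages[i]);
        }
    }
}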