Upload files to Azure blob storage container

Image by https://www.pexels.com/@apasaric/
Image by https://www.pexels.com/@apasaric/

We have a common use case when we need to upload a sizable file to Azure Blob Storage (or any form of Cloud storage) and then the file content is then processed by Azure Function App. This blog is about an anti-pattern and an effective way to handle this use case.

Setup

Create a resource group and create these services

  1. Azure Function app
  2. Azure Blob Storage

Enable the Managed Service Identity option for Azure Function App.

Next grant this identity to blob storage's Storage Blob Data Contributor role. This is an important step otherwise the generated Shared Access Signature (SAS) Token does have the right permissions.

Add Application Settings in Function App

The BLOB_CONTAINER_NAME and BLOB_STORAGE_URL are in the Python code later.
  • BLOB_CONTAINER_NAME is the container name. I have created one and I named it as "test".
  • BLOB_STORAGE_URL is like https://<you-blob-storage-account-name>.blob.core.windows.net/


Anti Pattern

When we allow the upload of files that are targeted to storage in cloud storage via Azure Function App (or any lightweight microservices), we face the following issues
  1. One extra hop. The file has to go to the function-app first and then to the target blob storage container.
    1. Making the upload process unnecessarily slow.
  2. Unnecessary memory consumption in the function-app for staging the data.


Using Shared Access Signature token

We shall share some code on how to generate Shared Access Signature Token (aka SAS Token).
# azure-identity
# azure-functions
# azure-storage-blob

import logging
import json
import os
import azure.functions as func

from datetime import datetime, timedelta
from azure.identity import ManagedIdentityCredential
from azure.storage.blob import AccountSasPermissions, BlobServiceClient, generate_blob_sas

BLOB_STORAGE_URL = os.environ["BLOB_STORAGE_URL"]
BLOB_CONTAINER_NAME = os.environ["BLOB_CONTAINER_NAME"]

def generate_sas_token(blob_name: str) -> str:
    """generate storage access token

    Args:
        blob_name (str): name of blob

    Returns:
        str: storage access token
    """
    msi_credential = ManagedIdentityCredential()
    blob_client = BlobServiceClient(
        account_url=BLOB_STORAGE_URL,
        credential=msi_credential)

    # Get user-delegated key
    user_key = blob_client.get_user_delegation_key(
        key_start_time = datetime.utcnow(),
        key_expiry_time = datetime.utcnow() + timedelta(minutes=5)
    )

    return generate_blob_sas(
        account_name=blob_client.account_name,
        container_name=BLOB_CONTAINER_NAME,
        blob_name=blob_name,
        user_delegation_key=user_key,
        permission=AccountSasPermissions(create=True, read=True),
        expiry=datetime.utcnow() + timedelta(minutes=5),
    )
    

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Request SAS URL [Python HTTP trigger] called.')

    name = req.params.get('name')
    if name:
        try:
            token = generate_sas_token(name)

            result = {"sas_url": f"{BLOB_STORAGE_URL}{BLOB_CONTAINER_NAME}/{name}?{token}"}
            return func.HttpResponse(
                json.dumps(result),
                mimetype="application/json",
            )        
        except Exception as e:
            logging.error(e)
            return func.HttpResponse(str(e), status_code=400)

    return func.HttpResponse("name parameter value is required", status_code=400)

The code is very concise and self-explanatory. Here are a few things to note.

  1. ManagedIdentityCredential works because we have created the MSI (Managed Service Identity) for the function-app and assigned the Blob Data Contributor role to it.
  2. The function app will return the URL for uploading a file to Azure Blob Storage. We need to
    1. The generated URL expires in 5 minutes
    2. Use HTTP/1.1 PUT method on this URL
    3. Add x-ms-blob-type=BlockBlob to the header of the HTTP request.




Comments

Popular posts from this blog

OpenAI: Functions Feature in 2023-07-01-preview API version

Storing embedding in Azure Database for PostgreSQL

Happy New Year, 2024 from DALL-E