
Showing posts from February, 2023

Discarding some pages from PDF documents in an Azure Blob Storage container

Image by https://www.pexels.com/@pixabay/

I was looking for a way to discard some pages from a PDF document and generate a new PDF file without them. There are many applications I could download for this task, and there are also online tools. Online tools are not an option because the PDF documents contain confidential information. Desktop applications on my MacBook Pro are not an option either, because the PDF documents are in an Azure Blob Storage container. Furthermore, there are many PDF documents to process, and I may need to repeat the same task in the future. Hence, I want a script that reads the PDF files from a blob container, gets the splitting information from a file, splits each PDF document, and writes the result back to a destination blob container.

Here is the Python code. I am using a Blob Storage connection string, but a Shared Access Signature (SAS) token works too (I have written a few blog posts on SAS tokens).

Dependencies

Here are the dependencies:

PyPDF2
pytho…
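The post's own code is cut off in this excerpt, so what follows is not the author's implementation. It is a minimal sketch of the three steps described above, assuming PyPDF2 3.x (pypdf, PyPDF2's successor, exposes the same PdfReader and PdfWriter classes) and azure-storage-blob 12.x. The splitting-information format (one "blob_name: pages" entry per line, 1-based page numbers, inclusive ranges such as 4-6) and the helper names parse_split_spec, drop_pages, and process_container are hypothetical.

```python
from __future__ import annotations

import io


def parse_split_spec(text: str) -> dict[str, set[int]]:
    """Parse a hypothetical spec: one 'blob_name: pages' entry per line.

    Pages are 1-based and ranges such as '4-6' are inclusive.
    Blank lines and lines starting with '#' are skipped.
    """
    spec: dict[str, set[int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last ':' so the page list never contains the separator.
        name, sep, pages = line.rpartition(":")
        if not sep:
            raise ValueError(f"missing ':' in spec line: {raw!r}")
        discard: set[int] = set()
        for part in pages.split(","):
            part = part.strip()
            if not part:
                continue
            if "-" in part:
                start, end = (int(x) for x in part.split("-", 1))
                discard.update(range(start, end + 1))
            else:
                discard.add(int(part))
        spec[name.strip()] = discard
    return spec


def drop_pages(pdf_bytes: bytes, discard: set[int]) -> bytes:
    """Return a new PDF with every page except the 1-based pages in discard."""
    # Imported here so the spec parser can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(io.BytesIO(pdf_bytes))
    writer = PdfWriter()
    for number, page in enumerate(reader.pages, start=1):
        if number not in discard:
            writer.add_page(page)
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()


def process_container(conn_str: str, src_container: str, dst_container: str,
                      spec: dict[str, set[int]]) -> None:
    """Read each listed PDF from src_container, upload the trimmed copy to dst_container."""
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    src = service.get_container_client(src_container)
    dst = service.get_container_client(dst_container)
    for blob_name, discard in spec.items():
        data = src.download_blob(blob_name).readall()
        # overwrite=True replaces an existing blob with the same name.
        dst.upload_blob(blob_name, drop_pages(data, discard), overwrite=True)
```

For example, with a spec file containing the line report.pdf: 2, 4-6, the copy of report.pdf written to the destination container would omit pages 2, 4, 5 and 6. Keeping the parser free of SDK imports makes it easy to check the splitting information before touching any blobs.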

Azure Table Storage limitations

Image from https://www.pexels.com/@mikebirdy/

Azure Table Storage (the Tables service of an Azure Storage account) provides a very economical way to store schema-free records. However, it has some limitations. In this post, we look at its limitations in search queries.

We need these dependencies:

python-dotenv==0.21.1
azure-data-tables==12.4.2

And here is a tiny code snippet.

    import os

    from dotenv import load_dotenv
    from azure.core.exceptions import ResourceExistsError
    from azure.data.tables import TableClient

    # Create a .env file with values for these:
    # BLOB_CONN_STR=
    # TBL_NAME=
    load_dotenv()
    tbl_name = os.getenv("TBL_NAME")
    tbl_client = TableClient.from_connection_string(
        os.getenv("BLOB_CONN_STR"), tbl_name)
    try:
        tbl_client.create_table()
    except ResourceExistsError:
        print(f"Table {tbl_name} already exists")
    tbl_client.create_entity(entity={
        "PartitionKey": "pk",  # set your partition key accordingly.
        "RowKey": "row-1",
        "…
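The excerpt stops before the limitations themselves. As one illustration (not necessarily one the post covers), table query filters support only comparison operators (eq, ne, gt, ge, lt, le) combined with and, or and not; there is no substring or "starts with" operator. A common workaround for a prefix search on RowKey is a lexical range query. This is a minimal sketch assuming azure-data-tables 12.4.x, where TableClient.query_entities takes a filter string plus a parameters dict for @-prefixed placeholders; the helper names prefix_range_filter and query_by_prefix are hypothetical.

```python
from __future__ import annotations

from typing import Any


def prefix_range_filter(prefix: str, key: str = "RowKey") -> tuple[str, dict[str, str]]:
    """Emulate 'key starts with prefix' as the range prefix <= key < upper."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    # Bump the last character by one code point to get an exclusive upper
    # bound (chr raises ValueError if the last character is U+10FFFF).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"{key} ge @lo and {key} lt @hi", {"lo": prefix, "hi": upper}


def query_by_prefix(tbl_client: Any, partition_key: str, prefix: str) -> list[dict]:
    """Return entities in one partition whose RowKey starts with prefix."""
    range_filter, params = prefix_range_filter(prefix)
    # Pinning the PartitionKey keeps the query inside a single partition
    # instead of scanning the whole table.
    query_filter = f"PartitionKey eq @pk and {range_filter}"
    params["pk"] = partition_key
    return list(tbl_client.query_entities(query_filter, parameters=params))
```

With the tbl_client from the snippet above, query_by_prefix(tbl_client, "pk", "row-") should return the entity whose RowKey is "row-1". Passing values through parameters rather than formatting them into the filter string avoids having to escape quotes by hand.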

Azure Cognitive Search - Synonyms

Image from https://www.pexels.com/@agk42/

I was reading about the Synonyms feature in Azure Cognitive Search and decided to test it with its Python API. These are the dependencies:

python-dotenv==0.21.1
azure-identity==1.12.0
azure-search-documents==11.3.0
azure-search==1.0.0b2

We create two clients: an index client and a search client.

    from dotenv import load_dotenv
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents.indexes import SearchIndexClient
    from azure.search.documents import SearchClient

    load_dotenv()
    SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
    SERVICE_KEY = os.getenv("AZURE_SEARCH_API_KEY")
    INDEX_NAME = "test-synonyms"

    index_client = SearchIndexClient(SERVICE_ENDPOINT, AzureKeyCredential(SERVICE_KEY))
    search_client = SearchClient(
        SERVICE_ENDPOINT, INDEX_NAME, AzureKeyCredential(SERVICE_KEY)
    )

Next, we create the synonym mapping.

    SYNONYM_MAP_NAME = "test-syn-map"
    from az…
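The excerpt stops just as the synonym map is being created, so here is a minimal sketch of that step rather than the author's code. It assumes azure-search-documents 11.3.0, where SynonymMap takes a list of rule strings in the Solr synonym format and a SearchableField can reference maps through synonym_map_names. INDEX_NAME and SYNONYM_MAP_NAME come from the post; the field names id and description and the helpers solr_rules, create_synonym_map, and build_index are hypothetical.

```python
from __future__ import annotations


def solr_rules(equivalent: list[list[str]],
               explicit: dict[str, list[str]] | None = None) -> list[str]:
    """Build synonym rules in the Solr format that synonym maps use."""
    # Equivalence rule: "a, b, c" makes each term match the others.
    rules = [", ".join(group) for group in equivalent]
    # Explicit mapping: "a, b => c" rewrites a and b to c at query time.
    for target, sources in (explicit or {}).items():
        rules.append(f"{', '.join(sources)} => {target}")
    return rules


def create_synonym_map(index_client, name: str, rules: list[str]):
    """Create or update a synonym map through a SearchIndexClient."""
    # Imported here so solr_rules works without the SDK installed.
    from azure.search.documents.indexes.models import SynonymMap

    return index_client.create_or_update_synonym_map(
        SynonymMap(name=name, synonyms=rules))


def build_index(index_name: str, synonym_map_name: str):
    """Define an index whose hypothetical 'description' field uses the synonym map."""
    from azure.search.documents.indexes.models import (
        SearchFieldDataType, SearchIndex, SearchableField, SimpleField)

    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="description", synonym_map_names=[synonym_map_name]),
    ]
    return SearchIndex(name=index_name, fields=fields)
```

With the clients above, calling create_synonym_map(index_client, SYNONYM_MAP_NAME, solr_rules([["USA", "United States", "United States of America"]])) and then index_client.create_or_update_index(build_index(INDEX_NAME, SYNONYM_MAP_NAME)) should let a search for "USA" through search_client also match documents whose description says "United States". The synonym map has to exist before an index field can reference it.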