A Look at Azure Event Hubs Archive

Posted: November 22, 2016 | Categories: Microsoft Azure

In this guest blog post, we are going to look at a recently released feature called Azure Event Hubs Archive. For those who are not familiar with it, Event Hubs is an Azure service for large-scale event ingestion. Customers typically send telemetry events to Event Hubs and then consume them with other Azure services, such as Azure Stream Analytics (ASA) or Azure Functions.

At my employer, TransAlta, we have been using Azure Event Hubs in our Industrial Internet of Things (IoT) projects. This past summer, we documented one of these projects in a case study with the Azure Messaging team.

A common requirement for customers is to archive event data, and there are several reasons to do so. Some customers want to look back at the events that were processed to support operational triage or investigations. If you support a messaging solution, sooner or later someone will ask whether a specific event was processed. Without the “evidence”, you are likely to lose that conversation; with the event in your archive, you can show exactly what was received.

Another use case for an event archive is an analytic “cold path”. In some scenarios, you may use an ASA job as a “hot path” that performs real-time analytics on the live event stream. A “cold path”, by contrast, batches up events over a longer duration and analyzes them later.
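As a rough illustration of the batching a cold path performs, the sketch below groups archived telemetry readings into hourly buckets and averages each one. The `(timestamp, value)` record shape and the `hourly_averages` helper are hypothetical; a real cold path would first read the events back out of the archive files.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_averages(events):
    """Group (timestamp, value) readings by hour and average each group.

    The (datetime, float) record shape is a hypothetical stand-in for
    whatever the archived telemetry actually contains.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Truncate the timestamp to the start of its hour to form the batch key.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Emit one average per hour, in chronological order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


if __name__ == "__main__":
    readings = [
        (datetime(2016, 11, 22, 9, 5), 10.0),
        (datetime(2016, 11, 22, 9, 40), 20.0),
        (datetime(2016, 11, 22, 10, 15), 30.0),
    ]
    for hour, avg in hourly_averages(readings).items():
        print(hour.isoformat(), avg)
```

The hot path would compute similar aggregates continuously; the cold path can afford to wait until a full hour (or day) of events has been archived.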

You may be asking yourself why you would enable this archive feature when you could build your own logger with log4net or NLog inside your consumer. You can, but you are then responsible for writing or integrating that code in your consumer, and for providing the storage and compute for it to run on. Archived data should generally be kept in the most cost-effective location, which in most situations is cloud storage.

Configuration

Let’s now set up Event Hubs Archive and see it in action. For this post, I am going to use an existing Event Hub that I provisioned for my Ignite talk.

  1. Within the Azure portal, find your Event Hub and click on Properties.
  2. Enable the Archive feature by turning the slider to On. We also need to specify the Time Window and the Size Window.
    Note: The values shown are the defaults. The time window can range from 60 seconds to 900 seconds (15 minutes), and the size window from 10 MB to 500 MB. Because there are two thresholds, whichever one is reached first triggers an archive of the Event Hub.
  3. Next, we need to provide a Storage Account and a Blob Container before we can save our settings.
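The “whichever threshold is reached first” rule can be modelled in a few lines. This is a simplified sketch, not the service’s implementation: `ArchiveWindow` and `first_trigger` are hypothetical names, the ranges are the ones quoted in the note above, and treating 1 MB as 2^20 bytes is an assumption.

```python
from dataclasses import dataclass

# Ranges quoted in the portal note.
MIN_TIME_S, MAX_TIME_S = 60, 900
MIN_SIZE_MB, MAX_SIZE_MB = 10, 500
BYTES_PER_MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


@dataclass(frozen=True)
class ArchiveWindow:
    time_window_s: int
    size_window_mb: int

    def __post_init__(self):
        # Reject settings outside the ranges the portal accepts.
        if not MIN_TIME_S <= self.time_window_s <= MAX_TIME_S:
            raise ValueError(f"time window must be {MIN_TIME_S}-{MAX_TIME_S} s")
        if not MIN_SIZE_MB <= self.size_window_mb <= MAX_SIZE_MB:
            raise ValueError(f"size window must be {MIN_SIZE_MB}-{MAX_SIZE_MB} MB")


def first_trigger(window, arrivals):
    """Return (threshold, seconds) for whichever threshold closes the window first.

    `arrivals` is a time-ordered list of (seconds_since_window_start, size_bytes).
    """
    limit_bytes = window.size_window_mb * BYTES_PER_MB
    buffered = 0
    for t, size in arrivals:
        if t >= window.time_window_s:
            # The time window closed before this event arrived.
            return ("time", window.time_window_s)
        buffered += size
        if buffered >= limit_bytes:
            # Enough data buffered to hit the size window first.
            return ("size", t)
    # No size trigger: the time window still closes, even with nothing buffered.
    return ("time", window.time_window_s)
```

With a 60-second, 10 MB window and a publisher sending about 1,000 bytes per second, the time window fires first at 60 seconds. An empty `arrivals` list also reports the time window closing, which is consistent with the empty files you may find in the container.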

Testing

  1. We can now start our publisher. In my case, my simulator sends a batch of events every second.
  2. To see our events, I have downloaded Azure Storage Explorer and configured it with my storage account name and key. When we explore our Blob Container, we will see a series of files.
  3. Note the taxonomy of the files. We have:
  4. If we open up a file, we will see that its contents are in Avro format. Also note that we may see 0-byte files in our Blob Container if no events were received during a window. Remember, an archive is triggered by whichever threshold is reached first; in this case, the time window elapsed before the size window was reached.
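Once the container holds more than a handful of files, it helps to list them programmatically and skip the empty ones. The sketch below assumes blob names follow a `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}` layout, optionally ending in `.avro`, and that the date parts are UTC; verify both against the names you actually see in Storage Explorer. The `(name, size)` pairs stand in for a real blob listing, and `ArchiveBlob`, `parse_archive_blob`, and `non_empty_blobs` are hypothetical helpers, not part of any Azure SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchiveBlob:
    namespace: str
    event_hub: str
    partition_id: str
    window_start: datetime
    size_bytes: int


def parse_archive_blob(name, size_bytes):
    """Split an assumed Namespace/EventHub/Partition/Y/M/D/h/m/s blob name."""
    if name.endswith(".avro"):
        name = name[: -len(".avro")]
    parts = name.split("/")
    if len(parts) != 9:
        raise ValueError(f"unexpected blob name layout: {name!r}")
    namespace, event_hub, partition_id = parts[:3]
    year, month, day, hour, minute, second = (int(p) for p in parts[3:])
    # Assumption: the date/time segments are in UTC.
    start = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return ArchiveBlob(namespace, event_hub, partition_id, start, size_bytes)


def non_empty_blobs(listing):
    """Parse (name, size) pairs and drop the 0-byte files written for idle windows."""
    blobs = (parse_archive_blob(name, size) for name, size in listing)
    # Sort by partition (lexical order on the ID string), then by window start.
    return sorted((b for b in blobs if b.size_bytes > 0),
                  key=lambda b: (b.partition_id, b.window_start))
```

Only the non-empty files returned here would need to be opened and decoded as Avro.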

Conclusion

In this post, we discussed how we can quickly and simply add archive capabilities to our Event Hub projects without any performance impact. It is a great option for customers who want additional traceability and/or additional analytics options, such as a cold path.

Be aware that enabling Event Hubs Archive has a cost. In addition to the costs of your Event Hub Throughput Unit(s) and storage, there is an hourly charge for using this feature. Please consult the Azure Event Hubs pricing page for more details.

Author: Kent Weare

Kent Weare is a Microsoft Azure MVP. He has worked on projects for the Canadian Federal Government, a multinational bank in New York City, and integrated health care projects throughout Canada. His role as a Senior Enterprise Architect with an integration focus allows him to get involved in a variety of technology projects for a large energy company. Kent has been very active in the integration space since 2004, having worked with every version of BizTalk from BizTalk 2004 onwards. He has also used competing tools, including InterSystems, IBM, and MuleSoft. He is very passionate about architecture, especially when it involves integrating different systems.