Since remote working has become the norm, risk and information security teams are operating in a completely different landscape and must adapt in order to meet these new monitoring and log requirements which is critical to ensure resiliency and security of business operations.
Over the past couple of years, our client's Palo Alto Web security log storage costs on Hadoop have increased 5 fold and their ad-hoc Kafka queries are running at a snails pace preventing the information security team from responding to threats in a timely manner.
Azure Data Explorer (ADX) is a fully managed data analytics service for real-time analysis on huge volumes of data streaming from applications, websites and IoT devices.
Performing near real time analytics over over petabytes of data, returning results in less than a second across billions of records.
After our client switched from Hadoop to ADX as the primary storage they observed a huge cost savings of 50% and reduced Kafka Cluster query responses from 30 minutes to a few seconds!
What Data Mastery love most about Azure Data Explorer is the simplified solution and ability it provides our client to add new data ingestion pipelines!
For a detailed explanation on Azure Data Explorer click here.
Ingesting Palo Alto Logs from Azure Storage to ADX
In this article, I will demonstrate how to create an Ingestion Pipeline to ingest and transform Palo Alto Web Traffic logs files uploaded hourly to an Azure Storage Account and which accumulate to a daily total of 200GB (when uncompressed).
The file is a compressed .gz file split into three different formats:
Space delimited values
Pipe delimited values
Space delimited Key-Value Pairs
Solution
The solution used follows the high-level steps below:
Palo Alto Log Files are uploaded/created on Azure Storage(ADLS Gen2) This action in turn triggers the ingestion process using an Event Grid-created subscriber.
The file is ingested into an ADX staging table.
An ADX user-defined Update Policy reads the newly uploaded data in the staging table and transforms the data into the destination table as required.
P.S. This Solution is as simple as it sounds!
Ingestion Pipeline
Prerequisites
Install Kusto explorer and connect to the ADX cluster. Alternatively, the Web UI can be used.
Microsoft recommends each file must be 1GB uncompressed for optimal ingestion and no larger than 4GB.
Register Event Grid with the Azure Subscription.
To create the ingestion pipeline the following steps must be completed
Create a container on Azure Storage - ADLS Gen2.
Create an ADX Staging Table.
Set a Retention Policy on the ADX Staging table.
Create an ADX Query Function to read and transform the data landing in the staging table.
Create an ADX Destination Table for the curated data.
Create ADX Update policy: The Update Policy instructs ADX to automatically append data to the target table whenever new data is inserted into the staging table, based on the transformation function created in step 3.
Create an Event Grid Ingestion Method: The chosen ingestion method is ingesting data into data explorer via Event Grid from ADLS.
Test :)
Steps
1. Create a container on Azure Storage - ADLS Gen2
2. Create an ADX staging table with one column of data type string
3. Set a Retention Policy on the ADX Staging table to only keep 14 days of data.
4. Create an ADX Function.
The function reads and transforms the data from the staging table to the desired output. Only a subset of source columns are required in the output.
5. Create an ADX Destination Table for the curated data.
The ingestion function can be used to create the schema for the destination table using the following script:
NOTE: Ensure the DateTime and numeric columns are typed correctly as ADX stores metadata and statistics for each column. ADX will also store the maximum and minimum values of the extent of the data. This will ensure that when the user requests the data from the store, with certain conditions, it will be compared and only relevant extents are scanned and returned as results.
6. Create ADX Update Policy
The Update Policy instructs ADX to automatically append data to the target table whenever new data is inserted into the staging table, based on the transformation function created above.
7. Create an Event Grid Ingestion Method.
The chosen ingestion method is ingesting data into data explorer via Event Grid from ADLS.
Log in to the Azure Portal.
Navigate to the ADX Cluster ➜ Databases (Select appropriate database) ➜ Data connections.
Add Data Connection – see below.
Click ‘Next: Review + create >’ to the next tab Ingest Properties.
NOTE: Txt files do not have mappings. Mappings are only used for CSV, JSON, AVRO, and W3CLOGFILE files.
8. Test :)
Upload a file to the Azure storage container. If the ingestion has failed run the query below to check why.
Conclusion
If you would like a copy of my code, please drop me a message on LinkedIn.
I hope you have found this helpful and will save your company time and money with Azure Data Explorer.
Please share your thoughts, questions, corrections and suggestions. All feedback and comments are very welcome.
Comments