There have been, and still are, countless tools for system monitoring in the Azure cloud. In 2007, Microsoft introduced System Center Operations Manager (SCOM). The Operations Management Suite (OMS) followed in 2015, and in the same year Application Insights (App Insights) appeared as a separate resource for tracking web applications.
However, each of these tools was designed for a specific purpose, and there was still no way to monitor applications, infrastructure and network in one central place in the portal. This changed in 2018 with Azure Monitor.
Azure Monitor is a standard service that does not need to be provisioned as a separate resource in a resource group. With Azure Monitor, it is possible to ...
... connect data from a wide range of sources.
... perform extensive analyses.
... process the data for business processes in a variety of ways, for example with Azure Functions or Logic Apps.
Basically, it works as shown in the following graphic: The possible data sources are listed on the left, the data types in the middle and the options for using the data on the right.
Azure Monitor Overview © Microsoft
There are two data types: metrics, which are numeric values, and logs, which are text entries.
Metrics are stored in time-series databases designed for real-time scenarios, such as displaying the CPU or memory utilisation of a virtual machine. They must make it possible to react quickly, for example when a limit value is exceeded. Logs are somewhat less time-critical. They are represented as rows in a table with columns. Many columns contain text in JSON format, which can be broken down further in analysis queries.
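To illustrate what "breaking down" a JSON column means, here is a minimal Python sketch. In KQL, this is typically done with the parse_json() function. The row layout and the keys inside `Properties` are hypothetical examples and are not the exact schema of any specific log table.

```python
import json

# Hypothetical log row: ordinary columns plus one column holding a JSON string,
# similar to what many diagnostic log tables contain.
row = {
    "TimeGenerated": "2021-05-04T10:15:00Z",
    "Category": "ApplicationGatewayFirewallLog",
    "Properties": '{"ruleId": "942100", "action": "Blocked", "clientIp": "203.0.113.7"}',
}

def expand_json_column(row: dict, column: str) -> dict:
    """Return a copy of the row where the JSON column is flattened into
    separate '<column>.<key>' entries that can be filtered individually."""
    expanded = {k: v for k, v in row.items() if k != column}
    for key, value in json.loads(row[column]).items():
        expanded[f"{column}.{key}"] = value
    return expanded

flat = expand_json_column(row, "Properties")
print(flat["Properties.action"])  # Blocked
```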
The data can also be distinguished by its origin:
At the tenant level, user sign-in activities are recorded in the Active Directory logs. At the subscription level, administrative operations and security messages are recorded in the activity logs. At the resource level, the relevant metrics and configuration changes of the resources are provided. At the application level, for example in a .NET Core web application, data about the behaviour of the code can be recorded, such as exceptions, requests or debug information.
Some data is monitored out of the box and can be queried for a certain retention period. By default, this is 90 days for activity logs, 93 days for metrics and 30 days for tenant logs. Access can be extended by routing the data to another data store.
If, for example, a Service Bus namespace is provisioned with a queue, you can view how many messages have passed through it over time. This is available either in the resource's own menu or in Azure Monitor with the corresponding scope. Likewise, the performance metrics of an App Service or a virtual machine can be inspected in the portal.
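The default retention periods above boil down to a simple age check. The following Python sketch applies them, using only the numbers stated in the text. The source keys and the function name are hypothetical, and the strict "<" boundary is a simplification.

```python
from datetime import datetime, timedelta, timezone

# Default retention periods in days, as stated in the text.
DEFAULT_RETENTION_DAYS = {
    "activity_log": 90,
    "metrics": 93,
    "tenant_log": 30,
}

def is_still_queryable(source: str, recorded_at: datetime, now: datetime) -> bool:
    """True if a record of the given source is younger than its default retention."""
    return now - recorded_at < timedelta(days=DEFAULT_RETENTION_DAYS[source])

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
event = now - timedelta(days=45)
print(is_still_queryable("activity_log", event, now))  # True
print(is_still_queryable("tenant_log", event, now))    # False
```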
Azure Service Bus Metrics © Thomas Hafermalz
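A chart like the one above aggregates the raw data into fixed time buckets. The following Python sketch shows that bucketing for message counts per hour. It roughly corresponds to `summarize count() by bin(TimeGenerated, 1h)` in KQL. The timestamps are hypothetical, and the portal performs this aggregation for you.

```python
from collections import Counter
from datetime import datetime, timedelta

def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its bucket (like KQL's bin())."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // size) * size

def count_per_bucket(timestamps, size=timedelta(hours=1)):
    """Count events per time bucket, e.g. messages that passed a queue per hour."""
    counts = Counter(bin_time(ts, size) for ts in timestamps)
    return dict(sorted(counts.items()))

msgs = [datetime(2021, 5, 4, 10, 5), datetime(2021, 5, 4, 10, 40), datetime(2021, 5, 4, 11, 2)]
print(count_per_bucket(msgs))
```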
Extended monitoring is configured in the so-called "Diagnostic Settings". At the resource level, they determine which data is sent to a storage account, a Log Analytics workspace or an Event Hubs resource, depending on the desired retention period.
What kind of data can be collected depends on the resource. A storage account provides additional telemetry about the number of transactions and their source, while a web application firewall can provide detailed insight into recorded requests, matching rules and blocking actions.
Monitoring virtual machines is an important topic in the cloud. Performance counters such as CPU or RAM utilisation are particularly interesting. So are boot diagnostics, crash dumps, Windows event logs and Linux system logs. The latter are very important for further evaluation in security solutions such as Azure Security Center or Azure Sentinel.
Keeping track of the telemetry is not easy here either. There are five different software modules (agents) for collecting system logs, process information or network traffic, and the data they collect can overlap. Often, a Log Analytics workspace is used as the destination.
A Log Analytics workspace is a central data store for log and telemetry data. Depending on the configured sources, the data ends up in different tables, where it remains available for the configured retention period. To use the Log Analytics feature, at least one workspace or one Application Insights instance is required. Data from virtual machines can be sent directly to a workspace or connected via SCOM. It is also common to integrate storage accounts to which virtual machines write performance counters or boot diagnostics.
Another important data source for monitoring is the Application Insights resource. Here, too, data is stored in tables. Because it is designed for web applications, there are tables for requests, exceptions and traces. You can choose between a standalone resource (the "Classic" variant) and the workspace-based variant. Note, however, that the tables and fields are named differently in the two variants. For example, the Classic table requests corresponds to AppRequests in the workspace.
Costs are mainly determined by the volume of ingested data and the retention period. Both a Log Analytics workspace and Application Insights offer an initial 5 GB of free ingestion and 31 or 90 days of free retention, respectively. To prevent costs from silently getting out of control, a daily cap can be set. Alternatively, a storage account can be configured to archive data for longer periods.
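The effect of a daily cap on ingestion costs can be estimated with simple arithmetic. The following Python sketch makes two assumptions. First, that the 5 GB free allowance applies per month. Second, that `PRICE_PER_GB` is a hypothetical placeholder rather than an actual Azure price. It models ingestion only; retention beyond the free period and archiving are not included.

```python
from typing import Iterable, Optional

FREE_GB_PER_MONTH = 5.0  # free ingestion from the text (assumed to be monthly)
PRICE_PER_GB = 2.0       # hypothetical placeholder, not an actual Azure price

def billable_gb(daily_volumes_gb: Iterable[float],
                daily_cap_gb: Optional[float] = None,
                free_gb: float = FREE_GB_PER_MONTH) -> float:
    """Billable ingestion for one month.

    Data arriving after the daily cap has been reached is not ingested, so each
    day's volume is truncated at the cap before the free allowance is subtracted."""
    ingested = sum(v if daily_cap_gb is None else min(v, daily_cap_gb)
                   for v in daily_volumes_gb)
    return max(0.0, ingested - free_gb)

month = [0.5] * 30  # 0.5 GB per day for 30 days = 15 GB
print(billable_gb(month) * PRICE_PER_GB)                     # 20.0
print(billable_gb(month, daily_cap_gb=0.25) * PRICE_PER_GB)  # 5.0
```

The cap is a trade-off: it limits the bill, but any data above the cap is simply not collected for the rest of that day.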
What can be done with the collected data? The term "Log Analytics" plays an essential role in answering this question. Its meaning varies with the context. Here, it refers to the part of the Azure portal where KQL queries are written, run and managed.
KQL stands for Kusto Query Language, which was designed for Azure Data Explorer. It is similar to SQL and optimised for read-only queries over large data sets. KQL is also used for queries in Azure Resource Graph. A KQL query starts with a table (or a join) and is optionally extended ("piped") with conditions and projections (the equivalent of SELECT). The pipe operator (|) passes each intermediate result on to the next step.
For example, the MSSQL query
SELECT operation_Name, type, method
FROM exceptions
WHERE operation_Name = 'Myfunction'
corresponds to the following KQL query on the Application Insights exceptions table:
exceptions
| where operation_Name == "Myfunction"
| project operation_Name, type, method
Azure Web Application Firewall Logs © Thomas Hafermalz
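To illustrate the pipe semantics, the following Python sketch mirrors the example query. Each step receives the previous intermediate result and passes its own result on. The helpers `where`, `project` and `pipe` are hypothetical, and the in-memory rows are made-up sample data, not a real Application Insights export.

```python
from functools import reduce
from typing import Callable, Iterable, List

Row = dict
Step = Callable[[Iterable[Row]], Iterable[Row]]

def where(predicate: Callable[[Row], bool]) -> Step:
    """Keep only rows matching the condition (KQL: | where ...)."""
    return lambda rows: (r for r in rows if predicate(r))

def project(*columns: str) -> Step:
    """Keep only the named columns (KQL: | project ...)."""
    return lambda rows: ({c: r[c] for c in columns} for r in rows)

def pipe(table: Iterable[Row], *steps: Step) -> List[Row]:
    """Apply the steps left to right, like the | operator in KQL."""
    return list(reduce(lambda rows, step: step(rows), steps, table))

# Hypothetical rows of an exceptions table.
exceptions = [
    {"operation_Name": "Myfunction", "type": "System.NullReferenceException",
     "method": "Run", "outerMessage": "Object reference not set"},
    {"operation_Name": "Otherfunction", "type": "System.TimeoutException",
     "method": "Call", "outerMessage": "The operation timed out"},
]

result = pipe(
    exceptions,
    where(lambda r: r["operation_Name"] == "Myfunction"),
    project("operation_Name", "type", "method"),
)
print(result)
```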
Azure Monitor Workbooks can be used to build clear dashboards. Sections can be defined, and the results of KQL queries can be arranged as desired and displayed as tables or charts. Parameters allow the queries to be adjusted, and various filter and sort options are available. Query results can also be linked, so that clicking on a bar in a chart updates a detail table.
Another major benefit is that alerts can be created based on the data. An alert rule consists of three components:
First, the signal must be selected for a resource. This can be a metric, an administrative operation on the resource or a defined KQL query.
Next, the condition is defined, i.e. the threshold for the signal together with the query interval and period. This can be the number of results of a query, or a warning threshold for the CPU load over the past two hours. Note, however, that a query saved in Log Analytics can be used, but it is copied into the alert rule rather than referenced. Later changes to the saved query therefore do not affect the alert. Moreover, the query period configured in the alert already pre-filters the data.
The last step defines the actions, grouped in action groups: who should be notified, and how. Actions can also run code when an alert fires, for example an Azure Function or a runbook. They can also trigger a Logic App or create a ticket in a connected ITSM system.
Action groups and alert rules are hidden resources by default. To move them to another resource group, they must first be made visible using the corresponding checkbox.
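The three components of an alert rule can be pictured as a small data structure. The following Python sketch is a simplified model, not the Azure Monitor API. `AlertRule` and `evaluate` are hypothetical names, and the evaluation frequency (the scheduler that calls `evaluate`) is left out. It shows the signal, a threshold on the average over a lookback window, and actions standing in for an action group.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, List, Tuple

Sample = Tuple[datetime, float]

@dataclass
class AlertRule:
    # 1. Signal: returns (timestamp, value) samples, e.g. CPU percentage.
    signal: Callable[[], List[Sample]]
    # 2. Condition: threshold on the average over a lookback window.
    threshold: float
    window: timedelta
    # 3. Actions: callbacks standing in for an action group (mail, function, ...).
    actions: List[Callable[[str], None]] = field(default_factory=list)

    def evaluate(self, now: datetime) -> bool:
        # Only samples inside the lookback window are considered.
        recent = [v for ts, v in self.signal() if now - self.window <= ts <= now]
        if not recent:
            return False
        avg = mean(recent)
        if avg <= self.threshold:
            return False
        for action in self.actions:
            action(f"Average {avg:.1f} exceeded threshold {self.threshold}")
        return True

now = datetime(2021, 5, 4, 12, 0)
samples = [(now - timedelta(hours=3), 10.0),
           (now - timedelta(minutes=90), 85.0),
           (now - timedelta(minutes=30), 95.0)]
notifications: List[str] = []
rule = AlertRule(signal=lambda: samples, threshold=80.0,
                 window=timedelta(hours=2), actions=[notifications.append])
fired = rule.evaluate(now)
print(fired, notifications)
```

The sample from three hours ago falls outside the two-hour window, so only the values 85 and 95 are averaged. This illustrates how the configured period pre-filters the data.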
The term "Insights" covers a wide range of monitoring options. Application Insights is somewhat older. It is provided as a separate resource and connected to web applications in the cloud and on-premises, or to Azure Functions, for example. It can record exceptions from the code, as well as page views with their loading or response times and error codes. Other features include host diagnostics and the display of request information. Distributed tracing is also used to draw application maps, which trace a user request from the frontend all the way to the database or storage call.
AppInsights Application Map © Microsoft
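An application map is derived from correlation information. Each telemetry item carries its own id and the id of its parent call. The following Python sketch is a simplified model of that idea, loosely based on the Application Insights correlation fields (id, operation_ParentId, cloud_RoleName). The telemetry items and role names are hypothetical. It counts calls between components and traces one request back to the frontend.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical telemetry of one user request.
telemetry = [
    {"id": "a", "parent": None, "role": "frontend"},
    {"id": "b", "parent": "a", "role": "api"},
    {"id": "c", "parent": "b", "role": "sql-database"},
    {"id": "d", "parent": "b", "role": "blob-storage"},
]

def build_app_map(items: List[dict]) -> Counter:
    """Count calls between components: one edge parent_role -> child_role
    for every item whose parent is known."""
    role_by_id: Dict[str, str] = {item["id"]: item["role"] for item in items}
    edges: Counter = Counter()
    for item in items:
        if item["parent"] in role_by_id:
            edges[(role_by_id[item["parent"]], item["role"])] += 1
    return edges

def trace_path(items: List[dict], leaf_id: str) -> List[str]:
    """Follow the parent links from one call back to the root request."""
    by_id = {item["id"]: item for item in items}
    path = []
    current = by_id.get(leaf_id)
    while current is not None:
        path.append(current["role"])
        current = by_id.get(current["parent"])
    return list(reversed(path))

edges = build_app_map(telemetry)
print(edges)
print(trace_path(telemetry, "c"))  # ['frontend', 'api', 'sql-database']
```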
Various SDKs can be used to track custom business events and metrics. This enables advanced features such as showing how many users return, and configuring funnels that measure the success rate of a user journey. Funnels are useful for an online shop, for example.
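A funnel counts how many users reach each step of a journey in order. The following Python sketch is a simplified model of that calculation. It assumes the custom events have already been collected per user in chronological order, for example events sent via the SDK's TrackEvent call. The event names and users are hypothetical, and steps must occur in order but need not be consecutive.

```python
from typing import Dict, List, Sequence

def funnel(events: Dict[str, List[str]], steps: Sequence[str]) -> List[int]:
    """For each step, count the users who performed it after all previous steps."""
    reached = [0] * len(steps)
    for user_events in events.values():
        position = 0  # index of the next step this user still has to reach
        for name in user_events:
            if position < len(steps) and name == steps[position]:
                position += 1
        for i in range(position):
            reached[i] += 1
    return reached

steps = ["ViewProduct", "AddToCart", "Checkout", "Purchase"]
events = {
    "u1": ["ViewProduct", "AddToCart", "Checkout", "Purchase"],
    "u2": ["ViewProduct", "AddToCart"],
    "u3": ["ViewProduct", "Search", "ViewProduct"],
    "u4": ["AddToCart"],
}
reached = funnel(events, steps)
rates = [r / reached[0] for r in reached]  # success rate relative to the first step
print(reached)  # [3, 2, 1, 1]
```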
For various other resources, further Insights are available automatically. Some of them require a little configuration and a Log Analytics workspace, such as Container Insights for monitoring an AKS cluster, or VM Insights. These solutions are built around workbooks that use metrics and logs to simplify resource analysis. Storage Insights, for instance, gives a quick overview of transactions, storage utilisation and access latency for all storage accounts in the tenant. Its Key Vault counterpart shows, among other things, request volumes and error rates. This also makes these two Insights useful when weighing up whether to enable the additional Azure Defender protection for these resources, because the resulting costs can be estimated quickly and easily.
The unification of monitoring options for telemetry data from the Azure cloud and on-premises environments in Azure Monitor is progressing. Resources are being merged or simplified, and useful extensions in the Insights area are on the way. It will be exciting to see where the journey leads.