<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Untitled Publication]]></title><description><![CDATA[I have over 34 years in the world of RDBMS.  I have in-depth knowledge of Microsoft SQL Server and the Azure platform (Data Factories, Logic Apps, Functions, Ev]]></description><link>https://josefrichberg.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 19 Apr 2026 20:34:20 GMT</lastBuildDate><atom:link href="https://josefrichberg.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Analysis Services models in Azure Fabric -- METADATA$ROW_ID gotcha]]></title><description><![CDATA[For those of you entering the Microsoft Fabric foray, there is a big caveat you must be aware of.
If you are using Fabric Mirroring or Delta Lake tables you will see an additional column in your data source: METADATA$ROW_ID. This column is added auto...]]></description><link>https://josefrichberg.com/analysis-services-models-in-azure-fabric-metadatarowid-gotcha</link><guid isPermaLink="true">https://josefrichberg.com/analysis-services-models-in-azure-fabric-metadatarowid-gotcha</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[microsoftfabric]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 08 Sep 2025 19:35:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/aZt4hpRwqaE/upload/15a2c325178b9beadd2bd02ec85eaf28.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For those of you entering the Microsoft Fabric foray, there is a big caveat you must be aware of.</p>
<p>If you are using Fabric Mirroring or Delta Lake tables, you will see an additional column in your data source: <strong>METADATA$ROW_ID</strong>. This column is added automatically by Fabric to manage versioning, change data capture, and transactional consistency. What exactly is this column, and why do you need to care?</p>
<p>The column is, for all intents and purposes, a SQL GUID: a 40-character unique identifier. Because every value is unique, it is effectively uncompressible, and that is the problem. Analysis Services models within Fabric are limited by your SKU, so space is at a premium. We are running an F64, which limits our model size to 25GB.</p>
<p>When you create a model, it does a <strong>select * from <em>&lt;source&gt;</em></strong>, which was never an issue until we started using a Mirrored source. That <strong>METADATA$ROW_ID</strong> snuck into our model and caused the size to explode. Our complete 5-year model normally takes up about 6GB of space, but with this column included it blew up after only about 18 months of data. Digging into the models from both Azure Analysis Services and Fabric Analysis Services, we found that sneaky ID.</p>
<p>The correction is easy now that we have found the issue, so I am passing it on here in hopes of sparing others the headaches that come from details buried in specific implementations.</p>
<p>Go into your semantic model through DAX Studio and manually remove the <strong>METADATA$ROW_ID</strong> column. Do this after you have created your model, as it is simpler to remove the column than to modify the build process to exclude it.</p>
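<p>If you want to confirm which column is inflating the model before removing it, a DMV query run from DAX Studio against the semantic model can help. This is a minimal sketch; the DMV returns one row per column, and you are looking for a large dictionary on <strong>METADATA$ROW_ID</strong>:</p>
<pre><code class="lang-sql">-- Run from DAX Studio, connected to the semantic model.
-- Inspect DICTIONARY_SIZE (bytes) per column; METADATA$ROW_ID should stand out.
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
  FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
</code></pre>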
]]></content:encoded></item><item><title><![CDATA[Permissions required for user defined table types in Microsoft SQL Server]]></title><description><![CDATA[This has bitten me so many times, I’m putting this here where I can find it in the future. You might want to bookmark this page :)
I rely heavily on stored procedures for interaction both internally and externally. The improve performance and efficie...]]></description><link>https://josefrichberg.com/permissions-required-for-user-defined-table-types-in-microsoft-sql-server</link><guid isPermaLink="true">https://josefrichberg.com/permissions-required-for-user-defined-table-types-in-microsoft-sql-server</guid><category><![CDATA[#user-defined-table-types]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[permissions]]></category><category><![CDATA[Microsoft SQL server]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 25 Jun 2025 15:49:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7GfRwb78YWs/upload/b43c5044235e3572d124a99e286114ed.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This has bitten me so many times, I’m putting this here where I can find it in the future. You might want to bookmark this page :)</p>
<p>I rely heavily on stored procedures for interaction both internally and externally. To improve performance and efficiency, I’ve created many table types, most notably one for ISBNs. It looks like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">type</span> dbo.ISBN <span class="hljs-keyword">as</span> <span class="hljs-keyword">table</span>(
    ISBN <span class="hljs-built_in">char</span>(<span class="hljs-number">13</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
    primary <span class="hljs-keyword">key</span> clustered (ISBN <span class="hljs-keyword">asc</span>)
) <span class="hljs-keyword">with</span> (ignore_dupe_Key=<span class="hljs-keyword">off</span>))
</code></pre>
<p>Let’s use this in a sample procedure</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> inventory.CheckISBNQty
(@ISBNs dbo.ISBN readonly)
<span class="hljs-keyword">with</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">as</span> owner
<span class="hljs-keyword">as</span>
.....
</code></pre>
<p>We have a function app, <strong>[func-InventoryProcess-prod]</strong>, that will be calling this procedure, so naturally you would grant it the ability to execute the procedure.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">grant</span> exec <span class="hljs-keyword">on</span> inventory.CheckISBNQty <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
</code></pre>
<p>When the function app tries to run the stored procedure, you will get an error that you cannot execute <strong>dbo.ISBN</strong>. To solve this problem, you need to grant execute permission on the table type to the app.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">grant</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">on</span> <span class="hljs-keyword">type</span>::dbo.ISBN <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
<span class="hljs-keyword">or</span>
<span class="hljs-keyword">grant</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">on</span> <span class="hljs-keyword">type</span>::[dbo].[ISBN] <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
</code></pre>
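<p>For completeness, here is a small, hedged sketch of exercising the table type from T-SQL once the grants are in place; the procedure body and ISBN values are illustrative:</p>
<pre><code class="lang-sql">-- Build a table-valued parameter and pass it to the procedure.
declare @list dbo.ISBN;

insert into @list (ISBN)
values ('9780306406157'), ('9781861003119');

exec inventory.CheckISBNQty @ISBNs = @list;
</code></pre>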
]]></content:encoded></item><item><title><![CDATA[Azure Data Factory utcNow() nuances]]></title><description><![CDATA[For those of you who work with Azure Data Factories I thought I’d help you out with, what I would consider a bug in how pipelines work. For the record, I work with pipelines on almost a daily basis, but I am generally pushing data into Microsoft SQL ...]]></description><link>https://josefrichberg.com/azure-data-factory-utcnow-nuances</link><guid isPermaLink="true">https://josefrichberg.com/azure-data-factory-utcnow-nuances</guid><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[copy activity]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 03 Jun 2025 17:24:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/BXOXnQ26B7o/upload/fb8c6fe0729523633439fad6ae4b2d38.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For those of you who work with Azure Data Factories I thought I’d help you out with, what I would consider a bug in how pipelines work. For the record, I work with pipelines on almost a daily basis, but I am generally pushing data into Microsoft SQL Server. In this specific instance, I am pushing data into Snowflake and that data includes dates.</p>
<p>Microsoft SQL Server is very lenient when it comes to the format of a date it will be ingesting. It can be:</p>
<ul>
<li><p>‘6/3/2025’</p>
</li>
<li><p>‘6/03/2025’</p>
</li>
<li><p>‘06/03/2025’</p>
</li>
<li><p>‘2025-6-3’</p>
</li>
<li><p>‘2025-06-3’</p>
</li>
</ul>
<p>The list goes on. Snowflake, however, is very specific: ‘YYYY-MM-DD’.</p>
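<p>To see the difference concretely, here is a hedged T-SQL sketch; the loose formats parse in SQL Server (subject to your DATEFORMAT/language settings), and style 23 of <strong>convert</strong> produces the ISO form Snowflake expects:</p>
<pre><code class="lang-sql">-- All of these parse in SQL Server under a typical US-English date setting.
select convert(date, '6/3/2025'),
       convert(date, '06/03/2025'),
       convert(date, '2025-6-3');

-- Producing the yyyy-MM-dd form Snowflake requires.
select convert(char(10), getutcdate(), 23);
</code></pre>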
<p>The pipeline I was building needed to add a FILEDATE column to the .csv file it was creating that would be ingested by Snowflake. I added the column into the source of the <strong>Copy data</strong> activity as follows:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748970635045/ccd2e07e-6839-4931-a0bc-da27546d5010.png" alt class="image--center mx-auto" /></p>
<p>The expected result would be: “2025-06-03T13:11:00” as per the documentation of the <strong>utcNow()</strong> function.</p>
<p>The result provided:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748970736912/6105920d-7d7f-43c9-9368-243256c36237.png" alt class="image--center mx-auto" /></p>
<p>As I mentioned before, this would be an acceptable datetime in Microsoft SQL Server, so it was never an issue. On ingestion to Snowflake, this blew up. I tried the formatDateTime function, I tried <strong>utcNow(‘</strong>yyyy-mm-ddTHH:mm:ss’) and numerous variations of that, all to no avail.</p>
<p>If, however, you put the same function call in a variable:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748971069386/5ed8c614-917e-42f1-9744-cee1dbb6f941.png" alt class="image--center mx-auto" /></p>
<p>You get the correct result:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748971090403/cb4bbd66-d3d0-4c08-8a26-96addf2ff59d.png" alt class="image--center mx-auto" /></p>
<p>Then simply use the variable instead of the function call when adding the column to your source.</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft SQL Server permission chaining]]></title><description><![CDATA[Many times, views are used as a security object; Granting select on a given view instead of the underlying table(s). Now if this view happens to cross schemas you might get an error saying the user does not have select permission on the underlying ta...]]></description><link>https://josefrichberg.com/microsoft-sql-server-permission-chaining</link><guid isPermaLink="true">https://josefrichberg.com/microsoft-sql-server-permission-chaining</guid><category><![CDATA[Microsoft SQL server]]></category><category><![CDATA[permissions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 02 Jun 2025 05:46:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9ycXTLGNMro/upload/0ac31699b4fa8e0bf12e30cb5e78cbbf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many times, views are used as a security object; Granting <strong>select</strong> on a given view instead of the underlying table(s). Now if this view happens to cross schemas you might get an error saying the user does not have select permission on the underlying table(s). As an example, a user might be restricted to the <strong><em>customer</em></strong> schema but need to view information from the <strong><em>inventory</em></strong> schema.</p>
<p>One might naturally just give select permission on that specific table in the <strong><em>inventory</em></strong> schema. If you did this, you would weaken the security benefit of hiding the underlying schema/objects. The correct method is simply to make sure both schemas are ‘owned’ by the same user. We have everything owned by <strong>dbo</strong>, so you need to run this on all schemas involved.</p>
<p>Instead run this command on both the <strong><em>inventory</em></strong> and <strong><em>customer</em></strong> schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> authorization <span class="hljs-keyword">on</span> <span class="hljs-keyword">schema</span>::[inventory] <span class="hljs-keyword">to</span> dbo;
<span class="hljs-keyword">alter</span> authorization <span class="hljs-keyword">on</span> <span class="hljs-keyword">schema</span>::[customer] <span class="hljs-keyword">to</span> dbo;
</code></pre>
<p>You preserve the integrity of your security model, while being able to properly isolate data via schemas.</p>
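<p>As a hedged end-to-end sketch (object and user names are illustrative), once both schemas share an owner, the single grant on the view is all the caller needs:</p>
<pre><code class="lang-sql">create view customer.vCustomerInventory
as
select c.CustomerId, i.ISBN, i.OnHandQty
  from customer.Customers c
  join inventory.Stock i on i.CustomerId = c.CustomerId;
go

grant select on customer.vCustomerInventory to [ReportingUser];
-- Because both schemas are owned by dbo, ownership chaining lets
-- [ReportingUser] query the view without any grant on inventory.Stock.
</code></pre>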
]]></content:encoded></item><item><title><![CDATA[Using a stored procedure as a source for PowerBI Paginated reports]]></title><description><![CDATA[If the reports that you are building do not need to be interactive, you can use Microsoft Paginated Reports (or SSRS). One of the benefits of a paginated report is that it’s source(s) can be stored procedures. I really, really like that option.In Mic...]]></description><link>https://josefrichberg.com/using-a-stored-procedure-as-a-source-for-powerbi-paginated-reports</link><guid isPermaLink="true">https://josefrichberg.com/using-a-stored-procedure-as-a-source-for-powerbi-paginated-reports</guid><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><category><![CDATA[ssrs report builder]]></category><category><![CDATA[power bi report bilder]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 22 May 2025 17:33:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/XrIfY_4cK1w/upload/2735ecb2f489c4f578db963e6cc77954.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If the reports that you are building do not need to be interactive, you can use Microsoft Paginated Reports (or SSRS). One of the benefits of a paginated report is that its source(s) can be stored procedures. I really, really like that option.<br />In Microsoft SQL Server stored procedures are an incredible programming tool. They have numerous benefits, which I won’t get into here, but suffice it to say I use stored procedures wherever possible. You can have complex logic executed more efficiently than, say, the handful of CTEs that would be required if you needed to do this in a single view. That result is generally stored in the procedure in a temporary table, which needs to be selected back at the end of the procedure. The use of a temporary table requires a slight modification to your procedure, or the report builder will not recognize the output.</p>
<p>At the beginning of your procedure (I usually put it as the first line of code after my comments) you need to put the following:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SET</span> FMTONLY <span class="hljs-keyword">OFF</span>
</code></pre>
<p>Now if you look in the Microsoft documentation, it tells you that the FMTONLY flag is deprecated and gives you a handful of alternatives. I asked Copilot how I would use these specifically for Report Builder; it suggested not using select into #table and either explicitly declaring that temp table or using table variables (which are explicitly declared), but when pressed it said that is not foolproof and the suggested method is still the FMTONLY flag.</p>
<p>I will do some additional research myself into declaring # tables ahead of time and the use of table variables, but for now I am comfortable using this. I don’t foresee Microsoft scrapping Report Builder anytime soon.</p>
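<p>A minimal sketch of the overall pattern, with illustrative names and logic, looks like this:</p>
<pre><code class="lang-sql">create procedure reporting.SalesByMonth
(@Year int)
as
begin
    -- Lets Report Builder discover the result-set shape even though a temp table is used.
    SET FMTONLY OFF;

    select ISBN,
           datepart(month, SaleDate) as SaleMonth,
           sum(Qty) as TotalQty
      into #monthly
      from sales.DailySales
     where datepart(year, SaleDate) = @Year
     group by ISBN, datepart(month, SaleDate);

    -- The final select is what the paginated report binds to.
    select ISBN, SaleMonth, TotalQty
      from #monthly;
end
</code></pre>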
]]></content:encoded></item><item><title><![CDATA[How to Skip Rows in Azure Data Factory Pipeline's Excel Source]]></title><description><![CDATA[I was building a data factory source the other day and I needed to skip the first 5 rows. This is intuitive and easy within a text (csv) source as there is a setting to skip the first N rows. In excel there is a 'Range' option which has as an example...]]></description><link>https://josefrichberg.com/how-to-skip-rows-in-azure-data-factory-pipelines-excel-source</link><guid isPermaLink="true">https://josefrichberg.com/how-to-skip-rows-in-azure-data-factory-pipelines-excel-source</guid><category><![CDATA[data factory]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Fri, 19 Jul 2024 14:49:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GauA0hiEwDk/upload/485f23630942ddc38c040346a6e5b3f0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[
<p>I was building a data factory source the other day and I needed to skip the first 5 rows. This is intuitive and easy within a text (csv) source as there is a setting to skip the first N rows. In Excel there is a 'Range' option which has as an example 'A1:B10'.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721400085358/4d8af58d-b0ea-4e02-a19d-782fcbc2bf17.png" alt class="image--center mx-auto" /></p>
<p>The example implies that you need to specify the entire area of the Excel sheet that you want to import. I simply put my starting point (5 rows down) and sure enough it took the entire sheet, all columns and all rows, just skipping down 5 rows.</p>
<p>I know this is an incredibly small post and you might be wondering, "why post this?". I use this blog for 2 reasons. The first is that I have an online, universally accessible set of notes. I routinely go to code something and say, "where did I put that formula, or code, or snippet, etc.", and rather than trawling through <strong>sys.sql_modules</strong> or trying to remember the pipeline, I just come here.</p>
<p>The second reason is to share what I believe are useful snippets that can help others out.</p>
]]></content:encoded></item><item><title><![CDATA[Why Bit Masks?]]></title><description><![CDATA[Publishing is one of those industries where the possibilities are endless. This translates into 'endless combinations' as well. To sell a book you need 4 pieces of information. Those are:

Customer

Book

Format


Sales Rep


Now in the fictional Pri...]]></description><link>https://josefrichberg.com/why-bit-masks</link><guid isPermaLink="true">https://josefrichberg.com/why-bit-masks</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 03 Jul 2024 05:52:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/03r2PBffuCk/upload/c650a7b9747c39e29d8904d0600cd711.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Publishing is one of those industries where the possibilities are endless. This translates into 'endless combinations' as well. To sell a book you need 4 pieces of information. Those are:</p>
<ol>
<li><p>Customer</p>
</li>
<li><p>Book</p>
<ol>
<li><em>Format</em></li>
</ol>
</li>
<li><p>Sales Rep</p>
</li>
</ol>
<p>Now in the fictional <strong>PrintItNow</strong> publishing company here are some base stats</p>
<ul>
<li><p>Number of Customers: 20,000</p>
</li>
<li><p>Number of Titles: 100,000</p>
<ul>
<li>Number of Formats: 5</li>
</ul>
</li>
<li><p>Number of Reps: 1,000</p>
</li>
</ul>
<p>The rules are as follows:</p>
<ul>
<li><p>A rep can be assigned the following:</p>
<ul>
<li><p>1 or all customers</p>
</li>
<li><p>1 or all titles</p>
</li>
<li><p>1 or all formats</p>
</li>
</ul>
</li>
<li><p>Two reps cannot sell the same book+format to the same customer. That means I can sell the hardcover and you could sell the paperback to the same client.</p>
</li>
<li><p><strong>Reps can be realigned at any time across all formats, books, and customers. This realignment must be retroactively applied to all previous sales.</strong> -- GOTCHA!</p>
</li>
</ul>
<p>To put that into perspective, if I am allowed to sell all of the titles, but hardcover only, to all customers, I have a 'territory' of 20,000*100,000 = 2 billion entries in my sales table. How often do reps get realigned? The two most common times are when a new rep gets hired and takes over a portion of territory from 1 or more other reps, or when a rep leaves and that territory needs to be spread out over other reps. No matter the case, all the sales entries (which could be billions) need to be changed. Adding to that, every day you sell more books, so the universe of sales grows, and you publish new books, so the territory grows. I had tried to tackle this problem a few years ago with the use of bit masks, but the functions were not up to the task. With SQL Server 2022, Microsoft released two new functions: <strong>SET_BIT</strong> and <strong>GET_BIT</strong>.</p>
<p>Now that we have the background of our publishing company, let's get into what a bit mask is.</p>
<p>First, we need a table to manage the reps themselves.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> SalesRep
(RepName <span class="hljs-built_in">varchar</span>(<span class="hljs-number">256</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
RepId <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>This gives me plenty of space for reps (just over 2 billion for a 4-byte int)</p>
<p>Now you need to have a table that would manage the sales territory</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> BookRep
(RepId <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
Book <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>Each row is concise (12 bytes: a 4-byte int plus an 8-byte bigint), but you could have tens of millions of rows or more based upon the combination of which books a rep can sell. For this example, we have every rep able to sell every book and divide it up by customers. In that case it would be 1,000*100,000 = 100 million rows, and it will grow for every rep you hire and every book you publish. Here's a better way.</p>
<h3 id="heading-bit-masks">Bit Masks</h3>
<p>A bit mask works much like the lockers we had in school. Imagine a set of lockers numbered 0-7. Now let's take students in alphabetical order by first name: Alice, Bob, Jessica, and Joshua. It is important that you pick some ordering system for your data to ensure a consistent mapping.</p>
<p>Now let's assign these students to lockers, starting at 0 for the first student, 1 for the second and so on: <em>(MaskPosition = Locker #)</em></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>RepName</td><td>EmployeeId</td><td>MaskPosition</td></tr>
</thead>
<tbody>
<tr>
<td>Alice</td><td>S123</td><td>0</td></tr>
<tr>
<td>Bob</td><td>S252</td><td>1</td></tr>
<tr>
<td>Jessica</td><td>S110</td><td>2</td></tr>
<tr>
<td>Joshua</td><td>S871</td><td>3</td></tr>
</tbody>
</table>
</div><p>This would be represented in the code below</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> SalesRep
(RepName <span class="hljs-built_in">varchar</span>(<span class="hljs-number">256</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
EmployeeId <span class="hljs-built_in">char</span>(<span class="hljs-number">4</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
MaskPosition <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>) <span class="hljs-comment">-- new column to represent the assigned position</span>
</code></pre>
<p>Now we will need a data type to represent the lockers. If you haven't already guessed, you need a binary data type and you will need 1 byte for every 8, or fraction of 8, sales reps (lockers) you will employ. In our example, we need a <strong>binary(1)</strong>.</p>
<p>The table would look like this.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> BookRepMap
(Book <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>, <span class="hljs-comment">--ISBN13 can be represented as a bigint </span>
 RepMap <span class="hljs-built_in">binary</span>(<span class="hljs-number">1</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>This will only ever add a record for each Book, and you add 1 byte to the RepMap column for every 8 reps (starting with the first). Now before we end the first article in the series, let me explain just how powerful this is.</p>
<p>If we take 8,000 bytes as the maximum row and subtract 8 bytes for the bigint, that leaves us with 7,992 bytes for the RepMap column. You get 8 positions for each byte: 7,992*8 = 63,936 total reps (positions) for a single row!</p>
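<p>As a quick, hedged illustration of the two SQL Server 2022 functions mentioned earlier (the mask and positions are made up), <strong>SET_BIT</strong> turns a position on and <strong>GET_BIT</strong> reads it back:</p>
<pre><code class="lang-sql">declare @RepMap binary(1) = 0x00;

-- Turn on positions 0 and 3 (two hypothetical reps assigned to this book).
set @RepMap = set_bit(@RepMap, 0);
set @RepMap = set_bit(@RepMap, 3);

select get_bit(@RepMap, 0) as Position0,  -- 1
       get_bit(@RepMap, 1) as Position1,  -- 0
       get_bit(@RepMap, 3) as Position3;  -- 1
</code></pre>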
<p>In the next installment of this series, I will show the code used to fill this table.</p>
]]></content:encoded></item><item><title><![CDATA[Prevent SQL Injection Attacks Using SQL Server Stored Procedures]]></title><description><![CDATA[SQL injection is a very serious topic and there are numerous libraries and best practices to help you secure your connection between your app and your database. To reduce the ability of bad actors to take over your SQL requests you can use Stored Pro...]]></description><link>https://josefrichberg.com/prevent-sql-injection-attacks-using-sql-server-stored-procedures</link><guid isPermaLink="true">https://josefrichberg.com/prevent-sql-injection-attacks-using-sql-server-stored-procedures</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[stored procedure]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[SQL Injection]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 04 Jun 2024 21:25:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/_ts3NfjvaXo/upload/7c29c878c2a06ade2920a41faea1267e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>SQL injection is a very serious topic and there are numerous libraries and best practices to help you secure your connection between your app and your database. To reduce the ability of bad actors to take over your SQL requests you can use Stored Procedures (think of it as server-side code) vs client-side code. They are equivalent to function calls within other programming languages.</p>
<h3 id="heading-anatomy-of-a-stored-procedure">Anatomy of a Stored Procedure</h3>
<p>Similar to function calls, a stored procedure has a name and 0 or more parameters. The parameters are actually placeholder variables, so they are required to begin with an at sign (<strong>@</strong>). Like variables, they are also required to have their datatype defined. Let's build a very simple stored procedure that takes an ISBN and returns the author of the book.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> FindAuthorISBN13
(
@ISBN <span class="hljs-built_in">bigint</span>
)
<span class="hljs-keyword">as</span>
<span class="hljs-keyword">begin</span>
<span class="hljs-comment">--fill in with code</span>
<span class="hljs-keyword">end</span>
</code></pre>
<p>This is how you call it.</p>
<pre><code class="lang-sql">
<span class="hljs-comment">--call by parameter. This is the preferred method</span>
exec FindAuthorISBN13 @ISBN=9781254872103
<span class="hljs-comment">--call by position</span>
exec FindAuthorISBN13 9781254872103
<span class="hljs-comment">--implicit conversion can be called by position or parameter</span>
exec FindAuthorISBN13 '9781254872103'
</code></pre>
<p>Now if you try to call it with something other than a valid big integer, you will get an error. That is the first line of defense.</p>
<p>Now for the guts of the procedure:</p>
<pre><code class="lang-sql"> <span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> FindAuthorISBN13
(
@ISBN <span class="hljs-built_in">bigint</span>
)
<span class="hljs-keyword">with</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">as</span> owner
<span class="hljs-keyword">as</span>
<span class="hljs-keyword">begin</span>
 <span class="hljs-keyword">select</span> AuthorName <span class="hljs-keyword">as</span> [Author <span class="hljs-keyword">Name</span>]
   <span class="hljs-keyword">from</span> TitleMetadata
  <span class="hljs-keyword">where</span> ISBN13=@ISBN
<span class="hljs-keyword">end</span>
</code></pre>
<p>As you can see you are using the variable to test for a value against a column. You aren't passing in a SQL statement to be run.</p>
<p>Now from C# this is how you would make the call:</p>
<pre><code class="lang-csharp">....
<span class="hljs-keyword">var</span> cmd = <span class="hljs-keyword">new</span> SQLCommand(Proc, conn)
cmd.Parameters.AddWithValue(<span class="hljs-string">"@ISBN"</span>,_isbn);
cmd.CommandType = CommandType.StoredProcedure;
<span class="hljs-keyword">using</span> <span class="hljs-keyword">var</span> reader = cmd.ExecuteReader();
reader.Read();
result=reader.GetFieldValue&lt;<span class="hljs-keyword">string</span>&gt;(<span class="hljs-number">0</span>);
......
</code></pre>
<p>There is no way a malicious actor can gain access to anything other than the result programmed in the procedure.</p>
]]></content:encoded></item><item><title><![CDATA[Extracting Azure Analysis Services Metrics from your Log Analytics workspace into PowerBI]]></title><description><![CDATA[We have numerous Azure Analysis Services servers running in the cloud and needed a way to have near-real time monitoring of the queries being run. This KQL query will enable you to pull all the user/query information necessary to manage these servers...]]></description><link>https://josefrichberg.com/extracting-azure-analysis-services-metrics-from-your-log-analytics-workspace-into-powerbi</link><guid isPermaLink="true">https://josefrichberg.com/extracting-azure-analysis-services-metrics-from-your-log-analytics-workspace-into-powerbi</guid><category><![CDATA[PowerBI]]></category><category><![CDATA[KQL]]></category><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Azure Log Analytics]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 06 Feb 2024 16:48:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Wpnoqo2plFA/upload/214dce3673bd8afcf9d53db40c304686.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have numerous Azure Analysis Services servers running in the cloud and needed a way to have near-real time monitoring of the queries being run. This KQL query will enable you to pull all the user/query information necessary to manage these servers, into a PowerBI report.</p>
<blockquote>
<p>/* The exported Power Query Formula Language (M Language ) can be used with Power Query in Excel and Power BI Desktop. For Power BI Desktop follow the instructions below:</p>
<ol>
<li><p>Download Power BI Desktop from <a target="_blank" href="https://powerbi.microsoft.com/desktop/">https://powerbi.microsoft.com/desktop/</a></p>
</li>
<li><p>In Power BI Desktop select: 'Get Data' -&gt; 'Blank Query'-&gt;'Advanced Query Editor'</p>
</li>
<li><p>Paste the M Language script into the Advanced Query Editor and select 'Done' */</p>
</li>
</ol>
<p>let AnalyticsQuery = let Source = Json.Document(Web.Contents("<a target="_blank" href="https://api.loganalytics.io/v1/subscriptions//query">https://api.loganalytics.io/v1/subscriptions/</a><strong><mark>your subscription here</mark></strong><a target="_blank" href="https://api.loganalytics.io/v1/subscriptions//query">/query</a>", [Query=[#"query"="AzureDiagnostics | where ResourceProvider == 'MICROSOFT.ANALYSISSERVICES' and Resource in ('<em><mark>server1','server2')</mark></em> and OperationName =='QueryEnd' and NTDomainName_s == 'AzureAD' | order by StartTime_t",#"x-ms-app"="AzureFirstPBI",#"timespan"="PT1H",#"scope"="hierarchy",#"prefer"="ai.response-thinning=true"],Timeout=#duration(0,0,4,0)])), TypeMap = #table( { "AnalyticsTypes", "Type" }, { { "string", Text.Type }, { "int", Int32.Type }, { "long", Int64.Type }, { "real", Double.Type }, { "timespan", Duration.Type }, { "datetime", DateTimeZone.Type }, { "bool", Logical.Type }, { "guid", Text.Type }, { "dynamic", Text.Type } }), DataTable = Source[tables]{0}, Columns = Table.FromRecords(DataTable[columns]), ColumnsWithType = Table.Join(Columns, {"type"}, TypeMap , {"AnalyticsTypes"}), Rows = Table.FromRows(DataTable[rows], Columns[name]), Table = Table.TransformColumnTypes(Rows, Table.ToList(ColumnsWithType, (c) =&gt; { c{0}, c{3}})) in Table in AnalyticsQuery</p>
</blockquote>
<p>The highlighted section requires you to input your values. In the above example, this will monitor 2 servers. You can add more or split this into individual servers/query.</p>
]]></content:encoded></item><item><title><![CDATA[About those Read-Only Replicas]]></title><description><![CDATA[In a post, I wrote last month I showed how you can take advantage of your replicas in SQL Sever by adding a single attribute to your connection string in .Net. Turns out you need to be aware of a caveat that I found yesterday while tracking down an o...]]></description><link>https://josefrichberg.com/about-those-read-only-replicas</link><guid isPermaLink="true">https://josefrichberg.com/about-those-read-only-replicas</guid><category><![CDATA[.NET]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[replication]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 26 Oct 2023 13:11:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GGewLGcQD-I/upload/9d527c4530184dbc3b1acbdf41b2a2d2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a <a target="_blank" href="https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application">post</a>, I wrote last month I showed how you can take advantage of your replicas in SQL Sever by adding a single attribute to your connection string in .Net. Turns out you need to be aware of a caveat that I found yesterday while tracking down an odd bug.</p>
<p>The application in question manages lists of books. The workflow is as follows:</p>
<ol>
<li><p>Select 1 or more ISBNs from the list of books we publish (over 1m)</p>
</li>
<li><p>Assign this collection a name and a catalog (collection of lists)</p>
</li>
<li><p>Save the collection</p>
</li>
<li><p>Go back to the main menu with the names and metadata of all your catalogs</p>
</li>
</ol>
<p>The issue happens between 3 and 4. This system keeps metadata about each list and metadata about the folder (number of total lists, number of total items, last modification). It needs to make a series of calls. It can't just save the list and return you to the main screen without first making a call to the system to populate that screen with the latest information (which includes audit information). This means a call to <strong>SaveList</strong> and a call to <strong>GetCatalogs</strong>. Developers noted that 70% of the time the main menu was missing the newly saved list, but if they did a refresh it appeared. We could not reproduce this by hand (through Postman).</p>
<p>We are running the database on an Azure Hyperscale with RCSI (Read Committed Snapshot Isolation), so the first thought was there are independent threads making the calls and the <strong>GetCatalogs</strong> is returning the previously committed set. However, the development team confirmed that it was a single chain of events through a single thread with an average time between calls of 300ms.</p>
<p>All calls in this application are controlled via stored procedures, so we can guarantee the order of operations, and within the <strong>SaveList</strong> procedure there is a commit before the result set is returned, guaranteeing the data is in the database before the thread gets the go-ahead to move to the next call. At this point I realized it was the replica.</p>
<p>300ms is not enough time for the data to be moved to the replica, so I simply removed the <strong>ApplicationIntent=ReadOnly</strong> and the bug disappeared.</p>
<p>A word of caution when designing an application: the flow of the application is important to understand while making architectural decisions. In this specific example, the need to pull the data back rather than having the application carry it means the timing of when the data is where (primary vs replica) is very important. If the application did a few other things and the average time between the save and the call to <strong>GetCatalog</strong> was, say, 750ms or 1s, the replica might have been just fine.</p>
]]></content:encoded></item><item><title><![CDATA[Quick Tip: Taking advantage of Read-Only SQL Server replicas in your C# application]]></title><description><![CDATA[If you happen to be using C# there is a very easy way to set your application up to take advantage of Read-Only replicas.
Using this connection string: (Some portions were left out for brevity)
"Server=tcp:<servername,port>;Initial Catalog=<database ...]]></description><link>https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application</link><guid isPermaLink="true">https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application</guid><category><![CDATA[C#]]></category><category><![CDATA[SQL Server]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 18 Sep 2023 14:40:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cvBBO4PzWPg/upload/b75b779f325cedea703ace0ba315a897.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you happen to be using C# there is a very easy way to set your application up to take advantage of Read-Only replicas.</p>
<p>Using this connection string: <strong><em>(Some portions were left out for brevity)</em></strong></p>
<pre><code class="lang-csharp"><span class="hljs-string">"Server=tcp:&lt;servername,port&gt;;Initial Catalog=&lt;database name&gt;;ApplicationIntent=ReadOnly;"</span>
</code></pre>
<p>The option <strong>ApplicationIntent=ReadOnly</strong> tells the system to first look for a replica when asking for a connection, but if there is no replica, then connect to the existing <strong>primary</strong>(read/write) server.</p>
<p>I have 2 connection strings in my APIs, one that includes that option and one that doesn't. This helps balance the calls between all the available backend resources without doing any special work within the calls themselves.</p>
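<p>If you want to confirm where a given connection actually landed, a quick check you can run over that connection (a small sketch) is:</p>
<pre><code class="lang-sql">-- Returns READ_ONLY when the connection was routed to a readable replica,
-- READ_WRITE when it landed on the primary.
select databasepropertyex(db_name(), 'Updateability');
</code></pre>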
]]></content:encoded></item><item><title><![CDATA[How to curb aggressive parallelism in Microsoft SQL Server]]></title><description><![CDATA[Microsoft SQL Server, like most modern database systems, can convert a query into a set of parallel instructions to improve efficiency. This is map-reduce before map-reduce was a popular programming paradigm (think Hadoop). This is done by the optimi...]]></description><link>https://josefrichberg.com/how-to-curb-aggressive-parallelism-in-microsoft-sql-server</link><guid isPermaLink="true">https://josefrichberg.com/how-to-curb-aggressive-parallelism-in-microsoft-sql-server</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[T-SQL]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 06 Sep 2023 22:36:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/2OuTr9_VaUg/upload/f07b795a4ff2fa014d5394cb000161ff.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Microsoft SQL Server, like most modern database systems, can convert a query into a set of parallel instructions to improve efficiency. This is <strong>map-reduce</strong> before <strong>map-reduce</strong> was a popular programming paradigm (think Hadoop). This is done by the optimizer based on numerous information points that the system has access to at runtime. 99% of the time, this is perfectly fine, but when you find that 1% it can be very tricky to solve. In this article, I will show you one technique that I use to solve these edge cases.</p>
<p>This is the query I am working with:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>,<span class="hljs-string">'Lowry,Amanda'</span>))
</code></pre>
<p>The optimizer chose to parallelize this and we can see it here</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694017173106/7bd324a9-a327-4c82-ac9c-26de0be18b30.png" alt class="image--center mx-auto" /></p>
<p>One thing to note is that when the work is split into parallel streams (distribute streams), those threads then need to be funneled back into a controlling thread for output (gather streams).</p>
<p>Statistics (pay attention to the highlighted portion):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694015857556/0b3f9264-feb9-4acd-ad2e-f8a78920519f.png" alt class="image--center mx-auto" /></p>
<p>It turns out that the 2nd <strong>worktable</strong> grows/shrinks based on the number of reps.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>))
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694016480254/b03736a1-45ee-4eba-81fb-77310fe5ad85.png" alt class="image--center mx-auto" /></p>
<p>MaterialSalesRepMap</p>
<ul>
<li><p>Scan count: NumReps+1</p>
</li>
<li><p>logical reads: 301586*NumReps</p>
</li>
</ul>
<p>The worktable grows by millions for each additional rep. We have 3,998 'rep' entries! It was at this point that I remembered SQL Server (in recent versions) can be 'aggressive' in its choice of how many workers to use when it chooses parallelism.</p>
<p>There is an optimizer hint (<strong>maxdop N</strong>) which stands for <strong>M</strong>ax<strong>D</strong>egreeOf<strong>P</strong>arallelism. This caps the number of threads the optimizer is allowed to use for the query at <em>N</em>. Rather than guess what would be an optimal number for a query with a varying number of reps in the request, I wanted to see how efficient the system would be without parallelism at all. So I turned it off by saying <strong>(maxdop 1)</strong>.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>,<span class="hljs-string">'Lowry,Amanda'</span>))
<span class="hljs-keyword">option</span> (maxdop <span class="hljs-number">1</span>)
</code></pre>
<p>As you can see this forced the optimizer to ignore any type of parallel processing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694017787341/25bc1352-b716-4848-b1ad-464158071003.png" alt class="image--center mx-auto" /></p>
<p>The results were fantastic (and consistent)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694018207362/a09f2ccf-9435-4982-bcee-c398539e30bd.png" alt class="image--center mx-auto" /></p>
<p>The only thing that changes is the underlined blue portion. The <strong>Scan Count</strong> is equal to the number of reps and the <strong>logical reads</strong> fluctuate slightly up or down accordingly.</p>
<p>If you are looking to squeeze out some additional performance or want consistent results, this is one of those specialized tuning approaches you can take. I have used this approach successfully in selects, inserts, and deletes.</p>
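<p>If you find you need the same cap more broadly than a single statement, the limit can also be set as a database-wide default. This is a hedged sketch and a much blunter instrument, since it affects every query in the database:</p>
<pre><code class="lang-sql">-- Database-scoped default; a per-query OPTION (MAXDOP n) hint still overrides it.
alter database scoped configuration set maxdop = 1;
</code></pre>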
]]></content:encoded></item><item><title><![CDATA[Efficient calculation of an ISBN-13 check digit]]></title><description><![CDATA[I thought I might pass along, what I have found to be, the most efficient way to validate the check digit within Azure SQL Server. I was looking to go down the path of a CLR, but it turns out that seems to be frowned upon. The function returns Y/N. F...]]></description><link>https://josefrichberg.com/efficient-calculation-of-an-isbn-13-check-digit</link><guid isPermaLink="true">https://josefrichberg.com/efficient-calculation-of-an-isbn-13-check-digit</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[T-SQL]]></category><category><![CDATA[book publishing]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 10 Jul 2023 01:03:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Wpnoqo2plFA/upload/v1667144416089/RIpbPXrKO.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I thought I might pass along, what I have found to be, the most efficient way to validate the check digit within Azure SQL Server. I was looking to go down the path of a CLR, but it turns out that seems to be frowned upon. The function returns Y/N. For my environment, which is a 24CPU HyperScale, I can process approximately 50,000 isbns/second. My test involves reading ISBNs from a table and outputting the value into a temp table.</p>
<pre><code class="lang-plaintext">CREATE function [dbo].[ValidateISBN]

(@ISBN as bigint)

returns char(1)

as

begin

 declare @is978 tinyint=(978-(@ISBN /10000000000))*-1,

                 @checkdigit tinyint

 set @checkdigit =10-cast((cast((@ISBN %10000000000000/1000000000000) as tinyint)

  + cast((@ISBN %1000000000000/100000000000) as tinyint)*3

  + cast((@ISBN %100000000000/10000000000) as tinyint)

  + cast((@ISBN %10000000000/1000000000) as tinyint)*3

  + cast((@ISBN %1000000000/100000000) as tinyint)

  + cast((@ISBN %100000000/10000000) as tinyint)*3

  + cast((@ISBN %10000000/1000000) as tinyint)

  + cast((@ISBN %1000000/100000) as tinyint)*3

  + cast((@ISBN %100000/10000) as tinyint)

  + cast((@ISBN %10000/1000) as tinyint)*3

  + cast((@ISBN %1000/100) as tinyint)

  + cast((@ISBN %100/10) as tinyint)*3)as tinyint)%10



if ((@checkdigit=10 and @ISBN %10=0) or @checkdigit=@ISBN %10)

        return 'Y'

 return 'N'

end

GO
</code></pre>
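<p>A quick usage example (the ISBN below is the well-known valid sample 978-0-306-40615-7, so the function should return 'Y'):</p>
<pre><code class="lang-sql">select dbo.ValidateISBN(9780306406157) as IsValidISBN;
</code></pre>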
]]></content:encoded></item><item><title><![CDATA[Pulling Azure Analysis Services logs from Azure Log Analytics into PowerBI using Kusto Query Language (KQL)]]></title><description><![CDATA[We have very large Analysis Services(SSAS) cubes with billions of records and hundreds of users so we need to be able to monitor the performance of queries. 95% of this universe are users of reports built for them by knowledgeable report builders. Th...]]></description><link>https://josefrichberg.com/pulling-azure-analysis-services-logs-from-azure-log-analytics-into-powerbi-using-kusto-query-language-kql</link><guid isPermaLink="true">https://josefrichberg.com/pulling-azure-analysis-services-logs-from-azure-log-analytics-into-powerbi-using-kusto-query-language-kql</guid><category><![CDATA[KQL]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Azure Log Analytics]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 17 May 2023 17:47:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/BVyNlchWqzs/upload/90d68f133ae5c88286d9ff0b9a813112.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have very large Analysis Services(SSAS) cubes with billions of records and hundreds of users so we need to be able to monitor the performance of queries. 95% of this universe are users of reports built for them by knowledgeable report builders. Those report builders are the 5% we are looking to monitor.</p>
<p>The developers normally build within dev, however, there are situations where 1-off reports are being developed or a new report is being developed and there is an impact on production.</p>
<p>This method of reporting is near real-time rather than real-time, for 2 reasons. First, SSAS doesn't send information during the run itself like, say, SQL Server does. It tells you when something starts and then when it finishes. The good news about that is the 'QueryEnd' event has all the calculated metrics (start time, end time, total time, etc.) so you don't need to capture both 'QueryStart' and 'QueryEnd'. Second, the system behind the Log Analytics Workspaces is an ADX (Azure Data Explorer) cluster, which itself requires time to ingest. We see about a 5-minute delay.</p>
<p>To interact with the ADX cluster you need to write a language called KQL (Kusto Query Language), which looks like a cross between SQL and Unix scripting. Below is the query.</p>
<pre><code class="lang-plaintext">AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.ANALYSISSERVICES'
    AND Resource in ('&lt;myserver1&gt;','&lt;myserver2&gt;')
    AND OperationName == 'QueryEnd'
    AND NTDomainName_s == 'AzureAD'
| project ServerName = Resource, CubeName = DatabaseName_s,QueryId = RootActivityId_g, QueryStart = StartTime_t, QueryEnd = EndTime_t, Duration = Duration_s, CPUTime = CPUTime_s, Success = Success_s, Error = Error_s,Query = TextData_s
</code></pre>
<p>You start with the table you want to query, <strong>AzureDiagnostics</strong> (you can do joins, but they are beyond the scope of this article), and then you put a <strong>|</strong> (pipe) to take that output and move it to the next filter, in this case a <strong>where</strong> clause. After that filter you push those results to a <strong>project</strong>, which is like a select statement but, unlike in SQL, is not required. I am using the project because I want to not only create user-friendly names but also reduce the number of columns (for efficiency on both ends).</p>
<p>The next step is to convert this into a Power BI (M) query. This is under the Export option. Take that query and run it from within PowerBI, and you now have a dataset to build your reports off of. The downside is you have to manually refresh the data; however, there is a mechanism to make this streaming. If there is interest in the streaming, or any questions, please let me know in the comments.</p>
]]></content:encoded></item><item><title><![CDATA[Moving dates to a weekending date]]></title><description><![CDATA[In publishing, and I’m sure many other industries, we get data at both the daily level and the weekly level. To properly tie these two pieces of data you need to aggregate the daily data and adjust the date to align with the weekly date. Our weekly d...]]></description><link>https://josefrichberg.com/moving-dates-to-a-weekending-date</link><guid isPermaLink="true">https://josefrichberg.com/moving-dates-to-a-weekending-date</guid><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[SQL]]></category><category><![CDATA[#SQLtutorial ]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Sat, 05 Nov 2022 00:19:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/tT2DSShVDTI/upload/v1667147217050/M--2NGmm5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In publishing, and I’m sure many other industries, we get data at both the daily level and the weekly level. To properly tie these two pieces of data you need to aggregate the daily data and adjust the date to align with the weekly date. Our weekly data is set to the Saturday of the week.</p>
<p>As an example, all daily data acquired between 10/21/2018 and 10/27/2018 must have a WeekEndingDate of 10/27/2018. This can be done with a case statement, but there is an inline formula that will come in handy when doing projections (YTD, previous 6 months, etc.). I’ll present the formula and then show modifications to handle variations.</p>
<pre><code>select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,<span class="hljs-string">'10/21/2018'</span>),<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>The above formula determines how many days away from 7 the current date is and simply adds that many days. We use 7, because Saturday is the 7th day of the week. I’ll break it down.</p>
<pre><code>datepart(dw,<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>The above snippet returns a value of 1, because 10/21/2018 is a Sunday (with the default DATEFIRST setting, where Saturday is day 7). You subtract that from 7 to determine how many days you have to move forward. In the example, you need to move 6 days from Sunday to make it to Saturday. This gives you the number of days to add. Reducing the example:</p>
<pre><code>select dateadd(dd,<span class="hljs-number">5</span>,<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>This comes in very handy when you are looking to move backward in time a certain number of days, weeks, months, or years but align to a Saturday. An example is below, which goes back 1 month.</p>
<pre><code>select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/21/2018'</span>)),dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/21/2018'</span>))
</code></pre><p>Here is a simple way to remember how to use the formula. Remember you have to move your date first, then push it to Saturday.</p>
<pre><code>declare @<span class="hljs-keyword">var</span> date

select @<span class="hljs-keyword">var</span>=dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/2/2018'</span>)

select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,@<span class="hljs-keyword">var</span>),@<span class="hljs-keyword">var</span>)
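
-- A hedged usage sketch (table and column names are illustrative):
-- roll daily rows up to their Saturday week-ending date.
select dateadd(dd, 7-datepart(dw, SaleDate), SaleDate) as WeekEndingDate,
       sum(Qty) as WeeklyQty
  from DailySales
 group by dateadd(dd, 7-datepart(dw, SaleDate), SaleDate)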
</code></pre>]]></content:encoded></item><item><title><![CDATA[Improved data loads from Snowflake to Azure Synapse Analytics]]></title><description><![CDATA[My responsibilities revolve around providing the business with the data they need to make informed business decisions. One of those processes requires us to shift data from a Snowflake data warehouse housed in AWS to an Azure Synapse Analytics data w...]]></description><link>https://josefrichberg.com/improved-data-loads-from-snowflake-to-azure-synapse-analytics</link><guid isPermaLink="true">https://josefrichberg.com/improved-data-loads-from-snowflake-to-azure-synapse-analytics</guid><category><![CDATA[azure-synapse-analytics]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[Azure Pipelines]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Sun, 30 Oct 2022 14:00:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/eH_ftJYhaTY/upload/v1667107974889/Iz3ypuQE9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My responsibilities revolve around providing the business with the data they need to make informed business decisions. One of those processes requires us to shift data from a Snowflake data warehouse housed in AWS to an Azure Synapse Analytics data warehouse. </p>
<p>We use Azure Data Factory as our ETL system, which has native drivers for Snowflake, but they only work under very limited situations, none of which suited this specific workflow. This meant we needed to use their ODBC driver, and to do that we had to create a managed VM to contain the drivers and an Integration Runtime. This enabled us to connect Snowflake directly to our Synapse system. Voila!</p>
<p>This setup, however, is not without its own quirks. You see, we also use Azure Analysis Services, which requires ODBC, and some of its data sets also reside in Snowflake. This puts a significant amount of pressure on the VMs that act as a gateway, so we began looking for alternative ways to move data, freeing up the VMs for cube builds only.</p>
<p>Snowflake allows you to export any table as a series of gzipped files with the <strong>copy into</strong> command. You either <strong>copy into</strong> Snowflake or <strong>copy into</strong> a set of files. Doing this requires a script that would normally be run on a machine able to connect to Snowflake, and there was no easy way to coordinate that. Originally, we decided to have the job run at the end of a nightly process on Snowflake. We extended the job to push the table to a set of files in an Azure-connected storage account and then call our Event Hub to signal that the job was complete. That event would trigger a Logic App, which would then call a Data Factory pipeline that would ingest those files using a feature of Synapse called <strong>Polybase</strong>. </p>
<p><strong>Polybase</strong> is the feature that Synapse uses to push/pull files (like the <strong>copy into</strong> command). Woot! We had bypassed the VM, but all was not good. <strong>Polybase</strong> gives incredible performance gains but is very finicky and expects the file to be formatted in a specific way before it can ingest it. Even a simple '|'-delimited file was first copied to a set of staging files and then ingested. This was a dead end, that is, until the new <strong>Script Object</strong> became available in Azure Data Factory.</p>
<p>Now we could run that exact <strong>copy into</strong> command from our pipeline, which gave us more fine-grained control. That was part 1. Part 2 was sneaky.</p>
<p>I wanted to see just what <strong>Polybase</strong> was doing with these files, and it turned out to be something incredibly simple.</p>
<blockquote>
<p><strong>Polybase</strong> is simply replacing whatever delimiter you use in the source file with: <strong>\u2bd1</strong></p>
<p>That seemed simple enough, but it turns out that for Snowflake you need to specify the delimiter as its UTF-8 byte sequence: <strong>\xE2\xaf\x91</strong>.</p>
</blockquote>
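<p>To make this concrete, here is a rough sketch of the Snowflake unload that the script object runs; the stage and table names are placeholders of mine, not the actual objects from our pipeline:</p>
<pre><code class="lang-sql">-- Sketch only: stage and table names are assumed.
-- Unload the table to gzipped files in an Azure-connected external stage, using the
-- UTF-8 bytes for \u2bd1 as the column delimiter so Polybase can ingest the files directly.
copy into @azure_export_stage/orders/
  from analytics.public.orders
  file_format = (type = csv field_delimiter = '\xE2\xAF\x91' compression = gzip)
  overwrite = true;
</code></pre>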
<p>The script object looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667088829705/VIYkNAhvG.png" alt="polybase script output (1).png" /></p>
<p>The source of the copy object looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667140582381/y24Sb8ne0.png" alt="polybase copy encoding (1).png" />
What type of performance do you get?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667140598383/FKLIt1QZ9.png" alt="polybase performance.png" /></p>
<p><strong><em>461 MB/s</em></strong>
That is 100 times faster than what we were getting with an ODBC connection through a managed VM. Done!</p>
<p>You can use this trick to improve the ingest speed of any CSV file.</p>
]]></content:encoded></item><item><title><![CDATA[Presenting at DeveloperWeek Global!]]></title><description><![CDATA[I'm excited to be presenting my talk on Building a complete API in Azure
A quick update: If you are interested in coming to the presentation or anything else DeveloperWeek has to offer, here is a link to a free pass. There is a limited number of them...]]></description><link>https://josefrichberg.com/presenting-at-developerweek-global</link><guid isPermaLink="true">https://josefrichberg.com/presenting-at-developerweek-global</guid><category><![CDATA[Azure]]></category><category><![CDATA[APIs]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[Azure Functions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 20 Oct 2022 22:52:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1666306311740/1pk4Jo5Kq.PNG" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I'm excited to be presenting my talk on <a target="_blank" href="https://www.developerweek.com/global/conference/enterprise/microservices-architecture/"><strong>Building a complete API in Azure</strong></a></p>
<p>A quick update: If you are interested in coming to the presentation or anything else DeveloperWeek has to offer, here is a link to a free pass. There are a limited number of them, so it's first come, first served: <a target="_blank" href="https://bit.ly/3DgmZ2x">Free Pass</a></p>
]]></content:encoded></item><item><title><![CDATA[Composite Models in PowerBI -- The solution to poor search performance in SSAS]]></title><description><![CDATA[With the release of composite models in PowerBI, I’ve been able to solve a long-standing issue with SQL Server Analysis Services: text search performance. In this article I am going to use our Azure Hyperscale instance (but any Microsoft SQL Server i...]]></description><link>https://josefrichberg.com/composite-models-in-powerbi-the-solution-to-poor-search-performance-in-ssas</link><guid isPermaLink="true">https://josefrichberg.com/composite-models-in-powerbi-the-solution-to-poor-search-performance-in-ssas</guid><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Composite Models]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 17 Oct 2022 23:02:29 GMT</pubDate><content:encoded><![CDATA[<p>With the release of composite models in PowerBI, I’ve been able to solve a long-standing issue with SQL Server Analysis Services: text search performance. In this article I am going to use our Azure Hyperscale instance (but any Microsoft SQL Server instance will do) to replace our ‘Filter’ or search tables. I’ll go into some background on how we architect our SSAS tabular models, how to ‘inject’ the new SQL Server search tables, and more importantly the SQL Server specific techniques used to overcome the DAX pushed by PowerBI.</p>
<p>To improve performance in our tabular models we extract the distinct values for various columns the users will be filtering/searching on into their own table in the tabular model. Some of the text-based tables are author names, titles, BISAC codes, and search terms. We will be using search terms in this tutorial.</p>
<p>Unlike many industries where data is locked behind IDs (users, products, locations, etc), in publishing much of the data is locked behind text (words); lots of words. Here are the statistics of the table:</p>
<ul>
<li>87.5 million records</li>
<li>Structure (see the sketch after this list): <pre><code>keyword nvarchar(<span class="hljs-number">400</span>) not <span class="hljs-literal">null</span>
ID bigint not <span class="hljs-literal">null</span>
</code></pre></li>
</ul>
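<p>As DDL, the table is roughly the following (a minimal sketch; the schema name is my assumption, the post only specifies the two columns):</p>
<pre><code class="lang-sql">-- Sketch only: schema name is assumed.
create table dbo.searchterms
(
    keyword nvarchar(400) not null,  -- the text users search against
    ID      bigint        not null   -- links each keyword back to the SSAS model
);
</code></pre>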
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">count</span>(*)
  <span class="hljs-keyword">from</span> searchterms
<span class="hljs-keyword">where</span> <span class="hljs-keyword">CHARINDEX</span>(keyword,<span class="hljs-string">'cookbook'</span>)&gt;=<span class="hljs-number">1</span>
</code></pre>
<p>Now that we have some insight into the data and how the data is being used, let’s look at the technique I use to improve the overall performance. I realized that using CHARINDEX meant that SQL Server wouldn’t be able to use an index to find the records, but I still needed an index on the ID column to allow us to link back to SSAS. I was curious to see if there was any performance impact in scanning the index vs. a heap table, and while clicking through SSMS I saw this gem from long ago.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666047489550/nILGyndI0.png" alt="ssms+compression+menu.png" /></p>
<p>We are running an Azure SQL Hyperscale instance, so storage is not an issue (100 TB max database size); however, I thought compression might work well given how SQL Server reads data: at the page level.</p>
<p>Whether I want a single record or every record on a page, SQL Server must read that whole page into memory. This means the only way to improve the performance of a query that touches every page is to reduce the overall number of pages, and the only way to do that is to compress the data so more rows fit on a single page; in other words, we are trading I/O for CPU. I was optimistic that the speed of the compression algorithm and the general increase in processor power (relative to I/O speeds) would provide a net benefit, and I was right, sort of. Below is a chart of a quick test I did.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666047536073/b3ajPipdU.png" alt="composite+results.png" /></p>
<p>As you can see from the chart, the overall time for the queries stayed relatively stable, but the number of pages changed dramatically. This tells me that most of the work is in the CPU, comparing those characters. Adding compression seemed to have no impact on overall elapsed time, but it reduced the I/O load on the server by a factor of 3.5. This is good news as it leaves headroom for more concurrent queries, and while I was hoping for greater speed, the timings here are still much faster than those obtained while searching inside of SSAS. It also reduces the load on SSAS by shifting text searching out.</p>
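<p>For reference, the compression and the ID index can be applied with a couple of statements (a minimal sketch; the object names and the choice of a nonclustered index are my assumptions, the post only says page compression was enabled and an index was added on ID):</p>
<pre><code class="lang-sql">-- Sketch only: table and index names are assumed.
-- Rebuild the table with page compression so more rows fit on each 8 KB page.
alter table dbo.searchterms rebuild with (data_compression = page);

-- Index on ID so matching rows can be joined back to the SSAS model.
create nonclustered index IX_searchterms_ID on dbo.searchterms (ID);
</code></pre>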
<p>When you wire up SQL Server and SSAS, the underlying queries between the two systems look just as if someone had used app filters to pass direct values. Below is an image of what happens after the ISBNs are selected based upon a keyword search in SQL Server. This is as efficient as it can be.</p>
<p>I hope this article has given you some insight into how composite models can connect SQL Server and SSAS to overcome limitations of the product and inspire you to organize your data in a new way.</p>
]]></content:encoded></item><item><title><![CDATA[New Blog Space]]></title><description><![CDATA[I've decided to move my blog here. You'll see some old content being uploaded in the next few days. Look around, I hope you find some tidbits of information.
Older content focused on SQL Server and SSIS. Now I focus on the Azure ecosystem.]]></description><link>https://josefrichberg.com/new-blog-space</link><guid isPermaLink="true">https://josefrichberg.com/new-blog-space</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure Functions]]></category><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[Azure SQL Database]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 17 Oct 2022 00:59:27 GMT</pubDate><content:encoded><![CDATA[<p>I've decided to move my blog here. You'll see some old content being uploaded in the next few days. Look around, I hope you find some tidbits of information.</p>
<p>Older content focused on SQL Server and SSIS. Now I focus on the Azure ecosystem.</p>
]]></content:encoded></item><item><title><![CDATA[Using Managed Identities to authenticate Function Apps]]></title><description><![CDATA[I have a video detailing how to use Managed Identities to authenticate function apps in Azure.]]></description><link>https://josefrichberg.com/using-managed-identities-to-authenticate-function-apps</link><guid isPermaLink="true">https://josefrichberg.com/using-managed-identities-to-authenticate-function-apps</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure Managed Identities]]></category><category><![CDATA[Azure Functions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 26 Oct 2020 04:00:00 GMT</pubDate><content:encoded><![CDATA[<p>I have a <a target="_blank" href="https://youtu.be/A4-Qupj5tWc">video</a> detailing how to use Managed Identities to authenticate function apps in Azure.</p>
]]></content:encoded></item></channel></rss>