<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Untitled Publication]]></title><description><![CDATA[I have over 34 years in the world of RDBMS.  I have in-depth knowledge of Microsoft SQL Server and the Azure platform (Data Factories, Logic Apps, Functions, Ev]]></description><link>https://josefrichberg.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 19 Apr 2026 20:34:20 GMT</lastBuildDate><atom:link href="https://josefrichberg.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Analysis Services models in Azure Fabric -- METADATA$ROW_ID gotcha]]></title><description><![CDATA[For those of you entering the Microsoft Fabric foray, there is a big caveat you must be aware of.
If you are using Fabric Mirroring or Delta Lake tables you will see an additional column in your data source: METADATA$ROW_ID. This column is added auto...]]></description><link>https://josefrichberg.com/analysis-services-models-in-azure-fabric-metadatarowid-gotcha</link><guid isPermaLink="true">https://josefrichberg.com/analysis-services-models-in-azure-fabric-metadatarowid-gotcha</guid><category><![CDATA[microsoft fabric]]></category><category><![CDATA[microsoftfabric]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 08 Sep 2025 19:35:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/aZt4hpRwqaE/upload/15a2c325178b9beadd2bd02ec85eaf28.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For those of you entering the Microsoft Fabric foray, there is a big caveat you must be aware of.</p>
<p>If you are using Fabric Mirroring or Delta Lake tables, you will see an additional column in your data source: <strong>METADATA$ROW_ID</strong>. This column is added automatically by Fabric to manage versioning, change data capture, and transactional consistency. What exactly is this column, and why do you need to care?</p>
<p>The column is, for all intents and purposes, a SQL GUID: a 40-character unique identifier. Because every value is unique, it is effectively uncompressible, and that is the problem. Analysis Services models within Fabric are limited by your SKU, so space is at a premium. We are running an F64, which limits our model size to 25GB.</p>
<p>When you create a model, it does a <strong>select * from <em>&lt;source&gt;</em></strong>, which was never an issue until we started using a Mirrored source. That <strong>METADATA$ROW_ID</strong> snuck into our model and caused the size to explode. Our complete 5-year model normally takes up about 6GB of space, but with this column included it blew up after only about 18 months of data. Digging into the models from both Azure Analysis Services and Fabric Analysis Services, we found that sneaky ID.</p>
<p>The correction is easy now that we have found the issue, so I am passing it on here in hopes of sparing others the headaches that come from details buried in specific implementations.</p>
<p>Go into your semantic model through DAX Studio and manually remove the <strong>METADATA$ROW_ID</strong> column. Do this after you have created your model, as it is simpler to remove the column than to modify the build process to exclude it.</p>
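<p>If you want to confirm which column is inflating the model before removing it, a DMV query run from DAX Studio against the semantic model can help. This is a minimal sketch; the DMV returns one row per column, and you are looking for a large dictionary on <strong>METADATA$ROW_ID</strong>:</p>
<pre><code class="lang-sql">-- Run from DAX Studio, connected to the semantic model.
-- Inspect DICTIONARY_SIZE (bytes) per column; METADATA$ROW_ID should stand out.
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
  FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
</code></pre>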
]]></content:encoded></item><item><title><![CDATA[Permissions required for user defined table types in Microsoft SQL Server]]></title><description><![CDATA[This has bitten me so many times, I’m putting this here where I can find it in the future. You might want to bookmark this page :)
I rely heavily on stored procedures for interaction both internally and externally. The improve performance and efficie...]]></description><link>https://josefrichberg.com/permissions-required-for-user-defined-table-types-in-microsoft-sql-server</link><guid isPermaLink="true">https://josefrichberg.com/permissions-required-for-user-defined-table-types-in-microsoft-sql-server</guid><category><![CDATA[#user-defined-table-types]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[permissions]]></category><category><![CDATA[Microsoft SQL server]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 25 Jun 2025 15:49:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7GfRwb78YWs/upload/b43c5044235e3572d124a99e286114ed.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This has bitten me so many times, I’m putting this here where I can find it in the future. You might want to bookmark this page :)</p>
<p>I rely heavily on stored procedures for interaction both internally and externally. To improve performance and efficiency, I’ve created many table types, most notably one for ISBNs. It looks like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">type</span> dbo.ISBN <span class="hljs-keyword">as</span> <span class="hljs-keyword">table</span>(
    ISBN <span class="hljs-built_in">char</span>(<span class="hljs-number">13</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>
    primary <span class="hljs-keyword">key</span> clustered (ISBN <span class="hljs-keyword">asc</span>)
) <span class="hljs-keyword">with</span> (ignore_dupe_Key=<span class="hljs-keyword">off</span>))
</code></pre>
<p>Let’s use this in a sample procedure</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> inventory.CheckISBNQty
(@ISBNs dbo.ISBN readonly)
<span class="hljs-keyword">with</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">as</span> owner
<span class="hljs-keyword">as</span>
.....
</code></pre>
<p>We have a function app, <strong>[func-InventoryProcess-prod]</strong>, that will be calling this procedure, so naturally you would grant it the ability to execute the procedure.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">grant</span> exec <span class="hljs-keyword">on</span> inventory.CheckISBNQty <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
</code></pre>
<p>When the function app tries to run the stored procedure, you will get an error that you cannot execute <strong>dbo.ISBN</strong>. To solve this problem, you need to grant execute permission on the table type to the app.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">grant</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">on</span> <span class="hljs-keyword">type</span>::dbo.ISBN <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
<span class="hljs-keyword">or</span>
<span class="hljs-keyword">grant</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">on</span> <span class="hljs-keyword">type</span>::[dbo].[ISBN] <span class="hljs-keyword">to</span> [func-InventoryProcess-prod]
</code></pre>
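<p>For completeness, here is a small, hedged sketch of exercising the table type from T-SQL once the grants are in place; the procedure body and ISBN values are illustrative:</p>
<pre><code class="lang-sql">-- Build a table-valued parameter and pass it to the procedure.
declare @list dbo.ISBN;

insert into @list (ISBN)
values ('9780306406157'), ('9781861003119');

exec inventory.CheckISBNQty @ISBNs = @list;
</code></pre>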
]]></content:encoded></item><item><title><![CDATA[Azure Data Factory utcNow() nuances]]></title><description><![CDATA[For those of you who work with Azure Data Factories I thought I’d help you out with, what I would consider a bug in how pipelines work. For the record, I work with pipelines on almost a daily basis, but I am generally pushing data into Microsoft SQL ...]]></description><link>https://josefrichberg.com/azure-data-factory-utcnow-nuances</link><guid isPermaLink="true">https://josefrichberg.com/azure-data-factory-utcnow-nuances</guid><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[copy activity]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 03 Jun 2025 17:24:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/BXOXnQ26B7o/upload/fb8c6fe0729523633439fad6ae4b2d38.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For those of you who work with Azure Data Factories I thought I’d help you out with, what I would consider a bug in how pipelines work. For the record, I work with pipelines on almost a daily basis, but I am generally pushing data into Microsoft SQL Server. In this specific instance, I am pushing data into Snowflake and that data includes dates.</p>
<p>Microsoft SQL Server is very lenient when it comes to the format of a date it will be ingesting. It can be:</p>
<ul>
<li><p>‘6/3/2025’</p>
</li>
<li><p>‘6/03/2025’</p>
</li>
<li><p>‘06/03/2025’</p>
</li>
<li><p>‘2025-6-3’</p>
</li>
<li><p>‘2025-06-3’</p>
</li>
</ul>
<p>The list goes on. Snowflake, however, is very specific: ‘YYYY-MM-DD’.</p>
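<p>To see the difference concretely, here is a hedged T-SQL sketch; the loose formats parse in SQL Server (subject to your DATEFORMAT/language settings), and style 23 of <strong>convert</strong> produces the ISO form Snowflake expects:</p>
<pre><code class="lang-sql">-- All of these parse in SQL Server under a typical US-English date setting.
select convert(date, '6/3/2025'),
       convert(date, '06/03/2025'),
       convert(date, '2025-6-3');

-- Producing the yyyy-MM-dd form Snowflake requires.
select convert(char(10), getutcdate(), 23);
</code></pre>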
<p>The pipeline I was building needed to add a FILEDATE column to the .csv file it was creating that would be ingested by Snowflake. I added the column into the source of the <strong>Copy data</strong> activity as follows:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748970635045/ccd2e07e-6839-4931-a0bc-da27546d5010.png" alt class="image--center mx-auto" /></p>
<p>The expected result would be: “2025-06-03T13:11:00” as per the documentation of the <strong>utcNow()</strong> function.</p>
<p>The result provided:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748970736912/6105920d-7d7f-43c9-9368-243256c36237.png" alt class="image--center mx-auto" /></p>
<p>As I mentioned before, this would be an acceptable datetime in Microsoft SQL Server, so it was never an issue. On ingestion to Snowflake, this blew up. I tried the formatDateTime function, I tried <strong>utcNow(‘</strong>yyyy-mm-ddTHH:mm:ss’) and numerous variations of that, all to no avail.</p>
<p>If, however, you put the same function call in a variable:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748971069386/5ed8c614-917e-42f1-9744-cee1dbb6f941.png" alt class="image--center mx-auto" /></p>
<p>You get the correct result:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748971090403/cb4bbd66-d3d0-4c08-8a26-96addf2ff59d.png" alt class="image--center mx-auto" /></p>
<p>Then simply use the variable instead of the function call when adding the column to your source.</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft SQL Server permission chaining]]></title><description><![CDATA[Many times, views are used as a security object; Granting select on a given view instead of the underlying table(s). Now if this view happens to cross schemas you might get an error saying the user does not have select permission on the underlying ta...]]></description><link>https://josefrichberg.com/microsoft-sql-server-permission-chaining</link><guid isPermaLink="true">https://josefrichberg.com/microsoft-sql-server-permission-chaining</guid><category><![CDATA[Microsoft SQL server]]></category><category><![CDATA[permissions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 02 Jun 2025 05:46:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9ycXTLGNMro/upload/0ac31699b4fa8e0bf12e30cb5e78cbbf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Many times, views are used as a security object; Granting <strong>select</strong> on a given view instead of the underlying table(s). Now if this view happens to cross schemas you might get an error saying the user does not have select permission on the underlying table(s). As an example, a user might be restricted to the <strong><em>customer</em></strong> schema but need to view information from the <strong><em>inventory</em></strong> schema.</p>
<p>One might naturally just give select permission on that specific table in the <strong><em>inventory</em></strong> schema. If you did this, you would weaken the security benefit of hiding the underlying schema/objects. The correct method is simply to make sure both schemas are ‘owned’ by the same user. We have everything owned by <strong>dbo</strong>, so you need to run this on all schemas involved.</p>
<p>Instead run this command on both the <strong><em>inventory</em></strong> and <strong><em>customer</em></strong> schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">alter</span> authorization <span class="hljs-keyword">on</span> <span class="hljs-keyword">schema</span>::[inventory] <span class="hljs-keyword">to</span> dbo;
<span class="hljs-keyword">alter</span> authorization <span class="hljs-keyword">on</span> <span class="hljs-keyword">schema</span>::[customer] <span class="hljs-keyword">to</span> dbo;
</code></pre>
<p>You preserve the integrity of your security model, while being able to properly isolate data via schemas.</p>
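<p>As a hedged end-to-end sketch (object and user names are illustrative), once both schemas share an owner, the single grant on the view is all the caller needs:</p>
<pre><code class="lang-sql">create view customer.vCustomerInventory
as
select c.CustomerId, i.ISBN, i.OnHandQty
  from customer.Customers c
  join inventory.Stock i on i.CustomerId = c.CustomerId;
go

grant select on customer.vCustomerInventory to [ReportingUser];
-- Because both schemas are owned by dbo, ownership chaining lets
-- [ReportingUser] query the view without any grant on inventory.Stock.
</code></pre>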
]]></content:encoded></item><item><title><![CDATA[Using a stored procedure as a source for PowerBI Paginated reports]]></title><description><![CDATA[If the reports that you are building do not need to be interactive, you can use Microsoft Paginated Reports (or SSRS). One of the benefits of a paginated report is that it’s source(s) can be stored procedures. I really, really like that option.In Mic...]]></description><link>https://josefrichberg.com/using-a-stored-procedure-as-a-source-for-powerbi-paginated-reports</link><guid isPermaLink="true">https://josefrichberg.com/using-a-stored-procedure-as-a-source-for-powerbi-paginated-reports</guid><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><category><![CDATA[ssrs report builder]]></category><category><![CDATA[power bi report bilder]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 22 May 2025 17:33:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/XrIfY_4cK1w/upload/2735ecb2f489c4f578db963e6cc77954.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If the reports that you are building do not need to be interactive, you can use Microsoft Paginated Reports (or SSRS). One of the benefits of a paginated report is that its source(s) can be stored procedures. I really, really like that option.<br />In Microsoft SQL Server stored procedures are an incredible programming tool. They have numerous benefits, which I won’t get into here, but suffice it to say I use stored procedures wherever possible. You can have complex logic executed more efficiently than, say, the handful of CTEs that would be required if you needed to do this in a single view. That result is generally stored in the procedure in a temporary table, which needs to be selected back at the end of the procedure. The use of a temporary table requires a slight modification to your procedure, or the report builder will not recognize the output.</p>
<p>At the beginning of your procedure (I usually put it as the first line of code after my comments) you need to put the following:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SET</span> FMTONLY <span class="hljs-keyword">OFF</span>
</code></pre>
<p>Now if you look in the Microsoft documentation, it tells you that the FMTONLY flag is deprecated and gives you a handful of alternatives. I asked Copilot how I would use these specifically for Report Builder; it suggested not using select into #table and either explicitly declaring that temp table or using table variables (which are explicitly declared), but when pressed it said that is not foolproof and the suggested method is still the FMTONLY flag.</p>
<p>I will do some additional research myself into declaring # tables ahead of time and the use of table variables, but for now I am comfortable using this. I don’t foresee Microsoft scrapping Report Builder anytime soon.</p>
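<p>A minimal sketch of the overall pattern, with illustrative names and logic, looks like this:</p>
<pre><code class="lang-sql">create procedure reporting.SalesByMonth
(@Year int)
as
begin
    -- Lets Report Builder discover the result-set shape even though a temp table is used.
    SET FMTONLY OFF;

    select ISBN,
           datepart(month, SaleDate) as SaleMonth,
           sum(Qty) as TotalQty
      into #monthly
      from sales.DailySales
     where datepart(year, SaleDate) = @Year
     group by ISBN, datepart(month, SaleDate);

    -- The final select is what the paginated report binds to.
    select ISBN, SaleMonth, TotalQty
      from #monthly;
end
</code></pre>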
]]></content:encoded></item><item><title><![CDATA[How to Skip Rows in Azure Data Factory Pipeline's Excel Source]]></title><description><![CDATA[I was building a data factory source the other day and I needed to skip the first 5 rows. This is intuitive and easy within a text (csv) source as there is a setting to skip the first N rows. In excel there is a 'Range' option which has as an example...]]></description><link>https://josefrichberg.com/how-to-skip-rows-in-azure-data-factory-pipelines-excel-source</link><guid isPermaLink="true">https://josefrichberg.com/how-to-skip-rows-in-azure-data-factory-pipelines-excel-source</guid><category><![CDATA[data factory]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Fri, 19 Jul 2024 14:49:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GauA0hiEwDk/upload/485f23630942ddc38c040346a6e5b3f0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[
<p>I was building a data factory source the other day and I needed to skip the first 5 rows. This is intuitive and easy within a text (csv) source as there is a setting to skip the first N rows. In Excel there is a 'Range' option which has as an example 'A1:B10'.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721400085358/4d8af58d-b0ea-4e02-a19d-782fcbc2bf17.png" alt class="image--center mx-auto" /></p>
<p>The example implies that you need to specify the entire area of the Excel sheet that you want to import. I simply put my starting point (5 rows down) and sure enough it took the entire sheet, all columns and all rows, just skipping down 5 rows.</p>
<p>I know this is an incredibly small post and you might be wondering, "why post this?". I use this blog for 2 reasons. The first is that I have an online, universally accessible set of notes. I routinely go to code something and say, "where did I put that formula, or code, or snippet, etc.", and rather than trawling through <strong>sys.sql_modules</strong> or trying to remember the pipeline, I just come here.</p>
<p>The second reason is to share what I believe are useful snippets that can help others out.</p>
]]></content:encoded></item><item><title><![CDATA[Why Bit Masks?]]></title><description><![CDATA[Publishing is one of those industries where the possibilities are endless. This translates into 'endless combinations' as well. To sell a book you need 4 pieces of information. Those are:

Customer

Book

Format


Sales Rep


Now in the fictional Pri...]]></description><link>https://josefrichberg.com/why-bit-masks</link><guid isPermaLink="true">https://josefrichberg.com/why-bit-masks</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[SQL]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 03 Jul 2024 05:52:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/03r2PBffuCk/upload/c650a7b9747c39e29d8904d0600cd711.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Publishing is one of those industries where the possibilities are endless. This translates into 'endless combinations' as well. To sell a book you need 4 pieces of information. Those are:</p>
<ol>
<li><p>Customer</p>
</li>
<li><p>Book</p>
<ol>
<li><em>Format</em></li>
</ol>
</li>
<li><p>Sales Rep</p>
</li>
</ol>
<p>Now in the fictional <strong>PrintItNow</strong> publishing company here are some base stats</p>
<ul>
<li><p>Number of Customers: 20,000</p>
</li>
<li><p>Number of Titles: 100,000</p>
<ul>
<li>Number of Formats: 5</li>
</ul>
</li>
<li><p>Number of Reps: 1,000</p>
</li>
</ul>
<p>The rules are as follows:</p>
<ul>
<li><p>A rep can be assigned the following:</p>
<ul>
<li><p>1 or all customers</p>
</li>
<li><p>1 or all titles</p>
</li>
<li><p>1 or all formats</p>
</li>
</ul>
</li>
<li><p>Two reps cannot sell the same book+format to the same customer. That means I can sell the hardcover and you could sell the paperback to the same client.</p>
</li>
<li><p><strong>Reps can be realigned at any time across all formats, books, and customers. This realignment must be retroactively applied to all previous sales.</strong> -- GOTCHA!</p>
</li>
</ul>
<p>To put that into perspective, if I am allowed to sell all of the titles, but hardcover only, to all customers, I have a 'territory' of 20,000*100,000 = 2 billion entries in my sales table. How often do reps get realigned? The two most common times are when a new rep gets hired and takes over a portion of territory from 1 or more other reps, or when a rep leaves and that territory needs to be spread out over other reps. No matter the case, all the sales entries (which could be billions) need to be changed. Adding to that, every day you sell more books, so the universe of sales grows, and you publish new books, so the territory grows. I had tried to tackle this problem a few years ago with the use of bit masks, but the functions were not up to the task. With SQL Server 2022, Microsoft released two new functions: <strong>SET_BIT</strong> and <strong>GET_BIT</strong>.</p>
<p>Now that we have the background of our publishing company, let's get into what a bit mask is.</p>
<p>First, we need a table to manage the reps themselves.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> SalesRep
(RepName <span class="hljs-built_in">varchar</span>(<span class="hljs-number">256</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
RepId <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>This gives me plenty of space for reps (just over 2 billion for a 4-byte int)</p>
<p>Now you need to have a table that would manage the sales territory</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> BookRep
(RepId <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
Book <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>Each row is concise (12 bytes: a 4-byte int plus an 8-byte bigint), but you could have tens of millions of rows or more based upon the combination of which books a rep can sell. For this example, we have every rep able to sell every book and divide it up by customers. In that case it would be 1,000*100,000 = 100 million rows, and it will grow for every rep you hire and every book you publish. Here's a better way.</p>
<h3 id="heading-bit-masks">Bit Masks</h3>
<p>A bit mask works much like the lockers we had in school. Imagine a set of lockers numbered 0-7. Now let's take students in alphabetical order by first name: Alice, Bob, Jessica, and Joshua. It is important that you pick some ordering system for your data to ensure a consistent mapping.</p>
<p>Now let's assign these students to lockers, starting at 0 for the first student, 1 for the second and so on: <em>(MaskPosition = Locker #)</em></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>RepName</td><td>EmployeeId</td><td>MaskPosition</td></tr>
</thead>
<tbody>
<tr>
<td>Alice</td><td>S123</td><td>0</td></tr>
<tr>
<td>Bob</td><td>S252</td><td>1</td></tr>
<tr>
<td>Jessica</td><td>S110</td><td>2</td></tr>
<tr>
<td>Joshua</td><td>S871</td><td>3</td></tr>
</tbody>
</table>
</div><p>This would be represented in the code below</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> SalesRep
(RepName <span class="hljs-built_in">varchar</span>(<span class="hljs-number">256</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
EmployeeId <span class="hljs-built_in">char</span>(<span class="hljs-number">4</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>,
MaskPosition <span class="hljs-built_in">int</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>) <span class="hljs-comment">-- new column to represent the assigned position</span>
</code></pre>
<p>Now we will need a data type to represent the lockers. If you haven't already guessed, you need a binary data type and you will need 1 byte for every 8, or fraction of 8, sales reps (lockers) you will employ. In our example, we need a <strong>binary(1)</strong>.</p>
<p>The table would look like this.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">Create</span> <span class="hljs-keyword">table</span> BookRepMap
(Book <span class="hljs-built_in">bigint</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>, <span class="hljs-comment">--ISBN13 can be represented as a bigint </span>
 RepMap <span class="hljs-built_in">binary</span>(<span class="hljs-number">1</span>) <span class="hljs-keyword">not</span> <span class="hljs-literal">null</span>)
</code></pre>
<p>This will only ever add a record for each Book, and you add 1 byte to the RepMap column for every 8 reps (starting with the first). Now before we end the first article in the series, let me explain just how powerful this is.</p>
<p>If we take 8,000 bytes as the maximum row and subtract 8 bytes for the bigint, that leaves us with 7,992 bytes for the RepMap column. You get 8 positions for each byte: 7,992*8 = 63,936 total reps (positions) for a single row!</p>
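<p>As a quick, hedged illustration of the two SQL Server 2022 functions mentioned earlier (the mask and positions are made up), <strong>SET_BIT</strong> turns a position on and <strong>GET_BIT</strong> reads it back:</p>
<pre><code class="lang-sql">declare @RepMap binary(1) = 0x00;

-- Turn on positions 0 and 3 (two hypothetical reps assigned to this book).
set @RepMap = set_bit(@RepMap, 0);
set @RepMap = set_bit(@RepMap, 3);

select get_bit(@RepMap, 0) as Position0,  -- 1
       get_bit(@RepMap, 1) as Position1,  -- 0
       get_bit(@RepMap, 3) as Position3;  -- 1
</code></pre>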
<p>In the next installment of this series, I will show the code used to fill this table.</p>
]]></content:encoded></item><item><title><![CDATA[Prevent SQL Injection Attacks Using SQL Server Stored Procedures]]></title><description><![CDATA[SQL injection is a very serious topic and there are numerous libraries and best practices to help you secure your connection between your app and your database. To reduce the ability of bad actors to take over your SQL requests you can use Stored Pro...]]></description><link>https://josefrichberg.com/prevent-sql-injection-attacks-using-sql-server-stored-procedures</link><guid isPermaLink="true">https://josefrichberg.com/prevent-sql-injection-attacks-using-sql-server-stored-procedures</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[stored procedure]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[SQL Injection]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 04 Jun 2024 21:25:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/_ts3NfjvaXo/upload/7c29c878c2a06ade2920a41faea1267e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>SQL injection is a very serious topic and there are numerous libraries and best practices to help you secure your connection between your app and your database. To reduce the ability of bad actors to take over your SQL requests you can use Stored Procedures (think of it as server-side code) vs client-side code. They are equivalent to function calls within other programming languages.</p>
<h3 id="heading-anatomy-of-a-stored-procedure">Anatomy of a Stored Procedure</h3>
<p>Similar to function calls, a stored procedure has a name and 0 or more parameters. The parameters are actually placeholder variables, so they are required to begin with an at sign (<strong>@</strong>). Like variables, they are also required to have their datatype defined. Let's build a very simple stored procedure that takes an ISBN and returns the author of the book.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> FindAuthorISBN13
(
@ISBN <span class="hljs-built_in">bigint</span>
)
<span class="hljs-keyword">as</span>
<span class="hljs-keyword">begin</span>
<span class="hljs-comment">--fill in with code</span>
<span class="hljs-keyword">end</span>
</code></pre>
<p>This is how you call it.</p>
<pre><code class="lang-sql">
<span class="hljs-comment">--call by parameter. This is the preferred method</span>
exec FindAuthorISBN13 @ISBN=9781254872103
<span class="hljs-comment">--call by position</span>
exec FindAuthorISBN13 9781254872103
<span class="hljs-comment">--implicit conversion can be called by position or parameter</span>
exec FindAuthorISBN13 '9781254872103'
</code></pre>
<p>Now if you try to call it with something other than a valid big integer, you will get an error. That is the first line of defense.</p>
<p>Now for the guts of the procedure:</p>
<pre><code class="lang-sql"> <span class="hljs-keyword">create</span> <span class="hljs-keyword">procedure</span> FindAuthorISBN13
(
@ISBN <span class="hljs-built_in">bigint</span>
)
<span class="hljs-keyword">with</span> <span class="hljs-keyword">execute</span> <span class="hljs-keyword">as</span> owner
<span class="hljs-keyword">as</span>
<span class="hljs-keyword">begin</span>
 <span class="hljs-keyword">select</span> AuthorName <span class="hljs-keyword">as</span> [Author <span class="hljs-keyword">Name</span>]
   <span class="hljs-keyword">from</span> TitleMetadata
  <span class="hljs-keyword">where</span> ISBN13=@ISBN
<span class="hljs-keyword">end</span>
</code></pre>
<p>As you can see you are using the variable to test for a value against a column. You aren't passing in a SQL statement to be run.</p>
<p>Now from C# this is how you would make the call:</p>
<pre><code class="lang-csharp">....
<span class="hljs-keyword">var</span> cmd = <span class="hljs-keyword">new</span> SQLCommand(Proc, conn)
cmd.Parameters.AddWithValue(<span class="hljs-string">"@ISBN"</span>,_isbn);
cmd.CommandType = CommandType.StoredProcedure;
<span class="hljs-keyword">using</span> <span class="hljs-keyword">var</span> reader = cmd.ExecuteReader();
reader.Read();
result=reader.GetFieldValue&lt;<span class="hljs-keyword">string</span>&gt;(<span class="hljs-number">0</span>);
......
</code></pre>
<p>There is no way a malicious actor can gain access to anything other than the result programmed in the procedure.</p>
]]></content:encoded></item><item><title><![CDATA[Extracting Azure Analysis Services Metrics from your Log Analytics workspace into PowerBI]]></title><description><![CDATA[We have numerous Azure Analysis Services servers running in the cloud and needed a way to have near-real time monitoring of the queries being run. This KQL query will enable you to pull all the user/query information necessary to manage these servers...]]></description><link>https://josefrichberg.com/extracting-azure-analysis-services-metrics-from-your-log-analytics-workspace-into-powerbi</link><guid isPermaLink="true">https://josefrichberg.com/extracting-azure-analysis-services-metrics-from-your-log-analytics-workspace-into-powerbi</guid><category><![CDATA[PowerBI]]></category><category><![CDATA[KQL]]></category><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Azure Log Analytics]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Tue, 06 Feb 2024 16:48:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Wpnoqo2plFA/upload/214dce3673bd8afcf9d53db40c304686.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have numerous Azure Analysis Services servers running in the cloud and needed a way to have near-real time monitoring of the queries being run. This KQL query will enable you to pull all the user/query information necessary to manage these servers, into a PowerBI report.</p>
<blockquote>
<p>/* The exported Power Query Formula Language (M Language ) can be used with Power Query in Excel and Power BI Desktop. For Power BI Desktop follow the instructions below:</p>
<ol>
<li><p>Download Power BI Desktop from <a target="_blank" href="https://powerbi.microsoft.com/desktop/">https://powerbi.microsoft.com/desktop/</a></p>
</li>
<li><p>In Power BI Desktop select: 'Get Data' -&gt; 'Blank Query'-&gt;'Advanced Query Editor'</p>
</li>
<li><p>Paste the M Language script into the Advanced Query Editor and select 'Done' */</p>
</li>
</ol>
<p>let AnalyticsQuery = let Source = Json.Document(Web.Contents("<a target="_blank" href="https://api.loganalytics.io/v1/subscriptions//query">https://api.loganalytics.io/v1/subscriptions/</a><strong><mark>your subscription here</mark></strong><a target="_blank" href="https://api.loganalytics.io/v1/subscriptions//query">/query</a>", [Query=[#"query"="AzureDiagnostics | where ResourceProvider == 'MICROSOFT.ANALYSISSERVICES' and Resource in ('<em><mark>server1','server2')</mark></em> and OperationName =='QueryEnd' and NTDomainName_s == 'AzureAD' | order by StartTime_t",#"x-ms-app"="AzureFirstPBI",#"timespan"="PT1H",#"scope"="hierarchy",#"prefer"="ai.response-thinning=true"],Timeout=#duration(0,0,4,0)])), TypeMap = #table( { "AnalyticsTypes", "Type" }, { { "string", Text.Type }, { "int", Int32.Type }, { "long", Int64.Type }, { "real", Double.Type }, { "timespan", Duration.Type }, { "datetime", DateTimeZone.Type }, { "bool", Logical.Type }, { "guid", Text.Type }, { "dynamic", Text.Type } }), DataTable = Source[tables]{0}, Columns = Table.FromRecords(DataTable[columns]), ColumnsWithType = Table.Join(Columns, {"type"}, TypeMap , {"AnalyticsTypes"}), Rows = Table.FromRows(DataTable[rows], Columns[name]), Table = Table.TransformColumnTypes(Rows, Table.ToList(ColumnsWithType, (c) =&gt; { c{0}, c{3}})) in Table in AnalyticsQuery</p>
</blockquote>
<p>The highlighted section requires you to input your values. In the above example, this will monitor 2 servers. You can add more or split this into individual servers/query.</p>
]]></content:encoded></item><item><title><![CDATA[About those Read-Only Replicas]]></title><description><![CDATA[In a post, I wrote last month I showed how you can take advantage of your replicas in SQL Sever by adding a single attribute to your connection string in .Net. Turns out you need to be aware of a caveat that I found yesterday while tracking down an o...]]></description><link>https://josefrichberg.com/about-those-read-only-replicas</link><guid isPermaLink="true">https://josefrichberg.com/about-those-read-only-replicas</guid><category><![CDATA[.NET]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[replication]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 26 Oct 2023 13:11:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/GGewLGcQD-I/upload/9d527c4530184dbc3b1acbdf41b2a2d2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a <a target="_blank" href="https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application">post</a>, I wrote last month I showed how you can take advantage of your replicas in SQL Sever by adding a single attribute to your connection string in .Net. Turns out you need to be aware of a caveat that I found yesterday while tracking down an odd bug.</p>
<p>The application in question manages lists of books. The workflow is as follows:</p>
<ol>
<li><p>Select 1 or more ISBNs from the list of books we publish (over 1m)</p>
</li>
<li><p>Assign this collection a name and a catalog (collection of lists)</p>
</li>
<li><p>Save the collection</p>
</li>
<li><p>Go back to the main menu with the names and metadata of all your catalogs</p>
</li>
</ol>
<p>The issue happens between 3 and 4. This system keeps metadata about each list and metadata about the folder (number of total lists, number of total items, last modification). It needs to make a series of calls. It can't just save the list and return you to the main screen without first making a call to the system to populate that screen with the latest information (which includes audit information). This means a call to <strong>SaveList</strong> and a call to <strong>GetCatalogs</strong>. Developers noted that 70% of the time the main menu was missing the newly saved list, but if they did a refresh it appeared. We could not reproduce this by hand (through Postman).</p>
<p>We are running the database on an Azure Hyperscale with RCSI (Read Committed Snapshot Isolation), so the first thought was there are independent threads making the calls and the <strong>GetCatalogs</strong> is returning the previously committed set. However, the development team confirmed that it was a single chain of events through a single thread with an average time between calls of 300ms.</p>
<p>All calls in this application are controlled via stored procedures, so we can guarantee the order of operations, and within the <strong>SaveList</strong> procedure there is a commit before the result set is returned, guaranteeing the data is in the database before the thread gets the go-ahead to move to the next call. At this point I realized it was the replica.</p>
<p>300ms is not enough time for the data to be moved to the replica, so I simply removed the <strong>ApplicationIntent=ReadOnly</strong> and the bug disappeared.</p>
<p>A word of caution when designing an application: the flow of the application is important to understand while making architectural decisions. In this specific example, the need to pull the data back rather than having the application carry it means the timing of when the data is where (primary vs replica) is very important. If the application did a few other things and the average time between the save and the call to <strong>GetCatalog</strong> was, say, 750ms or 1s, the replica might have been just fine.</p>
]]></content:encoded></item><item><title><![CDATA[Quick Tip: Taking advantage of Read-Only SQL Server replicas in your C# application]]></title><description><![CDATA[If you happen to be using C# there is a very easy way to set your application up to take advantage of Read-Only replicas.
Using this connection string: (Some portions were left out for brevity)
"Server=tcp:<servername,port>;Initial Catalog=<database ...]]></description><link>https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application</link><guid isPermaLink="true">https://josefrichberg.com/quick-tip-taking-advantage-of-read-only-sql-server-replicas-in-your-c-application</guid><category><![CDATA[C#]]></category><category><![CDATA[SQL Server]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 18 Sep 2023 14:40:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/cvBBO4PzWPg/upload/b75b779f325cedea703ace0ba315a897.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you happen to be using C# there is a very easy way to set your application up to take advantage of Read-Only replicas.</p>
<p>Using this connection string: <strong><em>(Some portions were left out for brevity)</em></strong></p>
<pre><code class="lang-csharp"><span class="hljs-string">"Server=tcp:&lt;servername,port&gt;;Initial Catalog=&lt;database name&gt;;ApplicationIntent=ReadOnly;"</span>
</code></pre>
<p>The option <strong>ApplicationIntent=ReadOnly</strong> tells the system to first look for a replica when asking for a connection, but if there is no replica, then connect to the existing <strong>primary</strong>(read/write) server.</p>
<p>I have 2 connection strings in my APIs, one that includes that option and one that doesn't. This helps balance the calls between all the available backend resources without doing any special work within the calls themselves.</p>
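<p>If you want to confirm where a given connection actually landed, a quick check you can run over that connection (a small sketch) is:</p>
<pre><code class="lang-sql">-- Returns READ_ONLY when the connection was routed to a readable replica,
-- READ_WRITE when it landed on the primary.
select databasepropertyex(db_name(), 'Updateability');
</code></pre>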
]]></content:encoded></item><item><title><![CDATA[How to curb aggressive parallelism in Microsoft SQL Server]]></title><description><![CDATA[Microsoft SQL Server, like most modern database systems, can convert a query into a set of parallel instructions to improve efficiency. This is map-reduce before map-reduce was a popular programming paradigm (think Hadoop). This is done by the optimi...]]></description><link>https://josefrichberg.com/how-to-curb-aggressive-parallelism-in-microsoft-sql-server</link><guid isPermaLink="true">https://josefrichberg.com/how-to-curb-aggressive-parallelism-in-microsoft-sql-server</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[T-SQL]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 06 Sep 2023 22:36:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/2OuTr9_VaUg/upload/f07b795a4ff2fa014d5394cb000161ff.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Microsoft SQL Server, like most modern database systems, can convert a query into a set of parallel instructions to improve efficiency. This is <strong>map-reduce</strong> before <strong>map-reduce</strong> was a popular programming paradigm (think Hadoop). This is done by the optimizer based on numerous information points that the system has access to at runtime. 99% of the time, this is perfectly fine, but when you find that 1% it can be very tricky to solve. In this article, I will show you one technique that I use to solve these edge cases.</p>
<p>This is the query I am working with:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>,<span class="hljs-string">'Lowry,Amanda'</span>))
</code></pre>
<p>The optimizer chose to parallelize this and we can see it here</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694017173106/7bd324a9-a327-4c82-ac9c-26de0be18b30.png" alt class="image--center mx-auto" /></p>
<p>One thing to note is that when the work is split into parallel streams (distribute streams), those threads then need to be funneled back into a controlling thread for output (gather streams).</p>
<p>Statistics (pay attention to the highlighted portion):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694015857556/0b3f9264-feb9-4acd-ad2e-f8a78920519f.png" alt class="image--center mx-auto" /></p>
<p>It turns out that the 2nd <strong>worktable</strong> grows/shrinks based on the number of reps.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>))
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694016480254/b03736a1-45ee-4eba-81fb-77310fe5ad85.png" alt class="image--center mx-auto" /></p>
<p>MaterialSalesRepMap</p>
<ul>
<li><p>Scan count: NumReps+1</p>
</li>
<li><p>logical reads: 301586*NumReps</p>
</li>
</ul>
<p>The worktable grows by millions for each additional rep. We have 3,998 'rep' entries! It was at this point that I remembered SQL Server (in recent versions) can be 'aggressive' in its choice of how many workers to use when it chooses parallelism.</p>
<p>There is an optimizer hint (<strong>maxdop N</strong>) which stands for <strong>M</strong>ax<strong>D</strong>egreeOf<strong>P</strong>arallelism. This caps the number of threads the optimizer is allowed to use for the query at <em>N</em>. Rather than guess what would be an optimal number for a query with a varying number of reps in the request, I wanted to see how efficient the system would be without parallelism at all. So I turned it off by saying <strong>(maxdop 1)</strong>.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> s.SalesRepName,
       m.material <span class="hljs-keyword">as</span> ISBN
  <span class="hljs-keyword">from</span> MaterialSalesRepMap m
  <span class="hljs-keyword">join</span> SalesHierarchy s <span class="hljs-keyword">on</span> (get_bit(EligibilityKey,<span class="hljs-keyword">Position</span>)=<span class="hljs-number">1</span>
       <span class="hljs-keyword">and</span> SalesRepName <span class="hljs-keyword">in</span> (<span class="hljs-string">'Smith,Joe'</span>,<span class="hljs-string">'Doe,John'</span>,<span class="hljs-string">'Doe,Jane'</span>,<span class="hljs-string">'Lowry,Amanda'</span>))
<span class="hljs-keyword">option</span> (maxdop <span class="hljs-number">1</span>)
</code></pre>
<p>As you can see this forced the optimizer to ignore any type of parallel processing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694017787341/25bc1352-b716-4848-b1ad-464158071003.png" alt class="image--center mx-auto" /></p>
<p>The results were fantastic (and consistent)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694018207362/a09f2ccf-9435-4982-bcee-c398539e30bd.png" alt class="image--center mx-auto" /></p>
<p>The only thing that changes is the underlined blue portion. The <strong>Scan Count</strong> is equal to the number of reps and the <strong>logical reads</strong> fluctuate slightly up or down accordingly.</p>
<p>If you are looking to squeeze out some additional performance or want consistent results, this is one of those specialized tuning approaches you can take. I have used this approach successfully in selects, inserts, and deletes.</p>
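<p>If you find you need the same cap more broadly than a single statement, the limit can also be set as a database-wide default. This is a hedged sketch and a much blunter instrument, since it affects every query in the database:</p>
<pre><code class="lang-sql">-- Database-scoped default; a per-query OPTION (MAXDOP n) hint still overrides it.
alter database scoped configuration set maxdop = 1;
</code></pre>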
]]></content:encoded></item><item><title><![CDATA[Efficient calculation of an ISBN-13 check digit]]></title><description><![CDATA[I thought I might pass along, what I have found to be, the most efficient way to validate the check digit within Azure SQL Server. I was looking to go down the path of a CLR, but it turns out that seems to be frowned upon. The function returns Y/N. F...]]></description><link>https://josefrichberg.com/efficient-calculation-of-an-isbn-13-check-digit</link><guid isPermaLink="true">https://josefrichberg.com/efficient-calculation-of-an-isbn-13-check-digit</guid><category><![CDATA[SQL Server]]></category><category><![CDATA[T-SQL]]></category><category><![CDATA[book publishing]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 10 Jul 2023 01:03:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/Wpnoqo2plFA/upload/v1667144416089/RIpbPXrKO.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I thought I might pass along, what I have found to be, the most efficient way to validate the check digit within Azure SQL Server. I was looking to go down the path of a CLR, but it turns out that seems to be frowned upon. The function returns Y/N. For my environment, which is a 24CPU HyperScale, I can process approximately 50,000 isbns/second. My test involves reading ISBNs from a table and outputting the value into a temp table.</p>
<pre><code class="lang-plaintext">CREATE function [dbo].[ValidateISBN]

(@ISBN as bigint)

returns char(1)

as

begin

 declare @is978 tinyint=(978-(@ISBN /10000000000))*-1,

                 @checkdigit tinyint

 set @checkdigit =10-cast((cast((@ISBN %10000000000000/1000000000000) as tinyint)

  + cast((@ISBN %1000000000000/100000000000) as tinyint)*3

  + cast((@ISBN %100000000000/10000000000) as tinyint)

  + cast((@ISBN %10000000000/1000000000) as tinyint)*3

  + cast((@ISBN %1000000000/100000000) as tinyint)

  + cast((@ISBN %100000000/10000000) as tinyint)*3

  + cast((@ISBN %10000000/1000000) as tinyint)

  + cast((@ISBN %1000000/100000) as tinyint)*3

  + cast((@ISBN %100000/10000) as tinyint)

  + cast((@ISBN %10000/1000) as tinyint)*3

  + cast((@ISBN %1000/100) as tinyint)

  + cast((@ISBN %100/10) as tinyint)*3)as tinyint)%10



if ((@checkdigit=10 and @ISBN %10=0) or @checkdigit=@ISBN %10)

        return 'Y'

 return 'N'

end

GO
</code></pre>
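<p>A quick usage example (the ISBN below is the well-known valid sample 978-0-306-40615-7, so the function should return 'Y'):</p>
<pre><code class="lang-sql">select dbo.ValidateISBN(9780306406157) as IsValidISBN;
</code></pre>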
]]></content:encoded></item><item><title><![CDATA[Pulling Azure Analysis Services logs from Azure Log Analytics into PowerBI using Kusto Query Language (KQL)]]></title><description><![CDATA[We have very large Analysis Services(SSAS) cubes with billions of records and hundreds of users so we need to be able to monitor the performance of queries. 95% of this universe are users of reports built for them by knowledgeable report builders. Th...]]></description><link>https://josefrichberg.com/pulling-azure-analysis-services-logs-from-azure-log-analytics-into-powerbi-using-kusto-query-language-kql</link><guid isPermaLink="true">https://josefrichberg.com/pulling-azure-analysis-services-logs-from-azure-log-analytics-into-powerbi-using-kusto-query-language-kql</guid><category><![CDATA[KQL]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Azure Log Analytics]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Wed, 17 May 2023 17:47:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/BVyNlchWqzs/upload/90d68f133ae5c88286d9ff0b9a813112.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We have very large Analysis Services(SSAS) cubes with billions of records and hundreds of users so we need to be able to monitor the performance of queries. 95% of this universe are users of reports built for them by knowledgeable report builders. Those report builders are the 5% we are looking to monitor.</p>
<p>The developers normally build within dev, however, there are situations where 1-off reports are being developed or a new report is being developed and there is an impact on production.</p>
<p>This method of reporting is near real-time rather than real-time, for 2 reasons. First, SSAS doesn't send information during the run itself like, say, SQL Server does. It tells you when something starts and then when it finishes. The good news about that is the 'QueryEnd' event has all the calculated metrics (start time, end time, total time, etc.) so you don't need to capture both 'QueryStart' and 'QueryEnd'. Second, the system behind the Log Analytics Workspaces is an ADX (Azure Data Explorer) cluster, which itself requires time to ingest. We see about a 5-minute delay.</p>
<p>To interact with the ADX cluster you need to write a language called KQL (Kusto Query Language), which looks like a cross between SQL and Unix scripting. Below is the query.</p>
<pre><code class="lang-plaintext">AzureDiagnostics
| where ResourceProvider == 'MICROSOFT.ANALYSISSERVICES'
    AND Resource in ('&lt;myserver1&gt;','&lt;myserver2&gt;')
    AND OperationName == 'QueryEnd'
    AND NTDomainName_s == 'AzureAD'
| project ServerName = Resource, CubeName = DatabaseName_s,QueryId = RootActivityId_g, QueryStart = StartTime_t, QueryEnd = EndTime_t, Duration = Duration_s, CPUTime = CPUTime_s, Success = Success_s, Error = Error_s,Query = TextData_s
</code></pre>
<p>You start with the table you want to query, <strong>AzureDiagnostics</strong> (you can do joins, but they are beyond the scope of this article), and then you put a <strong>|</strong> (pipe) to take that output and move it to the next filter, in this case a <strong>where</strong> clause. After that filter you push those results to a <strong>project</strong>, which is like a select statement but, unlike in SQL, is not required. I am using the project because I want to not only create user-friendly names but also reduce the number of columns (for efficiency on both ends).</p>
<p>The next step is to convert this into a Power BI (M) query. This is under the Export option. Take that query and run it from within PowerBI, and you now have a dataset to build your reports off of. The downside is you have to manually refresh the data; however, there is a mechanism to make this streaming. If there is interest in the streaming, or any questions, please let me know in the comments.</p>
]]></content:encoded></item><item><title><![CDATA[Moving dates to a weekending date]]></title><description><![CDATA[In publishing, and I’m sure many other industries, we get data at both the daily level and the weekly level. To properly tie these two pieces of data you need to aggregate the daily data and adjust the date to align with the weekly date. Our weekly d...]]></description><link>https://josefrichberg.com/moving-dates-to-a-weekending-date</link><guid isPermaLink="true">https://josefrichberg.com/moving-dates-to-a-weekending-date</guid><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[SQL Server]]></category><category><![CDATA[SQL]]></category><category><![CDATA[#SQLtutorial ]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Sat, 05 Nov 2022 00:19:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/tT2DSShVDTI/upload/v1667147217050/M--2NGmm5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In publishing, and I’m sure many other industries, we get data at both the daily level and the weekly level. To properly tie these two pieces of data you need to aggregate the daily data and adjust the date to align with the weekly date. Our weekly data is set to the Saturday of the week.</p>
<p>As an example, all daily data acquired between 10/21/2018 and 10/27/2018 must have a WeekEndingDate of 10/27/2018. This can be done with a case statement, but there is an inline formula that will come in handy when doing projections (YTD, previous 6 months, etc.). I’ll present the formula and then show modifications to handle variations.</p>
<pre><code>select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,<span class="hljs-string">'10/21/2018'</span>),<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>The above formula determines how many days away from 7 the current date is and simply adds that many days. We use 7, because Saturday is the 7th day of the week. I’ll break it down.</p>
<pre><code>datepart(dw,<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>The above snippet returns a value of 1, because 10/21/2018 is a Sunday (with the default DATEFIRST setting, where Saturday is day 7). You subtract that from 7 to determine how many days you have to move forward. In the example, you need to move 6 days from Sunday to make it to Saturday. This gives you the number of days to add. Reducing the example:</p>
<pre><code>select dateadd(dd,<span class="hljs-number">5</span>,<span class="hljs-string">'10/21/2018'</span>)
</code></pre><p>This comes in very handy when you are looking to move backward in time a certain number of days, weeks, months, or years but align to a Saturday. An example is below, which goes back 1 month.</p>
<pre><code>select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/21/2018'</span>)),dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/21/2018'</span>))
</code></pre><p>Here is a simple way to remember how to use the formula. Remember you have to move your date first, then push it to Saturday.</p>
<pre><code>declare @<span class="hljs-keyword">var</span> date

select @<span class="hljs-keyword">var</span>=dateadd(mm,<span class="hljs-number">-1</span>,<span class="hljs-string">'10/2/2018'</span>)

select dateadd(dd,<span class="hljs-number">7</span>-datepart(dw,@<span class="hljs-keyword">var</span>),@<span class="hljs-keyword">var</span>)
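
-- A hedged usage sketch (table and column names are illustrative):
-- roll daily rows up to their Saturday week-ending date.
select dateadd(dd, 7-datepart(dw, SaleDate), SaleDate) as WeekEndingDate,
       sum(Qty) as WeeklyQty
  from DailySales
 group by dateadd(dd, 7-datepart(dw, SaleDate), SaleDate)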
</code></pre>]]></content:encoded></item><item><title><![CDATA[Improved data loads from Snowflake to Azure Synapse Analytics]]></title><description><![CDATA[My responsibilities revolve around providing the business with the data they need to make informed business decisions. One of those processes requires us to shift data from a Snowflake data warehouse housed in AWS to an Azure Synapse Analytics data w...]]></description><link>https://josefrichberg.com/improved-data-loads-from-snowflake-to-azure-synapse-analytics</link><guid isPermaLink="true">https://josefrichberg.com/improved-data-loads-from-snowflake-to-azure-synapse-analytics</guid><category><![CDATA[azure-synapse-analytics]]></category><category><![CDATA[snowflake]]></category><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[Azure Pipelines]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Sun, 30 Oct 2022 14:00:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/eH_ftJYhaTY/upload/v1667107974889/Iz3ypuQE9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My responsibilities revolve around providing the business with the data they need to make informed business decisions. One of those processes requires us to shift data from a Snowflake data warehouse housed in AWS to an Azure Synapse Analytics data warehouse. </p>
<p>We use Azure Data Factory as our ETL system, which has native drivers for Snowflake, but they only work under very limited situations, none of which suited this specific workflow. This meant we needed to use their ODBC driver, and to do that we had to create a managed VM to contain the drivers and an Integration Runtime. This enabled us to connect Snowflake directly to our Synapse system. Voila!</p>
<p>This setup, however, is not without its own quirks. You see, we also use Azure Analysis Services, which requires ODBC, and some of its data sets also reside in Snowflake. This puts a significant amount of pressure on the VMs that act as a gateway, so we began looking for alternative ways to move data, freeing up the VMs for cube builds only.</p>
<p>Snowflake allows you to export any table as a series of gzipped files with the <strong>copy into</strong> command. You either <strong>copy into</strong> Snowflake or <strong>copy into</strong> a set of files. Doing this requires a script that would normally be run on a machine able to connect to Snowflake, and there was no easy way to coordinate that. Originally, we decided to have the job run at the end of a nightly process on Snowflake. We extended the job to push the table to a set of files in an Azure-connected storage account and then call our Event Hub to signal that the job was complete. That event would trigger a Logic App, which would then call a Data Factory pipeline that would ingest those files using a feature of Synapse called <strong>Polybase</strong>. </p>
<p><strong>Polybase</strong> is the feature that Synapse uses to push/pull files (like the <strong>copy into</strong> command). Woot! We had bypassed the VM, but all was not good. <strong>Polybase</strong> gives incredible performance gains but is very finicky and expects the file to be formatted in a specific way before it can ingest it. Even a simple '|'-delimited file was first copied to a set of staging files and then ingested. This was a dead end, that is, until the new <strong>Script Object</strong> became available in Azure Data Factory.</p>
<p>Now we could run that exact <strong>copy into</strong> command from our pipeline, which gave us more fine-grained control. That was part 1. Part 2 was sneaky.</p>
<p>I wanted to see just what <strong>Polybase</strong> was doing with these files, and it turned out to be something incredibly simple.</p>
<blockquote>
<p><strong>Polybase</strong> is simply replacing whatever delimiter you use in the source file with: <strong>\u2bd1</strong></p>
<p>That seemed simple enough, but it turns out that for Snowflake you need to specify the delimiter as its UTF-8 byte sequence: <strong>\xE2\xaf\x91</strong>.</p>
</blockquote>
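<p>To make this concrete, here is a rough sketch of the Snowflake unload that the script object runs; the stage and table names are placeholders of mine, not the actual objects from our pipeline:</p>
<pre><code class="lang-sql">-- Sketch only: stage and table names are assumed.
-- Unload the table to gzipped files in an Azure-connected external stage, using the
-- UTF-8 bytes for \u2bd1 as the column delimiter so Polybase can ingest the files directly.
copy into @azure_export_stage/orders/
  from analytics.public.orders
  file_format = (type = csv field_delimiter = '\xE2\xAF\x91' compression = gzip)
  overwrite = true;
</code></pre>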
<p>The script object looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667088829705/VIYkNAhvG.png" alt="polybase script output (1).png" /></p>
<p>The source of the copy object looks like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667140582381/y24Sb8ne0.png" alt="polybase copy encoding (1).png" />
What type of performance do you get?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1667140598383/FKLIt1QZ9.png" alt="polybase performance.png" /></p>
<p><strong><em>461 MB/s</em></strong>
That is 100 times faster than what we were getting with an ODBC connection through a managed VM. Done!</p>
<p>You can use this trick to improve the ingest speed of any CSV file.</p>
]]></content:encoded></item><item><title><![CDATA[Presenting at DeveloperWeek Global!]]></title><description><![CDATA[I'm excited to be presenting my talk on Building a complete API in Azure
A quick update: If you are interested in coming to the presentation or anything else DeveloperWeek has to offer, here is a link to a free pass. There is a limited number of them...]]></description><link>https://josefrichberg.com/presenting-at-developerweek-global</link><guid isPermaLink="true">https://josefrichberg.com/presenting-at-developerweek-global</guid><category><![CDATA[Azure]]></category><category><![CDATA[APIs]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[Azure Functions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Thu, 20 Oct 2022 22:52:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1666306311740/1pk4Jo5Kq.PNG" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I'm excited to be presenting my talk on <a target="_blank" href="https://www.developerweek.com/global/conference/enterprise/microservices-architecture/"><strong>Building a complete API in Azure</strong></a></p>
<p>A quick update: If you are interested in coming to the presentation or anything else DeveloperWeek has to offer, here is a link to a free pass. There are a limited number of them, so it's first come, first served: <a target="_blank" href="https://bit.ly/3DgmZ2x">Free Pass</a></p>
]]></content:encoded></item><item><title><![CDATA[Composite Models in PowerBI -- The solution to poor search performance in SSAS]]></title><description><![CDATA[With the release of composite models in PowerBI, I’ve been able to solve a long-standing issue with SQL Server Analysis Services: text search performance. In this article I am going to use our Azure Hyperscale instance (but any Microsoft SQL Server i...]]></description><link>https://josefrichberg.com/composite-models-in-powerbi-the-solution-to-poor-search-performance-in-ssas</link><guid isPermaLink="true">https://josefrichberg.com/composite-models-in-powerbi-the-solution-to-poor-search-performance-in-ssas</guid><category><![CDATA[Azure Analysis Services]]></category><category><![CDATA[Composite Models]]></category><category><![CDATA[Azure SQL Database]]></category><category><![CDATA[PowerBI]]></category><category><![CDATA[Power BI]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 17 Oct 2022 23:02:29 GMT</pubDate><content:encoded><![CDATA[<p>With the release of composite models in PowerBI, I’ve been able to solve a long-standing issue with SQL Server Analysis Services: text search performance. In this article I am going to use our Azure Hyperscale instance (but any Microsoft SQL Server instance will do) to replace our ‘Filter’ or search tables. I’ll go into some background on how we architect our SSAS tabular models, how to ‘inject’ the new SQL Server search tables, and more importantly the SQL Server specific techniques used to overcome the DAX pushed by PowerBI.</p>
<p>To improve performance in our tabular models we extract the distinct values for various columns the users will be filtering/searching on into their own table in the tabular model. Some of the text-based tables are author names, titles, BISAC codes, and search terms. We will be using search terms in this tutorial.</p>
<p>Unlike many industries where data is locked behind IDs (users, products, locations, etc), in publishing much of the data is locked behind text (words); lots of words. Here are the statistics of the table:</p>
<ul>
<li>87.5 million records</li>
<li>Structure (see the sketch after this list): <pre><code>keyword nvarchar(<span class="hljs-number">400</span>) not <span class="hljs-literal">null</span>
ID bigint not <span class="hljs-literal">null</span>
</code></pre></li>
</ul>
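<p>As DDL, the table is roughly the following (a minimal sketch; the schema name is my assumption, the post only specifies the two columns):</p>
<pre><code class="lang-sql">-- Sketch only: schema name is assumed.
create table dbo.searchterms
(
    keyword nvarchar(400) not null,  -- the text users search against
    ID      bigint        not null   -- links each keyword back to the SSAS model
);
</code></pre>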
<pre><code class="lang-sql"><span class="hljs-keyword">select</span> <span class="hljs-keyword">count</span>(*)
  <span class="hljs-keyword">from</span> searchterms
<span class="hljs-keyword">where</span> <span class="hljs-keyword">CHARINDEX</span>(keyword,<span class="hljs-string">'cookbook'</span>)&gt;=<span class="hljs-number">1</span>
</code></pre>
<p>Now that we have some insight into the data and how the data is being used, let’s look at the technique I use to improve the overall performance. I realized that using CHARINDEX meant that SQL Server wouldn’t be able to use an index to find the records, but I still needed an index on the ID column to allow us to link back to SSAS. I was curious to see if there was any performance impact in scanning the index vs. a heap table, and while clicking through SSMS I saw this gem from long ago.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666047489550/nILGyndI0.png" alt="ssms+compression+menu.png" /></p>
<p>We are running an Azure SQL Hyperscale instance, so storage is not an issue (100 TB max database size); however, I thought compression might work well given how SQL Server reads data: at the page level.</p>
<p>Whether I want a single record or every record on a page, SQL Server must read that whole page into memory. This means the only way to improve the performance of a query that touches every page is to reduce the overall number of pages, and the only way to do that is to compress the data so more rows fit on a single page; in other words, we are trading I/O for CPU. I was optimistic that the speed of the compression algorithm and the general increase in processor power (relative to I/O speeds) would provide a net benefit, and I was right, sort of. Below is a chart of a quick test I did.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1666047536073/b3ajPipdU.png" alt="composite+results.png" /></p>
<p>As you can see from the chart, the overall time for the queries stayed relatively stable, but the number of pages changed dramatically. This tells me that most of the work is in the CPU, comparing those characters. Adding compression seemed to have no impact on overall elapsed time, but it reduced the I/O load on the server by a factor of 3.5. This is good news as it leaves headroom for more concurrent queries, and while I was hoping for greater speed, the timings here are still much faster than those obtained while searching inside of SSAS. It also reduces the load on SSAS by shifting text searching out.</p>
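<p>For reference, the compression and the ID index can be applied with a couple of statements (a minimal sketch; the object names and the choice of a nonclustered index are my assumptions, the post only says page compression was enabled and an index was added on ID):</p>
<pre><code class="lang-sql">-- Sketch only: table and index names are assumed.
-- Rebuild the table with page compression so more rows fit on each 8 KB page.
alter table dbo.searchterms rebuild with (data_compression = page);

-- Index on ID so matching rows can be joined back to the SSAS model.
create nonclustered index IX_searchterms_ID on dbo.searchterms (ID);
</code></pre>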
<p>When you wire up SQL Server and SSAS, the underlying queries between the two systems look just as if someone had used app filters to pass direct values. Below is an image of what happens after the ISBNs are selected based upon a keyword search in SQL Server. This is as efficient as it can be.</p>
<p>I hope this article has given you some insight into how composite models can connect SQL Server and SSAS to overcome limitations of the product and inspire you to organize your data in a new way.</p>
]]></content:encoded></item><item><title><![CDATA[New Blog Space]]></title><description><![CDATA[I've decided to move my blog here. You'll see some old content being uploaded in the next few days. Look around, I hope you find some tidbits of information.
Older content focused on SQL Server and SSIS. Now I focus on the Azure ecosystem.]]></description><link>https://josefrichberg.com/new-blog-space</link><guid isPermaLink="true">https://josefrichberg.com/new-blog-space</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure Functions]]></category><category><![CDATA[Azure Data Factory]]></category><category><![CDATA[Azure SQL Database]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 17 Oct 2022 00:59:27 GMT</pubDate><content:encoded><![CDATA[<p>I've decided to move my blog here. You'll see some old content being uploaded in the next few days. Look around, I hope you find some tidbits of information.</p>
<p>Older content focused on SQL Server and SSIS. Now I focus on the Azure ecosystem.</p>
]]></content:encoded></item><item><title><![CDATA[Using Managed Identities to authenticate Function Apps]]></title><description><![CDATA[I have a video detailing how to use Managed Identities to authenticate function apps in Azure.]]></description><link>https://josefrichberg.com/using-managed-identities-to-authenticate-function-apps</link><guid isPermaLink="true">https://josefrichberg.com/using-managed-identities-to-authenticate-function-apps</guid><category><![CDATA[Azure]]></category><category><![CDATA[Azure Managed Identities]]></category><category><![CDATA[Azure Functions]]></category><dc:creator><![CDATA[Josef Richberg]]></dc:creator><pubDate>Mon, 26 Oct 2020 04:00:00 GMT</pubDate><content:encoded><![CDATA[<p>I have a <a target="_blank" href="https://youtu.be/A4-Qupj5tWc">video</a> detailing how to use Managed Identities to authenticate function apps in Azure.</p>
]]></content:encoded></item></channel></rss>