﻿<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
	<channel>
		<title>Turnerj</title>
		<link>https://turnerj.com/</link>
		<description>aka. James Turner - A programmer and entrepreneur with a love of cars, music and technology.</description>
		<copyright>2026</copyright>
		<managingEditor>James Turner</managingEditor>
		<pubDate>Thu, 21 May 2026 00:34:45 GMT</pubDate>
		<lastBuildDate>Thu, 21 May 2026 00:34:45 GMT</lastBuildDate>
		<item>
			<title>Proxying Rainbow Six LAN for WAN with .NET</title>
			<link>https://turnerj.com/blog/proxying-rainbow-six-lan-for-wan-with-dotnet</link>
			<description>A simple journey of learning, debugging and building my way to play an old game with a friend.</description>
			<enclosure url="https://turnerj.com/blog/images/social/proxying-rainbow-six-lan-for-wan-with-dotnet.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/proxying-rainbow-six-lan-for-wan-with-dotnet</guid>
			<pubDate>Sun, 29 Jan 2023 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;"Go go go!" has been one of those quotes that stuck around with me for a while.
Not because it is some linguistic masterpiece but because of both the frequency and way the line was delivered in Rainbow Six Vegas.
For a game that came out back in 2006, with its sequel out in 2008, it is still pretty enjoyable.
Having a nice mix of stealth and action, you can approach each section of the map from different angles to try different strategies.
I don't think it is Game of the Year material or anything but you can have some good solid fun with it.&lt;/p&gt;
&lt;p&gt;I mainly play the sequel now, Rainbow Six Vegas 2, which has both an online mode and a LAN mode for both match making and a mode called "Terrorist Hunt".
I never played much of the online mode, though it doesn't matter now anyway as &lt;a href="https://www.rockpapershotgun.com/ubisoft-explain-why-they-closed-rainbow-six-vegas-2s-servers"&gt;Ubisoft shut down the online servers&lt;/a&gt; so even if I wanted to, I can't.
LAN mode is where I've been having most of my fun with the game. With tools like Discord making it easier to organize playing remotely than in person, it was important to try and get this game working over a VPN too.&lt;/p&gt;
&lt;p&gt;Initially we tried a fairly basic VPN to each other's home network.
We could access each other's systems but the game wouldn't find the running server in the other's network.
Unfortunately the game doesn't have a "Connect to IP" functionality so we couldn't help the game out.
We tried again more recently with &lt;a href="https://tailscale.com/"&gt;Tailscale&lt;/a&gt; as a more simplified way of connecting our networks but to no avail.&lt;/p&gt;
&lt;p&gt;I'm no stranger to &lt;a href="https://turnerj.com/blog/fixing-bf1942-with-win32"&gt;writing code to work around buggy behaviour in old games&lt;/a&gt; and with both my friend and I being programmers, we thought we'd dig a bit into what exactly the game and server are doing to communicate and maybe we could help it along with a bit of code.&lt;/p&gt;
&lt;h2 id="packet-inspection-with-wireshark"&gt;Packet Inspection with Wireshark&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/proxying-rainbow-six-lan-for-wan-with-dotnet-wireshark.png" alt="Screenshot of Wireshark looking at packets from Rainbow Six Vegas 2"&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.wireshark.org/"&gt;Wireshark&lt;/a&gt; is a great tool for analyzing network traffic.
I only scratch the surface of the functionality of it but even then, it really helps quite a bit with understanding what is going on.&lt;/p&gt;
&lt;p&gt;I figured Rainbow Six Vegas 2 was either listening for a broadcast packet from the server or was perhaps sending one of its own when we refresh the list of servers available.
I would have typically thought it was the former from the point of view that "servers announce themselves to clients" but that isn't the case at all.&lt;/p&gt;
&lt;p&gt;There are 4 packets that serve as part of this handshake between the game (client) and the server:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Client sends a packet to 255.255.255.255 to announce itself&lt;/li&gt;
&lt;li&gt;Server responds back to the announced client&lt;/li&gt;
&lt;li&gt;Client sends some sort of acknowledgement&lt;/li&gt;
&lt;li&gt;Server responds one final time with some acknowledgement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My friend and I figured it was likely that the broadcast packet wasn't getting from the client to the game server.&lt;/p&gt;
&lt;p&gt;At this point, I thought the best strategy was to identify the data in the transmission but it didn't take long till I hit roadblocks there.&lt;/p&gt;
&lt;p&gt;Rainbow Six Vegas 2 was built with Unreal Engine 3 and there is some pretty decent documentation available.
Unlike newer versions of the engine, the code itself isn't available without forking over a large amount of money.
What I was looking for specifically was for the wire protocol as I figured the game probably wasn't implementing something custom.&lt;/p&gt;
&lt;p&gt;There is an official page for the &lt;a href="https://docs.unrealengine.com/udk/Three/NetworkingOverview.html"&gt;networking overview of UE3&lt;/a&gt; which includes links to jump to sections about the network driver implementation and the wire protocol but... the links don't work.
The actual content of the page cuts off just before those sections for some reason - guessing it is intentional but kinda frustrating for what I'm wanting to do.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/proxying-rainbow-six-lan-for-wan-with-dotnet-ue3-network-overview.png" alt="Screenshot of UE3 Network Overview page, highlighting the missing Wire Protocol section"&gt;&lt;/p&gt;
&lt;p&gt;I thought I'd instead manually analyze the data to see what specifically is being transferred.&lt;/p&gt;
&lt;p&gt;I captured the first packet, the broadcast from the client to the server, multiple different times which looked like this (each row is a version of the data in the order I tried them).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;34e06d2afb5849af04842c614100
3448fcb5f3598849ec852c614100
34bc213e2e217f5868842c614100
34b8040d9d93566399842c614100
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each packet is 14 bytes: we can see the first byte is the same, the next 9 bytes change between attempts and the final 4 bytes (8 hex characters) don't change.
I figured maybe there is a timestamp or maybe game version but even with using &lt;a href="https://imhex.werwolv.net/"&gt;ImHex&lt;/a&gt;, a fantastic tool to help decode binary data, I couldn't really work any of the bytes out.
There's a chance maybe the random data is just a nonce to prevent responses from servers being mixed up.&lt;/p&gt;
&lt;p&gt;Maybe I'd have more luck with the packets so looking at the second packet, the one from the server back to the game, I recorded the bytes from multiple attempts (again, each row is a version of the data).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;36e06d2afb5849af04842c614160a7936d748cea5960fc2198b550a8f5efb22c41d8099a308051013e0418c8a6fe63041800fe01fe010000fe01fe010002c2ccc47064ca646860c468ca72c470c26e60cac6c8ccca62627070c672c6707200a8eae4dccae4d40008000200000000000000000008000000000000009479c51efeffffff03000000000000008051013e9c0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000000600000000000000080000000200000002000000ca00000000000000feffffff150000000000000004000000000008000000000002000000000000000000000010000000a8eae4dccae4d40000000000000000000000
3648fcb5f3598849ec852c614160a7936d748cea5960fc2198b550a8f5efb22c41d8099a308051013e0418c8a6fe63041800fe01fe010000fe01fe010002c2ccc47064ca646860c468ca72c470c26e60cac6c8ccca62627070c672c6707200a8eae4dccae4d40008000200000000000000000008000000000000009479c51efeffffff03000000000000008051013e9c0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000000600000000000000080000000200000002000000ca00000000000000feffffff150000000000000004000000000008000000000002000000000000000000000010000000a8eae4dccae4d40000000000000000000000
36bc213e2e217f5868842c614160a7936d748cea5960fc2198b550a8f5efb22c41d8099a308051013e0418c8a6fe63041800fe01fe010000fe01fe010002c2ccc47064ca646860c468ca72c470c26e60cac6c8ccca62627070c672c6707200a8eae4dccae4d40008000200000000000000000008000000000000009479c51efeffffff03000000000000008051013e9c0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000000600000000000000080000000200000002000000ca00000000000000feffffff150000000000000004000000000008000000000002000000000000000000000010000000a8eae4dccae4d40000000000000000000000
36b8040d9d93566399842c614160a7936d748cea5960fc2198b550a8f5efb22c41d8099a308051013e0418c8a6fe63041800fe01fe010000fe01fe010002c2ccc47064ca646860c468ca72c470c26e60cac6c8ccca62627070c672c6707200a8eae4dccae4d40008000200000000000000000008000000000000009479c51efeffffff03000000000000008051013e9c0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000000600000000000000080000000200000002000000ca00000000000000feffffff150000000000000004000000000008000000000002000000000000000000000010000000a8eae4dccae4d40000000000000000000000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There does actually seem to be a pattern between these responses and the original requests.
The first byte might be different from request and response (maybe to signal "client" vs "server") with the next several bytes matching the original request, backing up my thought about it being a nonce.
The rest of the data I found to be a bit hit-or-miss in decoding it - it is static for each attempt but nothing was actually changing the state of the game server either.&lt;/p&gt;
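&lt;p&gt;To make that comparison concrete, here is a minimal sketch that lines up the first captured request against the first 14 bytes of its response. The split into a "type" byte followed by an echoed block is only my guess from the captures above, not a documented layout.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// First captured request and the first 14 bytes of its response (copied from the captures above).
var request = Convert.FromHexString("34e06d2afb5849af04842c614100");
var responseStart = Convert.FromHexString("36e06d2afb5849af04842c614160");

// Assumed layout: byte 0 = message type, bytes 1-12 = data the server echoes back.
ReadOnlySpan&amp;lt;byte&amp;gt; requestEcho = request.AsSpan(1, 12);
ReadOnlySpan&amp;lt;byte&amp;gt; responseEcho = responseStart.AsSpan(1, 12);

Console.WriteLine($"Request type: 0x{request[0]:X2}, response type: 0x{responseStart[0]:X2}");
Console.WriteLine($"Bytes 1-12 echoed: {requestEcho.SequenceEqual(responseEcho)}");
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For this pair of captures the type byte differs (0x34 vs 0x36) while the 12 bytes after it are identical, which is consistent with the nonce idea even if it doesn't prove it.&lt;/p&gt;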
&lt;p&gt;I was hitting a dead end but thought I'd just look at these final packets between the client and the server.
The client sends the following in each of my attempts.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0100
0100
0100
0100
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The server then responds in kind.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0101
0101
0101
0101
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is some sort of acknowledgement and I wasn't going to be able to figure out more than that, but that's when it kinda hit me - I don't need to understand the data, just make sure it gets to the right place.&lt;/p&gt;
&lt;p&gt;Earlier I mentioned I think it is the broadcast packet, that first one from the client looking for servers, that is the problem so if I can get that to the server maybe everything will Just Work™.&lt;/p&gt;
&lt;h2 id="proxying-the-packets"&gt;Proxying the Packets&lt;/h2&gt;
&lt;p&gt;In the 4-packet exchange, the client broadcasts from a random port to a specific port (45000) and the server will first respond to whatever port sent the broadcast.
The last 2 packets are on a different port (11120) for both sending and receiving.&lt;/p&gt;
&lt;p&gt;To proxy the packets then, I need a program on the client machine to listen on port 45000 for the broadcast and send it directly to the server.
Because the server will respond back to whatever port sent the "broadcast", that means my application will get the server's response, so I then need to relay it to the real client on the port its broadcast came from.&lt;/p&gt;
&lt;p&gt;In .NET, it is relatively simple to work with sockets in a case like this as I'm not really needing to do anything too fancy.
I threw something together in LINQPad to see if the idea would pan out.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var serverAddress = new IPEndPoint(IPAddress.Parse("the-ip-address"), 45000);
using var proxiedServer = new Socket(serverAddress.AddressFamily, SocketType.Dgram, ProtocolType.Udp);

var interceptAddress = new IPEndPoint(IPAddress.Any, 45000);
using var clientIntercept = new Socket(interceptAddress.AddressFamily, SocketType.Dgram, ProtocolType.Udp);
clientIntercept.Bind(interceptAddress);

var buffer = new byte[512];
while (true)
{
	var broadcastIntercept = await clientIntercept.ReceiveFromAsync(buffer, SocketFlags.None, interceptAddress);
	Console.WriteLine("Client Broadcast - {0} Bytes", broadcastIntercept.ReceivedBytes);
	Console.WriteLine(Convert.ToHexString(buffer.AsSpan().Slice(0, broadcastIntercept.ReceivedBytes)));
	await proxiedServer.SendToAsync(buffer.AsMemory().Slice(0, broadcastIntercept.ReceivedBytes), serverAddress);
	Console.WriteLine("Forwarded to Server");
	var serverResponse = await proxiedServer.ReceiveFromAsync(buffer, serverAddress);
	Console.WriteLine("Server Response - {0} Bytes", serverResponse.ReceivedBytes);
	Console.WriteLine(Convert.ToHexString(buffer.AsSpan().Slice(0, serverResponse.ReceivedBytes)));
	await clientIntercept.SendToAsync(buffer.AsMemory().Slice(0, serverResponse.ReceivedBytes), broadcastIntercept.RemoteEndPoint);
	Console.WriteLine("Forwarded to Client");
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And, well, I think this screenshot says everything...&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/proxying-rainbow-six-lan-for-wan-with-dotnet-working-proxy.jpg" alt="Screenshot of Rainbow Six Vegas 2 with my character looking at another instance of my character"&gt;&lt;/p&gt;
&lt;p&gt;This test was done from my laptop, going through a mobile hotspot, to my desktop via Tailscale.&lt;/p&gt;
&lt;p&gt;From what I can tell then, simply getting the initial two packets between the game client and the server allowed the game's net code to take over from there.
I didn't check whether those final two packets actually got to the right spot as I was happy enough the game worked.&lt;/p&gt;
&lt;p&gt;The only thing missing, and what could be related to those two packets, was that there was no ping being displayed in the game's server listing.
That didn't matter to me though, the game worked and I could play.&lt;/p&gt;
&lt;h2 id="wrapping-up"&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;I was both surprised and confused it was "that easy" given my diving into the packets themselves.
For all I knew, I'd need to proxy all the game's networking packets but that wasn't the case at all - just those first 2 packets were enough to let the game do its thing.
It seems to trick the game enough to think the client/server are local when they actually aren't.&lt;/p&gt;
&lt;p&gt;I packaged up the code &lt;a href="https://github.com/Turnerj/NetworkedVegas2"&gt;and published it to GitHub as "Networked Vegas 2"&lt;/a&gt;, a simple tool to get that broadcast packet to a specific server.
Right now I'm assuming the problem I hit might be unique to Rainbow Six Vegas 2, however the released executable supports specifying a custom port in case it is different for other games.&lt;/p&gt;
&lt;p&gt;I'm not intending to support other games but who knows, maybe this specific client/server communication is common for UE3 games and might help others play their LAN games over the internet.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Ref-structs are technically [Obsolete]</title>
			<link>https://turnerj.com/blog/ref-structs-are-technically-obsolete</link>
			<description>How one very useful C# feature has a hidden quirk that tripped up my code coverage reports.</description>
			<enclosure url="https://turnerj.com/blog/images/social/ref-structs-are-technically-obsolete.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/ref-structs-are-technically-obsolete</guid>
			<pubDate>Mon, 19 Dec 2022 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;&lt;small&gt;TL;DR: &lt;a href="https://turnerj.com/#why-are-ref-structs-obsolete"&gt;the &lt;code&gt;Obsolete&lt;/code&gt; attribute on ref-structs&lt;/a&gt; tripped up my code coverage reports&lt;/small&gt;&lt;/p&gt;
&lt;p&gt;It might not be the most recent feature to C#, coming in C# 7.2, but &lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/ref-struct"&gt;ref-struct&lt;/a&gt; support is awesome.
For those that don't know, a ref-struct is a type that is allocated on the stack and can't escape to the managed heap.
The most ubiquitous form of this is &lt;code&gt;Span&lt;/code&gt; - a powerful type that can help avoid allocations in a number of scenarios and provide a consistent way to access different types of sequential data.&lt;/p&gt;
&lt;p&gt;The restriction of not escaping to the managed heap means that you can't store ref-structs like &lt;code&gt;Span&lt;/code&gt; on a class or even a normal struct.
If you want to pass around a ref-struct as a member of another type, that type will itself need to be a ref-struct.&lt;/p&gt;
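&lt;p&gt;As a small sketch of that restriction (the &lt;code&gt;ByteWindow&lt;/code&gt; and &lt;code&gt;HeapHolder&lt;/code&gt; names are made up purely for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

var window = new ByteWindow(new byte[] { 1, 2, 3 });
Console.WriteLine(window.Length);

// Fine: a ref struct can hold another ref struct such as ReadOnlySpan&amp;lt;byte&amp;gt;.
ref struct ByteWindow
{
    public readonly ReadOnlySpan&amp;lt;byte&amp;gt; Data;

    public ByteWindow(ReadOnlySpan&amp;lt;byte&amp;gt; data)
    {
        Data = data;
    }

    public int Length =&amp;gt; Data.Length;
}

// Uncommenting this fails to compile: a class lives on the managed heap,
// so it can't have a ReadOnlySpan&amp;lt;byte&amp;gt; field.
// class HeapHolder
// {
//     public ReadOnlySpan&amp;lt;byte&amp;gt; Data;
// }
&lt;/code&gt;&lt;/pre&gt;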
&lt;p&gt;I'm a heavy &lt;code&gt;Span&lt;/code&gt; user, particularly for my work with tokenizers and other parsing techniques.
One design pattern I've used with tokenizers before is to hold the &lt;code&gt;Span&lt;/code&gt; on my type so with that in mind, my tokenizers might also be a ref-struct.&lt;/p&gt;
&lt;p&gt;Here is a basic example of what one might look like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;ref struct CustomReader
{
    private const byte EndOfFile = 0x00;

    public readonly ReadOnlySpan&amp;lt;byte&amp;gt; Data;
    private int Offset;

    public CustomReader(ReadOnlySpan&amp;lt;byte&amp;gt; data)
    {
        Data = data;
        Offset = 0;
    }

    public bool MoveNext(out Token token)
    {
        if (Offset &amp;gt;= Data.Length)
        {
            token = default;
            return false;
        }

        token = Data[Offset] switch
        {
            (byte)' ' or (byte)'\t' =&amp;gt; ReadWhitespace(),
            (byte)'#' =&amp;gt; ReadComment(),
            _ =&amp;gt; ReadValue()
        };
        return true;
    }

    private byte Current
    {
        get
        {
            if (Offset &amp;lt; Data.Length)
            {
                return Data[Offset];
            }
            return EndOfFile;
        }
    }

    private Token ReadWhitespace()
    {
        var startIndex = Offset;
        Offset++;
        while (true)
        {
            switch (Current)
            {
                case (byte)' ':
                case (byte)'\t':
                    Offset++;
                    continue;
                default:
                    return new Token(TokenType.Whitespace, Data[startIndex..Offset]);
            }
        }
    }

    private Token ReadComment()
    {
        var startIndex = Offset;
        Offset++;
        var newLineIndex = Data[Offset..].IndexOfAny((byte)'\r', (byte)'\n');
        if (newLineIndex == -1)
        {
            Offset = Data.Length;
        }
        else
        {
            Offset += newLineIndex;
        }
        return new Token(TokenType.Comment, Data[startIndex..Offset]);
    }

    private Token ReadValue()
    {
        var startIndex = Offset;
        Offset++;
        while (true)
        {
            switch (Current)
            {
                case (byte)' ':
                case (byte)'\t':
                case (byte)'#':
                case EndOfFile:
                    return new Token(TokenType.Value, Data[startIndex..Offset]);
                default:
                    Offset++;
                    break;
            }
        }
    }
}

readonly ref struct Token
{
    public readonly TokenType TokenType;
    public readonly ReadOnlySpan&amp;lt;byte&amp;gt; Value;

    public Token(TokenType tokenType, ReadOnlySpan&amp;lt;byte&amp;gt; value)
    {
        TokenType = tokenType;
        Value = value;
    }
}

enum TokenType
{
    Whitespace,
    Comment,
    Value
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using a tokenizer like this is fairly simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System.Text;

var reader = new CustomReader("Hello World  # Test Comment"u8);

while (reader.MoveNext(out var token))
{
    Console.WriteLine($"{token.TokenType}: \"{Encoding.UTF8.GetString(token.Value)}\"");
}

// Output:
//  Value: "Hello"
//  Whitespace: " "
//  Value: "World"
//  Whitespace: "  "
//  Comment: "# Test Comment"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While this tokenizer is pretty simple, once you have more token types or obscure optimizations, things can get pretty complex pretty fast.
I'd typically write a bunch of unit tests for such a tokenizer and leverage code coverage reports to ensure I've got good coverage across the code.
For &lt;a href="https://github.com/Turnerj"&gt;my open source libraries&lt;/a&gt;, I have a very similar set of rules about what to include and exclude from coverage.&lt;/p&gt;
&lt;p&gt;Here is a typical &lt;code&gt;CodeCoverage.runsettings&lt;/code&gt; file I might have:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;  
&amp;lt;RunSettings&amp;gt;  
  &amp;lt;DataCollectionRunSettings&amp;gt;  
    &amp;lt;DataCollectors&amp;gt;
      &amp;lt;DataCollector friendlyName="XPlat code coverage"&amp;gt;
        &amp;lt;Configuration&amp;gt;
          &amp;lt;Format&amp;gt;cobertura&amp;lt;/Format&amp;gt;
          &amp;lt;Exclude&amp;gt;[MongoFramework.Tests]*&amp;lt;/Exclude&amp;gt;
          &amp;lt;Include&amp;gt;[MongoFramework]*,[MongoFramework.*]*&amp;lt;/Include&amp;gt;
          &amp;lt;ExcludeByAttribute&amp;gt;Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute&amp;lt;/ExcludeByAttribute&amp;gt;
          &amp;lt;UseSourceLink&amp;gt;true&amp;lt;/UseSourceLink&amp;gt;
          &amp;lt;SkipAutoProps&amp;gt;true&amp;lt;/SkipAutoProps&amp;gt;
        &amp;lt;/Configuration&amp;gt;
      &amp;lt;/DataCollector&amp;gt;
    &amp;lt;/DataCollectors&amp;gt;  
  &amp;lt;/DataCollectionRunSettings&amp;gt;  
&amp;lt;/RunSettings&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One of the specific rules I have is excluding code marked with an &lt;code&gt;Obsolete&lt;/code&gt; attribute.
I find this useful as if I've got to the point where I mark some code as obsolete, I don't really care about its coverage anymore.
The key part here though is &lt;strong&gt;I&lt;/strong&gt; mark the code as obsolete which is what brings me back to ref-structs.&lt;/p&gt;
&lt;p&gt;When I was writing a particular tokenizer, very similar to the one above, I had the &lt;code&gt;Obsolete&lt;/code&gt; attribute excluded from my code coverage reports.
What I found though was my entire tokenizer was being excluded too which didn't make sense to me.
Clearly my tokenizer isn't compiler generated, it wasn't using auto-props and wasn't in the wrong namespace or assembly - something else was tripping up the report.&lt;/p&gt;
&lt;p&gt;I couldn't understand why but I had the thought of decompiling the assembly with &lt;a href="https://www.jetbrains.com/decompiler/"&gt;JetBrains' dotPeek&lt;/a&gt;.
I've used this tool a bunch before for debugging weird compilation issues and thought it would give me some insight - it didn't disappoint.&lt;/p&gt;
&lt;p&gt;Here is what my tokenizer looked like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;[IsByRefLike]
[Obsolete("Types with embedded references are not supported in this version of your compiler.", true)]
[CompilerFeatureRequired("RefStructs")]
internal ref struct CustomReader
{
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I definitely didn't put that &lt;code&gt;Obsolete&lt;/code&gt; attribute there so what is going on?&lt;/p&gt;
&lt;h2 id="why-are-ref-structs-obsolete"&gt;Why are ref-structs [Obsolete]?&lt;/h2&gt;
&lt;p&gt;The message in the &lt;code&gt;Obsolete&lt;/code&gt; attribute probably gives it away - it is a measure for older compilers that don't know the rules around handling ref-structs.&lt;/p&gt;
&lt;p&gt;Microsoft does &lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-7.2/span-safety#metadata-representation-of-ref-like-structs"&gt;explain why in the docs&lt;/a&gt; if you know where to look:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Having no other good alternatives that work in old compilers without servicing, an &lt;code&gt;Obsolete&lt;/code&gt; attribute with a known string will be added to all ref-like structs.
Compilers that know how to use ref-like types will ignore this particular form of &lt;code&gt;Obsolete&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;NOTE: it is not the goal to make it so that any use of ref-like types on old compilers fails 100%.
That is hard to achieve and is not strictly necessary. For example there would always be a way to get around the &lt;code&gt;Obsolete&lt;/code&gt; using dynamic code or, for example, creating an array of ref-like types through reflection.&lt;/p&gt;
&lt;p&gt;In particular, if user wants to actually put an &lt;code&gt;Obsolete&lt;/code&gt; or &lt;code&gt;Deprecated&lt;/code&gt; attribute on a ref-like type, we will have no choice other than not emitting the predefined one since &lt;code&gt;Obsolete&lt;/code&gt; attribute cannot be applied more than once.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While this was definitely a surprise to find out, I also think it is actually pretty clever.
Being able to use something that older compilers would already know about to have them fail when using something they don't understand is smart.
It is also good that the docs call out the scenario where you want to put your own &lt;code&gt;Obsolete&lt;/code&gt; attribute on the type - that would have been another confusing issue if you couldn't add your own &lt;code&gt;Obsolete&lt;/code&gt; attribute.&lt;/p&gt;
&lt;p&gt;Unfortunately even knowing this information, I'm relatively limited in what I can do for my code coverage problem - if I want a ref-struct in my code coverage reports, I can't exclude &lt;code&gt;Obsolete&lt;/code&gt; attributes anymore.
I've got &lt;a href="https://github.com/coverlet-coverage/coverlet/issues/1204"&gt;an open issue for Coverlet&lt;/a&gt; for trying to work around it that way, perhaps looking at the specific message on the attribute.&lt;/p&gt;
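&lt;p&gt;A rough sketch of what a message-based check could look like, using plain reflection. To be clear, &lt;code&gt;ObsoleteCheck&lt;/code&gt; and the two sample types are hypothetical names for illustration - this is not Coverlet's code and I'm assuming the compiler emits the exact message shown in the decompiled output above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Reflection;

// Expected to differ: the first Obsolete comes from the compiler, the second is hand-written.
// (Referencing OldApi here raises an obsolete-usage warning, which is fine for this sketch.)
var refStructResult = ObsoleteCheck.IsCompilerEmitted(typeof(Sample));
var oldApiResult = ObsoleteCheck.IsCompilerEmitted(typeof(OldApi));
Console.WriteLine($"Sample: {refStructResult}, OldApi: {oldApiResult}");

static class ObsoleteCheck
{
    // The known string from the decompiled ref-struct (an assumption that it stays stable).
    private const string RefStructMessage =
        "Types with embedded references are not supported in this version of your compiler.";

    // True only when the type has an Obsolete attribute carrying the ref-struct message.
    public static bool IsCompilerEmitted(Type type)
    {
        var attribute = type.GetCustomAttribute&amp;lt;ObsoleteAttribute&amp;gt;();
        return attribute?.Message == RefStructMessage;
    }
}

ref struct Sample { }

[Obsolete("Use something newer instead.")]
class OldApi { }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A coverage tool could use a check along these lines to skip only the &lt;code&gt;Obsolete&lt;/code&gt; attributes I wrote myself.&lt;/p&gt;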
&lt;p&gt;Alternatively, I can just not have my tokenizer as a ref-struct and instead of holding a &lt;code&gt;ReadOnlySpan&lt;/code&gt;, it holds a &lt;code&gt;ReadOnlyMemory&lt;/code&gt;, which doesn't have the same restrictions.
This is the approach I've taken with &lt;a href="https://github.com/TurnerSoftware/RobotsExclusionTools/blob/afb5661e1efe08d373e9f93c27c6611e80debc91/src/TurnerSoftware.RobotsExclusionTools/Tokenization/RobotsFileTokenReader.cs"&gt;a recent tokenizer I wrote for my Robots Exclusion Tools library&lt;/a&gt;.&lt;/p&gt;
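&lt;p&gt;The general shape of that alternative is something like the sketch below. &lt;code&gt;MemoryBackedReader&lt;/code&gt; and &lt;code&gt;ReadWord&lt;/code&gt; are invented for this example and are much simpler than the real tokenizer linked above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Text;

var reader = new MemoryBackedReader(Encoding.UTF8.GetBytes("Hello World"));
var firstWord = Encoding.UTF8.GetString(reader.ReadWord().Span);
Console.WriteLine(firstWord);

// A normal class: ReadOnlyMemory&amp;lt;byte&amp;gt; can live on the heap, unlike ReadOnlySpan&amp;lt;byte&amp;gt;.
class MemoryBackedReader
{
    private readonly ReadOnlyMemory&amp;lt;byte&amp;gt; Data;
    private int Offset;

    public MemoryBackedReader(ReadOnlyMemory&amp;lt;byte&amp;gt; data)
    {
        Data = data;
    }

    // Returns the bytes up to the next space (or the end of the data).
    public ReadOnlyMemory&amp;lt;byte&amp;gt; ReadWord()
    {
        // The span only exists for the duration of this call.
        var remaining = Data.Span[Offset..];
        var length = remaining.IndexOf((byte)' ');
        if (length == -1)
        {
            length = remaining.Length;
        }
        var word = Data.Slice(Offset, length);
        Offset += length;
        return word;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the type is no longer a ref-struct, the compiler doesn't add its own &lt;code&gt;Obsolete&lt;/code&gt; attribute and the coverage exclusion rule leaves it alone.&lt;/p&gt;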
&lt;p&gt;Anyway, if you find yourself scratching your head as to why your custom ref-struct isn't in a code coverage report, it might just be because ref-structs are technically obsolete.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This post is part of the &lt;a href="https://csadvent.christmas/"&gt;2022 C# advent calendar&lt;/a&gt;.
Check out the other articles posted there as part of the event.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>The pain points of C# source generators: February 2022 Update</title>
			<link>https://turnerj.com/blog/csharp-source-generator-pain-points-february-2022-update</link>
			<description>Ten months on from my original post, a quick update on things.</description>
			<enclosure url="https://turnerj.com/blog/images/social/csharp-source-generator-pain-points-february-2022-update.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/csharp-source-generator-pain-points-february-2022-update</guid>
			<pubDate>Mon, 21 Feb 2022 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;This is an update to my previous post about &lt;a href="https://turnerj.com/blog/the-pain-points-of-csharp-source-generators"&gt;the pain points of C# source generators&lt;/a&gt;.
Since writing about it in April 2021, there has been a bit of progress.&lt;/p&gt;
&lt;h2 id="debugging-source-generators"&gt;Debugging Source Generators&lt;/h2&gt;
&lt;p&gt;This has gotten &lt;strong&gt;significantly&lt;/strong&gt; easier in Visual Studio (&lt;a href="https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes-v16.10#NETProductivity"&gt;specifically from v16.10&lt;/a&gt;), at least in my use cases.
Previously I said that I wanted to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Put a breakpoint in the source generator code&lt;/li&gt;
&lt;li&gt;Press the "Debug" button in Visual Studio&lt;/li&gt;
&lt;li&gt;Code stops at the breakpoint&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thanks to some updates in Visual Studio, you can do this!
To start, add the following to your project file for the source generator:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;IsRoslynComponent&amp;gt;true&amp;lt;/IsRoslynComponent&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this, you need to enable the "Roslyn Component" option in the project properties and select the appropriate target.
For me with Schema.NET, it effectively generated the following "launchSettings.json" file in the "Properties" folder:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-json"&gt;{
  "profiles": {
    "Schema.NET.Tool": {
      "commandName": "DebugRoslynComponent",
      "targetProject": "..\\..\\Source\\Schema.NET\\Schema.NET.csproj"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, set the source generator project as the startup project.
Start the debugger and watch any breakpoints you have configured get hit.
I've had this successfully working with &lt;a href="https://github.com/RehanSaeed/Schema.NET"&gt;Schema.NET&lt;/a&gt; and the source generator I wrote for it.&lt;/p&gt;
&lt;p&gt;For me, debugging is practically a solved problem with source generators now.&lt;/p&gt;
&lt;h2 id="transient-dependencies"&gt;Transient Dependencies&lt;/h2&gt;
&lt;p&gt;The short answer is, transient dependencies are basically just as painful for my use case as they were originally.
The issue seems to stem from how analyzers work with the build process - I'm no expert in this area, though it is something I've been looking into a little to see if I could help push this forwards.&lt;/p&gt;
&lt;p&gt;I previously &lt;a href="https://github.com/dotnet/roslyn/issues/52017"&gt;raised this in the Roslyn repo&lt;/a&gt; through the issues I encountered making transient dependencies work for Schema.NET.
Since then, I've &lt;a href="https://github.com/dotnet/sdk/issues/17775"&gt;raised a more direct issue in the SDK repo&lt;/a&gt; for tracking the support for automatic transient dependency packaging.
Progress has mostly stalled on that issue too as it seems there may be a number of blocker issues (due to the analyzer work I mentioned).&lt;/p&gt;
&lt;p&gt;My previous post had the following as a potential automatic transient dependency solution:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;PropertyGroup&amp;gt;
    &amp;lt;GetTargetPathDependsOn&amp;gt;$(GetTargetPathDependsOn);GetDependencyTargetPaths&amp;lt;/GetTargetPathDependsOn&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;

&amp;lt;Target Name="GetDependencyTargetPaths" AfterTargets="ResolvePackageDependenciesForBuild"&amp;gt;
    &amp;lt;ItemGroup&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="@(ResolvedCompileFileDefinitions)" IncludeRuntimeDependency="false" /&amp;gt;
    &amp;lt;/ItemGroup&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This had two issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ResolvedCompileFileDefinitions&lt;/code&gt; wasn't always available&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ResolvedCompileFileDefinitions&lt;/code&gt; contains &lt;em&gt;more&lt;/em&gt; than the specific dependencies we are wanting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On GitHub, &lt;a href="https://github.com/dotnet/sdk/issues/17775#issuecomment-848451355"&gt;@ericstj&lt;/a&gt; mentioned:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;DependsOnTargets="ResolveReferences"&lt;/code&gt; to fix the first problem&lt;/li&gt;
&lt;li&gt;Setting &lt;code&gt;CopyLocalLockFileAssemblies=true&lt;/code&gt; and using &lt;code&gt;ReferenceCopyLocalPaths&lt;/code&gt; item instead to fix the second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another developer on GitHub, &lt;a href="https://github.com/dotnet/sdk/issues/17775#issuecomment-1046225146"&gt;@HavenDV&lt;/a&gt;, has a different variation of my solution that works for NuGet packages:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;!-- 
    https://github.com/dotnet/roslyn/issues/52017#issuecomment-1046216200
    This automatically adds explicit and transient dependencies so that they are available at the time the generator is executed. 
--&amp;gt;
&amp;lt;Target Name="AddGenerationTimeReferences" AfterTargets="ResolvePackageDependenciesForBuild"&amp;gt;
    &amp;lt;ItemGroup&amp;gt;
        &amp;lt;None Include="@(ResolvedCompileFileDefinitions)" Pack="true" PackagePath="analyzers/dotnet/cs" /&amp;gt;
    &amp;lt;/ItemGroup&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are wanting to experiment further with automatic transient dependencies, you can look into how those help the situation.&lt;/p&gt;
&lt;p&gt;So far I haven't seen any updates about whether fixes to transient dependencies will make it in the .NET 7 SDK but one can hope!&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Fixing my BF1942 woes with Win32 APIs</title>
			<link>https://turnerj.com/blog/fixing-bf1942-with-win32</link>
			<description>Fullscreen didn't work and window mode was bugged. Thought I'd try programming a solution - mix things up a little.</description>
			<enclosure url="https://turnerj.com/blog/images/social/fixing-bf1942-with-win32.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/fixing-bf1942-with-win32</guid>
			<pubDate>Fri, 21 Jan 2022 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;Battlefield 1942 was one of a number of great games I played when I was growing up.
I was first introduced to the game through a demo disc on a magazine - it was the expansion "Secret Weapons of WWII".
I spent many hours playing that demo and eventually managed to snag the complete set of the game with both expansions.
It was a game that ran pretty well on old hardware and I have a lot of fun memories playing it.&lt;/p&gt;
&lt;p&gt;Jumping forward &lt;em&gt;several&lt;/em&gt; years, I wanted to give the old game another play.
I knew the graphics wouldn't hold up but to play in its nostalgic sandbox would more than overcome that.
I installed the game and the various patches, modified configuration files to use my monitor's native resolution (1080p) and tried to launch it.
Unfortunately, it wouldn't launch at all.&lt;/p&gt;
&lt;p&gt;Some people online suggest it is an issue with SecuROM as &lt;a href="https://www.rockpapershotgun.com/windows-10-safedisc-securom-drm"&gt;that no longer works with Windows 10&lt;/a&gt; while others suggest it is an issue with DirectPlay.
What ultimately solved it for me was an unofficial patch by a group called &lt;a href="https://team-simple.org/"&gt;Team Simple&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now with the game launching, I start to hear &lt;a href="https://www.youtube.com/watch?v=kaopMpvMZbg"&gt;the classic BF1942 main menu music&lt;/a&gt;.
Sped through the profile setup, jumped to instant battle, picked my favourite level (Hellendoorn) and pressed start.
The music changes to, in my opinion, &lt;a href="https://www.youtube.com/watch?v=IPMnEmkoPFs"&gt;an even more iconic piece when the level is loading&lt;/a&gt;.
The progress bar moves maybe a quarter of the way and then... I'm back on my desktop.&lt;/p&gt;
&lt;p&gt;After going back and forth with settings and the resolution changes I made, I found it just didn't want to work in full screen.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/fixing-bf1942-initial-window-mode.jpg" alt="Battlefield 1942 in Window mode"&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For the sharp-eyed individuals, you might notice that this is actually playing through &lt;a href="https://parsec.app/"&gt;Parsec&lt;/a&gt; - this is part of &lt;a href="https://turnerj.com/blog/remote-desktop-experience-part-1-planning"&gt;my vision for a remote desktop experience&lt;/a&gt;.
It actually plays great (no noticeable input lag) via Parsec, over Wi-Fi, from my desktop to my laptop.
That said, I did try the game directly on that machine and it still crashed so something else was to blame.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While I could play in window mode, I couldn't actually move the window so I could see the whole screen.
Anytime I attempted to drag the window, it would just bring the cursor back in the game.
Tried shortcuts to maximise the window but none of those worked either.&lt;/p&gt;
&lt;p&gt;So what would any good programmer do? &lt;del&gt;Search for an existing solution online.&lt;/del&gt; Write their own program to fix it!&lt;/p&gt;
&lt;p&gt;Seemed like the fun thing to do anyway.&lt;/p&gt;
&lt;h2 id="the-plan"&gt;The Plan&lt;/h2&gt;
&lt;p&gt;This is what I wanted to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get rid of the game's window border as it was just taking space&lt;/li&gt;
&lt;li&gt;Position the window so it is centered on the monitor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second point is important - the menu displays at 800x600 but the game when loading and playing is at whatever resolution I configured in window mode.
My plan was to build a launcher that would bootstrap the main game.&lt;/p&gt;
&lt;p&gt;I messed around with removing borders from applications years back when I wanted to run a console application in the background.
The way I achieved it back then was to invoke Win32 APIs from .NET and I figured that would be a good starting place.
My initial task was to find the APIs I needed to use.&lt;/p&gt;
&lt;p&gt;Fortunately &lt;a href="https://github.com/dtgDTGdtg/SRWE/blob/b439859e15ca44b6c4715fdb015c321a49ef634a/SRWE/Window.cs"&gt;someone already found the APIs to use to remove borders and reposition the window&lt;/a&gt;.
My job then was to work out how to bring that into my application.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;//A snippet of the code that helped me with the Win32 APIs from the Simple Runtime Window Editor (SRWE)
//Source: https://github.com/dtgDTGdtg/SRWE/blob/b439859e15ca44b6c4715fdb015c321a49ef634a/SRWE/Window.cs
public void RemoveBorders()
{
	uint nStyle = (uint)WinAPI.GetWindowLong(m_hWnd, WinAPI.GWL_STYLE);
	nStyle = (nStyle | (WinAPI.WS_THICKFRAME + WinAPI.WS_DLGFRAME + WinAPI.WS_BORDER)) ^ (WinAPI.WS_THICKFRAME + WinAPI.WS_DLGFRAME + WinAPI.WS_BORDER);
	WinAPI.SetWindowLong(m_hWnd, WinAPI.GWL_STYLE, nStyle);

	nStyle = (uint)WinAPI.GetWindowLong(m_hWnd, WinAPI.GWL_EXSTYLE);
	nStyle = (nStyle | (WinAPI.WS_EX_DLGMODALFRAME + WinAPI.WS_EX_WINDOWEDGE + WinAPI.WS_EX_CLIENTEDGE + WinAPI.WS_EX_STATICEDGE)) ^ (WinAPI.WS_EX_DLGMODALFRAME + WinAPI.WS_EX_WINDOWEDGE + WinAPI.WS_EX_CLIENTEDGE + WinAPI.WS_EX_STATICEDGE);
	WinAPI.SetWindowLong(m_hWnd, WinAPI.GWL_EXSTYLE, nStyle);

	uint uFlags = WinAPI.SWP_NOSIZE | WinAPI.SWP_NOMOVE | WinAPI.SWP_NOZORDER | WinAPI.SWP_NOACTIVATE | WinAPI.SWP_NOOWNERZORDER | WinAPI.SWP_NOSENDCHANGING | WinAPI.SWP_FRAMECHANGED;
	WinAPI.SetWindowPos(m_hWnd, 0, 0, 0, 0, 0, uFlags);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I put together bits and pieces from that codebase like the remove border and window positioning code and combined it with additional API calls for monitor information.
I needed the following Win32 APIs to do everything I wanted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getwindowinfo"&gt;GetWindowInfo&lt;/a&gt; (for the current window bounds)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-MonitorFromWindow"&gt;MonitorFromWindow&lt;/a&gt; (getting the monitor handle the window is on)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-GetMonitorInfoW"&gt;GetMonitorInfoW&lt;/a&gt; (getting the monitor bounds)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-SetWindowPos"&gt;SetWindowPos&lt;/a&gt; (setting the window position)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-SendMessageW"&gt;SendMessageW&lt;/a&gt; (to send &lt;code&gt;WM_EXITSIZEMOVE&lt;/code&gt; to the window, a tip from &lt;a href="https://github.com/dtgDTGdtg/SRWE#exitsizemove"&gt;SRWE&lt;/a&gt; - never checked if it was strictly necessary for BF1942 though)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getwindowlongw"&gt;GetWindowLongW&lt;/a&gt; (getting window settings)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-setwindowlongw"&gt;SetWindowLongW&lt;/a&gt; (updating window settings)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After doing a rough integration with pieces from that codebase, I gave it a run and... my application crashed.
I was using &lt;code&gt;process.MainWindowHandle&lt;/code&gt; to get BF1942's game window.
Turns out that it isn't set till, well, there is a main window available.
So I wrote some code to wait for that and bingo - it launched the game and worked!&lt;/p&gt;
&lt;p&gt;Well, it mostly worked - see BF1942 has an interesting quirk where it launches a new process when you end a match and go back to the main menu.
This required me to write logic to track when processes changed while also still allowing it to exit when BF1942 is closed properly.&lt;/p&gt;
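&lt;p&gt;A minimal sketch of that kind of wait, assuming a simple polling loop. &lt;code&gt;GameWindow&lt;/code&gt;, &lt;code&gt;WaitForMainWindow&lt;/code&gt;, the timeout and the 100ms interval are made up for illustration and are not taken from the actual launcher.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Diagnostics;
using System.Threading;

public static class GameWindow
{
	// Hypothetical helper: returns the main window handle once one exists,
	// or IntPtr.Zero if the process exits or the timeout elapses first.
	public static IntPtr WaitForMainWindow(Process process, TimeSpan timeout)
	{
		var stopwatch = Stopwatch.StartNew();
		while (stopwatch.Elapsed &lt; timeout)
		{
			// Process caches its property values, so refresh before each check.
			process.Refresh();
			if (process.HasExited)
			{
				return IntPtr.Zero;
			}

			if (process.MainWindowHandle != IntPtr.Zero)
			{
				return process.MainWindowHandle;
			}

			// Poll interval is an arbitrary choice for this sketch.
			Thread.Sleep(100);
		}

		return IntPtr.Zero;
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Returning &lt;code&gt;IntPtr.Zero&lt;/code&gt; for "no window" keeps the caller's check simple. The same call could also be repeated against a replacement process when the game relaunches itself.&lt;/p&gt;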
&lt;h2 id="improving-the-win32-apis"&gt;Improving the Win32 APIs&lt;/h2&gt;
&lt;p&gt;While my prototype worked, cobbled together from bits of SRWE and my own bits, I wasn't entirely happy with how I integrated the Win32 APIs.
Below is the snippet of code I had that takes a window handle, gets the window's size, the monitor's size, calculates the position and finally sets it.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;unsafe static void UpdateWindowPosition(int handle)
{
	var info = new WINDOWINFO();
	var success = WinAPI.GetWindowInfo(handle, ref info);
	if (success)
	{
		var windowDimensions = info.rcWindow;
		var monitorHandle = WinAPI.MonitorFromWindow(handle, 0);
		var monitorInfo = new LPMONITORINFO
		{
			cbSize = (uint)sizeof(LPMONITORINFO)
		};
		WinAPI.GetMonitorInfoA(monitorHandle, ref monitorInfo);
		var monitorDimensions = monitorInfo.rcMonitor;
		var x = monitorDimensions.Width / 2 - windowDimensions.Width / 2;
		var y = monitorDimensions.Height / 2 - windowDimensions.Height / 2;
		SetPosition(handle, x, y);
	}
}

static void SetPosition(int handle, int x, int y)
{
	uint uFlags = WinAPI.SWP_NOSIZE | WinAPI.SWP_NOZORDER | WinAPI.SWP_NOACTIVATE | WinAPI.SWP_NOOWNERZORDER | WinAPI.SWP_NOSENDCHANGING | WinAPI.SWP_FRAMECHANGED;
	WinAPI.SetWindowPos(handle, WinAPI.HWND_TOPMOST, x, y, 0, 0, uFlags);
	WinAPI.SendMessage(handle, WinAPI.WM_EXITSIZEMOVE, 0, 0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What I would prefer is to actually have it feel more like a typical .NET API, something more like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;static void UpdateWindowPosition(Window window)
{
	var monitorBounds = window.GetCurrentMonitor().GetBounds();
	var windowBounds = window.GetBounds();
	var x = monitorBounds.Width / 2 - windowBounds.Width / 2;
	var y = monitorBounds.Height / 2 - windowBounds.Height / 2;
	window.SetPosition(x, y);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All I'm doing is abstracting away the Win32 APIs but it makes my "business logic" here far cleaner.
While doing this, I also decided to remove the pieces of SRWE and replace it with a more maintainable interface to the APIs.&lt;/p&gt;
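&lt;p&gt;The centering arithmetic itself can be checked without any Win32 calls. The sketch below uses a hypothetical &lt;code&gt;Bounds&lt;/code&gt; type and &lt;code&gt;CenterWithin&lt;/code&gt; helper (neither is from the project). It also adds the monitor's own &lt;code&gt;Left&lt;/code&gt;/&lt;code&gt;Top&lt;/code&gt; offset, which is an assumption on top of the snippets above and only matters when the monitor doesn't start at (0, 0).&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

var monitor = new Bounds(Left: 0, Top: 0, Width: 1920, Height: 1080);
var menu = new Bounds(Left: 0, Top: 0, Width: 800, Height: 600);

var (x, y) = Centering.CenterWithin(monitor, menu);
Console.WriteLine($"{x}, {y}"); // Prints: 560, 240

// Hypothetical stand-in for the bounds returned by the window and monitor wrappers.
public readonly record struct Bounds(int Left, int Top, int Width, int Height);

public static class Centering
{
	// Returns the top-left position that centers the window on the monitor.
	public static (int X, int Y) CenterWithin(Bounds monitor, Bounds window)
	{
		var x = monitor.Left + monitor.Width / 2 - window.Width / 2;
		var y = monitor.Top + monitor.Height / 2 - window.Height / 2;
		return (x, y);
	}
}
&lt;/code&gt;&lt;/pre&gt;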
&lt;p&gt;I tried out both &lt;a href="https://github.com/terrafx/terrafx.interop.windows"&gt;TerraFX.Interop.Windows&lt;/a&gt; and &lt;a href="https://github.com/microsoft/CsWin32"&gt;CsWin32&lt;/a&gt;, ultimately settling on the latter.
CsWin32 was a little less intimidating as the API is generated based on strings in a text file rather than containing everything at once.
Also, I like jumping to the definition of types to read more and explore APIs, and doing that to one of the types in the TerraFX library crashed Visual Studio.
That's more of a VS problem than a TerraFX problem but still - CsWin32 would work great for what I'm doing.&lt;/p&gt;
&lt;p&gt;The way I went about achieving my desired interface to the Win32 APIs was by creating record-struct wrappers around the various native handles and having instance methods wrap the API calls themselves.
For example, below is my &lt;code&gt;Window&lt;/code&gt; type that I have most of my functionality hanging off of.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public readonly record struct Window(nint Handle)
{
	private HWND Win32Handle =&amp;gt; new(Handle);

	public Monitor GetCurrentMonitor()
	{
		nint handle = PInvoke.MonitorFromWindow(Win32Handle, 0);
		return new(handle);
	}

	public void SetPosition(int x, int y)
	{
		var flags = SET_WINDOW_POS_FLAGS.SWP_NOSIZE | SET_WINDOW_POS_FLAGS.SWP_NOZORDER | SET_WINDOW_POS_FLAGS.SWP_NOACTIVATE |
			SET_WINDOW_POS_FLAGS.SWP_NOOWNERZORDER | SET_WINDOW_POS_FLAGS.SWP_NOSENDCHANGING | SET_WINDOW_POS_FLAGS.SWP_FRAMECHANGED;
		PInvoke.SetWindowPos(Win32Handle, PInvoke.HWND_TOPMOST, x, y, 0, 0, flags);
		PInvoke.SendMessage(Win32Handle, PInvoke.WM_EXITSIZEMOVE, default, default);
	}

	public Rectangle GetBounds()
	{
		var windowInfo = new WINDOWINFO();
		PInvoke.GetWindowInfo(Win32Handle, ref windowInfo);
		return Rectangle.From(windowInfo.rcWindow);
	}

	public void RemoveBorders()
	{
		var style = PInvoke.GetWindowLong(Win32Handle, WINDOW_LONG_PTR_INDEX.GWL_STYLE);
		style &amp;amp;= ~(int)(WINDOW_STYLE.WS_THICKFRAME | WINDOW_STYLE.WS_DLGFRAME | WINDOW_STYLE.WS_BORDER);
		_ = PInvoke.SetWindowLong(Win32Handle, WINDOW_LONG_PTR_INDEX.GWL_STYLE, style);

		style = PInvoke.GetWindowLong(Win32Handle, WINDOW_LONG_PTR_INDEX.GWL_EXSTYLE);
		style &amp;amp;= ~(int)(WINDOW_EX_STYLE.WS_EX_DLGMODALFRAME | WINDOW_EX_STYLE.WS_EX_WINDOWEDGE | WINDOW_EX_STYLE.WS_EX_CLIENTEDGE | WINDOW_EX_STYLE.WS_EX_STATICEDGE);
		_ = PInvoke.SetWindowLong(Win32Handle, WINDOW_LONG_PTR_INDEX.GWL_EXSTYLE, style);
		PInvoke.SendMessage(Win32Handle, PInvoke.WM_EXITSIZEMOVE, default, default);
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="the-end-result"&gt;The End Result&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/fixing-bf1942-menu-window.jpg" alt="BF1942 in a borderless window for the intro cinematics"&gt;
&lt;img src="https://turnerj.com/images/fixing-bf1942-borderless-fullscreen.jpg" alt="BF1942 in a borderless window at fullscreen"&gt;&lt;/p&gt;
&lt;p&gt;I called my project &lt;em&gt;Borderless 1942&lt;/em&gt; and it &lt;a href="https://github.com/Turnerj/Borderless1942"&gt;is available on GitHub&lt;/a&gt;.
It is a self-contained, single-file .NET 6 application.
Because it is self-contained, you don't need .NET 6 installed to run it.&lt;/p&gt;
&lt;p&gt;I'm quite happy that I got this working and could enjoy the game again.
In terms of the code, the main thing I'd want to change is to move from a constant loop resetting the window position to something that listens on window resize events.
This is possible via the Win32 APIs but has its own complications which I haven't got around to addressing yet.&lt;/p&gt;
&lt;p&gt;I'm also looking at turning the style of wrapper I wrote into a dedicated library.
There seems to be &lt;a href="https://www.reddit.com/r/dotnet/comments/r1pz3x/would_you_wantuse_an_improved_interface_to_native/"&gt;some interest&lt;/a&gt; in improved access to the Win32 APIs.
I don't know how far I'd go with it (what Win32 APIs I'd support) but I think it could make this a lot easier for developers in certain situations.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Fun with Flags, Enums and Bit Shifting</title>
			<link>https://turnerj.com/blog/fun-with-flags-enums-and-bit-shifting</link>
			<description>Vexillology and bit shifting are not talked about together - until now.</description>
			<enclosure url="https://turnerj.com/blog/images/social/fun-with-flags-enums-and-bit-shifting.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/fun-with-flags-enums-and-bit-shifting</guid>
			<pubDate>Thu, 02 Dec 2021 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;While thinking of posts to write about, the title "Fun with Flags" came to mind &lt;a href="https://www.youtube.com/watch?v=Xl12Sp1KiEk"&gt;from a certain TV show&lt;/a&gt; and I wondered how I might connect that to programming.
There are &lt;em&gt;enum flags&lt;/em&gt; in C# via the &lt;a href="https://docs.microsoft.com/en-us/dotnet/api/system.flagsattribute?view=net-5.0#remarks"&gt;&lt;code&gt;Flags&lt;/code&gt; attribute&lt;/a&gt; so maybe that was something I could write about.
That said, I wanted to do something more creative than some humdrum post about using enums even if the end result isn't really practical.&lt;/p&gt;
&lt;p&gt;Instead I decided to make &lt;em&gt;real&lt;/em&gt; flags in C# with enums - turning a number like &lt;code&gt;52357729848&lt;/code&gt; into a flag:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://funwithflags.turnerj.com/api/flag/generate.png?v=52357729848" alt="The German Flag"&gt;&lt;/p&gt;
&lt;p&gt;I gave myself certain requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I needed to be able to generate more than one flag&lt;/li&gt;
&lt;li&gt;I wanted to encode everything about the flag in a single value via enums&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are technical limitations too as I can't store a lot of data in an enum and will need to make compromises.
An enum can be backed by a few different types but I chose &lt;code&gt;long&lt;/code&gt; so I could get a full 64-bits of data to play with.&lt;/p&gt;
&lt;h2 id="encoding-a-value"&gt;Encoding a Value&lt;/h2&gt;
&lt;p&gt;My initial thought with this was to pick the easiest form of flag - simple flags with stripes.
If I split a typical flag into 9 segments, maybe I can store 9 colours and that would allow drawing of horizontal and vertical stripes.
It seemed like the most straightforward approach at the time (I realised later it might have been better if I stored "shape" data instead for more flag variety but &lt;em&gt;oh well&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;The problem is, splitting 64 bits across 9 segments leaves me with about 7.11-bits per segment, making colour data very limited.
I wanted 3 colour channels so that really only gives me 2-bits per channel which is not a lot of variety.
With 6-bits per segment then, I was left with 10-bits that I didn't really have much use for.
Initially I tried using those bits to help extend the range of colours, acting as a multiplier for a specific channel.
In the end though, it wasn't overly useful for this so I cut it.&lt;/p&gt;
&lt;p&gt;This is the data structure I ended with:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;[PPPPPPPPPP]
[RRGGBB][RRGGBB][RRGGBB]
[RRGGBB][RRGGBB][RRGGBB]
[RRGGBB][RRGGBB][RRGGBB]

P = Padding
R = Red Intensity
G = Green Intensity
B = Blue Intensity
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My primary data only takes 54-bits so my data structure has 10-bits of padding at the front.
Having the padding at the front allows the generated number to be smaller.&lt;/p&gt;
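&lt;p&gt;As a quick check of that bit budget, here is a small sketch. The constant names are made up for illustration and aren't from the project.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// Illustrative constants only - these names are assumptions, not part of the original code.
const int Segments = 9;
const int ChannelsPerSegment = 3;
const int BitsPerChannel = 2;

const int BitsPerSegment = ChannelsPerSegment * BitsPerChannel;
const int DataBits = Segments * BitsPerSegment;
const int PaddingBits = 64 - DataBits;

Console.WriteLine($"{BitsPerSegment} bits per segment, {DataBits} data bits, {PaddingBits} padding bits");
// Prints: 6 bits per segment, 54 data bits, 10 padding bits

// With the padding in the highest bits, every flag value stays below 2^54.
Console.WriteLine((1L &lt;&lt; DataBits) - 1);
// Prints: 18014398509481983
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the top 10 bits are always zero, the sign bit is never set, so an encoded flag is never a negative number.&lt;/p&gt;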
&lt;p&gt;I will be using &lt;a href="https://github.com/SixLabors/ImageSharp/"&gt;ImageSharp&lt;/a&gt; for converting this value into an actual image.
Because I have 9 segments, it seemed like the best idea to treat the image as a 3x3 pixel square and get ImageSharp to resize it for me.
The pixel format for the data though was RGB24 so I needed to work out how to scale up my colours from 2-bits to 8-bits per channel.&lt;/p&gt;
&lt;p&gt;With 8-bits, the max value I can have is 255 for a single channel.
Full colour intensity for any channel is 3 so I decided to simply divide the max value by the full colour intensity leaving me the magic number of 85 to scale my values by.&lt;/p&gt;
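&lt;p&gt;To make the scaling concrete, this short sketch lists what each 2-bit intensity becomes as an 8-bit channel value (&lt;code&gt;Scale&lt;/code&gt; is just an illustrative name):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// 255 / 3 = 85, so each 2-bit intensity step is worth 85 in an 8-bit channel.
const int Scale = 255 / 3;

for (var intensity = 0; intensity &lt;= 3; intensity++)
{
	var channel = (byte)(intensity * Scale);
	Console.WriteLine($"{intensity} -&gt; {channel}");
}
// Prints:
// 0 -&gt; 0
// 1 -&gt; 85
// 2 -&gt; 170
// 3 -&gt; 255
&lt;/code&gt;&lt;/pre&gt;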
&lt;p&gt;That is the nuts-and-bolts of the format, now it was just to make that work in code.&lt;/p&gt;
&lt;h2 id="a-bit-shifty"&gt;A Bit Shifty&lt;/h2&gt;
&lt;p&gt;Knowing I can store the data is one thing, actually making it work was another.
I don't often use &lt;a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/bitwise-and-shift-operators"&gt;bitwise and shift operators&lt;/a&gt; but for this, I was going to use them quite heavily.&lt;/p&gt;
&lt;p&gt;Firstly, we need an enum of colour intensity values:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public enum Intensity : byte
{
	None = 0,
	OneThird = 1,
	TwoThirds = 2,
	Max = 3
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The specific values here are important because of what they represent in binary.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;00000000 // Intensity.None
00000001 // Intensity.OneThird
00000010 // Intensity.TwoThirds
00000011 // Intensity.Max
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because these values represent a single channel's intensity, we need to combine 3 of them together to form our full colour.
We combine them by using bit shifting and bitwise OR operations to create our 6-bit colour value.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public enum Colour : long
{
	Black = 0,
	Red = Intensity.Max &amp;lt;&amp;lt; 4,
	Green = Intensity.Max &amp;lt;&amp;lt; 2,
	Blue = Intensity.Max,
	White = Red | Green | Blue
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While we are using a &lt;code&gt;long&lt;/code&gt; here (helping us with our later bit shifting operations), the values we are setting fit within 6-bits.
Viewing the colours as bytes in binary, the shifting and OR-ing of data would look a little like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;00110000 // Red = Intensity.Max &amp;lt;&amp;lt; 4
00001100 // Green = Intensity.Max &amp;lt;&amp;lt; 2
00000011 // Blue = Intensity.Max
00111111 // White = Red | Green | Blue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using different intensity values for the different colour channels, we can create new colours too.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;00110000 // Red = Intensity.Max &amp;lt;&amp;lt; 4
00001000 // Green = Intensity.TwoThirds &amp;lt;&amp;lt; 2
00000000 // Blue = Intensity.None
========
00111000 // Yellow = (Intensity.Max &amp;lt;&amp;lt; 4) | (Intensity.TwoThirds &amp;lt;&amp;lt; 2)
&lt;/code&gt;&lt;/pre&gt;
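&lt;p&gt;The same shifting and OR-ing can be tried with plain numbers. This is only a sketch: &lt;code&gt;Pack&lt;/code&gt; is a hypothetical helper that takes the three 2-bit intensities as integers rather than the enums used in this post.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// Hypothetical helper: packs three 2-bit intensities (0-3) into one 6-bit colour.
static long Pack(int red, int green, int blue) =&gt;
	((long)red &lt;&lt; 4) | ((long)green &lt;&lt; 2) | (long)blue;

var yellow = Pack(red: 3, green: 2, blue: 0);

// Show the 6-bit pattern and its decimal value.
Console.WriteLine(Convert.ToString(yellow, 2).PadLeft(6, '0')); // Prints: 111000
Console.WriteLine(yellow); // Prints: 56
&lt;/code&gt;&lt;/pre&gt;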
&lt;p&gt;To create a few different types of flags, we will need a few more colours...&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public enum Colour : long
{
	Black = 0,
	Red = Intensity.Max &amp;lt;&amp;lt; 4,
	Green = Intensity.Max &amp;lt;&amp;lt; 2,
	Blue = Intensity.Max,
	White = Red | Green | Blue,
	Orange = (Intensity.Max &amp;lt;&amp;lt; 4) |
		(Intensity.OneThird &amp;lt;&amp;lt; 2),
	Yellow = (Intensity.Max &amp;lt;&amp;lt; 4) |
		(Intensity.TwoThirds &amp;lt;&amp;lt; 2),
	MediumGreen = Intensity.TwoThirds &amp;lt;&amp;lt; 2,
	LightBlue = (Intensity.TwoThirds &amp;lt;&amp;lt; 2) |
		Intensity.Max,
	DarkBlue = Intensity.OneThird
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Identifying the right combinations of values for colours was relatively straightforward - I used an RGB colour picker in &lt;a href="https://www.getpaint.net/"&gt;Paint.NET&lt;/a&gt; and selected thirds of the different colour channels.
Like if I had two thirds red and one third green, I'd approximately have orange.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/fun-with-flags-colour-picker.png" alt="Colour picker showing 2 thirds red and 1 third green"&gt;&lt;/p&gt;
&lt;p&gt;So now we've got our colours, we need to encode the final value of a flag.
In a similar approach to combining the colour channels, we need to combine the colours of the 9 segments by shifting and OR-ing.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public enum CountryFlags : long
{
	Germany = Colour.Black &amp;lt;&amp;lt; 48 | Colour.Black &amp;lt;&amp;lt; 42 | Colour.Black &amp;lt;&amp;lt; 36 |
		Colour.Red &amp;lt;&amp;lt; 30 | Colour.Red &amp;lt;&amp;lt; 24 | Colour.Red &amp;lt;&amp;lt; 18 |
		Colour.Yellow &amp;lt;&amp;lt; 12 | Colour.Yellow &amp;lt;&amp;lt; 6 | Colour.Yellow
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While &lt;code&gt;Colour.Black&lt;/code&gt; does encode as &lt;code&gt;0&lt;/code&gt; so the first 3 values aren't actually needed, it made it easier to still think of it as 9 distinct segments that all needed colours set.&lt;/p&gt;
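&lt;p&gt;Thinking of it as 9 segments also means the value can be built in a loop: shift what you have left by 6-bits, then OR in the next segment. A rough sketch using plain &lt;code&gt;long&lt;/code&gt; constants in place of the enums (&lt;code&gt;Encode&lt;/code&gt; is a hypothetical helper, not part of the project):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// 6-bit colour values matching Black, Red and Yellow from the Colour enum.
const long Black = 0b000000;
const long Red = 0b110000;
const long Yellow = 0b111000;

// Hypothetical helper: segments are given top-left to bottom-right.
static long Encode(params long[] segments)
{
	if (segments.Length != 9)
	{
		throw new ArgumentException("Expected exactly 9 segments.", nameof(segments));
	}

	long value = 0;
	foreach (var segment in segments)
	{
		// Make room for the next segment, keeping only its lowest 6 bits.
		value = (value &lt;&lt; 6) | (segment &amp; 0b111111);
	}
	return value;
}

var germany = Encode(Black, Black, Black, Red, Red, Red, Yellow, Yellow, Yellow);
Console.WriteLine(germany); // Prints: 52357729848
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After all 9 iterations the first segment has been shifted left by 48-bits and the last not at all, matching the explicit shifts in the enum.&lt;/p&gt;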
&lt;p&gt;In binary, the operation to encode our German flag would look like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;        00000000 // Black, shifted by 48-bits
              00000000
                    00000000
                          00110000 // Red, shifted by 30-bits
                                00110000
                                      00110000
                                            00111000 // Yellow, shifted by 12-bits
                                                  00111000
                                                        00111000
================================================================
0000000000000000000000000000110000110000110000111000111000111000
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a decimal, that would be &lt;code&gt;52357729848&lt;/code&gt;.
This is only half the job though, we have our flag data as a number but we also need to decode it to an image.&lt;/p&gt;
&lt;h2 id="generating-an-image"&gt;Generating an Image&lt;/h2&gt;
&lt;p&gt;So how do we take &lt;code&gt;52357729848&lt;/code&gt; and turn it into an image?
We use more bit shifting and now AND-ing of our data to get each individual colour.
Also, we will be reading the data in reverse.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var blueComponent = (byte)(flagData &amp;amp; 3) * 85;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The value &lt;code&gt;flagData&lt;/code&gt; here is a &lt;code&gt;long&lt;/code&gt; of our generated number.&lt;/p&gt;
&lt;p&gt;To get the blue component, we don't need to shift but we do need to perform a bitwise AND of the data.
We only want the last two bits of the number - if we just convert the number to a byte, we will get the last 8-bits.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;0000000000000000000000000000110000110000110000111000111000111000
                                    // We only want this part ^^
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By doing &lt;code&gt;flagData &amp;amp; 3&lt;/code&gt;, we get just the last 2-bits from the full value.
To get the next components, we do the same but now on a bit shifted value so the last 2-bits are of the colour we want.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var greenComponent = (byte)((flagData &amp;gt;&amp;gt; 2) &amp;amp; 3) * 85;
var redComponent = (byte)((flagData &amp;gt;&amp;gt; 4) &amp;amp; 3) * 85;
&lt;/code&gt;&lt;/pre&gt;
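&lt;p&gt;Those three extractions can be wrapped into one helper to try them against the German flag value. This is a sketch only: &lt;code&gt;DecodeSegment&lt;/code&gt; and its &lt;code&gt;segmentFromEnd&lt;/code&gt; parameter are hypothetical names, with segment 0 being the last 6-bits of the number.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// Hypothetical helper: reads one 6-bit segment, counting from the end of the value.
static (byte R, byte G, byte B) DecodeSegment(long flagData, int segmentFromEnd)
{
	// Move the wanted segment into the lowest 6 bits.
	var segment = flagData &gt;&gt; (segmentFromEnd * 6);

	// Mask out each 2-bit channel and scale it up to 8 bits.
	var r = (byte)(((segment &gt;&gt; 4) &amp; 3) * 85);
	var g = (byte)(((segment &gt;&gt; 2) &amp; 3) * 85);
	var b = (byte)((segment &amp; 3) * 85);
	return (r, g, b);
}

const long Germany = 52357729848;
Console.WriteLine(DecodeSegment(Germany, 0)); // Prints: (255, 170, 0)
Console.WriteLine(DecodeSegment(Germany, 3)); // Prints: (255, 0, 0)
Console.WriteLine(DecodeSegment(Germany, 8)); // Prints: (0, 0, 0)
&lt;/code&gt;&lt;/pre&gt;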
&lt;p&gt;Now we have the 3 colour channels of the bottom right segment of our flag.
As a reminder, the 85x multiplier is to adjust the colour to fit within a full 8-bits for the RGB24 pixel format.
Really now, it is just a matter of wrapping the code within some loops to set it to an image.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using var image = new Image&amp;lt;Rgb24&amp;gt;(3, 3);
for (var y = 2; y &amp;gt;= 0; --y)
{
    for (var x = 2; x &amp;gt;= 0; --x)
    {
        var pixel = image[x, y];
        var blueComponent = (byte)((flagData &amp;gt;&amp;gt; 0) &amp;amp; 3) * 85;
        pixel.B = (byte)blueComponent;
        var greenComponent = (byte)((flagData &amp;gt;&amp;gt; 2) &amp;amp; 3) * 85;
        pixel.G = (byte)greenComponent;
        var redComponent = (byte)((flagData &amp;gt;&amp;gt; 4) &amp;amp; 3) * 85;
        pixel.R = (byte)redComponent;
        flagData &amp;gt;&amp;gt;= 6;
        image[x, y] = pixel;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In our inner-most loop, we also shift our bits in &lt;code&gt;flagData&lt;/code&gt; over 6-bits so we are in the next segment for the next iteration.
This code though would only leave us with a 3x3 flag which doesn't look right so with a little more code, we can make it bigger and more flag-like.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;image.Mutate(x =&amp;gt; x.Resize(400, 240, new NearestNeighborResampler()));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;NearestNeighborResampler&lt;/code&gt; here is important - it allows us to scale up our specific "blocky" image here without distorting or blurring it.&lt;/p&gt;
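&lt;p&gt;To see why nearest neighbour keeps the blocks sharp, here is a simplified stand-alone sketch of the idea. It is not how ImageSharp implements its resampler. &lt;code&gt;UpscaleNearest&lt;/code&gt; is a hypothetical helper that copies each output pixel from the single source pixel it maps back to, so no colours are ever blended.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

// Hypothetical helper: every output pixel is copied from exactly one source pixel.
static T[,] UpscaleNearest&lt;T&gt;(T[,] source, int width, int height)
{
	var sourceHeight = source.GetLength(0);
	var sourceWidth = source.GetLength(1);
	var result = new T[height, width];
	for (var y = 0; y &lt; height; y++)
	{
		for (var x = 0; x &lt; width; x++)
		{
			// Integer division maps the output coordinate back to a source pixel.
			result[y, x] = source[y * sourceHeight / height, x * sourceWidth / width];
		}
	}
	return result;
}

// K = black, R = red, Y = yellow - standing in for the 3x3 German flag.
var stripes = new[,]
{
	{ 'K', 'K', 'K' },
	{ 'R', 'R', 'R' },
	{ 'Y', 'Y', 'Y' }
};

var big = UpscaleNearest(stripes, 400, 240);

// Rows 0-79 come from the first stripe, 80-159 the second and 160-239 the third.
Console.WriteLine(string.Join(" ", big[0, 0], big[79, 399], big[80, 0], big[159, 200], big[160, 0], big[239, 399]));
// Prints: K K R R Y Y
&lt;/code&gt;&lt;/pre&gt;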
&lt;p&gt;And that's basically it - we can take a bunch of enums and encode a value then take the value and decode it to an image.
I've set up an Azure Function running this code to show it working, along with a few flags I've generated:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://funwithflags.turnerj.com/api/flag/generate.png?v=ENCODED_VALUE&lt;/code&gt;&lt;/p&gt;
&lt;h3 id="germany"&gt;Germany&lt;/h3&gt;
&lt;p&gt;Value: &lt;a href="https://funwithflags.turnerj.com/api/flag/generate.png?v=52357729848"&gt;52357729848&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;Germany = Colour.Black &amp;lt;&amp;lt; 48 | Colour.Black &amp;lt;&amp;lt; 42 | Colour.Black &amp;lt;&amp;lt; 36 |
	Colour.Red &amp;lt;&amp;lt; 30 | Colour.Red &amp;lt;&amp;lt; 24 | Colour.Red &amp;lt;&amp;lt; 18 |
	Colour.Yellow &amp;lt;&amp;lt; 12 | Colour.Yellow &amp;lt;&amp;lt; 6 | Colour.Yellow,
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="italy"&gt;Italy&lt;/h3&gt;
&lt;p&gt;Value: &lt;a href="https://funwithflags.turnerj.com/api/flag/generate.png?v=2532184938287088"&gt;2532184938287088&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;Italy = Colour.MediumGreen &amp;lt;&amp;lt; 48 | Colour.White &amp;lt;&amp;lt; 42 | Colour.Red &amp;lt;&amp;lt; 36 |
	Colour.MediumGreen &amp;lt;&amp;lt; 30 | Colour.White &amp;lt;&amp;lt; 24 | Colour.Red &amp;lt;&amp;lt; 18 |
	Colour.MediumGreen &amp;lt;&amp;lt; 12 | Colour.White &amp;lt;&amp;lt; 6 | Colour.Red,
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="france"&gt;France&lt;/h3&gt;
&lt;p&gt;Value: &lt;a href="https://funwithflags.turnerj.com/api/flag/generate.png?v=561852585091056"&gt;561852585091056&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;France = Colour.DarkBlue &amp;lt;&amp;lt; 48 | Colour.White &amp;lt;&amp;lt; 42 | Colour.Red &amp;lt;&amp;lt; 36 |
	Colour.DarkBlue &amp;lt;&amp;lt; 30 | Colour.White &amp;lt;&amp;lt; 24 | Colour.Red &amp;lt;&amp;lt; 18 |
	Colour.DarkBlue &amp;lt;&amp;lt; 12 | Colour.White &amp;lt;&amp;lt; 6 | Colour.Red,
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="ireland"&gt;Ireland&lt;/h3&gt;
&lt;p&gt;Value: &lt;a href="https://funwithflags.turnerj.com/api/flag/generate.png?v=2532459817242612"&gt;2532459817242612&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;Ireland = Colour.MediumGreen &amp;lt;&amp;lt; 48 | Colour.White &amp;lt;&amp;lt; 42 | Colour.Orange &amp;lt;&amp;lt; 36 |
	Colour.MediumGreen &amp;lt;&amp;lt; 30 | Colour.White &amp;lt;&amp;lt; 24 | Colour.Orange &amp;lt;&amp;lt; 18 |
	Colour.MediumGreen &amp;lt;&amp;lt; 12 | Colour.White &amp;lt;&amp;lt; 6 | Colour.Orange,
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id="luxembourg"&gt;Luxembourg&lt;/h3&gt;
&lt;p&gt;Value: &lt;a href="https://funwithflags.turnerj.com/api/flag/generate.png?v=13725272368788171"&gt;13725272368788171&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;Luxembourg = Colour.Red &amp;lt;&amp;lt; 48 | Colour.Red &amp;lt;&amp;lt; 42 | Colour.Red &amp;lt;&amp;lt; 36 |
	Colour.White &amp;lt;&amp;lt; 30 | Colour.White &amp;lt;&amp;lt; 24 | Colour.White &amp;lt;&amp;lt; 18 |
	Colour.LightBlue &amp;lt;&amp;lt; 12 | Colour.LightBlue &amp;lt;&amp;lt; 6 | Colour.LightBlue,
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id="final-thoughts"&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;What started out as a bit of a weird challenge I set myself turned into a fun and interesting learning experience.
I rarely mess around with bit shifting and bitwise operations.
While I still probably won't need to very often, I like that I know a lot more about them now.&lt;/p&gt;
&lt;p&gt;If I were to approach this again, I'd probably look at encoding shapes instead of pixels of colours.
That way I could encode a wider variety of flags like the flags of Sweden, Japan and perhaps even South Korea.&lt;/p&gt;
&lt;h3 id="if-you-liked-this"&gt;If you liked this...&lt;/h3&gt;
&lt;p&gt;If you liked the kinda strange and interesting nature of this, you might like my deep dive into Levenshtein Distance and the various optimizations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://turnerj.com/blog/levenshtein-distance-part-1-what-is-it"&gt;Levenshtein Distance (Part 1: What is it?)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://turnerj.com/blog/levenshtein-distance-part-2-gotta-go-fast"&gt;Levenshtein Distance (Part 2: Gotta Go Fast)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://turnerj.com/blog/levenshtein-distance-part-3-optimize-everything"&gt;Levenshtein Distance (Part 3: Optimize Everything!)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://turnerj.com/blog/levenshtein-distance-with-simd"&gt;Levenshtein Distance with SIMD&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>My Ideal Desktop Experience via a Remote Machine</title>
			<link>https://turnerj.com/blog/remote-desktop-experience-part-1-planning</link>
			<description>Planning my desktop experience to run on a remote server.</description>
			<enclosure url="https://turnerj.com/blog/images/social/remote-desktop-experience-part-1-planning.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/remote-desktop-experience-part-1-planning</guid>
			<pubDate>Sun, 31 Oct 2021 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;I've had both a laptop and a desktop for several years.
My desktop was my primary machine for work and play where my laptop was rarely used.
Since &lt;a href="https://turnerj.com/blog/i-left-my-job-today-after-seven-years"&gt;leaving my previous job&lt;/a&gt;, I have been nearly entirely using my laptop to do work, leaving my desktop only for gaming.
While I had a multi-monitor setup, I've found that I don't miss that but I do miss the performance.
The work I do isn't usually too taxing for my laptop but the performance is thermally limited.&lt;/p&gt;
&lt;p&gt;Both machines though are a few years old, both running Skylake processors so neither meets the &lt;a href="https://docs.microsoft.com/en-us/windows-hardware/design/minimum/supported/windows-11-supported-intel-processors"&gt;minimum Windows 11 CPU requirements&lt;/a&gt;.
I know there are still a few years of support for Windows 10 so that isn't a big issue, just something that I'm factoring into my planning.
Anyway, neither machine is a slouch for what I do but thinking for the future and how I want to work, I need to plan the upgrade path for these machines.&lt;/p&gt;
&lt;p&gt;I also have a small HP ProLiant MicroServer Gen8 that I got from working overtime at my old job, acting as a NAS for my home network.
It holds my documents, photos, backup of my Steam library, runs a Plex server, a VM for Pi-hole, a VM for a local build server, a VM for Minecraft and a VM for Factorio.
While I don't run all the VMs at once, it still isn't the most powerful machine even with an upgraded CPU and maxed out at 16 GB of RAM.
In its 4 drive bays, I have one SSD for the host OS and 3 HDDs that split the rest of the data (no RAID).
It has worked hard for many years but I'm starting to outgrow it so I want to plan its upgrade path too.&lt;/p&gt;
&lt;h2 id="current-machine-specs"&gt;Current Machine Specs&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Desktop&lt;/th&gt;
&lt;th&gt;Laptop&lt;/th&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;i5-6600K (4C/4T)&lt;/td&gt;
&lt;td&gt;i7-6700HQ (4C/8T)&lt;/td&gt;
&lt;td&gt;Xeon E3-1230 V2 (4C/8T)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;td&gt;16 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GTX 970 4 GB&lt;/td&gt;
&lt;td&gt;GTX 960M 4 GB&lt;/td&gt;
&lt;td&gt;Matrox MGA-G200eH&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These are the types of things I need to consider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I don't have the budget to upgrade 3 machines - really only the budget to upgrade 1 really well, or maybe 2 pretty well.&lt;/li&gt;
&lt;li&gt;While gaming is primarily on my desktop, I do play some games (FPS, RTS, Turn-based strategy) on my laptop when not at home.&lt;/li&gt;
&lt;li&gt;For the NAS, I'd like more drive bays so I can have SSDs for the VMs and a HDD RAID for the personal files.&lt;/li&gt;
&lt;li&gt;Depending how high-end I want to go, I may be able to re-use the desktop's GPU.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="potential-3-in-1-solution"&gt;Potential 3-in-1 Solution&lt;/h2&gt;
&lt;p&gt;I've chatted with a friend of mine about upgrading my server and he was talking about his ideal setup.
Instead of having a desktop and a server, he was going to run his desktop on his server which is a pretty clever idea.
I was thinking how I could extend that to my setup and also improve working on my laptop and it hit me - why don't I also run my laptop's "desktop" on the server?&lt;/p&gt;
&lt;p&gt;How would I do this though? Well, I was thinking about using &lt;a href="https://parsec.app/"&gt;Parsec&lt;/a&gt;.
Effectively, I'd be remote connecting to a local server running my desktop for my laptop.
My laptop and desktop machines would be thin clients for the powerful server.&lt;/p&gt;
&lt;p&gt;Here are some of the benefits for doing this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Would only need to upgrade one machine (the server) so I can save money and/or afford more powerful parts.&lt;/li&gt;
&lt;li&gt;Laptop wouldn't need to worry about thermal constraints for work.&lt;/li&gt;
&lt;li&gt;I could expand the storage space, greatly improve the CPU and massively increase the RAM.&lt;/li&gt;
&lt;li&gt;If anything happened to my laptop (it dies or is stolen), all my important files are securely stored.&lt;/li&gt;
&lt;li&gt;I could effectively &lt;a href="https://ghuntley.com/anywhere/"&gt;work from anywhere&lt;/a&gt; while having potentially better battery life too.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are some problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gaming on my desktop will have more lag than it does currently - maybe not noticeable but still more lag than none.&lt;/li&gt;
&lt;li&gt;Gaming on my laptop gets more complicated as some games connect via LAN so having a remote desktop doesn't help - the laptop still needs to be fast enough to play games.&lt;/li&gt;
&lt;li&gt;If I am working from anywhere, I'd still need a pretty good internet connection as it would be a streaming-heavy setup with Parsec.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other concerns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I run my server 24/7 so power usage is a concern - the existing server is quite power efficient.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At some point in the future, I will still need to upgrade the desktop and laptop machines.
If the gaming side for the desktop machine through the server is fine, I might be able to get away with something as simple as a Raspberry Pi.
For a future laptop, while I still want some gaming performance, I'd mainly be looking for a lightweight laptop with great battery life and probably USB-C power.&lt;/p&gt;
&lt;h3 id="renting-vs-buying"&gt;Renting-vs-Buying&lt;/h3&gt;
&lt;p&gt;Now there is another option that is possible - I could just rent a dedicated server somewhere and run everything there.
That definitely is possible and would lower the upfront cost, potentially help with backups and likely have faster internet.
However, I don't like the idea that everything important to me would be hosted by someone else.
Not to mention that buying the equipment means I can potentially flip the parts later if I need to upgrade down the road.&lt;/p&gt;
&lt;p&gt;Also if I'm doing a lot of work still at home via my laptop, it seems a little redundant going outside my local network to connect to my remote desktop where instead I could just run it at home.&lt;/p&gt;
&lt;h2 id="secondary-goals"&gt;Secondary Goals&lt;/h2&gt;
&lt;p&gt;While the primary goals are to have a remote desktop, I do have a few secondary goals I'd like to achieve too...&lt;/p&gt;
&lt;h3 id="scriptable-os-install-and-configuration"&gt;Scriptable OS Install and Configuration&lt;/h3&gt;
&lt;p&gt;I like the idea of scripting an OS install/configuration.
I've wanted to do this for a while and I know it may not seem necessary when I have a remote desktop setup but I still see some utility in it.
The idea is using something like Chocolatey and a PowerShell script to install all the software I want and configure it just how I like it.&lt;/p&gt;
&lt;p&gt;I see this still being useful because my laptop will still need an OS of some sort and some basic programs.
Plus it allows me to tear down the VM that powers my remote desktop and rebuild it far easier - great if I screw something up.&lt;/p&gt;
&lt;p&gt;Ultimately even if a remote desktop experience for my work isn't right, having an auto-setup script would help with clean re-installs and device upgrades.
Other people have got basic scripts that do things like this for Windows already but they are often limited to installing programs rather than configuring the OS.&lt;/p&gt;
&lt;h3 id="better-backups"&gt;Better Backups&lt;/h3&gt;
&lt;p&gt;Currently I back up my files onto a separate external drive but I want a low-cost method for remote backup.
I've looked at Backblaze but am also considering other options like storing my current/existing server at a friend's house and using it as a remote backup.
If not the whole server, maybe we put space aside on each other's servers to just run a backup VM for the other.&lt;/p&gt;
&lt;p&gt;Either way, I'd like to improve my current system to make sure my data is more secure.&lt;/p&gt;
&lt;h2 id="testing-and-next-steps"&gt;Testing and Next Steps&lt;/h2&gt;
&lt;h3 id="stage-1a"&gt;Stage 1A&lt;/h3&gt;
&lt;p&gt;Without splashing money on any new hardware straight away, I figure I should try and test this with what I've got currently.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up my current/old desktop machine with all the typical tools I use on my laptop.&lt;/li&gt;
&lt;li&gt;Install Parsec on my desktop and laptop, treating my desktop machine as my remote desktop for my laptop.&lt;/li&gt;
&lt;li&gt;Try working via Parsec for a week and note down any new pros/cons I find.&lt;/li&gt;
&lt;li&gt;Additionally try some basic gaming via Parsec from my desktop machine to my laptop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stage-1b"&gt;Stage 1B&lt;/h3&gt;
&lt;p&gt;Simultaneously I can also look at my secondary goals...&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identifying what programs and configuration I want to script.&lt;/li&gt;
&lt;li&gt;Test out the script in a VM that I can easily reset and try again.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stage-2"&gt;Stage 2&lt;/h3&gt;
&lt;p&gt;Assuming no major issues with testing Parsec for my work (Stage 1A)...&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Put together a parts list for my ideal server - I'm thinking something with a lot of cores.&lt;/li&gt;
&lt;li&gt;Cut that parts list down to something actually realistic. 😢&lt;/li&gt;
&lt;li&gt;Save up money to buy &lt;em&gt;any&lt;/em&gt; of those parts (something something supply chain issues).&lt;/li&gt;
&lt;li&gt;Buy the parts, build the server.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stage-3"&gt;Stage 3&lt;/h3&gt;
&lt;p&gt;Once I have the new server built (Stage 2) and my auto-setup script ready (Stage 1B)...&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up a new VM on the new server for my remote desktop.&lt;/li&gt;
&lt;li&gt;Run my new auto-setup script for configuring the OS.&lt;/li&gt;
&lt;li&gt;Connect to it via Parsec and start giving my remote desktop experience a real try on the final hardware.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stage-4"&gt;Stage 4&lt;/h3&gt;
&lt;p&gt;Assuming I'm happy with everything with Stage 3...&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do a clean install of Windows on my laptop to get rid of the tools I no longer use - at this point I'm fully committing to the change for work.&lt;/li&gt;
&lt;li&gt;Try gaming from my desktop machine to my remote desktop, testing the performance/lag and see how happy I am with it.&lt;/li&gt;
&lt;li&gt;If I'm happy with how gaming works, wipe my desktop and commit to the same change as my laptop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="lets-do-this-thing"&gt;Let's do this thing!&lt;/h2&gt;
&lt;p&gt;There is a lot to do between now and having my ideal setup.
Those stages also hide a lot of complexity so it won't be a fast process to go from where I am now to a fully remote desktop experience.
Ultimately though, I'm pretty excited to get started with this and looking forward to a powerful desktop experience wherever I am.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>The pain points of C# source generators</title>
			<link>https://turnerj.com/blog/the-pain-points-of-csharp-source-generators</link>
			<description>With great power can come great pain points...</description>
			<enclosure url="https://turnerj.com/blog/images/social/the-pain-points-of-csharp-source-generators.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/the-pain-points-of-csharp-source-generators</guid>
			<pubDate>Wed, 07 Apr 2021 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;&lt;strong&gt;Update (February 2022)&lt;/strong&gt;: &lt;a href="https://turnerj.com/blog/csharp-source-generator-pain-points-february-2022-update"&gt;Debugging source generators is a lot better and an update on transient dependencies&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I've recently completed my first foray into writing a &lt;a href="https://devblogs.microsoft.com/dotnet/introducing-c-source-generators/"&gt;C# source generator&lt;/a&gt; for &lt;a href="https://github.com/RehanSaeed/Schema.NET"&gt;Schema.NET&lt;/a&gt;.
There is a lot to like about source generators however there are a few things I wish I understood more before diving into it.&lt;/p&gt;
&lt;p&gt;For those that are unaware, source generators are a new feature added to C# whereby one can analyse existing source code and generate new source code all from C# itself.
One area where this is of interest is serialization - being able to generate an ideal serializer at compile time prevents the need of using reflection at runtime.&lt;/p&gt;
&lt;p&gt;In Schema.NET, we had hundreds of classes and interfaces that mapped to &lt;a href="https://schema.org/"&gt;Schema.org&lt;/a&gt; types.
While we had our own tool to generate these, the generated files sat in our Git repository creating a lot of noise when trying to change our tooling behaviour.
Source generators would allow us to remove these files and have them exist only as part of the compiled binary.
The move to source generators was also a good time to refactor the generating logic itself, making it easier to add new features later.&lt;/p&gt;
&lt;h2 id="pain-point-1-debugging-source-generators"&gt;Pain Point 1: Debugging Source Generators&lt;/h2&gt;
&lt;p&gt;Honestly I expected the debugging process to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Put a breakpoint in the source generator code&lt;/li&gt;
&lt;li&gt;Press the "Debug" button in Visual Studio&lt;/li&gt;
&lt;li&gt;Code stops at the breakpoint&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, it isn't that simple. The source generator runs during compilation, but the debugging session only starts after compilation has finished, meaning our breakpoint would never be hit.
After some research, it seems there are two different methods suggested.&lt;/p&gt;
&lt;h3 id="invoke-the-debugger-from-the-source-generator"&gt;Invoke the debugger from the source generator&lt;/h3&gt;
&lt;p&gt;Found this solution from &lt;a href="https://nicksnettravels.builttoroam.com/debug-code-gen/"&gt;Nick's .NET Travels&lt;/a&gt;.
Inside our source generator, likely in the "Initialize" method, we can invoke the debugger to attach to the current process with the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;#if DEBUG
if (!Debugger.IsAttached)
{
    Debugger.Launch();
}
#endif
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What we are doing here is using the &lt;a href="https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/preprocessor-directives"&gt;preprocessor directive&lt;/a&gt; &lt;code&gt;#if&lt;/code&gt; to conditionally include this code if the build configuration is "Debug".
When we are in the "Debug" configuration, we check if the debugger is already attached and if not, attach it via &lt;a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.debugger.launch?view=netstandard-2.0"&gt;&lt;code&gt;Debugger.Launch()&lt;/code&gt;&lt;/a&gt;.
After the debugger launches, it comes up with a prompt about where to debug it (I chose a new instance of Visual Studio).
From here, the code will be paused on the &lt;code&gt;Debugger.Launch()&lt;/code&gt; line and this new instance of Visual Studio will listen for any breakpoints you may add.&lt;/p&gt;
&lt;p&gt;I probably spent a good few hours using this method and while it works, it is not a great experience.
For starters, the prompt I mentioned appeared multiple times during a debugging session.
I'm not sure if the issue is related to different target frameworks building simultaneously or maybe some timeout logic being handled by the build process.
Additionally I had Visual Studio crash a few times in either instance of Visual Studio I had open.&lt;/p&gt;
&lt;p&gt;Don't take my word for it, &lt;a href="https://github.com/dotnet/roslyn/discussions/50123"&gt;others have had&lt;/a&gt; similar &lt;a href="https://github.com/dotnet/roslyn/discussions/50606"&gt;difficulties&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="run-the-source-generator-manually"&gt;Run the source generator manually&lt;/h3&gt;
&lt;p&gt;A source generator itself is effectively like any other class - we can instantiate and call the initialization methods ourselves.
There is a &lt;a href="https://github.com/dotnet/roslyn/blob/9dad013b7a3fabeb1b4f36e260ed9c6e3344548e/docs/features/source-generators.cookbook.md"&gt;detailed document in the Roslyn repo&lt;/a&gt; that covers all sorts of things with regards to source generators.
One of the sections specifically covers &lt;a href="https://github.com/dotnet/roslyn/blob/9dad013b7a3fabeb1b4f36e260ed9c6e3344548e/docs/features/source-generators.cookbook.md#unit-testing-of-generators"&gt;testing source generators&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here is a modified version of their example that shows the general gist:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;Compilation inputCompilation = CreateCompilation(@"
namespace MyCode
{
    public class Program
    {
        public static void Main(string[] args)
        {
        }
    }
}
");

CustomGenerator generator = new CustomGenerator();

// Create the driver that will control the generation, passing in our generator
GeneratorDriver driver = CSharpGeneratorDriver.Create(generator);

// Run the generation pass
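// Run the generation pass, keeping the returned driver as it holds the results of the run
driver = driver.RunGeneratorsAndUpdateCompilation(inputCompilation, out var outputCompilation, out var diagnostics);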

static Compilation CreateCompilation(string source)
    =&amp;gt; CSharpCompilation.Create("compilation",
        new[] { CSharpSyntaxTree.ParseText(source) },
        new[] { MetadataReference.CreateFromFile(typeof(Binder).GetTypeInfo().Assembly.Location) },
        new CSharpCompilationOptions(OutputKind.ConsoleApplication));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Basically this creates a compilation that the source generator can run against.
This method can be quite verbose as, depending on your source generator itself, you may require a lot of boilerplate source code for your generator to work upon.&lt;/p&gt;
&lt;p&gt;In my case with Schema.NET, I'm generating hundreds of classes based on some JSON so I have minimal boilerplate.
I could have gone this route however I decided on a more direct approach:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var generator = new SchemaSourceGenerator();
generator.Initialize(new Microsoft.CodeAnalysis.GeneratorInitializationContext());
generator.Execute(new Microsoft.CodeAnalysis.GeneratorExecutionContext());
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My generator didn't care about any existing syntax tree - its job was to just pump out new classes and interfaces.
This method does have a bit of a fatal flaw in that calling most (any?) of the methods on &lt;code&gt;GeneratorInitializationContext&lt;/code&gt; or &lt;code&gt;GeneratorExecutionContext&lt;/code&gt; may fail.
These types are not instantiated with their different properties correctly configured, which is something the more verbose approach above takes care of.
For my &lt;code&gt;SchemaSourceGenerator&lt;/code&gt;, I needed to comment out &lt;code&gt;context.AddSource(sourceName, sourceText)&lt;/code&gt; so it wouldn't throw an exception.&lt;/p&gt;
&lt;p&gt;My recommendation is for anyone working on a source generator, either have a separate console application to debug your source generator or create a special unit test.
Do it properly though and have the more verbose compilation code as shown in the earlier example so you don't need to modify your source generator to run it.&lt;/p&gt;
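&lt;p&gt;To make that recommendation a little more concrete, here is a minimal sketch of such a console application. It assumes a project that references the &lt;code&gt;Microsoft.CodeAnalysis.CSharp&lt;/code&gt; NuGet package. &lt;code&gt;HelloGenerator&lt;/code&gt; is a made-up stand-in for your own generator and the input source is just a placeholder.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Hypothetical stand-in generator - swap in your own ISourceGenerator
[Generator]
public class HelloGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
    }

    public void Execute(GeneratorExecutionContext context)
        =&amp;gt; context.AddSource("Hello.g.cs", SourceText.From("namespace Generated { public static class Hello { } }", Encoding.UTF8));
}

public static class Program
{
    public static void Main()
    {
        // A real compilation means the generator contexts are fully configured
        Compilation inputCompilation = CSharpCompilation.Create("compilation",
            new[] { CSharpSyntaxTree.ParseText("namespace MyCode { public class Placeholder { } }") },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        GeneratorDriver driver = CSharpGeneratorDriver.Create(new HelloGenerator());

        // Breakpoints inside the generator are hit here when debugging this application
        driver = driver.RunGeneratorsAndUpdateCompilation(inputCompilation, out var outputCompilation, out var diagnostics);

        foreach (var diagnostic in diagnostics)
        {
            Console.WriteLine(diagnostic);
        }

        // List the hint paths of everything the generator added
        foreach (var tree in driver.GetRunResult().GeneratedTrees)
        {
            Console.WriteLine(tree.FilePath);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the generator runs inside an ordinary process, the normal "Debug" button works and calls like &lt;code&gt;context.AddSource&lt;/code&gt; don't need to be commented out.&lt;/p&gt;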
&lt;h2 id="pain-point-2-no-asyncawait"&gt;Pain Point 2: No Async/Await&lt;/h2&gt;
&lt;p&gt;The methods exposed by source generators (&lt;code&gt;Initialize&lt;/code&gt; and &lt;code&gt;Execute&lt;/code&gt;) do not return tasks so you can't invoke async APIs.
According to the Roslyn team &lt;a href="https://github.com/dotnet/roslyn/issues/44045"&gt;this is by design&lt;/a&gt; as the IO for reading/writing files is handled by the compiler.&lt;/p&gt;
&lt;p&gt;For Schema.NET, we do an HTTP request to get the JSON we need to build. There are reasons this isn't a good idea but this is what we do and it works well for us.
The &lt;code&gt;HttpClient&lt;/code&gt; has only had async APIs for a long while and while &lt;a href="https://github.com/dotnet/runtime/issues/32125"&gt;that is changing&lt;/a&gt;, source generators must target .NET Standard 2.0 so we can't leverage that change.&lt;/p&gt;
&lt;p&gt;My first iteration of getting the source generator to work was effectively wrapping my code in a &lt;code&gt;Task.Run()&lt;/code&gt; call:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public void Initialize(GeneratorInitializationContext context) =&amp;gt; Task.Run(async () =&amp;gt;
{
    ...

    SchemaObjects = await schemaService.GetObjectsAsync();
}).GetAwaiter().GetResult();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This admittedly did work but I really didn't like it - it felt like such a kludge solution.
There is a lot of information available about when and where you should be using &lt;code&gt;Task.Run()&lt;/code&gt; - Stephen Cleary has a &lt;a href="https://blog.stephencleary.com/2013/11/taskrun-etiquette-examples-using.html"&gt;good blog post&lt;/a&gt; or &lt;a href="https://blog.stephencleary.com/2013/11/taskrun-etiquette-examples-dont-use.html"&gt;two&lt;/a&gt; about it.
While a source generator is likely a new special case where &lt;em&gt;it depends&lt;/em&gt;, I still decided to change it.
I ended up calling &lt;code&gt;.GetAwaiter().GetResult()&lt;/code&gt; directly on my async method instead.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public void Initialize(GeneratorInitializationContext context)
{
    ...

    SchemaObjects = schemaService.GetObjectsAsync().GetAwaiter().GetResult();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I'll be honest - I don't know if this is technically &lt;em&gt;better&lt;/em&gt; in this scenario but I know it works.&lt;/p&gt;
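&lt;p&gt;For comparison, here is a minimal sketch of the two patterns side by side outside of a source generator. &lt;code&gt;GetObjectsAsync&lt;/code&gt; here is a made-up stand-in for the real service call. Both variants block the calling thread until the task completes. The difference is that &lt;code&gt;Task.Run()&lt;/code&gt; starts the async method on a thread pool thread, so its continuations don't capture any &lt;code&gt;SynchronizationContext&lt;/code&gt; from the caller. Whether that matters inside the compiler is an assumption that hasn't been verified here.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for schemaService.GetObjectsAsync()
    private static async Task&amp;lt;string[]&amp;gt; GetObjectsAsync()
    {
        await Task.Delay(100);
        return new[] { "Thing", "Person" };
    }

    public static void Main()
    {
        // Variant 1: start the async work on the thread pool, then block on it
        var viaTaskRun = Task.Run(async () =&amp;gt; await GetObjectsAsync()).GetAwaiter().GetResult();

        // Variant 2: start the async work on the calling thread, then block on it
        var direct = GetObjectsAsync().GetAwaiter().GetResult();

        Console.WriteLine(viaTaskRun.Length == direct.Length);
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a console application like this there is no &lt;code&gt;SynchronizationContext&lt;/code&gt; by default, so the two variants behave the same.&lt;/p&gt;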
&lt;h2 id="pain-point-3-transient-dependencies"&gt;Pain Point 3: Transient Dependencies&lt;/h2&gt;
&lt;p&gt;An issue with dependencies was something I wasn't expecting at all when I started with my source generator - why would I?
Every other library and application I've written in C# in the last few years follows a fairly predictable pattern of using a &lt;code&gt;&amp;lt;PackageReference&amp;gt;&lt;/code&gt; to define which package and version.
The basics of including a package reference for source generators are still the same; the difference is all the other bits it now also requires.&lt;/p&gt;
&lt;p&gt;For Schema.NET, our source generator was parsing JSON so we needed a serializer.
We were previously using &lt;code&gt;Newtonsoft.Json&lt;/code&gt; for our tool however in this refactor, we were also moving to using &lt;code&gt;System.Text.Json&lt;/code&gt; for the parsing of the initial schema data from Schema.org.
This dependency needs to only exist for the generator, not the library the generator is creating classes etc for.
Normally you can just specify &lt;code&gt;PrivateAssets="all"&lt;/code&gt; on the package reference and that's it but for source generators, you need to specify a few more things:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;ItemGroup&amp;gt;
    &amp;lt;PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;

&amp;lt;PropertyGroup&amp;gt;
    &amp;lt;GetTargetPathDependsOn&amp;gt;$(GetTargetPathDependsOn);GetDependencyTargetPaths&amp;lt;/GetTargetPathDependsOn&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;

&amp;lt;Target Name="GetDependencyTargetPaths"&amp;gt;
    &amp;lt;ItemGroup&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\System.Text.Json.dll" IncludeRuntimeDependency="false" /&amp;gt;
    &amp;lt;/ItemGroup&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not too bad right? Well, what if I told you that you needed to do this for all dependencies.
By that I mean &lt;em&gt;every dependency in the dependency tree&lt;/em&gt; which for us was:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Microsoft.Bcl.AsyncInterfaces, 5.0.0&lt;/li&gt;
&lt;li&gt;System.Buffers, 4.5.1&lt;/li&gt;
&lt;li&gt;System.Memory, 4.5.4
&lt;ul&gt;
&lt;li&gt;System.Numerics.Vectors, 4.4.0&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;System.Numerics.Vectors, 4.5.0&lt;/li&gt;
&lt;li&gt;System.Runtime.CompilerServices.Unsafe, 5.0.0&lt;/li&gt;
&lt;li&gt;System.Text.Encodings.Web, 5.0.0&lt;/li&gt;
&lt;li&gt;System.Threading.Tasks.Extensions, 4.5.4&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Our example would look more like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;ItemGroup&amp;gt;
    &amp;lt;PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Threading.Tasks.Extensions" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Text.Encodings.Web" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Buffers" Version="4.5.1" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Memory" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
    &amp;lt;PackageReference Include="System.Numerics.Vectors" Version="4.4.0" GeneratePathProperty="true" PrivateAssets="all" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;

&amp;lt;PropertyGroup&amp;gt;
    &amp;lt;GetTargetPathDependsOn&amp;gt;$(GetTargetPathDependsOn);GetDependencyTargetPaths&amp;lt;/GetTargetPathDependsOn&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;

&amp;lt;Target Name="GetDependencyTargetPaths"&amp;gt;
    &amp;lt;ItemGroup&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGMicrosoft_Bcl_AsyncInterfaces)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Runtime_CompilerServices_Unsafe)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Threading_Tasks_Extensions)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Buffers)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Memory)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Numerics_Vectors)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Encodings_Web)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" /&amp;gt;
    &amp;lt;/ItemGroup&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If any of these dependencies pick up any new dependencies themselves, they need to be included too - this can happen with patch version changes like between &lt;code&gt;System.Text.Encodings.Web&lt;/code&gt; going from &lt;a href="https://www.nuget.org/packages/System.Text.Encodings.Web/5.0.0"&gt;5.0.0&lt;/a&gt; to &lt;a href="https://www.nuget.org/packages/System.Text.Encodings.Web/5.0.1"&gt;5.0.1&lt;/a&gt; where it picked up a few new dependencies.&lt;/p&gt;
&lt;p&gt;Currently for Schema.NET, I'm only specifying &lt;code&gt;System.Text.Json&lt;/code&gt; and &lt;code&gt;System.Text.Encodings.Web&lt;/code&gt; directly which allows our builds to work on our CI but Visual Studio complains during the build.
I &lt;a href="https://github.com/dotnet/roslyn/issues/52017"&gt;raised an issue with the Roslyn team&lt;/a&gt; about this extra weird behaviour though it seems to amount to a difference between builds triggered by .NET Framework (Visual Studio and MSBuild) and .NET Core (&lt;code&gt;dotnet build&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;My biggest gripe here though is: &lt;em&gt;Why doesn't the compiler just do this for us?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The compiler knows all our dependencies so with some sort of flag to indicate that this is a source generator, the compiler should do all this work for us.
The burden to make sure we keep track of all transient dependencies when any dependency gets an update is something I don't want to do.&lt;/p&gt;
&lt;h3 id="potential-transient-dependency-workaround"&gt;Potential Transient Dependency Workaround&lt;/h3&gt;
&lt;p&gt;While not a perfect solution, if you are like me and really don't like specifying every package reference in the dependency tree like that, you can automate it somewhat with a custom MSBuild target.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-xml"&gt;&amp;lt;ItemGroup&amp;gt;
    &amp;lt;PackageReference Include="System.Text.Json" Version="5.0.1" PrivateAssets="all" /&amp;gt;
&amp;lt;/ItemGroup&amp;gt;

&amp;lt;PropertyGroup&amp;gt;
    &amp;lt;GetTargetPathDependsOn&amp;gt;$(GetTargetPathDependsOn);GetDependencyTargetPaths&amp;lt;/GetTargetPathDependsOn&amp;gt;
&amp;lt;/PropertyGroup&amp;gt;

&amp;lt;Target Name="GetDependencyTargetPaths" AfterTargets="ResolvePackageDependenciesForBuild"&amp;gt;
    &amp;lt;ItemGroup&amp;gt;
        &amp;lt;TargetPathWithTargetPlatformMoniker Include="@(ResolvedCompileFileDefinitions)" IncludeRuntimeDependency="false" /&amp;gt;
    &amp;lt;/ItemGroup&amp;gt;
&amp;lt;/Target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This "works" in the sense that &lt;code&gt;ResolvedCompileFileDefinitions&lt;/code&gt; does contain a list of our transient dependencies so everything that needs to be passed in is passed in.
The problem with this solution is that &lt;code&gt;ResolvedCompileFileDefinitions&lt;/code&gt; contains &lt;em&gt;more&lt;/em&gt; than the specific dependencies we are wanting and could have undesired behaviour.&lt;/p&gt;
&lt;p&gt;Ideally I'd like &lt;em&gt;something like this&lt;/em&gt; to be an automatic target for source generator projects but perfected to target only private dependencies so they are bundled correctly.&lt;/p&gt;
&lt;h2 id="conclusion-was-migrating-to-source-generators-worth-it"&gt;Conclusion: Was migrating to source generators worth it?&lt;/h2&gt;
&lt;p&gt;&lt;big&gt;&lt;strong&gt;Yes.&lt;/strong&gt;&lt;/big&gt;&lt;/p&gt;
&lt;p&gt;Switching to source generators, combined with my refactor, added 700 lines of code while &lt;strong&gt;removing 69,203 lines of code&lt;/strong&gt;.
&lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/252"&gt;My pull request&lt;/a&gt; affected 765 files, the vast majority being generated classes and interfaces that no longer need to sit in the repository.&lt;/p&gt;
&lt;p&gt;The refactor of our generation code also sets us up nicely for the future where we can support pending Schema.org types (&lt;a href="https://github.com/RehanSaeed/Schema.NET/issues/203"&gt;something that has been requested by a few people&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;While these pain points are annoying, source generators are a great feature that I hope will get tooling updates to improve the developer experience.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>A Better Mousetrap</title>
			<link>https://turnerj.com/blog/a-better-mousetrap</link>
			<description>My journey to launch a product and take the next steps in my career</description>
			<enclosure url="https://turnerj.com/blog/images/social/a-better-mousetrap.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/a-better-mousetrap</guid>
			<pubDate>Tue, 15 Sep 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;Today is a big day for me as after many months (or years depending how you look at it), I've &lt;a href="https://www.producthunt.com/posts/brandvantage"&gt;finally launched the first product for my business, BrandVantage&lt;/a&gt;.
This post is the story of how I started with one idea and ended up launching with a different one.&lt;/p&gt;
&lt;h2 id="the-original-idea-lets-build-a-digital-brand-expert"&gt;The Original Idea: Let's build a digital brand expert!&lt;/h2&gt;
&lt;p&gt;I worked as a web developer for a local web development agency for a number of years and in that time, I learnt a lot about how a variety of different businesses operated online.&lt;/p&gt;
&lt;p&gt;There were a few key "problems" I found in common across many of those businesses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Under-utilising analytics&lt;/li&gt;
&lt;li&gt;Misunderstanding analytics&lt;/li&gt;
&lt;li&gt;Not keeping on top of industry information&lt;/li&gt;
&lt;li&gt;Lack of competitor analysis/understanding&lt;/li&gt;
&lt;li&gt;Difficulty with Search Engine Optimization (SEO)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In moderate-to-large companies where you have marketing departments, most of this stuff can be covered by one or more staff dedicated to these things.
In smaller companies, the business owner is normally the one where these tasks fall on to, but they are already wearing many different hats.
It felt like something was here - if I could automate some of these tasks in different ways, I could both help business owners and earn myself some money along the way.&lt;/p&gt;
&lt;p&gt;Automation of tasks, especially ones in analytics or SEO spaces, isn't a new idea.
In fact, I've seen many businesses in a similar space launch on Product Hunt over the years since starting, but that didn't deter me.
I was building &lt;a href="https://idioms.thefreedictionary.com/a+better+mousetrap"&gt;&lt;em&gt;a better mousetrap&lt;/em&gt;&lt;/a&gt; and wanting to launch it at a lower price, not something truly innovative so it was going to be an uphill battle.
This area though, helping small businesses online be as efficient in tasks as some bigger businesses can, is something I felt passionately about so I proceeded anyway.&lt;/p&gt;
&lt;h3 id="attempt-one-very-hacky-in-php"&gt;Attempt One: Very Hacky (in PHP)&lt;/h3&gt;
&lt;p&gt;Way back in 2015/16/17, while still at my full-time job, I spent nights and weekends building and tinkering on solutions to the problems business owners face.
It was a hacky PHP solution pulling real-time information from sources like Twitter, Google Analytics and Facebook.
A hacky approach seemed like a good idea as that seemed to be the way people launched things, do the quickest and hackiest thing you can to get it out the door.&lt;/p&gt;
&lt;p&gt;While working on it, I had a few interested parties though what I built could barely be considered a prototype.
The thing was a mess.
I could do some basic queries, but it wasn't what I considered sellable and definitely not user-friendly, something I considered key to the product.
I was also running into technical problems with scale - any sufficiently complex query was performed real-time, which was getting more complicated.
Real-time processing had to be out.
I needed to pre-compute and store it in a database.&lt;/p&gt;
&lt;p&gt;I wanted to take this more seriously and I didn't feel like a "quick and hacky" approach to building a product was right for me.
With this in mind, it seemed like a good opportunity to change the tech stack to something that would be better long term.&lt;/p&gt;
&lt;h3 id="attempt-two-slightly-less-hacky-in.net"&gt;Attempt Two: Slightly Less Hacky (in .NET)&lt;/h3&gt;
&lt;p&gt;Moving to .NET felt like the smart move for me as at my job I had spent a lot more time working in .NET than PHP, plus I vastly preferred the tooling in .NET vs PHP.
That said, the .NET code I had worked on to-date would definitely be considered "legacy code".&lt;/p&gt;
&lt;p&gt;My first version in .NET (specifically .NET Framework), predating my use of version control, was trying to keep costs low by using MySQL through Entity Framework.
After a lot of pain and suffering with that, I had a short stint of MSSQL before I settled upon MongoDB.&lt;/p&gt;
&lt;p&gt;MongoDB might seem like a weird choice - there are some people that have very strong opinions about which type of database you should use.
Honestly it came down to a gut feel after messing around with it - it seemed more compatible to the way I was approaching problems than a relational database would.
I liked the code-first approach to Entity Framework so much though that I recreated the "feel" of Entity Framework for MongoDB with some custom code.
This later became an open source project of mine called &lt;a href="https://www.mongoframework.com/"&gt;MongoFramework&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm not going to lie, progress was... slow.
While I was putting quite a lot of time into working on it, it was still an extremely ambitious project.
I have strong feelings about building "MVPs" where some people focus too much on the "minimum" without enough focus on the "viable".
At the end of the day people buy products that meet their needs, and cutting too much out would meet no one's needs.
If someone was going to use this, in a market with many competitors of varying quality, it had to do its job well.
There didn't seem much I could reasonably cut to make it any more minimal if I wanted people to buy it.&lt;/p&gt;
&lt;p&gt;I kept working at it every night, building pieces to extract and store data from a variety of sources.
I was pulling in data from Google Analytics, Google Webmaster Tools (now called Google Search Console), Twitter, Facebook, IP Geolocation, DNS information and also from news articles.
What I thought I could do is once I had the different data sources together, I would write custom rules that could infer insights from individual or combined data sets.
These insights would form the basis of the "digital brand expert".
After all, that was the goal of the idea, something that could help out small business owners.&lt;/p&gt;
&lt;p&gt;After 2 years of working on this in my spare time, it felt the right time to leave my job and go into this full time.
I felt like I was &lt;em&gt;so close&lt;/em&gt; to launching and I just needed something more than the same day-to-day work.
So I did it - &lt;a href="https://turnerj.com/blog/i-left-my-job-today-after-seven-years"&gt;I left my job after 7 years&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id="going-full-time-into-the-idea"&gt;Going Full-time into the Idea&lt;/h3&gt;
&lt;p&gt;Right out of the gate, I had moved from .NET Framework to .NET Core, was working on UI/UX improvements for the application and launched the website for it.
I worked with an accountant and a lawyer to set up the business, bought a trade mark for the product name, and I felt good, like I was only a few months away from launching.
This feeling didn't last though...&lt;/p&gt;
&lt;p&gt;Over time, it felt like I was taking two steps forward then one step back - some technical, some business related.
Sure, that is still progress, but having new issues crop up every day or so can really crush your motivation.&lt;/p&gt;
&lt;p&gt;My best/happiest/most productive days were days I ignored or avoided different issues I had.
If I had a problem with the login system, I would focus on how the UX of the menus worked.
If I had a problem with data gathering, I would add more tests to the codebase.
While I didn't entirely ignore the problem, I would wait a week or two before I looked at it again, somewhat hoping it would solve itself - unfortunately that isn't how things work.&lt;/p&gt;
&lt;p&gt;In time though, I got to a stage where it felt like I could launch and was hyping myself up until reality struck: I didn't actually build what I set out to build.&lt;/p&gt;
&lt;p&gt;The UI/UX was good, I had strategies for deployment and plans for next steps, but it wasn't a "digital brand expert".
It was instead a glorified data store for information that people could better access through existing tools.
That's kinda a big problem!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/a-better-mousetrap-ive-made-a-huge-mistake.gif" alt="Gob Bluth saying &amp;quot;I've made a huge mistake.&amp;quot; from the TV Show &amp;quot;Arrested Development&amp;quot;"&gt;&lt;/p&gt;
&lt;p&gt;When realising this I poured time into fixing that huge lapse in judgement, but I couldn't do it.
No matter how I tried, I just couldn't figure out how to build this rules engine.
It was like my entire thought process was just clouded.
I couldn't see the solution to the problem like I can for most other things.&lt;/p&gt;
&lt;p&gt;This was depressing and I ended up having a month or so hiatus from working on it.
When I have had stints of not feeling like or not being able to do programming in the past, I try and spur it on again by watching some show or movie which has some strong relation to technology (fictional or not).
My go-to is usually something like &lt;a href="https://www.imdb.com/title/tt0371746/"&gt;Iron Man&lt;/a&gt;, but this time I was rewatching &lt;a href="https://www.imdb.com/title/tt2543312/"&gt;Halt and Catch Fire&lt;/a&gt; where I found some inspiration.&lt;/p&gt;
&lt;h2 id="the-pivot-an-api-to-the-internet"&gt;The Pivot: An API to the Internet&lt;/h2&gt;
&lt;p&gt;Later in the series a lot of the focus is around the Web, and it was in these episodes where my thoughts about the Internet and the data on it changed.
There is a quote from one of the main characters at the end of Season 3 that resonates with me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"The moment we decide what the Web is, we've lost. The moment we try to tell people what to do with it, we've lost.
All we have to do is build a door and let them inside."&lt;/p&gt;
&lt;p&gt;- Joe MacMillan (Season 3, Episode 10)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Internet is a treasure trove of information, it is searchable but generally unstructured.
People have managed to create all sorts of different pages in HTML, but in the process of making a website everything is designed for a human user.
It is this way for obvious reasons, &lt;em&gt;we&lt;/em&gt; are the consumers of web pages after all... aren't we?&lt;/p&gt;
&lt;p&gt;Behind these user-friendly web pages are usually other specific bits of markup, providing some level of structured data for specific situations.
Sometimes it is a description metatag for search engines, other times it might be &lt;a href="https://ogp.me/"&gt;Open Graph&lt;/a&gt; metatags for social media links.
We build these things to help aid computers processing our web pages.&lt;/p&gt;
&lt;p&gt;In 2011, &lt;a href="https://schema.org/"&gt;Schema.org&lt;/a&gt; was created.
This was a collaborative effort between Google, Bing and Yahoo (later that year, Yandex as well) with the mission to "create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond".
Through 3 different encodings (&lt;a href="https://turnerj.com/blog/what-is-microdata-and-why-should-i-care"&gt;Microdata&lt;/a&gt;, &lt;a href="http://rdfa.info/"&gt;RDFa&lt;/a&gt; and &lt;a href="https://json-ld.org/"&gt;JSON-LD&lt;/a&gt;), websites could express detailed structured data.&lt;/p&gt;
&lt;p&gt;There is another quote from Halt and Catch Fire which I like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;"Computers aren't the thing. They're the thing that gets us to the thing."&lt;/p&gt;
&lt;p&gt;- Joe MacMillan (Season 1, Episode 1)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As much as I like computers and programming, they are used to help us achieve other goals.
From my attempts of trying to build a "digital brand expert", I knew that data is fundamental to help build more advanced systems and give new insights.
Having easier access to other forms of data from web pages around the world may allow new and different tools to be built.&lt;/p&gt;
&lt;p&gt;So I decided rather than try and solve a problem that I was clouded by, I would pivot my product.
It wouldn't be the "digital brand expert" (yet), instead it would be a data provider in its own right.
The scope of functionality was smaller and the path seemed clearer - I provide structured data from web pages.&lt;/p&gt;
&lt;p&gt;Being a data provider in this manner can help me achieve my original goal at a later point in time - I'll have a different and unique dataset that my competitors wouldn't actually have or have it at a lower cost than they might.
For example, when I was integrating news articles into my "digital brand expert", the service I was using had a high monthly cost and still was relatively limited on queries.
Instead as my own data provider, I could get access to something like news articles at no additional cost.&lt;/p&gt;
&lt;p&gt;Thinking this way, I'm basically letting my "digital brand expert" concept take a hiatus while I could earn money providing data for others to build tools or integrate into their own workflow.
So I pivoted in late 2019 to build a tool to get structured data out of web pages.&lt;/p&gt;
&lt;h3 id="actually-building-the-thing"&gt;Actually Building the Thing&lt;/h3&gt;
&lt;p&gt;The goal was standardisation and interoperability so I needed to support the major existing types of structured data, but also derive structured data for where it is missing.
For interoperability reasons, I didn't want to &lt;a href="https://xkcd.com/927/"&gt;create a new standard&lt;/a&gt; for structured data.
Instead, I decided the Schema.org vocabulary would be a good fit for my use case.&lt;/p&gt;
&lt;p&gt;There are a lot of types in Schema.org and I didn't want to write them myself so I found a library called &lt;a href="https://github.com/RehanSaeed/Schema.NET"&gt;Schema.NET&lt;/a&gt;.
And because I care about open source, and I would be taking a large dependence on the library, I &lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/105"&gt;contributed&lt;/a&gt; a &lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/107"&gt;variety&lt;/a&gt; of &lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/108"&gt;patches&lt;/a&gt; and &lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/109"&gt;performance&lt;/a&gt; &lt;a href="https://github.com/RehanSaeed/Schema.NET/pull/119"&gt;improvements&lt;/a&gt;.
I'm now a joint collaborator on the project with the project's creator, &lt;a href="https://rehansaeed.com/"&gt;Muhammad Rehan Saeed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I built an initial prototype to see if it would be doable and it was - I was able to extract common known data formats into a singular format.
Over the next few weeks, I continued to refine it and expand it with some basic logic to derive new structured data from pages without any.&lt;/p&gt;
&lt;p&gt;Now that I achieved the goal I set out for, I needed to put it into something sellable.&lt;/p&gt;
&lt;p&gt;I had all my code to date for my "digital brand expert" and a lot of it would actually be useable, so I just ripped out what I didn't need and started to port my prototype into it.
It needed a bit of work to tie it together, but overall this part went pretty smoothly.&lt;/p&gt;
&lt;p&gt;Everything seemed good till I was actually integrating subscriptions/payment into the application.
What I had for the "digital brand expert" idea was flawed in a few ways and I recently discovered the fun world of international sales tax.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/a-better-mousetrap-oh-come-on.gif" alt="Gob Bluth saying &amp;quot;Oh, come on!&amp;quot; from the TV Show &amp;quot;Arrested Development&amp;quot;"&gt;&lt;/p&gt;
&lt;p&gt;I tried a few different solutions and was liaising with my accountant about what would work though it ended up taking way longer than I wanted.
Each of these different business/integration issues hurt my productivity like the issues I had with my original idea - it has been hard.&lt;/p&gt;
&lt;p&gt;I did hit different productivity slumps (and one breakdown) though in the end, I slowly and steadily made progress and finally reached the point of launching &lt;em&gt;something&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id="behind-the-technical-curtain"&gt;Behind the Technical Curtain&lt;/h3&gt;
&lt;p&gt;I like knowing about how things work and I'm sure other people out there are similar so here is the technical breakdown of some aspects:&lt;/p&gt;
&lt;p&gt;The core is an ASP.NET Core 3.1 application running as an &lt;a href="https://azure.microsoft.com/en-us/services/app-service/"&gt;Azure App Service&lt;/a&gt; on Linux.
The database is powered by MongoDB (on &lt;a href="https://www.mongodb.com/cloud/atlas"&gt;MongoDB Atlas&lt;/a&gt;) using my open source "Entity Framework"-like library called MongoFramework.
The various pages in the application are Razor Pages and the API itself is using MVC.&lt;/p&gt;
&lt;p&gt;Internally to the API, I am using Schema.NET for converting to/from the Schema.org vocabulary.
The API itself honours Robots.txt files when converting pages so I built &lt;a href="https://github.com/TurnerSoftware/RobotsExclusionTools"&gt;a robust open source solution for parsing Robots.txt files&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Additionally different aspects of the system internally use caching where I used &lt;a href="https://www.cachetower.com/"&gt;Cache Tower&lt;/a&gt;, my own caching library that supports &lt;a href="https://turnerj.com/blog/multilayer-caching-in-dotnet"&gt;multi-layer caching&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For handling background tasks like removing old data from the database, I use &lt;a href="https://www.hangfire.io/"&gt;Hangfire&lt;/a&gt;.
For error logging, I use &lt;a href="https://sentry.io/welcome/"&gt;Sentry&lt;/a&gt; which I've written a custom layer to hook Hangfire exceptions into.
For performance monitoring, I use &lt;a href="https://miniprofiler.com/dotnet/"&gt;MiniProfiler&lt;/a&gt; where I've added support for MongoFramework so I can see how long my queries take.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/"&gt;GitHub&lt;/a&gt; manages the code itself with &lt;a href="https://azure.microsoft.com/en-us/services/devops/"&gt;Azure DevOps&lt;/a&gt; managing the building, testing and deploying of the application.
I actually run my own Azure DevOps build agent locally which helps quite a bit with build and release performance.&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There's something cathartic about writing this post as I am closing one chapter of my life and opening another.
Launching BrandVantage is a big step for me - I'm both excited and nervous about doing so, though optimistic in the future of the business.&lt;/p&gt;
&lt;p&gt;I have a lot of big plans for &lt;a href="https://brandvantage.co/"&gt;BrandVantage&lt;/a&gt; including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;News API: News articles from around the web as &lt;a href="https://schema.org/Article"&gt;Article Schema objects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Product API: Product pages restructured to &lt;a href="https://schema.org/Product"&gt;Product Schema objects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Plus a few others...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I do plan to revisit the "digital brand expert" idea again.
I still think there is something good there, but next time I think I'll be a little more prepared.&lt;/p&gt;
&lt;h3 id="links"&gt;Links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://brandvantage.co/"&gt;BrandVantage Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.producthunt.com/posts/brandvantage"&gt;BrandVantage on Product Hunt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.turnersoftware.com.au/"&gt;Turner Software Website (My company behind BrandVantage)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Multilayer Caching in .NET</title>
			<link>https://turnerj.com/blog/multilayer-caching-in-dotnet</link>
			<description>Optimise your caching strategy through layered caching.</description>
			<enclosure url="https://turnerj.com/blog/images/social/multilayer-caching-in-dotnet.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/multilayer-caching-in-dotnet</guid>
			<pubDate>Sun, 07 Jun 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;Caching is a powerful tool in a programmer's toolbox but it isn't magic.
It can help scale an application to a vast number of users or it can be the thing dragging down your application.
Layered caching is a technique of stacking different types of cache on top of each other which play to different strengths.&lt;/p&gt;
&lt;p&gt;I was first introduced to the idea of multilayered caching by &lt;a href="https://twitter.com/Nick_Craver"&gt;Nick Craver&lt;/a&gt;. He wrote a great article about &lt;a href="https://nickcraver.com/blog/2019/08/06/stack-overflow-how-we-do-app-caching/"&gt;how Stack Overflow do caching&lt;/a&gt; which has a lot of interesting insights - definitely worth checking out if you haven't already.
It was his article that inspired me to create &lt;a href="https://www.cachetower.com/"&gt;Cache Tower&lt;/a&gt;, my own multilayered caching solution for .NET with an emphasis on performance.&lt;/p&gt;
&lt;p&gt;Using the example he illustrated in his post, our own computers already do multiple layers of caching:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;L1/L2 CPU Cache&lt;/li&gt;
&lt;li&gt;RAM&lt;/li&gt;
&lt;li&gt;SSD/HDD (Pagefile)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The performance profiles of each of these is drastically different where the CPU caches are the fastest but also hold the least amount of data.
This is probably the first important takeaway from caching - it's not just what you cache, it's how you cache it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There is an interesting case with Cloudflare where &lt;a href="https://blog.cloudflare.com/why-we-started-putting-unpopular-assets-in-memory/"&gt;they put unpopular items in the RAM and more popular items into their SSD storage&lt;/a&gt;.
They use a multilayered cache system of RAM then SSD. While they have some extremely fast SSDs, it turns out &lt;a href="https://www.usenix.org/conference/fast12/reducing-ssd-read-latency-nand-flash-program-and-erase-suspension"&gt;when you read and write to them at the same time, you can suffer a performance penalty&lt;/a&gt;.
To avoid that penalty, they realised that having unpopular items (items never hit or hit only once) purely in the RAM allowed their overall system to perform better.
It may not be perfect but they got some interesting results!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looking at caching from an application's point of view, the layers may look a bit different but the concept is still the same.
We move from the fastest layers which have limited space to slower layers which have more space.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In-Memory Cache&lt;/li&gt;
&lt;li&gt;Redis/Memcached&lt;/li&gt;
&lt;li&gt;Database/File&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While it might seem simple enough to implement yourself, there are a few considerations to keep in mind for building a scalable multilayered caching solution.&lt;/p&gt;
&lt;h2 id="keeping-cache-layers-up-to-date"&gt;Keeping Cache Layers Up-to-Date&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Scenario: You have multiple instances of an application with their own local caches (in-memory) while also having a shared cache (Redis).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Like in a normal caching scenario, you want to avoid cache misses. In multilayered caching, we have two types of cache misses - &lt;em&gt;close&lt;/em&gt; misses and &lt;em&gt;complete&lt;/em&gt; misses.
If your in-memory cache does not have the item but Redis does, this is a &lt;em&gt;close&lt;/em&gt; cache miss.
You will need to propagate the cache result back to your in-memory cache to achieve maximum performance.&lt;/p&gt;
&lt;p&gt;You could do this via a background task however this wouldn't scale. It would require iterating all keys of one cache layer and comparing them to the keys in another.&lt;/p&gt;
&lt;p&gt;To get the best benefit here, you will want to only propagate the item if you actually need it. This keeps your in-memory cache as small as what it actually requires.
Because we are having to fetch the item from the shared cache anyway, we can spend a few extra cycles storing it in our local in-memory cache.&lt;/p&gt;
&lt;p&gt;The extra time spent storing it in our in-memory cache should pale in comparison to the time required for a &lt;em&gt;complete&lt;/em&gt; cache miss.&lt;/p&gt;
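&lt;p&gt;As a rough sketch of that flow, a lookup that propagates on a &lt;em&gt;close&lt;/em&gt; miss might look something like the following. The &lt;code&gt;ICacheLayer&lt;/code&gt; interface and &lt;code&gt;TwoLayerCache&lt;/code&gt; class are hypothetical names used for illustration, not Cache Tower's actual API, and expiry is left out to keep it short.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Threading.Tasks;

// Hypothetical layer abstraction: GetAsync returns null when the key is absent.
public interface ICacheLayer
{
    Task&amp;lt;string&amp;gt; GetAsync(string key);
    Task SetAsync(string key, string value);
}

public sealed class TwoLayerCache
{
    private readonly ICacheLayer _local;  // e.g. in-memory, one per application instance
    private readonly ICacheLayer _shared; // e.g. Redis, shared by all instances

    public TwoLayerCache(ICacheLayer local, ICacheLayer shared)
    {
        _local = local;
        _shared = shared;
    }

    public async Task&amp;lt;string&amp;gt; GetOrSetAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetchFromSource)
    {
        var value = await _local.GetAsync(key);
        if (value != null)
        {
            return value; // Hit in the fastest layer
        }

        value = await _shared.GetAsync(key);
        if (value != null)
        {
            // Close miss: we already paid for the shared lookup,
            // so copy the item into the local layer on the way out.
            await _local.SetAsync(key, value);
            return value;
        }

        // Complete miss: go to the original data source and fill both layers.
        value = await fetchFromSource();
        await _shared.SetAsync(key, value);
        await _local.SetAsync(key, value);
        return value;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because propagation only happens inside the lookup, the local layer only ever holds items this instance has actually asked for.&lt;/p&gt;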
&lt;h2 id="managing-evictions"&gt;Managing Evictions&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Scenario: You have an in-memory cache and a filesystem cache for a single application instance&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Depending on the in-memory caching solution, you might already have an auto-eviction system.
This can be found, for example, in Microsoft's &lt;a href="https://docs.microsoft.com/en-us/dotnet/api/system.runtime.caching.memorycache?view=dotnet-plat-ext-3.1&amp;amp;viewFallbackFrom=netcore-3.1"&gt;MemoryCache&lt;/a&gt;.
Unlike what is available in something like Redis though, caching to a file is both extremely slow and doesn't have a method to auto-evict expired items.&lt;/p&gt;
&lt;p&gt;While your code may consider expired cache items as &amp;quot;missed&amp;quot;, it's important to actually evict the expired records as they may be taking up precious space in memory, disk or a database.
It seems pretty straightforward, loop over the items known to be in the cache and evict any expired records.&lt;/p&gt;
&lt;p&gt;It's important to consider that some cache layer technologies may have optimizations that allow bulk eviction of records instead of individual evictions.
For example, a database cache layer would likely be able to query all expired items at once and be able to run a single &amp;quot;delete&amp;quot; operation.&lt;/p&gt;
&lt;p&gt;This bulk eviction &amp;quot;cleanup&amp;quot; is a good candidate for a background task - something where there are few instances of it and it can start the cleanup at regular intervals.&lt;/p&gt;
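&lt;p&gt;To illustrate the simple &amp;quot;loop and evict&amp;quot; version, here is a minimal sketch of a layer without auto-eviction. The &lt;code&gt;ExpiringStore&lt;/code&gt; class and its members are made up for this example. A database-backed layer would more likely replace the loop in &lt;code&gt;Cleanup&lt;/code&gt; with a single bulk delete of everything past its expiry.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Collections.Concurrent;

// Hypothetical cache layer that has no built-in auto-eviction.
public sealed class ExpiringStore
{
    private readonly ConcurrentDictionary&amp;lt;string, (string Value, DateTime ExpiresAtUtc)&amp;gt; _items =
        new ConcurrentDictionary&amp;lt;string, (string Value, DateTime ExpiresAtUtc)&amp;gt;();

    public void Set(string key, string value, TimeSpan timeToLive)
    {
        _items[key] = (value, DateTime.UtcNow + timeToLive);
    }

    public bool TryGet(string key, out string value)
    {
        // Expired items are treated as missed but still occupy space until Cleanup runs.
        if (_items.TryGetValue(key, out var entry) &amp;amp;&amp;amp; entry.ExpiresAtUtc &amp;gt; DateTime.UtcNow)
        {
            value = entry.Value;
            return true;
        }

        value = null;
        return false;
    }

    // Evicts every expired item and returns how many were removed.
    // Intended to be called at regular intervals from a background task.
    public int Cleanup()
    {
        var now = DateTime.UtcNow;
        var evicted = 0;
        foreach (var pair in _items)
        {
            // Simplification: an item replaced between the check and the removal
            // could be evicted early. A production version would guard against that.
            if (pair.Value.ExpiresAtUtc &amp;lt;= now &amp;amp;&amp;amp; _items.TryRemove(pair.Key, out _))
            {
                evicted++;
            }
        }

        return evicted;
    }
}
&lt;/code&gt;&lt;/pre&gt;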
&lt;h2 id="background-refreshing-stale-vs-expired-cache-items"&gt;Background Refreshing (Stale vs Expired Cache Items)&lt;/h2&gt;
&lt;p&gt;Background refreshing isn't exclusive to a multilayer cache solution however it can be invaluable for maximising performance in one.
The important part for background refreshes is working out the best time for refreshing.
Refreshing too early may put an unnecessary strain on the data source however refreshing too late may have the data be overly stale.&lt;/p&gt;
&lt;p&gt;The control of the refreshing is important too - you don't want to do this on a schedule as the cache may be overly eager.
Like propagating between cache layers, you want to perform this if the cache item is actively being hit.&lt;/p&gt;
&lt;p&gt;To keep throughput up, we need to simultaneously return our &amp;quot;stale&amp;quot; cache item while triggering a refresh to update our data.
This update of data needs to hit every cache layer too so other application instances can benefit from the refreshed data.&lt;/p&gt;
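&lt;p&gt;A minimal sketch of returning the stale item while refreshing in the background could look like this. It assumes a single in-memory layer, and the &lt;code&gt;RefreshingCache&lt;/code&gt; class with its &lt;code&gt;staleAfter&lt;/code&gt; and &lt;code&gt;timeToLive&lt;/code&gt; parameters is hypothetical. In a multilayer setup the store step would write to every layer.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class RefreshingCache
{
    private sealed class Entry
    {
        public string Value;
        public DateTime StaleAtUtc;   // After this, serve the value but refresh it
        public DateTime ExpiresAtUtc; // After this, the value can no longer be served
    }

    private readonly ConcurrentDictionary&amp;lt;string, Entry&amp;gt; _items = new ConcurrentDictionary&amp;lt;string, Entry&amp;gt;();
    private readonly ConcurrentDictionary&amp;lt;string, bool&amp;gt; _refreshing = new ConcurrentDictionary&amp;lt;string, bool&amp;gt;();

    public async Task&amp;lt;string&amp;gt; GetOrSetAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetch, TimeSpan staleAfter, TimeSpan timeToLive)
    {
        var now = DateTime.UtcNow;
        if (_items.TryGetValue(key, out var entry) &amp;amp;&amp;amp; entry.ExpiresAtUtc &amp;gt; now)
        {
            // Only refresh when the item is actually being hit, and only once at a time.
            if (entry.StaleAtUtc &amp;lt;= now &amp;amp;&amp;amp; _refreshing.TryAdd(key, true))
            {
                _ = Task.Run(() =&amp;gt; RefreshInBackgroundAsync(key, fetch, staleAfter, timeToLive));
            }

            return entry.Value; // Stale or fresh, return immediately
        }

        // Missing or expired: the caller has to wait for the data source.
        return await StoreAsync(key, fetch, staleAfter, timeToLive);
    }

    private async Task RefreshInBackgroundAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetch, TimeSpan staleAfter, TimeSpan timeToLive)
    {
        try
        {
            await StoreAsync(key, fetch, staleAfter, timeToLive);
        }
        catch (Exception)
        {
            // Keep serving the stale value; a real implementation would log this.
        }
        finally
        {
            _refreshing.TryRemove(key, out _);
        }
    }

    private async Task&amp;lt;string&amp;gt; StoreAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetch, TimeSpan staleAfter, TimeSpan timeToLive)
    {
        var value = await fetch();
        var now = DateTime.UtcNow;
        _items[key] = new Entry { Value = value, StaleAtUtc = now + staleAfter, ExpiresAtUtc = now + timeToLive };
        return value;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The gap between &lt;code&gt;staleAfter&lt;/code&gt; and &lt;code&gt;timeToLive&lt;/code&gt; is the knob from earlier: too small and the data source gets hit more than it needs to, too large and callers see older data.&lt;/p&gt;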
&lt;h2 id="distributed-locking"&gt;Distributed Locking&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Scenario: You have multiple instances of an application with their own local caches (in-memory) while also having a shared cache (Redis).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you're looking at a multilayered caching solution, you likely are running multiple instances of your application.
If &amp;quot;Web Server 1&amp;quot; is already attempting to update Redis then &amp;quot;Web Server 2&amp;quot; doesn't need to waste any time doing the same.
This is important to factor especially if retrieving the original data is an expensive operation.&lt;/p&gt;
&lt;p&gt;Distributed locking helps alleviate this however there is a catch - you don't want multiple requests on the same server checking the distributed cache every time for a lock.
If the same server already has a lock, you will want to track that locally in-memory so the lock-check is faster.&lt;/p&gt;
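&lt;p&gt;One way this could be sketched is to track in-flight refreshes per key in memory, so that only the first request on a server ever talks to the distributed lock. Everything here is an assumption for illustration: &lt;code&gt;IDistributedLock&lt;/code&gt; stands in for whatever lock your shared cache provides, and &lt;code&gt;waitForOtherServer&lt;/code&gt; is a hypothetical callback that waits for another server's result to appear in the shared cache.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical distributed lock, e.g. backed by Redis.
public interface IDistributedLock
{
    Task&amp;lt;bool&amp;gt; TryAcquireAsync(string key);
    Task ReleaseAsync(string key);
}

public sealed class LockedRefresher
{
    private readonly IDistributedLock _distributedLock;

    // Refreshes currently in flight on this server, tracked locally in memory.
    private readonly ConcurrentDictionary&amp;lt;string, Lazy&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt;&amp;gt; _inFlight =
        new ConcurrentDictionary&amp;lt;string, Lazy&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt;&amp;gt;();

    public LockedRefresher(IDistributedLock distributedLock)
    {
        _distributedLock = distributedLock;
    }

    public Task&amp;lt;string&amp;gt; RefreshAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetchFromSource, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; waitForOtherServer)
    {
        // Requests for the same key on this server share one task.
        // Lazy makes sure only one refresh starts even if GetOrAdd races.
        return _inFlight.GetOrAdd(key, k =&amp;gt; new Lazy&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt;(
            () =&amp;gt; RunAsync(k, fetchFromSource, waitForOtherServer))).Value;
    }

    private async Task&amp;lt;string&amp;gt; RunAsync(string key, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; fetchFromSource, Func&amp;lt;Task&amp;lt;string&amp;gt;&amp;gt; waitForOtherServer)
    {
        try
        {
            if (!await _distributedLock.TryAcquireAsync(key))
            {
                // Another server is already refreshing this key.
                return await waitForOtherServer();
            }

            try
            {
                return await fetchFromSource();
            }
            finally
            {
                await _distributedLock.ReleaseAsync(key);
            }
        }
        finally
        {
            _inFlight.TryRemove(key, out _);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;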
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Layered caching can provide the best of multiple different cache types.
You can get the performance of in-memory cache with the larger cache sizes from a Redis instance, database or file system.
It won't automatically solve every caching performance problem but in the right scenarios, can be an extremely useful tool.&lt;/p&gt;
&lt;p&gt;I hope these tips can help you out with your own caching solution. If you don't want to roll your own, check out my library &lt;a href="https://www.cachetower.com/"&gt;Cache Tower&lt;/a&gt; which supports these things and more.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Levenshtein Distance with SIMD</title>
			<link>https://turnerj.com/blog/levenshtein-distance-with-simd</link>
			<description>Using CPU-specific instructions for even more performance</description>
			<enclosure url="https://turnerj.com/blog/images/social/levenshtein-distance-with-simd.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/levenshtein-distance-with-simd</guid>
			<pubDate>Wed, 04 Mar 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;This is a bonus part because the other post was already jam-packed with optimizations, plus this is a pretty exotic optimization that fewer developers are likely to use directly.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/SIMD"&gt;&amp;quot;Single Instruction, Multiple Data&amp;quot; (SIMD)&lt;/a&gt; is a method by which you can operate on a vector of data - allowing for certain mathematical and logic operations to take place on every element in the vector &lt;em&gt;at the same time&lt;/em&gt;. This differs from &lt;a href="https://en.wikipedia.org/wiki/Simultaneous_multithreading"&gt;Simultaneous Multithreading (SMT)&lt;/a&gt; where threads perform independent instructions - SIMD performs a single instruction but applies it to more than one piece of data at once.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To make things more confusing, there is also &lt;a href="https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads"&gt;&amp;quot;Single Instruction, Multiple Threads&amp;quot; (SIMT)&lt;/a&gt; which is effectively how modern GPUs operate but that is a topic for a different post.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you haven't heard of SIMD before, you may have heard of it under specific implementation names such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/MMX_(instruction_set)"&gt;MMX&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/3DNow!"&gt;3DNow!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions"&gt;SSE&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/SSE2"&gt;SSE2&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/SSE3"&gt;SSE3&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/SSE4"&gt;SSE4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions"&gt;AVX&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2"&gt;AVX2&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/AVX-512"&gt;AVX-512&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a very low-level, CPU-specific optimization and is suited for algorithms that can be vectorized. Some instructions operate on 128 bits, some on 256 bits and some can go all the way to 512 bits (AVX-512).&lt;/p&gt;
&lt;p&gt;The Levenshtein Distance algorithm isn't exactly a good candidate as processing a single cell relies on the computation of the cells around it. Nevertheless, there are some areas that SIMD instructions will still help us.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To target SIMD instructions in our code, we will be making use of &lt;a href="https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in-net-core/"&gt;new APIs specifically in .NET&lt;/a&gt; though SIMD instructions are most commonly found in lower-level languages like C, C++ or hand-written assembly. In the future, &lt;a href="https://github.com/WebAssembly/simd"&gt;even WebAssembly may support SIMD instructions&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let's run through a super basic example of a SIMD vector operation, adding numbers across two vectors.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[ 1, 2, 4, 6 ]
  +  +  +  +
[ 3, 5, 3, 2 ]
  =  =  =  =
[ 4, 7, 7, 8 ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we look at the individual columns in the above vector calculation, you'll see how it operates (&lt;code&gt;1 + 3 = 4&lt;/code&gt;, &lt;code&gt;2 + 5 = 7&lt;/code&gt;, &lt;code&gt;4 + 3 = 7&lt;/code&gt;, &lt;code&gt;6 + 2 = 8&lt;/code&gt;). Assuming those numbers are all 32-bit Signed Integers, that could be the SSE2 Add instruction &lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_add_epi32&amp;amp;expand=94"&gt;&amp;quot;PADDD&amp;quot;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let's look at another example, comparing two vectors.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[  1,  2,  4,  6 ]
  ==  ==  ==  ==
[  4,  2,  4,  9 ]
   =   =   =   =
[  0, -1, -1,  0 ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What we are getting as a result here is the HIGH and LOW value where HIGH (all &lt;em&gt;bits&lt;/em&gt; are &lt;code&gt;1&lt;/code&gt;) means equal and LOW (all &lt;em&gt;bits&lt;/em&gt; are &lt;code&gt;0&lt;/code&gt;) means not equal. We are using 32-bit Signed Integers again here so an &amp;quot;all bits are &lt;code&gt;1&lt;/code&gt;&amp;quot; case means our result is &lt;code&gt;-1&lt;/code&gt;. If we used an unsigned number, the value would be the maximum value of that type instead.&lt;/p&gt;
&lt;p&gt;Using 32-bit Signed Integers, the instruction to do the vector comparison could be the SSE2 Compare Equals instruction &lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_epi32&amp;amp;expand=94,773"&gt;&amp;quot;PCMPEQD&amp;quot;&lt;/a&gt;.&lt;/p&gt;
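&lt;p&gt;To make the two examples above concrete, here is a minimal sketch using the .NET hardware intrinsics. The class name is an assumption for illustration and the code only does anything useful on a CPU with SSE2 support.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VectorBasicsSketch
{
	public static void Main()
	{
		if (!Sse2.IsSupported)
		{
			Console.WriteLine(&amp;quot;SSE2 is not available on this CPU&amp;quot;);
			return;
		}

		var left = Vector128.Create(1, 2, 4, 6);

		// PADDD: adds each pair of 32-bit integers in one instruction
		var sum = Sse2.Add(left, Vector128.Create(3, 5, 3, 2));

		// PCMPEQD: -1 (all bits set) where the pair is equal, 0 where it is not
		var equality = Sse2.CompareEqual(left, Vector128.Create(4, 2, 4, 9));

		for (var i = 0; i &lt; 4; i++)
		{
			// Elements of sum are 4, 7, 7, 8 and of equality are 0, -1, -1, 0
			Console.WriteLine(sum.GetElement(i) + &amp;quot; &amp;quot; + equality.GetElement(i));
		}
	}
}
&lt;/code&gt;&lt;/pre&gt;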
&lt;p&gt;The places where we can actually use SIMD instructions for our Levenshtein Distance optimizations are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Comparing the beginning and end characters of a string&lt;/li&gt;
&lt;li&gt;Filling our initial row with an incrementing sequence&lt;/li&gt;
&lt;li&gt;Branchless calculating the minimum of two values&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="comparing-the-beginning-and-end-characters-of-a-string"&gt;Comparing the beginning and end characters of a string&lt;/h3&gt;
&lt;p&gt;Put simply, our best comparison/trimming code for the start and end of strings was performing one-character comparisons at a time.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var startIndex = 0;
var sourceEnd = source.Length;
var targetEnd = target.Length;

while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
{
    startIndex++;
}
while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
{
    sourceEnd--;
    targetEnd--;
}

var sourceLength = sourceEnd - startIndex;
var targetLength = targetEnd - startIndex;

ReadOnlySpan&amp;lt;char&amp;gt; sourceSpan = source;
ReadOnlySpan&amp;lt;char&amp;gt; targetSpan = target;

sourceSpan = sourceSpan.Slice(startIndex, sourceLength);
targetSpan = targetSpan.Slice(startIndex, targetLength);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In .NET, the individual &lt;code&gt;char&lt;/code&gt; of a string is a &lt;code&gt;ushort&lt;/code&gt; - a 16-bit value. With this in mind, we could compare 8 characters at a time with a 128-bit vector or 16 characters with a 256-bit vector. We'll opt for the 16 character comparison which will utilise AVX2 SIMD instructions.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var charactersAvailableToTrim = Math.Min(sourceEnd, targetEnd);
if (charactersAvailableToTrim &amp;gt;= 16)
{
	fixed (char* sourcePtr = source)
	fixed (char* targetPtr = target)
	{
		var sourceUShortPtr = (ushort*)sourcePtr;
		var targetUShortPtr = (ushort*)targetPtr;

		while (charactersAvailableToTrim &amp;gt;= 16)
		{
			var sectionEquality = Avx2.MoveMask(
				Avx2.CompareEqual(
					Avx.LoadDquVector256(sourceUShortPtr + startIndex),
					Avx.LoadDquVector256(targetUShortPtr + startIndex)
				).AsByte()
			);

			if (sectionEquality != -1)
			{
				break;
			}

			startIndex += 16;
			charactersAvailableToTrim -= 16;
		}

		while (charactersAvailableToTrim &amp;gt;= 16)
		{
			var sectionEquality = Avx2.MoveMask(
				Avx2.CompareEqual(
					Avx.LoadDquVector256(sourceUShortPtr + (sourceEnd - 16)),
					Avx.LoadDquVector256(targetUShortPtr + (targetEnd - 16))
				).AsByte()
			);

			if (sectionEquality != -1)
			{
				break;
			}

			sourceEnd -= 16;
			targetEnd -= 16;
			charactersAvailableToTrim -= 16;
		}
	}
}

while (charactersAvailableToTrim &amp;gt; 0 &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
{
	charactersAvailableToTrim--;
	startIndex++;
}

while (charactersAvailableToTrim &amp;gt; 0 &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
{
	charactersAvailableToTrim--;
	sourceEnd--;
	targetEnd--;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a lot of things going on in this block of code but let's focus on the most important part for us - the SIMD instructions.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;while (charactersAvailableToTrim &amp;gt;= 16)
{
	var sectionEquality = Avx2.MoveMask(
		Avx2.CompareEqual(
			Avx.LoadDquVector256(sourceUShortPtr + startIndex),
			Avx.LoadDquVector256(targetUShortPtr + startIndex)
		).AsByte()
	);

	if (sectionEquality != -1)
	{
		break;
	}
	startIndex += 16;
	charactersAvailableToTrim -= 16;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our SIMD instructions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Avx.LoadDquVector256&lt;/code&gt;: Loads a 256-bit vector from a pointer (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=vlddqu&amp;amp;expand=3296"&gt;&amp;quot;VLDDQU&amp;quot;&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Avx2.CompareEqual&lt;/code&gt;: Our equality comparison of two 256-bit vectors (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=vpcmpeqw&amp;amp;expand=3296,766,766"&gt;&amp;quot;VPCMPEQW&amp;quot;&lt;/a&gt;)&lt;/li&gt;
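&lt;li&gt;&lt;code&gt;Avx2.MoveMask&lt;/code&gt;: Creates a 32-bit mask from the most significant bit of each byte in the vector, which is why we call &lt;code&gt;AsByte()&lt;/code&gt; first and why a result of &lt;code&gt;-1&lt;/code&gt; (all bits set) means every character matched (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3296,766,766&amp;amp;text=vpmovmskb"&gt;&amp;quot;VPMOVMSKB&amp;quot;&lt;/a&gt;)&lt;/li&gt;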
&lt;/ul&gt;
&lt;p&gt;Basically we are saying &amp;quot;Unless all characters of this 16-character chunk are equal, break from the loop&amp;quot;.&lt;/p&gt;
&lt;p&gt;Because we are operating in 16-character blocks, we will still fall back to individual character comparisons so we can have the smallest strings to compare in the main Levenshtein Distance calculating code.&lt;/p&gt;
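&lt;p&gt;The snippets above assume the CPU supports AVX2. As a hedged sketch of how the prefix half could be wrapped with a runtime check and the scalar fallback, here is a hypothetical &lt;code&gt;CommonPrefixLength&lt;/code&gt; helper (the class and method names are mine, and the project needs &lt;code&gt;AllowUnsafeBlocks&lt;/code&gt; enabled):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PrefixTrimSketch
{
	// Hypothetical helper: counts the leading characters shared by both inputs.
	public static unsafe int CommonPrefixLength(ReadOnlySpan&lt;char&gt; source, ReadOnlySpan&lt;char&gt; target)
	{
		var startIndex = 0;
		var charactersAvailableToTrim = Math.Min(source.Length, target.Length);

		// Only take the vector path when the CPU actually has AVX2
		if (Avx2.IsSupported &amp;amp;&amp;amp; charactersAvailableToTrim &amp;gt;= 16)
		{
			fixed (char* sourcePtr = source)
			fixed (char* targetPtr = target)
			{
				var sourceUShortPtr = (ushort*)sourcePtr;
				var targetUShortPtr = (ushort*)targetPtr;

				while (charactersAvailableToTrim &amp;gt;= 16)
				{
					var sectionEquality = Avx2.MoveMask(
						Avx2.CompareEqual(
							Avx.LoadDquVector256(sourceUShortPtr + startIndex),
							Avx.LoadDquVector256(targetUShortPtr + startIndex)
						).AsByte()
					);

					// Anything other than &amp;quot;all bits set&amp;quot; means a mismatch in this chunk
					if (sectionEquality != -1)
					{
						break;
					}

					startIndex += 16;
					charactersAvailableToTrim -= 16;
				}
			}
		}

		// Scalar fallback: finishes a partial chunk or handles CPUs without AVX2
		while (charactersAvailableToTrim &amp;gt; 0 &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
		{
			charactersAvailableToTrim--;
			startIndex++;
		}

		return startIndex;
	}
}
&lt;/code&gt;&lt;/pre&gt;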
&lt;h3 id="filling-our-initial-row-with-an-incrementing-sequence"&gt;Filling our initial row with an incrementing sequence&lt;/h3&gt;
&lt;p&gt;In Part 2, we showed how there is a simple loop that fills in the initial row of data.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= target.Length; ++i)
{
	previousRow[i] = i;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But we can go faster, much faster...&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var columnIndex = 0;
var columnsRemaining = previousRow.Length;

fixed (int* previousRowPtr = previousRow)
{
	var lastVector256 = Vector256.Create(0, 1, 2, 3, 4, 5, 6, 7);
	var shiftVector256 = Vector256.Create(8);

	while (columnsRemaining &amp;gt;= 8)
	{
		columnsRemaining -= 8;
		Avx.Store(previousRowPtr + columnIndex, lastVector256);
		lastVector256 = Avx2.Add(lastVector256, shiftVector256);
		columnIndex += 8;
	}

	// Matches the vector path above: each slot is filled with its own index
	if (columnsRemaining &gt; 4)
	{
		columnsRemaining -= 4;
		previousRowPtr[columnIndex] = columnIndex++;
		previousRowPtr[columnIndex] = columnIndex++;
		previousRowPtr[columnIndex] = columnIndex++;
		previousRowPtr[columnIndex] = columnIndex++;
	}

	while (columnsRemaining &gt; 0)
	{
		columnsRemaining--;
		previousRowPtr[columnIndex] = columnIndex++;
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our implementation is now partially using SIMD and is partially unrolled. Because our &lt;code&gt;previousRow&lt;/code&gt; is an array of 32-bit Integers, we can only fill 8 values at a time with a 256-bit vector.&lt;/p&gt;
&lt;p&gt;Focusing on the SIMD instructions:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var lastVector256 = Vector256.Create(0, 1, 2, 3, 4, 5, 6, 7);
var shiftVector256 = Vector256.Create(8);

while (columnsRemaining &amp;gt;= 8)
{
	columnsRemaining -= 8;
	Avx.Store(previousRowPtr + columnIndex, lastVector256);
	lastVector256 = Avx2.Add(lastVector256, shiftVector256);
	columnIndex += 8;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We create an initial vector of the first 8 values (&lt;code&gt;lastVector256&lt;/code&gt;), we get a vector we want to increment by (&lt;code&gt;shiftVector256&lt;/code&gt;) and we simply work up the remaining columns of &lt;code&gt;previousRow&lt;/code&gt; 8 items at a time.&lt;/p&gt;
&lt;p&gt;Our SIMD instructions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Avx.Store&lt;/code&gt;: Save a 256-bit vector to the specified pointer location, with no alignment requirement (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3296,766,766,5596,5596,5596&amp;amp;text=_mm256_storeu_si256"&gt;&amp;quot;VMOVDQU&amp;quot;&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Avx2.Add&lt;/code&gt;: Adds two 256-bit vectors together, returning the result (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3296,766,766,5596,5596,5596,97,97,97&amp;amp;text=vpaddd&amp;amp;techs=AVX2"&gt;&amp;quot;VPADDD&amp;quot;&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="branchless-calculating-the-minimum-of-two-values"&gt;Branchless calculating the minimum of two values&lt;/h3&gt;
&lt;p&gt;In Part 3, we covered cutting out branches by re-organizing code or unrolling loops. We can actually take it slightly further using SIMD instructions by abusing/misusing how a SIMD &lt;code&gt;Min&lt;/code&gt; comparison works.&lt;/p&gt;
&lt;p&gt;Here is our last version of the calculating code:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;localCost = previousDiagonal;
deletionCost = previousRow[j];
if (sourceChar != target[j - 1])
{
    localCost = Math.Min(previousColumn, localCost);
    localCost = Math.Min(deletionCost, localCost);
    localCost++;
}
previousColumn = localCost;
previousRow[j++] = localCost;
previousDiagonal = deletionCost;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can't eliminate the if-statement - I've tried, multiple times - but we can eliminate the branches that would occur as part of &lt;code&gt;Math.Min&lt;/code&gt;. To be clear, &lt;code&gt;Math.Min&lt;/code&gt; isn't a slow function and, by itself, it should perform faster than what we are about to do. However, when there are two in a row, we can start to benefit from our optimization.&lt;/p&gt;
&lt;p&gt;To gain the most performance out of this, we don't want to jump to and from vectors - the more operations we can do while it is a vector, the better it will be for our performance. This is the latency-vs-throughput potential with SIMD instructions.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;localCostVector = previousDiagonalVector;
deletionCostVector = Vector128.Create(previousRowPtr[columnIndex]);
if (sourcePrevChar != targetPtr[columnIndex])
{
	localCostVector = Sse2.Add(
		Sse41.Min(
			Sse41.Min(
				previousColumnVector,
				localCostVector
			),
			deletionCostVector
		),
		allOnesVector
	);
}
previousColumnVector = localCostVector;
previousRowPtr[columnIndex++] = localCostVector.GetElement(0);
previousDiagonalVector = deletionCostVector;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While most of the variables are the same with just &amp;quot;Vector&amp;quot; at the end, there are some differences to know about.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;deletionCostVector = Vector128.Create(previousRowPtr[columnIndex]);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a 128-bit vector of the value at that specific pointer. That is to say, it is only a single value that is in the vector 4 times.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;localCostVector = Sse2.Add(
	Sse41.Min(
		Sse41.Min(
			previousColumnVector,
			localCostVector
		),
		deletionCostVector
	),
	allOnesVector
);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like the original code, we are doing two &lt;code&gt;Math.Min&lt;/code&gt; equivalent operations and then adding &lt;code&gt;1&lt;/code&gt; to the result. The vector &lt;code&gt;allOnesVector&lt;/code&gt; is aptly named because the vector only contains the number &lt;code&gt;1&lt;/code&gt; in all positions.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;previousRowPtr[columnIndex++] = localCostVector.GetElement(0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, we end with taking only the first element from the vector and storing it at our specific column.&lt;/p&gt;
&lt;p&gt;You might realise now why I said we were abusing/misusing SIMD here - we don't actually care about how big the vector is as we simply don't use anything more than one item in the vector. This is because we are not taking advantage of the vector, we are taking advantage of the fact that &lt;code&gt;Sse41.Min&lt;/code&gt; makes our code effectively branchless in the minimum value comparison. With the constraint that the Levenshtein Distance calculation relies on previously calculated cells, I can't see a situation where you can make use of the full vector to speed up the calculations further.&lt;/p&gt;
&lt;p&gt;Our SIMD instructions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Sse41.Min&lt;/code&gt;: Gets the element-wise minimum of two vectors (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3296,766,766,5596,5596,5596,97,97,97,3726,3726,3726&amp;amp;text=pminud&amp;amp;techs=SSE4_1"&gt;&amp;quot;PMINUD&amp;quot;&lt;/a&gt; for unsigned values; with signed &lt;code&gt;int&lt;/code&gt; vectors the equivalent instruction is &amp;quot;PMINSD&amp;quot;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Sse2.Add&lt;/code&gt;: Adds the values across the vectors (&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3296,766,766,5596,5596,5596,97,97,97,3726,3726,3726,94,94&amp;amp;techs=SSE2&amp;amp;text=paddd"&gt;&amp;quot;PADDD&amp;quot;&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
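&lt;p&gt;As a self-contained illustration of the same trick, here is a minimal sketch of a hypothetical &lt;code&gt;MinOfThreePlusOne&lt;/code&gt; helper with a scalar fallback. The names are assumptions for illustration. Note that creating the vectors from scalars on every call, as this does, gives away most of the benefit - the code above keeps its values in vectors between iterations for exactly that reason.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class BranchlessMinSketch
{
	// Hypothetical helper: returns min(previousColumn, previousDiagonal, deletionCost) + 1
	public static int MinOfThreePlusOne(int previousColumn, int previousDiagonal, int deletionCost)
	{
		if (!Sse41.IsSupported)
		{
			// Fallback for CPUs without SSE4.1
			return Math.Min(Math.Min(previousColumn, previousDiagonal), deletionCost) + 1;
		}

		// Each vector holds the same value in all four 32-bit lanes
		var result = Sse2.Add(
			Sse41.Min(
				Sse41.Min(
					Vector128.Create(previousColumn),
					Vector128.Create(previousDiagonal)
				),
				Vector128.Create(deletionCost)
			),
			Vector128.Create(1)
		);

		// Only the first lane is ever used
		return result.GetElement(0);
	}
}
&lt;/code&gt;&lt;/pre&gt;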
&lt;h3 id="caveats-of-simd-instructions"&gt;Caveats of SIMD Instructions&lt;/h3&gt;
&lt;p&gt;Besides the data and algorithm needing to support vectorization, SIMD isn't a magic bullet. When digging this deep for performance, you'll be looking at the latency and throughput of instructions.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The latency and throughput of instructions will differ per CPU model.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The best resource that I have found for the latency and throughput of instructions on specific CPU models is the set of &lt;a href="https://www.agner.org/optimize/#manuals"&gt;optimization manuals by Agner&lt;/a&gt;. That said, Intel does have some information about them in their &lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"&gt;Intrinsics Guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It is important for you to benchmark your code to see if it is an appropriate optimization for your own code and the machines you are targeting.&lt;/p&gt;
&lt;h2 id="further-reading-using-simd-for-sorting"&gt;Further Reading: Using SIMD for Sorting&lt;/h2&gt;
&lt;p&gt;In this post, I am really just scratching the surface of what SIMD instructions can do. Dan Shechter (aka. &lt;a href="https://twitter.com/damageboy"&gt;damageboy&lt;/a&gt;) has been doing some amazing work building a QuickSort implementation using AVX2 instructions. If my post has you curious, definitely go through &lt;a href="https://bits.houmus.org/2020-01-28/this-goes-to-eleven-pt1"&gt;his series&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="helpful-links"&gt;Helpful Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"&gt;Intel's Intrinsics Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://db.in.tum.de/%7Efinis/x86-intrin-cheatsheet-v2.2.pdf"&gt;x86 Intrinsics Cheatsheet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.agner.org/optimize/#manuals"&gt;Agner's amazing optimization manuals&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Levenshtein Distance (Part 3: Optimize Everything!)</title>
			<link>https://turnerj.com/blog/levenshtein-distance-part-3-optimize-everything</link>
			<description>Less Allocations &amp; Smarter Processing</description>
			<enclosure url="https://turnerj.com/blog/images/social/levenshtein-distance-part-3-optimize-everything.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/levenshtein-distance-part-3-optimize-everything</guid>
			<pubDate>Wed, 04 Mar 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;In &lt;a href="https://turnerj.com/blog/levenshtein-distance-part-1-what-is-it"&gt;Part 1&lt;/a&gt; we went through what the Levenshtein Distance is and in &lt;a href="https://turnerj.com/blog/levenshtein-distance-part-2-gotta-go-fast"&gt;Part 2&lt;/a&gt; we covered a few major optimizations for memory and performance. In Part 3 (this post) we will be taking things up to 11 and trying to squeeze every bit of performance out of our code.&lt;/p&gt;
&lt;p&gt;While there are some aspects of this post that are language agnostic, this post will talk about a number of C# specific optimizations - there may be equivalent optimizations in your programming language of choice.&lt;/p&gt;
&lt;h2 id="being-smarter-with-data"&gt;Being Smarter with Data&lt;/h2&gt;
&lt;p&gt;In Part 2, one of our best versions had an inner loop that looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;

    for (var j = 1; j &amp;lt;= target.Length; ++j)
    {
        var insertOrDelete = Math.Min(previousColumn, previousRow[j]) + 1;
        var edit = previousDiagonal + (source[i - 1] == target[j - 1] ? 0 : 1);

        previousColumn = Math.Min(insertOrDelete, edit);
        previousDiagonal = previousRow[j];
        previousRow[j] = previousColumn;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you look carefully at how we are accessing some of this data, we are doing some relatively repetitive actions - specifically, how we access the "source" character for the comparison.&lt;/p&gt;
&lt;p&gt;Each iteration of the inner-loop, we are looking up &lt;code&gt;source[i - 1]&lt;/code&gt; which we can actually cache in the body of the outer-loop like so:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;
    var sourceChar = source[i - 1];

    for (var j = 1; j &amp;lt;= target.Length; ++j)
    {
        var insertOrDelete = Math.Min(previousColumn, previousRow[j]) + 1;
        var edit = previousDiagonal + (sourceChar == target[j - 1] ? 0 : 1);

        previousColumn = Math.Min(insertOrDelete, edit);
        previousDiagonal = previousRow[j];
        previousRow[j] = previousColumn;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While it might not be the largest performance boost, we are in the territory where every little performance boost helps.&lt;/p&gt;
&lt;p&gt;There is another way we can be smarter here by analysing that inner-most loop logic. We are always doing 2x &lt;code&gt;Math.Min&lt;/code&gt; calls and always adding our source-target comparison to our &lt;code&gt;previousDiagonal&lt;/code&gt; value. These might not be the slowest operations you can run but when you run them thousands of times, it does add up.&lt;/p&gt;
&lt;p&gt;If you shift the code around just right, we can actually cut down the number of operations in one path of our code.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;
    var sourceChar = source[i - 1];

    for (var j = 1; j &amp;lt;= target.Length; ++j)
    {
        if (sourceChar == target[j - 1])
        {
            previousColumn = previousDiagonal;
        }
        else
        {
            previousColumn = Math.Min(previousColumn, previousDiagonal);
            previousColumn = Math.Min(previousColumn, previousRow[j]);
            previousColumn++;
        }

        previousDiagonal = previousRow[j];
        previousRow[j] = previousColumn;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This change plays on the fact that when the two characters are equal, the substitution cost (aka. &lt;code&gt;previousDiagonal&lt;/code&gt;) will be the lowest cost of the three values to compare.&lt;/p&gt;
&lt;p&gt;Going one step further, you might notice that one path actually has two calls to &lt;code&gt;previousRow[j]&lt;/code&gt; - we can eliminate this too with more local variables. With a bit of refactoring, it could look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;
    var sourceChar = source[i - 1];

    for (var j = 1; j &amp;lt;= target.Length; ++j)
    {
        var localCost = previousDiagonal;
        var deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j] = localCost;
        previousDiagonal = deletionCost;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These tweaks combined will amount to a decent increase in performance but we aren't done yet...&lt;/p&gt;
&lt;h2 id="span-ing-the-memory"&gt;&lt;code&gt;Span&lt;/code&gt;-ing the Memory&lt;/h2&gt;
&lt;p&gt;Memory allocations - they aren't all bad BUT if we can eliminate some, that will help us. We are dealing with strings, potentially very big strings, and depending how we handle them we can allocate a lot of memory.&lt;/p&gt;
&lt;p&gt;In Part 2, I showed a way we can trim the strings that have equal prefixes and suffixes to give us a performance boost. This code, while it works, actually isn't the best.&lt;/p&gt;
&lt;p&gt;As a reminder, here is the piece of code I shared:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var startIndex = 0;
var sourceEnd = source.Length;
var targetEnd = target.Length;

while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
{
    startIndex++;
}
while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
{
    sourceEnd--;
    targetEnd--;
}

var sourceLength = sourceEnd - startIndex;
var targetLength = targetEnd - startIndex;

source = source.Substring(startIndex, sourceLength);
target = target.Substring(startIndex, targetLength);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our biggest problem here is the last two lines and their &lt;code&gt;Substring&lt;/code&gt; call. In C#, getting a substring of another string performs another allocation equal to the length of the new string. So if we had a 500 character string being substring'd to 200 characters, we would be allocating 200 characters worth of string.&lt;/p&gt;
&lt;p&gt;This might not be bad individually but we can do better - we can do a ZERO allocation substring by using &lt;a href="https://devblogs.microsoft.com/dotnet/welcome-to-c-7-2-and-span/"&gt;a (relatively) new feature of C# called &lt;code&gt;Span&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The most concise description I can give is that &lt;code&gt;Span&lt;/code&gt; and its companion type &lt;code&gt;ReadOnlySpan&lt;/code&gt; allow access to a block of memory. This block of memory might be an array, it might be a pointer or it might be a string. Accessing data in a &lt;code&gt;Span&lt;/code&gt; is the same as accessing data in a normal array like &lt;code&gt;mySpan[42]&lt;/code&gt;. While wrapping these various types of memory is extremely useful for safe access to data, it also has one killer function - &lt;code&gt;Slice&lt;/code&gt; - giving us a slice of the memory without actually allocating/copying it.&lt;/p&gt;
&lt;p&gt;To use it in our example with a string, we need to cast it to &lt;code&gt;ReadOnlySpan&amp;lt;char&amp;gt;&lt;/code&gt; (We must use &lt;code&gt;ReadOnlySpan&lt;/code&gt; specifically because strings are immutable). After that, we simply replace our &lt;code&gt;Substring&lt;/code&gt; calls with their &lt;code&gt;Slice&lt;/code&gt; equivalent.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var startIndex = 0;
var sourceEnd = source.Length;
var targetEnd = target.Length;

while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
{
    startIndex++;
}
while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
{
    sourceEnd--;
    targetEnd--;
}

var sourceLength = sourceEnd - startIndex;
var targetLength = targetEnd - startIndex;

ReadOnlySpan&amp;lt;char&amp;gt; sourceSpan = source;
ReadOnlySpan&amp;lt;char&amp;gt; targetSpan = target;

sourceSpan = sourceSpan.Slice(startIndex, sourceLength);
targetSpan = targetSpan.Slice(startIndex, targetLength);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One caveat is that from now on, we can only deal with methods that accept &lt;code&gt;ReadOnlySpan&amp;lt;char&amp;gt;&lt;/code&gt; so if a method only accepts &lt;code&gt;string&lt;/code&gt; type, we would need to re-allocate our span back to a full string.&lt;/p&gt;
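&lt;p&gt;To illustrate that trade-off, here is a minimal sketch (the class name is just for illustration) showing that slicing is only a view over the original string, while going back to a &lt;code&gt;string&lt;/code&gt; with &lt;code&gt;ToString&lt;/code&gt; is where the allocation happens:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class SpanSliceSketch
{
    public static void Main()
    {
        var source = "Saturday";

        // AsSpan and Slice only create a view over the existing string - no characters are copied
        ReadOnlySpan&lt;char&gt; sourceSpan = source.AsSpan();
        ReadOnlySpan&lt;char&gt; trimmed = sourceSpan.Slice(1, 3);

        Console.WriteLine(trimmed.Length); // 3
        Console.WriteLine(trimmed[0]);     // a

        // ToString allocates a new 3-character string for APIs that only accept string
        string copy = trimmed.ToString();
        Console.WriteLine(copy);           // atu
    }
}
&lt;/code&gt;&lt;/pre&gt;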
&lt;p&gt;This can be one of the few downsides with these types - there are many APIs that simply don't have overloads to accept &lt;code&gt;Span&lt;/code&gt; etc. That said, the .NET team have done a lot of work adding new overloads to accept &lt;code&gt;Span&lt;/code&gt; or &lt;code&gt;ReadOnlySpan&lt;/code&gt; across the entire framework.&lt;/p&gt;
&lt;p&gt;Even with that in mind, &lt;code&gt;Span&lt;/code&gt; has other limitations, like being stack-only: you can't store a &lt;code&gt;Span&lt;/code&gt; on the heap (eg. as a property in a class). With what we are doing above though, &lt;code&gt;Span&lt;/code&gt; works out perfectly.&lt;/p&gt;
&lt;p&gt;For more information about &lt;code&gt;Span&lt;/code&gt;, have a read of &lt;a href="https://adamsitnik.com/Span/"&gt;Adam Sitnik's blog post about &lt;code&gt;Span&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="f-pooling-around-with-arrays"&gt;&lt;del&gt;F&lt;/del&gt; &lt;strong&gt;P&lt;/strong&gt;ooling around with Arrays&lt;/h2&gt;
&lt;p&gt;Strings are not our only source of allocations in our code. If we look back to how we instantiate our &lt;code&gt;previousRow&lt;/code&gt; array, that is itself an allocation.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var previousRow = new int[target.Length + 1];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our problem is that we &lt;em&gt;need&lt;/em&gt; this array but creating new arrays is an allocation - how do we remove this allocation? With &lt;a href="https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1?view=netcore-3.1"&gt;&lt;code&gt;ArrayPool&lt;/code&gt;&lt;/a&gt; of course!&lt;/p&gt;
&lt;p&gt;Another one of the gems that have been added to .NET is sharing/renting arrays. We ask for an array of X size and we get an array at least that big. After we are done, we just return the array back to the pool for it to be used elsewhere.&lt;/p&gt;
&lt;p&gt;While there is a lot of magic that goes on behind the scenes to make that work and scale with various size arrays, for the size arrays we can reasonably deal with, &lt;code&gt;ArrayPool&lt;/code&gt; is a perfect fit.&lt;/p&gt;
&lt;p&gt;So how do we use this in our code? Just two places need to change - the start and end of our code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Start of Code&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var arrayPool = ArrayPool&amp;lt;int&amp;gt;.Shared;
var previousRow = arrayPool.Rent(target.Length + 1);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;End of Code&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var result = previousRow[targetLength];
arrayPool.Return(previousRow);
return result;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That's it - our array allocation is now gone thanks to the hard work of the .NET team.&lt;/p&gt;
&lt;p&gt;There are some caveats with &lt;code&gt;ArrayPool&lt;/code&gt;: by default a rented array isn't cleared so it may still hold old data, the array may be longer than what you asked for, and the pool has a default maximum array length of 1,048,576 elements.&lt;/p&gt;
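&lt;p&gt;To make those caveats a bit more concrete, here is a minimal sketch of renting and returning around the calculation. The &lt;code&gt;PooledRowExample&lt;/code&gt; class and &lt;code&gt;WithPooledRow&lt;/code&gt; method are names made up for illustration, and the &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;finally&lt;/code&gt; is an extra safety net that the snippets above don't include.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Buffers;

public static class PooledRowExample
{
    // Hypothetical method - only the renting pattern matters here.
    public static int WithPooledRow(string target)
    {
        var arrayPool = ArrayPool&amp;lt;int&amp;gt;.Shared;
        var rowLength = target.Length + 1;

        // Rent may hand back a longer array than we asked for.
        var previousRow = arrayPool.Rent(rowLength);
        try
        {
            // The rented array may hold old data from a previous renter,
            // so we initialise the part we actually use.
            for (var i = 0; i &amp;lt; rowLength; ++i)
            {
                previousRow[i] = i;
            }

            // ... the rest of the calculation would go here ...

            // Index by our own length, never by previousRow.Length.
            return previousRow[target.Length];
        }
        finally
        {
            // Hand the array back even if something throws.
            arrayPool.Return(previousRow);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the placeholder body above, &lt;code&gt;WithPooledRow("Sunday")&lt;/code&gt; just returns &lt;code&gt;6&lt;/code&gt; (the last value of the initial row). The point is that every read and write stays within &lt;code&gt;target.Length + 1&lt;/code&gt; items, no matter how big the rented array really is.&lt;/p&gt;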
&lt;p&gt;For more information about &lt;code&gt;ArrayPool&lt;/code&gt;, Adam Sitnik did &lt;a href="https://adamsitnik.com/Array-Pool/"&gt;another great post&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="dabbling-in-parallel-processing"&gt;Dabbling in Parallel Processing&lt;/h2&gt;
&lt;p&gt;The Levenshtein Distance algorithm isn't exactly parallel friendly. In my effort to write the fastest Levenshtein Distance implementation, I did find a brute force way to make it happen. I don't want to get your hopes up - while the code can be very fast, it is riddled with race conditions because &lt;em&gt;threading is hard&lt;/em&gt; - this means it won't always give the correct answer, even when run repeatedly on the same strings. If you have the time and patience to implement it properly, you'll certainly have one of the fastest implementations around.&lt;/p&gt;
&lt;p&gt;Anyway, let's dig into this!&lt;/p&gt;
&lt;p&gt;The theory goes like this - if we divide a virtual matrix by the number of cores available on the machine, we could "stagger" our calculations. Visualising this as a matrix will give you some idea of how it would work and why it is hard.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-3-threading-empty-matrix.png" alt="Levenshtein Distance matrix of &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; divided into two sections for two threads"&gt;&lt;/p&gt;
&lt;p&gt;The two colours of the matrix above represent the area our threads will calculate and write to. The problem lies in reading data - the section on the right (blue) depends on the column with the letter "u", which is written to by the left side (pink). While it is certainly possible to carefully hand off from one thread to another, this is likely going to be the hardest part of the implementation to solve.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-3-threading-partial-matrix.png" alt="Levenshtein Distance matrix of &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; divided into two sections for two threads - partially calculated in each thread"&gt;&lt;/p&gt;
&lt;p&gt;We can see above what it might look like during a parallel calculation. The left thread (pink) only needs to perform scans on the first 4 characters of "Saturday" before moving onto the next line. The thread on the right (blue) can only start a row once the left thread (pink) has completed that row.&lt;/p&gt;
&lt;p&gt;This doesn't seem all bad - the left thread (pink) could run dozens of rows ahead of the right thread (blue) and we would still calculate everything correctly. In Part 2 though, we covered shrinking a full matrix down to a single row - how will that work for our parallel-ness? If we want to adopt the same optimization, it is going to make our threading a lot more complicated.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-3-threading-row-race-1.png" alt="Single row of calculations for the Levenshtein Distance with threading sections"&gt;&lt;/p&gt;
&lt;p&gt;Take the row above (with our same threading colours) as an example of what our calculation row would look like. The left thread (pink) can proceed because the right thread (blue) has written its first value for that row.&lt;/p&gt;
&lt;p&gt;Let's skip forward and say the left thread (pink) manages to fill in all its values for the next row while the right thread (blue) manages to fill one value.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-3-threading-row-race-2.png" alt="Single row of calculations for the Levenshtein Distance with threading sections and the left thread being a row ahead."&gt;&lt;/p&gt;
&lt;p&gt;Now we are in a bit of a pickle - if the left thread (pink) manages to write another row before the right thread (blue) reads the "shared" value, it will miscalculate.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-3-threading-row-race-3.png" alt="Single row of calculations for the Levenshtein Distance with threading sections and the left thread being two rows ahead."&gt;&lt;/p&gt;
&lt;p&gt;Oh no, now that shared value is &lt;code&gt;3&lt;/code&gt; instead of the expected &lt;code&gt;2&lt;/code&gt; - now our right thread (blue) will have the wrong insert cost when it starts the next row.&lt;/p&gt;
&lt;p&gt;Expand this problem across bigger string comparisons with more threads and the chances of hitting this condition go up unless you have measures in place to trigger the right threads at the right time - not impossible but not trivial either.&lt;/p&gt;
&lt;p&gt;Even with the performance bonus of a successful parallel implementation, you will be hit with performance penalties just keeping track of which rows the threads are in, not to mention the cost of starting and stopping threads. You could (like me) just stuff threads in a &lt;code&gt;while(true)&lt;/code&gt; loop but let's be honest, that is a bad idea.&lt;/p&gt;
&lt;p&gt;With my (broken) implementation, for small strings it was up to 12x slower. For medium length strings (500 characters), it performed about the same as a non-parallel version. Once the string length was around 8000 characters, it was performing up to 3x faster (with 8 threads) than a relatively well optimized non-parallel version.&lt;/p&gt;
&lt;p&gt;Keeping in mind my version is flawed, that is still a significant speed boost. With more threads, bigger performance gains could likely be made.&lt;/p&gt;
&lt;p&gt;In conclusion - I thought it would be interesting to share as a proof-of-concept but in reality, a good implementation of parallelism in Levenshtein Distance is not going to be a fun time.&lt;/p&gt;
&lt;p&gt;Unless you have absolutely HUGE strings, I wouldn't even bother going this direction.&lt;/p&gt;
&lt;h2 id="the-enemy-of-processing-branch-misprediction"&gt;The Enemy of Processing: Branch Misprediction&lt;/h2&gt;
&lt;p&gt;Processors are fast thanks to a variety of tricks, one of them being &lt;a href="https://en.wikipedia.org/wiki/Branch_predictor"&gt;branch prediction&lt;/a&gt;. Put simply, it is the idea that the processor guesses whether a conditional jump will be taken or not. With its guess, it will start fetching, decoding and potentially even &lt;a href="https://en.wikipedia.org/wiki/Speculative_execution"&gt;speculatively executing&lt;/a&gt; the instructions on the predicted path.&lt;/p&gt;
&lt;p&gt;It works wonders when it guesses right but when it guesses wrong, a mispredict, the processor may need to throw away that speculative work and re-execute the code correctly.&lt;/p&gt;
&lt;p&gt;All of this is important to consider given our nested for-loops - every iteration performs a conditional jump (our comparison against the length of the source or target string). With two 1,000 character strings to compare, our inner for-loop would iterate 1,000,000 times. Two 8,000 character strings would iterate 64,000,000 times.&lt;/p&gt;
&lt;p&gt;Earlier in this post, we covered a clever optimization to avoid our &lt;code&gt;Math.Min&lt;/code&gt; calls - part of the benefit there wasn't avoiding &lt;em&gt;just another instruction&lt;/em&gt;, it was avoiding a conditional jump instruction.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;
    var sourceChar = source[i - 1];

    for (var j = 1; j &amp;lt;= target.Length; ++j)
    {
        var localCost = previousDiagonal;
        var deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            // The conditional jumps associated with Math.Min only execute
            // if the source character is not equal to the target character.
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j] = localCost;
        previousDiagonal = deletionCost;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even with this in mind, we still have a lot of conditional jumps going on in our code. To take this to the next level, we will want to think about &lt;a href="https://en.wikipedia.org/wiki/Loop_unrolling"&gt;loop unrolling&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The basic premise is, cut down on the number of instructions for each iteration of the loop. In our case, we will use this to avoid our "j &amp;lt;= target.Length;" cost for every iteration.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
    var previousDiagonal = previousRow[0];
    var previousColumn = previousRow[0]++;
    var sourceChar = source[i - 1];

    var j = 1;
    var columnsRemaining = target.Length;

    int localCost;
    int deletionCost;

    while (columnsRemaining &amp;gt;= 8)
    {
        columnsRemaining -= 8;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;
    }

    if (columnsRemaining &amp;gt;= 4)
    {
        columnsRemaining -= 4;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;
    }

    while (columnsRemaining &amp;gt; 0)
    {
        columnsRemaining--;

        localCost = previousDiagonal;
        deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is quite a bit to unpack for that code but one of the things you might be able to tell is how long some basic unrolling code might be.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;while (columnsRemaining &amp;gt;= 8)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our first of 3 processing chunks - we are unrolled to 8 columns of calculations at a time. This means for every iteration of this loop, we check the loop condition once instead of 8 times, removing 7 conditional jumps that needed handling.&lt;/p&gt;
&lt;p&gt;Once this has processed all it can, we have between 0 and 7 columns remaining for processing.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;if (columnsRemaining &amp;gt;= 4)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In our second of 3 processing chunks, we attempt to unroll 4 columns if there are enough columns available to do so. We are removing 3 conditional jumps in this block.&lt;/p&gt;
&lt;p&gt;Once this has processed all it can, we have between 0 and 3 columns remaining for processing.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;while (columnsRemaining &amp;gt; 0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In our final processing chunk, there is no unrolling - we process each item individually. At worst, we are only needing to loop through 3 columns.&lt;/p&gt;
&lt;p&gt;The actual calculation code in each chunk is identical, just replicated the number of times needed for the given chunk.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;localCost = previousDiagonal;
deletionCost = previousRow[j];
if (sourceChar != target[j - 1])
{
    localCost = Math.Min(previousColumn, localCost);
    localCost = Math.Min(deletionCost, localCost);
    localCost++;
}
previousColumn = localCost;
previousRow[j++] = localCost;
previousDiagonal = deletionCost;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This technique above, going from 8 to 4 to 1, was inspired by how the .NET runtime does this for &lt;a href="https://github.com/dotnet/runtime/blob/4f9ae42d861fcb4be2fcd5d3d55d5f227d30e723/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.T.cs#L62-L118"&gt;their &lt;code&gt;SpanHelpers&lt;/code&gt; code&lt;/a&gt;. While there likely was some significance to the numbers chosen for optimizing cache lines relative to the size of the code unrolled, our code likely doesn't benefit to the same extent.&lt;/p&gt;
&lt;p&gt;One of the biggest drawbacks to applying loop unrolling like this is the dramatic increase in binary size for the same functionality, not to mention the maintenance overhead having all of those blocks repeated. A single mistake in one could break the whole calculation.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Don't even think about pushing that to its own function as function calls have overheads too unless the compiler will inline it for you!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;That said, loop unrolling still provided us a net benefit if all we cared about was raw performance.&lt;/p&gt;
&lt;p&gt;You could go further and unroll the outer loop some amount too. If done right, you would be able to minimise the number of lookups of characters in the target string. That however I will leave as an exercise to the reader.&lt;/p&gt;
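&lt;p&gt;Coming back to the point about function calls: if the maintenance burden of the repeated blocks worries you, one option is to ask the JIT to inline a helper for you. Below is a rough sketch of that idea - the &lt;code&gt;UnrollHelpers&lt;/code&gt; class and &lt;code&gt;CalculateColumn&lt;/code&gt; method are hypothetical names, and &lt;code&gt;AggressiveInlining&lt;/code&gt; is only a request, so the JIT may still decline it.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Runtime.CompilerServices;

public static class UnrollHelpers
{
    // Hypothetical helper: calculates a single column and advances j.
    // The body is the same block that is repeated in the unrolled loop.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void CalculateColumn(
        char sourceChar,
        string target,
        int[] previousRow,
        ref int j,
        ref int previousColumn,
        ref int previousDiagonal)
    {
        var localCost = previousDiagonal;
        var deletionCost = previousRow[j];
        if (sourceChar != target[j - 1])
        {
            localCost = Math.Min(previousColumn, localCost);
            localCost = Math.Min(deletionCost, localCost);
            localCost++;
        }
        previousColumn = localCost;
        previousRow[j++] = localCost;
        previousDiagonal = deletionCost;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each repeated block would then become a single &lt;code&gt;CalculateColumn(...)&lt;/code&gt; call. Whether this matches the hand-unrolled version depends on what the JIT actually does with it, so treat it as something to benchmark rather than a guaranteed win.&lt;/p&gt;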
&lt;h2 id="bonus-using-simd-instructions"&gt;Bonus: Using SIMD Instructions&lt;/h2&gt;
&lt;p&gt;Due in part to the extraordinary length of this post, I've split the longest part about SIMD instructions to a separate post. If you want to dive into how vectorizing CPU instructions can help us perform even faster - check it out &lt;a href="https://turnerj.com/blog/levenshtein-distance-with-simd"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;We've extracted a lot of performance out of an algorithm which isn't very performant while dramatically decreasing our memory usage.&lt;/p&gt;
&lt;p&gt;This whole blog series about Levenshtein Distance came up purely because I wanted to build a fast and memory efficient implementation.
Besides parallel support, I have actually made a version which implements every other performance feature I've talked about in this series - I call it &lt;a href="https://github.com/Turnerj/Quickenshtein"&gt;Quickenshtein&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;If it's not the fastest Levenshtein Distance implementation, it is surely close to it - all while allocating 0 bytes.&lt;/p&gt;
&lt;p&gt;If .NET is your thing and this could help you, check it out. If you want to implement your own version in another language, feel free to use my implementation as a guide.&lt;/p&gt;
&lt;p&gt;Until next time fellow readers - let your code be fast and your allocations be nil.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Levenshtein Distance (Part 2: Gotta Go Fast)</title>
			<link>https://turnerj.com/blog/levenshtein-distance-part-2-gotta-go-fast</link>
			<description>Faster Calculations &amp; Less Memory Usage</description>
			<enclosure url="https://turnerj.com/blog/images/social/levenshtein-distance-part-2-gotta-go-fast.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/levenshtein-distance-part-2-gotta-go-fast</guid>
			<pubDate>Thu, 13 Feb 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;In &lt;a href="https://turnerj.com/blog/levenshtein-distance-part-1-what-is-it"&gt;Part 1&lt;/a&gt; I explained what the Levenshtein Distance is and that it is both computationally and memory inefficient in a simple form. In Part 2 (this post), I'll cover ways to decrease the memory overhead and increase the performance.&lt;/p&gt;
&lt;h2 id="example-levenshtein-implementation"&gt;Example Levenshtein Implementation&lt;/h2&gt;
&lt;p&gt;Before we get started, I'll walk through a real implementation of Levenshtein Distance with no optimizations - this is what we will be improving from.&lt;/p&gt;
&lt;p&gt;&lt;small&gt;Note: While the examples in this post are in C#, there is very little that couldn't be copied over to any other programming language with only minor edits.&lt;/small&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public int CalculateDistance(string source, string target)
{
	var costMatrix = Enumerable
	  .Range(0, source.Length + 1)
	  .Select(line =&amp;gt; new int[target.Length + 1])
	  .ToArray();

	for (var i = 1; i &amp;lt;= source.Length; ++i)
	{
		costMatrix[i][0] = i;
	}

	for (var i = 1; i &amp;lt;= target.Length; ++i)
	{
		costMatrix[0][i] = i;
	}

	for (var i = 1; i &amp;lt;= source.Length; ++i)
	{
		for (var j = 1; j &amp;lt;= target.Length; ++j)
		{
			var insert = costMatrix[i][j - 1] + 1;
			var delete = costMatrix[i - 1][j] + 1;
			var edit = costMatrix[i - 1][j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);

			costMatrix[i][j] = Math.Min(Math.Min(insert, delete), edit);
		}
	}

	return costMatrix[source.Length][target.Length];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Breaking this down:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var costMatrix = Enumerable
  .Range(0, source.Length + 1)
  .Select(line =&amp;gt; new int[target.Length + 1])
  .ToArray();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This builds our matrix - an array of arrays - to be one longer than the source string (eg. "Sunday") in rows and one longer than the target string (eg. "Saturday") in columns.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
	costMatrix[i][0] = i;
}

for (var i = 1; i &amp;lt;= target.Length; ++i)
{
	costMatrix[0][i] = i;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These two for-loops build the top row and left column of our matrix. With the example "Saturday" and "Sunday", it would look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-empty-matrix.png" alt="Levenshtein Distance Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with only the left column and top row filled in."&gt;&lt;/p&gt;
&lt;p&gt;We have our main code:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
	for (var j = 1; j &amp;lt;= target.Length; ++j)
	{
		var insert = costMatrix[i][j - 1] + 1;
		var delete = costMatrix[i - 1][j] + 1;
		var edit = costMatrix[i - 1][j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);

		costMatrix[i][j] = Math.Min(Math.Min(insert, delete), edit);
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Simply put - for every column in every row, check the cell to the left (insert), the cell above (delete) and the cell diagonally up and to the left (edit). We do our Levenshtein Distance math (explained in Part 1) and we store that in the current cell.&lt;/p&gt;
&lt;p&gt;And finally, our humble return of the last cell in the matrix - our Levenshtein Distance.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;return costMatrix[source.Length][target.Length];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let's start making this faster and more memory efficient!&lt;/p&gt;
&lt;h2 id="memory-optimizations"&gt;Memory Optimizations&lt;/h2&gt;
&lt;h3 id="turning-n1m1-into-n12"&gt;Turning &lt;code&gt;(n+1)*(m+1)&lt;/code&gt; into &lt;code&gt;(n+1)*2&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;One of the biggest problems is the sheer amount of memory a full matrix of two long strings would require BUT it isn't necessary and here is why: &lt;em&gt;We don't actually need all the data all the time&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Using the same example we had in Part 1 with "Saturday" and "Sunday", yellow marks the values we depend on and green marks the values we calculate AND depend on.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-first-row-dependencies.png" alt="Levenshtein Distance Matrix for words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with the first row calculated."&gt;&lt;/p&gt;
&lt;p&gt;If we continue this for the next row, it looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-second-row-dependencies.png" alt="Levenshtein Distance Matrix for words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with the second row calculated."&gt;&lt;/p&gt;
&lt;p&gt;We can see that we no longer depend on the very first row of the matrix so in reality, we only need two rows of memory.&lt;/p&gt;
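&lt;p&gt;To put some rough numbers on that saving, here is a quick back-of-the-envelope calculation for two 8,000 character strings. It assumes 4-byte &lt;code&gt;int&lt;/code&gt; cells and ignores the per-array overhead of the runtime, and &lt;code&gt;MatrixMemory&lt;/code&gt; is just a made-up name for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class MatrixMemory
{
    public static void Main()
    {
        const long sourceLength = 8000;
        const long targetLength = 8000;

        // Full matrix: (n+1) * (m+1) cells of 4 bytes each.
        var fullMatrixBytes = (sourceLength + 1) * (targetLength + 1) * sizeof(int);

        // Two rows: 2 * (m+1) cells of 4 bytes each.
        var twoRowBytes = 2 * (targetLength + 1) * sizeof(int);

        Console.WriteLine(fullMatrixBytes); // 256064004 (roughly 256 MB)
        Console.WriteLine(twoRowBytes);     // 64008 (roughly 64 KB)
    }
}
&lt;/code&gt;&lt;/pre&gt;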
&lt;p&gt;Below is an example of how that might look in code:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public int CalculateDistance(string source, string target)
{
	var costMatrix = Enumerable
	  .Range(0, 2)
	  .Select(line =&amp;gt; new int[target.Length + 1])
	  .ToArray();

	for (var i = 1; i &amp;lt;= target.Length; ++i)
	{
		costMatrix[0][i] = i;
	}

	for (var i = 1; i &amp;lt;= source.Length; ++i)
	{
		costMatrix[i % 2][0] = i;

		for (var j = 1; j &amp;lt;= target.Length; ++j)
		{
			var insert = costMatrix[i % 2][j - 1] + 1;
			var delete = costMatrix[(i - 1) % 2][j] + 1;
			var edit = costMatrix[(i - 1) % 2][j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);

			costMatrix[i % 2][j] = Math.Min(Math.Min(insert, delete), edit);
		}
	}

	return costMatrix[source.Length % 2][target.Length];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So what is going on here?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Our &lt;code&gt;costMatrix&lt;/code&gt; variable is instantiated nearly the same way, however we are only creating two rows.&lt;/li&gt;
&lt;li&gt;We dropped one of our for-loops, the one that builds the left column values. Instead, we now have &lt;code&gt;costMatrix[i % 2][0] = i;&lt;/code&gt; in our main loop that effectively mimics this behaviour.&lt;/li&gt;
&lt;li&gt;We are using &lt;code&gt;i % 2&lt;/code&gt; in a lot of places, allowing us to switch which row we are on.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That last point about &lt;code&gt;% 2&lt;/code&gt; switching what row we are on might sound odd so let me explain. The &lt;code&gt;%&lt;/code&gt; operator is called the "Modulo" operator - it gets the remainder after a division.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;i = 0&lt;/code&gt;, the remainder of dividing by 2 is &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;i = 1&lt;/code&gt;, the remainder of dividing by 2 is &lt;code&gt;1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;i = 2&lt;/code&gt;, the remainder of dividing by 2 is &lt;code&gt;0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;and so on...&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is a nice little shortcut for us to switch between the two rows of data as if we had a full matrix.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;source.Length % 2&lt;/code&gt; on the return will always get us the "last" row.&lt;/p&gt;
&lt;p&gt;So at this point, we've dramatically decreased the memory usage but we know we can do better - what if we only needed one row...&lt;/p&gt;
&lt;h3 id="turning-n12-into-n1"&gt;Turning &lt;code&gt;(n+1)*2&lt;/code&gt; into &lt;code&gt;n+1&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;I said earlier that we don't need all the data all the time and that is still the case with the two-row example above - we don't need all the data. In this case, we don't need all the data in the same row.&lt;/p&gt;
&lt;p&gt;How does that work? Let's look at the matrix again...&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-first-cell-dependencies.png" alt="Levenshtein Distance Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with the first cell calculated and the cells it depends on (the cell above, to the left and diagonally above to the left) highlighted"&gt;&lt;/p&gt;
&lt;p&gt;With the yellow cells being what we depend on and green being what we calculated, we are only depending on 3 values. The next cell will also depend on only 3 values...&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-second-cell-dependencies.png" alt="Levenshtein Distance Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with the second cell calculated and the cells it depends on (the cell above, to the left and diagonally above to the left) highlighted"&gt;&lt;/p&gt;
&lt;p&gt;Now repeating this for one more cell, we can see more clearly the values we don't care about any more for this row.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-third-cell-dependencies.png" alt="Levenshtein Distance Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with the third cell calculated and the cells it depends on (the cell above, to the left and diagonally above to the left) highlighted"&gt;&lt;/p&gt;
&lt;p&gt;When calculating the third cell, we don't actually care about the first cell we calculated. With this in mind, an implementation might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public int CalculateDistance(string source, string target)
{
	var previousRow = new int[target.Length + 1];

	for (var i = 1; i &amp;lt;= target.Length; ++i)
	{
		previousRow[i] = i;
	}

	for (var i = 1; i &amp;lt;= source.Length; ++i)
	{
		var previousDiagonal = previousRow[0];
		var previousColumn = previousRow[0]++;

		for (var j = 1; j &amp;lt;= target.Length; ++j)
		{
			var insertOrDelete = Math.Min(previousColumn, previousRow[j]) + 1;
			var edit = previousDiagonal + (source[i - 1] == target[j - 1] ? 0 : 1);

			previousColumn = Math.Min(insertOrDelete, edit);
			previousDiagonal = previousRow[j];
			previousRow[j] = previousColumn;
		}
	}

	return previousRow[target.Length];
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a lot more changes here and now with different variable names so let's break this down:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var previousRow = new int[target.Length + 1];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;costMatrix&lt;/code&gt; is swapped out for a single array called &lt;code&gt;previousRow&lt;/code&gt; - this will still hold our costs but from the point of view of our calculation, they will always be the "previous" set of data.&lt;/p&gt;
&lt;p&gt;We still have our for-loop setting the first row like normal.&lt;/p&gt;
&lt;p&gt;Our main loops are the other big differences:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;for (var i = 1; i &amp;lt;= source.Length; ++i)
{
	var previousDiagonal = previousRow[0];
	var previousColumn = previousRow[0]++;

	for (var j = 1; j &amp;lt;= target.Length; ++j)
	{
		var insertOrDelete = Math.Min(previousColumn, previousRow[j]) + 1;
		var edit = previousDiagonal + (source[i - 1] == target[j - 1] ? 0 : 1);

		previousColumn = Math.Min(insertOrDelete, edit);
		previousDiagonal = previousRow[j];
		previousRow[j] = previousColumn;
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Starting with &lt;code&gt;previousDiagonal&lt;/code&gt;, this represents our last substitution/edit cost. At the start of each logical row, the cost here is always one less than the insert cost (our &lt;code&gt;previousColumn&lt;/code&gt;).&lt;/p&gt;
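&lt;p&gt;Speaking of &lt;code&gt;previousColumn&lt;/code&gt;, this is our last insert cost, which logically starts one higher than our substitution/edit cost. Note that &lt;code&gt;previousRow[0]++&lt;/code&gt; is a post-increment: it bumps the stored value ready for the next row but hands the old value to &lt;code&gt;previousColumn&lt;/code&gt;. That still gives the right answer because the edit cost for the first column is never more than one above &lt;code&gt;previousDiagonal&lt;/code&gt;, so the insert path can't win there anyway.&lt;/p&gt;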
&lt;p&gt;Our delete costs for our current column are found in &lt;code&gt;previousRow[j]&lt;/code&gt; which also coincides with the value we are about to set.&lt;/p&gt;
&lt;p&gt;In the meat of the for-loops, we can see I'm playing around with the variables a bit setting the &lt;code&gt;previousColumn&lt;/code&gt; and &lt;code&gt;previousDiagonal&lt;/code&gt; before finally setting &lt;code&gt;previousRow[j]&lt;/code&gt; - this is all to make sure we have the right values in the right spots.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We set &lt;code&gt;previousColumn&lt;/code&gt; to be our result because our result will be the "previous" result for the next column over.&lt;/li&gt;
&lt;li&gt;We set &lt;code&gt;previousDiagonal&lt;/code&gt; to be our delete cost (&lt;code&gt;previousRow[j]&lt;/code&gt;) because, for the next column, our delete cost is its substitution/edit cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally after all the processing is done - we return the last value of our &lt;code&gt;previousRow&lt;/code&gt;.&lt;/p&gt;
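&lt;p&gt;Putting the pieces together, a complete method might look like the sketch below. The class and method names (&lt;code&gt;LevenshteinSketch&lt;/code&gt;, &lt;code&gt;Distance&lt;/code&gt;) are placeholders chosen for illustration, and the first column is incremented before it is read so that &lt;code&gt;previousColumn&lt;/code&gt; starts one higher than &lt;code&gt;previousDiagonal&lt;/code&gt;, as described above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class LevenshteinSketch
{
	// Single-row Levenshtein Distance: only target.Length + 1 costs are kept in memory.
	public static int Distance(string source, string target)
	{
		var previousRow = new int[target.Length + 1];

		// First row: building the first j characters of target from nothing costs j inserts.
		for (var j = 1; j &amp;lt;= target.Length; ++j)
		{
			previousRow[j] = j;
		}

		for (var i = 1; i &amp;lt;= source.Length; ++i)
		{
			var previousDiagonal = previousRow[0];
			var previousColumn = ++previousRow[0];

			for (var j = 1; j &amp;lt;= target.Length; ++j)
			{
				var insertOrDelete = Math.Min(previousColumn, previousRow[j]) + 1;
				var edit = previousDiagonal + (source[i - 1] == target[j - 1] ? 0 : 1);

				previousColumn = Math.Min(insertOrDelete, edit);
				previousDiagonal = previousRow[j];
				previousRow[j] = previousColumn;
			}
		}

		// The last cell of the final row holds the distance.
		return previousRow[target.Length];
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this sketch, &lt;code&gt;LevenshteinSketch.Distance("Saturday", "Sunday")&lt;/code&gt; should return &lt;code&gt;3&lt;/code&gt;, matching the matrix from Part 1.&lt;/p&gt;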
&lt;p&gt;Now &lt;code&gt;n+1&lt;/code&gt; is great but what if we wanted an &lt;code&gt;n&lt;/code&gt;-sized array instead? I can say it is totally possible with a bit more footwork with what variables you set where but I'll leave that up to the reader to work out. (Or stay tuned for Part 3 where we push things to 11.)&lt;/p&gt;
&lt;h2 id="performancetime-optimizations"&gt;Performance/Time Optimizations&lt;/h2&gt;
&lt;p&gt;Now while the memory optimizations are great and will increase performance through fewer memory lookups or even from the removal of one of the for-loops, there are still big performance-specific optimizations on the table.&lt;/p&gt;
&lt;p&gt;There are some basic shortcuts to the Levenshtein Distance. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;StringA&lt;/code&gt; is empty, the Levenshtein Distance is &lt;code&gt;StringB&lt;/code&gt;'s length.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;StringB&lt;/code&gt; is empty, the Levenshtein Distance is &lt;code&gt;StringA&lt;/code&gt;'s length.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;StringA&lt;/code&gt; is equal to &lt;code&gt;StringB&lt;/code&gt;, the Levenshtein Distance is &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
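&lt;p&gt;As a rough sketch, those three shortcuts could be checked before any cost array is allocated. The helper name &lt;code&gt;TryShortcut&lt;/code&gt; and its shape are assumptions for illustration, not part of any library.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class LevenshteinShortcuts
{
	// Returns true (and sets distance) when one of the basic shortcuts applies.
	public static bool TryShortcut(string stringA, string stringB, out int distance)
	{
		if (stringA.Length == 0)
		{
			// Everything in StringB has to be inserted.
			distance = stringB.Length;
			return true;
		}

		if (stringB.Length == 0)
		{
			// Everything in StringA has to be deleted.
			distance = stringA.Length;
			return true;
		}

		if (string.Equals(stringA, stringB, StringComparison.Ordinal))
		{
			// Identical strings need no edits.
			distance = 0;
			return true;
		}

		// No shortcut applies; the caller falls back to the full calculation.
		distance = -1;
		return false;
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The equality check is itself a pass over both strings, so whether it pays off depends on how often equal strings turn up in your data.&lt;/p&gt;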
&lt;p&gt;Those are good but we can do better - what if we didn't even need to build a matrix of all the letters from both words? What if we could trim the words that share a prefix or suffix? Now we're onto something...&lt;/p&gt;
&lt;p&gt;Our example strings &lt;code&gt;Saturday&lt;/code&gt; and &lt;code&gt;Sunday&lt;/code&gt; share a common prefix and suffix, which makes them a great example for this. We only need to start where the strings first differ and end on the last difference.&lt;/p&gt;
&lt;p&gt;This makes "Saturday"-"Sunday" become "atur"-"un", a much smaller amount to process and as a bonus, a much smaller amount of memory needed!&lt;/p&gt;
&lt;p&gt;Just to be sure it works, let's look at a matrix of this:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-subsection-matrix.png" alt="Levenshtein Distance Matrix of &amp;quot;atur&amp;quot; and &amp;quot;un&amp;quot; with all the cells calculated."&gt;&lt;/p&gt;
&lt;p&gt;We can actually find this segment in the full matrix of "Saturday" and "Sunday":&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-2-subsection-highlighted.png" alt="Levenshtein Distance Matrix of &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with all the cells calculated and the cells matching &amp;quot;atur&amp;quot; and &amp;quot;un&amp;quot; highlighted."&gt;&lt;/p&gt;
&lt;p&gt;This optimization can be extremely useful on large strings, bringing them down to a far more manageable size.&lt;/p&gt;
&lt;p&gt;A partial implementation of this might look like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;var startIndex = 0;
var sourceEnd = source.Length;
var targetEnd = target.Length;

while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
{
	startIndex++;
}
while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
{
	sourceEnd--;
	targetEnd--;
}

var sourceLength = sourceEnd - startIndex;
var targetLength = targetEnd - startIndex;

source = source.Substring(startIndex, sourceLength);
target = target.Substring(startIndex, targetLength);
&lt;/code&gt;&lt;/pre&gt;
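&lt;p&gt;To see the trimming in isolation, the snippet above can be wrapped in a small helper and run against our example strings. The names &lt;code&gt;LevenshteinTrim&lt;/code&gt; and &lt;code&gt;TrimCommonEnds&lt;/code&gt; are made up for this sketch.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class LevenshteinTrim
{
	// Removes the shared prefix and suffix, returning only the parts that differ.
	public static (string Source, string Target) TrimCommonEnds(string source, string target)
	{
		var startIndex = 0;
		var sourceEnd = source.Length;
		var targetEnd = target.Length;

		// Skip the common prefix.
		while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[startIndex] == target[startIndex])
		{
			startIndex++;
		}

		// Skip the common suffix without crossing back over the prefix.
		while (startIndex &amp;lt; sourceEnd &amp;amp;&amp;amp; startIndex &amp;lt; targetEnd &amp;amp;&amp;amp; source[sourceEnd - 1] == target[targetEnd - 1])
		{
			sourceEnd--;
			targetEnd--;
		}

		return (
			source.Substring(startIndex, sourceEnd - startIndex),
			target.Substring(startIndex, targetEnd - startIndex));
	}

	public static void Main()
	{
		var (source, target) = TrimCommonEnds("Saturday", "Sunday");
		Console.WriteLine($"{source} - {target}"); // prints "atur - un"
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The trimmed strings can then be passed to whichever distance calculation you are using; the result is the same as for the full strings.&lt;/p&gt;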
&lt;h2 id="gotta-go-faster-i-want-more-optimizations"&gt;Gotta Go Faster - I want more optimizations!&lt;/h2&gt;
&lt;p&gt;So do I! In &lt;a href="https://turnerj.com/blog/levenshtein-distance-part-3-optimize-everything"&gt;Part 3&lt;/a&gt;, we will take everything we've done to 11! We will remove any memory allocations we have, decrease the number of string lookups, use hardware intrinsics to improve performance, learn the secrets of loop unrolling and even dabble in parallel processing! 😈&lt;/p&gt;
&lt;p&gt;It's gonna be great!&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Levenshtein Distance (Part 1: What is it?)</title>
			<link>https://turnerj.com/blog/levenshtein-distance-part-1-what-is-it</link>
			<description>How similar are two strings?</description>
			<enclosure url="https://turnerj.com/blog/images/social/levenshtein-distance-part-1-what-is-it.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/levenshtein-distance-part-1-what-is-it</guid>
			<pubDate>Fri, 07 Feb 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;The Levenshtein Distance is a deceptively simple algorithm - by looping over two strings, it can provide the "distance" (the number of differences) between the two. These differences are calculated in terms of "inserts", "deletions" and "substitutions".&lt;/p&gt;
&lt;p&gt;The "distance" is effectively how similar two strings are. A distance of &lt;code&gt;0&lt;/code&gt; would mean the strings are equal (no differences). A distance can be as high as there are as many characters in the longest string - this would mean there is absolutely nothing "similar" between these strings.&lt;/p&gt;
&lt;p&gt;An application of the Levenshtein Distance algorithm is spell checking - knowing how similar two words are allows it to provide suggestions on what you may have intended to type.&lt;/p&gt;
&lt;h2 id="calculating-the-levenshtein-distance"&gt;Calculating the Levenshtein Distance&lt;/h2&gt;
&lt;p&gt;Take the words "Saturday" and "Sunday". What do we know about these words?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Saturday is 8 characters long and Sunday is 6.&lt;/li&gt;
&lt;li&gt;They both start with "S" and end with "day".&lt;/li&gt;
&lt;li&gt;Relative to the beginning of the string, the shared "u" is in a different position.&lt;/li&gt;
&lt;li&gt;This just leaves "a", "t" and "r" from Saturday and "n" from Sunday as the only other differences.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Calculating the Levenshtein Distance of these two strings is effectively like building a matrix of values which represent the various inserts, deletions and substitutions required.&lt;/p&gt;
&lt;p&gt;Let's build a matrix of this ourselves. We're going to start with the top row and left most column filled with numbers from &lt;code&gt;0&lt;/code&gt; to each of the string's lengths.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-1-empty-matrix.png" alt="Levenshtein Distance - Empty matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot;"&gt;&lt;/p&gt;
&lt;p&gt;Starting from the position of the asterisk, we will look at the cell above (the deletion cost), the cell to the left (the insertion cost) and the cell diagonally top-left (the substitution cost).&lt;/p&gt;
&lt;p&gt;The operations to fill our cell are as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get the minimum of the insert cost (&lt;code&gt;1&lt;/code&gt;) and deletion cost (&lt;code&gt;1&lt;/code&gt;), adding &lt;code&gt;1&lt;/code&gt; to the result (&lt;code&gt;MIN(1,1) + 1 = 2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Get the substitution cost (&lt;code&gt;0&lt;/code&gt;), adding &lt;code&gt;1&lt;/code&gt; to it if the letter on this column ("S") and row ("S") are different (&lt;code&gt;0 + 0 = 0&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Take the minimum of those numbers (&lt;code&gt;MIN(2,0) = 0&lt;/code&gt;) and put that in our cell&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-1-first-cell.png" alt="Levenshtein Distance - Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with first cell filled in"&gt;&lt;/p&gt;
&lt;p&gt;Rinse and repeat for the next cell in the row:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Get the minimum of the insert cost (&lt;code&gt;0&lt;/code&gt;) and deletion cost (&lt;code&gt;2&lt;/code&gt;), adding &lt;code&gt;1&lt;/code&gt; to the result (&lt;code&gt;MIN(0,2) + 1 = 1&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Get the substitution cost (&lt;code&gt;1&lt;/code&gt;), adding &lt;code&gt;1&lt;/code&gt; to it if the letter on this column ("a") and row ("S") are different (&lt;code&gt;1 + 1 = 2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Take the minimum of those numbers (&lt;code&gt;MIN(1,2) = 1&lt;/code&gt;) and put that in our cell&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-1-second-cell.png" alt="Levenshtein Distance - Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with second cell filled in"&gt;&lt;/p&gt;
&lt;p&gt;This keeps going for the whole row:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-1-first-row.png" alt="Levenshtein Distance - Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with first row filled in"&gt;&lt;/p&gt;
&lt;p&gt;And eventually the whole table:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/levenshtein-distance-part-1-filled-matrix.png" alt="Levenshtein Distance - Matrix for the words &amp;quot;Saturday&amp;quot; and &amp;quot;Sunday&amp;quot; with entire matrix filled in"&gt;&lt;/p&gt;
&lt;p&gt;The actual Levenshtein Distance can be found in the bottom-right cell (&lt;code&gt;3&lt;/code&gt;). Amazing right?&lt;/p&gt;
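&lt;p&gt;For those who prefer code to tables, the cell-by-cell process above could be sketched in C# roughly as follows. The class, method and variable names are illustrative assumptions; here the rows follow &lt;code&gt;source&lt;/code&gt; and the columns follow &lt;code&gt;target&lt;/code&gt;, though swapping them gives the same distance.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class LevenshteinMatrix
{
	public static int Distance(string source, string target)
	{
		var costMatrix = new int[source.Length + 1, target.Length + 1];

		// Left-most column and top row are filled with 0..length.
		for (var i = 0; i &amp;lt;= source.Length; ++i)
		{
			costMatrix[i, 0] = i;
		}
		for (var j = 0; j &amp;lt;= target.Length; ++j)
		{
			costMatrix[0, j] = j;
		}

		for (var i = 1; i &amp;lt;= source.Length; ++i)
		{
			for (var j = 1; j &amp;lt;= target.Length; ++j)
			{
				// Minimum of the cell to the left and the cell above, plus one.
				var insertOrDelete = Math.Min(costMatrix[i, j - 1], costMatrix[i - 1, j]) + 1;

				// Diagonal cell, plus one only if the letters differ.
				var substitution = costMatrix[i - 1, j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);

				costMatrix[i, j] = Math.Min(insertOrDelete, substitution);
			}
		}

		// The bottom-right cell is the Levenshtein Distance.
		return costMatrix[source.Length, target.Length];
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Calling &lt;code&gt;LevenshteinMatrix.Distance("Sunday", "Saturday")&lt;/code&gt; with this sketch should give &lt;code&gt;3&lt;/code&gt;, the same value as the bottom-right cell of the table.&lt;/p&gt;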
&lt;h2 id="deceptively-simple"&gt;"Deceptively Simple"&lt;/h2&gt;
&lt;p&gt;I said earlier that this algorithm is deceptively simple - I say this because of its computational complexity. This has a "Big-O" notation of &lt;code&gt;O(n*m)&lt;/code&gt; where &lt;code&gt;n&lt;/code&gt; is the length of one string and &lt;code&gt;m&lt;/code&gt; is the length of the other. Memory-wise, building that matrix above is effectively &lt;code&gt;(n+1)*(m+1)&lt;/code&gt; cells.&lt;/p&gt;
&lt;p&gt;For the example above, there are 48 character comparisons with 96 checks for minimum value. Representing that table in memory (&lt;code&gt;(6+1)*(8+1)&lt;/code&gt; cells by 4 bytes a cell, assuming 32-bit numbers) would account for 252 bytes!&lt;/p&gt;
&lt;p&gt;Yes, I'm sure you're rolling your eyes that I'm squawking at 252 bytes. The problem is when you want to compare bigger strings. How about comparing two 1000 character strings? You'd be looking at... 4 megabytes.&lt;/p&gt;
&lt;p&gt;That might not sound like a lot for a single lookup but now imagine doing that dozens or hundreds of times in a language that utilises garbage collection - the allocations would be huge!&lt;/p&gt;
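&lt;p&gt;The arithmetic behind those figures is simple enough to sketch. The helper below is hypothetical and just applies &lt;code&gt;(n+1)*(m+1)&lt;/code&gt; cells at 4 bytes per cell, the same assumption of 32-bit numbers used above.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class MatrixMemory
{
	// Bytes needed for a full cost matrix of two strings with lengths n and m.
	public static long MatrixBytes(long n, long m, int bytesPerCell = 4)
	{
		return (n + 1) * (m + 1) * bytesPerCell;
	}

	public static void Main()
	{
		Console.WriteLine(MatrixBytes(6, 8));       // prints 252
		Console.WriteLine(MatrixBytes(1000, 1000)); // prints 4008004 (about 4 megabytes)
	}
}
&lt;/code&gt;&lt;/pre&gt;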
&lt;p&gt;When I first learnt about Levenshtein Distance I wanted to use it on web pages, to see how different two web pages are. Web pages can be quite large when counting every character in the HTML (this post as seen on The Practical DEV - at the time of writing - is north of 77K characters) or even just every text node (6K+ characters on this page at time of writing). Punching that into the same calculations above (at 4 bytes a cell) would be ~144 megabytes for 6K characters or a whopping &lt;em&gt;&lt;strong&gt;~24 gigabytes&lt;/strong&gt;&lt;/em&gt; for 77K characters!&lt;/p&gt;
&lt;p&gt;Now what if we could reduce the number of cells we need from &lt;code&gt;(n+1)*(m+1)&lt;/code&gt; to &lt;code&gt;(MIN(n,m)+1)*2&lt;/code&gt; or even just &lt;code&gt;MIN(n,m)&lt;/code&gt;? In &lt;a href="https://turnerj.com/blog/levenshtein-distance-part-2-gotta-go-fast"&gt;Part 2&lt;/a&gt;, I'll talk about the strategies of making this algorithm faster and more memory efficient.&lt;/p&gt;
&lt;h3 id="further-reading"&gt;Further Reading&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Levenshtein_distance"&gt;Levenshtein Distance on Wikipedia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>What is Microdata and why should I care?</title>
			<link>https://turnerj.com/blog/what-is-microdata-and-why-should-i-care</link>
			<description>Schema-ify the Web</description>
			<enclosure url="https://turnerj.com/blog/images/social/what-is-microdata-and-why-should-i-care.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/what-is-microdata-and-why-should-i-care</guid>
			<pubDate>Wed, 29 Jan 2020 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;To get this out of the way, Microdata is &lt;strong&gt;NOT&lt;/strong&gt; related to &lt;a href="https://en.wikipedia.org/wiki/Microservices"&gt;Microservices&lt;/a&gt;. It's not some paradigm shift with handling or processing data. Microdata is one of 3 popular formats used for describing content within a web page - the two others being &lt;a href="https://en.wikipedia.org/wiki/RDFa"&gt;RDFa (Resource Description Framework in Attributes)&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/JSON-LD"&gt;JSON-LD (JavaScript Object Notation for Linked Data)&lt;/a&gt;. These are all primarily used for Search Engine Optimization (SEO), however that isn't their sole purpose.&lt;/p&gt;
&lt;p&gt;Similar to Microdata and RDFa, there is also &lt;a href="https://ogp.me/"&gt;Open Graph&lt;/a&gt; which has been made popular by Facebook. While it does allow describing of data and is a popular method used by social media websites, it is more limited in what it describes and doesn't flow into the natural HTML of the page like Microdata or RDFa do.&lt;/p&gt;
&lt;p&gt;In this post, we will walk through an example of Microdata as seen on &lt;a href="https://dev.to/"&gt;dev.to&lt;/a&gt; - specifically, the syndicated version of this very post.
A &lt;a href="https://dev.to/turnerj/what-is-microdata-and-why-should-i-care-23jk"&gt;typical post on DEV&lt;/a&gt; uses Microdata to describe this article, the cover image, the author and the publisher. Let's have a closer look...&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;&amp;lt;article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity"&amp;gt;
      &amp;lt;meta itemprop="url" content="https://dev.to/turnerj/what-is-microdata-and-why-should-i-care-23jk"&amp;gt;
      &amp;lt;meta itemprop="image" content="https://res.cloudinary.com/practicaldev/image/fetch/s--dVn-CraX--/c_imagga_scale,f_auto,fl_progressive,h_500,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/9ofe2id7kzynypzdkdps.png"&amp;gt;
      &amp;lt;div itemprop="publisher" itemscope itemtype="https://schema.org/Organization"&amp;gt;
        &amp;lt;div itemprop="logo" itemscope itemtype="https://schema.org/ImageObject"&amp;gt;
          &amp;lt;meta itemprop="url" content="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/android-icon-192x192-0409854849dca4043b26f85039b8c3d42cbac2bd8793fec1004eb389fa153877.png"&amp;gt;
          &amp;lt;meta itemprop="width" content="192"&amp;gt;
          &amp;lt;meta itemprop="height" content="192"&amp;gt;
        &amp;lt;/div&amp;gt;
        &amp;lt;meta itemprop="name" content="DEV Community"&amp;gt;
      &amp;lt;/div&amp;gt;
      &amp;lt;header class="title" id="main-title"&amp;gt;
        &amp;lt;h1 class="medium" itemprop="name headline"&amp;gt;
          What is Microdata and why should I care?
        &amp;lt;/h1&amp;gt;
        &amp;lt;h3&amp;gt;
          &amp;lt;span itemprop="author" itemscope itemtype="http://schema.org/Person"&amp;gt;
            &amp;lt;meta itemprop="url" content="https://dev.to/turnerj"&amp;gt;
            &amp;lt;a href="/turnerj" class="author"&amp;gt;
              &amp;lt;img class="profile-pic" src="https://res.cloudinary.com/practicaldev/image/fetch/s--erE_cpgk--/c_fill,f_auto,fl_progressive,h_50,q_auto,w_50/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/95629/bd0aa8b6-0c56-4a69-a2cf-77e6d484e77c.jpeg" alt="turnerj profile image" /&amp;gt;
              &amp;lt;span itemprop="name"&amp;gt;James Turner&amp;lt;/span&amp;gt;
            &amp;lt;/a&amp;gt;
          &amp;lt;/span&amp;gt;
        &amp;lt;/h3&amp;gt;
          &amp;lt;div class="tags"&amp;gt;
              &amp;lt;a class="tag" href="/t/webdev" style="background-color:#562765;color:#ffffff"&amp;gt;#webdev&amp;lt;/a&amp;gt;
          &amp;lt;/div&amp;gt;
      &amp;lt;/header&amp;gt;
      &amp;lt;div class="body" data-article-id="250783" id="article-body" itemprop="articleBody"&amp;gt;
        &amp;lt;p&amp;gt;To get this out of the way, no, Microdata is not related to Microservices. Its not some paradigm shift with handling or processing data. Microdata is one of 3 distinct formats used for describing content within a web page - the two others being RDFa and JSON-LD.&amp;lt;/p&amp;gt;

...

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is a lot of stuff going on there! Let's break it down into smaller chunks.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;&amp;lt;article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We have our &lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt; HTML tag with a few interesting attributes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;itemscope&lt;/code&gt;: Think of this as saying "I'm an object with sub-properties". Inside the article tag, we will find more properties.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;itemtype&lt;/code&gt;: This describes what type the child properties belong to. With this in mind, &lt;code&gt;itemtype&lt;/code&gt; only ever appears on a tag that also has &lt;code&gt;itemscope&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;itemprop&lt;/code&gt;: This names the property of the parent object that this object/value is assigned to.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You might have a few questions like "What is schema.org?" and "Why does the article have an &lt;code&gt;itemprop&lt;/code&gt; set - what is it even set to?".&lt;/p&gt;
&lt;p&gt;Firstly, &lt;a href="https://schema.org/"&gt;schema.org&lt;/a&gt; is &lt;a href="https://en.wikipedia.org/wiki/Microdata_(HTML)#Vocabularies"&gt;a vocabulary to describe types&lt;/a&gt; - they define the types you can choose from and the properties that you can set. It is a community effort founded by Google, Microsoft, Yahoo and Yandex to help describe the web. While you will likely find many examples of Microdata, RDFa and JSON-LD using schema.org, these formats aren't tied to it - they can use any vocabulary as long as the desired third-party can understand it. However for the purposes of this article, I will keep referring to types as defined by &lt;a href="https://schema.org/"&gt;schema.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Regarding &lt;code&gt;itemprop&lt;/code&gt; existing on the article tag without a parent element that has an &lt;code&gt;itemscope&lt;/code&gt;: web pages can be thought of as implicitly being of the schema.org &lt;code&gt;WebPage&lt;/code&gt; type. The property &lt;code&gt;mainEntity&lt;/code&gt; defines the article as the primary object of the web page.&lt;/p&gt;
&lt;p&gt;So what do we know now? We have an article of type &lt;code&gt;Article&lt;/code&gt; defined by &lt;a href="https://schema.org/Article"&gt;schema.org&lt;/a&gt; which is the main entity of the web page. Right now that isn't a lot of information so let's keep digging...&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;&amp;lt;article itemscope itemtype="http://schema.org/Article" itemprop="mainEntity"&amp;gt;
      &amp;lt;meta itemprop="url" content="https://dev.to/turnerj/what-is-microdata-and-why-should-i-care-3o3o"&amp;gt;
      &amp;lt;meta itemprop="image" content="https://res.cloudinary.com/practicaldev/image/fetch/s--dVn-CraX--/c_imagga_scale,f_auto,fl_progressive,h_500,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/9ofe2id7kzynypzdkdps.png"&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Underneath the article tag there are... meta tags?! While it might seem unusual, meta tags used for this purpose let Microdata (or RDFa) describe information that isn't displayed on the page. The second of the two meta tags is actually the cover image (so it is displayed), so whether to use a meta tag or annotate the visible element really comes down to personal preference. Anyway, these two tags describe the &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;image&lt;/code&gt; of the article.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;      &amp;lt;div itemprop="publisher" itemscope itemtype="https://schema.org/Organization"&amp;gt;
        &amp;lt;div itemprop="logo" itemscope itemtype="https://schema.org/ImageObject"&amp;gt;
          &amp;lt;meta itemprop="url" content="https://practicaldev-herokuapp-com.freetls.fastly.net/assets/android-icon-192x192-0409854849dca4043b26f85039b8c3d42cbac2bd8793fec1004eb389fa153877.png"&amp;gt;
          &amp;lt;meta itemprop="width" content="192"&amp;gt;
          &amp;lt;meta itemprop="height" content="192"&amp;gt;
        &amp;lt;/div&amp;gt;
        &amp;lt;meta itemprop="name" content="DEV Community"&amp;gt;
      &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What we have here is the &lt;code&gt;publisher&lt;/code&gt; property defined as &lt;a href="https://schema.org/Organization"&gt;an &lt;code&gt;Organization&lt;/code&gt; type&lt;/a&gt;. With it having the &lt;code&gt;itemscope&lt;/code&gt; attribute, we know it's an object with its own properties (though as I noted earlier, having &lt;code&gt;itemtype&lt;/code&gt; effectively gives this away too).&lt;/p&gt;
&lt;p&gt;This &lt;code&gt;Organization&lt;/code&gt; has a logo (&lt;a href="https://schema.org/ImageObject"&gt;an &lt;code&gt;ImageObject&lt;/code&gt; type&lt;/a&gt;) which has a number of its own properties too, including the &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;width&lt;/code&gt; and &lt;code&gt;height&lt;/code&gt; of the logo.&lt;/p&gt;
&lt;p&gt;We can also see the &lt;code&gt;name&lt;/code&gt; of the &lt;code&gt;Organization&lt;/code&gt; is "DEV Community".&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;      &amp;lt;header class="title" id="main-title"&amp;gt;
        &amp;lt;h1 class="medium" itemprop="name headline"&amp;gt;
          What is Microdata and why should I care?
        &amp;lt;/h1&amp;gt;
        &amp;lt;h3&amp;gt;
          &amp;lt;span itemprop="author" itemscope itemtype="http://schema.org/Person"&amp;gt;
            &amp;lt;meta itemprop="url" content="https://dev.to/turnerj"&amp;gt;
            &amp;lt;a href="/turnerj" class="author"&amp;gt;
              &amp;lt;img class="profile-pic" src="https://res.cloudinary.com/practicaldev/image/fetch/s--erE_cpgk--/c_fill,f_auto,fl_progressive,h_50,q_auto,w_50/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/95629/bd0aa8b6-0c56-4a69-a2cf-77e6d484e77c.jpeg" alt="turnerj profile image" /&amp;gt;
              &amp;lt;span itemprop="name"&amp;gt;James Turner&amp;lt;/span&amp;gt;
            &amp;lt;/a&amp;gt;
          &amp;lt;/span&amp;gt;
        &amp;lt;/h3&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Skipping right along, we see the H1 tag's &lt;code&gt;itemprop&lt;/code&gt; describe what looks like two different properties. Yes, that is right - &lt;code&gt;itemprop&lt;/code&gt; accepts a space-separated list, which allows setting two properties at once. In this case, "What is Microdata and why should I care?" is set to the properties &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;headline&lt;/code&gt; of the &lt;code&gt;Article&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then we have another property, &lt;code&gt;author&lt;/code&gt;, as an object of nested properties, with the &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt; properties defined.&lt;/p&gt;
&lt;p&gt;Finally on our journey of discovery, we have this:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-html"&gt;      &amp;lt;div class="body" data-article-id="250783" id="article-body" itemprop="articleBody"&amp;gt;
        &amp;lt;p&amp;gt;To get this out of the way, no, Microdata is not related to Microservices. Its not some paradigm shift with handling or processing data. Microdata is one of 3 distinct formats used for describing content within a web page - the two others being RDFa and JSON-LD.&amp;lt;/p&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A nice simple property, &lt;code&gt;articleBody&lt;/code&gt;, that assigns the entire content of that element as the body of the article.&lt;/p&gt;
&lt;p&gt;What does this all mean? If we parsed this page with the understanding of Microdata, we'd know a lot of specific details about the article, the author, the publisher and content. That the web page specifically points these details out in a standardised fashion makes it easier for those that would benefit from this detailed data.&lt;/p&gt;
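&lt;p&gt;To give a feel for how little code a consumer needs once the markup is standardised, here is a deliberately naive C# sketch that pulls &lt;code&gt;itemprop&lt;/code&gt;/&lt;code&gt;content&lt;/code&gt; pairs out of meta tags. It assumes double-quoted attributes in exactly that order and ignores &lt;code&gt;itemscope&lt;/code&gt; nesting entirely, so treat it as an illustration rather than a real Microdata parser; the class name and sample markup are made up for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Text.RegularExpressions;

public static class MicrodataSketch
{
	public static void Main()
	{
		// Sample markup, shortened from the kind of meta tags shown earlier.
		var html =
			@"&amp;lt;meta itemprop=""url"" content=""https://dev.to/turnerj""&amp;gt;" +
			@"&amp;lt;meta itemprop=""name"" content=""DEV Community""&amp;gt;";

		// Assumption: itemprop comes first, content second, both double-quoted.
		var pattern = new Regex(@"&amp;lt;meta\s+itemprop=""([^""]+)""\s+content=""([^""]*)""&amp;gt;");

		foreach (Match match in pattern.Matches(html))
		{
			// Group 1 is the property name, group 2 is its value.
			Console.WriteLine($"{match.Groups[1].Value} = {match.Groups[2].Value}");
		}
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A real consumer would use a proper HTML parser and walk the &lt;code&gt;itemscope&lt;/code&gt; hierarchy, but the point stands: the property names are the same on every site that uses the vocabulary.&lt;/p&gt;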
&lt;h3 id="but-who-does-benefit-from-this-data-why-should-i-care-about-microdata"&gt;But who does benefit from this data? &lt;em&gt;Why should I care about Microdata?&lt;/em&gt;&lt;/h3&gt;
&lt;p&gt;Have you used Siri, Alexa or Google's voice assistant? Have you used shopping/price tracking websites to find the lowest price for a product? Have you searched for something on Google and seen the details pane on the right?&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/what-is-microdata-google-search-details-pane.jpg" alt="Google Search Details Pane"&gt;&lt;/p&gt;
&lt;p&gt;Services and features like these rely on data and while some (particularly voice assistants) might rely on dedicated APIs for their data, others need to effectively scrape the web for it. With every website having different HTML, class names and structures, being able to pull valuable information out of the page is difficult.&lt;/p&gt;
&lt;p&gt;Microdata, RDFa or JSON-LD are used as ways to communicate valuable information from a website in a format that other systems can interpret. As these formats embed directly in regular HTML, it isn't a paradigm shift in how things are built to communicate this detail.&lt;/p&gt;
&lt;p&gt;One of the biggest benefits I personally see with structured data is the decentralization of data where individual websites can promote their data in a more structured way, allowing any number of third party tools to consume it.&lt;/p&gt;
&lt;p&gt;Whether it is to build more advanced voice assistants, better price trackers or smarter search engines, structured data (through formats like Microdata) provides tools of the future a standardised way to read the web.&lt;/p&gt;
&lt;h3 id="additional-resources"&gt;Additional Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Microdata_(HTML)"&gt;Microdata&lt;/a&gt;, &lt;a href="https://en.wikipedia.org/wiki/RDFa#RDFa_Lite"&gt;RDFa&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/JSON-LD"&gt;JSON-LD&lt;/a&gt; on Wikipedia&lt;/li&gt;
&lt;li&gt;&lt;a href="https://schema.org/"&gt;Schema.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Schema.org"&gt;Schema.org on Wikipedia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>How to get an idea for a product?</title>
			<link>https://turnerj.com/blog/so-you-want-to-launch-a-product</link>
			<description>So you want to launch a product? First you'll need an idea.</description>
			<enclosure url="https://turnerj.com/blog/images/social/so-you-want-to-launch-a-product.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/so-you-want-to-launch-a-product</guid>
			<pubDate>Fri, 19 Jul 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;If you're anything like me, you have an itch you just can't scratch when working for someone. Maybe you're not very interested in the types of work presented. Perhaps you're feeling burnt out for doing many long hours maintaining legacy systems. At the very least, you want to create something new and launch it for the world to see.&lt;/p&gt;
&lt;p&gt;I'm not a doctor but... it sounds like you've got the entrepreneurial bug. 🤔&lt;/p&gt;
&lt;p&gt;You might have reservations about using the word "entrepreneur". I feel that "entrepreneur" is considered a bit of a dirty word in some circles - a word some use to describe "dreamers" and those who have far-fetched plans without real world basis. In reality though, it is just one of many terms like "creator", "inventor" etc. to describe someone putting in the hard yards to make an idea come to fruition.&lt;/p&gt;
&lt;p&gt;The entrepreneur/creator/inventor and their execution of an idea is one vital part of a success story but regardless of what you might hear, the idea itself still plays an important role. Amazing execution of a bad idea won't make it better. That said, terrible execution of a good idea is not a path to success. While it might sound obvious, great execution of a great idea is always going to have the best results.&lt;/p&gt;
&lt;p&gt;How do you come up with an idea though? How do you know the idea is great? My friends say my idea is great, how do I get millions of dollars in seed funding?&lt;/p&gt;
&lt;h2 id="the-best-ideas-start-from-a-problem-not-a-solution"&gt;The best ideas start from a problem, not a solution.&lt;/h2&gt;
&lt;p&gt;Probably the first thing to keep in mind is not to try and shoe-horn a problem over some solution you threw together. Don't just go "I made a cool thing and now I want to sell it to become a millionaire" as you'll likely be disappointed.&lt;/p&gt;
&lt;p&gt;For example &lt;a href="https://en.wikipedia.org/wiki/Garrett_Camp"&gt;Garrett Camp&lt;/a&gt;, co-founder of Uber, had the realisation of Uber when a &lt;a href="https://www.businessinsider.com.au/uber-travis-kalanick-bio-2014-1?r=US&amp;amp;IR=T"&gt;private driver for a night out cost him $800&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The reason problems are important is that they help you identify customers and determine market size. If your problem is unique to you alone, you're off to a bad start. If your problem occurs to people who do X while doing Y and Z &lt;a href="https://en.wikipedia.org/wiki/List_of_idioms_of_improbability#In_English"&gt;once in a blue moon&lt;/a&gt;, again it's not particularly practical as it ends up being too &lt;a href="https://en.wikipedia.org/wiki/Niche_market"&gt;niche&lt;/a&gt; and too infrequent. The combination of frequency and severity of the problem helps determine how important the problem is to solve.&lt;/p&gt;
&lt;p&gt;A reasonable follow-up question to this is now though: How do I find a problem to help come up with an idea?&lt;/p&gt;
&lt;p&gt;Honestly, think about what you do in every day life. Think about things you don't like and things that you think could be done better. Think about your skills in relation to these things and what you could contribute. If "I am unhappy with the cost of private drivers" helped create Uber, what is the next big idea that starts off so simple?&lt;/p&gt;
&lt;p&gt;Now you might have a problem and an idea to solve it...&lt;/p&gt;
&lt;h2 id="my-friendsfamilyuber-driver-say-my-idea-is-amazing"&gt;My friends/family/Uber driver say my idea is amazing!&lt;/h2&gt;
&lt;p&gt;I don't want to disappoint you but unless they are your target audience, &lt;a href="https://en.wikipedia.org/wiki/Grain_of_salt"&gt;take it with a grain of salt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I'm not saying their feedback is entirely irrelevant but they might know you too well to be critical in their feedback or not know you well enough to care. You don't want to fall into the trap of believing everything is amazing, going all the way up to launch and then having it fall flat on its face.&lt;/p&gt;
&lt;p&gt;If you're not launched yet, you will want to be talking to potential customers and see if they really have the problem. Propose a simplified version of your solution and see if that resonates with them. Don't just try with one potential customer, try it with as many as you can.&lt;/p&gt;
&lt;p&gt;There is a popular diagram that I think helps explain this really well.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/project-management-swing-diagram.png" alt="Project Management Swing diagram - showing different stages of a project and how things are done/interpreted as if it was a swing hanging from a tree"&gt;&lt;/p&gt;
&lt;p&gt;Your understanding of the problem and solution might be any of the swings on the first row. If customers really only needed the last swing on the second row, you will likely miss the mark - this is why it is critical to talk to your customers. Your friends/family/Uber driver likely will not help point you in the right direction unless they are your customers!&lt;/p&gt;
&lt;p&gt;The diagram does show the client's explanation of the problem as the first swing, but this is the art of good ideas and good execution. Good ideas solve customer problems. Good execution prevents all the bad examples in that diagram.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ideas are important. Great ideas come from problems. Problems can come from anything you do in your everyday life. Validate your problems with potential customers.&lt;/p&gt;
&lt;p&gt;After all of that, now you can go start executing your idea!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/so-you-want-to-launch-a-product-1-CpgNjk2E54p7W.gif" alt="Shia LaBeouf saying &amp;quot;What are you waiting for - do it!&amp;quot;"&gt;&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Using MiniProfiler with MongoDB</title>
			<link>https://turnerj.com/blog/mongodb-loves-miniprofiler</link>
			<description>MongoFramework helps connect MiniProfiler and MongoDB together</description>
			<enclosure url="https://turnerj.com/blog/images/social/mongodb-loves-miniprofiler.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/mongodb-loves-miniprofiler</guid>
			<pubDate>Sat, 08 Jun 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;If you're familiar with .NET, you may have heard of an awesome project called &lt;a href="https://miniprofiler.com/"&gt;MiniProfiler&lt;/a&gt; made by the awesome folks at Stack Overflow.&lt;/p&gt;
&lt;p&gt;If you're unfamiliar, you may have gathered from the name it has something to do with profiling code - you wouldn't be wrong!&lt;/p&gt;
&lt;p&gt;MiniProfiler isn't designed to profile every method call - it is however designed to profile the calls you specifically want to know about. These include things like database calls, your controllers or your views. You can optionally profile any other code you want through MiniProfiler's API.&lt;/p&gt;
&lt;p&gt;MongoDB - some love it, some hate it but regardless I use it and I like it for my projects. What I don't like with MongoDB is the C# driver so as an ongoing project of mine, I have built my own wrapper for the MongoDB C# driver called &lt;a href="https://github.com/TurnerSoftware/MongoFramework"&gt;MongoFramework&lt;/a&gt;.
I have &lt;a href="https://turnerj.com/blog/mongo-what-now"&gt;written about this library previously here&lt;/a&gt; so I won't go into much detail besides saying it makes dealing with MongoDB similar to dealing with Entity Framework.&lt;/p&gt;
&lt;p&gt;MongoFramework however is how MiniProfiler and MongoDB meet. You see, MiniProfiler has official packages for profiling EF6 and EF Core but doesn't actually support profiling of MongoDB queries. As I had already written a wrapper around the official MongoDB driver, I was already in a good place to extend my own integration to support profiling.&lt;/p&gt;
&lt;p&gt;MongoFramework uses a diagnostic layer inspired by how MiniProfiler actually connects into EF Core. This diagnostic layer is per connection for MongoFramework and as an interface, can easily be swapped out for any other diagnostic tool. The diagnostic layer is invoked at every entity read/write as well as the creation of indexes.&lt;/p&gt;
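&lt;p&gt;To give a rough idea of the shape of that kind of swappable diagnostic layer, here is a minimal sketch. Every name in it (&lt;code&gt;IDiagnosticSink&lt;/code&gt;, &lt;code&gt;ConsoleDiagnosticSink&lt;/code&gt;, &lt;code&gt;ExampleConnection&lt;/code&gt;) is made up for illustration - it is not MongoFramework's actual API, just the general pattern of an interface per connection that gets invoked around each operation.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Diagnostics;

// Hypothetical names throughout - this is NOT MongoFramework's actual API.
public interface IDiagnosticSink
{
	// Called when a read, write or index operation starts.
	// Disposing the returned object marks the end of the operation.
	IDisposable BeginOperation(string kind, string description);
}

// A simple sink that times each operation and writes it to the console.
public class ConsoleDiagnosticSink : IDiagnosticSink
{
	public IDisposable BeginOperation(string kind, string description)
		=&amp;gt; new TimedOperation(kind, description);

	private sealed class TimedOperation : IDisposable
	{
		private readonly string _kind;
		private readonly string _description;
		private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

		public TimedOperation(string kind, string description)
		{
			_kind = kind;
			_description = description;
		}

		public void Dispose()
		{
			_stopwatch.Stop();
			Console.WriteLine($"[{_kind}] {_description} took {_stopwatch.ElapsedMilliseconds}ms");
		}
	}
}

public class ExampleConnection
{
	// Swappable per connection - a profiler-backed sink could be dropped in here instead.
	public IDiagnosticSink Diagnostics { get; set; } = new ConsoleDiagnosticSink();

	public void Read(string query)
	{
		using (Diagnostics.BeginOperation("Read", query))
		{
			// ... the actual driver call would happen here ...
		}
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the connection only knows about the interface, swapping console logging for a profiler is a one-line change where the connection is created.&lt;/p&gt;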
&lt;p&gt;&lt;img src="https://turnerj.com/images/miniprofiler-dialog.png" alt="MiniProfiler profiling dialog showing a MongoDB column"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/miniprofiler-mongodb-query.png" alt="MiniProfiler showing an actual MongoDB query"&gt;&lt;/p&gt;
&lt;p&gt;So if you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;are building a project in .NET&lt;/li&gt;
&lt;li&gt;are using MongoDB for persistence via MongoFramework&lt;/li&gt;
&lt;li&gt;are using or want to use MiniProfiler&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then this might be exactly what you are looking for! Check it out on &lt;a href="https://www.nuget.org/packages/MongoFramework.Profiling.MiniProfiler/"&gt;NuGet&lt;/a&gt; and &lt;a href="https://github.com/TurnerSoftware/MongoFramework/"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Bad design is everywhere</title>
			<link>https://turnerj.com/blog/bad-design-is-everywhere</link>
			<description>From code, to user interfaces and even doors, bad design is everywhere.</description>
			<enclosure url="https://turnerj.com/blog/images/social/bad-design-is-everywhere.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/bad-design-is-everywhere</guid>
			<pubDate>Tue, 23 Apr 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;Ever encountered poorly written code? How about a confusing UI in a program or web page? What about accidentally pulling a "Push" door? Each of these cases is a form of bad design through poor User Experience (UX) in their different mediums for their different users.&lt;/p&gt;
&lt;p&gt;Whether we are developers (back end, front end or hardware), graphic designers, systems engineers or managers, we all play a part in the overall experience of whatever we are making. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Working with spaghetti code has a poor UX for the developer that needs to deal with it.&lt;/li&gt;
&lt;li&gt;Code or servers that perform unnecessarily slowly (think waiting minutes for a web page to load) have poor UX, regardless of how nice your loading screen is.&lt;/li&gt;
&lt;li&gt;A website without appropriate accessibility considerations has a poor UX for those that rely on these aspects (eg. &lt;code&gt;alt&lt;/code&gt;/&lt;code&gt;aria&lt;/code&gt; attributes).&lt;/li&gt;
&lt;li&gt;Confusing/ambiguous language in a form or dialog has a poor UX as it increases the chances that the wrong option is selected. Even if the right option is selected, better language would improve the experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I mentioned earlier about accidentally pulling a "Push" door. This is actually quite a common &lt;em&gt;faux pas&lt;/em&gt; and comes from subtle design aspects of a door. Having a handle instinctively implies "Pulling" so unless a door can open both directions, only one side should have a handle. Doors with this issue, where you can't tell whether to "Push" or "Pull", are called &lt;a href="https://99percentinvisible.org/article/norman-doors-dont-know-whether-push-pull-blame-design/"&gt;Norman Doors&lt;/a&gt;, named after &lt;a href="https://en.wikipedia.org/wiki/Donald_Norman"&gt;Donald Norman&lt;/a&gt;. He wrote the popular book &lt;a href="https://en.wikipedia.org/wiki/The_Design_of_Everyday_Things"&gt;The Design of Everyday Things&lt;/a&gt; which goes into detail about &lt;a href="https://en.wikipedia.org/wiki/Affordance"&gt;"affordance"&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/User-centered_design"&gt;user-centered design&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/bad-design-is-everywhere-1-l0HlJJqClMNs0I6Dm.gif" alt="An animated example of pulling a &amp;quot;Push&amp;quot; door because of it having a handle."&gt;&lt;/p&gt;
&lt;h2 id="affordance-in-design"&gt;Affordance in Design&lt;/h2&gt;
&lt;p&gt;Affordance can be understood as "the action that is most implicit". Going with the door example earlier, a handle &lt;em&gt;affords&lt;/em&gt; pulling while a plate &lt;em&gt;affords&lt;/em&gt; pushing.&lt;/p&gt;
&lt;p&gt;The idea behind implicitly known actions is that you don't need to teach users how to interact with, use or understand everything. The less you need to teach a user, the better the experience.&lt;/p&gt;
&lt;p&gt;It is important to note that not every action is implicit to everyone - users come from different backgrounds and have different experiences. You don't teach advanced programming topics to absolute beginner programmers. Similarly, you don't design a command line interface for a non-technical user. You need to design for the consuming user.&lt;/p&gt;
&lt;h2 id="user-centered-design"&gt;User-centered Design&lt;/h2&gt;
&lt;p&gt;When thinking user-first in a design process, you would be asking questions like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Who are the users of the feature/change/document etc?&lt;/li&gt;
&lt;li&gt;What are the user's goals?&lt;/li&gt;
&lt;li&gt;What is their experience level with this type of feature/change/document?&lt;/li&gt;
&lt;li&gt;What functions does the feature/change/document need to perform?&lt;/li&gt;
&lt;li&gt;What environment/context is the user in?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let's look at an example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An employee and manager (the Users) have access to an Intranet which will have a new "Time sheet" system added.&lt;/li&gt;
&lt;li&gt;An employee would use it to put in time sheets (Employee Goal). A manager uses it to manage resources (Manager Goal).&lt;/li&gt;
&lt;li&gt;All employees know how to use such a system (Employee Experience Level) but not all managers do (Manager Experience Level).&lt;/li&gt;
&lt;li&gt;A manager might need to print a Work-In-Progress report (Manager Function). An employee needs to enter time for specific clients (Employee Function).&lt;/li&gt;
&lt;li&gt;Employees are often off-site (Employee Environment/Context).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When designing interfaces etc, breaking down a user like this is extremely helpful for identifying non-obvious requirements.&lt;/p&gt;
&lt;h2 id="developers-are-users-too"&gt;Developers are users too!&lt;/h2&gt;
&lt;p&gt;For developers, you actually have an additional user that isn't likely documented - your developer colleagues! Writing something only a handful of your colleagues would understand due to its complexity is pretty bad. Also writing something in a completely different platform than anyone has worked on before isn't going to win you any friends when something breaks at 3am and they need to debug it.&lt;/p&gt;
&lt;h2 id="everyday-ways-to-improve-ux"&gt;Everyday ways to improve UX&lt;/h2&gt;
&lt;p&gt;Depending on where you work or what you work on, you may or may not be "officially" involved with the UX of what you are building. I'm here to tell you that doesn't matter!&lt;/p&gt;
&lt;p&gt;It starts as simply as asking questions and looking at requirements from a different perspective. If you're a DB admin and know a slow query can be improved, tell your backend developers. If you are a backend developer that has just learnt about GraphQL, tell your frontend developers (if they didn't already know). If you're a frontend developer and see that fields 3, 5 and 7 in a particular form could be misinterpreted, talk to your boss about changing that. If you're a Senior Developer, help the Junior Developers on your team write easier to read code.&lt;/p&gt;
&lt;p&gt;Each little bit of UX improvement like faster queries, cleaner code, more flexible APIs and less ambiguous fields can help developers and end-users alike.&lt;/p&gt;
&lt;p&gt;Don't think because your job title says XYZ that you can't help in other ways! All our work combined forms the complete User Experience.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Thanks for reading!&lt;/p&gt;
&lt;p&gt;This post was partially inspired by &lt;a href="https://www.youtube.com/watch?v=yY96hTb8WgI"&gt;the Vox/99% Invisible video "It's not you. Bad doors are everywhere."&lt;/a&gt; which talks about bad design in real life. Check out that video for more details about user-centered design.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Building a Polite Web Crawler</title>
			<link>https://turnerj.com/blog/building-a-polite-web-crawler</link>
			<description>The culmination of multiple of my libraries</description>
			<enclosure url="https://turnerj.com/blog/images/social/building-a-polite-web-crawler.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/building-a-polite-web-crawler</guid>
			<pubDate>Sat, 13 Apr 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;Web crawling is the act of having a program or script accessing a website, capturing content and discovering any pages linked to from that content. On the surface it really is only performing HTTP requests and parsing HTML, both things that can be quite easily accomplished in a variety of languages and frameworks.&lt;/p&gt;
&lt;p&gt;Web crawling is an extremely important tool for search engines or anyone wanting to perform analysis of a website. The act of crawling a site though can consume a lot of resources for the site operator depending how the site is crawled.&lt;/p&gt;
&lt;p&gt;For example, if you crawl a 1000 page site in a few seconds, you've likely caused a not insignificant amount of server load for low-bandwidth hosting. What if you crawled a slow-loading page but your crawler didn't handle it properly, continuously re-querying the same page? What if you are just crawling pages that shouldn't be crawled? These things can lead to very upset website operators.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/building-a-polite-web-crawler-1-kkpcRessCvNyo.gif" alt="The character Maurice Moss from the TV Show &amp;quot;IT Crowd&amp;quot; throwing his computer monitor."&gt;&lt;/p&gt;
&lt;p&gt;In a previous article, &lt;a href="https://turnerj.com/blog/no-robots-allowed"&gt;I wrote about the Robots.txt file&lt;/a&gt; and how that can help address these problems from the website operator's perspective. Web crawlers should (but don't have to) abide by the rules defined in that file to prevent getting blocked. In addition to the Robots.txt file, there are some other things crawlers should do to avoid being blocked.&lt;/p&gt;
&lt;p&gt;When crawling a website on a large scale, especially for commercial purposes, it is a good idea to provide a custom User Agent, allowing website operators a chance to restrict what pages can be crawled.&lt;/p&gt;
&lt;p&gt;Crawl frequency is another aspect you will want to refine to allow you to crawl a site fast enough without being a performance burden. It is highly likely you will want to limit crawling to a handful of requests a second. It is also a good idea to track how long requests are taking and to start throttling the crawler to compensate for potential site load issues.&lt;/p&gt;
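&lt;p&gt;As a rough sketch of those ideas (a custom User Agent, a base delay between requests and backing off when responses get slow), something like the following could work. The &lt;code&gt;AdaptiveDelay&lt;/code&gt; and &lt;code&gt;PoliteFetchExample&lt;/code&gt; classes, the "ExampleCrawler/1.0" User Agent and the specific timings are all assumptions for the example rather than how any particular crawler does it.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: tracks the delay to wait between requests.
public class AdaptiveDelay
{
	private readonly TimeSpan _baseDelay;
	private readonly TimeSpan _slowThreshold;
	private readonly TimeSpan _maxDelay;

	public TimeSpan Current { get; private set; }

	public AdaptiveDelay(TimeSpan baseDelay, TimeSpan slowThreshold, TimeSpan maxDelay)
	{
		_baseDelay = baseDelay;
		_slowThreshold = slowThreshold;
		_maxDelay = maxDelay;
		Current = baseDelay;
	}

	// Call after each request with how long that request took.
	public void Record(TimeSpan requestDuration)
	{
		if (requestDuration &amp;gt; _slowThreshold)
		{
			// The site looks like it is struggling: double the delay, up to the cap.
			var doubled = TimeSpan.FromTicks(Current.Ticks * 2);
			Current = doubled &amp;lt; _maxDelay ? doubled : _maxDelay;
		}
		else
		{
			// The site has recovered: halve the delay, never going below the base delay.
			var halved = TimeSpan.FromTicks(Current.Ticks / 2);
			Current = halved &amp;gt; _baseDelay ? halved : _baseDelay;
		}
	}
}

public static class PoliteFetchExample
{
	public static async Task FetchAllAsync(Uri[] pages)
	{
		using (var client = new HttpClient())
		{
			// Identify the crawler so website operators can target it with their rules.
			client.DefaultRequestHeaders.UserAgent.ParseAdd("ExampleCrawler/1.0");

			// 250ms base delay is roughly four requests a second.
			var delay = new AdaptiveDelay(
				TimeSpan.FromMilliseconds(250),
				TimeSpan.FromSeconds(2),
				TimeSpan.FromSeconds(30));

			foreach (var page in pages)
			{
				var timer = Stopwatch.StartNew();
				using (var response = await client.GetAsync(page))
				{
					// ... read and parse the content here ...
				}

				delay.Record(timer.Elapsed);
				await Task.Delay(delay.Current);
			}
		}
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Doubling and halving is only one possible strategy - the important part is that the crawler reacts to what the site is telling it through its response times.&lt;/p&gt;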
&lt;h2 id="web-crawling-with-manners"&gt;Web crawling with manners&lt;/h2&gt;
&lt;p&gt;I spend my days programming in the world of .NET and had a need for a web crawler for a project of mine. There are some popular web crawlers already out there including &lt;a href="https://github.com/sjdirect/abot/"&gt;Abot&lt;/a&gt; and &lt;a href="https://github.com/dotnetcore/DotnetSpider"&gt;DotnetSpider&lt;/a&gt; however for different reasons they didn't suit my needs.&lt;/p&gt;
&lt;p&gt;I originally did have Abot set up in my project however I have been porting my project to .NET Core and it didn't support it. The library also uses a no-longer-supported version of a library that parses Robots.txt files.&lt;/p&gt;
&lt;p&gt;DotnetSpider does support .NET Core but it is designed around an entirely different process, using message queues, model binding and built-in DB writing. These are cool features but excessive for my own needs.&lt;/p&gt;
&lt;p&gt;I wanted a simple crawler, supporting async/await, with .NET Core support thus &lt;a href="https://github.com/TurnerSoftware/InfinityCrawler"&gt;InfinityCrawler&lt;/a&gt; was born!&lt;/p&gt;
&lt;p&gt;I'll be honest, I don't know why I called it InfinityCrawler - it sounded cool at the time so I just went with it.&lt;/p&gt;
&lt;p&gt;This crawler is in .NET Standard and builds upon both my &lt;a href="https://github.com/TurnerSoftware/SitemapTools"&gt;SitemapTools&lt;/a&gt; and &lt;a href="https://github.com/TurnerSoftware/RobotsExclusionTools"&gt;RobotsExclusionTools&lt;/a&gt; libraries. It uses the Sitemap library to help seed the list of URLs it should start crawling.&lt;/p&gt;
&lt;p&gt;It has built-in support for crawl frequency including obeying the frequency defined in the Robots.txt file. It can detect slow requests and auto throttle itself to avoid thrashing the website as well as detect when performance improves and return to normal.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using InfinityCrawler;

var crawler = new Crawler();
var results = await crawler.Crawl(siteUri, new CrawlSettings
{
	UserAgent = "Your Awesome Crawler User Agent Here"
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;InfinityCrawler, while available for use in any .NET project, is still in its early stages. I am happy with its core functionality but it will likely go through a few stages of restructuring as well as expanded testing.&lt;/p&gt;
&lt;p&gt;I am personally pretty proud of &lt;a href="https://github.com/TurnerSoftware/InfinityCrawler/blob/1cf1eebc6b2b2f204cb6cdc189ffa33e1001af16/src/InfinityCrawler/TaskHandlers/ParallelAsyncTaskHandler.cs"&gt;how I implemented the async/await part&lt;/a&gt; but would love to talk to anyone that is an expert in this area with .NET to check my implementation and give pointers on how to improve it.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>Halt and Hangfire</title>
			<link>https://turnerj.com/blog/halt-and-hangfire</link>
			<description>Scheduling background and recurring jobs with Hangfire</description>
			<enclosure url="https://turnerj.com/blog/images/social/halt-and-hangfire.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/halt-and-hangfire</guid>
			<pubDate>Tue, 09 Apr 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;I have a website and I want to schedule a task to run every X minutes. Majority of the time, you would reach for &lt;a href="https://en.wikipedia.org/wiki/Cron"&gt;Cron&lt;/a&gt;, throw together a fancy CRON expression and you would be on your way.&lt;/p&gt;
&lt;p&gt;This is great for a world where you deploy to a Linux server but isn't how you approach scheduling tasks on Windows. In Windows, you have the Task Scheduler and while it is powerful, I have always found it a bit cumbersome. In any case, using either Cron or Task Scheduler, you have achieved your goal of having a scheduled task run.&lt;/p&gt;
&lt;p&gt;Now, like many projects, requirements change and you not only need a scheduled task but adhoc background tasks too. That's fine, you can work out some way of instantiating another "instance" of your code in the background to do work. It might be something dodgy like a website doing a request to itself and not waiting for a response or something as simple as running a command on the shell. Again, your problems are "solved".&lt;/p&gt;
&lt;p&gt;Another change to requirements comes around (third time's the charm, am I right?) and now you need a dashboard to view and manage these tasks...&lt;/p&gt;
&lt;p&gt;I can't speak to how you would achieve this in other frameworks however if you are using .NET, you are in luck thanks to &lt;a href="https://www.hangfire.io/"&gt;Hangfire&lt;/a&gt; by &lt;a href="https://twitter.com/odinserj"&gt;Sergey Odinokov&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="whats-hangfire"&gt;What's Hangfire?&lt;/h2&gt;
&lt;p&gt;Hangfire is a library that allows you to have both scheduled and adhoc background tasks in your application, backed by persistent storage. These tasks can then be viewed through the Hangfire dashboard to see what is running, what will run and what has failed.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/hangfire-queue.png" alt="Screenshot of queues in Hangfire"&gt;&lt;/p&gt;
&lt;p&gt;Hangfire supports automatic retrying of tasks and can link into both your error logging and dependency injection systems making it easy to connect to your application.&lt;/p&gt;
&lt;p&gt;One of my favourite features of Hangfire however is that due to how it is built, it automatically supports distributed tasks across multiple servers. With this in mind, it is good to have many small tasks to make the most use of this.&lt;/p&gt;
&lt;p&gt;It isn't limited to ASP.NET applications either, you can have your Hangfire server be a Windows Service for all it cares!&lt;/p&gt;
&lt;p&gt;Hangfire is free for personal and commercial use, though premium features like batching/grouping tasks together are paid extras.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/hangfire-batch.png" alt="Screenshot of batching tasks in Hangfire"&gt;&lt;/p&gt;
&lt;h2 id="example"&gt;Example&lt;/h2&gt;
&lt;p&gt;Before anything else, you will need to add &lt;a href="https://www.nuget.org/packages?q=hangfire"&gt;Hangfire from NuGet&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For an ASP.NET Core application, you would need to update your &lt;code&gt;ConfigureServices&lt;/code&gt; method in your Startup class to include the following:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;public void ConfigureServices(IServiceCollection services)
{
	// ...

	services.AddHangfire(c =&amp;gt;
	{
		c.UseSqlServerStorage("YOUR_CONNECTION_STRING");
	});

	// ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While my example is using SQL Server, Hangfire supports various other storage systems including &lt;a href="https://github.com/sergeyzwezdin/Hangfire.Mongo"&gt;MongoDB&lt;/a&gt; or &lt;a href="https://www.hangfire.io/pro/#hangfireproredis"&gt;Redis&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now in the &lt;code&gt;Configure&lt;/code&gt; method of your Startup class, you need to actually trigger the server and dashboard (though dashboard is optional).&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;app.UseHangfireDashboard("/hangfire");
app.UseHangfireServer();
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now when your application begins, the Hangfire server will start and be able to process tasks. You will also be able to view the Hangfire dashboard at the path you specified on &lt;code&gt;UseHangfireDashboard&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So that gets the server started, what about the tasks that are meant to run in the background?&lt;/p&gt;
&lt;h3 id="scheduled-tasks"&gt;Scheduled Tasks&lt;/h3&gt;
&lt;p&gt;In your &lt;code&gt;Configure&lt;/code&gt; method in your Startup class, you will want to add something like the following for a scheduled task:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;RecurringJob.AddOrUpdate(() =&amp;gt; Console.Write("Look ma, a recurring task!"), "0 * * * *");
&lt;/code&gt;&lt;/pre&gt;
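&lt;p&gt;If you would rather not hand-write the CRON expression, Hangfire also has a &lt;code&gt;Cron&lt;/code&gt; helper class that builds the common ones for you. Here is a small sketch of the same hourly task, where the job ID "hourly-hello" is just a made-up name for the example:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using Hangfire;

// Cron.Hourly() produces the same "0 * * * *" expression as above.
// Giving the job an explicit ID ("hourly-hello" is made up) means later calls update it rather than add another.
RecurringJob.AddOrUpdate("hourly-hello", () =&amp;gt; Console.Write("Look ma, a recurring task!"), Cron.Hourly());
&lt;/code&gt;&lt;/pre&gt;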
&lt;p&gt;While that is a simple example, you do need to keep your background tasks simple because the expression is serialised. With how I use Hangfire for scheduled tasks, it looks more like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;RecurringJob.AddOrUpdate&amp;lt;MyBackgroundTaskClass&amp;gt;(instance =&amp;gt; instance.RunTask("Whatever", "arguments", 1, "like"), "MY CRON EXPRESSION");
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This would create an instance of &lt;code&gt;MyBackgroundTaskClass&lt;/code&gt; (with DI support) and call the &lt;code&gt;RunTask&lt;/code&gt; method with the specified arguments. You can pass various arguments to the tasks however for compatibility and simplicity, it is best to provide only essential arguments in the simplest form. This is because the data is serialised and some types serialise easier than others. Personally, I would pass IDs referencing items in the DB that the task should then query itself.&lt;/p&gt;
&lt;p&gt;In that example, neither the class &lt;code&gt;MyBackgroundTaskClass&lt;/code&gt; nor the &lt;code&gt;RunTask&lt;/code&gt; method needs to be anything special. They don't need special attributes, nor does the class need to inherit from a specific class or interface.&lt;/p&gt;
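&lt;p&gt;To make the "pass IDs, not objects" advice a little more concrete, here is a minimal sketch of what such a task might look like. The &lt;code&gt;RefreshReportTask&lt;/code&gt; class, the report ID of 42 and the "refresh-report-42" job ID are all hypothetical - the point is only that the class is a plain class and the argument is a simple value.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;
using Hangfire;

// Hypothetical task class - no Hangfire base class, interface or attributes required.
public class RefreshReportTask
{
	public void RunTask(int reportId)
	{
		// Query the DB for the report using the ID here, then do the actual work.
		Console.WriteLine($"Refreshing report {reportId}");
	}
}

public static class TaskRegistration
{
	public static void Register()
	{
		// Only the integer ID is serialised with the job, not a whole report object.
		RecurringJob.AddOrUpdate&amp;lt;RefreshReportTask&amp;gt;("refresh-report-42", task =&amp;gt; task.RunTask(42), Cron.Daily());
	}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Querying for the data inside the task also means the task works with whatever is in the DB when it runs, not a stale copy from when it was scheduled.&lt;/p&gt;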
&lt;h3 id="adhoc-tasks"&gt;Adhoc Tasks&lt;/h3&gt;
&lt;p&gt;For adhoc tasks anywhere in your application, it is just as simple:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;BackgroundJob.Enqueue&amp;lt;MyBackgroundTaskClass&amp;gt;(instance =&amp;gt; instance.RunTask("Whatever", "arguments", 1, "like"));
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same rules apply as for the recurring job - you will want to keep the arguments simple rather than passing through complex objects.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Hangfire provides a simple way for you to manage background tasks, scheduled and adhoc, for your .NET application. I'm only scratching the surface in this post, there are many more nuanced pieces of functionality that make Hangfire great.&lt;/p&gt;
&lt;p&gt;I highly recommend &lt;a href="https://docs.hangfire.io/en/latest/getting-started/index.html"&gt;going through the documentation&lt;/a&gt; for a more in-depth dive into Hangfire.&lt;/p&gt;
&lt;p&gt;Hope you enjoyed reading this and will give &lt;a href="https://www.hangfire.io/"&gt;Hangfire&lt;/a&gt; a look in your next project!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://turnerj.com/images/halt-and-hangfire-1-1179nmuBk7YPg4.gif" alt="The character Joe MacMillan in the TV Show &amp;quot;Halt and Catch Fire&amp;quot; saying &amp;quot;I didn't build this. I don't own it.&amp;quot;"&gt;&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
		<item>
			<title>The troubles of dealing with Time Zones</title>
			<link>https://turnerj.com/blog/when-you-diy-a-time-zone-library</link>
			<description>Don't DIY your own Time Zone library</description>
			<enclosure url="https://turnerj.com/blog/images/social/when-you-diy-a-time-zone-library.jpg" length="0" type="image" />
			<guid>https://turnerj.com/blog/when-you-diy-a-time-zone-library</guid>
			<pubDate>Wed, 03 Apr 2019 00:00:00 GMT</pubDate>
			<content:encoded>&lt;p&gt;You are an experienced developer who works on a Calendar application, allowing users to add events to their calendar including time and location. A new JIRA issue comes in for you to let users know when there is X amount of time before the event starts. To do this accurately with events potentially being in different regions, you need to think about time zones.&lt;/p&gt;
&lt;p&gt;Being the experienced developer you are, you know there has to be a list of time zones by country. All you need is that list and a way to calculate the difference between the time zones. When you find &lt;a href="https://en.wikipedia.org/wiki/List_of_time_zones_by_country"&gt;that list&lt;/a&gt;, you realise it is a bit more complicated than that with a number of countries having 2 or more time zones but no sweat, you have the location information of the event and the time zone the current user is in. You find &lt;a href="https://en.wikipedia.org/wiki/List_of_tz_database_time_zones"&gt;the more detailed list of time zones&lt;/a&gt; and continue on your development journey.&lt;/p&gt;
&lt;p&gt;Time zones are typically in 30-minute increments for most countries, with exceptions such as &lt;a href="https://en.wikipedia.org/wiki/UTC%2B05:45"&gt;Nepal&lt;/a&gt; which has been at offset +05:45 since 1986. Not a problem though, you wouldn't have constrained your system to only hold the data in 30-minute increments - you probably have it supporting individual minute increments.&lt;/p&gt;
&lt;p&gt;You might recall that &lt;a href="https://en.wikipedia.org/wiki/Daylight_saving_time"&gt;weird thing that happens twice a year&lt;/a&gt; which throws the time back or forward an hour depending on the time of year and hemisphere you're in. That starts to complicate things because now you need to have data on &lt;a href="https://en.wikipedia.org/wiki/Daylight_saving_time_by_country"&gt;all the countries that follow daylight saving time as well as which direction they go&lt;/a&gt; at different points in the year. Did I mention that DST doesn't occur on the same day for everyone?&lt;/p&gt;
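&lt;p&gt;To see how the same zone ends up with different offsets through the year, here is a small sketch using .NET's built-in &lt;code&gt;TimeZoneInfo&lt;/code&gt;. It assumes the "Australia/Sydney" ID can be resolved on your machine, which is typically the case on Linux and macOS but on Windows depends on your .NET version - treat it as an illustration rather than something to ship.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-csharp"&gt;using System;

public static class OffsetExample
{
	public static void Main()
	{
		// Assumption: this ID resolves on the current platform (see the note above).
		var sydney = TimeZoneInfo.FindSystemTimeZoneById("Australia/Sydney");

		var january = new DateTimeOffset(2019, 1, 15, 0, 0, 0, TimeSpan.Zero);
		var july = new DateTimeOffset(2019, 7, 15, 0, 0, 0, TimeSpan.Zero);

		// Same zone, different offsets depending on the time of year.
		Console.WriteLine(sydney.GetUtcOffset(january)); // 11:00:00 while DST is active
		Console.WriteLine(sydney.GetUtcOffset(july));    // 10:00:00 during standard time
	}
}
&lt;/code&gt;&lt;/pre&gt;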
&lt;p&gt;Hopefully you have realised by now that typing this data into your code or database would take way too long so you're looking up something you can import into your code, you know, &lt;em&gt;if&lt;/em&gt; the data ever changes. You find that the Internet Assigned Numbers Authority (IANA) has such &lt;a href="https://www.iana.org/time-zones"&gt;a database that you can download&lt;/a&gt;. Brilliant thing with this is that it contains DST information too! It seems like your luck has turned and this will still be easy.&lt;/p&gt;
&lt;p&gt;Unfortunately you also realise that this database was modified very recently and has an &lt;a href="https://mm.icann.org/mailman/listinfo/tz-announce"&gt;&amp;quot;Announcement&amp;quot; mailing list&lt;/a&gt;, which doesn't bode well for a once-off import. No problem though, that is just a CRON job calling an importer script in the scheme of things.&lt;/p&gt;
&lt;p&gt;You're gonna call it here and say this is good enough. You don't want to worry about political time zone issues, you don't need to worry about historic time zone issues and you definitely don't need to worry about interplanetary time zone issues. You tidy up the rest of your code, resolve the JIRA issue and go on your merry way.&lt;/p&gt;
&lt;p&gt;At least your project didn't need to calculate the seconds between two points in time, dependent on time zone...&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;This post is inspired by a fantastic &lt;a href="https://www.youtube.com/watch?v=-5wpm-gesOY"&gt;Computerphile video about time zones&lt;/a&gt; by &lt;a href="https://twitter.com/tomscott"&gt;Tom Scott&lt;/a&gt;. His example goes into some of the more nitty gritty issues like the problems with historic times or leap seconds and I highly recommend you watch it if this article was remotely interesting.&lt;/p&gt;
</content:encoded>
			<comments xmlns="http://purl.org/rss/1.0/modules/slash/">0</comments>
		</item>
	</channel>
</rss>