The pain points of C# source generators - Turnerj (aka. James Turner)

Update (February 2022): Debugging source generators is now a lot better, and there is an update on transient dependencies.

I've recently completed my first foray into writing a C# source generator for Schema.NET. There is a lot to like about source generators; however, there are a few things I wish I had understood better before diving in.

For those who are unaware, source generators are a new C# compiler feature that lets you analyse existing source code and generate new source code, all from C# itself. One area where this is of interest is serialization: generating an optimal serializer at compile time removes the need to use reflection at runtime.

In Schema.NET, we had hundreds of classes and interfaces that mapped to Schema.org types. While we had our own tool to generate these, the generated files sat in our Git repository creating a lot of noise when trying to change our tooling behaviour. Source generators would allow us to remove these files and have them exist only as part of the compiled binary. The move to source generators was also a good time to refactor the generating logic itself, making it easier to add new features later.

Pain Point 1: Debugging Source Generators

Honestly I expected the debugging process to be:

Put a breakpoint in the source generator code
Press the "Debug" button in Visual Studio
Code stops at the breakpoint

Unfortunately, it isn't that simple. The source generator runs during compilation, but the debugging session only starts after compilation has finished, so our breakpoint would never be hit. After some research, I found two commonly suggested methods.

Invoke the debugger from the source generator

I found this solution on Nick's .NET Travels. Inside our source generator, likely in the Initialize method, we can launch a debugger and attach it to the current process with the following:

#if DEBUG
if (!Debugger.IsAttached)
{
    Debugger.Launch();
}
#endif

Here we use the #if preprocessor directive so this code is only included when the DEBUG symbol is defined, as it is by default in the "Debug" build configuration. In that configuration, we check whether a debugger is already attached and, if not, launch and attach one via Debugger.Launch(). When that happens, a prompt appears asking which debugger to use (I chose a new instance of Visual Studio). From there, execution is paused on the Debugger.Launch() line and the new Visual Studio instance will stop at any breakpoints you add.

I probably spent a good few hours using this method and, while it works, it is not a great experience. For starters, the prompt I mentioned appeared multiple times during a single debugging session. I'm not sure whether that was related to different target frameworks building simultaneously or to some timeout logic in the build process. Additionally, Visual Studio crashed a few times, in both of the instances I had open.

Don't take my word for it; others have had similar difficulties.

Run the source generator manually

A source generator is effectively like any other class: we can instantiate it and call its initialization methods ourselves. There is a detailed document in the Roslyn repo that covers all sorts of things with regard to source generators, and one of its sections specifically covers testing them.

Here is a modified version of their example that shows the general gist:

Compilation inputCompilation = CreateCompilation(&#64;"
namespace MyCode
{
    public class Program
    {
        public static void Main(string[] args)
        {
        }
    }
}
");

// CustomGenerator is a placeholder for your own ISourceGenerator implementation
CustomGenerator generator = new CustomGenerator();

// Create the driver that will control the generation, passing in our generator
GeneratorDriver driver = CSharpGeneratorDriver.Create(generator);

// Run the generation pass
driver.RunGeneratorsAndUpdateCompilation(inputCompilation, out var outputCompilation, out var diagnostics);

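// Builds a compilation from the given source, referencing the core library (System.Reflection.Binder lives there)
static Compilation CreateCompilation(string source)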
    => CSharpCompilation.Create("compilation",
        new[] { CSharpSyntaxTree.ParseText(source) },
        new[] { MetadataReference.CreateFromFile(typeof(Binder).GetTypeInfo().Assembly.Location) },
        new CSharpCompilationOptions(OutputKind.ConsoleApplication));

Basically, this creates a compilation that the source generator can run against. This approach can be quite verbose because, depending on your source generator, you may need a lot of boilerplate source code for it to work on.

In my case with Schema.NET, I'm generating hundreds of classes based on some JSON, so I need minimal boilerplate. I could have gone this route; however, I decided on a more direct approach:

var generator = new SchemaSourceGenerator();
generator.Initialize(new Microsoft.CodeAnalysis.GeneratorInitializationContext());
generator.Execute(new Microsoft.CodeAnalysis.GeneratorExecutionContext());

My generator didn't care about any existing syntax tree; its job was just to pump out new classes and interfaces. This method does have a bit of a fatal flaw, though: calling most (any?) of the methods on GeneratorInitializationContext or GeneratorExecutionContext may fail. Instances created this way don't have their properties configured, which is something the more verbose approach above takes care of. For my SchemaSourceGenerator, I needed to comment out context.AddSource(sourceName, sourceText) so it wouldn't throw an exception.

My recommendation for anyone working on a source generator is to either create a separate console application to debug it or write a dedicated unit test. Do it properly, though, and use the more verbose compilation code shown in the earlier example so you don't need to modify your source generator to run it.

Pain Point 2: No Async/Await

The methods exposed by source generators (Initialize and Execute) do not return tasks, so you can't await async APIs in them. According to the Roslyn team, this is by design, as the IO for reading and writing files is handled by the compiler.

For Schema.NET, we make an HTTP request to get the JSON we generate code from. There are reasons this isn't a good idea, but it is what we do and it works well for us. For a long time, HttpClient only had async APIs, and while that is changing (.NET 5 added a synchronous Send method), source generators must target .NET Standard 2.0, so we can't take advantage of that change.

My first iteration of getting the source generator to work was effectively wrapping my code in a Task.Run() call:

public void Initialize(GeneratorInitializationContext context) => Task.Run(async () =>
{
    ...

    SchemaObjects = await schemaService.GetObjectsAsync();
}).GetAwaiter().GetResult();

This admittedly worked, but I really didn't like it: it felt like a kludge. There is a lot of information available about when and where you should use Task.Run(), and Stephen Cleary has a good blog post or two about it. While a source generator may well be a special case where "it depends", I still decided to change it. I ended up calling .GetAwaiter().GetResult() directly on my async method instead.

public void Initialize(GeneratorInitializationContext context)
{
    ...

    SchemaObjects = schemaService.GetObjectsAsync().GetAwaiter().GetResult();
}

I'll be honest - I don't know if this is technically better in this scenario but I know it works.
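To make the two approaches a little more concrete, here is a minimal, self-contained sketch. GetObjectsAsync is a hypothetical stand-in for Schema.NET's schema service (it waits briefly with Task.Delay instead of making an HTTP request), and LoadViaTaskRun and LoadDirectly are illustrative names, not Schema.NET code. Both approaches block until the result is ready; one concrete difference compared with .Result is that GetAwaiter().GetResult() rethrows the original exception instead of wrapping it in an AggregateException.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for schemaService.GetObjectsAsync(): it waits briefly
// instead of making an HTTP request, then returns a fixed list of type names.
// ConfigureAwait(false) means the continuation doesn't need to resume on a
// captured synchronization context.
static async Task<List<string>> GetObjectsAsync(bool fail = false)
{
    await Task.Delay(10).ConfigureAwait(false);
    if (fail)
    {
        throw new InvalidOperationException("Could not load schema data.");
    }
    return new List<string> { "Thing", "Person", "Organization" };
}

// First approach: run the async work on the thread pool, then block on it.
static List<string> LoadViaTaskRun() =>
    Task.Run(() => GetObjectsAsync()).GetAwaiter().GetResult();

// Second approach: block directly on the async method.
static List<string> LoadDirectly() =>
    GetObjectsAsync().GetAwaiter().GetResult();

Console.WriteLine($"{LoadViaTaskRun().Count} / {LoadDirectly().Count}"); // prints "3 / 3"

// GetAwaiter().GetResult() rethrows the original exception...
try
{
    GetObjectsAsync(fail: true).GetAwaiter().GetResult();
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"GetResult threw {ex.GetType().Name}");
}

// ...whereas .Result wraps it in an AggregateException.
try
{
    _ = GetObjectsAsync(fail: true).Result;
}
catch (AggregateException ex)
{
    Console.WriteLine($"Result threw AggregateException wrapping {ex.InnerException?.GetType().Name}");
}
```

Either way the calling thread is blocked while the work runs; the sketch doesn't settle which variant is "better" inside a generator, it only shows how they behave.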

Pain Point 3: Transient Dependencies

An issue with dependencies was something I wasn't expecting at all when I started on my source generator; why would it be? Every other library and application I've written in C# in the last few years follows a fairly predictable pattern of using a <PackageReference> to define the package and version. Adding a package reference works the same way for source generators; it's all the other bits they now also require that differ.

For Schema.NET, our source generator was parsing JSON, so we needed a serializer. We were previously using Newtonsoft.Json for our tool; in this refactor, we also moved to System.Text.Json for parsing the initial schema data from Schema.org. This dependency only needs to exist for the generator, not for the library the generator creates classes for. Normally you can just specify PrivateAssets="all" on the package reference and that's it, but for source generators you need to specify a few more things:

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\System.Text.Json.dll" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>

Not too bad, right? Well, what if I told you that you need to do this for every dependency? By that I mean every dependency in the dependency tree, which for us was:

Microsoft.Bcl.AsyncInterfaces, 5.0.0
System.Buffers, 4.5.1
System.Memory, 4.5.4
- System.Numerics.Vectors, 4.4.0
System.Numerics.Vectors, 4.5.0
System.Runtime.CompilerServices.Unsafe, 5.0.0
System.Text.Encodings.Web, 5.0.0
System.Threading.Tasks.Extensions, 4.5.4

Our example would look more like:

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="Microsoft.Bcl.AsyncInterfaces" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="5.0.0" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Threading.Tasks.Extensions" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Text.Encodings.Web" Version="5.0.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Buffers" Version="4.5.1" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Memory" Version="4.5.4" GeneratePathProperty="true" PrivateAssets="all" />
    <PackageReference Include="System.Numerics.Vectors" Version="4.4.0" GeneratePathProperty="true" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Json)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGMicrosoft_Bcl_AsyncInterfaces)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Runtime_CompilerServices_Unsafe)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Threading_Tasks_Extensions)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Buffers)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Memory)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Numerics_Vectors)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
        <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Encodings_Web)\lib\netstandard2.0\*.dll" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>
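Because both lists in that project file follow a fixed pattern, one option for maintaining them by hand with fewer typos is to generate the XML. The following is a hypothetical helper (not part of Schema.NET or any NuGet tooling) that builds the PackageReference and TargetPathWithTargetPlatformMoniker entries from package IDs and versions. It assumes the property naming used above: GeneratePathProperty exposes a PKG-prefixed property with the dots in the package ID replaced by underscores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Assumption: GeneratePathProperty="true" exposes each package's folder as a
// property named "Pkg" + the package ID with '.' replaced by '_'. MSBuild property
// names are case-insensitive, so this matches $(PKGSystem_Text_Json) above.
static string PathProperty(string packageId) => "PKG" + packageId.Replace('.', '_');

// A <PackageReference> that stays private to the generator project.
static XElement PackageReference(string id, string version) =>
    new XElement("PackageReference",
        new XAttribute("Include", id),
        new XAttribute("Version", version),
        new XAttribute("GeneratePathProperty", "true"),
        new XAttribute("PrivateAssets", "all"));

// A <TargetPathWithTargetPlatformMoniker> pointing at the package's netstandard2.0 DLLs.
static XElement TargetPath(string id) =>
    new XElement("TargetPathWithTargetPlatformMoniker",
        new XAttribute("Include", $@"$({PathProperty(id)})\lib\netstandard2.0\*.dll"),
        new XAttribute("IncludeRuntimeDependency", "false"));

// Hypothetical input: a few package IDs and versions from the dependency tree above.
var packages = new Dictionary<string, string>
{
    ["System.Text.Json"] = "5.0.1",
    ["System.Buffers"] = "4.5.1",
    ["System.Memory"] = "4.5.4",
};

var references = new XElement("ItemGroup",
    packages.Select(p => PackageReference(p.Key, p.Value)));
var target = new XElement("Target",
    new XAttribute("Name", "GetDependencyTargetPaths"),
    new XElement("ItemGroup", packages.Keys.Select(TargetPath)));

Console.WriteLine(references);
Console.WriteLine(target);
```

You would still need to paste the output into the project file and re-run it whenever the dependency tree changes, so this only reduces typing, not the tracking burden.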

If any of these dependencies pick up new dependencies themselves, those need to be included too. This can happen even with patch version changes: System.Text.Encodings.Web going from 5.0.0 to 5.0.1 picked up a few new dependencies.
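The "every dependency in the dependency tree" requirement is a transitive closure over the package graph, which is why a patch release that adds a dependency quietly enlarges the set you have to maintain. The sketch below illustrates this with a plain breadth-first search over a made-up graph: PackageA stands in for a directly referenced package such as System.Text.Json, and all package names and edges here are hypothetical, not real NuGet data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Breadth-first walk of a dependency graph: returns every package reachable from
// `root`, i.e. every package that would need its own entries in the project file.
static IReadOnlyCollection<string> TransitiveDependencies(
    string root, IReadOnlyDictionary<string, string[]> graph)
{
    var seen = new HashSet<string>();
    var queue = new Queue<string>();
    queue.Enqueue(root);
    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (!graph.TryGetValue(current, out var children))
        {
            continue; // leaf package: no dependencies of its own
        }
        foreach (var child in children)
        {
            if (seen.Add(child)) // only enqueue packages we haven't visited yet
            {
                queue.Enqueue(child);
            }
        }
    }
    return seen;
}

// Hypothetical graphs before and after a patch release of PackageB adds PackageD.
var before = new Dictionary<string, string[]>
{
    ["PackageA"] = new[] { "PackageB", "PackageC" },
    ["PackageB"] = new[] { "PackageC" },
};
var after = new Dictionary<string, string[]>
{
    ["PackageA"] = new[] { "PackageB", "PackageC" },
    ["PackageB"] = new[] { "PackageC", "PackageD" },
};

var oldSet = TransitiveDependencies("PackageA", before);
var newSet = TransitiveDependencies("PackageA", after);

// Packages that now also need PackageReference and TargetPath entries.
Console.WriteLine(string.Join(", ", newSet.Except(oldSet))); // prints "PackageD"
```

In a real project, NuGet restore already resolves this graph and records it in obj/project.assets.json; the sketch only shows the shape of the problem.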

Currently for Schema.NET, I'm only specifying System.Text.Json and System.Text.Encodings.Web directly, which allows our builds to work on our CI, but Visual Studio complains during the build. I raised an issue with the Roslyn team about this extra weird behaviour, though it seems to come down to a difference between builds run on .NET Framework (Visual Studio and MSBuild) and on .NET Core (dotnet build).

My biggest gripe here though is: Why doesn't the compiler just do this for us?

The compiler knows all our dependencies, so with some sort of flag to indicate that a project is a source generator, it could do all this work for us. Keeping track of every transient dependency whenever any dependency gets an update is a burden I don't want.

Potential Transient Dependency Workaround

While not a perfect solution, if you are like me and really don't like specifying every package reference in the dependency tree like that, you can automate it somewhat with a custom MSBuild target.

<ItemGroup>
    <PackageReference Include="System.Text.Json" Version="5.0.1" PrivateAssets="all" />
</ItemGroup>

<PropertyGroup>
    <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths" AfterTargets="ResolvePackageDependenciesForBuild">
    <ItemGroup>
        <TargetPathWithTargetPlatformMoniker Include="@(ResolvedCompileFileDefinitions)" IncludeRuntimeDependency="false" />
    </ItemGroup>
</Target>

This "works" in the sense that ResolvedCompileFileDefinitions does contain a list of our transient dependencies, so everything that needs to be passed in is passed in. The problem with this solution is that ResolvedCompileFileDefinitions contains more than the specific dependencies we want, which could lead to undesired behaviour.

Ideally, I'd like something like this to be an automatic target for source generator projects, but refined to include only private dependencies so they are bundled correctly.

Conclusion: Was migrating to source generators worth it?

Yes.

Switching to source generators, combined with my refactor, added 700 lines of code while removing 69,203 lines of code. My pull request affected 765 files, the vast majority being generated classes and interfaces that no longer need to sit in the repository.

The refactor of our generation code also sets us up nicely for the future where we can support pending Schema.org types (something that has been requested by a few people).

While these pain points are annoying, source generators are a great feature, and I hope tooling updates will improve the developer experience.