
Ref-structs are technically [Obsolete]

Dec 19, 2022

TL;DR: the Obsolete attribute on ref-structs tripped up my code coverage reports

It might not be the most recent addition to C#, having arrived in C# 7.2, but ref-struct support is awesome. For those that don't know, a ref-struct is a type that is allocated on the stack and can't escape to the managed heap. The most ubiquitous form of this is Span - a powerful type that can help avoid allocations in a number of scenarios and provide a consistent way to access different types of sequential data.
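
As a trivial sketch of what that looks like in practice (just example values here), a Span can wrap stack memory or slice existing data without any extra allocation:

using System;

// A Span over stack memory - nothing here touches the managed heap.
Span<byte> buffer = stackalloc byte[8];
buffer.Fill(0xFF);

// Slicing existing data (a string in this case) without copying it.
ReadOnlySpan<char> firstWord = "Hello World".AsSpan(0, 5);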

The restriction of not escaping to the managed heap means that you can't store ref-structs like Span on a class or even a normal struct. If you want to pass around a ref-struct as a member of another type, that type will itself need to be a ref-struct.
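
To make that concrete (the type names here are made up for the example), a Span field on a class won't compile, while wrapping it in another ref-struct is fine:

using System;

class SpanHolder
{
    // This does not compile - a ref-struct like ReadOnlySpan<byte> can only be
    // a field of another ref-struct:
    //public ReadOnlySpan<byte> Data;
}

// Making the wrapper itself a ref-struct is how you pass a span around as a member.
ref struct SpanWrapper
{
    public ReadOnlySpan<byte> Data;
}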

I'm a heavy Span user, particularly for my work with tokenizers and other parsing techniques. One design pattern I've used with tokenizers before is to hold the Span on the type itself, which means the tokenizer also needs to be a ref-struct.

Here is a basic example of what one might look like:

ref struct CustomReader
{
    private const byte EndOfFile = 0x00;

    public readonly ReadOnlySpan<byte> Data;
    private int Offset;

    public CustomReader(ReadOnlySpan<byte> data)
    {
        Data = data;
        Offset = 0;
    }

    public bool MoveNext(out Token token)
    {
        if (Offset >= Data.Length)
        {
            token = default;
            return false;
        }

        token = Data[Offset] switch
        {
            (byte)' ' or (byte)'\t' => ReadWhitespace(),
            (byte)'#' => ReadComment(),
            _ => ReadValue()
        };
        return true;
    }

    private byte Current
    {
        get
        {
            if (Offset < Data.Length)
            {
                return Data[Offset];
            }
            return EndOfFile;
        }
    }

    private Token ReadWhitespace()
    {
        var startIndex = Offset;
        Offset++;
        while (true)
        {
            switch (Current)
            {
                case (byte)' ':
                case (byte)'\t':
                    Offset++;
                    continue;
                default:
                    return new Token(TokenType.Whitespace, Data[startIndex..Offset]);
            }
        }
    }

    private Token ReadComment()
    {
        var startIndex = Offset;
        Offset++;
        var newLineIndex = Data[Offset..].IndexOfAny((byte)'\r', (byte)'\n');
        if (newLineIndex == -1)
        {
            Offset = Data.Length;
        }
        else
        {
            Offset += newLineIndex;
        }
        return new Token(TokenType.Comment, Data[startIndex..Offset]);
    }

    private Token ReadValue()
    {
        var startIndex = Offset;
        Offset++;
        while (true)
        {
            switch (Current)
            {
                case (byte)' ':
                case (byte)'\t':
                case (byte)'#':
                case EndOfFile:
                    return new Token(TokenType.Value, Data[startIndex..Offset]);
                default:
                    Offset++;
                    break;
            }
        }
    }
}

readonly ref struct Token
{
    public readonly TokenType TokenType;
    public readonly ReadOnlySpan<byte> Value;

    public Token(TokenType tokenType, ReadOnlySpan<byte> value)
    {
        TokenType = tokenType;
        Value = value;
    }
}

enum TokenType
{
    Whitespace,
    Comment,
    Value
}

I can then use a tokenizer like this quite simply:

using System.Text;

var reader = new CustomReader("Hello World  # Test Comment"u8);

while (reader.MoveNext(out var token))
{
    Console.WriteLine($"{token.TokenType}: \"{Encoding.UTF8.GetString(token.Value)}\"");
}

// Output:
//  Value: "Hello"
//  Whitespace: " "
//  Value: "World"
//  Whitespace: "  "
//  Comment: "# Test Comment"

While this tokenizer is pretty simple, once you have more token types or obscure optimizations, things can get pretty complex pretty fast. I'd typically write a bunch of unit tests for such a tokenizer and leverage code coverage reports to ensure I've got good coverage across the code. Across my open source libraries, I use much the same set of rules for what to include in and exclude from coverage.

Here is a typical CodeCoverage.runsettings file I might have:

<?xml version="1.0" encoding="utf-8"?>  
<RunSettings>  
  <DataCollectionRunSettings>  
    <DataCollectors>
      <DataCollector friendlyName="XPlat code coverage">
        <Configuration>
          <Format>cobertura</Format>
          <Exclude>[MongoFramework.Tests]*</Exclude>
          <Include>[MongoFramework]*,[MongoFramework.*]*</Include>
          <ExcludeByAttribute>Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute</ExcludeByAttribute>
          <UseSourceLink>true</UseSourceLink>
          <SkipAutoProps>true</SkipAutoProps>
        </Configuration>
      </DataCollector>
    </DataCollectors>  
  </DataCollectionRunSettings>  
</RunSettings>

One of the specific rules I have is excluding code marked with an Obsolete attribute. I find this useful because if I've reached the point of marking some code as obsolete, I don't really care about its coverage anymore. The key part here though is that it's me marking the code as obsolete, which is what brings me back to ref-structs.
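
For example, a member marked like this (made-up names) is the kind of thing the ExcludeByAttribute rule above drops from the report:

using System;

public class ConnectionHelper
{
    [Obsolete("Use OpenConnection instead")]
    public void OldOpenConnection()
    {
        // Legacy code path whose coverage I no longer care about.
    }

    public void OpenConnection()
    {
        // Current code path - still counted for coverage.
    }
}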

When I was writing a particular tokenizer, very similar to the one above, I had the Obsolete attribute excluded from my code coverage reports. What I found though was that my entire tokenizer was being excluded too, which didn't make sense to me. My tokenizer clearly wasn't compiler generated, wasn't using auto-props and wasn't in the wrong namespace or assembly - something else was tripping up the report.

I couldn't understand why, so I had the thought of decompiling the source with JetBrains' dotPeek. I've used that tool a bunch before for debugging weird compilation issues and thought it would give me some insight - it didn't disappoint.

Here is what my tokenizer looked like:

[IsByRefLike]
[Obsolete("Types with embedded references are not supported in this version of your compiler.", true)]
[CompilerFeatureRequired("RefStructs")]
internal ref struct CustomReader
{

I definitely didn't put that Obsolete attribute there so what is going on?

Why are ref-structs [Obsolete]?

The message in the Obsolete attribute probably gives it away - it is a safeguard so that older compilers, which don't know the rules around handling ref-structs, will refuse to compile code that uses them.

Microsoft does explain why in the docs if you know where to look:

Having no other good alternatives that work in old compilers without servicing, an Obsolete attribute with a known string will be added to all ref-like structs. Compilers that know how to use ref-like types will ignore this particular form of Obsolete.

NOTE: it is not the goal to make it so that any use of ref-like types on old compilers fails 100%. That is hard to achieve and is not strictly necessary. For example there would always be a way to get around the Obsolete using dynamic code or, for example, creating an array of ref-like types through reflection.

In particular, if user wants to actually put an Obsolete or Deprecated attribute on a ref-like type, we will have no choice other than not emitting the predefined one since Obsolete attribute cannot be applied more than once.

While this was definitely a surprise to find out, I also think it is actually pretty clever. Using an attribute that older compilers already understand to make them fail on something they don't is smart. It's also good that the docs call out the scenario where you want to put your own Obsolete attribute on the type - it would have been another confusing issue if you couldn't.
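
So if you do put your own Obsolete attribute on a ref-struct (a made-up example below), that one is emitted instead of the predefined one:

using System;

// The compiler skips emitting its predefined Obsolete attribute here because
// Obsolete can't be applied more than once - only this message is emitted.
[Obsolete("Use CustomReader instead")]
ref struct LegacyReader
{
    public ReadOnlySpan<byte> Data;
}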

Unfortunately, even knowing this, I'm relatively limited in what I can do about my code coverage problem - if I want a ref-struct in my code coverage reports, I can't exclude the Obsolete attribute anymore. I've opened an issue with Coverlet about working around it on their side, perhaps by looking at the specific message on the attribute.
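
For now, the only lever on my side is dropping Obsolete from the exclusion list in the runsettings file:

<!-- Keeps ref-structs in the report, at the cost of also counting code I deliberately marked obsolete myself -->
<ExcludeByAttribute>GeneratedCodeAttribute,CompilerGeneratedAttribute</ExcludeByAttribute>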

Alternatively, I can avoid making my tokenizer a ref-struct at all and have it hold a ReadOnlyMemory instead of a ReadOnlySpan, which doesn't have the same restrictions. This is the approach I took for a recent tokenizer I wrote for my Robots Exclusion Tools library.
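
A rough sketch of that shape (not the actual code from the library) looks like this - the type holds the ReadOnlyMemory and only grabs a span inside each method:

using System;

struct MemoryBackedReader
{
    private readonly ReadOnlyMemory<byte> Data;
    private int Offset;

    public MemoryBackedReader(ReadOnlyMemory<byte> data)
    {
        Data = data;
        Offset = 0;
    }

    public bool TryReadByte(out byte value)
    {
        // The span only ever lives on the stack inside this method,
        // so the containing type doesn't need to be a ref-struct.
        var span = Data.Span;
        if (Offset < span.Length)
        {
            value = span[Offset++];
            return true;
        }

        value = default;
        return false;
    }
}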

Anyway, if you find yourself scratching your head as to why your custom ref-struct isn't in a code coverage report, it might just be because ref-structs are technically obsolete.

This post is part of the 2022 C# advent calendar. Check out the other articles posted there as part of the event.