Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSHARP-5202: BSON Binary Vector Subtype Support #1581

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

BorisDog
Copy link
Contributor

No description provided.

@BorisDog BorisDog requested a review from a team as a code owner December 27, 2024 00:28
@BorisDog BorisDog requested review from JamesKovacs, papafe and sanych-sun and removed request for a team and JamesKovacs December 27, 2024 00:28
Copy link
Contributor

@papafe papafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it makes sense, I just got some notes/questions

src/MongoDB.Bson/ObjectModel/BsonVector.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/ObjectModel/BsonVector.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
BorisDog and others added 2 commits January 2, 2025 14:59
Co-authored-by: Ferdinando Papale <[email protected]>
@BorisDog BorisDog requested a review from papafe January 2, 2025 23:00
src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
{
var itemType = type.GetElementType();
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType);
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move this into BsonVectorArraySerializer.Create(Type itemType) static factory method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and more similar below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible as BsonVectorArraySerializer and the rest are generic classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can create non-generic static class just to have this factory methods. We are doing it in other places. (far example, NullableSerializer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we override Equals for the Serializer to make proper comparison of serializers? It might be important if Vestors could be used in LINQ statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss if it's safe to have comparison implemented in base class only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't override Equals if no private data was introduced.

public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
where T : struct
{
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should ensure we are allowed to do that and throw in case wrong endian-platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Done.

{
internal static class BsonVectorWriter
{
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The name of the method is a little confusing to me. If it's WriteXXX I expect to pass target into the method, like a stream of buffer where I want it to be written to. In this case, I think FormatXXX and ParseXXX would work better instead of Write and Read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

Copy link
Contributor Author

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add support for VectorSearch.
Test cluster setup is WIP.

src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
{
var itemType = type.GetElementType();
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType);
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible as BsonVectorArraySerializer and the rest are generic classes.

{
internal static class BsonVectorWriter
{
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
where T : struct
{
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Done.

/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@BorisDog BorisDog requested a review from sanych-sun January 27, 2025 20:01
/// </summary>
/// <typeparam name="TItemCollection">The collection type.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move each serializer into it's own file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested classes are very simple just a few lines long. I find it more readable to keep all in the same file in this case. But we can discuss further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have often bundled very tightly related classes together in a single file. I'm fine with this.

/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss if it's safe to have comparison implemented in base class only.

@BorisDog BorisDog requested review from rstam and sanych-sun January 28, 2025 22:03
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. But I have a lot of naming suggestions and a few refactoring suggestions. Let me know what you think.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better name for this class is BsonVector (no Base needed in the name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I've started with, but then I realized that there is some ambiguity with BsonVectorAttribute:
When trying to define the following poco without including using MongoDB.Bson.Serialization.Serializers; namespace, user unclear error without clear suggested resolution.

class POCO
{
     [BsonVector(...)]
     public int[] Vector { get; set; }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think [BsonVector(...)] should be [BsonVectorRepresentation(...)] which would eliminate this conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar class is BsonValue which is the abstract base class for multiple types of BSON values.

Here BsonVector is the base class for multiple types of BSON vectors.

Note that we did not name BsonValue as BsonValueBase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that without considering BsonVectorAttribute, BsonVector is more natural name.
But I still think it's preferable to reserve BsonVector for BsonVectorAttribute, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase are expected to be rare.

/// <summary>
/// Gets the vector.
/// </summary>
public ReadOnlyMemory<T> Vector { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec calls this property Data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

/// Represents the data type of BSON Vector.
/// </summary>
/// <seealso cref="BsonVectorBase{T}"/>
public enum BsonVectorDataType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename file to match enum name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


namespace MongoDB.Bson.Serialization.Serializers
{
internal static class BsonVectorSerializerBase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this static class should be named just BsonVectorSerializer (no Base in the name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return (elements, padding, vectorDataType);
}

public static (ReadOnlyMemory<byte> VectorDataBytes, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is As redundant in the name? Could be ReadBsonVectorBytes or ReadBsonVectorData.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorDataBytes could be shortened to Bytes if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

var vectorDataSpan = vectorData.Span;
var vectorDataType = (BsonVectorDataType)vectorDataSpan[0];

var paddingSizeBits = vectorDataSpan[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable could be named just padding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return (vectorData.Slice(2), paddingSizeBits, vectorDataType);
}

private static BsonVectorBase<T> CreateBsonVector<T>(T[] elements, byte padding, BsonVectorDataType vectorDataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a static factory method in the BsonVector<T> class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so: I didn't see a need for a such a factory outside this class. Also I think BsonVector itself should be just a data holder, decoupled from its factories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
internal static class BsonVectorWriter
{
public static byte[] BsonVectorToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming this method WriteBytes or WriteToBytes.

The "from" is implied by the parameter types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return VectorDataToBytes(bsonVector.Vector.Span, bsonVector.DataType, padding);
}

public static byte[] VectorDataToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming WriteToBytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 30, 2025 00:31
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I have so many requested naming changes.

But specially where things are entering the public API it is important to get the names right because we can't change them later.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think [BsonVector(...)] should be [BsonVectorRepresentation(...)] which would eliminate this conflict.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected internal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think <T> should be <TItem> for consistency and as a hint of what T actually is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Sets the representation for this field or property as BSON Vector and specifies the serialization options.
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute
Copy link
Contributor

@rstam rstam Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this attribute should be named BsonVectorRepresentationAttribute.

When we use:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A { get; set; }

what we are saying is that A is REPRESENTED as a BsonVector, not that it IS a BsonVector (it's not, it's a byte[]).

Other similarly named attributes that are used to control representation are BsonRepresentation and BsonGuidRepresentation.

Unfortunately there are some other attributes where we neglected to follow this naming principle. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your argument that you wanted a short name for the attribute, but optimal names are not just about length, after all if that were true why not this:

[BV(BVDT.I8)]

I think having Representation in the attribute name makes it self explanatory that we are configuring the representation of A.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it's a BsonVector, just represented by byte[] :)

I think that within the given context, it's clear that [BsonVector(...)] refers to the bson representation of the property on the wire. I don't think that adding "Representation" actually adds any addition information. In fact I see BsonVectorAttribute closer to BsonSerializerAttribute.

BsonGuidRepresentation on the other hand switches between actual representation of the guid and BsonRepresentation switches between different bson representations. Which is not the case with BsonVector, which can be viewed as a private case of BsonRepresentation, like a hypothetical "BsonDoubleAttribute.

It even can argued that [BsonVectorRepresentation(BsonVectorDataType.Int8)] is more confusing. It might sound that this attribute configures the representation of the field when serialized as BsonVector type and leaves the question of the actual bson type open. For example user could expect to have:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
[BsonArrayRepresentation(BsonType.Binary)]
public byte[] A { get; set; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it's a BsonVector, just represented by byte[] :)

Yes but... serializers have NO control over how things are represented in memory. That is totally defined by the C# data type that this is a serializer for (in this case byte[]). In the context of configuring serialization "representation" ALWAYS means how things are represented IN THE DATABASE.

In this case the in memory representation is byte[] and the database representation is BsonVector. The only thing that can be configured is the database representation.

I think the disconnect here is that I am looking at this from the perspective of "configuring the representation of the C# value in the database" and you are coming from the opposite direction. But as I said, in the case of serializer configuration it is always the database representation that is being configured.

Since I'm thinking about this in terms of "configuring serialization" which in turn means "configuring the database representation" that is why to me [BsonVector(...)] by itself makes no sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding:

[BsonVectorRepresentation(BsonVectorDataType.Int8)] 
[BsonArrayRepresentation(BsonType.Binary)] 
public byte[] A { get; set; }

the problem of a user applying contradictory attributes to the same member is not one we have solved.

Also, BsonArrayRepresentation doesn't exist but I guess you are using it as a hypothetical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to think about this some more...

What I want is a name that clearly implies "A is represented in the database as a BsonVector".

And not one that reads like "A is a BsonVector", because it is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the word "representation" is relevant to most of our bson attributes and config options. So it becomes redundant at some point. For example BsonTimeSpanOptions can be thought as BsonTimeSpanRepresentation, as it does configure how the BsonTime is represented.

I don't think having BsonVectorRepresentation is entirely wrong, just not very accurate. We are configuring the bson representation of the clr type, so the ideal name probably would be BsonRepresentation(vectorSubtype, vectorType, ...), and we are not configuring just the vector representation of the clr type.

I think BsonVector option is just more concise and easier to use. Especially given that bson vector is a new special type with a very specific use cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really have not been able to find my way to be comfortable with your choice of BsonVectorBase and [BsonVector(...)].

BsonVectorBase is just wrong, and you didn't choose it because you liked it, you chose it because the attribute name of [BsonVector(...)] created a partial conflict with it. I think we just need to rename the attribute instead of the BsonVector base class.

BsonVector "looks" like a type name just like BsonString (and it should be a type name instead of BsonVectorBase).

Using this:

[BsonVector(BsonVectorDataType.Int8)]
public byte[] V;

is analogous to the following invalid code:

[BsonString] // BsonString is analogous in this position to "BsonVector" in the previous example
public Enum E;

which "looks" like it says "E is a BsonString", when what we want to say is "E is represented as a BsonString", which we do like this:

[BsonRepresentation(BsonType.String)]
public Enum E;

I tried a thought experiment to how I would design configuring representing things as a BsonVector. The first thought I had that matches our existing attributes is:

[BsonRepresentation(BsonType.Binary, BsonSubType.Vector, BsonVectorDataType.Int8)]
public byte[] A;

which would be a new overload of BsonRepresentation. While that is accurate it's a bit redundant and could be shortened to by implying the binary subtype 9:

[BsonRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Since I thought you may not want to overload the existing BsonRepresentation I changed that to:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Which is the same as your proposal except for the name:

[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;

I have two big problems with [BsonVector(...)]:

  1. It reads like "A is a BsonVector" instead of "A is represented as a BsonVector"
  2. It creates the pseudo conflict with using BsonVector as a class name (hence your awkward BsonVectorBase class name)

I think my proposal:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Is the best solution because it makes clear that we are configuring the representation of A to be a BsonVector. But I am open to other names as long as it's not BsonVector.

The concept I'm trying to capture is:

[RepresentedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;

or

[SerializedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;

so maybe we can start from there to find a new attribute name that is not [BsonVector].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't agree on a new name I would reluctantly LGTM this PR with strong objections to this name because everything else is looking good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a further proposal.

Instead of:

        [BsonVector(BsonVectorDataType.Int8)]
        public byte[] ValuesByte { get; set; }

we could do this:

        [SerializedAsInt8Vector]
        public byte[] ValuesByte { get; set; }

This is shorter, simpler, and includes the important semantic information that exactly what is happening here is that we are configuring the property to be "serialized as an Int8 vector".

Simlarly for [SerializedAsPackedBitVector] or [SerializedAsFloat32Vector]

/// </summary>
/// <typeparam name="TItemCollection">The collection type.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
internal abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by "collection" in the name TItemCollection because it implies that the type implements ICollection or at least IEnumerable but that's only true sometimes.

Can we use a more neutral term like "container" and name this TItemContainer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <summary>
/// Represents a serializer for BSON vector to/from a given collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for TItemContainer values represented as a BsonVector.

The part immediately after for should be what is being serializer, not how it is represented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}

public static void ValidateDataType<T>(BsonVectorDataType bsonVectorDataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_ => throw new ArgumentOutOfRangeException(nameof(bsonVectorDataType), bsonVectorDataType, "Unsupported vector datatype.")
};

if (supportedType != typeof(T))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would read more logically swapping the sides:

if (typeof(TItem) != supportedType)

That's how you would say it in English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at it from a different perspective "What is the value of my variable", similar to if variable == constant.
Also typeof(T) is in a way constant here.

{
internal static class BsonVectorWriter
{
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding);
}

public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar class is BsonValue which is the abstract base class for multiple types of BSON values.

Here BsonVector is the base class for multiple types of BSON vectors.

Note that we did not name BsonValue as BsonValueBase.

Copy link
Contributor Author

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review. I think it's much more polished now.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that without considering BsonVectorAttribute, BsonVector is more natural name.
But I still think it's preferable to reserve BsonVector for BsonVectorAttribute, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase are expected to be rare.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Sets the representation for this field or property as BSON Vector and specifies the serialization options.
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it's a BsonVector, just represented by byte[] :)

I think that within the given context, it's clear that [BsonVector(...)] refers to the bson representation of the property on the wire. I don't think that adding "Representation" actually adds any addition information. In fact I see BsonVectorAttribute closer to BsonSerializerAttribute.

BsonGuidRepresentation on the other hand switches between actual representation of the guid and BsonRepresentation switches between different bson representations. Which is not the case with BsonVector, which can be viewed as a private case of BsonRepresentation, like a hypothetical "BsonDoubleAttribute.

It even can argued that [BsonVectorRepresentation(BsonVectorDataType.Int8)] is more confusing. It might sound that this attribute configures the representation of the field when serialized as BsonVector type and leaves the question of the actual bson type open. For example user could expect to have:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
[BsonArrayRepresentation(BsonType.Binary)]
public byte[] A { get; set; }

/// Initializes a new instance of the <see cref="BsonVectorSerializerBase{TItemCollection, TItem}"/> class.
/// </summary>
/// <param name="bsonVectorDataType">Type of the bson vector data.</param>
public BsonVectorSerializerBase(BsonVectorDataType bsonVectorDataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

context.Writer.WriteBinaryData(binaryData);
}

private static BsonVectorDataType GetDataType() =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, it's a leftover, thanks!

}

/// <inheritdoc/>
public override sealed void Serialize(BsonSerializationContext context, BsonSerializationArgs args, TItemCollection bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// <summary>
/// Initializes a new instance of the <see cref="BsonVectorToCollectionSerializer{TItemCollection, TItem}" /> class.
/// </summary>
public BsonVectorToCollectionSerializer(BsonVectorDataType bsonVectorDataType) :
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <inheritdoc/>
public override sealed TItemCollection Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
internal static class BsonVectorWriter
{
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding);
}

public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 31, 2025 01:06
/// Sets the representation for this field or property as BSON Vector and specifies the serialization options.
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really have not been able to find my way to be comfortable with your choice of BsonVectorBase and [BsonVector(...)].

BsonVectorBase is just wrong, and you didn't choose it because you liked it, you chose it because the attribute name of [BsonVector(...)] created a partial conflict with it. I think we just need to rename the attribute instead of the BsonVector base class.

BsonVector "looks" like a type name just like BsonString (and it should be a type name instead of BsonVectorBase).

Using this:

[BsonVector(BsonVectorDataType.Int8)]
public byte[] V;

is analogous to the following invalid code:

[BsonString] // BsonString is analogous in this position to "BsonVector" in the previous example
public Enum E;

which "looks" like it says "E is a BsonString", when what we want to say is "E is represented as a BsonString", which we do like this:

[BsonRepresentation(BsonType.String)]
public Enum E;

I tried a thought experiment to how I would design configuring representing things as a BsonVector. The first thought I had that matches our existing attributes is:

[BsonRepresentation(BsonType.Binary, BsonSubType.Vector, BsonVectorDataType.Int8)]
public byte[] A;

which would be a new overload of BsonRepresentation. While that is accurate it's a bit redundant and could be shortened to by implying the binary subtype 9:

[BsonRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Since I thought you may not want to overload the existing BsonRepresentation I changed that to:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Which is the same as your proposal except for the name:

[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;

I have two big problems with [BsonVector(...)]:

  1. It reads like "A is a BsonVector" instead of "A is represented as a BsonVector"
  2. It creates the pseudo conflict with using BsonVector as a class name (hence your awkward BsonVectorBase class name)

I think my proposal:

[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;

Is the best solution because it makes clear that we are configuring the representation of A to be a BsonVector. But I am open to other names as long as it's not BsonVector.

The concept I'm trying to capture is:

[RepresentedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;

or

[SerializedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;

so maybe we can start from there to find a new attribute name that is not [BsonVector].

/// Sets the representation for this field or property as BSON Vector and specifies the serialization options.
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't agree on a new name I would reluctantly LGTM this PR with strong objections to this name because everything else is looking good to me.

where TItem : struct
{
/// <summary>
/// Initializes a new instance of the BsonVector class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /// Initializes a new instance of the BsonVectorBase class.

Although I wish the summary as it stood was correct!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be internal to prevent users from attempting to create new subclasses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

/// </summary>
public BsonVectorPackedBit(ReadOnlyMemory<byte> data, byte padding) : base(data, BsonVectorDataType.PackedBit)
{
if (padding < 0 || padding > 7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rider complains that padding can never be less than 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's time to try Rider, done.

}

/// <summary>
/// Represents a serializer for BSON vector to/from <see cref="Memory{TItem}"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for <see cref="Memory{TItem}"/> represented as a BsonVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(TItem[] data) =>
data;

private protected override TItem[] CreateResult(TItem[] elements) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override TItem[] CreateResult(TItem[] items) => items;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(Memory<TItem> data) =>
data.Span;

private protected override Memory<TItem> CreateResult(TItem[] elements) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override Memory<TItem> CreateResult(TItem[] items) =>
        new(items);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <summary>
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

data.Span;

private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] elements) =>
new(elements);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] items) =>
        new(items);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam February 1, 2025 00:27
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few typos remain.

/// </summary>
/// <param name="data">The vector data.</param>
/// <param name="dataType">Type of the vector data.</param>
internal protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected internal was my first thought also, but I looked it up and it's not what I first thought it was.

It should probably just be internal.

/// <summary>
/// Represents the data type of a BSON Vector.
/// </summary>
/// <seealso cref="BsonVectorBase{T}"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// <seealso cref="BsonVectorBase{TItem}"/>

namespace MongoDB.Bson.Serialization.Attributes
{
/// <summary>
/// Sets the representation for this field or property as a BSON Vector with the specified data type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Sets the representation for this field or property as a BSON Vector with the specified data type.

Add as for grammar.

/// <summary>
/// The float32
/// </summary>
Float32 = 0x27,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comments should end in a period.

/// <summary>
/// The int8
/// </summary>
Int8 = 0x03,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comments should end in a period.

/// </summary>
/// <param name="binaryData">The binary data.</param>
/// <returns>Vector data, padding and datatype</returns>
public static (TItem[] Items, byte Padding, BsonVectorDataType vectorDataType) ToBsonVectorAsArray<TItem>(this BsonBinaryData binaryData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't vectorDataType start with an upper case V?

return (items, padding, vectorDataType);
}

public static (ReadOnlyMemory<byte> Byte, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't Byte be `Bytes``? (plural)

switch (vectorDataType)
{
case BsonVectorDataType.Float32:
return new BsonVectorFloat32(AsTypedArrayOrThrow<float>()) as BsonVectorBase<TItem>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation in these three cases.

_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.")
};

if (expectedItemType != typeof(TItem))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        if (typeof(TItem) != expectedItemType)

This makes more sense if it reads more like English: "if the item type is not equal to the expected item type"

_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.")
};

if (supportedType != typeof(TItem))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in with variables on the left and constants on the right.

But two things are different here:

  1. typeof(TItem) is not a constant, it's an expression involving the type variable TItem
  2. we also like to standardize on if (actualXyz != expectedXyz)

Technically typeof(TItem) will become a constant at runtime but ONLY after the generic method is Jitted for a particular TItem. But for us as programmers TItem is a type variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants