-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSHARP-5202: BSON Binary Vector Subtype Support #1581
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it makes sense, I just got some notes/questions
src/MongoDB.Bson/Serialization/Serializers/BsonVectorSerializer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Bson/Serialization/Serializers/BsonVectorSerializer.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Ferdinando Papale <[email protected]>
{ | ||
var itemType = type.GetElementType(); | ||
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType); | ||
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to move this into BsonVectorArraySerializer.Create(Type itemType)
static factory method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and more similar below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible as BsonVectorArraySerializer
and the rest are generic classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can create non-generic static class just to have this factory methods. We are doing it in other places. (far example, NullableSerializer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we override Equals for the Serializer to make proper comparison of serializers? It might be important if Vestors could be used in LINQ statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss if it's safe to have comparison implemented in base class only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't override Equals
if no private data was introduced.
public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) | ||
where T : struct | ||
{ | ||
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we should ensure we are allowed to do that and throw in case wrong endian-platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The name of the method is a little confusing to me. If it's WriteXXX I expect to pass target into the method, like a stream of buffer where I want it to be written to. In this case, I think FormatXXX and ParseXXX would work better instead of Write and Read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
- PR comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add support for VectorSearch.
Test cluster setup is WIP.
{ | ||
var itemType = type.GetElementType(); | ||
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType); | ||
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible as BsonVectorArraySerializer
and the rest are generic classes.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) | ||
where T : struct | ||
{ | ||
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The collection type.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move each serializer into it's own file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested classes are very simple just a few lines long. I find it more readable to keep all in the same file in this case. But we can discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have often bundled very tightly related classes together in a single file. I'm fine with this.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss if it's safe to have comparison implemented in base class only.
- Rebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. But I have a lot of naming suggestions and a few refactoring suggestions. Let me know what you think.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better name for this class is BsonVector
(no Base
needed in the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what I've started with, but then I realized that there is some ambiguity with BsonVectorAttribute
:
When trying to define the following poco without including using MongoDB.Bson.Serialization.Serializers; namespace, user unclear error without clear suggested resolution.
class POCO
{
[BsonVector(...)]
public int[] Vector { get; set; }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [BsonVector(...)]
should be [BsonVectorRepresentation(...)]
which would eliminate this conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar class is BsonValue
which is the abstract base class for multiple types of BSON values.
Here BsonVector
is the base class for multiple types of BSON vectors.
Note that we did not name BsonValue
as BsonValueBase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that without considering BsonVectorAttribute
, BsonVector
is more natural name.
But I still think it's preferable to reserve BsonVector
for BsonVectorAttribute
, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase
are expected to be rare.
/// <summary> | ||
/// Gets the vector. | ||
/// </summary> | ||
public ReadOnlyMemory<T> Vector { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec calls this property Data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
/// Represents the data type of BSON Vector. | ||
/// </summary> | ||
/// <seealso cref="BsonVectorBase{T}"/> | ||
public enum BsonVectorDataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename file to match enum name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
|
||
namespace MongoDB.Bson.Serialization.Serializers | ||
{ | ||
internal static class BsonVectorSerializerBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this static class should be named just BsonVectorSerializer
(no Base
in the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return (elements, padding, vectorDataType); | ||
} | ||
|
||
public static (ReadOnlyMemory<byte> VectorDataBytes, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is As
redundant in the name? Could be ReadBsonVectorBytes
or ReadBsonVectorData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorDataBytes
could be shortened to Bytes
if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
var vectorDataSpan = vectorData.Span; | ||
var vectorDataType = (BsonVectorDataType)vectorDataSpan[0]; | ||
|
||
var paddingSizeBits = vectorDataSpan[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable could be named just padding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return (vectorData.Slice(2), paddingSizeBits, vectorDataType); | ||
} | ||
|
||
private static BsonVectorBase<T> CreateBsonVector<T>(T[] elements, byte padding, BsonVectorDataType vectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a static factory method in the BsonVector<T>
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so: I didn't see a need for a such a factory outside this class. Also I think BsonVector itself should be just a data holder, decoupled from its factories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] BsonVectorToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider naming this method WriteBytes
or WriteToBytes
.
The "from" is implied by the parameter types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return VectorDataToBytes(bsonVector.Vector.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] VectorDataToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider naming WriteToBytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I have so many requested naming changes.
But specially where things are entering the public API it is important to get the names right because we can't change them later.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [BsonVector(...)]
should be [BsonVectorRepresentation(...)]
which would eliminate this conflict.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected internal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think <T>
should be <TItem>
for consistency and as a hint of what T
actually is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// Sets the representation for this field or property as BSON Vector and specifies the serialization options. | ||
/// </summary> | ||
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)] | ||
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this attribute should be named BsonVectorRepresentationAttribute
.
When we use:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A { get; set; }
what we are saying is that A
is REPRESENTED as a BsonVector
, not that it IS a BsonVector
(it's not, it's a byte[]
).
Other similarly named attributes that are used to control representation are BsonRepresentation
and BsonGuidRepresentation
.
Unfortunately there are some other attributes where we neglected to follow this naming principle. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your argument that you wanted a short name for the attribute, but optimal names are not just about length, after all if that were true why not this:
[BV(BVDT.I8)]
I think having Representation
in the attribute name makes it self explanatory that we are configuring the representation of A
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it's a BsonVector
, just represented by byte[]
:)
I think that within the given context, it's clear that [BsonVector(...)]
refers to the bson representation of the property on the wire. I don't think that adding "Representation" actually adds any addition information. In fact I see BsonVectorAttribute
closer to BsonSerializerAttribute
.
BsonGuidRepresentation
on the other hand switches between actual representation of the guid and BsonRepresentation
switches between different bson representations. Which is not the case with BsonVector
, which can be viewed as a private case of BsonRepresentation
, like a hypothetical "BsonDoubleAttribute
.
It even can argued that [BsonVectorRepresentation(BsonVectorDataType.Int8)]
is more confusing. It might sound that this attribute configures the representation of the field when serialized as BsonVector type and leaves the question of the actual bson type open. For example user could expect to have:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
[BsonArrayRepresentation(BsonType.Binary)]
public byte[] A { get; set; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it's a BsonVector, just represented by byte[] :)
Yes but... serializers have NO control over how things are represented in memory. That is totally defined by the C# data type that this is a serializer for (in this case byte[]
). In the context of configuring serialization "representation" ALWAYS means how things are represented IN THE DATABASE.
In this case the in memory representation is byte[]
and the database representation is BsonVector
. The only thing that can be configured is the database representation.
I think the disconnect here is that I am looking at this from the perspective of "configuring the representation of the C# value in the database" and you are coming from the opposite direction. But as I said, in the case of serializer configuration it is always the database representation that is being configured.
Since I'm thinking about this in terms of "configuring serialization" which in turn means "configuring the database representation" that is why to me [BsonVector(...)]
by itself makes no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
[BsonArrayRepresentation(BsonType.Binary)]
public byte[] A { get; set; }
the problem of a user applying contradictory attributes to the same member is not one we have solved.
Also, BsonArrayRepresentation
doesn't exist but I guess you are using it as a hypothetical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to think about this some more...
What I want is a name that clearly implies "A is represented in the database as a BsonVector".
And not one that reads like "A is a BsonVector", because it is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the word "representation" is relevant to most of our bson attributes and config options. So it becomes redundant at some point. For example BsonTimeSpanOptions
can be thought as BsonTimeSpanRepresentation
, as it does configure how the BsonTime is represented.
I don't think having BsonVectorRepresentation
is entirely wrong, just not very accurate. We are configuring the bson representation of the clr type, so the ideal name probably would be BsonRepresentation(vectorSubtype, vectorType, ...), and we are not configuring just the vector representation of the clr type.
I think BsonVector option is just more concise and easier to use. Especially given that bson vector is a new special type with a very specific use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really have not been able to find my way to be comfortable with your choice of BsonVectorBase
and [BsonVector(...)]
.
BsonVectorBase
is just wrong, and you didn't choose it because you liked it, you chose it because the attribute name of [BsonVector(...)]
created a partial conflict with it. I think we just need to rename the attribute instead of the BsonVector
base class.
BsonVector
"looks" like a type name just like BsonString
(and it should be a type name instead of BsonVectorBase
).
Using this:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] V;
is analogous to the following invalid code:
[BsonString] // BsonString is analogous in this position to "BsonVector" in the previous example
public Enum E;
which "looks" like it says "E is a BsonString", when what we want to say is "E is represented as a BsonString", which we do like this:
[BsonRepresentation(BsonType.String)]
public Enum E;
I tried a thought experiment to how I would design configuring representing things as a BsonVector. The first thought I had that matches our existing attributes is:
[BsonRepresentation(BsonType.Binary, BsonSubType.Vector, BsonVectorDataType.Int8)]
public byte[] A;
which would be a new overload of BsonRepresentation. While that is accurate it's a bit redundant and could be shortened to by implying the binary subtype 9:
[BsonRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Since I thought you may not want to overload the existing BsonRepresentation
I changed that to:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Which is the same as your proposal except for the name:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;
I have two big problems with [BsonVector(...)]
:
- It reads like "A is a BsonVector" instead of "A is represented as a BsonVector"
- It creates the pseudo conflict with using
BsonVector
as a class name (hence your awkwardBsonVectorBase
class name)
I think my proposal:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Is the best solution because it makes clear that we are configuring the representation of A to be a BsonVector. But I am open to other names as long as it's not BsonVector
.
The concept I'm trying to capture is:
[RepresentedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;
or
[SerializedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;
so maybe we can start from there to find a new attribute name that is not [BsonVector]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't agree on a new name I would reluctantly LGTM this PR with strong objections to this name because everything else is looking good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a further proposal.
Instead of:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] ValuesByte { get; set; }
we could do this:
[SerializedAsInt8Vector]
public byte[] ValuesByte { get; set; }
This is shorter, simpler, and includes the important semantic information that exactly what is happening here is that we are configuring the property to be "serialized as an Int8 vector".
Simlarly for [SerializedAsPackedBitVector]
or [SerializedAsFloat32Vector]
/// </summary> | ||
/// <typeparam name="TItemCollection">The collection type.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
internal abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by "collection" in the name TItemCollection
because it implies that the type implements ICollection
or at least IEnumerable
but that's only true sometimes.
Can we use a more neutral term like "container" and name this TItemContainer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for BSON vector to/from a given collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for TItemContainer values represented as a BsonVector.
The part immediately after for should be what is being serializer, not how it is represented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
public static void ValidateDataType<T>(BsonVectorDataType bsonVectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
_ => throw new ArgumentOutOfRangeException(nameof(bsonVectorDataType), bsonVectorDataType, "Unsupported vector datatype.") | ||
}; | ||
|
||
if (supportedType != typeof(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would read more logically swapping the sides:
if (typeof(TItem) != supportedType)
That's how you would say it in English.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look at it from a different perspective "What is the value of my variable", similar to if variable == constant
.
Also typeof(T)
is in a way constant here.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar class is BsonValue
which is the abstract base class for multiple types of BSON values.
Here BsonVector
is the base class for multiple types of BSON vectors.
Note that we did not name BsonValue
as BsonValueBase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review. I think it's much more polished now.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that without considering BsonVectorAttribute
, BsonVector
is more natural name.
But I still think it's preferable to reserve BsonVector
for BsonVectorAttribute
, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase
are expected to be rare.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// Sets the representation for this field or property as BSON Vector and specifies the serialization options. | ||
/// </summary> | ||
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)] | ||
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it's a BsonVector
, just represented by byte[]
:)
I think that within the given context, it's clear that [BsonVector(...)]
refers to the bson representation of the property on the wire. I don't think that adding "Representation" actually adds any addition information. In fact I see BsonVectorAttribute
closer to BsonSerializerAttribute
.
BsonGuidRepresentation
on the other hand switches between actual representation of the guid and BsonRepresentation
switches between different bson representations. Which is not the case with BsonVector
, which can be viewed as a private case of BsonRepresentation
, like a hypothetical "BsonDoubleAttribute
.
It even can argued that [BsonVectorRepresentation(BsonVectorDataType.Int8)]
is more confusing. It might sound that this attribute configures the representation of the field when serialized as BsonVector type and leaves the question of the actual bson type open. For example user could expect to have:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
[BsonArrayRepresentation(BsonType.Binary)]
public byte[] A { get; set; }
/// Initializes a new instance of the <see cref="BsonVectorSerializerBase{TItemCollection, TItem}"/> class. | ||
/// </summary> | ||
/// <param name="bsonVectorDataType">Type of the bson vector data.</param> | ||
public BsonVectorSerializerBase(BsonVectorDataType bsonVectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
context.Writer.WriteBinaryData(binaryData); | ||
} | ||
|
||
private static BsonVectorDataType GetDataType() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find, it's a leftover, thanks!
} | ||
|
||
/// <inheritdoc/> | ||
public override sealed void Serialize(BsonSerializationContext context, BsonSerializationArgs args, TItemCollection bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// <summary> | ||
/// Initializes a new instance of the <see cref="BsonVectorToCollectionSerializer{TItemCollection, TItem}" /> class. | ||
/// </summary> | ||
public BsonVectorToCollectionSerializer(BsonVectorDataType bsonVectorDataType) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <inheritdoc/> | ||
public override sealed TItemCollection Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// Sets the representation for this field or property as BSON Vector and specifies the serialization options. | ||
/// </summary> | ||
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)] | ||
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really have not been able to find my way to be comfortable with your choice of BsonVectorBase
and [BsonVector(...)]
.
BsonVectorBase
is just wrong, and you didn't choose it because you liked it, you chose it because the attribute name of [BsonVector(...)]
created a partial conflict with it. I think we just need to rename the attribute instead of the BsonVector
base class.
BsonVector
"looks" like a type name just like BsonString
(and it should be a type name instead of BsonVectorBase
).
Using this:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] V;
is analogous to the following invalid code:
[BsonString] // BsonString is analogous in this position to "BsonVector" in the previous example
public Enum E;
which "looks" like it says "E is a BsonString", when what we want to say is "E is represented as a BsonString", which we do like this:
[BsonRepresentation(BsonType.String)]
public Enum E;
I tried a thought experiment to how I would design configuring representing things as a BsonVector. The first thought I had that matches our existing attributes is:
[BsonRepresentation(BsonType.Binary, BsonSubType.Vector, BsonVectorDataType.Int8)]
public byte[] A;
which would be a new overload of BsonRepresentation. While that is accurate it's a bit redundant and could be shortened to by implying the binary subtype 9:
[BsonRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Since I thought you may not want to overload the existing BsonRepresentation
I changed that to:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Which is the same as your proposal except for the name:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;
I have two big problems with [BsonVector(...)]
:
- It reads like "A is a BsonVector" instead of "A is represented as a BsonVector"
- It creates the pseudo conflict with using
BsonVector
as a class name (hence your awkwardBsonVectorBase
class name)
I think my proposal:
[BsonVectorRepresentation(BsonVectorDataType.Int8)]
public byte[] A;
Is the best solution because it makes clear that we are configuring the representation of A to be a BsonVector. But I am open to other names as long as it's not BsonVector
.
The concept I'm trying to capture is:
[RepresentedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;
or
[SerializedAsABsonVectorOf(BsonVectorDataType.Int8)]
public byte[] A;
so maybe we can start from there to find a new attribute name that is not [BsonVector]
.
/// Sets the representation for this field or property as BSON Vector and specifies the serialization options. | ||
/// </summary> | ||
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)] | ||
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't agree on a new name I would reluctantly LGTM this PR with strong objections to this name because everything else is looking good to me.
where TItem : struct | ||
{ | ||
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Initializes a new instance of the BsonVectorBase class.
Although I wish the summary as it stood was correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be internal
to prevent users from attempting to create new subclasses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
/// </summary> | ||
public BsonVectorPackedBit(ReadOnlyMemory<byte> data, byte padding) : base(data, BsonVectorDataType.PackedBit) | ||
{ | ||
if (padding < 0 || padding > 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rider complains that padding can never be less than 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's time to try Rider, done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for BSON vector to/from <see cref="Memory{TItem}"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for <see cref="Memory{TItem}"/> represented as a BsonVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(TItem[] data) => | ||
data; | ||
|
||
private protected override TItem[] CreateResult(TItem[] elements) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override TItem[] CreateResult(TItem[] items) => items;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(Memory<TItem> data) => | ||
data.Span; | ||
|
||
private protected override Memory<TItem> CreateResult(TItem[] elements) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override Memory<TItem> CreateResult(TItem[] items) =>
new(items);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
data.Span; | ||
|
||
private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] elements) => | ||
new(elements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] items) =>
new(items);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos remain.
/// </summary> | ||
/// <param name="data">The vector data.</param> | ||
/// <param name="dataType">Type of the vector data.</param> | ||
internal protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected internal
was my first thought also, but I looked it up and it's not what I first thought it was.
It should probably just be internal
.
/// <summary> | ||
/// Represents the data type of a BSON Vector. | ||
/// </summary> | ||
/// <seealso cref="BsonVectorBase{T}"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <seealso cref="BsonVectorBase{TItem}"/>
namespace MongoDB.Bson.Serialization.Attributes | ||
{ | ||
/// <summary> | ||
/// Sets the representation for this field or property as a BSON Vector with the specified data type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Sets the representation for this field or property as a BSON Vector with the specified data type.
Add as
for grammar.
/// <summary> | ||
/// The float32 | ||
/// </summary> | ||
Float32 = 0x27, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comments should end in a period.
/// <summary> | ||
/// The int8 | ||
/// </summary> | ||
Int8 = 0x03, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comments should end in a period.
/// </summary> | ||
/// <param name="binaryData">The binary data.</param> | ||
/// <returns>Vector data, padding and datatype</returns> | ||
public static (TItem[] Items, byte Padding, BsonVectorDataType vectorDataType) ToBsonVectorAsArray<TItem>(this BsonBinaryData binaryData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't vectorDataType
start with an upper case V
?
return (items, padding, vectorDataType); | ||
} | ||
|
||
public static (ReadOnlyMemory<byte> Byte, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Byte
be `Bytes``? (plural)
switch (vectorDataType) | ||
{ | ||
case BsonVectorDataType.Float32: | ||
return new BsonVectorFloat32(AsTypedArrayOrThrow<float>()) as BsonVectorBase<TItem>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation in these three cases.
_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.") | ||
}; | ||
|
||
if (expectedItemType != typeof(TItem)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (typeof(TItem) != expectedItemType)
This makes more sense if it reads more like English: "if the item type is not equal to the expected item type"
_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.") | ||
}; | ||
|
||
if (supportedType != typeof(TItem)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in with variables on the left and constants on the right.
But two things are different here:
- typeof(TItem) is not a constant, it's an expression involving the type variable
TItem
- we also like to standardize on
if (actualXyz != expectedXyz)
Technically typeof(TItem)
will become a constant at runtime but ONLY after the generic method is Jitted for a particular TItem
. But for us as programmers TItem
is a type variable.
No description provided.