-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BaseUnit generation for the prefixed units #1485
Conversation
- implemented in the QuantityJsonFilesParser (where the AddPrefixUnits occurs) - removed the skipped tests (with the "The BaseUnits are not yet supported by the prefix-generator" reason)
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetAmountOfSubstancePrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Mole", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("PoundMole", [Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetElectricCurrentPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Ampere", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetMassPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Gram", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Tonne", [Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("Pound", [Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetLengthPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Meter", | ||
[ | ||
Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo, | ||
Prefix.Mega, Prefix.Giga | ||
]); | ||
yield return new KeyValuePair<string, Prefix[]>("Yard", [Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Foot", [Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Parsec", [Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("LightYear", [Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetDurationPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Second", [Prefix.Nano, Prefix.Micro, Prefix.Milli]); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most iffy part- I had to hard-code the prefixes that are applicable in the base quantities.. Those aren't likely to change, but if we want to make this properly we either need to either pre-parse the Length.json
, Mass.json
etc in order to get their prefixes before the start of the generation or think of other ways of storing them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I really think we need to avoid hardcoding this to reduce the know-how needed to maintain this. Pre-parsing maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I'll try to refactor this all into a pre-processing step for the time being, but generally speaking - I was thinking that, one day, we should maybe flattening out all of the json files:
- it's fairly easy to do: once everything is generated just re-write all json files (this should flatten out both the
Abbreviations
and theBaseUnits
) - this should make it easier for other parsers/generators to work with the json files
- make it slightly harder to add new quantities / units - but that shouldn't happen as often as before (especially if we could make it easy to extend on the client-side)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could do something like the UnitRelations.json
which is overridden every time.
- added a ToString impelementation to the BaseDimensions (JsonTypes)
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo) | ||
{ | ||
if (oldUnits is null) | ||
{ | ||
return null; | ||
} | ||
|
||
// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n] | ||
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x)) | ||
{ | ||
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits)) | ||
{ | ||
return prefixedUnits; | ||
} | ||
} | ||
|
||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that when we have multiple dimensions present, there are often multiple combinations of base units that can fit a given prefix scale. This algorithm tries the smallest dimensions first, with a preference for the positive exponents.
Technically, if no exact scaling factor exists for a given quantity, having more than one dimensions- there could be another iteration after the foreach
loop that tries to simultaneously change two dimensions with a single prefix - but that would have really made my head hurt..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there are any conventions or rules for how to determine this.
If I am not mistaken, it seems we ignore the fact that KilometerPerHour
has the prefix Kilo
before the unit Meter
, so I would kind of expect to exhaust all options for SI base unit Meter
before falling back to the combined dimensions of Meter
and Hour
(Length and Time) together.
I have no practical example yet, but assuming we had the unit 1000thHour
, then I would expect this algorithm to favor base units Kilometer
+ Hour
instead of Meter
+ 1000thHour
for the unit KilotmeterPerHour
.
This example is still not ideal, since your algorithm already favors positive exponent L=1 over negative exponent T=-1, so KilometerHour
would be a better theoretical example where L=1 competes with T=1.
Maybe this all doesn't matter, but I'm just poking at whether there are some conventions better than picking the dimension with the lowest, most positive dimension exponent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A counterexample is Gigasiemens
, where there is no "obvious" SI base unit to favor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One interesting example is:
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Megavoltampere, "Megavoltamperes", new BaseUnits(length: LengthUnit.Kilometer, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Microvoltampere, "Microvoltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Milligram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Voltampere, "Voltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
ElectricApparentPowerUnit
base dimensions: L=2, M=1, T=-3
Voltampere
=> Meter, Kilogram SecondMegavoltampere
=> Kilometer, Kilogram, SecondMicrovoltampere
=> Meter, Milligram, Second
Here, either Length or Mass are picked for different units, to get a match.
This all looks correct, but it's also quite unintuitive to my brain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen anything that suggests a particular rule, but I think the overarching objective would be to minimize the scaling factor. The current algorithm is not optimal in that regard but happens to produce near-optimal results. I don't have a concrete example, but I assume it's possible for a dimension of degree 1 to jump from Micro
to Giga
where the dimension with a larger exponent would have selected a smaller jump such as Micro
-> Milli
.
Another way to think of this is in terms of the scales of the quantities that would be produced by an operation involving the base units: I've measured 1 g
, my flask is 1 ml
- so my density is 1 g/ml
(with the division producing 1 g
or 1 ml
respectively. In the absence of the ideal g/ml
combination, then the next-best combination would be the the one that produces the smallest jump in the the Value
of the resulting division: 10 xx
and 0.1 xx
are clearly better options then 1e9 xx
and 1e-9 xx
.
Figuring out the optimal solution to this problem with up to 7 dimensions and all their possible combinations isn't likely to be a trivial task (although as usual, there is probably some simple matrix multiplication that solves it in one go 😄 )
- Add `UnitPrefixMap` record - Rename vars and params for consistency or clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it works as advertised, I've checked quite a few of the generated units and they all look good to me.
The feedback is mainly about me struggling to truly understand it and that we should try to document it a bit.
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetAmountOfSubstancePrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Mole", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("PoundMole", [Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetElectricCurrentPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Ampere", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetMassPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Gram", | ||
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Tonne", [Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("Pound", [Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetLengthPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Meter", | ||
[ | ||
Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo, | ||
Prefix.Mega, Prefix.Giga | ||
]); | ||
yield return new KeyValuePair<string, Prefix[]>("Yard", [Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Foot", [Prefix.Kilo]); | ||
yield return new KeyValuePair<string, Prefix[]>("Parsec", [Prefix.Kilo, Prefix.Mega]); | ||
yield return new KeyValuePair<string, Prefix[]>("LightYear", [Prefix.Kilo, Prefix.Mega]); | ||
} | ||
|
||
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetDurationPrefixes() | ||
{ | ||
yield return new KeyValuePair<string, Prefix[]>("Second", [Prefix.Nano, Prefix.Micro, Prefix.Milli]); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I really think we need to avoid hardcoding this to reduce the know-how needed to maintain this. Pre-parsing maybe.
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo) | ||
{ | ||
if (oldUnits is null) | ||
{ | ||
return null; | ||
} | ||
|
||
// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n] | ||
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x)) | ||
{ | ||
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits)) | ||
{ | ||
return prefixedUnits; | ||
} | ||
} | ||
|
||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A counterexample is Gigasiemens
, where there is no "obvious" SI base unit to favor.
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo) | ||
{ | ||
if (oldUnits is null) | ||
{ | ||
return null; | ||
} | ||
|
||
// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n] | ||
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x)) | ||
{ | ||
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits)) | ||
{ | ||
return prefixedUnits; | ||
} | ||
} | ||
|
||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One interesting example is:
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Megavoltampere, "Megavoltamperes", new BaseUnits(length: LengthUnit.Kilometer, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Microvoltampere, "Microvoltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Milligram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Voltampere, "Voltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
ElectricApparentPowerUnit
base dimensions: L=2, M=1, T=-3
Voltampere
=> Meter, Kilogram SecondMegavoltampere
=> Kilometer, Kilogram, SecondMicrovoltampere
=> Meter, Milligram, Second
Here, either Length or Mass are picked for different units, to get a match.
This all looks correct, but it's also quite unintuitive to my brain.
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits)) | ||
{ | ||
return prefixedUnits; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the logic hard to follow, partly because I'm not super familiar with this dimensions stuff and partly because we have confusing namings, such as base units that can mean both SI base unit and UnitsNet base conversion unit.
I tried adding some comments in the code to clarify, but it quickly became a blog article to try to explain how this algorithm works. I'll just dump it here, because I'm not happy with it at all and I hope there is an easier and more succint way to explain things here. I also struggled a lot with naming things in my explanation, such as "base unit scale factor", like what the hell is even that.
I do think we need to explain this code a bit though, if we are going to maintain it later. Hoping you see a way to rewrite this into more understandable and shorter pieces that we can sprinkle the code with. I'd like a fairly short overall summary in the main GetPrefixedBaseUnits
method, and some short explanations next to the less intuitive stuff, like the method with the int.DivRem
stuff. I can understand each piece well enough, but putting it in perspective and fully understanding required me to debug and inspect values.
Again, the below is just a dump because I more or less gave up on trying to explain it, which means I don't really understand it.
Examples of determining base units of prefix units
Given a unit with defined SI base units, like the pressure unit Pascal
, we can for many SI metric prefixes also determine the SI base units of prefixed units like Micropascal
and Millipascal
.
However, some prefix units like Kilopascal
are either not possible or trivial to determine the SI base units for.
Example 1 - Pressure.Micropascal
This highlights how UnitsNet chose Gram
as conversion base unit, while SI defines Kilogram
as the base mass unit.
- Requested prefix
Micro
(scale-6
) for pressure unitPascal
- SI base units of
Pascal
:L=Meter, M=Kilogram, T=Second
- SI base dimensions, ordered:
M=1, L=-1, T=-2
- Trying first dimension
M=1
- SI base mass unit is
Kilogram
, but UnitsNet base mass unit isGram
so base prefix scale is3
- Inferred prefix is Milli:
base prefix scale 3 + requested prefix scale (-6) = -3
- ✅Resulting base units:
M=Milligram
plus the originalL=Meter, T=Second
- SI base mass unit is
Example 2 - Pressure.Millipascal
Similar to example 1, but this time Length is used instead of Mass due to the base unit scale factor of mass canceling out the requested prefix.
- Requested prefix
Milli
(scale-3
) for pressure unitPascal
- SI base units of
Pascal
:L=Meter, M=Kilogram, T=Second
- SI base dimensions, ordered:
M=1, L=-1, T=-2
- Trying first dimension
M=1
- SI base unit in mass dimension is
Kilogram
, but configured base unit isGram
so base prefix scale is3
- ❌No inferred prefix:
base prefix scale 3 + requested prefix scale (-3) = 0
- SI base unit in mass dimension is
- Trying second dimension
L=-1
- SI base unit in length dimension is
Meter
, same as configured base unit, so base prefix scale is0
- Inferred prefix is Milli:
base prefix scale 0 + requested prefix scale (-3) = -3
- ✅Resulting base units:
M=Millimeter
plus the originalM=Kilogram, T=Second
- SI base unit in length dimension is
Example 3 - ElectricApparentPower.Kilovoltampere
- Requested prefix
Kilo
(scale3
) for unitVoltampere
- SI base units of
Voltampere
:L=Meter, M=Kilogram, T=Second
- SI base dimensions, ordered:
M=1, L=2, T=-3
- Trying first dimension
M=1
- SI base unit in mass dimension is
Kilogram
, same as configured base unit, so base prefix scale is0
- Inferred prefix is Kilo:
base prefix scale 0 + requested prefix scale (3) = 3
- ❌
Kilo
prefix forKilogram
unit would beMegagram
, but there is no unitMegagram
, sinceGram
does not have this prefix (we could add it)
- SI base unit in mass dimension is
- Trying second dimension
L=2
- ❌There is no metric prefix we can raise to the power of 2 and get
Kilo
, e.g.Deca*Deca = Hecto
,Kilo*Kilo = Mega
, etc.
- ❌There is no metric prefix we can raise to the power of 2 and get
- Trying third dimension
T=-3
- SI base unit in time dimension is
Second
, same as configured base unit, so base prefix scale is0
- Inferred prefix is Deci:
(base prefix scale 0 + requested prefix scale (-3)) / exponent -3 = -3 / -3 = 1
- ❌There is no Duration unit Decasecond (we could add it)
- SI base unit in time dimension is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've been underselling your understanding of it the idea..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've included these examples in the xml-docs - it now requires a table of contents to navigate through it, but doesn't matter: the more there is for the AI to read, the better.. 😃
…rocessing step for the BaseUnits of the SI dimensions
- introduced brackets in the ToString format of the BaseDimensions (json)
So I refactored the code, grouping most of the functionality w.r.t. to prefixed-unit-generation as its own thing ( The prefixes for the base quantities are loaded first (no more hard-coding), before generating the prefixed-units for the complete list of quantities. Nothing changed in the generated files, so I guess that's good news.. I could experiment with a different algorithm later, but I think we already have plenty to work with.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valiant effort! And good news the codegen turned out the same results, it's kind of a nice sanity check in all the madness.
/// The name of the unit to match. For example, "Meter". | ||
/// </param> | ||
/// <param name="exponent"> | ||
/// The exponent associated with the unit. For example, 3 for cubic meters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good refactoring and docs 👍 I like concrete examples like this.
/// </description> | ||
/// </item> | ||
/// </list> | ||
/// </example> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poor soul of the next guy trying to digest this, but I can't wait for the bots to pick up my made up words for these concepts 🤣
UnitPrefixBuilder
(called from theQuantityJonsFilesParser
)DebugDisplay
for theQuantity
and theUnit
(JsonTypes) andToString
implementation to theBaseDimensions
(JsonTypes)https://en.wikipedia.org/wiki/Metric_prefix