Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BaseUnit generation for the prefixed units #1485

Merged
merged 6 commits into from
Jan 2, 2025

Conversation

lipchev
Copy link
Collaborator

@lipchev lipchev commented Dec 30, 2024

  • added a new UnitPrefixBuilder (called from the QuantityJonsFilesParser)
  • removed the skipped tests (with the "The BaseUnits are not yet supported by the prefix-generator" reason)
  • added a DebugDisplay for the Quantity and the Unit (JsonTypes) and
  • added a ToString implementation to the BaseDimensions (JsonTypes)

https://en.wikipedia.org/wiki/Metric_prefix

- implemented in the QuantityJsonFilesParser (where the AddPrefixUnits occurs)
- removed the skipped tests (with the "The BaseUnits are not yet supported by the prefix-generator" reason)
Comment on lines 332 to 370
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetAmountOfSubstancePrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Mole",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("PoundMole", [Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetElectricCurrentPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Ampere",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetMassPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Gram",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Tonne", [Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("Pound", [Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetLengthPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Meter",
[
Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo,
Prefix.Mega, Prefix.Giga
]);
yield return new KeyValuePair<string, Prefix[]>("Yard", [Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Foot", [Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Parsec", [Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("LightYear", [Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetDurationPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Second", [Prefix.Nano, Prefix.Micro, Prefix.Milli]);
}
}
Copy link
Collaborator Author

@lipchev lipchev Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most iffy part- I had to hard-code the prefixes that are applicable in the base quantities.. Those aren't likely to change, but if we want to make this properly we either need to either pre-parse the Length.json, Mass.json etc in order to get their prefixes before the start of the generation or think of other ways of storing them.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I really think we need to avoid hardcoding this to reduce the know-how needed to maintain this. Pre-parsing maybe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I'll try to refactor this all into a pre-processing step for the time being, but generally speaking - I was thinking that, one day, we should maybe flattening out all of the json files:

  • it's fairly easy to do: once everything is generated just re-write all json files (this should flatten out both the Abbreviations and the BaseUnits)
  • this should make it easier for other parsers/generators to work with the json files
  • make it slightly harder to add new quantities / units - but that shouldn't happen as often as before (especially if we could make it easy to extend on the client-side)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could do something like the UnitRelations.json which is overridden every time.

- added a ToString impelementation to the BaseDimensions (JsonTypes)
Comment on lines 130 to 147
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo)
{
if (oldUnits is null)
{
return null;
}

// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n]
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x))
{
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits))
{
return prefixedUnits;
}
}

return null;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that when we have multiple dimensions present, there are often multiple combinations of base units that can fit a given prefix scale. This algorithm tries the smallest dimensions first, with a preference for the positive exponents.

Technically, if no exact scaling factor exists for a given quantity, having more than one dimensions- there could be another iteration after the foreach loop that tries to simultaneously change two dimensions with a single prefix - but that would have really made my head hurt..

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are any conventions or rules for how to determine this.

If I am not mistaken, it seems we ignore the fact that KilometerPerHour has the prefix Kilo before the unit Meter, so I would kind of expect to exhaust all options for SI base unit Meter before falling back to the combined dimensions of Meter and Hour (Length and Time) together.

I have no practical example yet, but assuming we had the unit 1000thHour, then I would expect this algorithm to favor base units Kilometer + Hour instead of Meter + 1000thHour for the unit KilotmeterPerHour.

This example is still not ideal, since your algorithm already favors positive exponent L=1 over negative exponent T=-1, so KilometerHour would be a better theoretical example where L=1 competes with T=1.

Maybe this all doesn't matter, but I'm just poking at whether there are some conventions better than picking the dimension with the lowest, most positive dimension exponent.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A counterexample is Gigasiemens, where there is no "obvious" SI base unit to favor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One interesting example is:

new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Megavoltampere, "Megavoltamperes", new BaseUnits(length: LengthUnit.Kilometer, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Microvoltampere, "Microvoltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Milligram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Voltampere, "Voltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),

ElectricApparentPowerUnit base dimensions: L=2, M=1, T=-3

  • Voltampere => Meter, Kilogram Second
  • Megavoltampere => Kilometer, Kilogram, Second
  • Microvoltampere => Meter, Milligram, Second

Here, either Length or Mass are picked for different units, to get a match.
This all looks correct, but it's also quite unintuitive to my brain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen anything that suggests a particular rule, but I think the overarching objective would be to minimize the scaling factor. The current algorithm is not optimal in that regard but happens to produce near-optimal results. I don't have a concrete example, but I assume it's possible for a dimension of degree 1 to jump from Micro to Giga where the dimension with a larger exponent would have selected a smaller jump such as Micro -> Milli.

Another way to think of this is in terms of the scales of the quantities that would be produced by an operation involving the base units: I've measured 1 g, my flask is 1 ml- so my density is 1 g/ml (with the division producing 1 g or 1 ml respectively. In the absence of the ideal g/ml combination, then the next-best combination would be the the one that produces the smallest jump in the the Value of the resulting division: 10 xx and 0.1 xx are clearly better options then 1e9 xx and 1e-9 xx.

Figuring out the optimal solution to this problem with up to 7 dimensions and all their possible combinations isn't likely to be a trivial task (although as usual, there is probably some simple matrix multiplication that solves it in one go 😄 )

- Add `UnitPrefixMap` record
- Rename vars and params for consistency or clarity
Copy link
Owner

@angularsen angularsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it works as advertised, I've checked quite a few of the generated units and they all look good to me.

The feedback is mainly about me struggling to truly understand it and that we should try to document it a bit.

CodeGen/Generators/QuantityJsonFilesParser.cs Outdated Show resolved Hide resolved
CodeGen/Generators/QuantityJsonFilesParser.cs Outdated Show resolved Hide resolved
Comment on lines 332 to 370
private static IEnumerable<KeyValuePair<string, Prefix[]>> GetAmountOfSubstancePrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Mole",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("PoundMole", [Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Kilo]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetElectricCurrentPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Ampere",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetMassPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Gram",
[Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Tonne", [Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("Pound", [Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetLengthPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Meter",
[
Prefix.Femto, Prefix.Pico, Prefix.Nano, Prefix.Micro, Prefix.Milli, Prefix.Centi, Prefix.Deci, Prefix.Deca, Prefix.Hecto, Prefix.Kilo,
Prefix.Mega, Prefix.Giga
]);
yield return new KeyValuePair<string, Prefix[]>("Yard", [Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Foot", [Prefix.Kilo]);
yield return new KeyValuePair<string, Prefix[]>("Parsec", [Prefix.Kilo, Prefix.Mega]);
yield return new KeyValuePair<string, Prefix[]>("LightYear", [Prefix.Kilo, Prefix.Mega]);
}

private static IEnumerable<KeyValuePair<string, Prefix[]>> GetDurationPrefixes()
{
yield return new KeyValuePair<string, Prefix[]>("Second", [Prefix.Nano, Prefix.Micro, Prefix.Milli]);
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I really think we need to avoid hardcoding this to reduce the know-how needed to maintain this. Pre-parsing maybe.

CodeGen/Generators/QuantityJsonFilesParser.cs Outdated Show resolved Hide resolved
Comment on lines 130 to 147
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo)
{
if (oldUnits is null)
{
return null;
}

// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n]
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x))
{
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits))
{
return prefixedUnits;
}
}

return null;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A counterexample is Gigasiemens, where there is no "obvious" SI base unit to favor.

Comment on lines 130 to 147
private static BaseUnits? GetPrefixedBaseUnits(BaseDimensions dimensions, BaseUnits? oldUnits, PrefixInfo prefixInfo)
{
if (oldUnits is null)
{
return null;
}

// iterate the non-zero dimensions in the order [1, -1, 2, -2...n, -n]
foreach (var degree in dimensions.GetNonZeroDegrees().OrderBy(int.Abs).ThenByDescending(x => x))
{
if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits))
{
return prefixedUnits;
}
}

return null;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One interesting example is:

new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Megavoltampere, "Megavoltamperes", new BaseUnits(length: LengthUnit.Kilometer, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Microvoltampere, "Microvoltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Milligram, time: DurationUnit.Second), "ElectricApparentPower"),
new UnitInfo<ElectricApparentPowerUnit>(ElectricApparentPowerUnit.Voltampere, "Voltamperes", new BaseUnits(length: LengthUnit.Meter, mass: MassUnit.Kilogram, time: DurationUnit.Second), "ElectricApparentPower"),

ElectricApparentPowerUnit base dimensions: L=2, M=1, T=-3

  • Voltampere => Meter, Kilogram Second
  • Megavoltampere => Kilometer, Kilogram, Second
  • Microvoltampere => Meter, Milligram, Second

Here, either Length or Mass are picked for different units, to get a match.
This all looks correct, but it's also quite unintuitive to my brain.

if (TryPrefixWithDegree(dimensions, oldUnits, prefixInfo.Prefix, degree, out BaseUnits? prefixedUnits))
{
return prefixedUnits;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the logic hard to follow, partly because I'm not super familiar with this dimensions stuff and partly because we have confusing namings, such as base units that can mean both SI base unit and UnitsNet base conversion unit.

I tried adding some comments in the code to clarify, but it quickly became a blog article to try to explain how this algorithm works. I'll just dump it here, because I'm not happy with it at all and I hope there is an easier and more succint way to explain things here. I also struggled a lot with naming things in my explanation, such as "base unit scale factor", like what the hell is even that.

I do think we need to explain this code a bit though, if we are going to maintain it later. Hoping you see a way to rewrite this into more understandable and shorter pieces that we can sprinkle the code with. I'd like a fairly short overall summary in the main GetPrefixedBaseUnits method, and some short explanations next to the less intuitive stuff, like the method with the int.DivRem stuff. I can understand each piece well enough, but putting it in perspective and fully understanding required me to debug and inspect values.

Again, the below is just a dump because I more or less gave up on trying to explain it, which means I don't really understand it.

Examples of determining base units of prefix units

Given a unit with defined SI base units, like the pressure unit Pascal, we can for many SI metric prefixes also determine the SI base units of prefixed units like Micropascal and Millipascal.
However, some prefix units like Kilopascal are either not possible or trivial to determine the SI base units for.

Example 1 - Pressure.Micropascal

This highlights how UnitsNet chose Gram as conversion base unit, while SI defines Kilogram as the base mass unit.

  • Requested prefix Micro (scale -6) for pressure unit Pascal
  • SI base units of Pascal: L=Meter, M=Kilogram, T=Second
  • SI base dimensions, ordered: M=1, L=-1, T=-2
  • Trying first dimension M=1
    • SI base mass unit is Kilogram, but UnitsNet base mass unit is Gram so base prefix scale is 3
    • Inferred prefix is Milli: base prefix scale 3 + requested prefix scale (-6) = -3
    • ✅Resulting base units: M=Milligram plus the original L=Meter, T=Second

Example 2 - Pressure.Millipascal

Similar to example 1, but this time Length is used instead of Mass due to the base unit scale factor of mass canceling out the requested prefix.

  • Requested prefix Milli (scale -3) for pressure unit Pascal
  • SI base units of Pascal: L=Meter, M=Kilogram, T=Second
  • SI base dimensions, ordered: M=1, L=-1, T=-2
  • Trying first dimension M=1
    • SI base unit in mass dimension is Kilogram, but configured base unit is Gram so base prefix scale is 3
    • ❌No inferred prefix: base prefix scale 3 + requested prefix scale (-3) = 0
  • Trying second dimension L=-1
    • SI base unit in length dimension is Meter, same as configured base unit, so base prefix scale is 0
    • Inferred prefix is Milli: base prefix scale 0 + requested prefix scale (-3) = -3
    • ✅Resulting base units: M=Millimeter plus the original M=Kilogram, T=Second

Example 3 - ElectricApparentPower.Kilovoltampere

  • Requested prefix Kilo (scale 3) for unit Voltampere
  • SI base units of Voltampere: L=Meter, M=Kilogram, T=Second
  • SI base dimensions, ordered: M=1, L=2, T=-3
  • Trying first dimension M=1
    • SI base unit in mass dimension is Kilogram, same as configured base unit, so base prefix scale is 0
    • Inferred prefix is Kilo: base prefix scale 0 + requested prefix scale (3) = 3
    • Kilo prefix for Kilogram unit would be Megagram, but there is no unit Megagram, since Gram does not have this prefix (we could add it)
  • Trying second dimension L=2
    • ❌There is no metric prefix we can raise to the power of 2 and get Kilo, e.g. Deca*Deca = Hecto, Kilo*Kilo = Mega, etc.
  • Trying third dimension T=-3
    • SI base unit in time dimension is Second, same as configured base unit, so base prefix scale is 0
    • Inferred prefix is Deci: (base prefix scale 0 + requested prefix scale (-3)) / exponent -3 = -3 / -3 = 1
    • ❌There is no Duration unit Decasecond (we could add it)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've been underselling your understanding of it the idea..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included these examples in the xml-docs - it now requires a table of contents to navigate through it, but doesn't matter: the more there is for the AI to read, the better.. 😃

CodeGen/Generators/QuantityJsonFilesParser.cs Outdated Show resolved Hide resolved
CodeGen/Generators/QuantityJsonFilesParser.cs Outdated Show resolved Hide resolved
…rocessing step for the BaseUnits of the SI dimensions
- introduced brackets in the ToString format of the BaseDimensions (json)
@lipchev lipchev requested a review from angularsen January 2, 2025 19:48
@lipchev
Copy link
Collaborator Author

lipchev commented Jan 2, 2025

So I refactored the code, grouping most of the functionality w.r.t. to prefixed-unit-generation as its own thing (UnitPrefixBuilder).

The prefixes for the base quantities are loaded first (no more hard-coding), before generating the prefixed-units for the complete list of quantities.

Nothing changed in the generated files, so I guess that's good news.. I could experiment with a different algorithm later, but I think we already have plenty to work with..

Copy link
Owner

@angularsen angularsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valiant effort! And good news the codegen turned out the same results, it's kind of a nice sanity check in all the madness.

/// The name of the unit to match. For example, "Meter".
/// </param>
/// <param name="exponent">
/// The exponent associated with the unit. For example, 3 for cubic meters.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactoring and docs 👍 I like concrete examples like this.

/// </description>
/// </item>
/// </list>
/// </example>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor soul of the next guy trying to digest this, but I can't wait for the bots to pick up my made up words for these concepts 🤣

@angularsen angularsen merged commit 1f90e8f into angularsen:release/v6 Jan 2, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants