Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminating the difference in what an iterator yields #32

Open
ryanmolden opened this issue Dec 5, 2020 · 5 comments
Open

Eliminating the difference in what an iterator yields #32

ryanmolden opened this issue Dec 5, 2020 · 5 comments

Comments

@ryanmolden
Copy link

From your documentation you have this:

For iterators, operator*() and operator->() return a reference and a pointer to const std::pair<Key, T> instead of std::pair<const Key, T> making the value T not modifiable. To modify the value you have to call the value() method of the iterator to get a mutable reference. Example:

Is this something that could be eliminated? I.e. the divergence between ordered_map and unordered_map? I have code that iterates over an unordered_map in parallel using std::for_each. I need to modify the T values but with ordered_map I can't do this because the callback provided to for_each receives the dereferenced iterator, i.e. I end up with a const std::pair<Key, T> and thus have no way of calling value to get a modifiable T.

It also pops up in using the ranged-for loops:

for(auto& foo : some_ordered_map)

foo here is again const std::pair<Key, T> and thus I can't possibly call value even if I wanted to, meaning I have to change all ranged-for loops to be the older, non-ranged style.

@Tessil
Copy link
Owner

Tessil commented Dec 6, 2020

Hello,

Compared to a std::unordered_map which stores std::pair<const Key, T> the tsl::ordered_map has to store std::pair<Key, T> to be able to support move-only Key. As the std::unordered_map stores the pairs inside a node it can easily move around the pointer node, on the other hand the tsl::ordered_map stores the pairs inside a std::vector/std::deque and setting the Key as const inside the pair would force a call to the copy-constructor of Key when the pair must be moved.

The possible design choices I thought of:

  • Return a mutable reference to std::pair<Key, T>. This would allow the modification of T through .second but you can then also modify the Key which could lead the map to an undefined state. It's error-prone.
  • Return a std::pair<const Key, T>& by casting it from std::pair<Key, T>&. This is undefined behaviour though even if it'd be supported in most cases: https://stackoverflow.com/questions/14272141/is-casting-stdpairt1-t2-const-to-stdpairt1-const-t2-const-safe
  • Store std::pair<const Key, T> instead of std::pair<Key, T> as done by std::unordered_map and return a reference to it. Move-only keys would not be supported and the keys would need to be copied when the underlying std::vector grows.
  • The current solution, return a const std::pair<Key, T>& to disable the possibility of modifying Key and add a value() method to modify T.

I picked-up the current solution as a compromise to avoid the undefined behaviour or error-prone solutions aforementioned. There may be a better solution that I may have missed and any suggestion is welcome.

@doug-moen
Copy link

@Tessil Instead of returning a std::pair, the iterators could instead return a custom pair type, eg a tsl::ordered_map::pair value. This new type would replicate the structure of a std::pair, but add some additional structure that would allow for mutating the 'second' field.

@Tessil
Copy link
Owner

Tessil commented Dec 8, 2020

A custom pair would solve the range-based for-loop problem though it would not really be possible to expose a .first member. It'd break backward compatibility while still not being a drop-in iterator replacement for std::unordered_map::iterator.

A lot of issues are open on this .value() problem (Tessil/hopscotch-map#23, Tessil/hopscotch-map#49, Tessil/robin-map#4 and #28) and I understand it can be quite annoying and unintuitive when migrating to a tsl hash map. Maybe the best solution would be to just return a std::pair<Key, Value>& and put a warning that modifying the .first element of the pair is undefined behaviour but it can bring hard to track problems.

@mlogan
Copy link

mlogan commented May 21, 2021

Interesting. I switched to tsl::sparse_map from absl::flat_hash_map, which was transparent except that I had to add:

       const_cast<T*>(&(it->second));

in one spot, due to this issue. So absl::flat_hash_map doesn't present this difficulty despite being a flat hash table. Have you happened to look at how they did it?

@Tessil
Copy link
Owner

Tessil commented May 21, 2021

Hi,

From my understanding absl::flat_hash_map uses an union of std::pair<K, V> and std::pair<const K, V> when absl::container_internal::memory_internal::IsLayoutCompatible<K, V> value is true and takes advantage of:

// Accessing one of the union fields while the other is active is safe as
// long as they are layout-compatible, which is guaranteed by the definition of
// kMutableKeys. For C++11, the relevant section of the standard is
// https://timsong-cpp.github.io/cppwp/n3337/class.mem#19 (9.2.19)

(source)

If the two pairs are not layout compatible it is forced to copy K when resizing the map.

Example:

#include <functional>
#include <iostream>

#include "absl/container/flat_hash_map.h"

class A {
 public:
  A(int i) : m_i(i) { std::cout << "constructor" << std::endl; }

  A(const A& other) {
    std::cout << "copy constructor" << std::endl;
    m_i = other.m_i;
  }

  A(A&& other) noexcept {
    std::cout << "move constructor" << std::endl;
    m_i = other.m_i;
  }

  A& operator=(const A& other) {
    std::cout << "copy assignement" << std::endl;
    m_i = other.m_i;
    return *this;
  }

  A& operator=(A&& other) noexcept {
    std::cout << "move assignement" << std::endl;
    m_i = other.m_i;
    return *this;
  }

  virtual void foo() {}

  bool operator==(const A& other) const { return m_i == other.m_i; }

 public:
  int m_i;
};

namespace std {
template <>
struct hash<A> {
  size_t operator()(const A& a) const { return hash<int>()(a.m_i); }
};
}  // namespace std

int main() {
  std::cout << absl::container_internal::memory_internal::IsLayoutCompatible<
                   A, int>::value
            << std::endl;

  absl::flat_hash_map<A, int> map = {{A(1), 1}, {A(2), 2}};
  std::cout << "reserve" << std::endl;
  map.reserve(32);
}

Output:

0
constructor
move constructor
constructor
move constructor
copy constructor
copy constructor
copy constructor
reserve
copy constructor
copy constructor

If you remove the virtual method the layouts will be compatible and you then have:

1
constructor
move constructor
constructor
move constructor
copy constructor
move constructor
copy constructor
reserve
move constructor
move constructor

If you replace absl::flat_hash_map by tsl::robin_map you'll have the latest output in both cases as it always stores a std::pair<K, V> (but at the price of a value() method).

The crux of the problem is that I would like to store a std::pair<K, V> to be able to move K but the interface should ideally return a std::pair<const K, V>& and I don't think it's possible to do it in a standard-compliant way for every possible K and V.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants