Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error when collecting from iterators with inconsistent length #29458

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 27 additions & 3 deletions base/array.jl
Original file line number Diff line number Diff line change
Expand Up @@ -517,8 +517,25 @@ julia> collect(Float64, 1:2:5)
"""
collect(::Type{T}, itr) where {T} = _collect(T, itr, IteratorSize(itr))

_collect(::Type{T}, itr, isz::HasLength) where {T} = copyto!(Vector{T}(undef, Int(length(itr)::Integer)), itr)
_collect(::Type{T}, itr, isz::HasShape) where {T} = copyto!(similar(Array{T}, axes(itr)), itr)
function copyto_check_length!(dest::Array, src)
len = length(dest)
i = 0
for x in src
i == len &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like copyto! already does this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but it doesn't do the one below, and it cannot be done with access to i.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then we should add a _copyto_impl! that takes a strict::Bool argument, using the current code from copyto!. Then it can be shared by both this code (passing true for strict) and copyto! (passing false).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. Though note that the error message will be less specific (like the current one in copyto!), since copyto! cannot assume that the problem comes from an invalid declared length. So I'm not sure it's really better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, either way. This does need to use eachindex though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if we keep this method, as it only applies to Array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it can also be called on OffsetArray and possibly on other arrays in corner cases. So I've taken the _copy_impl! approach.

throw(ErrorException("iterator returned more elements than its declared length"))
i += 1
@inbounds dest[i] = x
end
if i < len
throw(ErrorException("iterator returned fewer elements than its declared length"))
end
return dest
end

_collect(::Type{T}, itr, isz::HasLength) where {T} =
copyto_check_length!(Vector{T}(undef, Int(length(itr)::Integer)), itr)
_collect(::Type{T}, itr, isz::HasShape) where {T} =
copyto_check_length!(similar(Array{T}, axes(itr)), itr)
function _collect(::Type{T}, itr, isz::SizeUnknown) where T
a = Vector{T}()
for x in itr
Expand Down Expand Up @@ -561,7 +578,7 @@ collect(A::AbstractArray) = _collect_indices(axes(A), A)
collect_similar(cont, itr) = _collect(cont, itr, IteratorEltype(itr), IteratorSize(itr))

_collect(cont, itr, ::HasEltype, isz::Union{HasLength,HasShape}) =
copyto!(_similar_for(cont, eltype(itr), itr, isz), itr)
copyto_check_length!(_similar_for(cont, eltype(itr), itr, isz), itr)

function _collect(cont, itr, ::HasEltype, isz::SizeUnknown)
a = _similar_for(cont, eltype(itr), itr, isz)
Expand Down Expand Up @@ -618,6 +635,9 @@ function collect(itr::Generator)
else
y = iterate(itr)
if y === nothing
if isa(isz, Union{HasLength, HasShape}) && length(itr) != 0
throw(ErrorException("iterator returned fewer elements than its declared length"))
end
return _array_for(et, itr.iter, isz)
end
v1, st = y
Expand Down Expand Up @@ -667,6 +687,10 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
return collect_to!(new, itr, i+1, st)
end
end
i-1 < length(dest) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use lastindex.

throw(ErrorException("iterator returned fewer elements than its declared length"))
i-1 > length(dest) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the @inbounds dest[i] above has likely corrupted something or segfaulted. Should probably replace the while true with while i <= lastindex(dest) and verify here that iterate(itr, st) === nothing (or Base.isdone(itr, st) === true || iterate(itr, st) === nothing ?). Should then fix #29532.

Copy link
Member Author

@nalimilan nalimilan Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I agree, I don't like this @inbounds call which isn't associated with an actual check of the index against the array. But it comes with an additional cost with hurts performance at lot since it disables SIMD (see discussion above. I've just tried it:

# After change
julia> x = Vector{Int}(undef, 1000);

julia> itr = 1:1000;

julia> @btime Base.collect_to!($x, $itr, 1, 0);
  676.981 ns (0 allocations: 0 bytes)

julia> y = copy(x);

julia> @btime Base.collect_to!($x, $y, 1, 1);
  847.618 ns (0 allocations: 0 bytes)

# Master
julia> x = Vector{Int}(undef, 1000);

julia> itr = 1:1000;

julia> @btime Base.collect_to!($x, $itr, 1, 0);

  86.233 ns (0 allocations: 0 bytes)

julia> y = copy(x);

julia> @btime Base.collect_to!($x, $y, 1, 1);
  90.980 ns (0 allocations: 0 bytes)

Maybe the compiler could be improved, but for now I guess we can't do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, that's bad.

throw(ErrorException("iterator returned more elements than its declared length"))
return dest
end

Expand Down
26 changes: 26 additions & 0 deletions test/arrayops.jl
Original file line number Diff line number Diff line change
Expand Up @@ -2524,3 +2524,29 @@ Base.view(::T25958, args...) = args
@test t[end,end,1] == @view(t[end,end,1]) == @views t[end,end,1]
@test t[end,end,end] == @view(t[end,end,end]) == @views t[end,end,end]
end

@testset "collect on iterator with incorrect length" begin
# Iterator with declared length too large
struct InvalidIter1 end
Base.length(::InvalidIter1) = 2
Base.iterate(::InvalidIter1, i=1) = i > 1 ? nothing : (i, (i + 1))

@test_throws ErrorException collect(InvalidIter1())
@test_throws ErrorException collect(Any, InvalidIter1())
@test_throws ErrorException collect(Int, InvalidIter1())
@test_throws ErrorException [x for x in InvalidIter1()]
# Should also throw ErrorException
@test_broken length(Int[x for x in InvalidIter1()]) != 2

# Iterator with declared length too small
struct InvalidIter2 end
Base.length(::InvalidIter2) = 2
Base.iterate(::InvalidIter2, i=1) = i > 3 ? nothing : (i, (i + 1))

@test_throws ErrorException collect(InvalidIter2())
@test_throws ErrorException collect(Any, InvalidIter2())
@test_throws ErrorException collect(Int, InvalidIter2())
@test_throws ErrorException [x for x in InvalidIter2()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can test this, since it will do an invalid memory write.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, yeah, so the only way to catch this would be to drop the @inbounds in collect_to!, or put a i == len check as in copyto!/copyto_check_length!. Not sure that's worth it, but it sounds inconsistent to have it in one place and not the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add checks for some cases (when it can be efficient) than to have no checks at all, I'd say.

Copy link
Member Author

@nalimilan nalimilan Oct 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But are there reasons to care less about performance in copyto! than in collect_to!?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_to! is used by comprehensions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does that mean it's OK for copyto! to be slow? :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't mean it's OK for copyto! to be slow, it just means we don't want regressions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually an honest question: maybe we should remove the check from copyto! to make it faster.

# Should also throw ErrorException
@test_broken length(Int[x for x in InvalidIter2()]) != 2
end