-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
polygon_to_array returns shorter y dimension than source rasters #311
Comments
Hi @WY-CGhilardi thanks for posting. The default behavior of To start, could you try the following with your data? import geowombat as gw
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
poly_array = gw.polygon_to_array(training_polys, col="class", data=src, bounds_by="union")
print(poly_array) Are you able to share your data? |
Here's another test to see if the polygons are properly converted using a config context. import geowombat as gw
from geowombat.backends.rasterio_ import get_file_bounds
aoi_bounds = get_file_bounds(filenames, return_bounds=True, bounds_by='union')
with gw.config.update(ref_bounds=aoi_bounds):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class') |
Currently ml functions don't work on time series stacks, this would also require time series training data. Instead you need to set with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5'],
stack_dim='band'
) as src:
X, Xy, clf = fit(data = src, clf = pl, labels = training_polys, col='class') |
I cannot share the data but it sounds like this is a issue with how the series is stacked. For completeness here are the two configs requested above. The first returns the same results as my previous call to with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
poly_array = gw.polygon_to_array(training_polys, col="class", data=src, bounds_by="union")
print(poly_array)
<xarray.DataArray 'array-74bf0873a2d08172017317f9392aa551' (band: 1, y: 16399,
x: 20596)>
dask.array<array, shape=(1, 16399, 20596), dtype=uint8, chunksize=(1, 1, 20596), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.3217508595230054, 0.0, 455302.1731317809, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.3217508595230054, 3.321750859522998)
is_tiled: 1
The second returns the same error message as before but now with one extra row on the y axis ( from geowombat.backends.rasterio_ import get_file_bounds
aoi_bounds = get_file_bounds(filenames, return_bounds=True, bounds_by='union')
with gw.config.update(ref_bounds=aoi_bounds):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[11], line 9
4 with gw.config.update(ref_bounds=aoi_bounds):
5 with gw.open(
6 filenames,
7 band_names=['band1','band2','band3','band4','band5']
8 ) as src:
----> 9 X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class')
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:269, in Classifiers.fit(self, data, clf, labels, col, targ_name, targ_dim_name)
266 else:
267 data = self._add_time_dim(data)
--> 269 data, labels = self._prepare_labels(data, labels, col, targ_name)
270 X, Xna = self._prepare_predictors(data, targ_name)
271 clf = self._prepare_classifiers(clf)
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/ml/classifiers.py:77, in ClassifiersMixin._prepare_labels(self, data, labels, col, targ_name)
71 labels = polygon_to_array(labels, col=col, data=data)
73 labels = xr.concat([labels] * data.gw.ntime, dim="band").assign_coords(
74 coords={"band": data.time.values.tolist()}
75 )
---> 77 data.coords[targ_name] = (["time", "y", "x"], labels.values)
79 return data, labels
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:528, in Coordinates.__setitem__(self, key, value)
527 def __setitem__(self, key: Hashable, value: Any) -> None:
--> 528 self.update({key: value})
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:554, in Coordinates.update(self, other)
546 # Discard original indexed coordinates prior to merge allows to:
547 # - fail early if the new coordinates don't preserve the integrity of existing
548 # multi-coordinate indexes
549 # - drop & replace coordinates without alignment (note: we must keep indexed
550 # coordinates extracted from the DataArray objects passed as values to
551 # `other` - if any - as those are still used for aligning the old/new coordinates)
552 coords_to_align = drop_indexed_coords(set(other_coords) & set(other), self)
--> 554 coords, indexes = merge_coords(
555 [coords_to_align, other_coords],
556 priority_arg=1,
557 indexes=coords_to_align.xindexes,
558 )
560 # special case for PandasMultiIndex: updating only its dimension coordinate
561 # is still allowed but depreciated.
562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
564 self._drop_coords(self._names - coords_to_align._names)
File /env/lib/python3.10/site-packages/xarray/core/merge.py:556, in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
554 _assert_compat_valid(compat)
555 coerced = coerce_pandas_values(objects)
--> 556 aligned = deep_align(
557 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
558 )
559 collected = collect_variables_and_indexes(aligned, indexes=indexes)
560 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:952, in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
949 else:
950 out.append(variables)
--> 952 aligned = align(
953 *targets,
954 join=join,
955 copy=copy,
956 indexes=indexes,
957 exclude=exclude,
958 fill_value=fill_value,
959 )
961 for position, key, aligned_obj in zip(positions, keys, aligned):
962 if key is no_key:
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:888, in align(join, copy, indexes, exclude, fill_value, *objects)
692 """
693 Given any number of Dataset and/or DataArray objects, returns new
694 objects with aligned indexes and dimension sizes.
(...)
878
879 """
880 aligner = Aligner(
881 objects,
882 join=join,
(...)
886 fill_value=fill_value,
887 )
--> 888 aligner.align()
889 return aligner.results
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:575, in Aligner.align(self)
573 self.assert_no_index_conflict()
574 self.align_indexes()
--> 575 self.assert_unindexed_dim_sizes_equal()
577 if self.join == "override":
578 self.override_indexes()
File /env/lib/python3.10/site-packages/xarray/core/alignment.py:476, in Aligner.assert_unindexed_dim_sizes_equal(self)
474 add_err_msg = ""
475 if len(sizes) > 1:
--> 476 raise ValueError(
477 f"cannot reindex or align along dimension {dim!r} "
478 f"because of conflicting dimension sizes: {sizes!r}" + add_err_msg
479 )
ValueError: cannot reindex or align along dimension 'y' because of conflicting dimension sizes: {16400, 16401} (note: an index is found along that dimension with size=16401) |
I did also try adding the ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 1
----> 1 with gw.open(
2 filenames,
3 band_names=['band1','band2','band3','band4','band5'],
4 stack_dim='band'
5 ) as src:
6 src
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/core/api.py:531, in open.__init__(self, filename, band_names, time_names, stack_dim, bounds, bounds_by, resampling, persist_filenames, netcdf_vars, mosaic, overlap, nodata, scale_factor, offset, dtype, scale_data, num_workers, **kwargs)
518 self.data = gw_mosaic(
519 filename,
520 overlap=overlap,
(...)
526 **kwargs,
527 )
529 else:
530 # Stack images along the 'time' axis
--> 531 self.data = gw_concat(
532 filename,
533 stack_dim=stack_dim,
534 bounds_by=bounds_by,
535 resampling=resampling,
536 time_names=time_names,
537 band_names=band_names,
538 nodata=nodata,
539 overlap=overlap,
540 dtype=dtype,
541 netcdf_vars=netcdf_vars,
542 **kwargs,
543 )
544 self.__data_are_stacked = True
546 self.__data_are_separate = True
File ~/venvs/geowombat-env/lib/python3.10/site-packages/geowombat/backends/xarray_.py:738, in concat(filenames, stack_dim, bounds_by, resampling, time_names, band_names, nodata, dtype, netcdf_vars, overlap, warp_mem_limit, num_threads, tap, **kwargs)
735 src.coords['time'] = parse_filename_dates(filenames)
737 if band_names:
--> 738 src.coords['band'] = band_names
739 else:
740 if src.gw.sensor:
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:528, in Coordinates.__setitem__(self, key, value)
527 def __setitem__(self, key: Hashable, value: Any) -> None:
--> 528 self.update({key: value})
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:566, in Coordinates.update(self, other)
560 # special case for PandasMultiIndex: updating only its dimension coordinate
561 # is still allowed but depreciated.
562 # It is the only case where we need to actually drop coordinates here (multi-index levels)
563 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
564 self._drop_coords(self._names - coords_to_align._names)
--> 566 self._update_coords(coords, indexes)
File /env/lib/python3.10/site-packages/xarray/core/coordinates.py:834, in DataArrayCoordinates._update_coords(self, coords, indexes)
832 coords_plus_data = coords.copy()
833 coords_plus_data[_THIS_ARRAY] = self._data.variable
--> 834 dims = calculate_dimensions(coords_plus_data)
835 if not set(dims) <= set(self.dims):
836 raise ValueError(
837 "cannot add coordinates with new dimensions to a DataArray"
838 )
File /env/lib/python3.10/site-packages/xarray/core/variable.py:2997, in calculate_dimensions(variables)
2995 last_used[dim] = k
2996 elif dims[dim] != size:
-> 2997 raise ValueError(
2998 f"conflicting sizes for dimension {dim!r}: "
2999 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
3000 )
3001 return dims
ValueError: conflicting sizes for dimension 'band': length 60 on <this-array> and length 5 on {'x': 'x', 'y': 'y', 'band': 'band'} |
Comment out band_names, it should be the same length as the total number of bands in all images |
removing with gw.open(
filenames,
stack_dim='band'
) as src:
X, Xy, clf = fit(data = src, clf = pl, labels = training_polys, col='class')
|
Do you have polygons that exceed the bounds of the images? |
Do you have polygons that exceed the bounds of the images? Or that contain missing data? |
No, the polygons were explicitly drawn within this AOI. Doubling checking the bounding boxes for both the geodataframe and xarray stack check out as well
No, the dataframe is fully populated and all have
|
This is a weird one. I've never run into it and can't reproduce without the data. Can you try seeing 'ref_image' to the first image in your list? https://geowombat.readthedocs.io/en/latest/tutorial-config.html#reference-settings-image |
@WY-CGhilardi could you post the full transform ( |
Apologies, I see that you already did what I requested. I'll look into this and see if I can reproduce. |
I was able to reproduce with fake data, so I should have what I need to diagnose. import geowombat as gw
import dask.array as da
import numpy as np
import geopandas as gpd
import xarray as xr
from affine import Affine
from shapely.geometry import box
num_time = 12
num_bands = 5
num_rows = 16400
num_cols = 20596
res = (3.3217508595230054, 3.321750859522998)
left, bottom, right, top = (455303.8340072106, 5061403.560293189, 523715.29295908695, 5115876.952638507)
x = np.linspace(left, right, num_cols)
y = np.linspace(top, bottom, num_rows)
data = xr.DataArray(
da.random.random(
(num_time, num_bands, num_rows, num_cols), chunks=(-1, -1, 128, 128)
),
dims=('time', 'band', 'y', 'x'),
coords={
'time': range(1, num_time + 1),
'band': [f"band{b}" for b in range(1, num_bands + 1)],
'y': y,
'x': x,
},
attrs={
'transform': list(Affine(
res[0], 0.0, left, 0.0, -res[1], top
)),
'crs': 'EPSG:6339',
'res': res,
'nodatavals': (0.0, 0.0, 0.0, 0.0, 0.0),
'_FillValue': 0.0,
},
)
poly_left, poly_bottom, poly_right, poly_top = 472449.7271, 5063777.276, 510789.9898, 5106120.4848
poly1 = box(poly_left, poly_top - 1000, poly_left + 1000, poly_top)
poly2 = box(poly_right - 1000, poly_bottom, poly_right, poly_bottom + 1000)
training_polys = gpd.GeoDataFrame(
data=[0, 1],
columns=['class'],
geometry=[poly1, poly2],
crs='EPSG:6339',
) poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data,
bounds_by="union",
) print(data)
xarray.DataArray 'random_sample-b383b64724d8ff4e461e9131e8739640'
time: 12 band: 5 y: 16400 x: 20596
print(poly_array)
xarray.DataArray 'array-c8fefabbbbd90f2375e1a36a7114dc8c'
band: 1 y: 16399 x: 20596 |
I think this might just be a precision issue. The adjusted height was 16399.9999987, which was converted to 16399 instead of 16400 with |
I didn't see that geowombat exposed a src.gw.bounds
(455302.1731317809, 5061401.89941776, 523716.9538345167, 5115878.613513936)
##as compared with (455303.8340072106, 5061403.560293189, 523715.29295908695, 5115876.952638507) from above I was not able to successfully get the reference image config working correctly. The files I am accessing are remote on a cloud provider so not sure if that is the issue or something else.
|
I am still seeing #pip install git+https://github.com/jgrss/geowombat@jgrss/issue_310
<xarray.DataArray 'array-74bf0873a2d08172017317f9392aa551' (band: 1, y: 16399,
x: 20596)>
dask.array<array, shape=(1, 16399, 20596), dtype=uint8, chunksize=(1, 1, 20596), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.3217508595230054, 0.0, 455302.1731317809, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.3217508595230054, 3.321750859522998)
is_tiled: 1 |
Hmm, geowombat |
Could you run your polygon conversion with the following and let me know what you get? with gw.config.update(ref_res=3.2175): and with gw.config.update(ref_res=3.22): |
Let me start off with a big thank you to both of you for working through this with me, I really appreciate it with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
src
labelss = polygon_to_array( training_polys, 'class', src)
labelss
<xarray.DataArray 'array-eb576767189c8f0f6be73fcd77ffb4ee' (band: 1, y: 16930,
x: 21263)>
dask.array<array, shape=(1, 16930, 21263), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 5115878.613513...
crs: EPSG:6339
res: (3.2175, 3.2175)
is_tiled: 1 with gw.config.update(ref_res=3.22):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
src
labelss = polygon_to_array( training_polys, 'class', src)
labelss
<xarray.DataArray 'array-e7a6b86caea789cacaa30e07821d1756' (band: 1, y: 16917,
x: 21247)>
dask.array<array, shape=(1, 16917, 21247), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.613513936)
crs: EPSG:6339
res: (3.22, 3.22)
is_tiled: 1
Yes, the with gw.open(filenames[0]) as src:
print(src.gw.bounds)
(455302.1731317809, 5061401.89941776, 523716.9538345167, 5115878.613513936) |
Using the fake data generated above, the array size is print(data.shape)
print(data.gw.bounds)
(12, 5, 16400, 20596)
(455300.5122563511, 5061400.23854233, 523718.61470994644, 5115880.274389366) and the training polygon bounding box is print(training_polys.total_bounds.tolist())
[472449.7271, 5063777.276, 510789.9898, 5106120.4848] Those numbers should match what you have, as I simply copied what you posted. Using the latest ( poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data,
)
print(poly_array.shape)
(1, 16400, 20596) |
I'm not sure if it's a formatting/syntax error, but in your last post it didn't look like you wrapped the polygon conversion within the context. Are you running the following test? with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src)
print(labels) If that prints aligned shapes, can you try with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
X, Xy, clf = fit(data=src, clf=pl, labels=training_polys, col='class') I don't have the same raster file you're using, but if I resample the array then the polygon array shape aligns. data_resamp = data.gw.transform_crs(dst_res=(3.2175, 3.2175))
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data_resamp ,
)
print(data_resamp.shape)
(5, 16933, 21265)
print(poly_array.shape)
(1, 16933, 21265) |
That is just a formatting issue, I can confirm I still return mis-aligned arrays with the configs Using with gw.config.update(ref_res=3.2175):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src) #16931 x 21264
print(labels) #16930 x 21263 <xarray.DataArray (time: 12, band: 5, y: 16931, x: 21264)>
dask.array<concatenate, shape=(12, 5, 16931, 21264), dtype=uint16, chunksize=(1, 5, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) <U5 'band1' 'band2' 'band3' 'band4' 'band5'
Attributes: (12/14)
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 51158...
crs: 6339
res: (3.2175, 3.2175)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['1_2023_6339.tif', '2_2023_6339.tif', '3_2023_6339....
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1
<xarray.DataArray 'array-eb576767189c8f0f6be73fcd77ffb4ee' (band: 1, y: 16930,
x: 21263)>
dask.array<array, shape=(1, 16930, 21263), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.2175, 0.0, 455302.1731317809, 0.0, -3.2175, 5115878.613513...
crs: EPSG:6339
res: (3.2175, 3.2175)
is_tiled: 1
Using with gw.config.update(ref_res=3.22):
with gw.open(
filenames,
band_names=['band1','band2','band3','band4','band5']
) as src:
# This needs to be indented here in order to use the config resolution
labels = polygon_to_array(training_polys, col='class', data=src)
print(src) #16918 x 21247
print(labels) #16917 x 21247 <xarray.DataArray (time: 12, band: 5, y: 16918, x: 21247)>
dask.array<concatenate, shape=(12, 5, 16918, 21247), dtype=uint16, chunksize=(1, 5, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) <U5 'band1' 'band2' 'band3' 'band4' 'band5'
Attributes: (12/14)
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.6...
crs: 6339
res: (3.22, 3.22)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['1_2023_6339.tif', '2_2023_6339.tif', '3_2023_6339....
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1
<xarray.DataArray 'array-e7a6b86caea789cacaa30e07821d1756' (band: 1, y: 16917,
x: 21247)>
dask.array<array, shape=(1, 16917, 21247), dtype=uint8, chunksize=(1, 1, 20597), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 5.116e+06 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 4.553e+05 4.553e+05 4.553e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.22, 0.0, 455302.1731317809, 0.0, -3.22, 5115878.613513936)
crs: EPSG:6339
res: (3.22, 3.22)
is_tiled: 1 It seems that resampling does work to align the array. Note I just did this on a single image here, I was running into OOM errors with the full stack. with gw.open(filenames[0]) as src:
data_resamp = src.gw.transform_crs(dst_res=(3.2175, 3.2175))
poly_array = gw.polygon_to_array(
training_polys,
col="class",
data=data_resamp ,
)
print(data_resamp.shape)
print(poly_array.shape) (5, 16932, 21264)
(1, 16932, 21264) |
* simplify mosaic procedure * adding mosiac and save unit tests * fix path tifs * resolve ml install issues * handle 0s in X[targ_name] * add todo * reverting to main * pin scikit fix * add tests * throw error if stacked by time resolves #311 * droping test - meaningless output --------- Co-authored-by: jgrss <[email protected]>
I have been poking around this some more locally and noticed something this morning. For a given image, the returned NOTE this is a slightly clipped down version of the files discussed above for ease of working on laptop gdalinfo myfile1_clipped.tif
Driver: GTiff/GeoTIFF
Size is 16462, 16376
..... truncated Full gdalinfo output if helpful```bash gdalinfo myfile1_clipped.tif Driver: GTiff/GeoTIFF Size is 16462, 16376 Coordinate System is: PROJCRS["NAD83(2011) / UTM zone 10N", BASEGEOGCRS["NAD83(2011)", DATUM["NAD83 (National Spatial Reference System 2011)", ELLIPSOID["GRS 1980",6378137,298.257222101, LENGTHUNIT["metre",1]]], PRIMEM["Greenwich",0, ANGLEUNIT["degree",0.0174532925199433]], ID["EPSG",6318]], CONVERSION["UTM zone 10N", METHOD["Transverse Mercator", ID["EPSG",9807]], PARAMETER["Latitude of natural origin",0, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8801]], PARAMETER["Longitude of natural origin",-123, ANGLEUNIT["degree",0.0174532925199433], ID["EPSG",8802]], PARAMETER["Scale factor at natural origin",0.9996, SCALEUNIT["unity",1], ID["EPSG",8805]], PARAMETER["False easting",500000, LENGTHUNIT["metre",1], ID["EPSG",8806]], PARAMETER["False northing",0, LENGTHUNIT["metre",1], ID["EPSG",8807]]], CS[Cartesian,2], AXIS["(E)",east, ORDER[1], LENGTHUNIT["metre",1]], AXIS["(N)",north, ORDER[2], LENGTHUNIT["metre",1]], USAGE[ SCOPE["Engineering survey, topographic mapping."], AREA["United States (USA) - between 126┬░W and 120┬░W onshore and offshore - California; Oregon; Washington."], BBOX[30.54,-126,49.09,-119.99]], ID["EPSG",6339]] Data axis to CRS axis mapping: 1,2 Origin = (468984.716852614830714,5115798.602993652224541) Pixel Size = (3.321750859523004,-3.321750859523010) Metadata: AREA_OR_POINT=Area COLORINTERP=Blue Image Structure Metadata: INTERLEAVE=PIXEL Corner Coordinates: Upper Left ( 468984.717, 5115798.603) (123d24' 7.06"W, 46d11'42.21"N) Lower Left ( 468984.717, 5061401.611) (123d23'54.38"W, 45d42'19.70"N) Upper Right ( 523667.380, 5115798.603) (122d41'35.76"W, 46d11'43.27"N) Lower Right ( 523667.380, 5061401.611) (122d41'45.44"W, 45d42'20.75"N) Center ( 496326.048, 5088600.107) (123d 2'50.66"W, 45d57' 3.46"N) Band 1 Block=16462x1 Type=UInt16, ColorInterp=Gray Min=1.000 Max=10000.000 Minimum=1.000, Maximum=10000.000, Mean=311.924, StdDev=500.680 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=10000 STATISTICS_MEAN=311.92363571779 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=500.68043708223 STATISTICS_VALID_PERCENT=80.34 Band 2 Block=16462x1 Type=UInt16, ColorInterp=Undefined Min=1.000 Max=10000.000 Minimum=1.000, Maximum=10000.000, Mean=373.399, StdDev=495.399 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=10000 STATISTICS_MEAN=373.39868300658 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=495.39942964762 STATISTICS_VALID_PERCENT=80.34 Band 3 Block=16462x1 Type=UInt16, ColorInterp=Undefined Min=1.000 Max=9961.000 Minimum=1.000, Maximum=9961.000, Mean=366.853, StdDev=518.729 NoData Value=0 Metadata: STATISTICS_APPROXIMATE=YES STATISTICS_MAXIMUM=9961 STATISTICS_MEAN=366.8531565228 STATISTICS_MINIMUM=1 STATISTICS_STDDEV=518.72915463185 STATISTICS_VALID_PERCENT=80.34 Band 4 Block=16462x1 Type=UInt16, ColorInterp=Undefined NoData Value=0 Band 5 Block=16462x1 Type=UInt16, ColorInterp=Undefined NoData Value=0 ```with gw.open(
myfile1_clipped.tif
) as src:
print(src.shape)
(5, 16375, 16462) EDIT: I tried a vanilla rasterio read and that agrees with the GDAL outputs #check metadata import rasterio
with rasterio.open(filenames[0]) as rast_src:
print(rast_src.meta)
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 0.0, 'width': 16462, 'height': 16376, 'count': 5, 'crs': CRS.from_epsg(6339), 'transform': Affine(3.321750859523004, 0.0, 468984.71685261483,
0.0, -3.3217508595230103, 5115798.602993652)} #actually read a band and report array shape with rasterio.open(filenames[0]) as rast_src:
arr = rast_src.read(1)
print(arr.shape)
(16376, 16462) EDIT 2: rioxarray matches with raw rasterio da = rioxarray.open_rasterio(filenames[0] )
da.shape
(5, 16376, 16462) |
Sorry for the delay @WY-CGhilardi. I have a random data clone of your image, with
With changes in #319, the opened array results in: with gw.open("test.tif") as src:
print(src)
<xarray.DataArray (band: 5, y: 16376, x: 16462)> Size: 1GB
dask.array<open_rasterio-fa4e67f0ae8d81c28193cae47ef688b7<this-array>, shape=(5, 16376, 16462), dtype=uint8, chunksize=(5, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 40B 1 2 3 4 5
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
Attributes: (12/13)
transform: (3.321750859523004, 0.0, 468984.717, 0.0, -3.3217508...
crs: 32610
res: (3.321750859523004, 3.32175085952301)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
offsets: (0.0, 0.0, 0.0, 0.0, 0.0)
filename: test.tif
resampling: nearest
AREA_OR_POINT: Area
_data_are_separate: 0
_data_are_stacked: 0 |
I pulled down the with gw.open(
filenames,
) as src:
print(src)
<xarray.DataArray (time: 12, band: 5, y: 16376, x: 16462)> Size: 32GB
dask.array<concatenate, shape=(12, 5, 16376, 16462), dtype=uint16, chunksize=(1, 5, 1, 16462), chunktype=numpy.ndarray>
Coordinates:
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* time (time) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
* band (band) int64 40B 1 2 3 4 5
Attributes: (12/14)
transform: (3.321750859523004, 0.0, 468984.71685261483, 0.0, -3...
crs: 6339
res: (3.321750859523004, 3.3217508595230103)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0, 0.0, 0.0)
_FillValue: 0.0
... ...
filename: ['myfile1.tif', 'myfile2.tif...
resampling: nearest
AREA_OR_POINT: Area
COLORINTERP: Blue
_data_are_separate: 1
_data_are_stacked: 1 strangely, the resulting labels = polygon_to_array( training_polys,'class', src)
print(labels)
<xarray.DataArray 'array-010aa44b3977375e822586d17ce8e9a8' (band: 1, y: 16375,
x: 16462)> Size: 270MB
dask.array<array, shape=(1, 16375, 16462), dtype=uint8, chunksize=(1, 1, 16462), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 8B 1
* y (y) float64 131kB 5.116e+06 5.116e+06 ... 5.061e+06 5.061e+06
* x (x) float64 132kB 4.69e+05 4.69e+05 ... 5.237e+05 5.237e+05
Attributes:
transform: (3.321750859523004, 0.0, 468984.71685261483, 0.0, -3.32175085...
crs: EPSG:6339
res: (3.321750859523004, 3.3217508595230103)
is_tiled: 1 |
Version info:
$ pip list | grep geowombat geowombat 2.1.19
Originally ran into this issue when following this example in the docs
I have a time series of 5 banded imagery
This returns the following array with dimensions
20596
by16400
I have a
geodataframe
I am loading from ageopackage
When I try to fit a model, the resulting rasterized array is short on the y axis by one row
Looking through function definitions I was able to reproduce the mismatch with the
polygon_to_array
function. This returns a20596
by16399
dimension arrayI realize I am not explicitly setting the
ref_res
config parameter like in the example, but it seems to be trying to just matchsrc
parameters, which seems like perfectly reasonable behavior. I am not sure if this just fails because of the non-square resolution.The text was updated successfully, but these errors were encountered: