Ćwiczenie 1 - Wprowadzenie¶
In [1]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.colors as clt
import geodatasets
/tmp/ipykernel_10242/1167241068.py:1: DeprecationWarning: Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0), (to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries) but was not found to be installed on your system. If this would cause problems for you, please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466 import pandas as pd
In [2]:
#get path is imported from geodatasets library
from geodatasets import get_path
path_to_data = get_path("nybb")
gdf = gpd.read_file(path_to_data)
#gdf.head()
gdf.plot(column = 'BoroName')
Out[2]:
<Axes: >
Operacje na danych wektorowych¶
In [3]:
#calculate areas
gdf["area"] = gdf.area
gdf["area"]
Out[3]:
0 1.623822e+09 1 3.045214e+09 2 1.937478e+09 3 6.364712e+08 4 1.186926e+09 Name: area, dtype: float64
In [4]:
cn = gdf.centroid
cn.plot()
Out[4]:
<Axes: >
In [5]:
gdf["bfr"] = gdf.buffer(10000)
gdf["bfr"].plot()
Out[5]:
<Axes: >
Ujednolicenie układu współrzędnych¶
In [6]:
pkt = gpd.read_file("data/ny.gpkg", layer = "supermarket", driver = "GPKG")
ax = gdf["geometry"].plot()
pkt.plot(ax=ax, color="black")
Out[6]:
<Axes: >
In [7]:
gdf.crs
Out[7]:
<Projected CRS: EPSG:2263> Name: NAD83 / New York Long Island (ftUS) Axis Info [cartesian]: - X[east]: Easting (US survey foot) - Y[north]: Northing (US survey foot) Area of Use: - name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk. - bounds: (-74.26, 40.47, -71.8, 41.3) Coordinate Operation: - name: SPCS83 New York Long Island zone (US Survey feet) - method: Lambert Conic Conformal (2SP) Datum: North American Datum 1983 - Ellipsoid: GRS 1980 - Prime Meridian: Greenwich
W jakim układzie współrzędnych są dane punktowe (pkt)?
In [8]:
#assign to pkt crs from gdf
pkt = pkt.to_crs(gdf.crs)
In [9]:
ax = gdf["geometry"].plot()
pkt.plot(ax=ax, color="black", markersize = 0.5)
Out[9]:
<Axes: >
Złączenia przestrzenne¶
In [10]:
gdf
Out[10]:
BoroCode | BoroName | Shape_Leng | Shape_Area | geometry | area | bfr | |
---|---|---|---|---|---|---|---|
0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... | 1.623822e+09 | POLYGON ((903234.894 123347.784, 903178.057 12... |
1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... | 3.045214e+09 | POLYGON ((1066963.473 157602.686, 1067059.264 ... |
2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... | 1.937478e+09 | POLYGON ((962679.120 165570.385, 962651.330 16... |
3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... | 6.364712e+08 | POLYGON ((980499.119 178448.735, 979864.868 17... |
4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... | 1.186926e+09 | POLYGON ((992724.911 240962.362, 992700.941 24... |
In [11]:
pkt.head()
Out[11]:
full_id | brand | name | geometry | |
---|---|---|---|---|
0 | n419360013 | None | Waverly Urban Market | POINT (993552.584 189584.172) |
1 | n445184616 | Stop & Shop | Stop & Shop | POINT (1196535.721 368631.849) |
2 | n502791662 | None | Yong Fa | POINT (1032646.155 210176.107) |
3 | n566894297 | None | None | POINT (1022173.126 240706.578) |
4 | n568230825 | None | Aron's Kissena Farms | POINT (1035770.502 204714.347) |
In [12]:
sel = gdf[["BoroCode", "BoroName", "geometry"]]
pkt_with_ny = pkt.sjoin(sel, how="inner", predicate='intersects')
pkt_with_ny.head()
Out[12]:
full_id | brand | name | geometry | index_right | BoroCode | BoroName | |
---|---|---|---|---|---|---|---|
0 | n419360013 | None | Waverly Urban Market | POINT (993552.584 189584.172) | 2 | 3 | Brooklyn |
2 | n502791662 | None | Yong Fa | POINT (1032646.155 210176.107) | 1 | 4 | Queens |
3 | n566894297 | None | None | POINT (1022173.126 240706.578) | 4 | 2 | Bronx |
4 | n568230825 | None | Aron's Kissena Farms | POINT (1035770.502 204714.347) | 1 | 4 | Queens |
5 | n584125647 | Whole Foods Market | Whole Foods Market | POINT (986644.761 206981.429) | 3 | 1 | Manhattan |
Punkty w poligonie¶
In [13]:
supermakets_in_boro = pkt_with_ny.groupby('BoroCode').count()
supermakets_in_boro
Out[13]:
full_id | brand | name | geometry | index_right | BoroName | |
---|---|---|---|---|---|---|
BoroCode | ||||||
1 | 223 | 68 | 216 | 223 | 223 | 223 |
2 | 65 | 21 | 55 | 65 | 65 | 65 |
3 | 231 | 47 | 220 | 231 | 231 | 231 |
4 | 259 | 65 | 248 | 259 | 259 | 259 |
5 | 1 | 0 | 1 | 1 | 1 | 1 |
In [14]:
#złączenie przestrzenne oraz obliczenie punktów w poligonie
sel = gdf[["BoroCode", "BoroName", "geometry"]]
nb_pkt = pkt.sjoin(sel, how="inner", predicate='intersects').groupby('BoroCode').count()
nb_pkt["counts"] = nb_pkt["full_id"]
nb_pkt
Out[14]:
full_id | brand | name | geometry | index_right | BoroName | counts | |
---|---|---|---|---|---|---|---|
BoroCode | |||||||
1 | 223 | 68 | 216 | 223 | 223 | 223 | 223 |
2 | 65 | 21 | 55 | 65 | 65 | 65 | 65 |
3 | 231 | 47 | 220 | 231 | 231 | 231 | 231 |
4 | 259 | 65 | 248 | 259 | 259 | 259 | 259 |
5 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Złączenie na podstawie atrybutów¶
In [15]:
gdf2 = gdf
gdf2.set_index("BoroCode", drop=True)
Out[15]:
BoroName | Shape_Leng | Shape_Area | geometry | area | bfr | |
---|---|---|---|---|---|---|
BoroCode | ||||||
5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... | 1.623822e+09 | POLYGON ((903234.894 123347.784, 903178.057 12... |
4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... | 3.045214e+09 | POLYGON ((1066963.473 157602.686, 1067059.264 ... |
3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... | 1.937478e+09 | POLYGON ((962679.120 165570.385, 962651.330 16... |
1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... | 6.364712e+08 | POLYGON ((980499.119 178448.735, 979864.868 17... |
2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... | 1.186926e+09 | POLYGON ((992724.911 240962.362, 992700.941 24... |
In [16]:
gdf2_attr = gdf2.join(nb_pkt["counts"])
gdf2_attr
Out[16]:
BoroCode | BoroName | Shape_Leng | Shape_Area | geometry | area | bfr | counts | |
---|---|---|---|---|---|---|---|---|
0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... | 1.623822e+09 | POLYGON ((903234.894 123347.784, 903178.057 12... | NaN |
1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... | 3.045214e+09 | POLYGON ((1066963.473 157602.686, 1067059.264 ... | 223.0 |
2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... | 1.937478e+09 | POLYGON ((962679.120 165570.385, 962651.330 16... | 65.0 |
3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... | 6.364712e+08 | POLYGON ((980499.119 178448.735, 979864.868 17... | 231.0 |
4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... | 1.186926e+09 | POLYGON ((992724.911 240962.362, 992700.941 24... | 259.0 |
Typ geometrii¶
Zbiór danych może zawierać kilka kolumn definiujących geometrię obiektów. Funkcja set_geometry ustawia, która kolumna będzie definiować geometrię.
In [17]:
#do gdf dodam kolumne w ktorej wylicze centroidy
gdf['centroid'] = gdf.centroid
gdf['boundary'] = gdf.boundary
gdf
Out[17]:
BoroCode | BoroName | Shape_Leng | Shape_Area | geometry | area | bfr | centroid | boundary | |
---|---|---|---|---|---|---|---|---|---|
0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... | 1.623822e+09 | POLYGON ((903234.894 123347.784, 903178.057 12... | POINT (941639.450 150931.991) | MULTILINESTRING ((970217.022 145643.332, 97022... |
1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... | 3.045214e+09 | POLYGON ((1066963.473 157602.686, 1067059.264 ... | POINT (1034578.078 197116.604) | MULTILINESTRING ((1029606.077 156073.814, 1029... |
2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... | 1.937478e+09 | POLYGON ((962679.120 165570.385, 962651.330 16... | POINT (998769.115 174169.761) | MULTILINESTRING ((1021176.479 151374.797, 1021... |
3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... | 6.364712e+08 | POLYGON ((980499.119 178448.735, 979864.868 17... | POINT (993336.965 222451.437) | MULTILINESTRING ((981219.056 188655.316, 98094... |
4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... | 1.186926e+09 | POLYGON ((992724.911 240962.362, 992700.941 24... | POINT (1021174.790 249937.980) | MULTILINESTRING ((1012821.806 229228.265, 1012... |
In [18]:
gdf.plot()
Out[18]:
<Axes: >
In [19]:
#zmieniam geometrie na punkty
gdf = gdf.set_geometry("centroid")
gdf.plot()
Out[19]:
<Axes: >
In [20]:
gdf["bfr_centroid"] = gdf.buffer(5000)
gdf["bfr_centroid"].plot(color = "green")
Out[20]:
<Axes: >
In [21]:
gdf = gdf.set_geometry("boundary")
gdf.plot()
Out[21]:
<Axes: >
In [22]:
#ustawiam geometrię z powrotem na warstwę poligonową
gdf = gdf.set_geometry("geometry")
gdf.plot()
Out[22]:
<Axes: >
Wyświetlanie¶
In [23]:
gdf.plot(column = "area", cmap="Reds", legend = True)
Out[23]:
<Axes: >
Wyświetlanie kilku warstw¶
In [24]:
#ax = ax okresla co bedzie wyswietlone w podkladzie, wyswietli poligony a na nie nalozy punkty
ax = gdf["geometry"].plot()
gdf["centroid"].plot(ax = ax, color = "black")
Out[24]:
<Axes: >
In [25]:
ax = gdf["bfr"].plot(color = "grey")
gdf["boundary"].plot(ax=ax, color="white", linewidth = 0.3)
gdf["bfr_centroid"].plot(ax=ax, color="green")
gdf["centroid"].plot(ax=ax, color="red")
Out[25]:
<Axes: >
Kolejność wyświetlania warstw¶
Przy wyświetlaniu wielu warstw warto użyć parametru zorder, który wskazuje kolejność wyświetlania warstw. W poniższym przykładzie przypisz zorder=3 do bfr_centroid oraz zorder=4 do zmiennej centroid.
In [26]:
ax = gdf["bfr"].plot(color = "grey")
gdf["boundary"].plot(ax=ax, color="white", linewidth = 0.3, zorder=2)
gdf["bfr_centroid"].plot(ax=ax, color="green", zorder=3)
gdf["centroid"].plot(ax=ax, color="red", zorder=4)
Out[26]:
<Axes: >
Zapisywanie do pliku¶
In [31]:
ax = gdf["bfr"].plot(color = "grey")
gdf["boundary"].plot(ax=ax, color="white", linewidth = 0.3, zorder=2)
gdf["bfr_centroid"].plot(ax=ax, color="green", zorder=3)
gdf["centroid"].plot(ax=ax, color="red", zorder=4)
plt.savefig('fig1.png', dpi=300)
In [27]:
## Mapy interaktywne
gdf.explore()
Out[27]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [28]:
gdf.explore(
column="BoroName", # make choropleth based on "BoroName" column
tooltip="BoroName", # show "BoroName" value in tooltip (on hover)
popup=True, # show all values in popup (on click)
tiles="CartoDB positron", # use "CartoDB positron" tiles
cmap="Set1", # use "Set1" matplotlib colormap
style_kwds=dict(color="black"), # use black outline
)
Out[28]:
Make this Notebook Trusted to load map: File -> Trust Notebook
Zadanie samodzielne¶
- Z warstwy ny.gpkg wczytaj warstwę punktową theatre.
- Ujednolić układ współrzędnych między wartstą theatre a warstwą poligonową gdf (granice dzielnic w NYC).
- Połącz obie warstwy
- W której dzielnicy znajduje się najwięcej, a w której najmniej teatrów?
- Przedstaw liczbę teatrów w poszczególnych dzielnicach na mapie. Upewnij się, że wszystkie dzielnice są wyświetlone na mapie.