Spatial Relationships and Join#

Spatial Predicates#

Spatial predicates are functions that define the relationships between geometric objects.
They help answer questions like: Do these features intersect? Is one contained within another? Do they just touch at the boundary?

These relationships are essential for spatial analysis, especially when working with spatial joins, where attributes from one layer are assigned to features in another based on how they relate spatially.

Common Spatial Predicates#

Spatial predicates are functions that evaluate geometric relationships between two shapes (points, lines, polygons).
They are the foundation for spatial queries, filtering, and joins.

Here’s a table of the most commonly used binary predicates, with explanations:

Predicate

Description

intersects

Returns True if geometries share at least one point — they overlap in any way (borders or interiors). Equivalent to not disjoint.

disjoint

Returns True if geometries do not share any points — they are completely separated in space.

within

Returns True if geometry A is completely inside geometry B — all points of A (interior and boundary) lie within B. Equivalent to B.contains(A).

contains

Returns True if geometry A completely contains geometry B — no part of B lies outside A, and at least one point of B is strictly inside A. Equivalent to B.within(A).

touches

Returns True if geometries touch only at the boundary, without any overlap in interior space.

overlaps

Returns True if geometries partially overlap — they share an area but neither one fully contains the other. (Only applies to same-type geometries like polygon-polygon).

crosses

Returns True if geometries cross each other — they intersect in a way where the result has a lower dimension. For example, a line crossing through a polygon or two lines intersecting at a point.

Understanding spatial predicates helps you write more precise spatial queries and perform meaningful spatial joins and filtering based on how features relate to each other in space.

НАРИСОВАТЬ СВОИ КАРТИНКИ?

Spatial Join#

A spatial join combines two layers by attaching the attributes of one layer to another, based on their spatial relationship — for example, assigning each point to the district it falls within, or checking whether a feature lies inside a buffer zone.

But a spatial join is more than just attaching data where geometries intersect.
To make these joins meaningful and accurate, you need to understand the type of spatial relationship between features:
Are they overlapping? One inside another? Do they only touch at the edge?

That’s where spatial predicates come in — they’re the core tools that let us describe and test topological relationships between geometries.

Прочитаем данные о ДТП

accidents = gpd.read_file('data/spb_dtp.gpkg')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 accidents = gpd.read_file('data/spb_dtp.gpkg')

NameError: name 'gpd' is not defined

Пересечем данные о ДТП с зонами доступности школ

accidents_utm = accidents.to_crs(target_crs)


accidents_in_school_zones = gpd.sjoin(
    accidents_utm,
    schools_buffer,
    how="inner",
    predicate="intersects"
)


accidents_count = accidents_in_school_zones.groupby('osmid').size()

accidents_count = accidents_count.reset_index()
accidents_count.columns = ['osmid', 'acc_count']

schools_utm_acc = schools_utm.merge(accidents_count, on='osmid',  how='left')
schools_utm_acc.explore(column='acc_count',tiles='cartodbpositron')

Spatial Filtering#

One of the most common tasks in spatial analysis is selecting features that meet a specific spatial condition.
For example, you might want to select all points that fall within a given polygon — this is known as spatial filtering, or location-based filtering.

It allows you to narrow down your data based on spatial relationships, such as proximity, containment, or intersection.

Let’s try it in practice:
We’ll filter all cafés in Saint Petersburg that are located within 500 meters of a metro station.

Подготовим данные

#Прочитаем данные о станциях метро
metro = gpd.read_file('data/spb_metro.geojson')

Построим буферы 500-метровой доступности вокруг станций (предварительно проверим систему координат и перепроецируем, если нужно)

metro.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Данные в географической системе координат, нам это не подходит, поэтому перепроецируем их перед тем как строить буферы

utm_crs = metro.estimate_utm_crs()   # автоматически определяем UTM-зону
metro_utm = metro.to_crs(utm_crs)  

Построим буферы

metro_buff_500 = metro_utm.buffer(500)
metro_buff_500.explore(tiles='CartoDB positron')
Make this Notebook Trusted to load map: File -> Trust Notebook

Выгрузим информацию о кафе из OSM, отфильтруем только точки и перепроецируем

tags = {"amenity": "cafe"}
cafes = ox.features_from_place("Санкт-Петербург, Россия", tags)

#фильтрация точек
cafes = cafes[cafes.geometry.type == 'Point']

#перепроецирование
cafes_utm = cafes.to_crs(utm_crs) 


cafes_utm.explore(tiles='CartoDB positron')
Make this Notebook Trusted to load map: File -> Trust Notebook

Теперь отфильтруем только те кафе, которые строго находятся внутри 500-метровых буферов вокруг станций метро

# Создаем один общий буфер (объединение всех зон)
combined_buffer = metro_buff_500.unary_union

# Фильтруем кафе внутри общего буфера
cafes_within = cafes_utm[cafes_utm.geometry.within(combined_buffer)]
/var/folders/ry/9bb7wrz54vq_kn2ytlj6ynzm0000gn/T/ipykernel_43255/1029581691.py:2: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  combined_buffer = metro_buff_500.unary_union
cafes_within.explore(tiles='CartoDB positron')
Make this Notebook Trusted to load map: File -> Trust Notebook

Мы рассмотрели ещё один важный приём пространственного анализа – фильтрацию объектов по принадлежности области. Он опирается на предикаты пространственных отношений (within, contains, и др.) и позволяет извлекать информацию об объектах в заданных границах. Это очень мощный инструмент, который используется в самых разных задачах ГИС, от простых запросов («найти все объекты внутри…») до сложных географических вычислений.

2. Overlay (Intersection)#

An overlay operation allows you to combine multiple spatial layers and analyze how their geometries interact.
One of the most commonly used types of overlay is intersection.

The intersection operation returns the shared area where geometries from two different layers overlap.
This is particularly useful when you want to identify or analyze zones of overlap — for example, finding where flood zones intersect with residential areas, or where parks intersect with administrative boundaries.

Overlay operations help answer questions like:

  • Which features fall within a certain area?

  • What regions are affected by overlapping conditions?

Just like with buffering, make sure the coordinate systems of both layers are compatible (ideally in the same projected CRS) before performing overlay operations.

Пересечение буферов и зданий, для выделения тех, которые попадают в зону доступности школ

building_schools_intersection = gpd.overlay(building_utm, schools_buffer, how='intersection')

building_schools_intersection.explore(tiles='cartodbpositron')

Смотрим на таблицу атрибутов получившегося результата

building_schools_intersection.head()

Важно! Тут есть дабликаты, так кака одно здание может попадать в несколько зон доступности

Посчитаем количество объектов и количество уникальных объектов:

total_count = len(building_schools_intersection)
print(total_count)

unique_count = building_schools_intersection['osmid_1'].nunique()
print(unique_count)
# Проверка на дубликаты по полю 'building_id'
duplicates = building_schools_intersection[building_schools_intersection.duplicated(subset='osmid_1', keep=False)]

# Подсчёт количества дубликатов
duplicates_count = duplicates['osmid_1'].nunique()

# Подсчёт количества дубликатов для каждого ID
duplicate_counts_byID = duplicates['osmid_1'].value_counts()

# Распределение: сколько зданий имеет конкретное количество повторов
duplicates_distribution = duplicate_counts_byID.value_counts().sort_index()

# Построение гистограммы
plt.figure(figsize=(10, 6))
plt.bar(duplicates_distribution.index, duplicates_distribution.values, color='skyblue', edgecolor='black')
plt.title('Распределение зданий по количеству дубликатов', fontsize=14)
plt.xlabel('Количество повторов для одного здания', fontsize=12)
plt.ylabel('Количество зданий', fontsize=12)
plt.xticks(duplicates_distribution.index)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

Для каждого дома агрегируем данные и записываем, какие именно школы попадают в зону 500-метровой доступности

# Группируем по идентификатору дома и создаем список школ для каждого дома
building_schools_aggregated = building_schools_intersection.groupby('osmid_1').agg({
    'name_2': lambda x: list(x),  # Создаем список школ, с которыми пересекается дом
    'geometry': 'first'  # Берем первую геометрию для дома (будет одинаковой для всех дубликатов)
}).reset_index()

# Преобразуем обратно в GeoDataFrame
building_schools_aggregated = gpd.GeoDataFrame(building_schools_aggregated, geometry='geometry', crs=target_crs)

# Проверяем результат
building_schools_aggregated.head()
building_schools_aggregated.explore(tiles='cartodbpositron')

Summary#