Basic Geoprocessing Tools

Geoprocessing refers to a set of tools used to analyze and modify geospatial data. These tools allow you to:

Perform spatial operations
Analyze spatial relationships between features
Create new datasets based on existing ones

In this section, we’ll explore several commonly used geoprocessing tools:

Buffering – creating zones around features at a specified distance
Clip – trimming one dataset using the shape of another
Dissolve – merging geometries based on a shared attribute

Import libraries

import pandas as pd
import geopandas as gpd
import osmnx as ox
import folium

pandas (pandas) – a Python library for data analysis and manipulation. Its core data structure, the DataFrame, is designed for tabular (non-spatial) data such as CSV files, spreadsheets, or database tables.
GeoPandas (geopandas) – an extension of pandas for geospatial data. It builds on the DataFrame structure and adds geometry columns, spatial operations, and reading/writing of spatial file formats such as Shapefile, GeoJSON, and GeoPackage.
OSMnx (osmnx) – a library for downloading OpenStreetMap data; here it is used to fetch features (such as schools) as GeoDataFrames.
folium (folium) – a library for building interactive Leaflet maps from Python; it is used for the map at the end of this section.

1. Buffering

Buffering creates an area around geometric objects at a specified distance.
It’s useful for identifying zones of influence — for example, areas around bus stops, roads, or buildings.

A buffer zone is simply the area surrounding a geometry (point, line, or polygon) within a given distance. Buffers are commonly used to highlight nearby surroundings or define areas of impact.
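
As a minimal sketch of what a buffer looks like geometrically, the snippet below buffers a point and a line with Shapely, the geometry library that GeoPandas builds on. The coordinates are made up for illustration and are assumed to be in meters; they do not come from any dataset used in this section.

```python
import math
from shapely.geometry import LineString, Point

# Hypothetical coordinates in a projected CRS (units: meters)
stop = Point(0, 0)
road = LineString([(0, 0), (1000, 0)])

stop_buffer = stop.buffer(500)  # polygon approximating a circle of radius 500 m
road_buffer = road.buffer(500)  # 1000 m x 1000 m strip plus two rounded ends

circle_area = math.pi * 500 ** 2
print(f"Point buffer area: {stop_buffer.area:,.0f} m2 (exact circle: {circle_area:,.0f} m2)")
print(f"Line buffer area:  {road_buffer.area:,.0f} m2")

# A location about 424 m from the stop: is it inside the 500 m buffer?
print(stop_buffer.contains(Point(300, 300)))
```

The point buffer is a polygon that approximates a circle, so its area is slightly smaller than the exact value of pi times the radius squared.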

Important note about coordinate systems:
If your data is in a geographic CRS (latitude/longitude), the buffer distance is interpreted in degrees, not meters.
To create realistic distance-based buffers (e.g., 500 meters), you should first reproject your data into a projected CRS (like UTM), where the units are in meters.
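
To see why degrees are a poor unit for distances, here is a rough sketch that estimates the ground length of one degree at the latitude of Saint Petersburg. It assumes a spherical Earth with a mean radius of 6,371 km, and the helper name `meters_per_degree` is hypothetical, so treat the numbers as approximations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)

def meters_per_degree(lat_deg):
    """Approximate ground length of one degree of latitude and of longitude at a latitude."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

lat_m, lon_m = meters_per_degree(59.93)  # roughly the latitude of Saint Petersburg
print(f"1 degree of latitude  ~ {lat_m / 1000:.1f} km")
print(f"1 degree of longitude ~ {lon_m / 1000:.1f} km")
```

At this latitude a degree of longitude covers only about half the ground distance of a degree of latitude, so a buffer that is circular in degrees would be stretched north-south on the ground.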

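Let’s create a 500-meter buffer around schools:

# OSM tag filter: features mapped as schools
tags = {'amenity': 'school'}

# Download schools in the Central District of Saint Petersburg from OpenStreetMap
schools = ox.features_from_place('Центральный район, Санкт-Петербург', tags)

# Reproject to the local UTM zone so that distances are in meters
schools_utm = schools.to_crs(schools.estimate_utm_crs())

# Buffer every school by 500 m and store the result in a separate column
schools_utm['buffer_geometry'] = schools_utm.buffer(500)

# GeoDataFrame that keeps all attributes but uses the buffers as the active geometry
schools_buffer = gpd.GeoDataFrame(schools_utm, geometry='buffer_geometry', crs=schools_utm.crs)

schools_buffer.explore(tiles='cartodbpositron')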

2. Clip

The clip operation is used to extract portions of geometries that fall within a specified boundary.
It’s similar to using a cookie cutter — you “cut out” the parts of one layer that lie within the shape of another.

For example, you might clip a layer of roads or land cover to the boundary of a specific region or administrative unit. This helps limit your data to just the area of interest, making maps and analysis more focused and efficient.

Clipping is especially useful when working with large datasets and you only need to analyze a specific geographic subset.

⚠️ Just like with overlay operations, both layers should be in the same coordinate reference system before performing a clip.
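
The cookie-cutter idea can be sketched on a tiny synthetic example. The two "roads" and the square mask below are invented for illustration, and EPSG:32636 is used only as an arbitrary CRS with units in meters.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon

# Two hypothetical roads; coordinates are in meters
roads = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[
        LineString([(0, 50), (200, 50)]),    # crosses the mask and continues past it
        LineString([(300, 0), (300, 100)]),  # lies entirely outside the mask
    ],
    crs=32636,
)

# A 100 m x 100 m square acting as the clip boundary
mask = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])],
    crs=32636,
)

# Both layers must share a CRS before clipping
assert roads.crs == mask.crs

clipped = roads.clip(mask)
print(clipped.assign(length_m=clipped.length)[["name", "length_m"]])
```

Road A is cut at the edge of the square and keeps its attributes, while road B falls outside the mask and is dropped from the result.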

# Create a point: Nizami Square (Сквер Низами)
point_df = pd.DataFrame({'name': ['Сквер Низами'], 'lat': [59.963768], 'lon': [30.314455]})
point_gdf = gpd.GeoDataFrame(point_df, geometry=gpd.points_from_xy(point_df['lon'], point_df['lat']), crs=4326)

# Assumption: the target CRS is the UTM zone estimated from the point
target_crs = point_gdf.estimate_utm_crs()
point_gdf_utm = point_gdf.to_crs(target_crs)

# Assumption: buildings are taken from OpenStreetMap within 1 km of the point
building_utm = ox.features_from_point((59.963768, 30.314455), tags={'building': True}, dist=1000).to_crs(target_crs)

# Clip the buildings to an 800-meter buffer around the point
buffer_800_point = point_gdf_utm.buffer(800)
clipped_buildings = building_utm.clip(buffer_800_point)

clipped_buildings.explore(tiles='cartodbpositron')

3. Dissolve

The dissolve operation merges geometries based on shared values in a specified attribute column.
It’s commonly used to combine adjacent features that belong to the same category — for example, merging individual districts into a single administrative region.

When you dissolve features, all geometries with the same value in the chosen column are grouped and combined into one. This reduces the number of features and can simplify your dataset for higher-level analysis or visualization.

Typical use cases include:

Combining neighborhoods into boroughs
Merging land parcels by owner
Aggregating zones by land use type

You can also aggregate attribute data (e.g., sum or mean) while dissolving, making it a powerful tool for both geometry and attribute-level simplification.
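
Before turning to real boundaries, here is a small sketch of dissolving with attribute aggregation. The three parcels, the `district` and `pop` columns, and the CRS are all invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Three hypothetical parcels: two adjacent ones in "north", one in "south"
parcels = gpd.GeoDataFrame(
    {"district": ["north", "north", "south"], "pop": [100, 250, 400]},
    geometry=[box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 2, 1)],
    crs=32636,
)

# Merge geometries that share a district and sum the population per district
districts = parcels.dissolve(by="district", aggfunc={"pop": "sum"})
print(districts.assign(area=districts.area)[["pop", "area"]])
```

The two "north" parcels become a single polygon, and the grouping column (`district`) becomes the index of the result.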

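# Load municipal boundaries from the GeoPackage layer
muni = gpd.read_file('data/admin_borders.gpkg', layer='muni_borders')

muni.head()

# Merge municipalities into regions; other columns keep the first value in each group (the default)
regions = muni.dissolve(by='region')
regions.head()

# Merge municipalities into federal okrugs ('fo') and sum the population
fo = muni.dissolve(by='fo', aggfunc={'pop': 'sum'})
fo.head()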

# Determine the map center from the dissolved federal okrugs, in latitude/longitude
map_center = fo.to_crs(4326).geometry.unary_union.centroid

# Create a Folium map centered on the area (folium expects [lat, lon])
m = folium.Map(location=[map_center.y, map_center.x], zoom_start=10, tiles='cartodbpositron')

# # Add `muni` layer
# folium.GeoJson(
#     muni,
#     name="Municipalities",
#     style_function=lambda x: {'color': 'blue', 'weight': 1, 'fillColor': 'blue', 'fillOpacity': 0.2}
# ).add_to(m)

# Add `region` layer
folium.GeoJson(
    regions,
    name="Regions",
).add_to(m)

# Add `fo` layer
folium.GeoJson(
    fo,
    name="Federal Okrugs",
    style_function=lambda x: {
        'color': 'gray',       
        'weight': 3,             
        'fillOpacity': 0      
    }
).add_to(m)

# Add a layer control for toggling layers
folium.LayerControl().add_to(m)


# View the map
m