Table Detection Pipeline#
This document describes the internal detection pipeline used by
SheetTableDetector to find tables in technical drawings. It is intended
for developers working on or extending the detection logic.
Note
For user-facing documentation on working with detected tables, see Table Navigation.
Pipeline Overview#
For each view in the sheet, detect_view_tables finds tables through three
independent branches:
- tables_from_tables: collects existing Table annotation objects (including those nested in CompositeEntity annotations).
- tables_from_notes: converts Symbol entities (TypeNote with a rectangular frame) into single-cell DetectedTable instances.
- tables_from_geometries: the full edge-based detection pipeline, itself composed of three phases:
  - Phase 1, Candidate Preparation (CandidatePreparationConfig): extract H/V line segments from CompositeEntity objects and apply connectivity density filtering. For the background view, direct geometries are also included (after excluding reference frame lines).
  - Phase 2, Table Generation (TableGenerationConfig): build connected components from filtered edges, detect cells in each structure, and produce raw DetectedTable objects.
  - Phase 3, Table Validation (TableSelectionConfig): validate each raw table (cell existence, sheet coverage ratio, content ratio) to produce the final validated geometry tables.
The results of all three branches are merged and returned. Finally, the title block is identified among validated background-view tables as the one closest to the bottom-right corner.
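The merge step can be pictured with a minimal sketch. The function name `detect_view_tables_sketch` and the idea of passing branches as plain callables are illustrative assumptions, not the actual signature of detect_view_tables.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in: each branch is a callable that takes a view
# and returns a list of detected tables.
Branch = Callable[[object], List[object]]


def detect_view_tables_sketch(view: object, branches: Iterable[Branch]) -> List[object]:
    """Run every branch independently on the view and concatenate the results."""
    merged: List[object] = []
    for branch in branches:
        # No branch sees the output of another one.
        merged.extend(branch(view))
    return merged
```

Because the branches do not depend on each other, a new source of tables could be added without touching the existing ones.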
All visualizations below are generated from the GICLEUR drawing (sheet 0)
using TableDetectionVisualizer.
Starting point: the raw drawing sheet with all views, geometries, and annotations.
tables_from_tables — Original table annotations#
Existing Table annotation objects are collected from the view using
ViewCollector.type.by_types(Table, include_nested=True), which also captures
Table objects nested inside CompositeEntity annotations. These are pre-structured
tables from the drawing file format and bypass the detection pipeline entirely.
Each colored rectangle represents an original Table annotation from the drawing file.
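As a rough illustration of what collecting with include_nested=True implies, the sketch below walks nested groups recursively. `TableSketch`, `CompositeSketch`, and `collect_tables` are hypothetical stand-ins; the real ViewCollector may work differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableSketch:
    """Hypothetical stand-in for a Table annotation."""
    name: str


@dataclass
class CompositeSketch:
    """Hypothetical stand-in for a CompositeEntity holding nested entities."""
    entities: List[object] = field(default_factory=list)


def collect_tables(annotations: List[object], include_nested: bool = True) -> List[TableSketch]:
    """Collect tables at the top level and, optionally, inside nested groups."""
    found: List[TableSketch] = []
    for annotation in annotations:
        if isinstance(annotation, TableSketch):
            found.append(annotation)
        elif include_nested and isinstance(annotation, CompositeSketch):
            # Recurse so that tables nested at any depth are captured.
            found.extend(collect_tables(annotation.entities, include_nested=True))
    return found
```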
tables_from_notes — Note frame tables#
Symbol entities of sub-type TypeNote with a rectangular frame
(frame.entity_type == "RECTANGLE") are collected across all views using
ViewCollector.type.by_types(Symbol, include_nested=True). Each matching symbol
is converted into a single-cell DetectedTable via _table_from_framed_entity().
These note frame tables bypass the entire edge-based pipeline.
Each colored rectangle represents a note frame table detected from a Symbol entity.
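The selection rule can be sketched as follows. Only the `frame.entity_type == "RECTANGLE"` check comes from the description above; the classes, the `sub_type`, `text`, and `bbox` attributes, and the output structure are assumptions made for illustration, not the behavior of _table_from_framed_entity().

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class FrameSketch:
    entity_type: str
    bbox: BBox


@dataclass
class SymbolSketch:
    sub_type: str  # hypothetical attribute name
    text: str      # hypothetical attribute name
    frame: Optional[FrameSketch] = None


@dataclass
class SingleCellTable:
    bbox: BBox
    cells: List[Tuple[BBox, str]]


def tables_from_notes_sketch(symbols: List[SymbolSketch]) -> List[SingleCellTable]:
    """Turn every rectangular-framed note into a table with exactly one cell."""
    tables: List[SingleCellTable] = []
    for symbol in symbols:
        if symbol.sub_type != "TypeNote":
            continue
        if symbol.frame is None or symbol.frame.entity_type != "RECTANGLE":
            continue
        # The frame is both the table boundary and the boundary of its only cell.
        box = symbol.frame.bbox
        tables.append(SingleCellTable(bbox=box, cells=[(box, symbol.text)]))
    return tables
```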
tables_from_geometries — Edge-based detection#
This branch extracts candidate edge sets from the drawing and filters them to keep only
lines likely to belong to table grids, then converts them into DetectedTable objects.
tables_from_geometries.content_geometries#
A drawing view may contain table lines in two places:
- Direct geometries (background view only): the view’s own geometries list, after excluding reference frame lines identified by GridReferenceExtractor.
- CompositeEntity objects: nested annotation groups that carry their own geometries and entities. Each CompositeEntity becomes a separate candidate set.
View
├── geometries → (edges, regular_annotations, "background_geometries")
└── annotations
    ├── CompositeEntity₀ → (comp.geometries, comp.entities, "composite_entity_0")
    ├── CompositeEntity₁ → (comp.geometries, comp.entities, "composite_entity_1")
    └── ...
For background views, GridReferenceExtractor.classify_geometries() identifies
the reference frame grid. These geometry ids are excluded so that the large grid borders
are never mistaken for table lines.
All content geometries from the background view (after reference frame exclusion).
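The construction of candidate sets can be sketched as below. The classes, the `is_background` flag, and the reading of regular_annotations as "annotations that are not composite entities" are assumptions; the set of reference frame ids stands in for whatever GridReferenceExtractor.classify_geometries() returns.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class GeometrySketch:
    id: int


@dataclass
class CompositeSketch:
    geometries: List[GeometrySketch]
    entities: List[str]


@dataclass
class ViewSketch:
    geometries: List[GeometrySketch]
    annotations: List[object]
    is_background: bool = False  # hypothetical flag


def candidate_sets(view: ViewSketch, reference_frame_ids: Set[int]) -> List[Tuple[list, list, str]]:
    """Build (edges, annotations, label) triples, one per candidate set."""
    sets: List[Tuple[list, list, str]] = []
    if view.is_background:
        # Drop reference frame lines so grid borders are not treated as table lines.
        edges = [g for g in view.geometries if g.id not in reference_frame_ids]
        regular = [a for a in view.annotations if not isinstance(a, CompositeSketch)]
        sets.append((edges, regular, "background_geometries"))
    composites = [a for a in view.annotations if isinstance(a, CompositeSketch)]
    for index, comp in enumerate(composites):
        sets.append((comp.geometries, comp.entities, f"composite_entity_{index}"))
    return sets
```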
tables_from_geometries.hv_lines#
For each candidate set, only horizontal and vertical LineSegment2D edges are kept.
Diagonal lines, arcs, and other curve types are discarded since tables are composed
exclusively of axis-aligned lines.
line_segments = (
    Collector(entities=content_geometries)
    .get_line_segments(direction="both", tolerance=line_segment_tolerance)
    .entities
)
Blue lines are the H/V segments kept; gray lines are the original content geometries.
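One plausible reading of the tolerance is sketched below: a segment counts as horizontal when its endpoints differ in y by no more than the tolerance, and likewise for vertical. This is an assumption for illustration; how Collector.get_line_segments applies its tolerance is not specified here, and `classify_segment` and `keep_hv` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def classify_segment(segment: Segment, tolerance: float = 0.5) -> Optional[str]:
    """Return "H", "V", or None for diagonal and degenerate segments."""
    x1, y1, x2, y2 = segment
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    if dy <= tolerance and dx > tolerance:
        return "H"
    if dx <= tolerance and dy > tolerance:
        return "V"
    # Diagonal, or shorter than the tolerance in both directions.
    return None


def keep_hv(segments: List[Segment], tolerance: float = 0.5) -> List[Segment]:
    """Keep only the axis-aligned segments."""
    return [s for s in segments if classify_segment(s, tolerance) is not None]
```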
tables_from_geometries.connectivity_filter#
The ConnectivityDensityFilter removes scattered lines that don’t belong to dense
table grids. It uses an adaptive neighborhood analysis.
Adaptive thresholds based on drawing dimensions:
# For a landscape drawing (width > height):
x_threshold = drawing_width * threshold_ratio * 1.2 # e.g. 1000 * 0.2 * 1.2 = 240
y_threshold = drawing_height * threshold_ratio # e.g. 700 * 0.2 = 140
The default threshold_ratio is 0.2 (20% of drawing dimensions).
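The threshold computation can be wrapped in a small function. The landscape case follows the snippet above; the portrait case (widening the y threshold instead) and the parameter name `long_axis_factor` are assumptions, since only the landscape formula is given.

```python
from typing import Tuple


def adaptive_thresholds(
    drawing_width: float,
    drawing_height: float,
    threshold_ratio: float = 0.2,
    long_axis_factor: float = 1.2,  # hypothetical name for the 1.2 multiplier
) -> Tuple[float, float]:
    """Return (x_threshold, y_threshold) scaled to the drawing dimensions."""
    x_threshold = drawing_width * threshold_ratio
    y_threshold = drawing_height * threshold_ratio
    if drawing_width > drawing_height:
        # Landscape: widen the search along x, as in the documented example.
        x_threshold *= long_axis_factor
    elif drawing_height > drawing_width:
        # Portrait: assumed to mirror the landscape rule.
        y_threshold *= long_axis_factor
    return x_threshold, y_threshold
```

For a 1000 by 700 drawing this reproduces the thresholds of roughly 240 and 140 from the example.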
Neighborhood analysis: for each line, count how many other lines fall within its elliptical neighborhood:
normalized_distance = ((x1 - x2) / x_threshold) ** 2 + ((y1 - y2) / y_threshold) ** 2
are_neighbors = normalized_distance <= 1.0
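A brute-force version of the neighbor count might look like the sketch below. It assumes each line is represented by a single reference point such as its midpoint; which point ConnectivityDensityFilter actually uses is not stated here.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def are_neighbors(p: Point, q: Point, x_threshold: float, y_threshold: float) -> bool:
    """True if q lies inside the ellipse centered on p with the given semi-axes."""
    normalized_distance = ((p[0] - q[0]) / x_threshold) ** 2 + ((p[1] - q[1]) / y_threshold) ** 2
    return normalized_distance <= 1.0


def neighbor_counts(points: List[Point], x_threshold: float, y_threshold: float) -> List[int]:
    """For each point, count the other points inside its elliptical neighborhood."""
    counts: List[int] = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and are_neighbors(p, q, x_threshold, y_threshold)
        )
        counts.append(count)
    return counts
```

The pairwise loop is quadratic in the number of lines, which is acceptable for a sketch; a spatial index would be the usual optimization for large drawings.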
Statistical threshold: compute the minimum neighbor count from the distribution:
min_neighbors = max(1, int(avg_neighbors - std_neighbors))
Lines with fewer neighbors than this threshold are excluded.
Note
If filtering produces fewer than min_edges_for_table (4) lines, the filter falls
back to returning all original lines unfiltered.
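The statistical threshold and the fallback can be combined in a short sketch. Using the population standard deviation and keeping lines whose count equals the threshold are assumptions; the real filter may use a different estimator.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def filter_by_density(
    lines: Sequence[object],
    counts: Sequence[int],
    min_edges_for_table: int = 4,
) -> List[object]:
    """Keep lines with enough neighbors; fall back to all lines if too few remain."""
    if not lines:
        return []
    avg_neighbors = mean(counts)
    std_neighbors = pstdev(counts)  # assumption: population standard deviation
    min_neighbors = max(1, int(avg_neighbors - std_neighbors))
    kept = [line for line, count in zip(lines, counts) if count >= min_neighbors]
    if len(kept) < min_edges_for_table:
        # Too few lines survive to form a table: return the unfiltered input.
        return list(lines)
    return kept
```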
Colored lines are kept (dense clusters); lighter lines are excluded (isolated).
The connectivity debug visualization below provides a detailed view of the filtering process for a single candidate set. It shows:
- Each line labeled with its index and neighbor count (e.g. L5 (12n) means line 5 has 12 neighbors).
- Kept lines (green) vs rejected lines (red/pink) based on the statistical threshold.
- Elliptical neighborhoods drawn around representative lines, showing the adaptive search area used to count neighbors.
- Connection lines (thin gray) between neighboring kept lines, illustrating the connectivity graph.
- A statistics panel showing the computed thresholds (X/Y threshold, average neighbors, standard deviation, minimum required neighbors) and the filtering summary (total/kept/rejected counts).
When there are multiple candidate sets, one debug visualization is generated per set.
Connectivity debug for one candidate set: lines are labeled with their neighbor count, elliptical neighborhoods show the search area, and the statistics panel summarizes the filtering parameters and results.
tables_from_geometries.connected_components#
Each Edge with a LineSegment2D primitive is converted to a TableLine.
Perpendicular and collinear connections are established, then a depth-first traversal
groups structural lines into connected components. Each component is filtered by minimum
line count, bounding box size, and structural validity (at least 1 H and 1 V line).
Valid components become TableStructure objects.
Each color represents a distinct connected component (potential table).
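The grouping step can be sketched on plain coordinate tuples. The connection tests, the parameter names, and the requirement that both bounding box dimensions reach the minimum size are illustrative assumptions; the real pipeline works on TableLine and TableStructure objects.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2)


def orientation(s: Segment) -> str:
    return "H" if abs(s[3] - s[1]) <= abs(s[2] - s[0]) else "V"


def connected(a: Segment, b: Segment, tol: float = 0.5) -> bool:
    """Perpendicular lines that cross or touch, or collinear lines that overlap."""
    oa, ob = orientation(a), orientation(b)
    if oa != ob:
        h, v = (a, b) if oa == "H" else (b, a)
        hx0, hx1 = sorted((h[0], h[2]))
        vy0, vy1 = sorted((v[1], v[3]))
        return hx0 - tol <= v[0] <= hx1 + tol and vy0 - tol <= h[1] <= vy1 + tol
    fixed, lo, hi = (1, 0, 2) if oa == "H" else (0, 1, 3)
    if abs(a[fixed] - b[fixed]) > tol:
        return False
    a0, a1 = sorted((a[lo], a[hi]))
    b0, b1 = sorted((b[lo], b[hi]))
    return a0 <= b1 + tol and b0 <= a1 + tol


def components(segments: List[Segment], tol: float = 0.5,
               min_lines: int = 4, min_size: float = 10.0) -> List[List[int]]:
    """Depth-first grouping of segment indices, followed by the three filters."""
    n = len(segments)
    adjacency = [[j for j in range(n) if j != i and connected(segments[i], segments[j], tol)]
                 for i in range(n)]
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in adjacency[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        xs = [c for i in comp for c in (segments[i][0], segments[i][2])]
        ys = [c for i in comp for c in (segments[i][1], segments[i][3])]
        big_enough = max(xs) - min(xs) >= min_size and max(ys) - min(ys) >= min_size
        has_both = {orientation(segments[i]) for i in comp} == {"H", "V"}
        if len(comp) >= min_lines and big_enough and has_both:
            result.append(sorted(comp))
    return result
```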
tables_from_geometries.raw_generated_tables#
Each TableStructure detects its cells by analyzing the grid formed by its H/V lines.
Cells smaller than min_cell_size are discarded. The structure handles merged cells
(where internal borders are missing) and populates cell content from annotations that
fall within cell boundaries.
If at least one cell is found, a DetectedTable is created. This step shows all
raw generated tables before any validation filtering.
All raw generated tables from the edge-based pipeline, before validation.
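A deliberately simplified sketch of cell detection is shown below. It assumes a complete grid, where every pair of adjacent line coordinates bounds a cell, and therefore does not reproduce the merged-cell handling described above. The helper names and the point-based annotation format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def grid_cells(v_line_xs: Sequence[float], h_line_ys: Sequence[float],
               min_cell_size: float = 2.0) -> List[Cell]:
    """Build cells between adjacent vertical and horizontal line coordinates."""
    xs, ys = sorted(set(v_line_xs)), sorted(set(h_line_ys))
    cells: List[Cell] = []
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            # Discard slivers, e.g. the gap between two lines of a double border.
            if x1 - x0 >= min_cell_size and y1 - y0 >= min_cell_size:
                cells.append((x0, y0, x1, y1))
    return cells


def fill_cells(cells: List[Cell],
               annotations: Sequence[Tuple[float, float, str]]) -> Dict[Cell, List[str]]:
    """Assign each (x, y, text) annotation to the first cell that contains it."""
    content: Dict[Cell, List[str]] = {cell: [] for cell in cells}
    for x, y, text in annotations:
        for cell in cells:
            x0, y0, x1, y1 = cell
            if x0 <= x <= x1 and y0 <= y <= y1:
                content[cell].append(text)
                break
    return content
```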
tables_from_geometries.valid_generated_tables#
Each raw generated table is validated against three criteria: cell existence, sheet coverage ratio (max 80%), and content ratio (min 30%). Tables that fail any check are discarded. This step shows only the tables that passed validation.
Only validated tables from the edge-based pipeline (after filtering).
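The three checks can be sketched as a single predicate. `RawTableSketch`, its fields, and the parameter names are hypothetical; the stricter content threshold for small tables mentioned in the Phase 3 configuration is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawTableSketch:
    area: float            # bounding box area of the table
    cell_texts: List[str]  # one entry per cell, "" for an empty cell


def is_valid(table: RawTableSketch, sheet_area: float,
             max_sheet_coverage: float = 0.8,
             min_content_ratio: float = 0.3) -> bool:
    """Apply cell existence, sheet coverage, and content ratio checks in order."""
    if not table.cell_texts:
        return False  # cell existence
    if table.area / sheet_area > max_sheet_coverage:
        return False  # covers too much of the sheet, likely a frame
    non_empty = sum(1 for text in table.cell_texts if text.strip())
    return non_empty / len(table.cell_texts) >= min_content_ratio
```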
all_detected_tables#
Summary of all tables from all three branches (tables_from_tables,
tables_from_notes, tables_from_geometries), overlaid on the full sheet with
distinct colors per table and labels showing the source of each table.
All detected tables from every source, with colored bounding boxes and source labels.
title_block#
The title block is selected among validated background-view tables as the one closest to the bottom-right corner of the drawing (measured by Euclidean distance between bottom-right corners).
The selected title block is highlighted; the red dot marks the drawing’s bottom-right corner.
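The selection rule can be sketched on bounding boxes. The sketch assumes a coordinate system with y pointing up, so the bottom-right corner is (x_max, y_min); with y pointing down the corner would use y_max instead. `select_title_block` is a hypothetical helper name.

```python
import math
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_title_block(table_bboxes: List[BBox], drawing_bbox: BBox) -> Optional[BBox]:
    """Pick the table whose bottom-right corner is nearest the drawing's."""
    if not table_bboxes:
        return None
    # Assumption: y grows upward, so "bottom" means y_min.
    corner_x, corner_y = drawing_bbox[2], drawing_bbox[1]

    def distance(bbox: BBox) -> float:
        return math.hypot(bbox[2] - corner_x, bbox[1] - corner_y)

    return min(table_bboxes, key=distance)
```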
Configuration & Thresholds#
The tables_from_geometries pipeline has three phases, each controlled by its own
configuration dataclass. Every parameter has a default value, listed in the tables below.
Phase 1 — CandidatePreparationConfig#
| Parameter | Default | Purpose |
|---|---|---|
| | 0.5 | Tolerance for H/V line segment extraction |
| | (default) | Nested |
Phase 2 — TableGenerationConfig#
| Parameter | Default | Purpose |
|---|---|---|
| | 0.5 | General tolerance for line processing and collinear connections |
| | 0.5 | Tolerance for perpendicular intersection detection |
| | 10.0 | Minimum bounding box dimension for a valid table structure |
| | 2.0 | Minimum cell dimensions |
| | 4 | Minimum edges to attempt table detection |
| | 4 | Minimum lines per connected component |
Phase 3 — TableSelectionConfig#
| Parameter | Default | Purpose |
|---|---|---|
| | 0.8 | Maximum table/sheet area ratio before rejection |
| | 0.3 | Minimum fraction of non-empty cells for a valid table |
| | 2 | Minimum cells; smaller tables use stricter content threshold |
| | True | Enable cell existence check |
| | True | Enable sheet coverage check |
| | True | Enable content ratio check |