Table Detection Pipeline

This document describes the internal detection pipeline used by SheetTableDetector to find tables in technical drawings. It is intended for developers working on or extending the detection logic.

Note

For user-facing documentation on working with detected tables, see Table Navigation.

Pipeline Overview

For each view in the sheet, detect_view_tables finds tables through three independent branches:

  1. tables_from_tables — collects existing Table annotation objects (including those nested in CompositeEntity annotations).

  2. tables_from_notes — converts Symbol entities (TypeNote with a rectangular frame) into single-cell DetectedTable instances.

  3. tables_from_geometries — the full edge-based detection pipeline, itself composed of three phases:

    • Phase 1 — Candidate Preparation (CandidatePreparationConfig): extract H/V line segments from CompositeEntity objects and apply connectivity density filtering. For the background view, direct geometries are also included (after excluding reference frame lines).

    • Phase 2 — Table Generation (TableGenerationConfig): build connected components from filtered edges, detect cells in each structure, produce raw DetectedTable objects.

    • Phase 3 — Table Validation (TableSelectionConfig): validate each raw table (cell existence, sheet coverage ratio, content ratio) to produce the final validated geometry tables.

The results of all three branches are merged and returned. Finally, the title block is identified among validated background-view tables as the one closest to the bottom-right corner.

All visualizations below are generated from the GICLEUR drawing (sheet 0) using TableDetectionVisualizer.

Starting point: the raw drawing sheet with all views, geometries, and annotations.

tables_from_tables: Original table annotations

Existing Table annotation objects are collected from the view using ViewCollector.type.by_types(Table, include_nested=True), which also captures Table objects nested inside CompositeEntity annotations. These are pre-structured tables from the drawing file format and bypass the detection pipeline entirely.

Each colored rectangle represents an original Table annotation from the drawing file.

tables_from_notes: Note frame tables

Symbol entities of sub-type TypeNote with a rectangular frame (frame.entity_type == "RECTANGLE") are collected across all views using ViewCollector.type.by_types(Symbol, include_nested=True). Each matching symbol is converted into a single-cell DetectedTable via _table_from_framed_entity(). These note frame tables bypass the entire edge-based pipeline.

Each colored rectangle represents a note frame table detected from a Symbol entity.
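The selection rule above can be sketched as follows. This is a minimal sketch, and the classes Frame, NoteSymbol and SingleCellTable are hypothetical stand-ins for the real Symbol and DetectedTable types, whose constructors are not shown in this document. Only the two conditions come from the description above: sub-type TypeNote, and frame.entity_type == "RECTANGLE". The real conversion is done by _table_from_framed_entity().

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Frame:
    entity_type: str  # e.g. "RECTANGLE", "CIRCLE"
    bbox: BBox

@dataclass
class NoteSymbol:
    sub_type: str  # hypothetical stand-in for the Symbol sub-type
    frame: Optional[Frame]
    text: str

@dataclass
class SingleCellTable:
    bbox: BBox
    cells: List[List[str]]  # exactly one row containing one cell

def tables_from_note_symbols(symbols):
    """Turn each rectangle-framed TypeNote symbol into a one-cell table."""
    tables = []
    for sym in symbols:
        if sym.sub_type != "TypeNote":
            continue  # not a note
        if sym.frame is None or sym.frame.entity_type != "RECTANGLE":
            continue  # unframed, or framed by another shape
        tables.append(SingleCellTable(bbox=sym.frame.bbox, cells=[[sym.text]]))
    return tables

notes = [
    NoteSymbol("TypeNote", Frame("RECTANGLE", (0, 0, 40, 10)), "SEE NOTE 3"),
    NoteSymbol("TypeNote", Frame("CIRCLE", (50, 0, 60, 10)), "A"),
    NoteSymbol("TypeNote", None, "UNFRAMED"),
]
framed = tables_from_note_symbols(notes)  # only the rectangle-framed note
```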

tables_from_geometries: Edge-based detection

This branch extracts candidate edge sets from the drawing and filters them to keep only lines likely to belong to table grids, then converts them into DetectedTable objects.

tables_from_geometries.content_geometries

A drawing view may contain table lines in two places:

  • Direct geometries (background view only): the view’s own geometries list, after excluding reference frame lines identified by GridReferenceExtractor.

  • CompositeEntity objects: nested annotation groups that carry their own geometries and entities. Each CompositeEntity becomes a separate candidate set.

View
├── geometries → (edges, regular_annotations, "background_geometries")
└── annotations
    ├── CompositeEntity₀ → (comp.geometries, comp.entities, "composite_entity_0")
    ├── CompositeEntity₁ → (comp.geometries, comp.entities, "composite_entity_1")
    └── ...

For background views, GridReferenceExtractor.classify_geometries() identifies the reference frame grid. These geometry ids are excluded so that the large grid borders are never mistaken for table lines.

All content geometries from the background view (after reference frame exclusion).
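The candidate-set layout in the tree above can be sketched as a single function. This is not the actual implementation. Geom and Composite are hypothetical stand-ins for the drawing objects. The reference_frame_ids argument stands in for the IDs that GridReferenceExtractor.classify_geometries() flags as the reference frame; its real return type is not shown here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Geom:
    id: int
    name: str

@dataclass
class Composite:
    geometries: List[Geom]
    entities: List[str]

def build_candidate_sets(geometries, regular_annotations, composites,
                         reference_frame_ids, is_background):
    """Return one (edges, annotations, label) tuple per candidate set."""
    candidates = []
    if is_background:
        # Direct geometries count only for the background view, minus the
        # reference frame grid so its borders are never read as table lines.
        edges = [g for g in geometries if g.id not in reference_frame_ids]
        candidates.append((edges, regular_annotations, "background_geometries"))
    for i, comp in enumerate(composites):
        # Every CompositeEntity is an independent candidate set.
        candidates.append((comp.geometries, comp.entities, f"composite_entity_{i}"))
    return candidates

view_geoms = [Geom(1, "frame_left"), Geom(2, "table_line"), Geom(3, "frame_top")]
comps = [Composite([Geom(10, "c0_line")], ["c0_text"]),
         Composite([Geom(20, "c1_line")], ["c1_text"])]
sets = build_candidate_sets(view_geoms, ["note"], comps,
                            reference_frame_ids={1, 3}, is_background=True)
```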

tables_from_geometries.hv_lines

For each candidate set, only horizontal and vertical LineSegment2D edges are kept. Diagonal lines, arcs, and other curve types are discarded since tables are composed exclusively of axis-aligned lines.

# Keep only the horizontal and vertical line segments of this candidate set.
line_segments = (
    Collector(entities=content_geometries)
    .get_line_segments(direction="both", tolerance=line_segment_tolerance)
    .entities
)

Blue lines are the H/V segments kept; gray lines are the original content geometries.
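The H/V test can be sketched as follows. The sketch assumes line_segment_tolerance bounds how far a segment may drift along the perpendicular axis. The real Collector.get_line_segments may define tolerance differently, for example as an angle. Segment, orientation() and hv_segments() are hypothetical names. Curves are assumed to have been discarded before this step.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float

def orientation(seg: Segment, tolerance: float = 0.5) -> Optional[str]:
    """Return "H", "V", or None for a straight segment.

    Assumption: tolerance bounds the drift along the perpendicular axis.
    """
    dx = abs(seg.x2 - seg.x1)
    dy = abs(seg.y2 - seg.y1)
    if dy <= tolerance < dx:
        return "H"
    if dx <= tolerance < dy:
        return "V"
    return None  # diagonal, or too short to have a clear direction

def hv_segments(segments, tolerance=0.5):
    """Keep only the horizontal and vertical segments."""
    return [s for s in segments if orientation(s, tolerance) is not None]
```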

tables_from_geometries.connectivity_filter

The ConnectivityDensityFilter removes scattered lines that don’t belong to dense table grids. It uses an adaptive neighborhood analysis.

Adaptive thresholds based on drawing dimensions:

# For a landscape drawing (width > height):
x_threshold = drawing_width * threshold_ratio * 1.2   # e.g. 1000 * 0.2 * 1.2 = 240
y_threshold = drawing_height * threshold_ratio          # e.g. 700 * 0.2 = 140

The default threshold_ratio is 0.2 (20% of drawing dimensions).

Neighborhood analysis: for each line, count how many other lines fall within its elliptical neighborhood:

normalized_distance = ((x1 - x2) / x_threshold) ** 2 + ((y1 - y2) / y_threshold) ** 2
are_neighbors = normalized_distance <= 1.0

Statistical threshold: compute the minimum neighbor count from the distribution:

min_neighbors = max(1, int(avg_neighbors - std_neighbors))

Lines with fewer neighbors than this threshold are excluded.

Note

If filtering produces fewer than min_edges_for_table (4) lines, the filter falls back to returning all original lines unfiltered.

Colored lines are kept (dense clusters); lighter lines are excluded (isolated).
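The three steps and the fallback from the note can be combined into one function:

- adaptive thresholds,
- elliptical neighbor count,
- statistical cutoff.

This is a sketch under stated assumptions, not ConnectivityDensityFilter itself. Neighborhoods are measured between segment midpoints. The standard deviation is the population one. The thresholds are passed in rather than derived inside the function; the usage lines compute them with the landscape formula above. connectivity_filter() is a hypothetical name.

```python
import statistics

def connectivity_filter(segments, x_threshold, y_threshold, min_edges_for_table=4):
    """Keep segments that sit in dense clusters of other segments.

    segments: (x1, y1, x2, y2) tuples. Neighborhoods are measured between
    segment midpoints (an assumption of this sketch).
    """
    mids = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in segments]
    counts = []
    for i, (xi, yi) in enumerate(mids):
        n = 0
        for j, (xj, yj) in enumerate(mids):
            if i == j:
                continue
            d = ((xi - xj) / x_threshold) ** 2 + ((yi - yj) / y_threshold) ** 2
            if d <= 1.0:  # inside the ellipse, so j is a neighbor of i
                n += 1
        counts.append(n)
    if not counts:
        return []
    avg = statistics.mean(counts)
    std = statistics.pstdev(counts)  # population std: an assumption
    min_neighbors = max(1, int(avg - std))
    kept = [s for s, c in zip(segments, counts) if c >= min_neighbors]
    # Fallback: too few survivors means returning everything unfiltered.
    return kept if len(kept) >= min_edges_for_table else list(segments)

# Landscape 1000 x 700 drawing with threshold_ratio 0.2 (formulas above).
x_thr, y_thr = 1000 * 0.2 * 1.2, 700 * 0.2
grid = [(0, y, 40, y) for y in (0, 10, 20, 30)] + [(x, 0, x, 30) for x in (0, 10, 20, 30, 40)]
stray = (900, 600, 950, 600)
kept = connectivity_filter(grid + [stray], x_thr, y_thr)
```

In this example, each of the nine grid lines has eight neighbors and the stray line has none. The cutoff is therefore max(1, int(7.2 - 2.4)) = 4, and only the stray line is dropped. The pairwise loop is quadratic in the number of segments, so a spatial index would help on very dense sheets.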

The connectivity debug visualization below provides a detailed view of the filtering process for a single candidate set. It shows:

  • Each line labeled with its index and neighbor count (e.g. L5 (12n) means line 5 has 12 neighbors).

  • Kept lines (green) vs rejected lines (red/pink) based on the statistical threshold.

  • Elliptical neighborhoods drawn around representative lines, showing the adaptive search area used to count neighbors.

  • Connection lines (thin gray) between neighboring kept lines, illustrating the connectivity graph.

  • A statistics panel showing the computed thresholds (X/Y threshold, average neighbors, standard deviation, minimum required neighbors) and the filtering summary (total/kept/rejected counts).

When there are multiple candidate sets, one debug visualization is generated per set.

Connectivity debug for one candidate set: lines are labeled with their neighbor count, elliptical neighborhoods show the search area, and the statistics panel summarizes the filtering parameters and results.

tables_from_geometries.connected_components

Each Edge with a LineSegment2D primitive is converted to a TableLine. Perpendicular and collinear connections between lines are established, and a depth-first traversal then groups the structural lines into connected components. Each component is filtered by:

- minimum line count (min_lines_in_component),
- bounding box size (min_table_size),
- structural validity (at least one H line and one V line).

Valid components become TableStructure objects.

Each color represents a distinct connected component (potential table).
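The component step can be sketched on plain (x1, y1, x2, y2) tuples instead of TableLine objects. The connection rules are assumptions:

- Perpendicular lines connect when they cross or touch within tolerance.
- Collinear lines connect when they share a coordinate and their extents overlap or abut.

The sketch also assumes min_table_size applies to both bounding-box width and height. The traversal itself is an ordinary iterative depth-first search over an adjacency list.

```python
def _orient(s):
    x1, y1, x2, y2 = s
    return "H" if abs(y2 - y1) <= abs(x2 - x1) else "V"

def _span(a, b):
    return min(a, b), max(a, b)

def _connected(a, b, tol):
    """Perpendicular lines that touch/cross, or collinear lines that overlap/abut."""
    oa, ob = _orient(a), _orient(b)
    if oa != ob:
        h, v = (a, b) if oa == "H" else (b, a)
        hx0, hx1 = _span(h[0], h[2])
        vy0, vy1 = _span(v[1], v[3])
        return hx0 - tol <= v[0] <= hx1 + tol and vy0 - tol <= h[1] <= vy1 + tol
    if oa == "H":
        same_line = abs(a[1] - b[1]) <= tol
        (a0, a1), (b0, b1) = _span(a[0], a[2]), _span(b[0], b[2])
    else:
        same_line = abs(a[0] - b[0]) <= tol
        (a0, a1), (b0, b1) = _span(a[1], a[3]), _span(b[1], b[3])
    return same_line and a0 <= b1 + tol and b0 <= a1 + tol

def connected_components(lines, tolerance=0.5, min_lines_in_component=4,
                         min_table_size=10.0):
    n = len(lines)
    adj = [[j for j in range(n) if j != i and _connected(lines[i], lines[j], tolerance)]
           for i in range(n)]
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first traversal
            i = stack.pop()
            comp.append(lines[i])
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        xs = [c for s in comp for c in (s[0], s[2])]
        ys = [c for s in comp for c in (s[1], s[3])]
        if (len(comp) >= min_lines_in_component
                and max(xs) - min(xs) >= min_table_size
                and max(ys) - min(ys) >= min_table_size
                and {_orient(s) for s in comp} == {"H", "V"}):
            components.append(comp)
    return components

box = [(0, 0, 20, 0), (0, 20, 20, 20), (0, 0, 0, 20), (20, 0, 20, 20)]
tiny = [(100, 100, 105, 100), (100, 105, 105, 105), (100, 100, 100, 105), (105, 100, 105, 105)]
lone = [(300, 300, 350, 300)]
comps = connected_components(box + tiny + lone)
```

Here the 20-by-20 box survives. The 5-by-5 box fails min_table_size, and the lone line fails both the line-count check and the H/V check.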

tables_from_geometries.raw_generated_tables

Each TableStructure detects its cells by analyzing the grid formed by its H/V lines. Cells smaller than min_cell_size are discarded. The structure handles merged cells (where internal borders are missing) and populates cell content from annotations that fall within cell boundaries.

If at least one cell is found, a DetectedTable is created. This step shows all raw generated tables before any validation filtering.

All raw generated tables from the edge-based pipeline, before validation.
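A deliberately simplified sketch of cell detection for a complete grid:

1. Collect the distinct x positions of the V lines and the distinct y positions of the H lines, merging values within a tolerance.
2. Treat each slot between consecutive positions as a cell.
3. Drop slots narrower or shorter than min_cell_size.

Unlike TableStructure, the sketch does not check that each border actually exists, so it cannot detect merged cells. It also does not populate cell content. grid_cells() is a hypothetical name, and applying min_cell_size to both dimensions is an assumption.

```python
def _unique_sorted(values, tol):
    """Sort values and merge those closer than tol to the previous kept one."""
    out = []
    for v in sorted(values):
        if not out or v - out[-1] > tol:
            out.append(v)
    return out

def grid_cells(h_lines, v_lines, min_cell_size=2.0, tol=0.5):
    """Cells of a full grid as (xmin, ymin, xmax, ymax) boxes.

    h_lines / v_lines: axis-aligned segments (x1, y1, x2, y2).
    Ignores merged cells: every grid slot is treated as a cell.
    """
    xs = _unique_sorted([s[0] for s in v_lines], tol)
    ys = _unique_sorted([s[1] for s in h_lines], tol)
    cells = []
    for y0, y1 in zip(ys, ys[1:]):
        for x0, x1 in zip(xs, xs[1:]):
            if x1 - x0 >= min_cell_size and y1 - y0 >= min_cell_size:
                cells.append((x0, y0, x1, y1))
    return cells

h = [(0, 0, 31, 0), (0, 10, 31, 10), (0, 20, 31, 20)]
v = [(0, 0, 0, 20), (30, 0, 30, 20), (31, 0, 31, 20)]  # the 1-unit sliver is dropped
cells = grid_cells(h, v)
```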

tables_from_geometries.valid_generated_tables

Each raw generated table is validated against three criteria:

- cell existence,
- sheet coverage ratio (table area / sheet area, at most 80%),
- content ratio (fraction of non-empty cells, at least 30%).

Tables that fail any check are discarded. This step shows only the tables that passed validation.

Only validated tables from the edge-based pipeline (after filtering).
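The three checks can be sketched with the default thresholds from TableSelectionConfig. RawTable and is_valid_table() are hypothetical. The sketch leaves out the validate_* switches, and it also leaves out the min_nb_cells rule (a stricter content threshold for smaller tables), because that stricter threshold is not specified here. Treating exactly 80% coverage or exactly 30% content as passing is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class RawTable:
    bbox: BBox
    cell_texts: List[str]  # one entry per cell; "" or whitespace = empty

def _area(b: BBox) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def is_valid_table(table: RawTable, sheet_bbox: BBox,
                   max_sheet_coverage_ratio: float = 0.8,
                   min_content_ratio: float = 0.3) -> bool:
    # 1. Cell existence
    if not table.cell_texts:
        return False
    # 2. Sheet coverage: reject tables that cover too much of the sheet
    sheet_area = _area(sheet_bbox)
    if sheet_area > 0 and _area(table.bbox) / sheet_area > max_sheet_coverage_ratio:
        return False
    # 3. Content ratio: share of cells that hold text
    filled = sum(1 for text in table.cell_texts if text.strip())
    return filled / len(table.cell_texts) >= min_content_ratio

sheet = (0, 0, 1000, 700)
print(is_valid_table(RawTable((800, 0, 1000, 100), ["REV", "", "DATE", ""]), sheet))  # True
```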

all_detected_tables

This step overlays all tables from the three branches (tables_from_tables, tables_from_notes, tables_from_geometries) on the full sheet. Each table gets a distinct color and a label showing its source.

All detected tables from every source, with colored bounding boxes and source labels.

title_block

The title block is selected among validated background-view tables as the one closest to the bottom-right corner of the drawing (measured by Euclidean distance between bottom-right corners).

The selected title block is highlighted; the red dot marks the drawing’s bottom-right corner.
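The selection rule can be sketched over (xmin, ymin, xmax, ymax) bounding boxes of the validated background-view tables. The sketch assumes y grows upward, so a box's bottom-right corner is (xmax, ymin); with a y-down convention, use ymax instead. select_title_block() is a hypothetical name.

```python
import math
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def select_title_block(table_bboxes: List[BBox], drawing_bbox: BBox) -> Optional[BBox]:
    """Pick the table whose bottom-right corner is nearest the drawing's.

    Assumes y grows upward, so a box's bottom-right corner is (xmax, ymin).
    """
    if not table_bboxes:
        return None
    target = (drawing_bbox[2], drawing_bbox[1])
    return min(table_bboxes, key=lambda b: math.dist((b[2], b[1]), target))

drawing = (0, 0, 1000, 700)
tables = [(0, 600, 200, 700), (700, 0, 990, 120), (400, 300, 600, 400)]
title = select_title_block(tables, drawing)  # (700, 0, 990, 120): corner 10 units away
```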

Configuration & Thresholds

The tables_from_geometries pipeline has three phases, and each phase is controlled by its own configuration dataclass. Every parameter has a default value, listed in the tables below.

Phase 1: CandidatePreparationConfig

Parameter               Default    Purpose
----------------------  ---------  -----------------------------------------------------
line_segment_tolerance  0.5        Tolerance for H/V line segment extraction
connectivity_filter     (default)  Nested ConnectivityFilterConfig for density filtering

Phase 2: TableGenerationConfig

Parameter               Default  Purpose
----------------------  -------  ----------------------------------------------------------------
tolerance               0.5      General tolerance for line processing and collinear connections
intersection_tolerance  0.5      Tolerance for perpendicular intersection detection
min_table_size          10.0     Minimum bounding box dimension for a valid table structure
min_cell_size           2.0      Minimum cell dimensions
min_edges_for_table     4        Minimum edges to attempt table detection
min_lines_in_component  4        Minimum lines per connected component

Phase 3: TableSelectionConfig

Parameter                 Default  Purpose
------------------------  -------  --------------------------------------------------------------
max_sheet_coverage_ratio  0.8      Maximum table/sheet area ratio before rejection
min_content_ratio         0.3      Minimum fraction of non-empty cells for a valid table
min_nb_cells              2        Minimum cells; smaller tables use a stricter content threshold
validate_has_cells        True     Enable cell existence check
validate_sheet_coverage   True     Enable sheet coverage check
validate_content_ratio    True     Enable content ratio check
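Each phase is configured by a dataclass, so dataclasses.replace can override one threshold while keeping the other defaults. The class below is a hypothetical mirror of TableSelectionConfig built from the table above, not the real class. If the real class is a standard dataclass, the same replace call should apply to it.

```python
from dataclasses import dataclass, replace

@dataclass
class TableSelectionConfigSketch:
    """Hypothetical mirror of the documented TableSelectionConfig fields."""
    max_sheet_coverage_ratio: float = 0.8
    min_content_ratio: float = 0.3
    min_nb_cells: int = 2
    validate_has_cells: bool = True
    validate_sheet_coverage: bool = True
    validate_content_ratio: bool = True

defaults = TableSelectionConfigSketch()
# Accept sparser tables; every other field keeps its default value.
sparse = replace(defaults, min_content_ratio=0.1)
```

Because replace returns a new instance, the defaults object is left untouched and can be shared safely.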