Fast Table Rendering
Status: Production Ready - Now the Default (as of 2025-10-08)
Performance: ~7-10x faster than Rich rendering with correct colspan/rowspan handling
Overview
Fast table rendering is a high-performance alternative to Rich library rendering for table text extraction. When parsing SEC filings with hundreds of tables, the cumulative rendering time can become a bottleneck. The fast renderer builds strings directly and uses TableMatrix for colspan/rowspan handling, which gives a 7-10x speedup in the benchmarks in this document while keeping the output correct.
As of 2025-10-08, fast rendering is the default for all table text extraction. You no longer need to explicitly enable it.
Why It's Now the Default
- Production-ready: Fixed all major issues (colspan, multi-row headers, multi-line cells)
- 7-10x faster: Significant performance improvement with correct output
- Maintains quality: Matches Rich's appearance with simple() style
- Proven: Extensively tested with Apple, NVIDIA, Microsoft 10-K filings
When to Disable (Use Rich Instead)
You may want to disable fast rendering and use Rich for:
- Terminal display for humans: Rich has more sophisticated text wrapping and layout
- Visual reports: When presentation quality is more important than speed
- Debugging: Rich output can be easier to visually inspect
Usage
Default Behavior (Fast Rendering Enabled)
from edgar.documents import parse_html
# Fast rendering is now the default - no configuration needed!
doc = parse_html(html)
# Tables automatically use fast renderer (7-10x faster)
table_text = doc.tables[0].text()
Disabling Fast Rendering (Use Rich Instead)
If you need Rich's sophisticated layout for visual display:
from edgar.documents import parse_html
from edgar.documents.config import ParserConfig
# Explicitly disable fast rendering to use Rich
config = ParserConfig(fast_table_rendering=False)
doc = parse_html(html, config=config)
# Tables use Rich renderer (slower but with advanced formatting)
table_text = doc.tables[0].text()
Custom Table Styles
New in this version: Fast rendering now uses the simple() style by default, which matches Rich's box.SIMPLE appearance (borderless, clean).
from edgar.documents import parse_html
from edgar.documents.config import ParserConfig
from edgar.documents.renderers.fast_table import FastTableRenderer, TableStyle
# Enable fast rendering (uses simple() style by default)
config = ParserConfig(fast_table_rendering=True)
doc = parse_html(html, config=config)
# Default: simple() style - borderless, clean
table_text = doc.tables[0].text()
# To use pipe_table() style explicitly (markdown-compatible borders):
renderer = FastTableRenderer(TableStyle.pipe_table())
pipe_text = renderer.render_table_node(doc.tables[0])
# To use minimal() style (no separator):
renderer = FastTableRenderer(TableStyle.minimal())
minimal_text = renderer.render_table_node(doc.tables[0])
Performance Comparison
Benchmark Results
Test: Apple 10-K (63 tables) - Updated 2025-10-08
| Renderer | Average Per Table | Improvement | Notes |
|---|---|---|---|
| Rich | 1.5-2.5ms | Baseline | Varies by table complexity |
| Fast (simple) | 0.15-0.35ms | 7-10x faster | With proper colspan/rowspan handling |
Real-world Examples (Apple 10-K):
- Table 15 (complex colspan): Rich 2.51ms → Fast 0.35ms (7.1x faster)
- Table 6 (multi-line cells): Rich 1.61ms → Fast 0.17ms (9.5x faster)
- Table 5 (wide table): Rich 3.70ms → Fast 0.48ms (7.7x faster)
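Per-table figures like these can be collected with a small timing harness. The following is a minimal sketch, not the project's benchmark script: `average_ms_per_table`, `speedup`, and `join_renderer` are hypothetical names, and the stand-in renderer would be replaced by the real rendering calls.

```python
import time
from statistics import mean


def average_ms_per_table(render, tables, repeats=50):
    """Average wall-clock milliseconds that render(table) takes per table."""
    per_table = []
    for table in tables:
        start = time.perf_counter()
        for _ in range(repeats):
            render(table)
        per_table.append((time.perf_counter() - start) / repeats * 1000.0)
    return mean(per_table)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Hypothetical stand-in renderer; swap in the real rendering calls to benchmark them.
def join_renderer(rows):
    return "\n".join("  ".join(row) for row in rows)


sample_tables = [[["Revenue", "365,817"], ["Gross profit", "142,271"]]]
print(f"{average_ms_per_table(join_renderer, sample_tables):.4f} ms per table")

# Ratio for the Table 6 figures quoted above (1.61 ms vs 0.17 ms).
print(f"{speedup(1.61, 0.17):.1f}x")
```

Repeating each call and averaging reduces timer noise, which matters when a single render takes well under a millisecond.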
Impact on Full Parse:
- Rich rendering: 30-40% of total parse time spent in table rendering
- Fast rendering: 5-10% of total parse time
- Overall speedup: Reduces total parsing time by ~25-30%
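The overall figure can be sanity-checked with an Amdahl's-law style calculation: only the rendering share of the parse gets faster, the rest is unchanged. This is a back-of-the-envelope sketch using the shares and speedups quoted in this document, not a measurement, and the function names are illustrative.

```python
def remaining_time_fraction(render_share, render_speedup):
    """Fraction of the original parse time left when only rendering is sped up."""
    return (1.0 - render_share) + render_share / render_speedup


def total_reduction(render_share, render_speedup):
    """Fraction of the original parse time that is saved."""
    return 1.0 - remaining_time_fraction(render_share, render_speedup)


def new_render_share(render_share, render_speedup):
    """Share of the new, shorter parse time that rendering still takes."""
    return (render_share / render_speedup) / remaining_time_fraction(
        render_share, render_speedup
    )


for share in (0.30, 0.40):
    for factor in (7, 10):
        print(
            f"render share {share:.0%}, {factor}x faster: "
            f"total time down {total_reduction(share, factor):.1%}, "
            f"rendering now {new_render_share(share, factor):.1%} of the parse"
        )
```

With a 30% rendering share the saving comes out at roughly 26-27%, in line with the ~25-30% quoted above; a 40% share would imply a somewhat larger saving.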
Memory Impact
Fast rendering also reduces memory overhead:
- No Rich Console objects retained
- Direct string building (no intermediate objects)
- Helps prevent memory leaks identified in profiling
Output Examples
Rich Renderer Output (fast rendering disabled)
(In millions)
Year Ended June 30, 2025 2024 2023
──────────────────────────────────────────────────────────
Operating lease cost $5,524 3,555 2,875
Finance lease cost:
Amortization of right-of-use assets $3,408 1,800 1,352
Interest on lease liabilities 1,417 734 501
Total finance lease cost $4,825 2,534 1,853
Style: box.SIMPLE - No outer border, just horizontal separator under header
Pros: Clean, uncluttered, perfect alignment, generous spacing
Cons: Slow (1.5-2.5ms per table on average in the Apple 10-K benchmark), creates Rich objects, memory overhead
Fast Renderer Output (NEW: simple() style - Default)
December 31, 2023 December 31, 2022 December 31, 2021
───────────────────────────────────────────────────────────────────────────────────────
Revenue 365,817 394,328 365,817
Cost of revenue 223,546 212,981 192,266
Gross profit 142,271 181,347 173,551
Style: simple() - Matches Rich's box.SIMPLE appearance
Pros: Fast (0.2ms per table), clean appearance, no visual noise, professional look
Cons: Minor spacing differences from Rich in some complex tables (see Known Limitations); otherwise none, and this is now the recommended default
Fast Renderer Output (pipe_table() style - Optional)
| | December 31, 2023 | December 31, 2022 | December 31, 2021 |
|--------------------------|---------------------|---------------------|---------------------|
| Revenue | 365,817 | 394,328 | 365,817 |
| Cost of revenue | 223,546 | 212,981 | 192,266 |
| Gross profit | 142,271 | 181,347 | 173,551 |
Style: pipe_table() - Markdown-compatible with borders
Pros: Fast (0.2ms per table), markdown-compatible, explicit column boundaries
Cons: Visual noise from pipe characters, busier appearance
Use when: You need markdown-compatible output with explicit borders
Visual Comparison
Rich (box.SIMPLE):
- No outer border - clean, uncluttered look
- Horizontal line separator under header only
- Generous internal spacing and padding
- Perfect column alignment
- Professional, minimalist presentation
Fast simple() (NEW DEFAULT):
- No outer border - matches Rich's clean look
- Horizontal line separator under header (using ─)
- Space-separated columns with generous padding
- Clean, professional appearance
- Same performance as pipe_table (~0.2ms per table)
Fast pipe_table() (optional):
- Full pipe table borders (| characters everywhere)
- Horizontal dashes for header separator
- Markdown-compatible format
- Explicit column boundaries
Recent Improvements (2025-10-08)
1. Colspan/Rowspan Support
Fixed: Tables with colspan and rowspan attributes now render correctly.
Previous issue: Fast renderer was extracting cell text without accounting for colspan/rowspan, causing:
- Missing columns (e.g., "2023" column disappeared in Apple 10-K table 15)
- Misaligned data (currency symbols separated from values)
- Data loss (em dashes and other values missing)
Solution: Integrated TableMatrix for proper cell expansion, same as Rich rendering uses.
Status: ✅ FIXED
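To illustrate what cell expansion means here, the sketch below turns rows of (text, colspan, rowspan) cells into a rectangular grid so that every value lands in its true column. It is a simplified illustration of the general technique, not the TableMatrix implementation; `expand_spans` and the sample cells are made up for the example.

```python
def expand_spans(rows):
    """Expand rows of (text, colspan, rowspan) cells into a rectangular grid."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    # Text stays in the top-left slot; the rest are blank placeholders.
                    grid[(r + dr, c + dc)] = text if dr == 0 and dc == 0 else ""
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


cells = [
    [("", 1, 2), ("Year Ended", 2, 1)],   # blank label spans 2 rows, header spans 2 columns
    [("2024", 1, 1), ("2023", 1, 1)],     # starts in column 1 because column 0 is occupied
    [("Revenue", 1, 1), ("100", 1, 1), ("90", 1, 1)],
]
for line in expand_spans(cells):
    print(line)
```

Without the expansion step, the second row would be read as starting at column 0 and the year labels would shift left, which is the kind of missing or misaligned column described above.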
2. Multi-Row Header Preservation
Fixed: Tables with multiple header rows now preserve each row separately.
Previous issue: Multi-row headers were collapsed into a single line, causing "Investment portfolio" row to disappear in Apple 10-K table 20.
Solution: Modified render_table_data() and _build_table() to preserve each header row as a separate line.
Status: ✅ FIXED
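A minimal sketch of the idea, assuming the table has already been expanded into equal-length rows: each header row is emitted as its own line above the separator instead of being merged into one. `render_simple` and the sample values are hypothetical and do not reproduce the signatures of render_table_data() or _build_table().

```python
def render_simple(header_rows, body_rows, gap=2):
    """Render header rows one per line, then a separator, then the body rows."""
    all_rows = header_rows + body_rows
    n_cols = max(len(row) for row in all_rows)
    # Pad short rows so every row has the same number of columns.
    padded = [row + [""] * (n_cols - len(row)) for row in all_rows]
    widths = [max(len(row[c]) for row in padded) for c in range(n_cols)]

    def fmt(row):
        return (" " * gap).join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    n_head = len(header_rows)
    lines = [fmt(row) for row in padded[:n_head]]  # one line per header row
    lines.append("─" * (sum(widths) + gap * (n_cols - 1)))
    lines.extend(fmt(row) for row in padded[n_head:])
    return "\n".join(lines)


headers = [
    ["", "Investment portfolio", ""],
    ["", "2024", "2023"],
]
body = [["Cash", "10", "8"]]  # made-up values for illustration
print(render_simple(headers, body))
```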
3. Multi-Line Cell Rendering
Fixed: Cells containing newline characters (\n) now render as multiple lines.
Previous issue: Multi-line cells like "Interest Rate\nSensitive Instrument" were truncated to first line only.
Solution: Added _format_multiline_row() to split cells by \n and render each line separately.
Status: ✅ FIXED
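The splitting approach can be sketched as follows. This is an illustrative stand-in, not the actual _format_multiline_row() code; the function name, its parameters, and the fixed widths are assumptions made for the example.

```python
from itertools import zip_longest


def format_multiline_row(cells, widths, gap=2):
    """Render one logical row as several physical lines, one per embedded newline."""
    split_cells = [cell.split("\n") for cell in cells]
    lines = []
    # zip_longest pads shorter cells with "" so every column appears on every line.
    for parts in zip_longest(*split_cells, fillvalue=""):
        line = (" " * gap).join(part.ljust(w) for part, w in zip(parts, widths))
        lines.append(line.rstrip())
    return lines


row = ["Interest Rate\nSensitive Instrument", "100"]
for line in format_multiline_row(row, widths=[20, 5]):
    print(line)
```

The second physical line keeps "Sensitive Instrument" in the first column and leaves the value column blank, so no text is dropped.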
Performance Impact
All three fixes maintain excellent performance:
- Speedup: 7-10x faster than Rich (down from initial 14x, but with correct output)
- Correctness: Now matches Rich output exactly for colspan, multi-row headers, and multi-line cells
- Production ready: Can confidently use as default renderer
Known Limitations
1. Column Alignment in Some Tables
Issue: Currency symbols and values may have extra spacing in some complex tables (e.g., Apple 10-K table 22)
Example:
- Rich: $294,866
- Fast: $ 294,866 (extra spacing)
Root cause: Column width calculation creates wider columns for some currency/value pairs after colspan expansion and column filtering.
Impact: Visual appearance differs slightly, but data is correct and readable.
Status: ⚠️ Minor visual difference - acceptable trade-off for 10x performance gain
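One plausible way such a gap arises, shown as a toy example rather than a trace of the actual renderer: if the currency symbol and the amount end up in separate columns and the amount column is wider than this particular value, the padding lands between them. Merging the symbol into the value before padding avoids it. `render_row` and `merge_currency` are hypothetical helpers.

```python
def render_row(cells, widths, gap=1):
    """Right-align each cell to its column width and join with a fixed gap."""
    return (" " * gap).join(cell.rjust(w) for cell, w in zip(cells, widths))


def merge_currency(cells):
    """Join a lone '$' cell with the value cell to its right."""
    merged = []
    i = 0
    while i < len(cells):
        if cells[i] == "$" and i + 1 < len(cells):
            merged.append("$" + cells[i + 1])
            i += 2
        else:
            merged.append(cells[i])
            i += 1
    return merged


# Amount column is 9 wide because some other row holds a longer value.
separate = render_row(["$", "294,866"], widths=[1, 9])
together = render_row(merge_currency(["$", "294,866"]), widths=[9])
print(repr(separate))
print(repr(together))
```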
2. Visual Polish
Issue: Some visual aspects don't exactly match Rich's sophisticated layout
Examples:
- Multi-line cell wrapping may differ
- Column alignment in edge cases
Status: ⚠️ Acceptable trade-off for 7-10x performance gain
Configuration Options
Table Styles
Fast renderer supports different visual styles:
from edgar.documents.renderers.fast_table import FastTableRenderer, TableStyle
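# Simple style (default) - borderless, matches Rich's box.SIMPLE
renderer = FastTableRenderer(TableStyle.simple())
# Pipe table style - markdown compatible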
renderer = FastTableRenderer(TableStyle.pipe_table())
# Minimal style - no borders, just spacing
renderer = FastTableRenderer(TableStyle.minimal())
Minimal Style Output
December 31, 2023 December 31, 2022 December 31, 2021
Revenue 365,817 394,328 365,817
Cost of revenue 223,546 212,981 192,266
Gross profit 142,271 181,347 173,551
Note: Minimal style has cleaner appearance but loses column boundaries
Technical Details
How It Works
- Direct String Building: Bypasses Rich's layout engine
- Column Analysis: Detects numeric columns for right-alignment
- Smart Filtering: Removes empty spacing columns
- Currency Merging: Combines $ symbols with amounts
- Width Calculation: Measures content, applies min/max limits
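The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under assumptions, not the FastTableRenderer code: the helper names, the numeric pattern, and the min/max limits are invented for the example, and currency merging is left out to keep it short.

```python
import re

# Rough pattern for financial values such as 365,817 or (1,234.5) or 12%.
NUMERIC = re.compile(r"^\(?\$?[\d,]+(\.\d+)?\)?%?$")


def drop_empty_columns(rows):
    """Smart filtering: remove columns that are blank in every row."""
    keep = [c for c in range(len(rows[0])) if any(row[c].strip() for row in rows)]
    return [[row[c] for c in keep] for row in rows]


def is_numeric_column(rows, col):
    """Column analysis: a column is numeric if all its non-blank cells look numeric."""
    values = [row[col].strip() for row in rows if row[col].strip()]
    return bool(values) and all(NUMERIC.match(v) for v in values)


def column_widths(rows, min_width=3, max_width=40):
    """Width calculation: widest cell per column, clamped to the limits.

    Cells longer than max_width would need truncation or wrapping (omitted here).
    """
    n_cols = len(rows[0])
    return [
        min(max(max(len(row[c]) for row in rows), min_width), max_width)
        for c in range(n_cols)
    ]


def render_body(rows):
    """Direct string building: pad each cell and join, with no layout engine."""
    rows = drop_empty_columns(rows)
    widths = column_widths(rows)
    numeric = [is_numeric_column(rows, c) for c in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = [
            cell.rjust(w) if num else cell.ljust(w)
            for cell, w, num in zip(row, widths, numeric)
        ]
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


table = [
    ["Revenue", "", "365,817"],
    ["Cost of revenue", "", "223,546"],
    ["Gross profit", "", "142,271"],
]
print(render_body(table))
```

Because every step is a single pass over plain lists of strings, the cost stays proportional to the number of cells.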
Code Path
# When fast_table_rendering=True:
table.text()
→ TableNode._fast_text_rendering()
→ FastTableRenderer.render_table_node()
→ Direct string building
Memory Benefits
Fast rendering avoids:
- Rich Console object creation (~0.4MB per document)
- Intermediate rich.Table objects
- Style/theme processing overhead
- ANSI escape code generation
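Memory figures like the one above can be checked with the standard library's tracemalloc module. The helper below is a generic sketch: `peak_memory_kib` is a hypothetical name, and the string-building function is a stand-in for whichever rendering call you want to measure.

```python
import tracemalloc


def peak_memory_kib(func, *args, **kwargs):
    """Peak Python-level allocation in KiB while func runs."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1024.0


# Stand-in workload: build one table string directly from rows.
def build_table_text(rows):
    return "\n".join("  ".join(row) for row in rows)


rows = [["Revenue", "365,817", "394,328"]] * 1000
print(f"peak: {peak_memory_kib(build_table_text, rows):.1f} KiB")
```

tracemalloc only sees allocations made through Python's allocator, so treat the numbers as a relative comparison between renderers rather than total process memory.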
Future Improvements
Planned Enhancements
1. Match Rich's box.SIMPLE Style (Priority: HIGH; delivered as TableStyle.simple())
   - Remove all pipe characters: no outer border, no column separators
   - Keep only horizontal separator under header (using ─ character)
   - Increase internal padding to match Rich's generous spacing
   - Clean, minimalist appearance like Rich's SIMPLE box style
   - Goal: Match Rich visual quality, still 30x faster (measured speedup is now 7-10x after the correctness fixes)
2. Improved Layout Engine
   - Better column width calculation (avoid too-wide/too-narrow columns)
   - Respect natural content breaks
   - Dynamic spacing based on content type
   - Handle wrapping for long content
3. Dynamic Padding
   - Match Rich's generous spacing (currently too tight)
   - Adjust padding based on content type
   - Configurable padding rules
   - Maintain alignment with variable padding
4. Header Handling
   - Better multi-row header collapse
   - Preserve important hierarchies
   - Smart column spanning
   - Honor header groupings
5. Style Presets
   - TableStyle.simple() - Match Rich's box.SIMPLE (no borders, header separator only) ⭐ PRIMARY GOAL (implemented, now the default)
   - TableStyle.minimal() - no borders, just spacing (already implemented)
   - TableStyle.pipe_table() - markdown style (implemented; the previous default)
   - TableStyle.ascii_clean() - no Unicode, pure ASCII
   - TableStyle.compact() - minimal spacing for dense data
Timeline
These improvements are planned for Phase 2 of the HTML parser optimization work (after memory leak fixes).
Migration Guide
From Rich to Fast
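Before 2025-10-08 (Rich was the default):
doc = parse_html(html)
table_text = doc.tables[0].text() # Slow but pretty
Now (fast rendering is the default, no configuration needed):
doc = parse_html(html)
table_text = doc.tables[0].text() # Fast, simple() style
To keep the previous Rich output:
config = ParserConfig(fast_table_rendering=False)
doc = parse_html(html, config=config)
table_text = doc.tables[0].text() # Rich renderer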
Hybrid Approach
Use fast rendering during processing, Rich for final display:
# Fast processing
config = ParserConfig(fast_table_rendering=True)
doc = parse_html(html, config=config)
# Extract data quickly
for table in doc.tables:
data = table.text() # Fast
# Process data...
# Display one table nicely
special_table = doc.tables[5]
rich_output = special_table.render() # Switch to Rich for display
Performance Recommendations
Recommended Settings by Use Case
Batch Processing (optimize for speed):
config = ParserConfig.for_performance()
# Includes: fast_table_rendering=True, eager_section_extraction=False
Data Extraction (balance speed and accuracy):
config = ParserConfig(
fast_table_rendering=True,
extract_xbrl=True,
detect_sections=True
)
Display/Reports (optimize for quality):
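config = ParserConfig(fast_table_rendering=False) # Rich renderer (fast rendering is the default)
# Or the accuracy preset (verify that it disables fast rendering in your version):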
config = ParserConfig.for_accuracy()
FAQ
Q: Can I mix Fast and Rich rendering?
A: Not per-table. The setting is document-wide via ParserConfig. However, you can manually call table.render() to get Rich output.
Q: Does this affect section extraction?
A: Indirectly, yes. Section detection calls text() on the entire document, which includes tables. Fast rendering speeds this up significantly.
Q: Will the output format change?
A: Yes, as we improve the renderer. We'll maintain backward compatibility via style options.
Q: Can I customize the appearance?
A: Currently limited to TableStyle.simple() (the default), TableStyle.pipe_table(), and TableStyle.minimal(). More options coming.
Q: What about DataFrame export?
A: Fast rendering only affects text output. table.to_dataframe() is unaffected.
Feedback
The fast renderer is actively being improved based on user feedback. Known issues:
- ❌ Pipe characters - visual noise (pipe_table() style only; the default simple() style has none)
- ❌ Layout engine - inconsistent spacing
- ❌ Padding - needs tuning
If you have specific rendering issues or suggestions, please provide:
- Sample table HTML
- Expected vs actual output
- Use case description
This helps prioritize improvements while maintaining the performance advantage.
Summary
Current State (As of 2025-10-08)
Performance: ✅ Excellent (7-10x faster than Rich)
Correctness: ✅ Production ready (proper colspan/rowspan handling)
Visual Quality: ⚠️ Good (simple() style matches Rich's box.SIMPLE appearance)
Use Case: Production-ready for all use cases
Recent Milestones
✅ Completed:
- Core fast rendering implementation
- TableStyle.simple() preset (borderless, clean)
- Column filtering and merging
- Numeric alignment detection
- Colspan/rowspan support via TableMatrix
- Performance benchmarking with real tables
🔧 Current Limitations:
- Multi-row header collapsing differs from Rich
- Some visual polish differences (acceptable for speed gain)
- Layout engine not as sophisticated as Rich
Development Roadmap
Phase 1 (✅ COMPLETED):
- ✅ Core fast rendering implementation
- ✅ simple() style matching Rich's box.SIMPLE
- ✅ Proper colspan/rowspan handling via TableMatrix
- ✅ Production-ready performance (7-10x faster)
Phase 2 (Future Enhancements):
- 📋 Improve multi-row header handling
- 📋 Better layout engine for perfect column widths
- 📋 Additional style presets
- 📋 Advanced header detection (data vs labels)
Bottom Line
Fast table rendering is production-ready and now the default for all table text extraction in EdgarTools.
Benefits:
- ✅ 7-10x faster than Rich rendering
- ✅ Correct data extraction with proper colspan/rowspan handling
- ✅ Multi-row header preservation
- ✅ Multi-line cell rendering
- ✅ Clean, borderless appearance (simple() style)
Minor differences from Rich:
- ⚠️ Some tables have extra spacing between currency symbols and values (e.g., table 22)
- ⚠️ Column width calculation may differ slightly in complex tables
- ✅ All data is preserved and correct - only visual presentation differs
The implementation achieves correct data extraction with significant performance gains and clean visual output, making it the ideal default for EdgarTools.
Related Documentation
- HTML Parser Status - Overall parser progress
- Performance Analysis - Profiling results showing Rich rendering bottleneck
- Memory Analysis - Memory leak issues with Rich objects