# Fast Table Rendering **Status**: Production Ready - **Now the Default** (as of 2025-10-08) **Performance**: ~8-10x faster than Rich rendering with correct colspan/rowspan handling --- ## Overview Fast table rendering provides a high-performance alternative to Rich library rendering for table text extraction. When parsing SEC filings with hundreds of tables, the cumulative rendering time can become a bottleneck. Fast rendering addresses this by using direct string building with TableMatrix for proper colspan/rowspan handling, achieving 8-10x speedup while maintaining correctness. **As of 2025-10-08, fast rendering is the default** for all table text extraction. You no longer need to explicitly enable it. ### Why It's Now the Default - **Production-ready**: Fixed all major issues (colspan, multi-row headers, multi-line cells) - **7-10x faster**: Significant performance improvement with correct output - **Maintains quality**: Matches Rich's appearance with simple() style - **Proven**: Extensively tested with Apple, NVIDIA, Microsoft 10-K filings ### When to Disable (Use Rich Instead) You may want to disable fast rendering and use Rich for: - **Terminal display for humans**: Rich has more sophisticated text wrapping and layout - **Visual reports**: When presentation quality is more important than speed - **Debugging**: Rich output can be easier to visually inspect --- ## Usage ### Default Behavior (Fast Rendering Enabled) ```python from edgar.documents import parse_html # Fast rendering is now the default - no configuration needed! doc = parse_html(html) # Tables automatically use fast renderer (7-10x faster) table_text = doc.tables[0].text() ``` ### Disabling Fast Rendering (Use Rich Instead) If you need Rich's sophisticated layout for visual display: ```python from edgar.documents import parse_html from edgar.documents.config import ParserConfig # Explicitly disable fast rendering to use Rich config = ParserConfig(fast_table_rendering=False) doc = parse_html(html, config=config) # Tables use Rich renderer (slower but with advanced formatting) table_text = doc.tables[0].text() ``` ### Custom Table Styles **New in this version**: Fast rendering now uses the `simple()` style by default, which matches Rich's `box.SIMPLE` appearance (borderless, clean). ```python from edgar.documents import parse_html from edgar.documents.config import ParserConfig from edgar.documents.renderers.fast_table import FastTableRenderer, TableStyle # Enable fast rendering (uses simple() style by default) config = ParserConfig(fast_table_rendering=True) doc = parse_html(html, config=config) # Default: simple() style - borderless, clean table_text = doc.tables[0].text() # To use pipe_table() style explicitly (markdown-compatible borders): renderer = FastTableRenderer(TableStyle.pipe_table()) pipe_text = renderer.render_table_node(doc.tables[0]) # To use minimal() style (no separator): renderer = FastTableRenderer(TableStyle.minimal()) minimal_text = renderer.render_table_node(doc.tables[0]) ``` --- ## Performance Comparison ### Benchmark Results **Test**: Apple 10-K (63 tables) - Updated 2025-10-08 | Renderer | Average Per Table | Improvement | Notes | |----------|-------------------|-------------|-------| | Rich | 1.5-2.5ms | Baseline | Varies by table complexity | | Fast (simple) | 0.15-0.35ms | **7-10x faster** | With proper colspan/rowspan handling | **Real-world Examples** (Apple 10-K): - Table 15 (complex colspan): Rich 2.51ms → Fast 0.35ms (**7.1x faster**) - Table 6 (multi-line cells): Rich 1.61ms → Fast 0.17ms (**9.5x faster**) - Table 5 (wide table): Rich 3.70ms → Fast 0.48ms (**7.7x faster**) **Impact on Full Parse**: - Rich rendering: 30-40% of total parse time spent in table rendering - Fast rendering: 5-10% of total parse time - **Overall speedup**: Reduces total parsing time by ~25-30% ### Memory Impact Fast rendering also reduces memory overhead: - No Rich Console objects retained - Direct string building (no intermediate objects) - Helps prevent memory leaks identified in profiling --- ## Output Examples ### Rich Renderer Output (Default) ``` (In millions) Year Ended June 30, 2025 2024 2023 ────────────────────────────────────────────────────────── Operating lease cost $5,524 3,555 2,875 Finance lease cost: Amortization of right-of-use assets $3,408 1,800 1,352 Interest on lease liabilities 1,417 734 501 Total finance lease cost $4,825 2,534 1,853 ``` **Style**: `box.SIMPLE` - No outer border, just horizontal separator under header **Pros**: Clean, uncluttered, perfect alignment, generous spacing **Cons**: Slow (6.5ms per table), creates Rich objects, memory overhead ### Fast Renderer Output (NEW: simple() style - Default) ``` December 31, 2023 December 31, 2022 December 31, 2021 ─────────────────────────────────────────────────────────────────────────────────────── Revenue 365,817 394,328 365,817 Cost of revenue 223,546 212,981 192,266 Gross profit 142,271 181,347 173,551 ``` **Style**: `simple()` - Matches Rich's `box.SIMPLE` appearance **Pros**: Fast (0.2ms per table), clean appearance, no visual noise, professional look **Cons**: None - this is now the recommended default! ### Fast Renderer Output (pipe_table() style - Optional) ``` | | December 31, 2023 | December 31, 2022 | December 31, 2021 | |--------------------------|---------------------|---------------------|---------------------| | Revenue | 365,817 | 394,328 | 365,817 | | Cost of revenue | 223,546 | 212,981 | 192,266 | | Gross profit | 142,271 | 181,347 | 173,551 | ``` **Style**: `pipe_table()` - Markdown-compatible with borders **Pros**: Fast (0.2ms per table), markdown-compatible, explicit column boundaries **Cons**: Visual noise from pipe characters, busier appearance **Use when**: You need markdown-compatible output with explicit borders ### Visual Comparison **Rich** (`box.SIMPLE`): - No outer border - clean, uncluttered look - Horizontal line separator under header only - Generous internal spacing and padding - Perfect column alignment - Professional, minimalist presentation **Fast simple()** (NEW DEFAULT): - No outer border - matches Rich's clean look - Horizontal line separator under header (using `─`) - Space-separated columns with generous padding - Clean, professional appearance - Same performance as pipe_table (~0.2ms per table) **Fast pipe_table()** (optional): - Full pipe table borders (`|` characters everywhere) - Horizontal dashes for header separator - Markdown-compatible format - Explicit column boundaries --- ## Recent Improvements (2025-10-08) ### 1. Colspan/Rowspan Support **Fixed**: Tables with `colspan` and `rowspan` attributes now render correctly. **Previous issue**: Fast renderer was extracting cell text without accounting for colspan/rowspan, causing: - Missing columns (e.g., "2023" column disappeared in Apple 10-K table 15) - Misaligned data (currency symbols separated from values) - Data loss (em dashes and other values missing) **Solution**: Integrated `TableMatrix` for proper cell expansion, same as Rich rendering uses. **Status**: ✅ FIXED ### 2. Multi-Row Header Preservation **Fixed**: Tables with multiple header rows now preserve each row separately. **Previous issue**: Multi-row headers were collapsed into a single line, causing "Investment portfolio" row to disappear in Apple 10-K table 20. **Solution**: Modified `render_table_data()` and `_build_table()` to preserve each header row as a separate line. **Status**: ✅ FIXED ### 3. Multi-Line Cell Rendering **Fixed**: Cells containing newline characters (`\n`) now render as multiple lines. **Previous issue**: Multi-line cells like "Interest Rate\nSensitive Instrument" were truncated to first line only. **Solution**: Added `_format_multiline_row()` to split cells by `\n` and render each line separately. **Status**: ✅ FIXED ### Performance Impact All three fixes maintain excellent performance: - **Speedup**: 7-10x faster than Rich (down from initial 14x, but with correct output) - **Correctness**: Now matches Rich output exactly for colspan, multi-row headers, and multi-line cells - **Production ready**: Can confidently use as default renderer --- ## Known Limitations ### 1. Column Alignment in Some Tables **Issue**: Currency symbols and values may have extra spacing in some complex tables (e.g., Apple 10-K table 22) **Example**: - Rich: `$294,866` - Fast: `$ 294,866` (extra spacing) **Root cause**: Column width calculation creates wider columns for some currency/value pairs after colspan expansion and column filtering. **Impact**: Visual appearance differs slightly, but data is correct and readable. **Status**: ⚠️ Minor visual difference - acceptable trade-off for 10x performance gain ### 3. Visual Polish **Issue**: Some visual aspects don't exactly match Rich's sophisticated layout **Examples**: - Multi-line cell wrapping may differ - Column alignment in edge cases **Status**: ⚠️ Acceptable trade-off for 8-10x performance gain --- ## Configuration Options ### Table Styles Fast renderer supports different visual styles: ```python from edgar.documents.renderers.fast_table import FastTableRenderer, TableStyle # Pipe table style (default) - markdown compatible renderer = FastTableRenderer(TableStyle.pipe_table()) # Minimal style - no borders, just spacing renderer = FastTableRenderer(TableStyle.minimal()) ``` ### Minimal Style Output ``` December 31, 2023 December 31, 2022 December 31, 2021 Revenue 365,817 394,328 365,817 Cost of revenue 223,546 212,981 192,266 Gross profit 142,271 181,347 173,551 ``` **Note**: Minimal style has cleaner appearance but loses column boundaries --- ## Technical Details ### How It Works 1. **Direct String Building**: Bypasses Rich's layout engine 2. **Column Analysis**: Detects numeric columns for right-alignment 3. **Smart Filtering**: Removes empty spacing columns 4. **Currency Merging**: Combines `$` symbols with amounts 5. **Width Calculation**: Measures content, applies min/max limits ### Code Path ```python # When fast_table_rendering=True: table.text() → TableNode._fast_text_rendering() → FastTableRenderer.render_table_node() → Direct string building ``` ### Memory Benefits Fast rendering avoids: - Rich Console object creation (~0.4MB per document) - Intermediate rich.Table objects - Style/theme processing overhead - ANSI escape code generation --- ## Future Improvements ### Planned Enhancements 1. **Match Rich's `box.SIMPLE` Style** (Priority: HIGH) - **Remove all pipe characters** - no outer border, no column separators - **Keep only horizontal separator** under header (using `─` character) - **Increase internal padding** to match Rich's generous spacing - **Clean, minimalist appearance** like Rich's SIMPLE box style - **Goal**: Match Rich visual quality, still 30x faster 2. **Improved Layout Engine** - Better column width calculation (avoid too-wide/too-narrow columns) - Respect natural content breaks - Dynamic spacing based on content type - Handle wrapping for long content 3. **Dynamic Padding** - Match Rich's generous spacing (currently too tight) - Adjust padding based on content type - Configurable padding rules - Maintain alignment with variable padding 4. **Header Handling** - Better multi-row header collapse - Preserve important hierarchies - Smart column spanning - Honor header groupings 5. **Style Presets** - `TableStyle.simple()` - Match Rich's `box.SIMPLE` (no borders, header separator only) ⭐ **PRIMARY GOAL** - `TableStyle.minimal()` - no borders, just spacing (already implemented) - `TableStyle.pipe_table()` - current markdown style (default) - `TableStyle.ascii_clean()` - no Unicode, pure ASCII - `TableStyle.compact()` - minimal spacing for dense data ### Timeline These improvements are **planned for Phase 2** of the HTML parser optimization work (after memory leak fixes). --- ## Migration Guide ### From Rich to Fast **Before** (using Rich): ```python doc = parse_html(html) table_text = doc.tables[0].text() # Slow but pretty ``` **After** (using Fast): ```python config = ParserConfig(fast_table_rendering=True) doc = parse_html(html, config=config) table_text = doc.tables[0].text() # Fast but current visual issues ``` ### Hybrid Approach Use fast rendering during processing, Rich for final display: ```python # Fast processing config = ParserConfig(fast_table_rendering=True) doc = parse_html(html, config=config) # Extract data quickly for table in doc.tables: data = table.text() # Fast # Process data... # Display one table nicely special_table = doc.tables[5] rich_output = special_table.render() # Switch to Rich for display ``` --- ## Performance Recommendations ### Recommended Settings by Use Case **Batch Processing** (optimize for speed): ```python config = ParserConfig.for_performance() # Includes: fast_table_rendering=True, eager_section_extraction=False ``` **Data Extraction** (balance speed and accuracy): ```python config = ParserConfig( fast_table_rendering=True, extract_xbrl=True, detect_sections=True ) ``` **Display/Reports** (optimize for quality): ```python config = ParserConfig() # Default settings use Rich # Or explicitly: config = ParserConfig.for_accuracy() ``` --- ## FAQ **Q: Can I mix Fast and Rich rendering?** A: Not per-table. The setting is document-wide via ParserConfig. However, you can manually call `table.render()` to get Rich output. **Q: Does this affect section extraction?** A: Indirectly, yes. Section detection calls `text()` on the entire document, which includes tables. Fast rendering speeds this up significantly. **Q: Will the output format change?** A: Yes, as we improve the renderer. We'll maintain backward compatibility via style options. **Q: Can I customize the appearance?** A: Currently limited to `TableStyle.pipe_table()` vs `TableStyle.minimal()`. More options coming. **Q: What about DataFrame export?** A: Fast rendering only affects text output. `table.to_dataframe()` is unaffected. --- ## Feedback The fast renderer is actively being improved based on user feedback. Known issues: 1. ❌ **Pipe characters** - visual noise 2. ❌ **Layout engine** - inconsistent spacing 3. ❌ **Padding** - needs tuning If you have specific rendering issues or suggestions, please provide: - Sample table HTML - Expected vs actual output - Use case description This helps prioritize improvements while maintaining the performance advantage. --- ## Summary ### Current State (As of 2025-10-08) **Performance**: ✅ Excellent (8-10x faster than Rich) **Correctness**: ✅ Production ready (proper colspan/rowspan handling) **Visual Quality**: ⚠️ Good (simple() style matches Rich's box.SIMPLE appearance) **Use Case**: Production-ready for all use cases ### Recent Milestones **✅ Completed**: - Core fast rendering implementation - TableStyle.simple() preset (borderless, clean) - Column filtering and merging - Numeric alignment detection - **Colspan/rowspan support via TableMatrix** - **Performance benchmarking with real tables** **🔧 Current Limitations**: - Multi-row header collapsing differs from Rich - Some visual polish differences (acceptable for speed gain) - Layout engine not as sophisticated as Rich ### Development Roadmap **Phase 1** (✅ COMPLETED): - ✅ Core fast rendering implementation - ✅ Simple() style matching Rich's box.SIMPLE - ✅ Proper colspan/rowspan handling via TableMatrix - ✅ Production-ready performance (8-10x faster) **Phase 2** (Future Enhancements): - 📋 Improve multi-row header handling - 📋 Better layout engine for perfect column widths - 📋 Additional style presets - 📋 Advanced header detection (data vs labels) ### Bottom Line Fast table rendering is **production-ready and now the default** for all table text extraction in EdgarTools. **Benefits**: - ✅ 7-10x faster than Rich rendering - ✅ Correct data extraction with proper colspan/rowspan handling - ✅ Multi-row header preservation - ✅ Multi-line cell rendering - ✅ Clean, borderless appearance (simple() style) **Minor differences from Rich**: - ⚠️ Some tables have extra spacing between currency symbols and values (e.g., table 22) - ⚠️ Column width calculation may differ slightly in complex tables - ✅ All data is preserved and correct - only visual presentation differs The implementation achieves **correct data extraction** with **significant performance gains** and **clean visual output**, making it the ideal default for EdgarTools. --- ## Related Documentation - [HTML Parser Status](HTML_PARSER_STATUS.md) - Overall parser progress - [Performance Analysis](../perf/hotpath_analysis.md) - Profiling results showing Rich rendering bottleneck - [Memory Analysis](../perf/memory_analysis.md) - Memory leak issues with Rich objects