Chart Generator Documentation

Overview

The chart_generator.py script generates financial charts for publicly traded companies from SEC filings. It fetches quarterly (10-Q) and annual (10-K) reports, extracts Revenue, Gross Profit, and Net Income, and plots their trends over time together with the gross and net profit margins.

For companies whose filings do not include XBRL data, the script falls back to parsing the HTML of 20-F filings (the annual report form used by foreign private issuers) with regex patterns.

Generated charts are saved as PNG files in the charts/ directory and, on Linux, opened automatically in whichever supported image viewer is installed.

Requirements

  • Python 3.x
  • An activated virtual environment (the script checks for the VIRTUAL_ENV environment variable)
  • Required Python packages: pandas, matplotlib, edgartools, beautifulsoup4
  • An SEC identity (currently hardcoded as "your.email@example.com"; replace it with your own email address in the code)
  • Local storage directory for caching filings (./edgar_cache)
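
The virtual environment requirement can be checked with the standard library alone. The following is a minimal sketch of such a check, not the script's actual code; the function names in_virtualenv and require_virtualenv and the exit message are hypothetical.

```python
import os
import sys


def in_virtualenv(environ=None):
    """Return True when VIRTUAL_ENV is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return bool(env.get("VIRTUAL_ENV"))


def require_virtualenv(environ=None):
    """Exit with a hint if no virtual environment appears to be active."""
    if not in_virtualenv(environ):
        # Hypothetical message; the real script's wording is not documented.
        sys.exit("No virtual environment detected. Run: source venv/bin/activate")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.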

Installation

  1. Ensure you have Python 3.x installed.
  2. Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:
    pip install pandas matplotlib edgartools beautifulsoup4
    
  4. Update the SEC identity in the script (line 21) with your actual email address.

Usage

Run the script from the command line, providing a stock ticker symbol as an argument:

python chart_generator.py <TICKER>

If no ticker is provided, the script will prompt for one.

Example:

python chart_generator.py AAPL

This will generate two charts:

  • Quarterly chart: charts/AAPL_chart.png (last 20 quarters)
  • Yearly chart: charts/AAPL_yearly_chart.png (last 5 years)
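
The argument handling and output naming described above could look roughly like this. It is a sketch under assumptions: the helper names resolve_ticker and chart_paths are hypothetical, and upper-casing the ticker is a guess based on the AAPL example rather than documented behavior.

```python
from pathlib import Path


def resolve_ticker(argv, prompt=input):
    """Return the ticker from argv[1], or ask for one when it is missing."""
    if len(argv) > 1 and argv[1].strip():
        raw = argv[1]
    else:
        raw = prompt("Enter ticker symbol: ")
    # Assumption: tickers are normalized to upper case.
    return raw.strip().upper()


def chart_paths(ticker, out_dir="charts"):
    """Return the (quarterly, yearly) PNG paths for a ticker."""
    base = Path(out_dir)
    return base / f"{ticker}_chart.png", base / f"{ticker}_yearly_chart.png"
```

In a script, resolve_ticker(sys.argv) would cover both the command-line and the interactive case.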

How It Works

  1. Initialization: Checks for virtual environment, sets SEC identity, and enables local caching.
  2. Data Fetching: Retrieves the last 20 10-Q filings and last 5 10-K filings for the given ticker using the edgartools library.
  3. Data Extraction:
    • Attempts to parse XBRL data from filings for structured financial data.
    • Looks for specific GAAP elements: Revenue, Gross Profit, Net Income.
    • If XBRL is unavailable, falls back to HTML parsing for 20-F filings.
  4. Data Processing: Organizes extracted data into quarterly and yearly dictionaries, converts to pandas Series, and calculates profit margins.
  5. Chart Generation: Uses matplotlib to create bar charts for financial metrics and line plots for margins.
  6. Output: Saves charts as PNG files and attempts to display them using system image viewers.
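
Step 4 can be illustrated with a few lines of pandas. This is a sketch, not the script's implementation: the helper name margin_pct, the period labels, and the figures are made up for illustration, and turning zero revenue into NaN is an assumption about how division by zero might be avoided.

```python
import pandas as pd


def margin_pct(numerator, revenue):
    """Return numerator / revenue as a percentage, NaN where revenue is zero."""
    safe_revenue = revenue.where(revenue != 0)  # zero revenue becomes NaN
    return (numerator / safe_revenue * 100).round(2)


# Illustrative figures only, keyed by period label.
revenue = pd.Series({"2023-Q1": 100.0, "2023-Q2": 120.0, "2023-Q3": 0.0})
gross_profit = pd.Series({"2023-Q1": 40.0, "2023-Q2": 60.0, "2023-Q3": 5.0})
net_income = pd.Series({"2023-Q1": 10.0, "2023-Q2": -6.0, "2023-Q3": 1.0})

gross_margin = margin_pct(gross_profit, revenue)
net_margin = margin_pct(net_income, revenue)
```

Because the Series share the same index, pandas aligns the periods automatically, so a period missing from one Series yields NaN rather than a misaligned ratio.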

Functions

show_image(image_path)

Attempts to display the generated chart using common Linux image viewers (eog, feh, gthumb, etc.). If no viewer is found, prints a message indicating the chart was saved but not displayed.
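
A viewer lookup of this kind can be sketched with shutil.which and subprocess. The version below is an approximation, not the actual function: the viewer order, the return value, and the wording of the message are assumptions.

```python
import shutil
import subprocess

# eog, feh, and gthumb are named above; the order is an assumption.
VIEWERS = ("eog", "feh", "gthumb")


def show_image(image_path, viewers=VIEWERS):
    """Open image_path in the first viewer found on PATH; return its name or None."""
    for viewer in viewers:
        executable = shutil.which(viewer)
        if executable is None:
            continue
        # Popen returns immediately, so the script does not wait for the viewer.
        subprocess.Popen(
            [executable, image_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return viewer
    print(f"Chart saved to {image_path}, but no image viewer was found.")
    return None
```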

parse_20f_html(html_content, year)

Parses HTML content from 20-F filings to extract Revenue, Gross Profit, and Net Income using regex patterns. Returns the extracted values or None if not found.

extract_number(text)

Helper that extracts a numeric value from text: it strips thousands separators (commas), treats values in parentheses as negative, and returns the result as a float.
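
One way such a helper might be written is shown below. It is a sketch: the regular expression and the choice to return None when no number is found are assumptions, not the script's documented behavior.

```python
import re

# Either "(1,234.5)" for a negative value, or a plain "1,234.5" / "-42".
_NUMBER = re.compile(
    r"\(\s*(?P<neg>\d[\d,]*(?:\.\d+)?)\s*\)|(?P<pos>-?\d[\d,]*(?:\.\d+)?)"
)


def extract_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    if match.group("neg") is not None:
        # Accounting notation: parentheses mark a negative value.
        return -float(match.group("neg").replace(",", ""))
    return float(match.group("pos").replace(",", ""))
```

For example, extract_number("(1,234.5)") yields -1234.5 and extract_number("$ 3,500") yields 3500.0.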

generate_charts(ticker)

Main function that orchestrates the entire process:

  • Fetches company filings
  • Extracts financial data from XBRL or HTML
  • Processes and organizes data
  • Generates and saves quarterly and yearly charts

Data Sources and Elements

The script prioritizes XBRL data and looks for these GAAP elements:

  • Revenue: us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax, us-gaap:Revenues, us-gaap:SalesRevenueNet
  • Gross Profit: us-gaap:GrossProfit, us-gaap:GrossMargin (or calculated as Revenue - COGS)
  • Net Income: us-gaap:NetIncomeLoss
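
The priority order above amounts to a first-match lookup with a computed fallback for Gross Profit. The sketch below assumes the facts for one period are already available as a plain dict mapping element name to value; how edgartools exposes them is not shown here. The COGS element names are also an assumption, since the list above only says "COGS".

```python
REVENUE_CONCEPTS = (
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:Revenues",
    "us-gaap:SalesRevenueNet",
)
GROSS_PROFIT_CONCEPTS = ("us-gaap:GrossProfit", "us-gaap:GrossMargin")
NET_INCOME_CONCEPTS = ("us-gaap:NetIncomeLoss",)
# Assumption: these two elements stand in for "COGS".
COGS_CONCEPTS = ("us-gaap:CostOfGoodsAndServicesSold", "us-gaap:CostOfRevenue")


def first_available(facts, concepts):
    """Return the value of the first concept present in facts, else None."""
    for concept in concepts:
        value = facts.get(concept)
        if value is not None:
            return value
    return None


def extract_metrics(facts):
    """Pick Revenue, Gross Profit, and Net Income for one reporting period."""
    revenue = first_available(facts, REVENUE_CONCEPTS)
    gross_profit = first_available(facts, GROSS_PROFIT_CONCEPTS)
    if gross_profit is None and revenue is not None:
        cogs = first_available(facts, COGS_CONCEPTS)
        if cogs is not None:
            gross_profit = revenue - cogs  # fallback: Revenue - COGS
    return {
        "Revenue": revenue,
        "Gross Profit": gross_profit,
        "Net Income": first_available(facts, NET_INCOME_CONCEPTS),
    }
```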

For HTML parsing (20-F filings), it uses regex to find patterns like "net revenue", "gross profit", "net income" followed by numerical values.
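
A label-then-number search of this kind might be sketched as follows. The regexes, the 40-character gap limit, and the function name sketch_parse_20f_html are assumptions; the sketch also strips tags with a crude regex to stay within the standard library, whereas the script itself uses BeautifulSoup, and it omits the year parameter because its role is not described.

```python
import re

# Label patterns modeled on the phrases quoted above; the exact regexes are assumptions.
LABEL_PATTERNS = {
    "Revenue": r"net\s+revenues?",
    "Gross Profit": r"gross\s+profit",
    "Net Income": r"net\s+income",
}
# Accounting-style number: "(78)" for negatives, otherwise "1,234.5" or "-56".
NUMBER = r"(\(\s*\d[\d,]*(?:\.\d+)?\s*\)|-?\d[\d,]*(?:\.\d+)?)"


def sketch_parse_20f_html(html_content):
    """Return {metric: float or None} using the first match for each label."""
    text = re.sub(r"<[^>]+>", " ", html_content)  # crude tag removal
    found = {}
    for metric, label in LABEL_PATTERNS.items():
        # Allow up to 40 non-numeric filler characters between label and value.
        match = re.search(label + r"[^\d(\-]{0,40}" + NUMBER, text, re.IGNORECASE)
        if match is None:
            found[metric] = None
            continue
        raw = match.group(1)
        value = float(re.sub(r"[^\d.]", "", raw))
        found[metric] = -value if raw[0] in "(-" else value
    return found
```

A search like this takes the first number after each label, so it can pick up the wrong column or a per-share figure, which is one reason the HTML path is less reliable than XBRL.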

Chart Features

  • Quarterly Chart: Shows the last 20 quarters, with bars for Revenue, Gross Profit, and Net Income (in billions USD) and lines for Gross and Net Margins (%).
  • Yearly Chart: Shows the last 5 years with the same bars and margin lines.
  • Value labels on bars for easy reading.
  • Dual y-axes: left for monetary values, right for percentages.
  • Grid lines and legends for clarity.
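
The layout described above (grouped bars on the left axis, margin lines on a second axis) can be sketched with matplotlib's twinx. The function name plot_metrics, the colors, figure size, and sample figures are illustrative assumptions; bar_label requires matplotlib 3.4 or newer, and the sketch assumes revenue is never zero.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend, so saving works without a display
import matplotlib.pyplot as plt


def plot_metrics(periods, revenue, gross_profit, net_income, out_path):
    """Bars in billions USD on the left axis, margin lines (%) on the right."""
    x = list(range(len(periods)))
    width = 0.25
    fig, ax_usd = plt.subplots(figsize=(10, 5))
    series = [("Revenue", revenue), ("Gross Profit", gross_profit),
              ("Net Income", net_income)]
    for offset, (name, values) in enumerate(series):
        bars = ax_usd.bar([i + (offset - 1) * width for i in x],
                          [v / 1e9 for v in values], width, label=name)
        ax_usd.bar_label(bars, fmt="%.1f", fontsize=7)  # value labels on bars
    ax_usd.set_xticks(x)
    ax_usd.set_xticklabels(periods, rotation=45, ha="right")
    ax_usd.set_ylabel("Billions USD")
    ax_usd.grid(axis="y", alpha=0.3)

    ax_pct = ax_usd.twinx()  # second y-axis sharing the same x-axis
    ax_pct.plot(x, [g / r * 100 for g, r in zip(gross_profit, revenue)],
                marker="o", color="tab:purple", label="Gross Margin (%)")
    ax_pct.plot(x, [n / r * 100 for n, r in zip(net_income, revenue)],
                marker="s", color="tab:red", label="Net Margin (%)")
    ax_pct.set_ylabel("Margin (%)")

    # Merge the legend entries of both axes into a single legend.
    h1, l1 = ax_usd.get_legend_handles_labels()
    h2, l2 = ax_pct.get_legend_handles_labels()
    ax_usd.legend(h1 + h2, l1 + l2, loc="upper left")

    fig.tight_layout()
    fig.savefig(out_path, dpi=100)
    plt.close(fig)
    return out_path


# Illustrative data only.
out_file = plot_metrics(
    ["2023-Q1", "2023-Q2"], [100e9, 120e9], [40e9, 60e9], [10e9, -6e9],
    os.path.join(tempfile.mkdtemp(), "DEMO_chart.png"),
)
```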

Error Handling

  • Checks if company exists and has filings.
  • Handles missing XBRL data by falling back to HTML parsing.
  • Continues processing even if individual filings fail.
  • Prints progress and error messages to console.
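
The "continue on failure" behavior is typically a try/except inside the per-filing loop. The sketch below shows the pattern in isolation; collect_metrics and its extract callback are hypothetical names, and the messages are placeholders.

```python
def collect_metrics(filings, extract):
    """Apply extract to each filing, reporting and skipping the ones that fail."""
    results, failures = [], []
    for index, filing in enumerate(filings, start=1):
        try:
            results.append(extract(filing))
        except Exception as exc:  # broad on purpose: one bad filing must not stop the run
            failures.append((index, exc))
            print(f"Skipping filing {index}: {exc}")
    print(f"Processed {len(results)} of {len(results) + len(failures)} filings")
    return results, failures
```

Returning the failures alongside the results lets the caller decide whether enough data remains to draw a meaningful chart.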

Limitations

  • Requires SEC filings to be available and accessible.
  • HTML parsing is less reliable than XBRL and may miss data.
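  • Assumes amounts are reported in USD under US GAAP; filers that report in another currency or under a different standard (as many 20-F filers do) may be charted with misleading units.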
  • Chart display depends on having image viewers installed on Linux.

Dependencies

  • os, sys, subprocess: Standard library for system operations.
  • matplotlib.pyplot: For chart generation.
  • pandas: For data manipulation and Series operations.
  • edgar.Company, edgar.set_identity, etc.: From edgartools library for SEC data access.
  • BeautifulSoup (from the beautifulsoup4 package): For HTML parsing.
  • re: For regex pattern matching.

Codebase Overview

This codebase is a suite of Python scripts for analyzing SEC filings and financial data, built around the edgartools library. It focuses on retrieving, processing, and visualizing financial information from public companies.

Core Scripts

  • chart_generator.py: Generates financial charts (Revenue, Gross Profit, Net Income, margins) from 10-Q and 10-K filings.
  • fetch_today_filings.py: Fetches and displays latest SEC filings with interactive filtering by form type and ownership.
  • get_company_details.py: Retrieves and displays company information (name, CIK, addresses, industry) for a given ticker.
  • daily_insiders.py: Extracts insider transactions from recent Form 4 filings.
  • investor_holdings.py: Analyzes 13F holdings for institutional investors and insider transactions, with caching and interactive selection.
  • parse_duggan_filing.py: Parses a specific Form 4 HTML file for transaction details.
  • test_cache.py: Tests the caching functionality for 13F filings.

Configuration and Support Files

  • investors.json: JSON configuration file containing a list of notable investors (Warren Buffett, Cathie Wood, etc.) with their tickers and types (13F or insider).
  • AGENTS.md: Instructions for AI agents, including edgartools API documentation link and reminder to use pyright for code checking.
  • run.sh: Shell script that activates the virtual environment and runs investor_holdings.py.

Directories

  • charts/: Stores generated PNG chart files from chart_generator.py.
  • cache/: Caches processed data (holdings, transactions) to avoid redundant API calls.
  • edgar_cache/: Local storage for edgartools library caching of SEC filings.
  • venv/: Python virtual environment with required dependencies.

Common Patterns

  • All scripts check that a virtual environment is activated.
  • All scripts access SEC data through edgartools after setting an identity.
  • Results are cached locally to avoid repeated downloads.
  • Errors are handled gracefully, with informative console output.
  • Many scripts support both command-line arguments and an interactive mode.

Troubleshooting

  • If charts don't display, ensure an image viewer is installed (e.g., sudo apt install eog).
  • For data issues, check SEC filings availability or try different tickers.
  • Run pyright chart_generator.py to check for type errors.
  • Ensure virtual environment is activated before running.

API Reference

For more details on edgartools usage, see: https://edgartools.readthedocs.io/en/stable/api/company/