Add comprehensive documentation for chart_generator.py and codebase analysis

kdusek
2025-12-09 12:17:54 +01:00
parent 8e654ed209
commit ce59173ecd

# Chart Generator Documentation
## Overview
The `chart_generator.py` script is a Python tool that generates financial charts for publicly traded companies from their SEC filings. It fetches quarterly (10-Q) and annual (10-K) reports and extracts key metrics (Revenue, Gross Profit, and Net Income). It then plots how these metrics trend over time, together with Gross and Net profit margins.
For companies that do not provide XBRL data, the script falls back to parsing the HTML of their 20-F filings (the annual report form used by foreign private issuers) with regex patterns.
Generated charts are saved as PNG files in the `charts/` directory. On Linux, they are also opened automatically in whichever supported image viewer is installed.
## Requirements
- Python 3.x
- Virtual environment activated (the script checks the `VIRTUAL_ENV` environment variable)
- Required Python packages: `pandas`, `matplotlib`, `edgartools`, `beautifulsoup4`
- An SEC identity (currently hardcoded as `"your.email@example.com"`; replace it with your own email address in the code)
- Local storage directory for caching filings (`./edgar_cache`)
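The virtual-environment requirement amounts to an environment-variable check. The sketch below assumes the script simply exits when `VIRTUAL_ENV` is unset. The helper name `require_virtualenv` and its message are hypothetical.

```python
import os
import sys


def require_virtualenv() -> None:
    """Exit early unless a virtual environment is active.

    Hypothetical helper: mirrors the documented check of VIRTUAL_ENV,
    which `source venv/bin/activate` sets in the current shell.
    """
    if not os.environ.get("VIRTUAL_ENV"):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("No virtual environment active. Run: source venv/bin/activate")
```

If this check runs before the third-party imports, a forgotten activation fails with a clear message. Without it, the first sign of the problem is a `ModuleNotFoundError` for `edgar` or `pandas`.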
## Installation
1. Ensure you have Python 3.x installed.
2. Create and activate a virtual environment:
   ```
   python3 -m venv venv
   source venv/bin/activate # On Windows: venv\Scripts\activate
   ```
3. Install dependencies:
   ```
   pip install pandas matplotlib edgartools beautifulsoup4
   ```
4. Update the SEC identity in the script (line 21) with your actual email address.
## Usage
Run the script from the command line, providing a stock ticker symbol as an argument:
```
python chart_generator.py <TICKER>
```
If no ticker is provided, the script will prompt for one.
Example:
```
python chart_generator.py AAPL
```
This will generate two charts:
- Quarterly chart: `charts/AAPL_chart.png` (last 20 quarters)
- Yearly chart: `charts/AAPL_yearly_chart.png` (last 5 years)
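Below is a minimal sketch of the argument handling and output naming described above. It assumes the ticker is read from `sys.argv[1]` and upper-cased. `get_ticker` and `chart_paths` are hypothetical helpers, not functions from the script.

```python
import os
from typing import Callable, Sequence, Tuple


def get_ticker(argv: Sequence[str], ask: Callable[[str], str] = input) -> str:
    """Return the ticker from the command line, or prompt for one if missing.

    Hypothetical helper; upper-casing the symbol is an assumption.
    """
    if len(argv) > 1:
        ticker = argv[1]
    else:
        ticker = ask("Enter a stock ticker: ")
    return ticker.strip().upper()


def chart_paths(ticker: str, out_dir: str = "charts") -> Tuple[str, str]:
    """Return the (quarterly, yearly) PNG paths using the documented names."""
    return (
        os.path.join(out_dir, f"{ticker}_chart.png"),
        os.path.join(out_dir, f"{ticker}_yearly_chart.png"),
    )
```

In an entry point this would read `ticker = get_ticker(sys.argv)`, followed by `quarterly_png, yearly_png = chart_paths(ticker)`. Passing the prompt function as the `ask` parameter lets you test this logic without a terminal.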
## How It Works
1. **Initialization**: Checks for an active virtual environment, sets the SEC identity, and enables local caching.
2. **Data Fetching**: Retrieves the 20 most recent 10-Q filings and the 5 most recent 10-K filings for the given ticker using the `edgartools` library.
3. **Data Extraction**:
   - Attempts to parse XBRL data from each filing to obtain structured financial data.
   - Looks for specific GAAP elements for Revenue, Gross Profit, and Net Income.
   - If XBRL is unavailable, falls back to HTML parsing for 20-F filings.
4. **Data Processing**: Organizes the extracted data into quarterly and yearly dictionaries, converts them to pandas Series, and calculates profit margins.
5. **Chart Generation**: Uses matplotlib to draw bar charts for the financial metrics and line plots for the margins.
6. **Output**: Saves the charts as PNG files and attempts to display them with a system image viewer.
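Step 4 computes each margin as a percentage of revenue. The script does this with pandas Series. The sketch below uses plain dictionaries so it runs without dependencies. The function name, the period-key format, and the choice to skip zero-revenue periods are assumptions.

```python
from typing import Dict, Optional

Period = str  # e.g. "2024-Q3" or "2024"; the key format is an assumption


def margins(
    revenue: Dict[Period, float],
    gross_profit: Dict[Period, float],
    net_income: Dict[Period, float],
) -> Dict[Period, Dict[str, Optional[float]]]:
    """Compute gross and net margins (%) for each period with nonzero revenue.

    margin = metric / revenue * 100. A margin is None when the metric is
    missing for that period; zero-revenue periods are skipped entirely.
    """
    result: Dict[Period, Dict[str, Optional[float]]] = {}
    for period in sorted(revenue):
        rev = revenue[period]
        if not rev:
            continue  # avoid division by zero
        gp = gross_profit.get(period)
        ni = net_income.get(period)
        result[period] = {
            "gross_margin": gp / rev * 100 if gp is not None else None,
            "net_margin": ni / rev * 100 if ni is not None else None,
        }
    return result
```

With pandas Series, the equivalent is `gross_profit / revenue * 100`. That expression aligns periods by index and yields NaN wherever a value is missing.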
## Functions
### `show_image(image_path)`
Attempts to display the generated chart using common Linux image viewers (eog, feh, gthumb, etc.). If no viewer is found, prints a message indicating the chart was saved but not displayed.
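One way to implement the viewer fallback is to find each viewer with `shutil.which` and launch it in the background with `subprocess.Popen`. Three details in this sketch are assumptions added for illustration: the extra `viewers` parameter, the `xdg-open` entry, and the boolean return value.

```python
import shutil
import subprocess
from typing import Sequence

# eog, feh and gthumb are named in this document; the rest of the script's
# list is not shown, so xdg-open is an assumed extra fallback.
DEFAULT_VIEWERS = ("eog", "feh", "gthumb", "xdg-open")


def show_image(image_path: str, viewers: Sequence[str] = DEFAULT_VIEWERS) -> bool:
    """Open image_path with the first installed viewer; return True on success."""
    for viewer in viewers:
        if shutil.which(viewer) is None:
            continue  # not on PATH, try the next viewer
        # Popen returns immediately, so the script does not wait for the
        # viewer window to be closed.
        subprocess.Popen(
            [viewer, image_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    print(f"Chart saved to {image_path}, but no image viewer was found to display it.")
    return False
```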
### `parse_20f_html(html_content, year)`
Parses HTML content from 20-F filings to extract Revenue, Gross Profit, and Net Income using regex patterns. Returns the extracted values or None if not found.
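The regex step can be sketched as follows. This version works on plain text rather than raw HTML, because the script uses BeautifulSoup for that part. The label patterns, the 40-character gap, and the `find_metrics` name are assumptions. The sketch returns the raw matched strings; `extract_number` handles turning them into numbers.

```python
import re
from typing import Dict, Optional

# Label patterns modeled on the phrases this document mentions (assumption).
LABELS = {
    "revenue": r"net revenues?",
    "gross_profit": r"gross profit",
    "net_income": r"net income",
}

# A number such as 1,234.5 or (1,234.5), optionally preceded by "$".
NUMBER = r"\(?\$?\s*\d[\d,]*(?:\.\d+)?\)?"


def find_metrics(text: str) -> Dict[str, Optional[str]]:
    """Return the first raw number that follows each label, or None.

    Hypothetical helper operating on text already extracted from the HTML.
    """
    found: Dict[str, Optional[str]] = {}
    for key, label in LABELS.items():
        # Allow up to 40 characters that are neither digits nor "(" (spaces,
        # colons, currency words such as "RMB") between label and number.
        pattern = label + r"[^\d(]{0,40}(" + NUMBER + ")"
        match = re.search(pattern, text, re.IGNORECASE)
        found[key] = match.group(1) if match else None
    return found
```

As the Limitations section notes, a pattern this loose can latch onto the wrong number. For example, it may pick up a year in a phrase like "net income for 2023".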
### `extract_number(text)`
Helper function to extract and clean numerical values from text, handling commas, parentheses (for negatives), and converting to float.
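The following sketch shows the kind of cleanup this helper performs. It assumes that accounting-style parentheses and a leading minus sign both mark negatives. It also assumes that text without digits yields `None`; the script's exact behavior in that case is not documented here.

```python
import re
from typing import Optional


def extract_number(text: str) -> Optional[float]:
    """Convert strings like "1,234", "$ 12.5" or "(789)" to a float.

    Parentheses around the value, or a leading "-", mean a negative number.
    Returns None when the text contains no digits.
    """
    cleaned = text.strip()
    negative = (cleaned.startswith("(") and cleaned.endswith(")")) or cleaned.startswith("-")
    match = re.search(r"\d[\d,]*(?:\.\d+)?", cleaned)
    if match is None:
        return None
    value = float(match.group(0).replace(",", ""))  # drop thousands separators
    return -value if negative else value
```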
### `generate_charts(ticker)`
Main function that orchestrates the entire process:
- Fetches company filings
- Extracts financial data from XBRL or HTML
- Processes and organizes data
- Generates and saves quarterly and yearly charts
## Data Sources and Elements
The script prioritizes XBRL data and looks for these GAAP elements:
- **Revenue**: `us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax`, `us-gaap:Revenues`, `us-gaap:SalesRevenueNet`
- **Gross Profit**: `us-gaap:GrossProfit`, `us-gaap:GrossMargin` (or calculated as Revenue minus cost of goods sold, COGS)
- **Net Income**: `us-gaap:NetIncomeLoss`
For HTML parsing of 20-F filings, it uses regular expressions to find labels such as "net revenue", "gross profit", and "net income" followed by a number.
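The element lookup can be pictured as a priority list per metric, with a computed fallback for Gross Profit. This sketch simplifies one period's facts to a flat `dict`; real XBRL facts also carry contexts and periods. The COGS element `us-gaap:CostOfRevenue` is an assumption, because this document does not name the one the script uses.

```python
from typing import Dict, Optional, Sequence

# Candidate elements per metric, taken from the list above.
REVENUE_TAGS = (
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
    "us-gaap:Revenues",
    "us-gaap:SalesRevenueNet",
)
GROSS_PROFIT_TAGS = ("us-gaap:GrossProfit", "us-gaap:GrossMargin")
NET_INCOME_TAGS = ("us-gaap:NetIncomeLoss",)
COGS_TAGS = ("us-gaap:CostOfRevenue",)  # assumption: not named in this document


def first_fact(facts: Dict[str, float], tags: Sequence[str]) -> Optional[float]:
    """Return the value of the first tag present in facts, else None."""
    for tag in tags:
        if tag in facts:
            return facts[tag]
    return None


def key_metrics(facts: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Pick Revenue, Gross Profit and Net Income from one period's facts."""
    revenue = first_fact(facts, REVENUE_TAGS)
    gross = first_fact(facts, GROSS_PROFIT_TAGS)
    if gross is None and revenue is not None:
        cogs = first_fact(facts, COGS_TAGS)
        if cogs is not None:
            gross = revenue - cogs  # fallback: Revenue - COGS
    return {
        "revenue": revenue,
        "gross_profit": gross,
        "net_income": first_fact(facts, NET_INCOME_TAGS),
    }
```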
## Chart Features
- **Quarterly Chart**: Shows the last 20 quarters with bars for Revenue, Gross Profit, and Net Income (in billions USD) and lines for Gross and Net Margins (%).
- **Yearly Chart**: Shows the last 5 years with the same metrics.
- Value labels on bars for easy reading.
- Dual y-axes: left for monetary values, right for percentages.
- Grid lines and legends for clarity.
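Before plotting, each series is trimmed to the most recent periods and scaled to billions. Here is a dependency-free sketch of that preparation step. The `bar_data` name, the string-sortable period keys, and the one-decimal label format are assumptions.

```python
from typing import Dict, List, Tuple


def bar_data(values: Dict[str, float], last_n: int) -> List[Tuple[str, float, str]]:
    """Return (period, value in billions, label) for the most recent periods.

    Assumes period keys sort chronologically as strings (e.g. "2024-Q1").
    """
    recent = sorted(values)[-last_n:]  # keep only the newest last_n periods
    rows = []
    for period in recent:
        billions = values[period] / 1e9
        rows.append((period, billions, f"{billions:.1f}B"))
    return rows
```

In matplotlib, you would typically attach these labels with `ax.bar_label(bars, labels=...)` and create the right-hand percentage axis for the margin lines with `ax.twinx()`.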
## Error Handling
- Checks that the company exists and has filings.
- Handles missing XBRL data by falling back to HTML parsing.
- Continues processing even if individual filings fail.
- Prints progress and error messages to console.
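Continuing past a failed filing follows the usual per-item try/except pattern. In this generic sketch, `process_all` and the `extract` callback are hypothetical stand-ins for the script's per-filing extraction.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

F = TypeVar("F")
R = TypeVar("R")


def process_all(
    filings: Iterable[F], extract: Callable[[F], R]
) -> Tuple[List[R], List[Tuple[F, Exception]]]:
    """Run extract() on every filing, recording failures instead of stopping."""
    results: List[R] = []
    failures: List[Tuple[F, Exception]] = []
    for filing in filings:
        try:
            results.append(extract(filing))
        except Exception as exc:  # one bad filing should not abort the run
            print(f"Skipping {filing}: {exc}")
            failures.append((filing, exc))
    return results, failures
```

The sketch catches the broad `Exception` on purpose. A malformed filing can raise many different error types, and the goal is to log the failure and move on.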
## Limitations
- Requires SEC filings to be available and accessible.
- HTML parsing is less reliable than XBRL and may miss data.
- Assumes amounts are reported in USD under standard US GAAP.
- Chart display depends on having image viewers installed on Linux.
## Dependencies
- `os`, `sys`, `subprocess`: Standard library for system operations.
- `matplotlib.pyplot`: For chart generation.
- `pandas`: For data manipulation and Series operations.
- `edgar.Company`, `edgar.set_identity`, etc.: From edgartools library for SEC data access.
- `BeautifulSoup`: For HTML parsing.
- `re`: For regex pattern matching.
## Codebase Overview
This codebase is a suite of Python scripts for analyzing SEC filings and financial data, built around the `edgartools` library. It focuses on retrieving, processing, and visualizing financial information from public companies.
### Core Scripts
- **`chart_generator.py`**: Generates financial charts (Revenue, Gross Profit, Net Income, margins) from 10-Q and 10-K filings.
- **`fetch_today_filings.py`**: Fetches and displays the latest SEC filings, with interactive filtering by form type and ownership.
- **`get_company_details.py`**: Retrieves and displays company information (name, CIK, addresses, industry) for a given ticker.
- **`daily_insiders.py`**: Extracts insider transactions from recent Form 4 filings.
- **`investor_holdings.py`**: Analyzes 13F holdings for institutional investors and insider transactions, with caching and interactive selection.
- **`parse_duggan_filing.py`**: Parses a specific Form 4 HTML file for transaction details.
- **`test_cache.py`**: Tests the caching functionality for 13F filings.
### Configuration and Support Files
- **`investors.json`**: JSON configuration file containing a list of notable investors (Warren Buffett, Cathie Wood, etc.) with their tickers and types (13F or insider).
- **`AGENTS.md`**: Instructions for AI agents, including edgartools API documentation link and reminder to use `pyright` for code checking.
- **`run.sh`**: Shell script that activates the virtual environment and runs `investor_holdings.py`.
### Directories
- **`charts/`**: Stores generated PNG chart files from `chart_generator.py`.
- **`cache/`**: Caches processed data (holdings, transactions) to avoid redundant API calls.
- **`edgar_cache/`**: Local storage for edgartools library caching of SEC filings.
- **`venv/`**: Python virtual environment with required dependencies.
### Common Patterns
- All scripts check for virtual environment activation.
- Use `edgartools` for SEC data access with identity setting.
- Implement local caching to improve performance.
- Handle errors gracefully and provide informative output.
- Many scripts support command-line arguments and interactive modes.
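The local-caching pattern shared by the scripts can be sketched as a JSON read-through cache. The file layout under `cache/` is not documented, so the `cached` helper and its path argument are assumptions. `edgar_cache/` is managed by edgartools itself and is not involved here.

```python
import json
import os
from typing import Any, Callable


def cached(path: str, compute: Callable[[], Any]) -> Any:
    """Return JSON data from path if present; otherwise compute, store, return it."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)  # cache hit: skip the expensive call
    data = compute()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data
```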
## Troubleshooting
- If charts don't display, ensure an image viewer is installed (e.g., `sudo apt install eog`).
- If data is missing, check that the company's filings are available on SEC EDGAR, or try a different ticker.
- Run `pyright chart_generator.py` to check for type errors.
- Ensure virtual environment is activated before running.
## API Reference
For more details on edgartools usage, see: https://edgartools.readthedocs.io/en/stable/api/company/