Statement Class Documentation
Overview
The Statement class represents a single financial statement extracted from XBRL data. It provides methods for viewing, manipulating, and analyzing financial statement data including income statements, balance sheets, cash flow statements, and disclosure notes.
A Statement object contains:
- Line items with values across multiple periods
- Hierarchy showing the structure and relationships
- Metadata including concept names and labels
- Period information for time-series analysis
Getting a Statement
From XBRL
# Get XBRL data first
xbrl = filing.xbrl()
# Access specific statements
income = xbrl.statements.income_statement()
balance = xbrl.statements.balance_sheet()
cashflow = xbrl.statements.cash_flow_statement()
equity = xbrl.statements.statement_of_equity()
# By name
cover_page = xbrl.statements['CoverPage']
# By index
first_statement = xbrl.statements[0]
Viewing Statements
Rich Display
# Print statement to see formatted table
print(income)
# Shows:
# - Statement title
# - Line items with hierarchical structure
# - Values for multiple periods
# - Proper number formatting
Text Representation
# Get plain text version
text = str(income)
# Or explicitly
text_output = income.text()
Converting to DataFrame
Basic Conversion
# Convert statement to pandas DataFrame
df = income.to_dataframe()
# DataFrame structure:
# - Index: Line item labels or concepts
# - Columns: Period dates
# - Values: Financial amounts
With Period Filter
# Filter to specific periods
df = income.to_dataframe(period_filter='2024')
# Only includes periods matching the filter
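If you already hold the unfiltered DataFrame, a similar narrowing can be done in pandas. The sketch below assumes the column labels are period date strings in YYYY-MM-DD form, as described in this guide. `columns_for_year` is a hypothetical helper, not part of edgartools, and the sample figures are made up.

```python
import pandas as pd

def columns_for_year(df: pd.DataFrame, year: str) -> pd.DataFrame:
    """Keep only the period columns whose label starts with the given year."""
    keep = [col for col in df.columns if str(col).startswith(year)]
    return df[keep]

# Made-up sample shaped like the DataFrame described above
sample = pd.DataFrame(
    {"2024-12-31": [100.0, 20.0], "2023-12-31": [90.0, 15.0]},
    index=["Revenue", "Net Income"],
)

df_2024 = columns_for_year(sample, "2024")
print(df_2024)
```

This does not claim to reproduce the exact matching rules of `period_filter`; it only selects columns by a year prefix.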
Accessing Specific Data
# Convert to DataFrame for easy analysis
df = income.to_dataframe()
# Access specific line items
revenue = df.loc['Revenue']
net_income = df.loc['Net Income']
# Access specific periods
current_period = df.iloc[:, 0] # First column (most recent)
prior_period = df.iloc[:, 1] # Second column
# Specific cell
current_revenue = df.loc['Revenue', df.columns[0]]
Statement Properties
Available Periods
# Get list of periods in the statement
periods = statement.periods
# Each period is a date string (YYYY-MM-DD)
for period in periods:
    print(f"Data available for: {period}")
Statement Name and Type
# Get statement information
name = statement.name # Statement display name
concept = statement.concept # XBRL concept identifier
Raw Data Access
# Get underlying statement data structure
raw_data = statement.get_raw_data()
# Returns list of dictionaries with:
# - concept: XBRL concept name
# - label: Display label
# - values: Dict of period -> value
# - level: Hierarchy depth
# - all_names: All concept variations
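To make the raw structure concrete, here is a minimal sketch that walks a list shaped like the one described above and prints an indented outline for one period. The sample records are made up, and `format_outline` is a hypothetical helper, not an edgartools function.

```python
# Made-up sample in the shape described above (not real filing data)
raw_data = [
    {"concept": "us-gaap_Revenues", "label": "Revenue",
     "values": {"2024-12-31": 100.0, "2023-12-31": 90.0}, "level": 0},
    {"concept": "us-gaap_CostOfRevenue", "label": "Cost of Revenue",
     "values": {"2024-12-31": 60.0}, "level": 1},
]

def format_outline(items, period):
    """Return one indented line per item for a single period."""
    lines = []
    for item in items:
        indent = "  " * item.get("level", 0)
        value = item.get("values", {}).get(period)
        # Items with no value for this period are shown as n/a
        shown = "n/a" if value is None else f"{value:,.0f}"
        lines.append(f"{indent}{item['label']}: {shown}")
    return lines

outline = format_outline(raw_data, "2023-12-31")
print("\n".join(outline))
```

Using `.get()` for `level` and `values` keeps the walk tolerant of items that lack a value for the requested period.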
Rendering and Display
Custom Rendering
# Render with specific options
rendered = statement.render()
# Rendered statement has rich formatting
print(rendered)
Text Export
# Get markdown-formatted text
markdown_text = statement.text()
# Suitable for:
# - AI/LLM consumption
# - Documentation
# - Text-based analysis
Working with Statement Data
Calculate Growth Rates
import pandas as pd
# Convert to DataFrame
df = income.to_dataframe()
# Calculate period-over-period growth
if len(df.columns) >= 2:
    current = df.iloc[:, 0]
    prior = df.iloc[:, 1]
    # Growth rate
    growth = ((current - prior) / prior * 100).round(2)
    # Create comparison DataFrame
    comparison = pd.DataFrame({
        'Current': current,
        'Prior': prior,
        'Growth %': growth
    })
    print(comparison)
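A prior-period value of zero turns the growth formula into a division by zero, which pandas reports as `inf`. The sketch below shows one way to guard against that; `growth_percent` is a hypothetical helper and the figures are made up.

```python
import numpy as np
import pandas as pd

def growth_percent(current: pd.Series, prior: pd.Series) -> pd.Series:
    """Percentage change, with NaN where the prior value is zero or missing."""
    # Replace zeros so the division yields NaN instead of inf
    safe_prior = prior.replace(0, np.nan)
    return ((current - prior) / safe_prior * 100).round(2)

# Made-up sample values for two periods
current = pd.Series([110.0, 5.0], index=["Revenue", "Other Income"])
prior = pd.Series([100.0, 0.0], index=["Revenue", "Other Income"])

growth = growth_percent(current, prior)
print(growth)
```

Returning NaN keeps undefined growth rates out of later aggregates such as means or rankings.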
Extract Specific Metrics
# Get income statement metrics
df = income.to_dataframe()
# Extract key metrics from most recent period
current = df.iloc[:, 0]
metrics = {
    'Revenue': current.get('Revenue', 0),
    'Operating Income': current.get('Operating Income', 0),
    'Net Income': current.get('Net Income', 0),
}
# Calculate derived metrics
if metrics['Revenue'] > 0:
    metrics['Operating Margin'] = (
        metrics['Operating Income'] / metrics['Revenue'] * 100
    )
    metrics['Net Margin'] = (
        metrics['Net Income'] / metrics['Revenue'] * 100
    )
Filter Line Items
# Convert to DataFrame
df = balance.to_dataframe()
# Filter for specific items
asset_items = df[df.index.str.contains('Asset', case=False)]
liability_items = df[df.index.str.contains('Liabilit', case=False)]
# Get subtotals
if 'Current Assets' in df.index:
    current_assets = df.loc['Current Assets']
Time Series Analysis
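import matplotlib.pyplot as plt
import pandas as pd
# Get multiple periods
df = income.to_dataframe()
# Plot revenue trend
if 'Revenue' in df.index:
    # Convert to numeric; non-numeric entries become NaN
    revenue_series = pd.to_numeric(df.loc['Revenue'], errors='coerce')
    # Columns are most recent first, so reverse for a chronological plot
    revenue_series.iloc[::-1].plot(kind='line', title='Revenue Trend')
    plt.show()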
Common Workflows
Compare Current vs Prior Period
import pandas as pd
# Get income statement
income = xbrl.statements.income_statement()
df = income.to_dataframe()
# Ensure we have at least 2 periods
if len(df.columns) >= 2:
    # Create comparison
    comparison = pd.DataFrame({
        'Current': df.iloc[:, 0],
        'Prior': df.iloc[:, 1],
        'Change': df.iloc[:, 0] - df.iloc[:, 1],
        'Change %': ((df.iloc[:, 0] - df.iloc[:, 1]) / df.iloc[:, 1] * 100).round(2)
    })
    # Show key metrics
    key_items = ['Revenue', 'Operating Income', 'Net Income']
    for item in key_items:
        if item in comparison.index:
            print(f"\n{item}:")
            print(comparison.loc[item])
Extract All Periods to CSV
# Get statement
statement = xbrl.statements.income_statement()
# Convert and save
df = statement.to_dataframe()
df.to_csv('income_statement.csv')
print(f"Exported {len(df)} line items across {len(df.columns)} periods")
Build Financial Ratios
# Get both income statement and balance sheet
income = xbrl.statements.income_statement()
balance = xbrl.statements.balance_sheet()
# Convert to DataFrames
income_df = income.to_dataframe()
balance_df = balance.to_dataframe()
# Extract values (most recent period)
revenue = income_df.loc['Revenue', income_df.columns[0]]
net_income = income_df.loc['Net Income', income_df.columns[0]]
total_assets = balance_df.loc['Assets', balance_df.columns[0]]
total_equity = balance_df.loc['Equity', balance_df.columns[0]]
# Calculate ratios
# round() works for both NumPy scalars and plain Python floats
ratios = {
    'Net Profit Margin': round(net_income / revenue * 100, 2),
    'ROA': round(net_income / total_assets * 100, 2),
    'ROE': round(net_income / total_equity * 100, 2),
    'Asset Turnover': round(revenue / total_assets, 2),
}
print("Financial Ratios:")
for ratio, value in ratios.items():
    print(f"  {ratio}: {value}")
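The ratio calculations above fail or produce `inf` when a denominator is zero or missing, for example when equity is zero. A minimal sketch of a guarded division follows; `safe_ratio` is a hypothetical helper and the figures are made up.

```python
def safe_ratio(numerator, denominator, scale=1.0, digits=2):
    """Return numerator / denominator * scale, or None if it cannot be computed."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return round(numerator / denominator * scale, digits)

# Made-up figures; total_equity is zero to show the guarded case
net_income, revenue, total_assets, total_equity = 20.0, 100.0, 400.0, 0.0

ratios = {
    "Net Profit Margin": safe_ratio(net_income, revenue, scale=100),
    "ROA": safe_ratio(net_income, total_assets, scale=100),
    "ROE": safe_ratio(net_income, total_equity, scale=100),
    "Asset Turnover": safe_ratio(revenue, total_assets),
}
print(ratios)
```

Returning `None` rather than raising lets a report print "not available" for a single ratio without abandoning the rest.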
Search for Specific Items
# Get statement as DataFrame
df = income.to_dataframe()
# Search for items containing keywords
research_costs = df[df.index.str.contains('Research', case=False)]
tax_items = df[df.index.str.contains('Tax', case=False)]
# Or get raw data with concept names
raw = income.get_raw_data()
research_concepts = [
    item for item in raw
    if 'research' in item['label'].lower()
]
Aggregate Subcategories
# Get statement
df = balance.to_dataframe()
# Define categories (adjust based on actual labels)
current_asset_categories = [
    'Cash and Cash Equivalents',
    'Accounts Receivable',
    'Inventory',
    'Other Current Assets'
]
# Sum categories
current_assets_sum = sum([
    df.loc[cat, df.columns[0]]
    for cat in current_asset_categories
    if cat in df.index
])
# Verify against reported total
if 'Current Assets' in df.index:
    reported_total = df.loc['Current Assets', df.columns[0]]
    print(f"Calculated: {current_assets_sum}")
    print(f"Reported: {reported_total}")
    print(f"Difference: {current_assets_sum - reported_total}")
Integration with Analysis Tools
With Pandas
# Statement integrates seamlessly with pandas
df = statement.to_dataframe()
# Use all pandas functionality
summary = df.describe()
correlations = df.T.corr()
rolling_avg = df.T.rolling(window=4).mean()
With NumPy
import numpy as np
# Convert to numpy array for numerical operations
df = statement.to_dataframe()
values = df.values
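# Numerical analysis
mean_values = np.mean(values, axis=1)
std_values = np.std(values, axis=1)
# Columns are most recent first, so reverse to chronological order before differencing
chrono = values[:, ::-1]
growth_rates = np.diff(chrono, axis=1) / chrono[:, :-1]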
Export for Visualization
# Prepare data for plotting
df = income.to_dataframe()
# Select key items
plot_items = ['Revenue', 'Operating Income', 'Net Income']
plot_data = df.loc[plot_items].T
# Plot with matplotlib
import matplotlib.pyplot as plt
plot_data.plot(kind='bar', figsize=(12, 6))
plt.title('Income Statement Trends')
plt.xlabel('Period')
plt.ylabel('Amount (USD)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Error Handling
Missing Line Items
# Check if item exists before accessing
df = statement.to_dataframe()
if 'Revenue' in df.index:
    revenue = df.loc['Revenue']
else:
    print("Revenue not found in statement")
    # Try alternative names
    for alt in ['Revenues', 'Total Revenue', 'Net Revenue']:
        if alt in df.index:
            revenue = df.loc[alt]
            break
Handling Different Formats
# Companies may use different labels
def find_item(df, possible_names):
    """Find item by trying multiple possible names."""
    for name in possible_names:
        if name in df.index:
            return df.loc[name]
    return None
# Usage
revenue_names = ['Revenue', 'Revenues', 'Total Revenue', 'Net Sales']
revenue = find_item(df, revenue_names)
if revenue is not None:
    print(f"Found revenue: {revenue}")
else:
    print("Revenue not found under common names")
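Exact matching misses labels that differ only in capitalization or stray whitespace. The sketch below is a variant that normalizes labels before comparing them; `find_item_ci` is a hypothetical helper, the sample data is made up, and it assumes the index labels are unique.

```python
import pandas as pd

def find_item_ci(df: pd.DataFrame, possible_names):
    """Return the first row whose label matches a candidate, ignoring case and padding."""
    # Map normalized label -> original label
    normalized = {str(label).strip().lower(): label for label in df.index}
    for name in possible_names:
        label = normalized.get(name.strip().lower())
        if label is not None:
            return df.loc[label]
    return None

# Made-up sample with inconsistent capitalization and a trailing space
sample = pd.DataFrame(
    {"2024-12-31": [100.0, 20.0]},
    index=["Total revenue ", "Net income"],
)

revenue_row = find_item_ci(sample, ["Revenue", "Total Revenue"])
missing_row = find_item_ci(sample, ["Gross Profit"])
print(revenue_row)
```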
Incomplete Period Data
# Check data availability
df = statement.to_dataframe()
# Check for null values
missing_data = df.isnull().sum()
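if missing_data.any():
    print("Periods with missing data:")
    print(missing_data[missing_data > 0])
# Fill missing with 0 or forward fill
df_filled = df.fillna(0) # Replace NaN with 0
# or
# Forward fill across periods; fillna(method='ffill') is deprecated in recent pandas,
# and filling down rows would copy values from a different line item
df_filled = df.ffill(axis=1)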
Best Practices
- Always convert to DataFrame for analysis:
    df = statement.to_dataframe() # Easier to work with
- Check item names before accessing:
    if 'Revenue' in df.index:
        revenue = df.loc['Revenue']
- Handle multiple naming conventions:
    # Try variations
    for name in ['Revenue', 'Revenues', 'Total Revenue']:
        if name in df.index:
            revenue = df.loc[name]
            break
- Validate calculated values:
    # Check against reported totals
    calculated = sum(components)
    reported = df.loc['Total']
    assert abs(calculated - reported) < 0.01, "Mismatch!"
- Use period filters appropriately:
    # Filter to specific years
    df_2024 = statement.to_dataframe(period_filter='2024')
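The "validate calculated values" practice can be written as a small reusable check that reports the difference instead of asserting. This is a sketch with made-up numbers; `totals_match` is a hypothetical helper, and the 0.01 tolerance mirrors the example in the list above.

```python
import math

def totals_match(components, reported, abs_tol=0.01):
    """Compare the sum of components with a reported total.

    Returns (matches, difference), where difference is calculated minus reported.
    """
    calculated = math.fsum(components)
    matches = math.isclose(calculated, reported, abs_tol=abs_tol)
    return matches, calculated - reported

# Made-up component values for a single period
ok, diff = totals_match([50.0, 30.0, 20.0], reported=100.0)
bad, bad_diff = totals_match([50.0, 30.0, 20.0], reported=101.0)
print(ok, diff, bad, bad_diff)
```

Pass single-period scalars (for example `df.loc['Total', df.columns[0]]`) rather than whole rows, since comparing a Series against a tolerance is ambiguous.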
Performance Tips
Caching DataFrames
# Cache the DataFrame if using repeatedly
df_cache = statement.to_dataframe()
# Reuse cached version
revenue = df_cache.loc['Revenue']
net_income = df_cache.loc['Net Income']
# ... more operations
Selective Period Loading
# If you only need recent data
current_only = xbrl.current_period.income_statement()
df = current_only.to_dataframe() # Smaller, faster
Troubleshooting
"KeyError: Line item not found"
Cause: Item label doesn't match exactly
Solution:
# List all available items
print(df.index.tolist())
# Or search for pattern
matching = df[df.index.str.contains('keyword', case=False)]
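When you do not know which keyword to search for, approximate matching from the standard library can suggest likely labels. The sketch below uses `difflib.get_close_matches` on a made-up list of labels; with a real statement you would pass `df.index.tolist()`. `suggest_labels` is a hypothetical helper.

```python
import difflib

def suggest_labels(wanted, labels, n=3, cutoff=0.6):
    """Return up to n labels that resemble the wanted name, best match first."""
    # Compare in lowercase, then map back to the original labels
    lowered = {label.lower(): label for label in labels}
    hits = difflib.get_close_matches(wanted.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[hit] for hit in hits]

# Made-up labels standing in for df.index.tolist()
labels = ["Revenues", "Cost of Revenue", "Net Income"]
suggestions = suggest_labels("Revenue", labels)
print(suggestions)
```

Raising `cutoff` toward 1.0 makes the suggestions stricter; lowering it returns looser matches.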
"Empty DataFrame"
Cause: Statement has no data or wrong period filter
Solution:
# Check raw data
raw = statement.get_raw_data()
print(f"Statement has {len(raw)} items")
# Check periods
print(f"Available periods: {statement.periods}")
"Index error when accessing columns"
Cause: Fewer periods than expected
Solution:
# Check column count first
if len(df.columns) >= 2:
    current = df.iloc[:, 0]
    prior = df.iloc[:, 1]
else:
    print("Insufficient periods for comparison")
This guide covers the essential patterns for working with Statement objects in edgartools. For information on accessing statements from XBRL, see the XBRL documentation.