Compare commits
5 Commits
ed5d126d77...main
| Author | SHA1 | Date |
|---|---|---|
| | d126c5d59d | |
| | 8be7f745b1 | |
| | 7ce88e6e4a | |
| | 65a1485ff9 | |
| | 7d9dfa309c | |
README.md (85 changed lines)
@@ -18,6 +18,9 @@ Tento projekt je určen pro stahování a správu kurzů cizích měn vůči če
- **Generování reportů**: Lze vygenerovat report kurzů pro zadaný rok, měsíc nebo časové období včetně dopočítaných kurzů pro dny, kdy ve vstupních datech neexistovaly.
- **Správné dopočítání kurzů**: Program správně aplikuje pravidla ČNB pro dopočítání kurzů pro víkendy a svátky jak při vyhledávání (`--get-rate`), tak při generování reportů.
- **Výpočet Jednotného kurzu**: Lze vypočítat 'Jednotný kurz' pro daňové účely podle metodiky ČNB jako aritmetický průměr kurzů k posledním dnům každého měsíce v roce.
- **Validace dat**: Program umí validovat data pro konzistenci, detekovat změny kurzů přesahující prahové hodnoty, kontrolovat počet obchodních dnů a analyzovat časové mezery v datech.
- **Analýza počtu záznamů**: Lze zobrazit počty záznamů podle různých časových období (týden, měsíc, čtvrtletí, pololetí, rok).
- **Adaptivní prahy**: Systém se učí z historických dat a automaticky upravuje prahy pro detekci anomálií.
- **JSON výstup**: Všechny příkazy podporují JSON formát pro programové zpracování pomocí přepínače `--json`.

## Požadavky
@@ -65,6 +68,11 @@ Při každém spuštění programu:
- `--report-year ROK [--report-month MESIC]`: Vygeneruje report kurzů pro zadaný rok (a případně měsíc). Vyžaduje `-c` nebo `--currency`.
- `--report-period ZACATEK KONEC`: Vygeneruje report kurzů pro zadané časové období. Vyžaduje `-c` nebo `--currency`.
- `--stats [ROK]`: Vypočítá 'Jednotný kurz' pro daňové účely podle metodiky ČNB. Pokud je zadán rok, vytvoří kurz pro konkrétní rok. Pokud není rok zadán, vytvoří kurzy pro všechny roky s dostupnými daty. Vyžaduje `-c` nebo `--currency`.
- `--validate`: Validuje data pro měnu nebo všechny měny. Zkontroluje konzistenci kurzů, počet obchodních dnů a detekuje možné chyby.
- `--record-counts`: Zobrazí počet záznamů podle časových období (týden, měsíc, čtvrtletí, pololetí, rok). Vyžaduje `-c` nebo `--currency`.
- `--change-threshold PRAH`: Práh pro detekci změn kurzů v procentech (výchozí: 1.0).
- `--no-adaptive`: Vypne adaptivní učení prahů na základě historických dat.
- `--gap-threshold DNY`: Maximální přijatelná mezera v pracovních dnech (výchozí: 3).
- `--json`: Výstup ve formátu JSON místo prostého textu pro programové zpracování.

### Příklady
@@ -125,19 +133,34 @@ Při každém spuštění programu:
python src/cli.py --stats -c USD
```

12. **Získání posledního dostupného kurzu USD**:
12. **Validace dat pro měnu USD za rok 2025**:
```bash
python src/cli.py --validate --currency USD --year 2025
```

13. **Validace všech měn s vlastními prahy**:
```bash
python src/cli.py --validate --change-threshold 0.5 --gap-threshold 2
```

14. **Zobrazení počtu záznamů podle časových období pro USD**:
```bash
python src/cli.py --record-counts --currency USD --year 2025
```

15. **Získání posledního dostupného kurzu USD**:
```bash
python src/cli.py -c USD
```

13. **JSON výstup pro vyhledání kurzu**:
16. **JSON výstup pro vyhledání kurzu**:
```bash
python src/cli.py --get-rate 01.01.2025 -c USD --json
```

14. **JSON výstup pro výpočet Jednotného kurzu**:
17. **JSON výstup pro validaci dat**:
```bash
python src/cli.py --stats 2025 -c USD --json
python src/cli.py --validate --currency USD --year 2025 --json
```

## JSON formát
@@ -196,6 +219,60 @@ Při použití přepínače `--json` program vrací strukturovaná data ve form
}
```

### Validace dat
```json
{
  "currency": "USD",
  "validation_year": 2025,
  "adaptive_analysis": {
    "adaptive_threshold": 1.5,
    "base_threshold": 1.0,
    "volatility_percent": 0.24,
    "data_points": 62
  },
  "price_change_violations": [
    {
      "date": "06.01.2025",
      "change_percent": 1.19,
      "severity": "minor"
    }
  ],
  "temporal_gaps": [],
  "trading_days_validation": {
    "expected_trading_days": 251,
    "actual_data_points": 251,
    "discrepancy_days": 0,
    "data_completeness_percent": 100.0
  },
  "record_counts_by_period": {
    "2025": {
      "year": 251,
      "half_year": {"H1": 124, "H2": 127},
      "quarter": {"Q1": 63, "Q2": 61, "Q3": 66, "Q4": 61},
      "month": {"01": 22, "02": 20, "03": 21},
      "week": {"W01": 5, "W02": 5}
    }
  },
  "data_quality_score": 95
}
```

### Počty záznamů podle období
```json
{
  "currency": "USD",
  "record_counts": {
    "2025": {
      "year": 251,
      "half_year": {"H1": 124, "H2": 127},
      "quarter": {"Q1": 63, "Q2": 61, "Q3": 66, "Q4": 61},
      "month": {"01": 22, "02": 20, "03": 21, "04": 20},
      "week": {"W01": 5, "W02": 5, "W03": 5}
    }
  }
}
```
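
Níže je pouze ilustrativní ukázka (není součástí projektu), jak lze JSON výstup zpracovat programově; názvy polí odpovídají ukázkám výše a cesta `src/cli.py` se předpokládá stejná jako v příkladech.

```python
import json
import subprocess

# Spustí CLI s přepínačem --json a načte počty záznamů pro USD.
result = subprocess.run(
    ["python", "src/cli.py", "--record-counts", "--currency", "USD", "--json"],
    capture_output=True,
    text=True,
    check=True,
)
data = json.loads(result.stdout)

# Projde strukturu popsanou výše a vypíše souhrn po letech.
for year, periods in data["record_counts"].items():
    print(year, "celkem:", periods["year"], "čtvrtletí:", periods["quarter"])
```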

## Chování při různých časech a datumech

- **Budoucí datum**: Program vrátí chybu, protože kurzy pro budoucí data ještě nebyly vydány.

src/cli.py (179 changed lines)
@@ -9,11 +9,12 @@ from datetime import datetime
|
||||
# Přidání adresáře src do sys.path, aby bylo možné importovat moduly
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__)))
|
||||
|
||||
import data_fetcher
|
||||
import database
|
||||
import data_fetcher
|
||||
import holidays
|
||||
import rate_finder
|
||||
import rate_reporter
|
||||
import data_validator
|
||||
|
||||
# Global debug flag
|
||||
DEBUG = False
|
||||
@@ -36,6 +37,7 @@ def set_debug_mode(debug):
|
||||
holidays.set_debug_mode(DEBUG)
|
||||
rate_finder.set_debug_mode(DEBUG)
|
||||
rate_reporter.set_debug_mode(DEBUG)
|
||||
data_validator.set_debug_mode(DEBUG)
|
||||
|
||||
|
||||
def format_single_rate_json(
|
||||
@@ -195,6 +197,28 @@ def main():
|
||||
"Pokud je zadán rok, vytvoří kurz pro konkrétní rok. "
|
||||
"Pokud není rok zadán, vytvoří kurzy pro všechny roky s dostupnými daty.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--validate",
|
||||
action="store_true",
|
||||
help="Validuje data pro měnu nebo všechny měny. Zkontroluje konzistenci kurzů a detekuje možné chyby.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--record-counts",
|
||||
action="store_true",
|
||||
help="Zobrazí počet záznamů podle časových období (týden, měsíc, čtvrtletí, pololetí, rok).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--change-threshold",
|
||||
type=float,
|
||||
default=1.0,
|
||||
help="Práh pro detekci změn kurzů v procentech (výchozí: 1.0).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--gap-threshold",
|
||||
type=int,
|
||||
default=3,
|
||||
help="Maximální přijatelná mezera v pracovních dnech (výchozí: 3).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--debug", action="store_true", help="Zobrazí podrobné ladicí informace."
|
||||
)
|
||||
@@ -203,20 +227,14 @@ def main():
|
||||
action="store_true",
|
||||
help="Výstup ve formátu JSON místo prostého textu pro programové zpracování.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--no-adaptive",
|
||||
action="store_true",
|
||||
help="Vypne adaptivní učení prahů na základě historických dat.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Pokud nebyly zadány žádné argumenty, vytiskneme nápovědu a seznam dostupných měn
|
||||
if len(sys.argv) == 1:
|
||||
parser.print_help()
|
||||
print("\nDostupné měny:")
|
||||
currencies = database.get_available_currencies()
|
||||
if currencies:
|
||||
print(", ".join(currencies))
|
||||
else:
|
||||
print("Žádné měny nejsou v databázi k dispozici.")
|
||||
sys.exit(0)
|
||||
|
||||
# Nastavíme debug mód
|
||||
DEBUG = args.debug
|
||||
set_debug_mode(DEBUG)
|
||||
@@ -245,14 +263,131 @@ def main():
|
||||
pass
|
||||
|
||||
# Zde bude logika pro zpracování argumentů
|
||||
if args.year:
|
||||
debug_print(f"Stahuji roční data pro rok {args.year}...")
|
||||
# Ujistěme se, že adresář data existuje
|
||||
os.makedirs("data", exist_ok=True)
|
||||
# Volání funkce pro stažení ročních dat
|
||||
data_fetcher.download_yearly_data(args.year, output_dir="data")
|
||||
elif args.currency and args.start_date and args.end_date and not args.report_period:
|
||||
# Zde bude logika pro zpracování argumentů
|
||||
if args.validate:
|
||||
# Validation command
|
||||
base_threshold = args.change_threshold
|
||||
adaptive = not args.no_adaptive
|
||||
max_gap_days = getattr(args, "gap_threshold", 3) # Default to 3 if not defined
|
||||
|
||||
if args.currency:
|
||||
# Validate specific currency
|
||||
debug_print(f"Validuji data pro měnu {args.currency}...")
|
||||
results = data_validator.validate_currency_data(
|
||||
args.currency, args.year, base_threshold, adaptive, max_gap_days
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output_json(results)
|
||||
else:
|
||||
text_output = data_validator.format_validation_text(results)
|
||||
print(text_output)
|
||||
else:
|
||||
# Validate all currencies
|
||||
debug_print("Validuji data pro všechny měny...")
|
||||
results = data_validator.validate_all_currencies(
|
||||
args.year, base_threshold, adaptive, max_gap_days
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output_json(results)
|
||||
else:
|
||||
text_output = data_validator.format_validation_text(results)
|
||||
print(text_output)
|
||||
elif args.record_counts:
|
||||
# Record counts command
|
||||
if not args.currency:
|
||||
print(
|
||||
"Chyba: Pro --record-counts je nutné zadat měnu pomocí -c/--currency."
|
||||
)
|
||||
sys.exit(1)
|
||||
|
||||
debug_print(f"Získávám počty záznamů pro měnu {args.currency}...")
|
||||
record_counts = data_validator.get_record_counts_by_period(
|
||||
args.currency, args.year
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output_json({"currency": args.currency, "record_counts": record_counts})
|
||||
else:
|
||||
print(f"Record Counts for {args.currency}:")
|
||||
print("=" * 50)
|
||||
|
||||
for year_key, periods in record_counts.items():
|
||||
print(f"\nYear {year_key}:")
|
||||
print(f" Total records: {periods.get('year', 0)}")
|
||||
|
||||
# Half years
|
||||
half_years = periods.get("half_year", {})
|
||||
if half_years:
|
||||
print(
|
||||
f" Half years: H1={half_years.get('H1', 0)}, H2={half_years.get('H2', 0)}"
|
||||
)
|
||||
|
||||
# Quarters
|
||||
quarters = periods.get("quarter", {})
|
||||
if quarters:
|
||||
quarter_str = ", ".join(
|
||||
[f"Q{q}={quarters.get(f'Q{q}', 0)}" for q in range(1, 5)]
|
||||
)
|
||||
print(f" Quarters: {quarter_str}")
|
||||
|
||||
# Months
|
||||
months = periods.get("month", {})
|
||||
if months:
|
||||
month_list = []
|
||||
for month in range(1, 13):
|
||||
month_key = f"{month:02d}"
|
||||
count = months.get(month_key, 0)
|
||||
month_list.append(f"{month}={count}")
|
||||
print(f" Months: {', '.join(month_list)}")
|
||||
|
||||
# Weeks summary
|
||||
weeks = periods.get("week", {})
|
||||
if weeks:
|
||||
total_weeks = len(weeks)
|
||||
if total_weeks <= 10:
|
||||
week_list = sorted([f"{w}={weeks[w]}" for w in weeks.keys()])
|
||||
print(f" Weeks: {', '.join(week_list)}")
|
||||
else:
|
||||
sample_weeks = sorted(list(weeks.keys())[:5])
|
||||
week_sample = [f"{w}={weeks[w]}" for w in sample_weeks]
|
||||
print(
|
||||
f" Weeks: {', '.join(week_sample)}... ({total_weeks} total weeks)"
|
||||
)
|
||||
elif args.year:
|
||||
# Validation command
|
||||
base_threshold = args.change_threshold
|
||||
adaptive = not args.no_adaptive
|
||||
|
||||
if args.currency:
|
||||
# Validate specific currency
|
||||
debug_print(f"Validuji data pro měnu {args.currency}...")
|
||||
results = data_validator.validate_currency_data(
|
||||
args.currency, args.year, base_threshold, adaptive
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output_json(results)
|
||||
else:
|
||||
text_output = data_validator.format_validation_text(results)
|
||||
print(text_output)
|
||||
else:
|
||||
# Validate all currencies
|
||||
debug_print("Validuji data pro všechny měny...")
|
||||
results = data_validator.validate_all_currencies(
|
||||
args.year, base_threshold, adaptive
|
||||
)
|
||||
|
||||
if args.json:
|
||||
output_json(results)
|
||||
else:
|
||||
text_output = data_validator.format_validation_text(results)
|
||||
print(text_output)
|
||||
return
|
||||
# elif args.currency and args.start_date and args.end_date and not args.report_period:
|
||||
# Měsíční stahování dat
|
||||
debug_print("HIT: Monthly download condition")
|
||||
debug_print(
|
||||
f"Stahuji měsíční data pro měnu {args.currency} od {args.start_date} do {args.end_date}..."
|
||||
)
|
||||
@@ -264,6 +399,7 @@ def main():
|
||||
)
|
||||
elif args.report_period and args.currency:
|
||||
start_date, end_date = args.report_period
|
||||
debug_print("HIT: Report period condition")
|
||||
debug_print(
|
||||
f"Generuji report pro měnu {args.currency} od {start_date} do {end_date}..."
|
||||
)
|
||||
@@ -271,12 +407,14 @@ def main():
|
||||
start_date, end_date, args.currency, output_dir="data"
|
||||
)
|
||||
elif args.date:
|
||||
debug_print("HIT: Daily data condition")
|
||||
debug_print(f"Stahuji denní data pro datum {args.date}...")
|
||||
# Ujistěme se, že adresář data existuje
|
||||
os.makedirs("data", exist_ok=True)
|
||||
# Volání funkce pro stažení denních dat
|
||||
data_fetcher.download_daily_data(args.date, output_dir="data")
|
||||
elif args.get_rate and args.currency:
|
||||
debug_print("HIT: Get rate condition")
|
||||
date_str = args.get_rate
|
||||
currency_code = args.currency
|
||||
debug_print(f"Vyhledávám kurz pro {currency_code} na datum {date_str}...")
|
||||
@@ -309,6 +447,7 @@ def main():
|
||||
f"Kurz {currency_code} na datum {date_str} (ani v předchozích dnech) nebyl nalezen."
|
||||
)
|
||||
elif args.get_rate is not None and not args.currency:
|
||||
debug_print("HIT: Get rate without currency condition")
|
||||
# Pokud je zadán --get-rate bez data a bez měny
|
||||
if DEBUG:
|
||||
print(
|
||||
@@ -318,7 +457,7 @@ def main():
|
||||
# DŮLEŽITÉ: Pořadí následujících elif podmínek je důležité!
|
||||
# Nejprve zpracujeme --stats, pak teprve "poslední dostupný kurz"
|
||||
elif args.stats is not None and args.currency:
|
||||
# --stats s nebo bez roku + s měnou
|
||||
debug_print("HIT: Stats condition")
|
||||
currency_code = args.currency
|
||||
if args.stats is True:
|
||||
# Pokud je --stats zadán bez roku, vytvoříme kurzy pro všechny roky s dostupnými daty
|
||||
|
||||
src/data_validator.py (new file, 789 lines)
@@ -0,0 +1,789 @@
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
from datetime import datetime, timedelta
|
||||
from collections import defaultdict
|
||||
import statistics
|
||||
|
||||
# Přidání adresáře src do sys.path, aby bylo možné importovat moduly
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__)))
|
||||
|
||||
import database
|
||||
import holidays
|
||||
|
||||
# Global debug flag
|
||||
DEBUG = False
|
||||
|
||||
|
||||
def debug_print(*args, **kwargs):
|
||||
"""Print debug messages only if debug mode is enabled."""
|
||||
if DEBUG:
|
||||
print(*args, **kwargs)
|
||||
|
||||
|
||||
def set_debug_mode(debug):
|
||||
"""Set the debug mode for this module."""
|
||||
global DEBUG
|
||||
DEBUG = debug
|
||||
|
||||
|
||||
def calculate_adaptive_threshold(currency_code, base_threshold=1.0, learning_months=3):
|
||||
"""
|
||||
Calculates adaptive threshold based on 3-month historical volatility.
|
||||
|
||||
:param currency_code: Currency to analyze
|
||||
:param base_threshold: Base threshold percentage
|
||||
:param learning_months: Months of history to analyze
|
||||
:return: Adaptive threshold and volatility statistics
|
||||
"""
|
||||
try:
|
||||
# Calculate date range for learning (3 months back)
|
||||
end_date = datetime.now()
|
||||
start_date = end_date - timedelta(days=learning_months * 30)
|
||||
|
||||
# Get all rates for the period
|
||||
rates_data = []
|
||||
current_date = start_date
|
||||
|
||||
while current_date <= end_date:
|
||||
date_str = current_date.strftime("%d.%m.%Y")
|
||||
rate = database.get_rate(date_str, currency_code)
|
||||
if rate is not None:
|
||||
rates_data.append((current_date, rate))
|
||||
current_date += timedelta(days=1)
|
||||
|
||||
if len(rates_data) < 10:
|
||||
# Insufficient data, return base threshold
|
||||
return {
|
||||
"adaptive_threshold": base_threshold,
|
||||
"base_threshold": base_threshold,
|
||||
"volatility_percent": 0.0,
|
||||
"data_points": len(rates_data),
|
||||
"sufficient_data": False,
|
||||
}
|
||||
|
||||
# Calculate daily percentage changes
|
||||
changes = []
|
||||
for i in range(1, len(rates_data)):
|
||||
prev_rate = rates_data[i - 1][1]
|
||||
curr_rate = rates_data[i][1]
|
||||
if prev_rate > 0:
|
||||
change_pct = abs((curr_rate - prev_rate) / prev_rate) * 100
|
||||
changes.append(change_pct)
|
||||
|
||||
if not changes:
|
||||
return {
|
||||
"adaptive_threshold": base_threshold,
|
||||
"base_threshold": base_threshold,
|
||||
"volatility_percent": 0.0,
|
||||
"data_points": len(rates_data),
|
||||
"sufficient_data": True,
|
||||
}
|
||||
|
||||
# Calculate volatility metrics
|
||||
std_dev = statistics.stdev(changes)
|
||||
percentile_95 = statistics.quantiles(changes, n=20)[18] # 95th percentile
|
||||
|
||||
# Adaptive threshold formula: more conservative of std_dev and percentile_95th/2
|
||||
volatility_factor = max(std_dev, percentile_95 / 2)
|
||||
|
||||
# Apply bounds (0.5% to 5.0%)
|
||||
adaptive_threshold = base_threshold * (
|
||||
1 + min(max(volatility_factor, 0.5), 5.0)
|
||||
)
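# Illustrative example using the sample values from the README (base_threshold=1.0,
# std_dev=0.24): when percentile_95 / 2 stays below 0.5, volatility_factor is clamped
# to 0.5, so adaptive_threshold = 1.0 * (1 + 0.5) = 1.5.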
|
||||
|
||||
return {
|
||||
"adaptive_threshold": adaptive_threshold,
|
||||
"base_threshold": base_threshold,
|
||||
"volatility_percent": std_dev,
|
||||
"percentile_95": percentile_95,
|
||||
"data_points": len(rates_data),
|
||||
"sufficient_data": True,
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
debug_print(f"Error calculating adaptive threshold: {e}")
|
||||
return {
|
||||
"adaptive_threshold": base_threshold,
|
||||
"base_threshold": base_threshold,
|
||||
"volatility_percent": 0.0,
|
||||
"data_points": 0,
|
||||
"sufficient_data": False,
|
||||
"error": str(e),
|
||||
}
|
||||
|
||||
|
||||
def calculate_working_days_gap(start_date, end_date):
|
||||
"""
|
||||
Calculate the number of working days (excluding weekends and holidays) between two dates.
|
||||
|
||||
:param start_date: Start date (datetime)
|
||||
:param end_date: End date (datetime)
|
||||
:return: Number of working days between the dates (exclusive)
|
||||
"""
|
||||
working_days = 0
|
||||
current = start_date + timedelta(days=1) # Start from day after start_date
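# Example: for a Friday-to-Monday pair only Saturday and Sunday lie in between,
# both are weekend days, so the gap counts as 0 working days and a normal
# Fri -> Mon gap in the data is not reported as missing.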
|
||||
|
||||
while current < end_date:
|
||||
date_str = current.strftime("%d.%m.%Y")
|
||||
if not holidays.is_weekend(date_str) and not holidays.is_holiday(date_str):
|
||||
working_days += 1
|
||||
current += timedelta(days=1)
|
||||
|
||||
return working_days
|
||||
|
||||
|
||||
def calculate_expected_trading_days(year):
|
||||
"""
|
||||
Calculate the expected number of trading days in a year (excluding weekends and holidays).
|
||||
|
||||
:param year: Year to calculate for
|
||||
:return: Dictionary with expected trading days and breakdown
|
||||
"""
|
||||
import calendar
|
||||
|
||||
total_days = 366 if calendar.isleap(year) else 365
|
||||
weekend_days = 0
|
||||
holiday_days = 0
|
||||
|
||||
# Count weekends and holidays
|
||||
for month in range(1, 13):
|
||||
for day in range(1, calendar.monthrange(year, month)[1] + 1):
|
||||
date_str = f"{day:02d}.{month:02d}.{year}"
|
||||
if holidays.is_weekend(date_str):
|
||||
weekend_days += 1
|
||||
elif holidays.is_holiday(date_str):
|
||||
holiday_days += 1
|
||||
|
||||
expected_trading_days = total_days - weekend_days - holiday_days
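# Illustrative arithmetic for 2025 (assuming the holidays module covers the Czech
# public holidays): 365 total days - 104 weekend days - 10 weekday holidays = 251,
# which matches the expected_trading_days value shown in the README example.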
|
||||
|
||||
return {
|
||||
"total_days": total_days,
|
||||
"weekend_days": weekend_days,
|
||||
"holiday_days": holiday_days,
|
||||
"expected_trading_days": expected_trading_days,
|
||||
}
|
||||
|
||||
|
||||
def validate_trading_days_count(currency_code, year):
|
||||
"""
|
||||
Validate that a year has the appropriate number of trading day entries.
|
||||
|
||||
:param currency_code: Currency to validate
|
||||
:param year: Year to check
|
||||
:return: Validation result with actual vs expected counts
|
||||
"""
|
||||
# Get expected trading days
|
||||
expected = calculate_expected_trading_days(year)
|
||||
|
||||
# Count actual data points for the year
|
||||
actual_count = 0
|
||||
rates_data = []
|
||||
|
||||
start_date = datetime(year, 1, 1)
|
||||
end_date = datetime(year, 12, 31)
|
||||
|
||||
current_date = start_date
|
||||
while current_date <= end_date:
|
||||
date_str = current_date.strftime("%d.%m.%Y")
|
||||
rate = database.get_rate(date_str, currency_code)
|
||||
if rate is not None:
|
||||
actual_count += 1
|
||||
rates_data.append((current_date, rate, date_str))
|
||||
current_date += timedelta(days=1)
|
||||
|
||||
# Calculate discrepancy
|
||||
discrepancy_days = actual_count - expected["expected_trading_days"]
|
||||
discrepancy_percent = (
|
||||
(discrepancy_days / expected["expected_trading_days"]) * 100
|
||||
if expected["expected_trading_days"] > 0
|
||||
else 0
|
||||
)
|
||||
|
||||
# Determine severity
|
||||
severity = "ok"
|
||||
if abs(discrepancy_percent) > 15:
|
||||
severity = "severe"
|
||||
elif abs(discrepancy_percent) > 5:
|
||||
severity = "moderate"
|
||||
elif abs(discrepancy_percent) > 0:
|
||||
severity = "minor"
|
||||
|
||||
return {
|
||||
"expected_trading_days": expected["expected_trading_days"],
|
||||
"actual_data_points": actual_count,
|
||||
"discrepancy_days": discrepancy_days,
|
||||
"discrepancy_percent": round(discrepancy_percent, 2),
|
||||
"severity": severity,
|
||||
"total_days": expected["total_days"],
|
||||
"weekend_days_excluded": expected["weekend_days"],
|
||||
"holiday_days_excluded": expected["holiday_days"],
|
||||
"data_completeness_percent": round(
|
||||
(actual_count / expected["expected_trading_days"]) * 100, 1
|
||||
)
|
||||
if expected["expected_trading_days"] > 0
|
||||
else 0,
|
||||
}
|
||||
|
||||
|
||||
def get_record_counts_by_period(currency_code, year=None):
|
||||
"""
|
||||
Get record counts for different time periods.
|
||||
|
||||
:param currency_code: Currency to analyze
|
||||
:param year: Optional year filter
|
||||
:return: Dictionary with counts by period
|
||||
"""
|
||||
if year:
|
||||
years_to_check = [year]
|
||||
else:
|
||||
years_to_check = database.get_years_with_data()
|
||||
if not years_to_check:
|
||||
return {}
|
||||
|
||||
results = {}
|
||||
|
||||
for check_year in years_to_check:
|
||||
year_results = {}
|
||||
|
||||
# Get all data for the year
|
||||
data_points = []
|
||||
start_date = datetime(check_year, 1, 1)
|
||||
end_date = datetime(check_year, 12, 31)
|
||||
|
||||
current_date = start_date
|
||||
while current_date <= end_date:
|
||||
date_str = current_date.strftime("%d.%m.%Y")
|
||||
rate = database.get_rate(date_str, currency_code)
|
||||
if rate is not None:
|
||||
data_points.append((current_date, rate))
|
||||
current_date += timedelta(days=1)
|
||||
|
||||
# Count by different periods
|
||||
period_counts = {
|
||||
"year": len(data_points),
|
||||
"half_year": {},
|
||||
"quarter": {},
|
||||
"month": {},
|
||||
"week": {},
|
||||
}
|
||||
|
||||
# Half years
|
||||
period_counts["half_year"]["H1"] = len(
|
||||
[d for d in data_points if d[0].month <= 6]
|
||||
)
|
||||
period_counts["half_year"]["H2"] = len(
|
||||
[d for d in data_points if d[0].month > 6]
|
||||
)
|
||||
|
||||
# Quarters
|
||||
for quarter in range(1, 5):
|
||||
start_month = (quarter - 1) * 3 + 1
|
||||
end_month = quarter * 3
|
||||
period_counts["quarter"][f"Q{quarter}"] = len(
|
||||
[d for d in data_points if start_month <= d[0].month <= end_month]
|
||||
)
|
||||
|
||||
# Months
|
||||
for month in range(1, 13):
|
||||
period_counts["month"][f"{month:02d}"] = len(
|
||||
[d for d in data_points if d[0].month == month]
|
||||
)
|
||||
|
||||
# Weeks (approximate by week number)
|
||||
week_counts = {}
|
||||
for data_point in data_points:
|
||||
week_num = data_point[0].isocalendar()[1]
|
||||
week_key = f"W{week_num:02d}"
|
||||
week_counts[week_key] = week_counts.get(week_key, 0) + 1
|
||||
period_counts["week"] = week_counts
|
||||
|
||||
results[str(check_year)] = period_counts
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def detect_temporal_gaps(currency_code, year=None, max_gap_days=3):
|
||||
"""
|
||||
Detect temporal gaps in data sequence (missing working days).
|
||||
|
||||
:param currency_code: Currency to validate
|
||||
:param year: Optional year filter
|
||||
:param max_gap_days: Maximum acceptable working days gap
|
||||
:return: List of gap violations
|
||||
"""
|
||||
gaps = []
|
||||
|
||||
try:
|
||||
# Get all dates and rates for the currency/year
|
||||
rates_data = []
|
||||
if year:
|
||||
# Specific year
|
||||
start_date = datetime(year, 1, 1)
|
||||
end_date = datetime(year, 12, 31)
|
||||
else:
|
||||
# All available data
|
||||
years_with_data = database.get_years_with_data()
|
||||
if not years_with_data:
|
||||
return gaps
|
||||
start_year = min(years_with_data)
|
||||
end_year = max(years_with_data)
|
||||
start_date = datetime(start_year, 1, 1)
|
||||
end_date = datetime(end_year, 12, 31)
|
||||
|
||||
current_date = start_date
|
||||
while current_date <= datetime.now() and current_date <= end_date:
|
||||
date_str = current_date.strftime("%d.%m.%Y")
|
||||
rate = database.get_rate(date_str, currency_code)
|
||||
if rate is not None:
|
||||
rates_data.append((current_date, rate, date_str))
|
||||
current_date += timedelta(days=1)
|
||||
|
||||
# Check for gaps between consecutive data points
|
||||
for i in range(1, len(rates_data)):
|
||||
prev_date, _, prev_date_str = rates_data[i - 1]
|
||||
curr_date, _, curr_date_str = rates_data[i]
|
||||
|
||||
# Calculate working days gap
|
||||
working_days_gap = calculate_working_days_gap(prev_date, curr_date)
|
||||
|
||||
if working_days_gap > max_gap_days:
|
||||
# Determine severity
|
||||
severity = "minor"
|
||||
if working_days_gap > max_gap_days * 3:
|
||||
severity = "severe"
|
||||
elif working_days_gap > max_gap_days * 2:
|
||||
severity = "moderate"
|
||||
|
||||
gap = {
|
||||
"start_date": prev_date_str,
|
||||
"end_date": curr_date_str,
|
||||
"working_days_missing": working_days_gap,
|
||||
"severity": severity,
|
||||
"max_expected_gap": max_gap_days,
|
||||
"recommendation": f"Check data source for {working_days_gap} missing working days",
|
||||
}
|
||||
gaps.append(gap)
|
||||
|
||||
except Exception as e:
|
||||
debug_print(f"Error detecting temporal gaps: {e}")
|
||||
|
||||
return gaps
|
||||
|
||||
|
||||
def detect_price_change_violations(
|
||||
currency_code, year=None, base_threshold=1.0, adaptive=True
|
||||
):
|
||||
"""
|
||||
Detects price changes exceeding thresholds.
|
||||
|
||||
:param currency_code: Currency to validate
|
||||
:param year: Optional year filter
|
||||
:param base_threshold: Base threshold percentage
|
||||
:param adaptive: Whether to use adaptive threshold
|
||||
:return: List of violations
|
||||
"""
|
||||
violations = []
|
||||
|
||||
# Initialize adaptive_info in case of early exception
|
||||
adaptive_info = {
|
||||
"adaptive_threshold": base_threshold,
|
||||
"base_threshold": base_threshold,
|
||||
"volatility_percent": 0.0,
|
||||
"sufficient_data": True,
|
||||
}
|
||||
|
||||
try:
|
||||
# Get adaptive threshold if enabled
|
||||
if adaptive:
|
||||
adaptive_info = calculate_adaptive_threshold(currency_code, base_threshold)
|
||||
|
||||
effective_threshold = adaptive_info["adaptive_threshold"]
|
||||
|
||||
# Get all dates and rates for the currency/year
|
||||
rates_data = []
|
||||
if year:
|
||||
# Specific year
|
||||
start_date = datetime(year, 1, 1)
|
||||
end_date = datetime(year, 12, 31)
|
||||
else:
|
||||
# All available data
|
||||
years_with_data = database.get_years_with_data()
|
||||
if not years_with_data:
|
||||
return violations, adaptive_info
|
||||
start_year = min(years_with_data)
|
||||
end_year = max(years_with_data)
|
||||
start_date = datetime(start_year, 1, 1)
|
||||
end_date = datetime(end_year, 12, 31)
|
||||
|
||||
current_date = start_date
|
||||
while current_date <= datetime.now() and current_date <= end_date:
|
||||
date_str = current_date.strftime("%d.%m.%Y")
|
||||
rate = database.get_rate(date_str, currency_code)
|
||||
if rate is not None:
|
||||
rates_data.append((current_date, rate, date_str))
|
||||
current_date += timedelta(days=1)
|
||||
|
||||
# Check consecutive pairs
|
||||
for i in range(1, len(rates_data)):
|
||||
prev_date, prev_rate, prev_date_str = rates_data[i - 1]
|
||||
curr_date, curr_rate, curr_date_str = rates_data[i]
|
||||
|
||||
if prev_rate > 0:
|
||||
change_pct = abs((curr_rate - prev_rate) / prev_rate) * 100
|
||||
|
||||
# Determine severity
|
||||
severity = "minor"
|
||||
if change_pct > effective_threshold * 3:
|
||||
severity = "severe"
|
||||
elif change_pct > effective_threshold:
|
||||
severity = "moderate"
|
||||
|
||||
# Flag if exceeds base threshold (always) or adaptive threshold
|
||||
if change_pct > base_threshold:
|
||||
violation = {
|
||||
"date": curr_date_str,
|
||||
"previous_date": prev_date_str,
|
||||
"previous_rate": float(prev_rate),
|
||||
"current_rate": float(curr_rate),
|
||||
"change_percent": round(change_pct, 2),
|
||||
"severity": severity,
|
||||
"threshold_exceeded": "adaptive"
|
||||
if change_pct > effective_threshold
|
||||
else "base",
|
||||
"effective_threshold": effective_threshold,
|
||||
}
|
||||
|
||||
# Add corruption risk assessment for severe cases
|
||||
if severity == "severe":
|
||||
violation["corruption_risk"] = "high"
|
||||
violation["recommendation"] = (
|
||||
"Verify data source - potential currency mismatch or data corruption"
|
||||
)
|
||||
|
||||
violations.append(violation)
|
||||
|
||||
except Exception as e:
|
||||
debug_print(f"Error detecting price changes: {e}")
|
||||
|
||||
return violations, adaptive_info
|
||||
|
||||
|
||||
def validate_currency_data(
|
||||
currency_code, year=None, base_threshold=1.0, adaptive=True, max_gap_days=3
|
||||
):
|
||||
"""
|
||||
Comprehensive validation for a currency.
|
||||
|
||||
:param currency_code: Currency to validate
|
||||
:param year: Optional year filter
|
||||
:param base_threshold: Base threshold for price changes
|
||||
:param adaptive: Whether to use adaptive thresholds
|
||||
:param max_gap_days: Maximum acceptable working days gap
|
||||
:return: Validation results
|
||||
"""
|
||||
results = {
|
||||
"currency": currency_code,
|
||||
"validation_year": year,
|
||||
"validation_date": datetime.now().isoformat() + "Z",
|
||||
}
|
||||
|
||||
try:
|
||||
# Price change violations
|
||||
violations, adaptive_info = detect_price_change_violations(
|
||||
currency_code, year, base_threshold, adaptive
|
||||
)
|
||||
|
||||
# Temporal gaps
|
||||
gaps = detect_temporal_gaps(currency_code, year, max_gap_days)
|
||||
|
||||
# Trading days validation
|
||||
trading_days_validation = None
|
||||
if year:
|
||||
trading_days_validation = validate_trading_days_count(currency_code, year)
|
||||
|
||||
# Record counts by period
|
||||
record_counts = get_record_counts_by_period(currency_code, year)
|
||||
|
||||
results["adaptive_analysis"] = adaptive_info
|
||||
results["price_change_violations"] = violations
|
||||
results["temporal_gaps"] = gaps
|
||||
results["trading_days_validation"] = trading_days_validation
|
||||
results["record_counts_by_period"] = record_counts
|
||||
|
||||
# Summary statistics
|
||||
severity_counts = defaultdict(int)
|
||||
for v in violations:
|
||||
severity_counts[v["severity"]] += 1
|
||||
|
||||
gap_severity_counts = defaultdict(int)
|
||||
for g in gaps:
|
||||
gap_severity_counts[g["severity"]] += 1
|
||||
|
||||
results["summary"] = {
|
||||
"total_violations": len(violations),
|
||||
"total_gaps": len(gaps),
|
||||
"severity_breakdown": dict(severity_counts),
|
||||
"gap_severity_breakdown": dict(gap_severity_counts),
|
||||
"base_threshold": base_threshold,
|
||||
"adaptive_enabled": adaptive,
|
||||
"max_gap_days": max_gap_days,
|
||||
}
|
||||
|
||||
# Data quality score (enhanced heuristic)
|
||||
quality_penalty = 0
|
||||
if violations:
|
||||
quality_penalty += (
|
||||
len(violations) * 5 + severity_counts.get("severe", 0) * 20
|
||||
)
|
||||
if gaps:
|
||||
quality_penalty += (
|
||||
len(gaps) * 10 + gap_severity_counts.get("severe", 0) * 30
|
||||
)
|
||||
if trading_days_validation and trading_days_validation["severity"] != "ok":
|
||||
severity_penalty = {"minor": 5, "moderate": 15, "severe": 30}
|
||||
quality_penalty += severity_penalty.get(
|
||||
trading_days_validation["severity"], 0
|
||||
)
|
||||
|
||||
results["data_quality_score"] = max(0, 100 - quality_penalty)
|
||||
|
||||
except Exception as e:
|
||||
results["error"] = str(e)
|
||||
results["data_quality_score"] = 0
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def validate_all_currencies(
|
||||
year=None, base_threshold=1.0, adaptive=True, max_gap_days=3
|
||||
):
|
||||
"""
|
||||
Validates all available currencies.
|
||||
|
||||
:param year: Optional year filter
|
||||
:param base_threshold: Base threshold for price changes
|
||||
:param adaptive: Whether to use adaptive thresholds
|
||||
:param max_gap_days: Maximum acceptable working days gap
|
||||
:return: Validation results for all currencies
|
||||
"""
|
||||
results = {
|
||||
"validation_type": "all_currencies",
|
||||
"validation_year": year,
|
||||
"base_threshold": base_threshold,
|
||||
"adaptive_enabled": adaptive,
|
||||
"max_gap_days": max_gap_days,
|
||||
"validation_date": datetime.now().isoformat() + "Z",
|
||||
"currency_results": [],
|
||||
}
|
||||
|
||||
try:
|
||||
# Get all available currencies (we'll check a few known ones and any in database)
|
||||
currencies_to_check = ["USD", "EUR", "GBP", "CHF", "JPY"]
|
||||
|
||||
for currency in currencies_to_check:
|
||||
try:
|
||||
currency_result = validate_currency_data(
|
||||
currency, year, base_threshold, adaptive, max_gap_days
|
||||
)
|
||||
results["currency_results"].append(currency_result)
|
||||
except Exception as e:
|
||||
results["currency_results"].append(
|
||||
{"currency": currency, "error": str(e)}
|
||||
)
|
||||
|
||||
# Overall summary
|
||||
total_violations = sum(
|
||||
r.get("summary", {}).get("total_violations", 0)
|
||||
for r in results["currency_results"]
|
||||
if "summary" in r
|
||||
)
|
||||
total_gaps = sum(
|
||||
r.get("summary", {}).get("total_gaps", 0)
|
||||
for r in results["currency_results"]
|
||||
if "summary" in r
|
||||
)
|
||||
severe_violations = sum(
|
||||
r.get("summary", {}).get("severity_breakdown", {}).get("severe", 0)
|
||||
for r in results["currency_results"]
|
||||
if "summary" in r
|
||||
)
|
||||
severe_gaps = sum(
|
||||
r.get("summary", {}).get("gap_severity_breakdown", {}).get("severe", 0)
|
||||
for r in results["currency_results"]
|
||||
if "summary" in r
|
||||
)
|
||||
|
||||
results["overall_summary"] = {
|
||||
"currencies_checked": len(results["currency_results"]),
|
||||
"total_violations": total_violations,
|
||||
"total_gaps": total_gaps,
|
||||
"severe_violations": severe_violations,
|
||||
"severe_gaps": severe_gaps,
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
results["error"] = str(e)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def format_validation_text(results):
|
||||
"""Format validation results as text output."""
|
||||
output = []
|
||||
|
||||
if "currency" in results:
|
||||
# Single currency validation
|
||||
output.append(
|
||||
f"Currency Validation: {results['currency']} ({results.get('validation_year', 'All Years')})"
|
||||
)
|
||||
output.append("=" * 60)
|
||||
|
||||
adaptive = results.get("adaptive_analysis", {})
|
||||
if adaptive.get("sufficient_data", False):
|
||||
output.append("\nAdaptive Analysis (3-month history):")
|
||||
output.append(
|
||||
f"- Historical volatility: {adaptive.get('volatility_percent', 0):.1f}% std dev"
|
||||
)
|
||||
output.append(
|
||||
f"- Adaptive threshold: {adaptive.get('adaptive_threshold', 1.0):.1f}% (base: {adaptive.get('base_threshold', 1.0)}%)"
|
||||
)
|
||||
output.append(f"- Data points analyzed: {adaptive.get('data_points', 0)}")
|
||||
else:
|
||||
output.append(
|
||||
f"\nAdaptive Analysis: Insufficient data (using base threshold: {adaptive.get('base_threshold', 1.0)}%)"
|
||||
)
|
||||
|
||||
violations = results.get("price_change_violations", [])
|
||||
if violations:
|
||||
output.append("\nPrice Change Violations:")
|
||||
for i, v in enumerate(violations, 1):
|
||||
severity = v["severity"].upper()
|
||||
output.append(
|
||||
f"{i}. [{severity}] {v['date']}: {v['previous_rate']:.2f} → {v['current_rate']:.2f} ({'+' if v['change_percent'] > 0 else ''}{v['change_percent']:.2f}%)"
|
||||
)
|
||||
if "recommendation" in v:
|
||||
output.append(f" → {v['recommendation']}")
|
||||
else:
|
||||
output.append("\nPrice Change Violations: None found")
|
||||
|
||||
gaps = results.get("temporal_gaps", [])
|
||||
if gaps:
|
||||
output.append("\nTemporal Gaps:")
|
||||
for i, g in enumerate(gaps, 1):
|
||||
severity = g["severity"].upper()
|
||||
output.append(
|
||||
f"{i}. [{severity}] {g['start_date']} → {g['end_date']}: {g['working_days_missing']} working days missing"
|
||||
)
|
||||
if "recommendation" in g:
|
||||
output.append(f" → {g['recommendation']}")
|
||||
else:
|
||||
output.append("\nTemporal Gaps: None found")
|
||||
|
||||
# Trading days validation
|
||||
trading_validation = results.get("trading_days_validation")
|
||||
if trading_validation:
|
||||
output.append("\nTrading Days Validation:")
|
||||
output.append(
|
||||
f"- Expected trading days: {trading_validation['expected_trading_days']} ({trading_validation.get('total_days', 'N/A')} total - {trading_validation.get('weekend_days_excluded', 0)} weekends - {trading_validation.get('holiday_days_excluded', 0)} holidays)"
|
||||
)
|
||||
output.append(
|
||||
f"- Actual data points: {trading_validation['actual_data_points']}"
|
||||
)
|
||||
output.append(
|
||||
f"- Discrepancy: {trading_validation['discrepancy_days']} days ({trading_validation['discrepancy_percent']}%)"
|
||||
)
|
||||
output.append(
|
||||
f"- Data completeness: {trading_validation['data_completeness_percent']}%"
|
||||
)
|
||||
output.append(f"- Status: {trading_validation['severity'].upper()}")
|
||||
|
||||
# Record counts by period
|
||||
record_counts = results.get("record_counts_by_period", {})
|
||||
if record_counts:
|
||||
for year_key, periods in record_counts.items():
|
||||
output.append(f"\nRecord Counts for {year_key}:")
|
||||
output.append(f"- Year total: {periods.get('year', 0)} records")
|
||||
|
||||
# Half years
|
||||
half_years = periods.get("half_year", {})
|
||||
if half_years:
|
||||
output.append(
|
||||
f"- Half years: H1={half_years.get('H1', 0)}, H2={half_years.get('H2', 0)}"
|
||||
)
|
||||
|
||||
# Quarters
|
||||
quarters = periods.get("quarter", {})
|
||||
if quarters:
|
||||
quarter_str = ", ".join(
|
||||
[f"Q{q}={quarters.get(f'Q{q}', 0)}" for q in range(1, 5)]
|
||||
)
|
||||
output.append(f"- Quarters: {quarter_str}")
|
||||
|
||||
# Months summary
|
||||
months = periods.get("month", {})
|
||||
if months:
|
||||
month_list = [
|
||||
f"{m}={months.get(f'{int(m):02d}', 0)}"
|
||||
for m in [
|
||||
"01",
|
||||
"02",
|
||||
"03",
|
||||
"04",
|
||||
"05",
|
||||
"06",
|
||||
"07",
|
||||
"08",
|
||||
"09",
|
||||
"10",
|
||||
"11",
|
||||
"12",
|
||||
]
|
||||
]
|
||||
output.append(f"- Months: {', '.join(month_list)}")
|
||||
|
||||
# Weeks summary (show first few and indicate total)
|
||||
weeks = periods.get("week", {})
|
||||
if weeks:
|
||||
total_weeks = len(weeks)
|
||||
if total_weeks <= 10:
|
||||
week_list = [f"{w}={weeks[w]}" for w in sorted(weeks.keys())]
|
||||
output.append(f"- Weeks: {', '.join(week_list)}")
|
||||
else:
|
||||
sample_weeks = sorted(list(weeks.keys())[:5])
|
||||
week_sample = [f"{w}={weeks[w]}" for w in sample_weeks]
|
||||
output.append(
|
||||
f"- Weeks: {', '.join(week_sample)}... ({total_weeks} total weeks)"
|
||||
)
|
||||
|
||||
summary = results.get("summary", {})
|
||||
quality_score = results.get("data_quality_score", 0)
|
||||
output.append(f"\nData Quality Score: {quality_score}%")
|
||||
output.append(f"Total violations: {summary.get('total_violations', 0)}")
|
||||
output.append(f"Total gaps: {summary.get('total_gaps', 0)}")
|
||||
|
||||
elif "currency_results" in results:
|
||||
# Multi-currency validation
|
||||
output.append("Multi-Currency Validation Report")
|
||||
output.append("=" * 60)
|
||||
|
||||
for currency_result in results["currency_results"]:
|
||||
currency = currency_result.get("currency", "Unknown")
|
||||
violations = currency_result.get("price_change_violations", [])
|
||||
quality_score = currency_result.get("data_quality_score", 0)
|
||||
|
||||
output.append(f"\n{currency}:")
|
||||
output.append(f" - Violations: {len(violations)}")
|
||||
output.append(f" - Quality Score: {quality_score}%")
|
||||
|
||||
if violations:
|
||||
severe_count = sum(1 for v in violations if v["severity"] == "severe")
|
||||
output.append(f" - Severe violations: {severe_count}")
|
||||
|
||||
overall = results.get("overall_summary", {})
|
||||
output.append("\nOverall Summary:")
|
||||
output.append(f"- Currencies checked: {overall.get('currencies_checked', 0)}")
|
||||
output.append(f"- Total violations: {overall.get('total_violations', 0)}")
|
||||
output.append(f"- Severe violations: {overall.get('severe_violations', 0)}")
|
||||
|
||||
return "\n".join(output)
|
||||
@@ -224,6 +224,35 @@ def _is_year_complete_for_tax_calculation(year):
|
||||
return True
|
||||
|
||||
|
||||
def _auto_download_missing_monthly_data(year, currency_code, output_dir="data"):
|
||||
"""
|
||||
Automatically download missing monthly data for tax calculation (silent operation).
|
||||
|
||||
:param year: Year to check
|
||||
:param currency_code: Currency code
|
||||
:param output_dir: Output directory
|
||||
"""
|
||||
missing_months = get_missing_months_for_tax_calculation(year, currency_code)
|
||||
if missing_months:
|
||||
debug_print(
|
||||
f"Auto-downloading missing monthly data for {currency_code} {year}: months {', '.join(f'{m:02d}' for m in missing_months)}"
|
||||
)
|
||||
for month in missing_months:
|
||||
start_date = f"01.{month:02d}.{year}"
|
||||
last_day = calendar.monthrange(year, month)[1]
|
||||
end_date = f"{last_day:02d}.{month:02d}.{year}"
|
||||
try:
|
||||
data_fetcher.download_monthly_data(
|
||||
currency_code, start_date, end_date, output_dir=output_dir
|
||||
)
|
||||
# Small delay to be respectful to the API
|
||||
time.sleep(0.5)
|
||||
except Exception as e:
|
||||
debug_print(
|
||||
f"Failed to download data for {currency_code} {month:02d}/{year}: {e}"
|
||||
)
|
||||
|
||||
|
||||
def calculate_tax_yearly_average(year, currency_code, output_dir="data"):
|
||||
"""
|
||||
Vypočítá 'Jednotný kurz' pro daňové účely podle metodiky ČNB.
|
||||
@@ -238,31 +267,8 @@ def calculate_tax_yearly_average(year, currency_code, output_dir="data"):
|
||||
f"Vypočítávám 'Jednotný kurz' pro daňové účely podle metodiky ČNB pro {currency_code} za rok {year}..."
|
||||
)
|
||||
|
||||
# Zkusíme stáhnout chybějící měsíční data
|
||||
missing_months = get_missing_months_for_tax_calculation(year, currency_code)
|
||||
if missing_months:
|
||||
debug_print(
|
||||
f"Nalezeny chybějící měsíce pro rok {year}: {', '.join(f'{m:02d}' for m in missing_months)}. Stahuji měsíční data..."
|
||||
)
|
||||
for month in missing_months:
|
||||
start_date = f"01.{month:02d}.{year}"
|
||||
last_day = calendar.monthrange(year, month)[1]
|
||||
end_date = f"{last_day:02d}.{month:02d}.{year}"
|
||||
debug_print(
|
||||
f"Stahuji měsíční data pro {currency_code} za {month:02d}/{year}..."
|
||||
)
|
||||
data_fetcher.download_monthly_data(
|
||||
currency_code, start_date, end_date, output_dir="data"
|
||||
)
|
||||
# Přidáme zpoždění, abychom nezatěžovali API
|
||||
time.sleep(1)
|
||||
|
||||
# Zkontrolujeme, zda je rok kompletní po stažení dat
|
||||
if not _is_year_complete_for_tax_calculation(year):
|
||||
debug_print(
|
||||
f"Rok {year} není kompletní pro výpočet 'Jednotného kurzu'. Všechny měsíce musí mít dostupné kurzy k posledním dnům."
|
||||
)
|
||||
return None
|
||||
# Auto-download missing monthly data if needed (silent operation)
|
||||
_auto_download_missing_monthly_data(year, currency_code, output_dir)
|
||||
|
||||
# Zkontrolujeme, zda databáze obsahuje data pro daný rok
|
||||
if not rate_finder.check_year_data_in_db(year):
|
||||
|
||||