Compare commits: `ed5d126d77...main` (5 commits)

| SHA1 |
|---|
| d126c5d59d |
| 8be7f745b1 |
| 7ce88e6e4a |
| 65a1485ff9 |
| 7d9dfa309c |
README.md (85 lines changed)
````diff
@@ -18,6 +18,9 @@ This project is intended for downloading and managing exchange rates of foreign currencies against the Cz
 - **Report generation**: A rate report can be generated for a given year, month, or date range, including derived rates for days that have no entry in the input data.
 - **Correct rate derivation**: The program applies the ČNB rules for deriving rates on weekends and public holidays, both for lookups (`--get-rate`) and when generating reports.
 - **'Jednotný kurz' (uniform exchange rate) calculation**: The 'Jednotný kurz' for tax purposes can be calculated according to the ČNB methodology, as the arithmetic mean of the rates on the last day of each month of the year.
+- **Data validation**: The program can check the data for consistency, detect rate changes that exceed a threshold, check the number of trading days, and analyze time gaps in the data.
+- **Record-count analysis**: Record counts can be displayed per time period (week, month, quarter, half-year, year).
+- **Adaptive thresholds**: The system learns from historical data and automatically adjusts the thresholds used for anomaly detection.
 - **JSON output**: All commands support JSON output for programmatic processing via the `--json` switch.

 ## Requirements
````
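The 'Jednotný kurz' bullet above describes a plain arithmetic mean over twelve month-end rates. Below is a minimal sketch of that arithmetic. `uniform_rate` is a hypothetical helper, not part of the project.

It assumes the month-end rates have already been resolved. For a month ending on a weekend or holiday, that is assumed to be the rate carried over from the last preceding ČNB business day, in line with the derivation rules the README mentions.

```python
from statistics import fmean
from typing import Sequence


def uniform_rate(month_end_rates: Sequence[float]) -> float:
    """Arithmetic mean of the rates valid on the last day of each month.

    Hypothetical helper: expects exactly 12 already-resolved month-end
    rates, one per calendar month, in calendar order.
    """
    if len(month_end_rates) != 12:
        raise ValueError("expected one month-end rate per month (12 values)")
    return fmean(month_end_rates)


# Illustrative values only, not real ČNB rates.
print(uniform_rate([24.0] * 6 + [25.0] * 6))  # 24.5
```

Any rounding that tax use may require is deliberately left out, because the README does not specify it.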
````diff
@@ -65,6 +68,11 @@ On every program run:
 - `--report-year ROK [--report-month MESIC]`: Generates a rate report for the given year (and optionally month). Requires `-c` or `--currency`.
 - `--report-period ZACATEK KONEC`: Generates a rate report for the given date range. Requires `-c` or `--currency`.
 - `--stats [ROK]`: Calculates the 'Jednotný kurz' for tax purposes according to the ČNB methodology. If a year is given, it computes the rate for that year; otherwise it computes rates for all years with available data. Requires `-c` or `--currency`.
+- `--validate`: Validates the data for one currency (with `-c`) or for all currencies. Checks rate consistency and the number of trading days, and detects possible errors.
+- `--record-counts`: Shows the number of records per time period (week, month, quarter, half-year, year). Requires `-c` or `--currency`.
+- `--change-threshold PRAH`: Threshold for detecting rate changes, in percent (default: 1.0).
+- `--no-adaptive`: Disables adaptive threshold learning from historical data.
+- `--gap-threshold DNY`: Maximum acceptable gap between consecutive records, in working days (default: 3).
 - `--json`: Output in JSON format instead of plain text, for programmatic processing.

 ### Examples
````
````diff
@@ -125,19 +133,34 @@ On every program run:
 python src/cli.py --stats -c USD
 ```

-12. **Getting the latest available USD rate**:
+12. **Validating USD data for 2025**:
+```bash
+python src/cli.py --validate --currency USD --year 2025
+```
+
+13. **Validating all currencies with custom thresholds**:
+```bash
+python src/cli.py --validate --change-threshold 0.5 --gap-threshold 2
+```
+
+14. **Showing USD record counts by time period**:
+```bash
+python src/cli.py --record-counts --currency USD --year 2025
+```
+
+15. **Getting the latest available USD rate**:
 ```bash
 python src/cli.py -c USD
 ```

-13. **JSON output for a rate lookup**:
+16. **JSON output for a rate lookup**:
 ```bash
 python src/cli.py --get-rate 01.01.2025 -c USD --json
 ```

-14. **JSON output for the 'Jednotný kurz' calculation**:
+17. **JSON output for data validation**:
 ```bash
-python src/cli.py --stats 2025 -c USD --json
+python src/cli.py --validate --currency USD --year 2025 --json
 ```

 ## JSON format
````
````diff
@@ -196,6 +219,60 @@ When the `--json` switch is used, the program returns structured data in the form
 }
 ```

+### Data validation
+```json
+{
+  "currency": "USD",
+  "validation_year": 2025,
+  "adaptive_analysis": {
+    "adaptive_threshold": 1.5,
+    "base_threshold": 1.0,
+    "volatility_percent": 0.24,
+    "data_points": 62
+  },
+  "price_change_violations": [
+    {
+      "date": "06.01.2025",
+      "change_percent": 1.19,
+      "severity": "minor"
+    }
+  ],
+  "temporal_gaps": [],
+  "trading_days_validation": {
+    "expected_trading_days": 251,
+    "actual_data_points": 251,
+    "discrepancy_days": 0,
+    "data_completeness_percent": 100.0
+  },
+  "record_counts_by_period": {
+    "2025": {
+      "year": 251,
+      "half_year": {"H1": 124, "H2": 127},
+      "quarter": {"Q1": 63, "Q2": 61, "Q3": 66, "Q4": 61},
+      "month": {"01": 22, "02": 20, "03": 21},
+      "week": {"W01": 5, "W02": 5}
+    }
+  },
+  "data_quality_score": 95
+}
+```
+
+### Record counts by period
+```json
+{
+  "currency": "USD",
+  "record_counts": {
+    "2025": {
+      "year": 251,
+      "half_year": {"H1": 124, "H2": 127},
+      "quarter": {"Q1": 63, "Q2": 61, "Q3": 66, "Q4": 61},
+      "month": {"01": 22, "02": 20, "03": 21, "04": 20},
+      "week": {"W01": 5, "W02": 5, "W03": 5}
+    }
+  }
+}
+```
+
 ## Behavior for different times and dates

 - **Future date**: The program returns an error, because rates for future dates have not been published yet.
````
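Because every command accepts `--json`, output such as the record-counts object above can be checked by a script. The sketch below uses a hypothetical `check_period_totals` helper and an abridged copy of the sample. The month and week maps are left out because they are only partial in the example.

It confirms that the half-year and quarter counts each add up to the yearly total: 124 + 127 = 63 + 61 + 66 + 61 = 251.

```python
import json


def check_period_totals(year_counts: dict) -> list:
    """Return human-readable inconsistencies in one year's record counts.

    Hypothetical helper: only the half-year and quarter breakdowns are
    compared with the yearly total, since they are complete partitions.
    """
    problems = []
    total = year_counts["year"]
    half_sum = sum(year_counts.get("half_year", {}).values())
    quarter_sum = sum(year_counts.get("quarter", {}).values())
    if half_sum != total:
        problems.append(f"half-years sum to {half_sum}, expected {total}")
    if quarter_sum != total:
        problems.append(f"quarters sum to {quarter_sum}, expected {total}")
    return problems


# Abridged copy of the "Record counts by period" sample above.
SAMPLE = """
{
  "currency": "USD",
  "record_counts": {
    "2025": {
      "year": 251,
      "half_year": {"H1": 124, "H2": 127},
      "quarter": {"Q1": 63, "Q2": 61, "Q3": 66, "Q4": 61}
    }
  }
}
"""

report = json.loads(SAMPLE)
for year, counts in report["record_counts"].items():
    issues = check_period_totals(counts)
    print(year, "OK" if not issues else "; ".join(issues))
```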
src/cli.py (179 lines changed)
```diff
@@ -9,11 +9,12 @@ from datetime import datetime
 # Add the src directory to sys.path so that its modules can be imported
 sys.path.insert(0, os.path.join(os.path.dirname(__file__)))

-import data_fetcher
 import database
+import data_fetcher
 import holidays
 import rate_finder
 import rate_reporter
+import data_validator

 # Global debug flag
 DEBUG = False
```
```diff
@@ -36,6 +37,7 @@ def set_debug_mode(debug):
     holidays.set_debug_mode(DEBUG)
     rate_finder.set_debug_mode(DEBUG)
     rate_reporter.set_debug_mode(DEBUG)
+    data_validator.set_debug_mode(DEBUG)


 def format_single_rate_json(
```
```diff
@@ -195,6 +197,28 @@ def main():
         "Pokud je zadán rok, vytvoří kurz pro konkrétní rok. "
         "Pokud není rok zadán, vytvoří kurzy pro všechny roky s dostupnými daty.",
     )
+    parser.add_argument(
+        "--validate",
+        action="store_true",
+        help="Validuje data pro měnu nebo všechny měny. Zkontroluje konzistenci kurzů a detekuje možné chyby.",
+    )
+    parser.add_argument(
+        "--record-counts",
+        action="store_true",
+        help="Zobrazí počet záznamů podle časových období (týden, měsíc, čtvrtletí, pololetí, rok).",
+    )
+    parser.add_argument(
+        "--change-threshold",
+        type=float,
+        default=1.0,
+        help="Práh pro detekci změn kurzů v procentech (výchozí: 1.0).",
+    )
+    parser.add_argument(
+        "--gap-threshold",
+        type=int,
+        default=3,
+        help="Maximální přijatelná mezera v pracovních dnech (výchozí: 3).",
+    )
     parser.add_argument(
         "--debug", action="store_true", help="Zobrazí podrobné ladicí informace."
     )
```
```diff
@@ -203,20 +227,14 @@ def main():
         action="store_true",
         help="Výstup ve formátu JSON místo prostého textu pro programové zpracování.",
     )
+    parser.add_argument(
+        "--no-adaptive",
+        action="store_true",
+        help="Vypne adaptivní učení prahů na základě historických dat.",
+    )

     args = parser.parse_args()

-    # If no arguments were given, print the help and the list of available currencies
-    if len(sys.argv) == 1:
-        parser.print_help()
-        print("\nDostupné měny:")
-        currencies = database.get_available_currencies()
-        if currencies:
-            print(", ".join(currencies))
-        else:
-            print("Žádné měny nejsou v databázi k dispozici.")
-        sys.exit(0)
-
     # Set debug mode
     DEBUG = args.debug
     set_debug_mode(DEBUG)
```
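The new options follow standard `argparse` conventions:

- `store_true` flags default to `False`.
- Dashes in option names become underscores in attribute names, so `--no-adaptive` becomes `args.no_adaptive`. This is why the dispatch code computes `adaptive = not args.no_adaptive`.

The standalone parser below copies only the validation-related options. `build_validation_parser` is a hypothetical name. The `-c/--currency` and `--year` definitions are assumptions: they exist in `cli.py` but are not shown in this diff, and parsing `--year` as `int` is also assumed.

```python
import argparse


def build_validation_parser() -> argparse.ArgumentParser:
    """Hypothetical standalone copy of the validation-related CLI options."""
    parser = argparse.ArgumentParser(prog="cli.py")
    # Assumed definitions; the real ones are outside this diff.
    parser.add_argument("-c", "--currency")
    parser.add_argument("--year", type=int)
    # Options added by the hunks above (help texts omitted).
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--record-counts", action="store_true")
    parser.add_argument("--change-threshold", type=float, default=1.0)
    parser.add_argument("--gap-threshold", type=int, default=3)
    parser.add_argument("--no-adaptive", action="store_true")
    parser.add_argument("--json", action="store_true")
    return parser


args = build_validation_parser().parse_args(
    ["--validate", "--change-threshold", "0.5", "--gap-threshold", "2"]
)
# Same derivation as in main(): adaptive learning is on unless --no-adaptive is passed.
adaptive = not args.no_adaptive
print(args.validate, args.change_threshold, args.gap_threshold, adaptive)
```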
```diff
@@ -245,14 +263,131 @@ def main():
         pass

     # Argument-processing logic goes here
-    if args.year:
-        debug_print(f"Stahuji roční data pro rok {args.year}...")
-        # Make sure the data directory exists
-        os.makedirs("data", exist_ok=True)
-        # Call the function that downloads yearly data
-        data_fetcher.download_yearly_data(args.year, output_dir="data")
-    elif args.currency and args.start_date and args.end_date and not args.report_period:
+    # Argument-processing logic goes here
+    if args.validate:
+        # Validation command
+        base_threshold = args.change_threshold
+        adaptive = not args.no_adaptive
+        max_gap_days = getattr(args, "gap_threshold", 3)  # Default to 3 if not defined
+
+        if args.currency:
+            # Validate specific currency
+            debug_print(f"Validuji data pro měnu {args.currency}...")
+            results = data_validator.validate_currency_data(
+                args.currency, args.year, base_threshold, adaptive, max_gap_days
+            )
+
+            if args.json:
+                output_json(results)
+            else:
+                text_output = data_validator.format_validation_text(results)
+                print(text_output)
+        else:
+            # Validate all currencies
+            debug_print("Validuji data pro všechny měny...")
+            results = data_validator.validate_all_currencies(
+                args.year, base_threshold, adaptive, max_gap_days
+            )
+
+            if args.json:
+                output_json(results)
+            else:
+                text_output = data_validator.format_validation_text(results)
+                print(text_output)
+    elif args.record_counts:
+        # Record counts command
+        if not args.currency:
+            print(
+                "Chyba: Pro --record-counts je nutné zadat měnu pomocí -c/--currency."
+            )
+            sys.exit(1)
+
+        debug_print(f"Získávám počty záznamů pro měnu {args.currency}...")
+        record_counts = data_validator.get_record_counts_by_period(
+            args.currency, args.year
+        )
+
+        if args.json:
+            output_json({"currency": args.currency, "record_counts": record_counts})
+        else:
+            print(f"Record Counts for {args.currency}:")
+            print("=" * 50)
+
+            for year_key, periods in record_counts.items():
+                print(f"\nYear {year_key}:")
+                print(f" Total records: {periods.get('year', 0)}")
+
+                # Half years
+                half_years = periods.get("half_year", {})
+                if half_years:
+                    print(
+                        f" Half years: H1={half_years.get('H1', 0)}, H2={half_years.get('H2', 0)}"
+                    )
+
+                # Quarters
+                quarters = periods.get("quarter", {})
+                if quarters:
+                    quarter_str = ", ".join(
+                        [f"Q{q}={quarters.get(f'Q{q}', 0)}" for q in range(1, 5)]
+                    )
+                    print(f" Quarters: {quarter_str}")
+
+                # Months
+                months = periods.get("month", {})
+                if months:
+                    month_list = []
+                    for month in range(1, 13):
+                        month_key = f"{month:02d}"
+                        count = months.get(month_key, 0)
+                        month_list.append(f"{month}={count}")
+                    print(f" Months: {', '.join(month_list)}")
+
+                # Weeks summary
+                weeks = periods.get("week", {})
+                if weeks:
+                    total_weeks = len(weeks)
+                    if total_weeks <= 10:
+                        week_list = sorted([f"{w}={weeks[w]}" for w in weeks.keys()])
+                        print(f" Weeks: {', '.join(week_list)}")
+                    else:
+                        sample_weeks = sorted(list(weeks.keys())[:5])
+                        week_sample = [f"{w}={weeks[w]}" for w in sample_weeks]
+                        print(
+                            f" Weeks: {', '.join(week_sample)}... ({total_weeks} total weeks)"
+                        )
+    elif args.year:
+        # Validation command
+        base_threshold = args.change_threshold
+        adaptive = not args.no_adaptive
+
+        if args.currency:
+            # Validate specific currency
+            debug_print(f"Validuji data pro měnu {args.currency}...")
+            results = data_validator.validate_currency_data(
+                args.currency, args.year, base_threshold, adaptive
+            )
+
+            if args.json:
+                output_json(results)
+            else:
+                text_output = data_validator.format_validation_text(results)
+                print(text_output)
+        else:
+            # Validate all currencies
+            debug_print("Validuji data pro všechny měny...")
+            results = data_validator.validate_all_currencies(
+                args.year, base_threshold, adaptive
+            )
+
+            if args.json:
+                output_json(results)
+            else:
+                text_output = data_validator.format_validation_text(results)
+                print(text_output)
+        return
+    # elif args.currency and args.start_date and args.end_date and not args.report_period:
         # Monthly data download
+        debug_print("HIT: Monthly download condition")
         debug_print(
             f"Stahuji měsíční data pro měnu {args.currency} od {args.start_date} do {args.end_date}..."
         )
```
```diff
@@ -264,6 +399,7 @@ def main():
         )
     elif args.report_period and args.currency:
         start_date, end_date = args.report_period
+        debug_print("HIT: Report period condition")
         debug_print(
             f"Generuji report pro měnu {args.currency} od {start_date} do {end_date}..."
         )
```
```diff
@@ -271,12 +407,14 @@ def main():
             start_date, end_date, args.currency, output_dir="data"
         )
     elif args.date:
+        debug_print("HIT: Daily data condition")
         debug_print(f"Stahuji denní data pro datum {args.date}...")
         # Make sure the data directory exists
         os.makedirs("data", exist_ok=True)
         # Call the function that downloads daily data
         data_fetcher.download_daily_data(args.date, output_dir="data")
     elif args.get_rate and args.currency:
+        debug_print("HIT: Get rate condition")
         date_str = args.get_rate
         currency_code = args.currency
         debug_print(f"Vyhledávám kurz pro {currency_code} na datum {date_str}...")
```
```diff
@@ -309,6 +447,7 @@ def main():
                 f"Kurz {currency_code} na datum {date_str} (ani v předchozích dnech) nebyl nalezen."
             )
     elif args.get_rate is not None and not args.currency:
+        debug_print("HIT: Get rate without currency condition")
         # --get-rate was given without a date and without a currency
         if DEBUG:
             print(
```
```diff
@@ -318,7 +457,7 @@ def main():
     # IMPORTANT: The order of the following elif conditions matters!
     # Handle --stats first, and only then the "latest available rate" case
     elif args.stats is not None and args.currency:
-        # --stats with or without a year, plus a currency
+        debug_print("HIT: Stats condition")
         currency_code = args.currency
         if args.stats is True:
             # If --stats is given without a year, compute rates for all years with available data
```
src/data_validator.py (new file, 789 lines added: `@@ -0,0 +1,789 @@`)
```python
import sys
import os
import json
from datetime import datetime, timedelta
from collections import defaultdict
import statistics

# Add the src directory to sys.path so that its modules can be imported
sys.path.insert(0, os.path.join(os.path.dirname(__file__)))

import database
import holidays

# Global debug flag
DEBUG = False


def debug_print(*args, **kwargs):
    """Print debug messages only if debug mode is enabled."""
    if DEBUG:
        print(*args, **kwargs)


def set_debug_mode(debug):
    """Set the debug mode for this module."""
    global DEBUG
    DEBUG = debug
```
```python
def calculate_adaptive_threshold(currency_code, base_threshold=1.0, learning_months=3):
    """
    Calculates adaptive threshold based on 3-month historical volatility.

    :param currency_code: Currency to analyze
    :param base_threshold: Base threshold percentage
    :param learning_months: Months of history to analyze
    :return: Adaptive threshold and volatility statistics
    """
    try:
        # Calculate date range for learning (3 months back)
        end_date = datetime.now()
        start_date = end_date - timedelta(days=learning_months * 30)

        # Get all rates for the period
        rates_data = []
        current_date = start_date

        while current_date <= end_date:
            date_str = current_date.strftime("%d.%m.%Y")
            rate = database.get_rate(date_str, currency_code)
            if rate is not None:
                rates_data.append((current_date, rate))
            current_date += timedelta(days=1)

        if len(rates_data) < 10:
            # Insufficient data, return base threshold
            return {
                "adaptive_threshold": base_threshold,
                "base_threshold": base_threshold,
                "volatility_percent": 0.0,
                "data_points": len(rates_data),
                "sufficient_data": False,
            }

        # Calculate daily percentage changes
        changes = []
        for i in range(1, len(rates_data)):
            prev_rate = rates_data[i - 1][1]
            curr_rate = rates_data[i][1]
            if prev_rate > 0:
                change_pct = abs((curr_rate - prev_rate) / prev_rate) * 100
                changes.append(change_pct)

        if not changes:
            return {
                "adaptive_threshold": base_threshold,
                "base_threshold": base_threshold,
                "volatility_percent": 0.0,
                "data_points": len(rates_data),
                "sufficient_data": True,
            }

        # Calculate volatility metrics
        std_dev = statistics.stdev(changes)
        percentile_95 = statistics.quantiles(changes, n=20)[18]  # 95th percentile

        # Adaptive threshold formula: more conservative of std_dev and percentile_95th/2
        volatility_factor = max(std_dev, percentile_95 / 2)

        # Apply bounds (0.5% to 5.0%)
        adaptive_threshold = base_threshold * (
            1 + min(max(volatility_factor, 0.5), 5.0)
        )

        return {
            "adaptive_threshold": adaptive_threshold,
            "base_threshold": base_threshold,
            "volatility_percent": std_dev,
            "percentile_95": percentile_95,
            "data_points": len(rates_data),
            "sufficient_data": True,
        }

    except Exception as e:
        debug_print(f"Error calculating adaptive threshold: {e}")
        return {
            "adaptive_threshold": base_threshold,
            "base_threshold": base_threshold,
            "volatility_percent": 0.0,
            "data_points": 0,
            "sufficient_data": False,
            "error": str(e),
        }
```
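The threshold formula in `calculate_adaptive_threshold` is easy to misread. The "0.5% to 5.0%" bounds apply to the volatility factor, not to the resulting threshold, so the threshold always lies between 1.5 and 6 times the base.

The hypothetical `adaptive_threshold` below restates the formula. It takes the statistics as plain numbers instead of querying `database`. With the README sample's volatility of 0.24 and an assumed 95th percentile of 0.60, the factor is clamped up to 0.5 and the result is 1.5. That matches `"adaptive_threshold": 1.5` in the sample.

```python
def adaptive_threshold(base_threshold: float, std_dev: float, percentile_95: float) -> float:
    """Same arithmetic as calculate_adaptive_threshold, without the data loading.

    The volatility factor (the larger of the standard deviation and half of
    the 95th percentile of daily % changes) is clamped to [0.5, 5.0] before
    scaling, so the result lies between 1.5x and 6x base_threshold.
    """
    volatility_factor = max(std_dev, percentile_95 / 2)
    return base_threshold * (1 + min(max(volatility_factor, 0.5), 5.0))


print(adaptive_threshold(1.0, 0.24, 0.60))  # 1.5 (factor clamped up to 0.5)
print(adaptive_threshold(1.0, 7.0, 9.0))    # 6.0 (factor clamped down to 5.0)
```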
```python
def calculate_working_days_gap(start_date, end_date):
    """
    Calculate the number of working days (excluding weekends and holidays) between two dates.

    :param start_date: Start date (datetime)
    :param end_date: End date (datetime)
    :return: Number of working days between the dates (exclusive)
    """
    working_days = 0
    current = start_date + timedelta(days=1)  # Start from day after start_date

    while current < end_date:
        date_str = current.strftime("%d.%m.%Y")
        if not holidays.is_weekend(date_str) and not holidays.is_holiday(date_str):
            working_days += 1
        current += timedelta(days=1)

    return working_days
```
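`calculate_working_days_gap` counts only the working days strictly between two consecutive records, so a Friday-to-Monday pair gives 0. The sketch below restates that logic with `datetime.date` values.

`working_days_between` and its `holiday_dates` parameter are hypothetical. They stand in for the project's `holidays.is_weekend` and `holidays.is_holiday` calls, whose implementation is not shown here.

```python
from datetime import date, timedelta


def working_days_between(start: date, end: date, holiday_dates: frozenset = frozenset()) -> int:
    """Count working days strictly between start and end (both excluded).

    Hypothetical restatement of calculate_working_days_gap; holiday_dates
    replaces the project's holidays module.
    """
    count = 0
    current = start + timedelta(days=1)
    while current < end:
        # weekday(): Monday == 0 ... Sunday == 6
        if current.weekday() < 5 and current not in holiday_dates:
            count += 1
        current += timedelta(days=1)
    return count


print(working_days_between(date(2025, 1, 2), date(2025, 1, 8)))  # 3
```

With the default `--gap-threshold 3`, records on Thursday 2 January 2025 and Wednesday 8 January 2025 give a gap of 3. `detect_temporal_gaps` would not report this gap, because it flags only gaps strictly greater than `max_gap_days`.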
```python
def calculate_expected_trading_days(year):
    """
    Calculate the expected number of trading days in a year (excluding weekends and holidays).

    :param year: Year to calculate for
    :return: Dictionary with expected trading days and breakdown
    """
    import calendar

    total_days = 366 if calendar.isleap(year) else 365
    weekend_days = 0
    holiday_days = 0

    # Count weekends and holidays
    for month in range(1, 13):
        for day in range(1, calendar.monthrange(year, month)[1] + 1):
            date_str = f"{day:02d}.{month:02d}.{year}"
            if holidays.is_weekend(date_str):
                weekend_days += 1
            elif holidays.is_holiday(date_str):
                holiday_days += 1

    expected_trading_days = total_days - weekend_days - holiday_days

    return {
        "total_days": total_days,
        "weekend_days": weekend_days,
        "holiday_days": holiday_days,
        "expected_trading_days": expected_trading_days,
    }
```
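`calculate_expected_trading_days` counts a day as a weekend day first, and as a holiday only if it falls on a weekday. Holidays on Saturday or Sunday are therefore not subtracted twice.

The sketch below reproduces that bookkeeping for 2025 with a hard-coded list of Czech public holidays. The list is an assumption made for illustration; the project takes its holidays from its own `holidays` module. The result is 365 - 104 - 10 = 251 expected trading days, the figure shown in the README's validation sample.

```python
import calendar
from datetime import date, timedelta

# Czech public holidays in 2025 (Easter Sunday fell on 20 April 2025).
# Hard-coded here as an illustrative assumption.
CZ_HOLIDAYS_2025 = frozenset({
    date(2025, 1, 1), date(2025, 4, 18), date(2025, 4, 21), date(2025, 5, 1),
    date(2025, 5, 8), date(2025, 7, 5), date(2025, 7, 6), date(2025, 9, 28),
    date(2025, 10, 28), date(2025, 11, 17), date(2025, 12, 24),
    date(2025, 12, 25), date(2025, 12, 26),
})


def expected_trading_days(year: int, holiday_dates: frozenset) -> dict:
    """Weekend days are counted first; a holiday counts only on a weekday."""
    total_days = 366 if calendar.isleap(year) else 365
    weekend_days = holiday_days = 0
    day = date(year, 1, 1)
    while day.year == year:
        if day.weekday() >= 5:
            weekend_days += 1
        elif day in holiday_dates:
            holiday_days += 1
        day += timedelta(days=1)
    return {
        "total_days": total_days,
        "weekend_days": weekend_days,
        "holiday_days": holiday_days,
        "expected_trading_days": total_days - weekend_days - holiday_days,
    }


print(expected_trading_days(2025, CZ_HOLIDAYS_2025))
```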
```python
def validate_trading_days_count(currency_code, year):
    """
    Validate that a year has the appropriate number of trading day entries.

    :param currency_code: Currency to validate
    :param year: Year to check
    :return: Validation result with actual vs expected counts
    """
    # Get expected trading days
    expected = calculate_expected_trading_days(year)

    # Count actual data points for the year
    actual_count = 0
    rates_data = []

    start_date = datetime(year, 1, 1)
    end_date = datetime(year, 12, 31)

    current_date = start_date
    while current_date <= end_date:
        date_str = current_date.strftime("%d.%m.%Y")
        rate = database.get_rate(date_str, currency_code)
        if rate is not None:
            actual_count += 1
            rates_data.append((current_date, rate, date_str))
        current_date += timedelta(days=1)

    # Calculate discrepancy
    discrepancy_days = actual_count - expected["expected_trading_days"]
    discrepancy_percent = (
        (discrepancy_days / expected["expected_trading_days"]) * 100
        if expected["expected_trading_days"] > 0
        else 0
    )

    # Determine severity
    severity = "ok"
    if abs(discrepancy_percent) > 15:
        severity = "severe"
    elif abs(discrepancy_percent) > 5:
        severity = "moderate"
    elif abs(discrepancy_percent) > 0:
        severity = "minor"

    return {
        "expected_trading_days": expected["expected_trading_days"],
        "actual_data_points": actual_count,
        "discrepancy_days": discrepancy_days,
        "discrepancy_percent": round(discrepancy_percent, 2),
        "severity": severity,
        "total_days": expected["total_days"],
        "weekend_days_excluded": expected["weekend_days"],
        "holiday_days_excluded": expected["holiday_days"],
        "data_completeness_percent": round(
            (actual_count / expected["expected_trading_days"]) * 100, 1
        )
        if expected["expected_trading_days"] > 0
        else 0,
    }
```
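The severity cut-offs in `validate_trading_days_count` apply to the absolute discrepancy. A surplus of records (for example, rates stored for weekend days) is therefore graded the same way as missing records.

A small pure-function restatement, `classify_discrepancy` (a hypothetical name), makes the boundaries easy to check. Against 251 expected days, 248 records grade as `minor`, 230 as `moderate`, and 200 as `severe`.

```python
from typing import Tuple


def classify_discrepancy(actual: int, expected: int) -> Tuple[float, str]:
    """Discrepancy percentage and severity, using the cut-offs from
    validate_trading_days_count: >15 % severe, >5 % moderate, >0 % minor."""
    percent = (actual - expected) / expected * 100 if expected > 0 else 0.0
    magnitude = abs(percent)
    if magnitude > 15:
        severity = "severe"
    elif magnitude > 5:
        severity = "moderate"
    elif magnitude > 0:
        severity = "minor"
    else:
        severity = "ok"
    return round(percent, 2), severity


for actual in (251, 248, 230, 200, 260):
    print(actual, classify_discrepancy(actual, 251))
```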
```python
def get_record_counts_by_period(currency_code, year=None):
    """
    Get record counts for different time periods.

    :param currency_code: Currency to analyze
    :param year: Optional year filter
    :return: Dictionary with counts by period
    """
    if year:
        years_to_check = [year]
    else:
        years_to_check = database.get_years_with_data()
        if not years_to_check:
            return {}

    results = {}

    for check_year in years_to_check:
        year_results = {}

        # Get all data for the year
        data_points = []
        start_date = datetime(check_year, 1, 1)
        end_date = datetime(check_year, 12, 31)

        current_date = start_date
        while current_date <= end_date:
            date_str = current_date.strftime("%d.%m.%Y")
            rate = database.get_rate(date_str, currency_code)
            if rate is not None:
                data_points.append((current_date, rate))
            current_date += timedelta(days=1)

        # Count by different periods
        period_counts = {
            "year": len(data_points),
            "half_year": {},
            "quarter": {},
            "month": {},
            "week": {},
        }

        # Half years
        period_counts["half_year"]["H1"] = len(
            [d for d in data_points if d[0].month <= 6]
        )
        period_counts["half_year"]["H2"] = len(
            [d for d in data_points if d[0].month > 6]
        )

        # Quarters
        for quarter in range(1, 5):
            start_month = (quarter - 1) * 3 + 1
            end_month = quarter * 3
            period_counts["quarter"][f"Q{quarter}"] = len(
                [d for d in data_points if start_month <= d[0].month <= end_month]
            )

        # Months
        for month in range(1, 13):
            period_counts["month"][f"{month:02d}"] = len(
                [d for d in data_points if d[0].month == month]
            )

        # Weeks (approximate by week number)
        week_counts = {}
        for data_point in data_points:
            week_num = data_point[0].isocalendar()[1]
            week_key = f"W{week_num:02d}"
            week_counts[week_key] = week_counts.get(week_key, 0) + 1
        period_counts["week"] = week_counts

        results[str(check_year)] = period_counts

    return results
```
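The week counts in `get_record_counts_by_period` are keyed by ISO week number alone, and ISO weeks do not respect calendar-year boundaries. 2 January 2025 and 30 December 2025 both fall in ISO week 1 (of 2025 and 2026 respectively), so records from both ends of the year land in the same `W01` bucket for 2025.

The sketch contrasts the key the project builds with a hypothetical alternative that also includes the ISO year.

```python
from datetime import date


def week_key(day: date) -> str:
    """Key built the way get_record_counts_by_period does: ISO week number only."""
    return f"W{day.isocalendar()[1]:02d}"


def week_key_with_year(day: date) -> str:
    """Hypothetical alternative that keeps the ISO year, so weeks straddling
    New Year are not merged into week 1 (or 52/53) of the same calendar year."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


for d in (date(2025, 1, 2), date(2025, 12, 30)):
    print(d, week_key(d), week_key_with_year(d))
```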
```python
def detect_temporal_gaps(currency_code, year=None, max_gap_days=3):
    """
    Detect temporal gaps in data sequence (missing working days).

    :param currency_code: Currency to validate
    :param year: Optional year filter
    :param max_gap_days: Maximum acceptable working days gap
    :return: List of gap violations
    """
    gaps = []

    try:
        # Get all dates and rates for the currency/year
        rates_data = []
        if year:
            # Specific year
            start_date = datetime(year, 1, 1)
            end_date = datetime(year, 12, 31)
        else:
            # All available data
            years_with_data = database.get_years_with_data()
            if not years_with_data:
                return gaps
            start_year = min(years_with_data)
            end_year = max(years_with_data)
            start_date = datetime(start_year, 1, 1)
            end_date = datetime(end_year, 12, 31)

        current_date = start_date
        while current_date <= datetime.now() and current_date <= end_date:
            date_str = current_date.strftime("%d.%m.%Y")
            rate = database.get_rate(date_str, currency_code)
            if rate is not None:
                rates_data.append((current_date, rate, date_str))
            current_date += timedelta(days=1)

        # Check for gaps between consecutive data points
        for i in range(1, len(rates_data)):
            prev_date, _, prev_date_str = rates_data[i - 1]
            curr_date, _, curr_date_str = rates_data[i]

            # Calculate working days gap
            working_days_gap = calculate_working_days_gap(prev_date, curr_date)

            if working_days_gap > max_gap_days:
                # Determine severity
                severity = "minor"
                if working_days_gap > max_gap_days * 3:
                    severity = "severe"
                elif working_days_gap > max_gap_days * 2:
                    severity = "moderate"

                gap = {
                    "start_date": prev_date_str,
                    "end_date": curr_date_str,
                    "working_days_missing": working_days_gap,
                    "severity": severity,
                    "max_expected_gap": max_gap_days,
                    "recommendation": f"Check data source for {working_days_gap} missing working days",
                }
                gaps.append(gap)

    except Exception as e:
        debug_print(f"Error detecting temporal gaps: {e}")

    return gaps
```
|
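The code above calls `calculate_working_days_gap`, but this diff does not show its definition. The sketch below is a hypothetical stand-in. It assumes that the gap is the number of Monday-to-Friday days strictly between two consecutive data points, minus an optional set of holidays. The real helper may count differently. `gap_severity` reproduces the severity bands used above and is only meaningful once the gap already exceeds `max_gap_days`.

```python
from datetime import date, timedelta


def working_days_between(prev_date, curr_date, holidays=frozenset()):
    """Count Mon-Fri days strictly between two dates, skipping the given holidays.

    Hypothetical stand-in for calculate_working_days_gap; the real helper may
    include the end date or consult a holiday calendar.
    """
    count = 0
    day = prev_date + timedelta(days=1)
    while day < curr_date:
        if day.weekday() < 5 and day not in holidays:  # 0-4 = Monday-Friday
            count += 1
        day += timedelta(days=1)
    return count


def gap_severity(working_days_gap, max_gap_days=3):
    """Same bands as detect_temporal_gaps: >3x severe, >2x moderate, otherwise minor."""
    if working_days_gap > max_gap_days * 3:
        return "severe"
    if working_days_gap > max_gap_days * 2:
        return "moderate"
    return "minor"
```

With `max_gap_days=3`, a Friday-to-Monday pair yields 0 and is never reported. A gap of 7 working days is rated `moderate`, and a gap of 10 is rated `severe`.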
def detect_price_change_violations(
    currency_code, year=None, base_threshold=1.0, adaptive=True
):
    """
    Detect day-to-day rate changes that exceed the configured thresholds.

    :param currency_code: Currency to validate
    :param year: Optional year filter
    :param base_threshold: Base threshold, in percent
    :param adaptive: Whether to use the adaptive threshold
    :return: Tuple (list of violations, adaptive threshold info)
    """
    violations = []

    # Default adaptive_info, used when adaptive mode is off or an exception occurs early
    adaptive_info = {
        "adaptive_threshold": base_threshold,
        "base_threshold": base_threshold,
        "volatility_percent": 0.0,
        "sufficient_data": True,
    }

    try:
        # Get the adaptive threshold if enabled
        if adaptive:
            adaptive_info = calculate_adaptive_threshold(currency_code, base_threshold)

        effective_threshold = adaptive_info["adaptive_threshold"]

        # Collect all dates and rates for the currency/year
        rates_data = []
        if year:
            # Specific year
            start_date = datetime(year, 1, 1)
            end_date = datetime(year, 12, 31)
        else:
            # All available data
            years_with_data = database.get_years_with_data()
            if not years_with_data:
                return violations, adaptive_info
            start_year = min(years_with_data)
            end_year = max(years_with_data)
            start_date = datetime(start_year, 1, 1)
            end_date = datetime(end_year, 12, 31)

        current_date = start_date
        while current_date <= datetime.now() and current_date <= end_date:
            date_str = current_date.strftime("%d.%m.%Y")
            rate = database.get_rate(date_str, currency_code)
            if rate is not None:
                rates_data.append((current_date, rate, date_str))
            current_date += timedelta(days=1)

        # Check consecutive pairs
        for i in range(1, len(rates_data)):
            prev_date, prev_rate, prev_date_str = rates_data[i - 1]
            curr_date, curr_rate, curr_date_str = rates_data[i]

            if prev_rate > 0:
                change_pct = abs((curr_rate - prev_rate) / prev_rate) * 100

                # Determine severity relative to the effective threshold
                severity = "minor"
                if change_pct > effective_threshold * 3:
                    severity = "severe"
                elif change_pct > effective_threshold:
                    severity = "moderate"

                # Flag every change above the base threshold and record whether
                # it also exceeded the adaptive (effective) threshold
                if change_pct > base_threshold:
                    violation = {
                        "date": curr_date_str,
                        "previous_date": prev_date_str,
                        "previous_rate": float(prev_rate),
                        "current_rate": float(curr_rate),
                        "change_percent": round(change_pct, 2),
                        "severity": severity,
                        "threshold_exceeded": "adaptive"
                        if change_pct > effective_threshold
                        else "base",
                        "effective_threshold": effective_threshold,
                    }

                    # Add a corruption risk assessment for severe cases
                    if severity == "severe":
                        violation["corruption_risk"] = "high"
                        violation["recommendation"] = (
                            "Verify data source - potential currency mismatch or data corruption"
                        )

                    violations.append(violation)

    except Exception as e:
        debug_print(f"Error detecting price changes: {e}")

    return violations, adaptive_info


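The per-pair rule can be isolated into a small pure function. This makes it easy to test how the base threshold and the effective (adaptive) threshold interact. The sketch mirrors the logic above but is not the module's API. `classify_price_change` is a hypothetical name. The `direction` field is an illustrative addition, needed because `change_percent` itself is stored as an absolute value.

```python
def classify_price_change(prev_rate, curr_rate, base_threshold=1.0, effective_threshold=1.0):
    """Mirror of the per-pair check in detect_price_change_violations.

    Returns None when the change does not exceed the base threshold.
    """
    if prev_rate <= 0:
        return None
    change_pct = abs((curr_rate - prev_rate) / prev_rate) * 100
    if change_pct <= base_threshold:
        return None
    if change_pct > effective_threshold * 3:
        severity = "severe"
    elif change_pct > effective_threshold:
        severity = "moderate"
    else:
        severity = "minor"
    return {
        "change_percent": round(change_pct, 2),
        "direction": "+" if curr_rate >= prev_rate else "-",  # illustrative addition
        "severity": severity,
        "threshold_exceeded": "adaptive" if change_pct > effective_threshold else "base",
    }
```

The rule is asymmetric. When the adaptive threshold is above the base, a change between the two is reported as `minor` with `threshold_exceeded == "base"`. When the adaptive threshold is below the base, changes between the two are not reported at all.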
def validate_currency_data(
    currency_code, year=None, base_threshold=1.0, adaptive=True, max_gap_days=3
):
    """
    Run all validation checks for a single currency.

    :param currency_code: Currency to validate
    :param year: Optional year filter
    :param base_threshold: Base threshold for price changes, in percent
    :param adaptive: Whether to use adaptive thresholds
    :param max_gap_days: Maximum acceptable gap, in working days
    :return: Validation results
    """
    results = {
        "currency": currency_code,
        "validation_year": year,
        # Local time with an explicit UTC offset (a bare "Z" suffix would wrongly claim UTC)
        "validation_date": datetime.now().astimezone().isoformat(),
    }

    try:
        # Price change violations
        violations, adaptive_info = detect_price_change_violations(
            currency_code, year, base_threshold, adaptive
        )

        # Temporal gaps
        gaps = detect_temporal_gaps(currency_code, year, max_gap_days)

        # Trading days validation (computed only when a specific year is given)
        trading_days_validation = None
        if year:
            trading_days_validation = validate_trading_days_count(currency_code, year)

        # Record counts by period
        record_counts = get_record_counts_by_period(currency_code, year)

        results["adaptive_analysis"] = adaptive_info
        results["price_change_violations"] = violations
        results["temporal_gaps"] = gaps
        results["trading_days_validation"] = trading_days_validation
        results["record_counts_by_period"] = record_counts

        # Summary statistics
        severity_counts = defaultdict(int)
        for v in violations:
            severity_counts[v["severity"]] += 1

        gap_severity_counts = defaultdict(int)
        for g in gaps:
            gap_severity_counts[g["severity"]] += 1

        results["summary"] = {
            "total_violations": len(violations),
            "total_gaps": len(gaps),
            "severity_breakdown": dict(severity_counts),
            "gap_severity_breakdown": dict(gap_severity_counts),
            "base_threshold": base_threshold,
            "adaptive_enabled": adaptive,
            "max_gap_days": max_gap_days,
        }

        # Data quality score: start at 100, subtract heuristic penalties, floor at 0
        quality_penalty = 0
        if violations:
            quality_penalty += (
                len(violations) * 5 + severity_counts.get("severe", 0) * 20
            )
        if gaps:
            quality_penalty += (
                len(gaps) * 10 + gap_severity_counts.get("severe", 0) * 30
            )
        if trading_days_validation and trading_days_validation["severity"] != "ok":
            severity_penalty = {"minor": 5, "moderate": 15, "severe": 30}
            quality_penalty += severity_penalty.get(
                trading_days_validation["severity"], 0
            )

        results["data_quality_score"] = max(0, 100 - quality_penalty)

    except Exception as e:
        results["error"] = str(e)
        results["data_quality_score"] = 0

    return results


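The data quality score is a penalty heuristic. Each violation costs 5 points, plus 20 more if it is severe. Each gap costs 10 points, plus 30 more if it is severe. A minor, moderate or severe trading-days discrepancy costs 5, 15 or 30 points. The result is floored at 0. Below is a standalone sketch of that arithmetic. `quality_score` is hypothetical and takes the counts directly instead of the full result lists.

```python
TRADING_DAYS_PENALTY = {"minor": 5, "moderate": 15, "severe": 30}


def quality_score(violations, severe_violations, gaps, severe_gaps, trading_days_severity="ok"):
    """Reproduce the penalty arithmetic of validate_currency_data from plain counts."""
    penalty = violations * 5 + severe_violations * 20
    penalty += gaps * 10 + severe_gaps * 30
    # "ok" (or a missing trading-days check) is not in the table and costs nothing
    penalty += TRADING_DAYS_PENALTY.get(trading_days_severity, 0)
    return max(0, 100 - penalty)
```

For example, take two violations (one severe), one minor gap and a minor trading-days discrepancy. The score is 100 - (10 + 20 + 10 + 5) = 55.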
def validate_all_currencies(
    year=None, base_threshold=1.0, adaptive=True, max_gap_days=3
):
    """
    Validate a fixed set of major currencies (USD, EUR, GBP, CHF, JPY).

    :param year: Optional year filter
    :param base_threshold: Base threshold for price changes, in percent
    :param adaptive: Whether to use adaptive thresholds
    :param max_gap_days: Maximum acceptable gap, in working days
    :return: Validation results for all checked currencies
    """
    results = {
        "validation_type": "all_currencies",
        "validation_year": year,
        "base_threshold": base_threshold,
        "adaptive_enabled": adaptive,
        "max_gap_days": max_gap_days,
        # Local time with an explicit UTC offset (a bare "Z" suffix would wrongly claim UTC)
        "validation_date": datetime.now().astimezone().isoformat(),
        "currency_results": [],
    }

    try:
        # Only this fixed list of major currencies is checked; other currencies
        # present in the database are not validated
        currencies_to_check = ["USD", "EUR", "GBP", "CHF", "JPY"]

        for currency in currencies_to_check:
            try:
                currency_result = validate_currency_data(
                    currency, year, base_threshold, adaptive, max_gap_days
                )
                results["currency_results"].append(currency_result)
            except Exception as e:
                results["currency_results"].append(
                    {"currency": currency, "error": str(e)}
                )

        # Overall summary (entries without a "summary" key, e.g. failed runs, are skipped)
        total_violations = sum(
            r.get("summary", {}).get("total_violations", 0)
            for r in results["currency_results"]
            if "summary" in r
        )
        total_gaps = sum(
            r.get("summary", {}).get("total_gaps", 0)
            for r in results["currency_results"]
            if "summary" in r
        )
        severe_violations = sum(
            r.get("summary", {}).get("severity_breakdown", {}).get("severe", 0)
            for r in results["currency_results"]
            if "summary" in r
        )
        severe_gaps = sum(
            r.get("summary", {}).get("gap_severity_breakdown", {}).get("severe", 0)
            for r in results["currency_results"]
            if "summary" in r
        )

        results["overall_summary"] = {
            "currencies_checked": len(results["currency_results"]),
            "total_violations": total_violations,
            "total_gaps": total_gaps,
            "severe_violations": severe_violations,
            "severe_gaps": severe_gaps,
        }

    except Exception as e:
        results["error"] = str(e)

    return results


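The four sums over `currency_results` can share a single filtered list of summaries. Below is a sketch with the same semantics; `summarize_currency_results` is a hypothetical helper name. Entries without a `summary`, such as currencies whose validation raised an exception, still count towards `currencies_checked` but add nothing to the totals.

```python
def summarize_currency_results(currency_results):
    """Aggregate per-currency summaries the same way validate_all_currencies does."""
    summaries = [r["summary"] for r in currency_results if "summary" in r]
    return {
        "currencies_checked": len(currency_results),
        "total_violations": sum(s.get("total_violations", 0) for s in summaries),
        "total_gaps": sum(s.get("total_gaps", 0) for s in summaries),
        "severe_violations": sum(
            s.get("severity_breakdown", {}).get("severe", 0) for s in summaries
        ),
        "severe_gaps": sum(
            s.get("gap_severity_breakdown", {}).get("severe", 0) for s in summaries
        ),
    }
```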
def format_validation_text(results):
    """Format validation results as text output."""
    output = []

    if "currency" in results:
        # Single currency validation
        # validation_year is stored as None when no year filter is used
        output.append(
            f"Currency Validation: {results['currency']} ({results.get('validation_year') or 'All Years'})"
        )
        output.append("=" * 60)

        adaptive = results.get("adaptive_analysis", {})
        if adaptive.get("sufficient_data", False):
            output.append("\nAdaptive Analysis (3-month history):")
            output.append(
                f"- Historical volatility: {adaptive.get('volatility_percent', 0):.1f}% std dev"
            )
            output.append(
                f"- Adaptive threshold: {adaptive.get('adaptive_threshold', 1.0):.1f}% (base: {adaptive.get('base_threshold', 1.0)}%)"
            )
            output.append(f"- Data points analyzed: {adaptive.get('data_points', 0)}")
        else:
            output.append(
                f"\nAdaptive Analysis: Insufficient data (using base threshold: {adaptive.get('base_threshold', 1.0)}%)"
            )

        violations = results.get("price_change_violations", [])
        if violations:
            output.append("\nPrice Change Violations:")
            for i, v in enumerate(violations, 1):
                severity = v["severity"].upper()
                # change_percent is stored as an absolute value, so take the sign from the rates
                direction = "+" if v["current_rate"] >= v["previous_rate"] else "-"
                output.append(
                    f"{i}. [{severity}] {v['date']}: {v['previous_rate']:.2f} → {v['current_rate']:.2f} ({direction}{v['change_percent']:.2f}%)"
                )
                if "recommendation" in v:
                    output.append(f" → {v['recommendation']}")
        else:
            output.append("\nPrice Change Violations: None found")

        gaps = results.get("temporal_gaps", [])
        if gaps:
            output.append("\nTemporal Gaps:")
            for i, g in enumerate(gaps, 1):
                severity = g["severity"].upper()
                output.append(
                    f"{i}. [{severity}] {g['start_date']} → {g['end_date']}: {g['working_days_missing']} working days missing"
                )
                if "recommendation" in g:
                    output.append(f" → {g['recommendation']}")
        else:
            output.append("\nTemporal Gaps: None found")

        # Trading days validation
        trading_validation = results.get("trading_days_validation")
        if trading_validation:
            output.append("\nTrading Days Validation:")
            output.append(
                f"- Expected trading days: {trading_validation['expected_trading_days']} ({trading_validation.get('total_days', 'N/A')} total - {trading_validation.get('weekend_days_excluded', 0)} weekends - {trading_validation.get('holiday_days_excluded', 0)} holidays)"
            )
            output.append(
                f"- Actual data points: {trading_validation['actual_data_points']}"
            )
            output.append(
                f"- Discrepancy: {trading_validation['discrepancy_days']} days ({trading_validation['discrepancy_percent']}%)"
            )
            output.append(
                f"- Data completeness: {trading_validation['data_completeness_percent']}%"
            )
            output.append(f"- Status: {trading_validation['severity'].upper()}")

        # Record counts by period
        record_counts = results.get("record_counts_by_period", {})
        if record_counts:
            for year_key, periods in record_counts.items():
                output.append(f"\nRecord Counts for {year_key}:")
                output.append(f"- Year total: {periods.get('year', 0)} records")

                # Half years
                half_years = periods.get("half_year", {})
                if half_years:
                    output.append(
                        f"- Half years: H1={half_years.get('H1', 0)}, H2={half_years.get('H2', 0)}"
                    )

                # Quarters
                quarters = periods.get("quarter", {})
                if quarters:
                    quarter_str = ", ".join(
                        f"Q{q}={quarters.get(f'Q{q}', 0)}" for q in range(1, 5)
                    )
                    output.append(f"- Quarters: {quarter_str}")

                # Months summary
                months = periods.get("month", {})
                if months:
                    month_list = [
                        f"{m:02d}={months.get(f'{m:02d}', 0)}" for m in range(1, 13)
                    ]
                    output.append(f"- Months: {', '.join(month_list)}")

                # Weeks summary (show the first few and indicate the total)
                weeks = periods.get("week", {})
                if weeks:
                    total_weeks = len(weeks)
                    if total_weeks <= 10:
                        week_list = [f"{w}={weeks[w]}" for w in sorted(weeks)]
                        output.append(f"- Weeks: {', '.join(week_list)}")
                    else:
                        # Sort before slicing so the sample really is the first five week keys
                        sample_weeks = sorted(weeks)[:5]
                        week_sample = [f"{w}={weeks[w]}" for w in sample_weeks]
                        output.append(
                            f"- Weeks: {', '.join(week_sample)}... ({total_weeks} total weeks)"
                        )

        summary = results.get("summary", {})
        quality_score = results.get("data_quality_score", 0)
        output.append(f"\nData Quality Score: {quality_score}%")
        output.append(f"Total violations: {summary.get('total_violations', 0)}")
        output.append(f"Total gaps: {summary.get('total_gaps', 0)}")

    elif "currency_results" in results:
        # Multi-currency validation
        output.append("Multi-Currency Validation Report")
        output.append("=" * 60)

        for currency_result in results["currency_results"]:
            currency = currency_result.get("currency", "Unknown")
            violations = currency_result.get("price_change_violations", [])
            quality_score = currency_result.get("data_quality_score", 0)

            output.append(f"\n{currency}:")
            output.append(f" - Violations: {len(violations)}")
            output.append(f" - Quality Score: {quality_score}%")

            if violations:
                severe_count = sum(1 for v in violations if v["severity"] == "severe")
                output.append(f" - Severe violations: {severe_count}")

        overall = results.get("overall_summary", {})
        output.append("\nOverall Summary:")
        output.append(f"- Currencies checked: {overall.get('currencies_checked', 0)}")
        output.append(f"- Total violations: {overall.get('total_violations', 0)}")
        output.append(f"- Severe violations: {overall.get('severe_violations', 0)}")

    return "\n".join(output)

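`detect_price_change_violations` stores `change_percent` as `abs(...)`, so the sign shown in a report has to come from comparing the two rates. Below is a minimal sketch of one report line under that assumption. `format_violation_line` is a hypothetical helper, and its layout follows the single-currency text format produced by `format_validation_text`.

```python
def format_violation_line(index, violation):
    """Render one price-change violation; change_percent is an absolute value."""
    direction = "+" if violation["current_rate"] >= violation["previous_rate"] else "-"
    line = (
        f"{index}. [{violation['severity'].upper()}] {violation['date']}: "
        f"{violation['previous_rate']:.2f} → {violation['current_rate']:.2f} "
        f"({direction}{violation['change_percent']:.2f}%)"
    )
    if "recommendation" in violation:
        line += f"\n → {violation['recommendation']}"
    return line


print(format_violation_line(1, {
    "severity": "moderate", "date": "05.01.2024",
    "previous_rate": 25.0, "current_rate": 24.5, "change_percent": 2.0,
}))
```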
@@ -224,6 +224,35 @@ def _is_year_complete_for_tax_calculation(year):
     return True
 
 
+def _auto_download_missing_monthly_data(year, currency_code, output_dir="data"):
+    """
+    Automatically download missing monthly data for the tax calculation (silent operation).
+
+    :param year: Year to check
+    :param currency_code: Currency code
+    :param output_dir: Output directory
+    """
+    missing_months = get_missing_months_for_tax_calculation(year, currency_code)
+    if missing_months:
+        debug_print(
+            f"Auto-downloading missing monthly data for {currency_code} {year}: months {', '.join(f'{m:02d}' for m in missing_months)}"
+        )
+        for month in missing_months:
+            start_date = f"01.{month:02d}.{year}"
+            last_day = calendar.monthrange(year, month)[1]
+            end_date = f"{last_day:02d}.{month:02d}.{year}"
+            try:
+                data_fetcher.download_monthly_data(
+                    currency_code, start_date, end_date, output_dir=output_dir
+                )
+                # Small delay to be respectful to the API
+                time.sleep(0.5)
+            except Exception as e:
+                debug_print(
+                    f"Failed to download data for {currency_code} {month:02d}/{year}: {e}"
+                )
+
+
 def calculate_tax_yearly_average(year, currency_code, output_dir="data"):
     """
     Calculates the 'Jednotný kurz' (unified exchange rate) for tax purposes according to the ČNB methodology.
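The new helper's per-month loop comes down to two steps. First, it builds a `DD.MM.YYYY` date range with `calendar.monthrange`. Second, it calls the downloader, pausing between requests and continuing on failure. The sketch below injects the downloader as a callable so that it runs without network access. `month_date_range`, `download_months` and the `download(start, end)` signature are assumptions, not `data_fetcher`'s actual interface.

```python
import calendar
import time


def month_date_range(year, month):
    """Return (first day, last day) of a month as DD.MM.YYYY strings."""
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, number of days)
    return f"01.{month:02d}.{year}", f"{last_day:02d}.{month:02d}.{year}"


def download_months(year, months, download, delay=0.5):
    """Call download(start, end) for each month; collect failures instead of stopping."""
    failed = []
    for month in months:
        start, end = month_date_range(year, month)
        try:
            download(start, end)
            time.sleep(delay)  # pause between successful requests, as in the helper above
        except Exception as exc:
            failed.append((month, str(exc)))
    return failed
```

Returning the failed months, rather than only logging them, would let the caller decide whether the year is still usable for the tax calculation.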
@@ -238,31 +267,8 @@ def calculate_tax_yearly_average(year, currency_code, output_dir="data"):
         f"Calculating the 'Jednotný kurz' for tax purposes according to the ČNB methodology for {currency_code} for the year {year}..."
     )
 
-    # Try to download the missing monthly data
-    missing_months = get_missing_months_for_tax_calculation(year, currency_code)
-    if missing_months:
-        debug_print(
-            f"Found missing months for year {year}: {', '.join(f'{m:02d}' for m in missing_months)}. Downloading monthly data..."
-        )
-        for month in missing_months:
-            start_date = f"01.{month:02d}.{year}"
-            last_day = calendar.monthrange(year, month)[1]
-            end_date = f"{last_day:02d}.{month:02d}.{year}"
-            debug_print(
-                f"Downloading monthly data for {currency_code} for {month:02d}/{year}..."
-            )
-            data_fetcher.download_monthly_data(
-                currency_code, start_date, end_date, output_dir="data"
-            )
-            # Add a delay so we don't overload the API
-            time.sleep(1)
-
-    # Check whether the year is complete after downloading the data
-    if not _is_year_complete_for_tax_calculation(year):
-        debug_print(
-            f"Year {year} is not complete for calculating the 'Jednotný kurz'. Every month must have a rate available for its last day."
-        )
-        return None
-
+    # Auto-download missing monthly data if needed (silent operation)
+    _auto_download_missing_monthly_data(year, currency_code, output_dir)
+
     # Check whether the database contains data for the given year
     if not rate_finder.check_year_data_in_db(year):
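`calculate_tax_yearly_average` computes the 'Jednotný kurz', which the project README defines as the arithmetic mean of the rates on the last day of each month of the year. Below is a minimal sketch of that formula. It makes two assumptions. First, rates are held in a dict keyed by `datetime.date`. Second, a month-end that falls on a weekend or holiday takes the most recent earlier published rate; this fill-in rule is assumed here and is not shown in this diff. `rate_for_day` and `unified_rate` are hypothetical helpers. Rounding is left to the caller because the source does not specify it.

```python
import calendar
from datetime import date, timedelta


def rate_for_day(rates, day, max_lookback=10):
    """Return the rate valid on `day`: the latest rate on or before it (assumed fill-in rule)."""
    current = day
    for _ in range(max_lookback + 1):
        if current in rates:
            return rates[current]
        current -= timedelta(days=1)
    raise KeyError(f"no rate within {max_lookback} days before {day}")


def unified_rate(rates, year):
    """Arithmetic mean of the rates valid on the last calendar day of each month."""
    month_end_rates = []
    for month in range(1, 13):
        last_day = date(year, month, calendar.monthrange(year, month)[1])
        month_end_rates.append(rate_for_day(rates, last_day))
    return sum(month_end_rates) / 12
```

For example, 30 April 2023 was a Sunday, so under the assumed rule its value would come from the rate published on Friday 28 April.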