
Scoring API

Real-time ML scoring endpoints — first-user payer conversion (Starter Pack popup). Six features, 30-min post-install scoring, top-N% bucket output for in-session targeting.


Overview

The Scoring API exposes real-time machine-learning endpoints for in-session player-targeting decisions. The first available model is the First-User Payer Conversion scorer — designed to drive the Starter Pack popup decision at the 30-minute post-install mark.

Send six behavioural features from a new install's first 30 minutes and get back two things: a calibrated score in [0,1] (an isotonic-calibrated probability that the user pays within D1) and a bucket label (top_1pct / top_5pct / top_10pct / rest) for operational thresholding. Firing the popup on top_5pct-or-higher targets the ~6% of installs that convert at ~26% (≈8.8× the base rate).

Endpoint	Method	Description
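`/v1/scoring/first-user-conversion/30m`	POST	Score D1 payer probability from first-30-min behaviour (in-session)
`/v1/scoring/first-user-conversion/12h`	POST	Next-session variant of the same scorer

The bare `/v1/scoring/first-user-conversion` path returns 404.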

Scope

The current artifact was trained on iOS US users only. Calls for Android or non-US users will return a score, but the model has never seen that distribution, so block off-distribution traffic at your gateway. Cross-platform retrains are planned as new data drops arrive.
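A gateway-side guard for the restriction above could look like the following minimal sketch. The `platform` and `country` arguments and their value conventions (`"ios"`, ISO 3166-1 alpha-2 `"US"`) are assumptions for illustration; the Scoring API itself does not accept these fields.

```python
def is_in_distribution(platform: str, country: str) -> bool:
    """Return True only for traffic matching the training cohort (iOS, US).

    Both arguments are hypothetical gateway-side attributes, not API fields.
    """
    return platform.strip().lower() == "ios" and country.strip().upper() == "US"
```

Call it before forwarding a request to the scoring endpoint and skip scoring when it returns False.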

Authentication

All scoring endpoints require an API key with the scoring:write scope. Client (pk_) keys are the usual choice; server (sk_) keys are also accepted.

Header	Value	Notes
`X-API-Key`	`pk_live_v1_...`	Client (pk_) key. Server (sk_) keys also accepted.
`Content-Type`	application/json
`Accept`	application/json

Scope required

Requests without scoring:write return 403. Contact your account team to grant the scope on existing keys.
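The header set above can be assembled in one place. This is a minimal sketch: `build_headers` is a hypothetical helper name, and the prefix check only mirrors the `pk_` / `sk_` convention described in this section. It cannot tell whether the key actually carries the scoring:write scope; only the API's 403 response can.

```python
def build_headers(api_key: str) -> dict:
    """Build the three request headers listed in the Authentication table."""
    # Cheap local sanity check; scope enforcement happens server-side (403).
    if not api_key.startswith(("pk_", "sk_")):
        raise ValueError("expected a client (pk_) or server (sk_) key")
    return {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
```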

Test Mode

Integrate end-to-end before going live with a test key. Test keys carry a pk_test_ prefix; live keys carry pk_live_. A test key hits the same endpoints, the same model, and the same thresholds, with an identical request and response shape. Scores from a test key are tagged and excluded from production analytics and A/B attribution, so sandbox calls never affect your reported numbers. Swap the key string when you go live; nothing else changes.

Magic device IDs

To exercise each popup branch without crafting feature payloads, send one of these reserved values as device_id. The response comes back in the named bucket (the model is skipped). These are honored only for test keys. A live key sending one gets a normal model score, so they are safe to leave in test code paths.

device_id	Forced bucket	Tests
`ILARA_TEST_TOP_1PCT`	`top_1pct`	Popup fires (highest confidence)
`ILARA_TEST_TOP_5PCT`	`top_5pct`	Popup fires (standard cohort)
`ILARA_TEST_TOP_10PCT`	`top_10pct`	Optional cheaper offer
`ILARA_TEST_REST`	`rest`	No popup

Force a bucket with a test key

bash

curl -X POST "https://app.ilara.ai/api/v1/scoring/first-user-conversion/30m" \
  -H "X-API-Key: pk_test_v1_xxx" \
  -H "Content-Type: application/json" \
  -d '{"device_id": "ILARA_TEST_TOP_1PCT"}'
# -> 200, data.bucket == "top_1pct"
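In an integration test suite, the table of reserved IDs can be kept as a lookup so each popup branch is asserted against the bucket it should force. The sketch below assumes test keys are recognised by the pk_test_ prefix, as described under Test Mode; `expected_forced_bucket` is a hypothetical helper name.

```python
from typing import Optional

# Reserved device_id values and the bucket each one forces (test keys only).
MAGIC_DEVICE_IDS = {
    "ILARA_TEST_TOP_1PCT": "top_1pct",
    "ILARA_TEST_TOP_5PCT": "top_5pct",
    "ILARA_TEST_TOP_10PCT": "top_10pct",
    "ILARA_TEST_REST": "rest",
}


def expected_forced_bucket(api_key: str, device_id: str) -> Optional[str]:
    """Return the forced bucket, or None when the model scores normally."""
    if not api_key.startswith("pk_test_"):
        return None  # live keys ignore magic IDs and get a real model score
    return MAGIC_DEVICE_IDS.get(device_id)
```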

First-User Conversion

Predicts the probability that a new install will pay within their first day (D1). Use the bucket label to drive in-session offers like the Starter Pack popup.

Request

POST /v1/scoring/first-user-conversion/30m

bash

curl -X POST "https://api.ilara.ai/v1/scoring/first-user-conversion/30m" \
  -H "X-API-Key: pk_live_v1_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "device_id": "DCD5D755-E12C-4E3C-9010-CDAE3232B059",
    "loot_distinct_types": 3,
    "sess_duration_total": 500.0,
    "match_avg_fight_time": 45.0,
    "match_wins": 4,
    "install_hour": 14,
    "vpay_count": 2
  }'
# Endpoints: .../30m (in-session) and .../12h (next-session). The bare path 404s.

Request body

One identity field plus six behavioural features observed in the first 30 minutes post-install. Every feature defaults to 0 — omit fields you don't have, never send null.

Field	Type	Required	Description
`device_id`	string (1-64)	Yes	The caller's device identifier (e.g. iOS IDFV/IDFA). Echoed in the response; stored with the score.
`loot_distinct_types`	int	No (0)	Count of distinct `_LootcaseType` values opened in the first 30 minutes. Top predictor — ~35% of model gain.
`sess_duration_total`	float (sec)	No (0)	Sum of `activityduration` across session-END events. Excludes idle/background time. NOT wall-clock elapsed time.
`match_avg_fight_time`	float (sec)	No (0)	Mean of `_FightTime` across all matches. NOT total match duration including loading.
`match_wins`	int	No (0)	Count of matches where `_Result == "WIN"` (exact uppercase).
`install_hour`	int 0-23	No (0)	Hour-of-day of install. UTC. Not IST, not device-local.
`vpay_count`	int	No (0)	Count of ALL virtual-currency events (earns + spends of in-game currency). NOT real-money in-app purchases.
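The "omit, never null" rule is easy to enforce when the body is built in one function. The following is a minimal client-side sketch; `build_payload` is a hypothetical helper, and its checks only mirror the constraints in the table above (device_id length, install_hour range). The server remains the authority on validation.

```python
FEATURES = (
    "loot_distinct_types",
    "sess_duration_total",
    "match_avg_fight_time",
    "match_wins",
    "install_hour",
    "vpay_count",
)


def build_payload(device_id: str, **features) -> dict:
    """Build a request body, dropping features whose value is None."""
    if not 1 <= len(device_id) <= 64:
        raise ValueError("device_id must be 1-64 characters")
    payload = {"device_id": device_id}
    for name, value in features.items():
        if name not in FEATURES:
            raise KeyError(f"unknown feature: {name}")
        if value is None:
            continue  # omit the field; the API defaults it to 0
        payload[name] = value
    hour = payload.get("install_hour")
    if hour is not None and not 0 <= hour <= 23:
        raise ValueError("install_hour must be in 0-23 (UTC)")
    return payload
```

For example, `build_payload("abc", match_wins=4, vpay_count=None)` yields a body with only `device_id` and `match_wins`.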

Response

200 — Score returned

json

{
  "success": true,
  "data": {
    "device_id":       "DCD5D755-E12C-4E3C-9010-CDAE3232B059",
    "score":           0.162,
    "bucket":          "top_5pct",
    "model_version":   "a8e08f6f20bc",
    "thresholds_used": {
      "top_1pct":  0.4000,
      "top_5pct":  0.1009,
      "top_10pct": 0.0473
    },
    "scored_at":       "2026-05-20T14:30:00.913Z"
  },
  "error": null,
  "meta":  null
}

Field	Type	Description
`device_id`	string	Echoed from the request.
`score`	float [0,1]	Calibrated (isotonic) probability the user pays within D1.
`bucket`	enum	One of `top_1pct`, `top_5pct`, `top_10pct`, `rest`.
`model_version`	string	sha256[:12] of the artifact (model + calibrated thresholds). Rotates with each retrain.
`thresholds_used`	object	The calibrated 99/95/90th-percentile cutoffs (served model on an out-of-time hold-out) used to assign the bucket. Read these from the response; do NOT hardcode.
`scored_at`	RFC 3339	UTC timestamp when the score was computed.
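A typed wrapper around the response keeps downstream code from reaching into raw dictionaries. This is a minimal sketch using only the standard library; the `Score` class and `parse_score` function are hypothetical names, and the handling of `success: false` is an assumption, since only the 200 shape is documented here.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Score:
    device_id: str
    score: float
    bucket: str
    model_version: str
    thresholds_used: dict
    scored_at: datetime


def parse_score(body: str) -> Score:
    """Parse a 200 response body into a Score."""
    doc = json.loads(body)
    if not doc.get("success"):
        # Assumed failure shape: success is false and error is populated.
        raise RuntimeError(f"scoring failed: {doc.get('error')}")
    data = doc["data"]
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    scored_at = datetime.fromisoformat(data["scored_at"].replace("Z", "+00:00"))
    return Score(
        device_id=data["device_id"],
        score=float(data["score"]),
        bucket=data["bucket"],
        model_version=data["model_version"],
        thresholds_used=dict(data["thresholds_used"]),
        scored_at=scored_at,
    )
```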

Bucket Semantics

Cutoffs are the 90/95/99th percentiles of the served model's calibrated scores on an out-of-time hold-out — i.e. expressed in the same calibrated [0,1] probability space as score. They rotate with each retrain. Always read thresholds_used from the response — don't hardcode the cutoffs. The values below are the 30m model's; the 12h model's cutoffs are higher.

Bucket	Calibrated score ≥ (30m)	D1 payer rate	Lift	Recommended action
top_1pct	0.40	~73%	24×	Fire popup; highest confidence
top_5pct	0.10	~16%	5×	Fire popup; standard cohort
top_10pct	0.047	~6%	2×	Optional cheaper offer
rest	< 0.047	~1%	—	No popup; base rate cohort
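The service already returns `bucket`, so clients do not need to compute it. For replaying logs or sanity-checking a response, the mapping from score to bucket can be sketched as below. The inclusive comparison follows the "≥" in the table header; `assign_bucket` is a hypothetical name, and the thresholds are always taken from `thresholds_used` rather than hardcoded.

```python
def assign_bucket(score: float, thresholds: dict) -> str:
    """Map a calibrated score to a bucket using the response's thresholds_used."""
    # Check the strictest cutoff first so a score lands in its highest bucket.
    for label in ("top_1pct", "top_5pct", "top_10pct"):
        if score >= thresholds[label]:
            return label
    return "rest"
```

With the example response above (score 0.162, top_5pct cutoff 0.1009, top_1pct cutoff 0.4000), this returns `top_5pct`, matching the `bucket` field.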

Lift over base rate

Base D1 payer rate is ~2.6%. Payer rates above are each bucket's own band, measured on the out-of-time hold-out. For the popup decision, fire on bucket in {top_1pct, top_5pct} — that's the cumulative top ~6% of installs, converting at ~26% (≈8.8× lift), while the top_1pct bucket alone converts at ~73% (≈24× lift). The 12h model is sharper still (top-5%-or-higher ≈37%).
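The popup rule can be kept in a single function so the game client never compares raw scores. A minimal sketch follows; the action names (`fire_popup`, `cheaper_offer`, `no_popup`) are hypothetical labels for the recommended actions in the table, not values defined by the API.

```python
POPUP_BUCKETS = frozenset({"top_1pct", "top_5pct"})


def recommended_action(bucket: str) -> str:
    """Translate a bucket label into the recommended in-session action."""
    if bucket in POPUP_BUCKETS:
        return "fire_popup"
    if bucket == "top_10pct":
        return "cheaper_offer"  # optional per the table above
    if bucket == "rest":
        return "no_popup"
    # Fail loudly on labels this client does not know about.
    raise ValueError(f"unknown bucket: {bucket}")
```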

Call Timing

The model is trained on features observed in the user's first 30 minutes post-install. Score at 30 minutes (or at end-of-first-session, whichever fires first) for the calibration the model expects.

Field (% of users populated)	≤15m	≤30m	≤60m	≤24h
`loot_distinct_types`	82.5%	83.3%	83.9%	86.3%
`sess_duration_total`	90.6%	97.2%	99.1%	99.4%
`match_avg_fight_time`	88.5%	89.0%	89.4%	90.9%
`match_wins`	88.4%	89.0%	89.3%	90.8%
`install_hour`	100%	100%	100%	100%
`vpay_count`	83.2%	83.9%	84.6%	86.9%

At 30 minutes, 81% of users (and 92% of payers) have all 6 fields populated. Scoring earlier than 30 minutes is supported but drops ROC-AUC below 0.75 — only do so if the popup must fire in-FTUE.
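The "30 minutes or end-of-first-session, whichever fires first" rule reduces to taking the earlier of two timestamps. The sketch below assumes the caller tracks the install time and, if it has happened, the end of the first session; `scoring_time` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

SCORING_WINDOW = timedelta(minutes=30)


def scoring_time(installed_at: datetime,
                 first_session_end: Optional[datetime] = None) -> datetime:
    """Return when to call the 30m endpoint for this install."""
    deadline = installed_at + SCORING_WINDOW
    # A first session that ends before the 30-minute mark triggers scoring early.
    if first_session_end is not None and first_session_end < deadline:
        return first_session_end
    return deadline
```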

Model Card

Headline metrics for the current production artifact:

Metric	Value
Training cohort	20,048 iOS-US users (Mar 1 – Apr 30 2026)
Positives (D1 payers)	519 (2.59% base rate)
Features	6 numeric
ROC-AUC (out-of-fold)	0.8044
PR-AUC (out-of-fold)	0.2667
Served model	Full-data refit (all rows); CV-median depth
Calibration	Isotonic, fit on out-of-time hold-out
Lift @ top 1% (OOT)	24× (top_1pct converts ~73%)
Lift @ top 5%+ (OOT, cumulative)	8.8× (converts ~26%)
Cross-validation	5-fold StratifiedKFold
HPO objective	PR-AUC (50 Optuna TPE trials)
p95 latency	< 50 ms
Artifact size	~93 KB (model.joblib)

Field-Definition Gotchas

Six places where a backend-side bug will silently send values that don't match what the model was trained on. Confirm each with your engineering team before integration:

install_hour must be UTC, not IST or device-local. Off-by-5.5h on every score otherwise.
sess_duration_total uses the activityduration field on session-END events, not wall-clock elapsed time and not the (start_ts, end_ts) difference. activityduration excludes idle.
match_wins counts the exact uppercase string "WIN". "Win" and "win" are not matched.
vpay_count covers ALL virtual-currency events (earns + spends of in-game currency), NOT real-money IAPs. Sending IAP-count instead will yield values ~10× smaller than training.
loot_distinct_types counts distinct values of _LootcaseType. Use the same enumeration as devtodev exports.
Omit fields you don't have, or send 0 if you must; never send null. The API validates numeric fields strictly (Pydantic strict types) and returns 422 on null.
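The definitions above can be checked against a reference computation. The sketch below derives all six features from a list of raw events; the event shape (a `type` key with values `session_end`, `match`, `lootcase`, `vcurrency`) is an assumption made for illustration, while the field names `activityduration`, `_FightTime`, `_Result` and `_LootcaseType` come from the definitions above. It also assumes the caller has already filtered events to the first 30 minutes post-install.

```python
from datetime import datetime, timezone


def compute_features(install_ts: float, events: list) -> dict:
    """Compute the six features from pre-filtered first-30-minute events."""
    matches = [e for e in events if e["type"] == "match"]
    fight_times = [m["_FightTime"] for m in matches]
    return {
        # Hour-of-day in UTC, never device-local time.
        "install_hour": datetime.fromtimestamp(install_ts, tz=timezone.utc).hour,
        # Sum of activityduration on session-END events, not wall-clock time.
        "sess_duration_total": float(
            sum(e["activityduration"] for e in events if e["type"] == "session_end")
        ),
        # Mean fight time; 0.0 when no matches were played.
        "match_avg_fight_time": sum(fight_times) / len(fight_times) if fight_times else 0.0,
        # Exact uppercase "WIN" only.
        "match_wins": sum(1 for m in matches if m["_Result"] == "WIN"),
        "loot_distinct_types": len(
            {e["_LootcaseType"] for e in events if e["type"] == "lootcase"}
        ),
        # All virtual-currency events (earns and spends), not real-money IAPs.
        "vpay_count": sum(1 for e in events if e["type"] == "vcurrency"),
    }
```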

Error Responses

Status	Code	Description
401	`UNAUTHORIZED`	Missing or invalid X-API-Key
403	`FORBIDDEN`	API key lacks the scoring:write scope
422	`VALIDATION_ERROR`	Body validation failed (e.g. install_hour=24, null in numeric field)
503	`MODEL_UNAVAILABLE`	Scoring artifact failed to load — retry after 30s

Operational Notes

Idempotency

The endpoint is not idempotent in v1 — client-side retries produce additional ml_scores rows for the same device_id. Don't retry on 422; retry once after 30s on 503.
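The retry guidance above (never on 422, once after 30 s on 503) can be sketched as a small wrapper. `send` is a hypothetical callable that performs the HTTP request and returns a `(status, body)` tuple; it is injected, along with `sleep`, so the policy can be tested without a network. Each retry that reaches the model may add another ml_scores row for the same device_id.

```python
import time
from typing import Callable, Tuple


def score_with_retry(send: Callable[[], Tuple[int, dict]],
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send(); retry exactly once after 30 s if the first attempt returns 503."""
    status, body = send()
    if status == 503:
        sleep(30)
        status, body = send()
    # Any other status, including 422, is returned to the caller without a retry.
    return status, body
```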

Persistence

Every score is persisted asynchronously to our ml_scores table (within ~1 s of the HTTP response) with device_id, score, bucket, model_fingerprint, the raw feature payload, and the game/tenant context. Available for A/B attribution joins.

Rate limits

Rate limit follows the standard tier scheme (1000 rpm for growth, 10000 for pro, unmetered for enterprise). For high-volume integrations, contact your account team — we can right-size dedicated capacity.

Next Steps

Authentication — API key types and the scoring:write scope
Events API — Track behaviour that feeds future scoring models
Churn Prediction — Complementary lifecycle modelling
