heyvisa

Document Analysis

Every document you upload runs through a four-stage pipeline: type detection, OCR and structured extraction, validation, and red-flag detection. The structured result is then attached to any report that references the document.

Supported formats

| Format | Max size | Notes |
|---|---|---|
| PDF | 20 MB | Searchable and scanned PDFs are both supported. |
| JPG / JPEG | 10 MB | Best for photos of physical documents. |
| PNG | 10 MB | Preferred for screenshots and digital exports. |
| HEIC | 10 MB | Automatically transcoded to JPEG before processing. |
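
A client can catch unsupported or oversized files before uploading. The sketch below encodes the limits listed above; the function name `precheck_upload`, the extension-based detection, and the reading of "MB" as 2^20 bytes are illustrative assumptions, not part of a documented heyvisa API.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the page does not say whether "MB" means 10^6 or 2^20 bytes

# Size limits per file extension, taken from the supported-formats list.
MAX_BYTES = {
    ".pdf": 20 * MB,
    ".jpg": 10 * MB,
    ".jpeg": 10 * MB,
    ".png": 10 * MB,
    ".heic": 10 * MB,
}


def precheck_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    ext = Path(filename).suffix.lower()
    limit = MAX_BYTES.get(ext)
    if limit is None:
        return [f"unsupported format: {ext or '(no extension)'}"]
    if size_bytes > limit:
        return [f"file is {size_bytes} bytes; limit for {ext} is {limit} bytes"]
    return []
```

A check like this only saves a round trip; the server-side limits remain authoritative.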

The pipeline

1. Type detection. A lightweight classifier guesses the document type (passport, bank statement, employment letter, invitation, itinerary, etc.) so downstream extractors know which schema to apply.

2. OCR & structured extraction. Text is extracted with layout awareness; key fields (dates, IDs, amounts, names) are normalised into a typed schema per document type.

3. Validation. Each field is checked for plausibility (date ranges, currency consistency, name match across documents, passport MRZ check digits, etc.).

4. Red-flag detection. Pattern-level checks look for common reasons reviewers reject documents: passports close to expiry, inconsistent salaries, account balances that spike unnaturally in the days before submission, mismatched names, and low-resolution photos of seals or stamps.
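
The passport MRZ check digits mentioned in step 3 follow the public ICAO Doc 9303 scheme: each character maps to a number (digits as themselves, A to Z as 10 to 35, the filler `<` as 0), the values are weighted 7, 3, 1 repeating, and the sum modulo 10 is the check digit. The sketch below shows that calculation; the function names are illustrative, and this page does not state how the pipeline itself implements the check.

```python
DIGITS = "0123456789"


def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in DIGITS:
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def mrz_field_is_valid(field: str, check_char: str) -> bool:
    """Compare the computed digit with the check character printed in the MRZ."""
    if len(check_char) != 1 or check_char not in DIGITS:
        return False
    return mrz_check_digit(field) == int(check_char)
```

For the specimen document number `L898902C3` used in ICAO 9303, `mrz_check_digit` returns 6.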

Per-document output

The result attached to each document is a stable JSON object with an extracted object and checks and red_flags arrays. Reports use these as inputs to the document signal score.

```json
{
  "id": "doc_01HXYZpassport",
  "type": "passport",
  "status": "analysed",
  "extracted": {
    "full_name": "AYŞE YILMAZ",
    "passport_number": "U12345678",
    "issued_at": "2022-03-04",
    "expires_at": "2032-03-03",
    "country": "TR"
  },
  "checks": [
    { "id": "expiry_buffer",  "status": "pass" },
    { "id": "mrz_consistency", "status": "pass" },
    { "id": "photo_quality",   "status": "pass" }
  ],
  "red_flags": []
}
```
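
A consumer of this object might reduce it to a short summary. The sketch below is one way to do that, under two assumptions: any check status other than "pass" is treated as a failure (only "pass" appears in the example), and red_flags entries are merely counted because their shape is not shown here. The `summarise` helper is hypothetical.

```python
def summarise(result: dict) -> dict:
    """Condense a per-document result into the fields a caller usually inspects."""
    # Assumption: anything other than "pass" counts as a failed check.
    failed = [c["id"] for c in result.get("checks", []) if c.get("status") != "pass"]
    red_flags = result.get("red_flags", [])
    return {
        "id": result["id"],
        "type": result["type"],
        "failed_checks": failed,
        "red_flag_count": len(red_flags),
        "clean": not failed and not red_flags,
    }


# Trimmed copy of the passport example (extracted fields omitted).
sample = {
    "id": "doc_01HXYZpassport",
    "type": "passport",
    "status": "analysed",
    "checks": [
        {"id": "expiry_buffer", "status": "pass"},
        {"id": "mrz_consistency", "status": "pass"},
        {"id": "photo_quality", "status": "pass"},
    ],
    "red_flags": [],
}

summary = summarise(sample)
```

For the passport example, `summary["clean"]` is `True` and `summary["failed_checks"]` is empty.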

Document types currently supported

  • Passport (biographical page)
  • National ID
  • Bank statements (Turkish, EU, US, UK formats)
  • Employment letter / pay slip
  • Tax return summary
  • Invitation letter (host or sponsor)
  • Hotel reservation / flight itinerary
  • Travel medical insurance
  • Previous visas and entry/exit stamps

Signal categories

Each check and red_flag belongs to a signal category. The category determines how the result feeds the document signal score and which recommendation templates it can trigger.

| Category | Example checks | Effect when failed |
|---|---|---|
| validity | expiry_buffer, mrz_consistency, check_digit | Hard red flag: strongly lowers the document score. |
| consistency | name_match, dob_match, address_match (across files) | Cross-document mismatch flagged for reviewer attention. |
| finance_integrity | balance_stability, salary_consistency, deposit_spike | Raises a finances red flag and a proof-of-funds recommendation. |
| media_quality | photo_quality, seal_resolution, page_completeness | Soft flag: prompts a clearer re-upload recommendation. |
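
To act on these categories in client code, the check IDs can be mapped back to their category. The sketch below uses only the example checks listed above, so the mapping is not exhaustive; unknown IDs fall into an "unknown" bucket. The helper name and the rule that any status other than "pass" counts as failed are assumptions.

```python
from collections import defaultdict

# Example check IDs per signal category, copied from the list above.
CATEGORY_CHECKS = {
    "validity": {"expiry_buffer", "mrz_consistency", "check_digit"},
    "consistency": {"name_match", "dob_match", "address_match"},
    "finance_integrity": {"balance_stability", "salary_consistency", "deposit_spike"},
    "media_quality": {"photo_quality", "seal_resolution", "page_completeness"},
}

# Invert once so each check ID resolves to its category.
CHECK_TO_CATEGORY = {
    check: category
    for category, checks in CATEGORY_CHECKS.items()
    for check in checks
}


def failed_by_category(checks: list[dict]) -> dict[str, list[str]]:
    """Group failed check IDs by signal category."""
    grouped = defaultdict(list)
    for check in checks:
        if check.get("status") != "pass":  # assumption: non-"pass" means failed
            category = CHECK_TO_CATEGORY.get(check["id"], "unknown")
            grouped[category].append(check["id"])
    return dict(grouped)
```

Grouping this way lets a caller treat a validity failure as blocking while surfacing a media_quality failure as a re-upload prompt.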

Worked example

Suppose an applicant uploads a passport and a bank statement. Type detection labels each file, OCR extracts the fields, and validation and red-flag detection run the category checks. The passport passes expiry_buffer and mrz_consistency (as in the JSON above), but the statement trips finance_integrity.deposit_spike because a large deposit landed four days before submission. That single red flag lowers the document signal score and emits a finances recommendation asking the applicant to document the source of funds. The Recommendations page turns the same event into rec_increase_funds_proof.
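
A deposit-spike rule of the kind described in this example could look like the sketch below. The 14-day window, the 50% ratio, the `Deposit` type, and the function name are all hypothetical; this page does not give the thresholds the pipeline uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Deposit:
    posted_on: date
    amount: float


def deposit_spikes(deposits, balance_before, submitted_on, window_days=14, ratio=0.5):
    """Return deposits inside the look-back window that are large relative to the prior balance.

    Hypothetical rule: a deposit is a spike when it was posted within
    `window_days` before submission and is at least `ratio` times the
    balance held before the window.
    """
    window_start = submitted_on - timedelta(days=window_days)
    threshold = ratio * balance_before
    return [
        d for d in deposits
        if window_start <= d.posted_on <= submitted_on and d.amount >= threshold
    ]


submitted = date(2025, 6, 20)
history = [
    Deposit(date(2025, 5, 30), 2000.0),  # outside the 14-day window
    Deposit(date(2025, 6, 10), 300.0),   # inside the window but small
    Deposit(date(2025, 6, 16), 8000.0),  # large deposit four days before submission
]
spikes = deposit_spikes(history, balance_before=5000.0, submitted_on=submitted)
```

With these sample figures only the 16 June deposit is returned, mirroring the four-days-before-submission case in the example.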

Personal data

OCR runs inside our isolated processing tier. Extracted text is encrypted at rest, scoped to the workspace, and deleted when you delete the parent report. We never use customer documents for model training.