Lemmatizers

The analyzers behind Laesi.

A lemmatizer maps every inflected form of a word back to the one you actually learned. Laesi plugs into the best open analyzers the academic world has built — and credits every one of them.

Why it matters

How much a lemmatizer helps depends on the language

Without one, your vocabulary list fills up with every form of every word. How badly that hurts depends on how much the language inflects.
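To make that concrete, here is a minimal sketch of what a lemmatizer does to a vocabulary list. It is not Laesi's code: the lookup table `TOY_LEMMAS` and the helpers `lemmatize` and `vocabulary_by_lemma` are hypothetical names invented for this illustration, and a real analyzer works from the language's morphology rather than a hand-written table.

```python
from collections import defaultdict

# Hand-written toy table (an assumption for illustration only).
# Real analyzers derive the lemma from morphology, not a fixed list.
TOY_LEMMAS = {
    "talo": "talo",      # "house", nominative singular
    "talon": "talo",     # genitive
    "talossa": "talo",   # inessive, "in the house"
    "talosta": "talo",   # elative, "out of the house"
    "taloon": "talo",    # illative, "into the house"
    "kissa": "kissa",    # "cat", nominative singular
    "kissan": "kissa",   # genitive
    "kissat": "kissa",   # nominative plural
}


def lemmatize(form: str) -> str:
    """Return the dictionary form, or the lowercased form if it is unknown."""
    key = form.lower()
    return TOY_LEMMAS.get(key, key)


def vocabulary_by_lemma(tokens: list[str]) -> dict[str, set[str]]:
    """Group the surface forms seen in a text under their lemma."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        grouped[lemmatize(token)].add(token.lower())
    return dict(grouped)


tokens = ["Talossa", "talon", "kissat", "taloon", "kissan", "talosta"]
surface_forms = {t.lower() for t in tokens}
grouped = vocabulary_by_lemma(tokens)

print(len(surface_forms), "entries without a lemmatizer")
print(len(grouped), "entries with one:", sorted(grouped))
```

Six Finnish word forms collapse into two vocabulary entries. In a language that barely inflects, the two counts would be almost the same, which is why the benefit varies so much from one language to the next.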

Analytic languages

Nice to have

English, Vietnamese, Indonesian. Words barely change shape, so even plain Wiktionary lookup gets you most of the way. A lemmatizer is a small convenience.

Agglutinative languages

A big help

Finnish, Hungarian, Estonian, Turkish. One word can take dozens of endings. A real analyzer is the difference between a clean vocabulary list and an unusable one.

Polysynthetic languages

Essential

Kalaallisut, Central Ojibwa. A single word can be a whole sentence. Without morpheme-level analysis you simply can't look words up — the analyzer isn't optional, it's the only way in.

An important distinction

We don't distribute the models — we point you to them

Laesi isn't shipping a pile of other people's language models. It tells you where to get the analyzer you need, helps you install it, and then uses it locally. The work — and the credit — belongs to the people who built each one.

The one exception is the Kalaallisut analyzer, which we compiled from source ourselves because its licence allows redistribution — a few hundred megabytes, included.

Available today

Three analyzers, wired in and working

spaCy

Available · every platform

Industrial-strength statistical NLP with models for most modern European languages — Germanic, Romance, Slavic, Baltic, Greek, Maltese. Lightweight, easy, and it runs the same on Mac, Windows, and Linux. For most learners of a mainstream language, this is all you need.

Source: Explosion AI · spacy.io

Voikko

Available · macOS / Linux / WSL

The reference-grade open analyzer for Finnish — all fifteen cases, consonant gradation, vowel harmony, and compound splitting. Fast and excellent. It needs an external system library, so it runs on macOS, Linux, or Windows via WSL. (It technically reaches a few other Finnic languages we haven't wired up yet.)

Source: Harri Pitkänen & contributors · voikko.puimula.org

GiellaLT

Available · macOS / Linux / WSL

Finite-state analyzers built by and for minority-language communities — the Sámi languages, Faroese, Kven, Meänkieli, Irish, Old Norse, Kalaallisut, and Central Ojibwa, with sentence-context disambiguation. A handful are hosted and download in one click; the rest are compiled from source (a power-user step on macOS, Linux, or WSL). This is the toolchain no commercial reader app ships.

Source: UiT The Arctic University of Norway & the GiellaLT community · giellalt.github.io

Planned

More analyzers on the way

A roadmap, not a promise. Some are quick; some are a lot of work. None of them have a date.

BÍN

Upcoming

The Database of Modern Icelandic Inflection (Árni Magnússon Institute) — an API, so wiring it in is straightforward. It will give Icelandic authoritative, reference-quality inflection lookup.

Stanza

Planned

Stanford's neural pipeline: like spaCy but heavyweight, covering 70+ languages. The path to real lemmatization for Estonian, Hungarian, Czech, and the other heavily inflected languages that rely on surface-form lookup today.

CLTK

Planned

The Classical Language Toolkit, built on Stanza — the right tool for Latin, Ancient Greek, Old English, and Old Norse (a better fit for Old Norse than GiellaLT).

TartuNLP

Researching

The University of Tartu's tools for Estonian — potentially as good for Estonian as Voikko is for Finnish. We're scoping how much work integration takes.

Apertium

Limited

A long-running finite-state platform with analyzers for many languages, sharing lineage with GiellaLT. Useful for filling specific gaps; a few targeted additions are planned.

Heavyweight analyzers like Stanza and CLTK can run locally or be paired with an LLM back end (Ollama, or OpenAI with your own API key). Track these and everything else on the roadmap.

Bring your own analyzer

If there's an analyzer we haven't wired in, you can often add it yourself through Laesi's custom-language tools: point Laesi at a dictionary and an analyzer and go. An analyzer isn't strictly required to read a language, but when one exists, it makes everything better.

Credit where it's due

These analyzers are years of academic and community work. Laesi credits each project and follows its preferred citation. If you use Laesi's analysis in research, cite the underlying tool — each project's site above has the reference it asks for.

See which analyzer your language uses.

Every language page lists its analyzer, its status, and the dictionaries we recommend.

Browse 60+ languages · Get Laesi: $99 once