The analyzers behind Laesi.
A lemmatizer maps every inflected form of a word back to its dictionary form, the one you actually learned. Laesi plugs into the best open analyzers the academic world has built, and credits every one of them.
How much a lemmatizer helps depends on the language
Without one, your vocabulary list fills up with every form of every word. How badly that hurts depends on how much the language inflects.
Nice to have
English, Vietnamese, Indonesian. Words barely change shape, so even plain Wiktionary lookup gets you most of the way. A lemmatizer is a small convenience.
A big help
Finnish, Hungarian, Estonian, Turkish. One word can take dozens of endings. A real analyzer is the difference between a clean vocabulary list and an unusable one.
Essential
Kalaallisut, Central Ojibwa. A single word can be a whole sentence. Without morpheme-level analysis you simply can't look words up — the analyzer isn't optional, it's the only way in.
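To make the difference concrete, here is a minimal sketch of what a lemmatizer does to a vocabulary list. The `TOY_LEMMAS` table and the `vocabulary` function are hypothetical and written for illustration only; a real analyzer derives lemmas from morphological rules instead of a hand-written table, and this is not how Laesi itself is implemented.

```python
from collections import Counter

# Hypothetical toy table: inflected Finnish form -> dictionary form.
# A real analyzer computes this; it is hard-coded here for illustration.
TOY_LEMMAS = {
    "talo": "talo",       # house
    "talon": "talo",      # of the house
    "talossa": "talo",    # in the house
    "talosta": "talo",    # out of the house
    "taloihin": "talo",   # into the houses
    "kissa": "kissa",     # cat
    "kissan": "kissa",    # of the cat
    "kissalla": "kissa",  # on / with the cat
}


def vocabulary(tokens, lemmas=None):
    """Count vocabulary entries, optionally collapsing forms to lemmas."""
    if lemmas is None:
        # No lemmatizer: every surface form is its own entry.
        return Counter(tokens)
    # Unknown forms fall back to themselves so nothing is dropped.
    return Counter(lemmas.get(tok, tok) for tok in tokens)


tokens = ["talossa", "talon", "kissan", "talosta", "kissalla", "taloihin"]
without = vocabulary(tokens)
with_lemmas = vocabulary(tokens, TOY_LEMMAS)

print(len(without), "entries without a lemmatizer")
print(len(with_lemmas), "entries with one:", dict(with_lemmas))
```

Six tokens become six separate entries without the table, and two entries with it. The more a language inflects, the wider that gap gets.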
We don't distribute the models — we point you to them
Laesi isn't shipping a pile of other people's language models. It tells you where to get the analyzer you need, helps you install it, and then uses it locally. The work — and the credit — belongs to the people who built each one.
The one exception is the Kalaallisut analyzer, which we compiled from source ourselves because its licence allows redistribution — a few hundred megabytes, included.
Three analyzers, wired in and working
spaCy
Available · every platform
Industrial-strength statistical NLP with models for most modern European languages: Germanic, Romance, Slavic, Baltic, Greek, Maltese. Lightweight, easy, and it runs the same on Mac, Windows, and Linux. For most learners of a mainstream language, this is all you need.
Source: Explosion AI · spacy.io
Voikko
Available · macOS / Linux / WSL
The reference-grade open analyzer for Finnish: all fifteen cases, consonant gradation, vowel harmony, and compound splitting. Fast and excellent. It needs an external system library, so it runs on macOS, Linux, or Windows via WSL. (It technically reaches a few other Finnic languages we haven't wired up yet.)
Source: Harri Pitkänen & contributors · voikko.puimula.org
GiellaLT
Available · macOS / Linux / WSL
Finite-state analyzers built by and for minority-language communities: the Sámi languages, Faroese, Kven, Meänkieli, Irish, Old Norse, Kalaallisut, and Central Ojibwa, with sentence-context disambiguation. A handful are hosted and download in one click; the rest are compiled from source (a power-user step on macOS, Linux, or WSL). This is the toolchain no commercial reader app ships.
Source: UiT The Arctic University of Norway & the GiellaLT community · giellalt.github.io
More analyzers on the way
A roadmap, not a promise. Some are quick; some are a lot of work. None of them have a date.
BÍN
Upcoming
The Database of Modern Icelandic Inflection (Árni Magnússon Institute). It is available through an API, so wiring it in is straightforward. It will give Icelandic authoritative, reference-quality inflection lookup.
Stanza
Planned
Stanford's neural pipeline: like spaCy but heavyweight, covering 70+ languages. The path to real lemmatization for Estonian, Hungarian, Czech, and the other heavily inflected languages that rely on surface-form lookup today.
CLTK
Planned
The Classical Language Toolkit, built on Stanza: the right tool for Latin, Ancient Greek, Old English, and Old Norse (a better fit for Old Norse than GiellaLT).
TartuNLP
Researching
The University of Tartu's tools for Estonian, potentially as good for Estonian as Voikko is for Finnish. We're scoping how much work integration takes.
Apertium
Limited
A long-running finite-state platform with analyzers for many languages, sharing lineage with GiellaLT. Useful for filling specific gaps; a few targeted additions are planned.
Heavyweight analyzers like Stanza and CLTK can run locally or be paired with an LLM back end (Ollama, or OpenAI with your own API key). Track these and everything else on the roadmap.
Bring your own analyzer
If there's an analyzer we haven't wired in, you can often add it yourself through Laesi's custom-language tools: point them at a dictionary and an analyzer and go. An analyzer isn't strictly required to read a language, but when one exists, it makes everything better.
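As a rough sketch of the idea, pairing a dictionary with an optional analyzer could look like the following. Every name here (`Analyzer`, `make_lookup`, the toy Turkish data) is hypothetical and chosen for illustration; Laesi's actual custom-language interface is not documented on this page and may look quite different.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: an analyzer takes a surface form and returns a lemma,
# or None when it cannot analyze the word.
Analyzer = Callable[[str], Optional[str]]


def make_lookup(dictionary: Dict[str, str],
                analyzer: Optional[Analyzer] = None
                ) -> Callable[[str], Optional[str]]:
    """Build a lookup that uses the analyzer only when one is supplied."""
    def lookup(word: str) -> Optional[str]:
        # 1. Plain surface-form lookup works without any analyzer.
        if word in dictionary:
            return dictionary[word]
        # 2. With an analyzer, retry using the lemma it proposes.
        if analyzer is not None:
            lemma = analyzer(word)
            if lemma is not None:
                return dictionary.get(lemma)
        return None
    return lookup


# Toy Turkish data, hard-coded for illustration only.
dictionary = {"ev": "house"}
toy_analyzer = {"evler": "ev", "evde": "ev", "evlerinizden": "ev"}.get

plain = make_lookup(dictionary)
assisted = make_lookup(dictionary, toy_analyzer)

print(plain("ev"), plain("evde"))
print(assisted("ev"), assisted("evde"), assisted("evlerinizden"))
```

The dictionary alone still finds the base form, which is why an analyzer is optional. With one plugged in, the inflected forms resolve as well.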
Credit where it's due
These analyzers are years of academic and community work. Laesi credits each project and follows its preferred citation. If you use Laesi's analysis in research, cite the underlying tool — each project's site above has the reference it asks for.
See which analyzer your language uses.
Every language page lists its analyzer, its status, and the dictionaries we recommend.