λ Laesi

Languages

60+ languages, a dozen families.

Laesi's language list is the whole point. We prioritize coverage in the families mass-market reader apps ignore — and cover the mainstream ones too. Click any language for reading samples, a suggested library, and exactly what kind of lemmatization it has today.

Full = full morphology · Surface = surface-form lookup · Soon = coming soon

Nordic & North Germanic

The living and medieval languages of Scandinavia, Iceland, and the Faroes.

Icelandic Surface

Íslenska · 350,000

Icelandic preserves Old Norse grammar more faithfully than any other living language. Laesi ships it today with Wiktionary lookups; BÍN-grade lemmatization is on the roadmap.

Wiktionary (BÍN planned) →

Faroese Full

Føroyskt · 70,000

Spoken by about 70,000 people, most of them on the 18 Faroe Islands. No major reader app supports it. Laesi does, with a GiellaLT morphological analyzer and Wiktionary lookups.

Norwegian Bokmål Full

Bokmål · written standard · ~5M users

Bokmål is the written standard in about 85% of Norwegian books, newspapers, and media. Full spaCy lemmatization on day one.

Norwegian Nynorsk Full

Nynorsk · written standard · ~600k users

Built from Norway's rural dialects by Ivar Aasen in the 1850s. If you read Tarjei Vesaas or Jon Fosse, you want Nynorsk.

spaCy (Bokmål model) →

Danish

Dansk · 6 million

Danish grammar is simple; Danish pronunciation is famously not. Laesi focuses on what you study in text: vocabulary, idioms, and particle verbs.

Swedish Full

Svenska · 10 million

Swedish has the largest literary output of the Nordic languages. Laesi handles its compound nouns, particle verbs, and two genders through spaCy.

Old Norse Full

Norrœnt mál · medieval language (9th–14th c.)

The medieval language of Iceland, Norway, and the Viking diaspora. Almost no reading app supports it. Laesi gives it a GiellaLT analyzer — with CLTK, a better fit for Old Norse, on the roadmap.

GiellaLT (CLTK planned) →

Elfdalian Surface

Övdalska · 3,000

A conservative North Germanic variety of Älvdalen, Sweden, that preserves features lost elsewhere in Scandinavia. Surface-form lookup today.

West Germanic

German and its relatives, from Standard German to the regional and minority varieties.

German Full

Deutsch · 95 million

Full spaCy lemmatization with case, gender, and separable-verb handling. The base model also powers Swiss German and Bavarian.

Dutch Full

Nederlands · 24 million

Full spaCy lemmatization. Handles Dutch compounds, separable verbs, and diminutives.

Swiss German Surface

Schwiizerdütsch · 5 million

Lemmas are stored as Standard German forms — there's no standard Swiss German orthography — so lemmatization is partial but useful.

spaCy (Standard German) →

Bavarian Surface

Boarisch · 12 million

Lemmas are stored as Standard German forms. Partial lemmatization, full reading and lookup support.

spaCy (Standard German) →

Low German Surface

Plattdüütsch · 2.5 million

German Low German (SASS orthography). Surface-form lookup against Wiktionary's Low German section.

Dutch Low Saxon Surface

Nedersaksisch · 1.5 million

Low German as written in the Netherlands (NSS). A separate language entry from German Low German.

Luxembourgish Surface

Lëtzebuergesch · 400,000

Surface-form lookup today. A national language with a small but growing written corpus.

West Frisian Surface

Frysk · 470,000

Apart from Scots, the Frisian languages are English's closest living relatives. Surface-form lookup against Wiktionary.

North Frisian Surface

Nordfriisk · 10,000

A cluster of endangered Frisian dialects on the German North Sea coast. Surface-form lookup.

Scots Surface

Scots · 1.5 million

The Germanic sister language of English, spoken in Lowland Scotland and Ulster. Surface-form lookup.

Romance

The daughters of Latin, with full lemmatization across the major standards.

French Full

Français · 80 million

Full spaCy lemmatization with elision, contraction, and verb-conjugation handling.

Spanish Full

Español · 485 million

Full spaCy lemmatization. Handles enclitic pronouns and the full verb paradigm.

Italian Full

Italiano · 67 million

Full spaCy lemmatization, with handling for articulated prepositions (della, nel) and clitics.

Portuguese Full

Português · 260 million

Full spaCy lemmatization covering both European and Brazilian orthography.

Romanian Full

Română · 24 million

Full spaCy lemmatization. EPUB import normalizes legacy cedilla forms to comma-below diacritics.
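As a rough illustration of that EPUB step, the Python sketch below assumes the normalization is a plain character substitution: s-cedilla and t-cedilla (ş, ţ, common in older fonts and encodings) become s-comma and t-comma (ș, ț). The helper name `normalize_romanian` is hypothetical, not Laesi's importer code.

```python
import unicodedata

# Legacy cedilla letters mapped to the correct comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # ş (s-cedilla) -> ș (s-comma)
    "\u015E": "\u0218",  # Ş -> Ș
    "\u0163": "\u021B",  # ţ (t-cedilla) -> ț (t-comma)
    "\u0162": "\u021A",  # Ţ -> Ț
})

def normalize_romanian(text: str) -> str:
    """Replace legacy s/t-cedilla with s/t-comma-below (hypothetical helper)."""
    # Compose first so a decomposed "s" + U+0327 (combining cedilla)
    # becomes the single code point U+015F and is caught by the table.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CEDILLA_TO_COMMA)

print(normalize_romanian("Ştiinţă şi ţară"))  # Știință și țară
```

Text that already uses ș and ț passes through unchanged, so running the step twice is harmless.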

Catalan Full

Català · 9 million

Full spaCy lemmatization for the language of Catalonia, Valencia, and the Balearics.

Galician Surface

Galego · 2.4 million

Surface-form lookup today for the Romance language of northwestern Spain.

Slavic

East, West, and South Slavic, with full lemmatization for most of the major standards and script-aware handling.

Polish Full

Polski · 40 million

Full spaCy lemmatization across Polish's seven cases and complex consonant alternations.

Russian Full

Русский · 150 million

Full spaCy lemmatization with case, aspect, and Cyrillic handling.

Ukrainian Full

Українська · 40 million

Full spaCy lemmatization for Ukrainian's seven cases and verbal aspect.

Bulgarian Full

Български · 8 million

Full spaCy lemmatization, with correct handling of Bulgarian's postposed definite article.

Slovenian Full

Slovenščina · 2.5 million

Full spaCy lemmatization, including the rare dual number.

Macedonian Full

Македонски · 1.6 million

Full spaCy lemmatization for the South Slavic language of North Macedonia.

Croatian Full

Hrvatski · 5 million

Full spaCy lemmatization. The Croatian model also backs Serbian (Latin).

Serbian (Latin) Full

Srpski · 9 million

Lemmatized via the Croatian model with a Cyrillic-leak guard. A dedicated Serbian model is planned.

spaCy (Croatian model) →

Serbian (Cyrillic) Surface

Српски · 9 million

A separate language entry from Serbian Latin. Surface-form lookup; CLASSLA lemmatization planned.

Czech Surface

Čeština · 10 million

Surface-form lookup today. A Stanza-based lemmatizer is planned.

Slovak Surface

Slovenčina · 5 million

Surface-form lookup today. A Stanza-based lemmatizer is planned.

Belarusian Surface

Беларуская · 5 million

Surface-form lookup today. A Stanza-based lemmatizer is planned.

Bosnian (Latin) Surface

Bosanski · 2.5 million

Surface-form lookup. A separate entry from Bosnian Cyrillic; CLASSLA lemmatization planned.

Bosnian (Cyrillic) Surface

Босански · 2.5 million

Surface-form lookup. A separate entry from Bosnian Latin.

Baltic

The conservative Indo-European languages of the eastern Baltic.

Lithuanian Full

Lietuvių · 3 million

Full spaCy lemmatization for one of the most archaic living Indo-European languages.

Latvian Full

Latviešu · 1.5 million

Full spaCy lemmatization across Latvian's seven cases.

Uralic & Finno-Ugric

Finnic, Sámi, and their relatives — agglutinative, morphology-heavy, and underserved.

Finnish Full

Suomi · 5.5 million

Finnish has a reputation for being hard because its words shape-shift. Voikko lemmatization means you see every form of a word as one entry — not fifteen.
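Seeing every form as one entry comes down to keying vocabulary by lemma instead of by surface form. Here is a minimal Python sketch of that idea; the lemma table is toy data standing in for a real analyzer, and `group_by_lemma` is a hypothetical helper, not part of Laesi or Voikko.

```python
from collections import defaultdict
from typing import Callable, Iterable

def group_by_lemma(tokens: Iterable[str],
                   lemmatize: Callable[[str], str]) -> dict[str, list[str]]:
    """Collect every surface form under the lemma the analyzer returns."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        groups[lemmatize(token)].append(token)
    return dict(groups)

# Toy lemma table (hypothetical data): four case forms of "talo" (house).
TOY_LEMMAS = {"talossa": "talo", "talosta": "talo", "taloon": "talo",
              "talot": "talo", "kissa": "kissa"}

groups = group_by_lemma(["talossa", "kissa", "taloon", "talot"],
                        lambda t: TOY_LEMMAS.get(t, t))
print(groups)  # {'talo': ['talossa', 'taloon', 'talot'], 'kissa': ['kissa']}
```

With a real analyzer, `lemmatize` would wrap something like libvoikko's `Voikko("fi").analyze(word)` and read the `BASEFORM` field of the first analysis; treat that wiring as an assumption to check against the libvoikko documentation.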

North Sámi Full

Davvisámegiella · 25,000

North Sámi is spoken across the Arctic reaches of Norway, Sweden, and Finland. Laesi ships it with GiellaLT, the analyzer the Sámi language community built.

Lule Sámi Full

Julevsámegiella · 2,000

Full GiellaLT morphology. Wiktionary coverage is thin but the analyzer is solid.

South Sámi Full

Åarjelsaemien gïele · 600

Full GiellaLT morphology for the southernmost Sámi language.

Inari Sámi Full

Anarâškielâ · 400

Full GiellaLT morphology for the Sámi language of the Inari region in Finland.

Skolt Sámi Full

Sääʹmǩiõll · 300

Full GiellaLT morphology for one of the most endangered Sámi languages.

Kven Full

Kvääni · 5,000

A Finnic language of northern Norway. Full GiellaLT morphology, thin Wiktionary.

Meänkieli Full

Meänkieli · 40,000

A Finnic minority language of the Torne Valley in Sweden. Full GiellaLT morphology.

Estonian Surface

Eesti · 1.1 million

Surface-form lookup today; a Stanza lemmatizer is planned. Estonian is agglutinative, so until then many inflected forms won't match a dictionary entry.

Hungarian Surface

Magyar · 13 million

Surface-form lookup today; a Stanza lemmatizer is planned. Highly agglutinative.

Celtic

Goidelic and Brythonic languages with initial mutations and VSO grammar.

Irish Full

Gaeilge · 1.7 million (some fluency)

Irish initial mutations, lenition (séimhiú) and eclipsis (urú), break naive word lookup. Laesi ships a GiellaLT analyzer that normalizes them back to dictionary form.
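Laesi relies on GiellaLT's analyzer for this. Purely to show what undoing a mutation involves, here is a much cruder Python heuristic, our own assumption rather than how GiellaLT or Laesi works: the hypothetical `candidate_lemmas` lists possible unmutated forms so each can be tried against a dictionary. It knows only initial lenition, eclipsis, and prefixed n-/t-/h-, and it ignores context, so it over-generates.

```python
# Eclipsis (urú) prefixes mapped to the radical consonant they hide.
# "bhf" comes first so it wins over the two-letter prefixes.
ECLIPSIS = [("bhf", "f"), ("mb", "b"), ("gc", "c"), ("nd", "d"),
            ("ng", "g"), ("bp", "p"), ("dt", "t")]
# Consonants that show lenition (séimhiú) by taking a following "h".
LENITABLE = set("bcdfgmpst")

def candidate_lemmas(word: str) -> list[str]:
    """Possible unmutated forms of `word`, original first (hypothetical helper)."""
    w = word.lower()
    candidates = [w]
    # Prefixed n-/t-/h- before a vowel: "n-athair", "t-uisce", "nAthair", "hÉireann".
    if len(word) > 2 and word[0] in "nth" and (word[1] == "-" or word[1].isupper()):
        candidates.append(word[1:].lstrip("-").lower())
    for prefix, radical in ECLIPSIS:
        if w.startswith(prefix):
            candidates.append(radical + w[len(prefix):])
            break
    else:
        # No eclipsis found: try removing lenition ("bhean" -> "bean").
        if len(w) > 2 and w[0] in LENITABLE and w[1] == "h":
            candidates.append(w[0] + w[2:])
    return list(dict.fromkeys(candidates))  # de-duplicate, keep order

print(candidate_lemmas("bhfuil"))   # ['bhfuil', 'fuil']
print(candidate_lemmas("nAthair"))  # ['nathair', 'athair']
```

A lookup would try each candidate in order and stop at the first dictionary hit. Note that "nathair" (snake) is itself a word, which is exactly the kind of ambiguity a real analyzer resolves from the surrounding grammar and this heuristic cannot.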

Welsh Surface

Cymraeg · 900,000

Laesi supports Welsh today with Wiktionary surface-form lookup. A morphological analyzer is being researched — Welsh has no GiellaLT model, so we're evaluating the alternatives.

Hellenic & Mediterranean

Greek, Maltese, and Albanian — the Mediterranean languages other readers skip.

Greek Full

Ελληνικά · 13 million

Full spaCy lemmatization for Modern Greek, including polytonic-to-monotonic normalization.
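To make "polytonic-to-monotonic normalization" concrete, here is a simplified Python sketch using only the standard `unicodedata` module. It is an illustration under our own assumptions, not Laesi's code, and the helper name is hypothetical. It drops breathings and iota subscripts and turns every accent into the single monotonic tonos; it does not apply the monotonic rule that most monosyllables lose their accent.

```python
import unicodedata

# Combining marks as they appear after NFD decomposition of polytonic text.
DROP = {
    "\u0313",  # psili (smooth breathing)
    "\u0314",  # dasia (rough breathing)
    "\u0345",  # ypogegrammeni (iota subscript)
}
TO_TONOS = {
    "\u0300",  # varia (grave accent)
    "\u0342",  # perispomeni (circumflex)
}
TONOS = "\u0301"  # acute; NFC recomposes it into letters such as ά

def polytonic_to_monotonic(text: str) -> str:
    """Simplified polytonic -> monotonic conversion (hypothetical helper)."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in DROP:
            continue  # breathings and iota subscript disappear
        out.append(TONOS if ch in TO_TONOS else ch)  # all accents -> tonos
    return unicodedata.normalize("NFC", "".join(out))

print(polytonic_to_monotonic("Ἀθῆναι"))  # Αθήναι
```

Decomposing first means one rule covers every precomposed polytonic letter, and the final NFC step gives back the standard monotonic code points (for example U+03AE for ή).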

Maltese Full

Malti · 520,000

The only Semitic language written in the Latin script, and the only one that is an official EU language. Full spaCy lemmatization.

Albanian Surface

Shqip · 7.5 million

Surface-form lookup today for Albanian, the sole surviving language of its own branch of Indo-European.

Turkic

Agglutinative Turkic languages — surface-form lookup today, lemmatization on the roadmap.

Turkish Surface

Türkçe · 85 million

Surface-form lookup today. Turkish is highly agglutinative, so a dedicated lemmatizer is a priority: until then, many inflected forms won't match a dictionary entry.

Azerbaijani Surface

Azərbaycan dili · 23 million

Surface-form lookup today for the Turkic language of Azerbaijan and northwestern Iran.

Indigenous & Endangered

Languages with small speaker communities and active revitalization — exactly the ones the big apps will never ship.

Greenlandic Full

Kalaallisut · 56,000

Kalaallisut builds whole sentences into single words. Laesi supports it with a GiellaLT analyzer — root-level by default, morpheme-level optional.

Southwestern Ojibwe Surface

Ojibwemowin · 50,000

Southwestern Ojibwe (Ojibwemowin) — the Minnesota dialect cluster. Surface-form lookup today.

Central Ojibwa Full

Anishinaabemowin · 50,000

Central Ojibwa (ISO 639-3 ojc), the Ontario dialect cluster, is the variety with a GiellaLT analyzer, which Laesi builds from source. Wiktionary coverage is very thin.

World languages

Major and minor languages from beyond Europe — readable today with surface-form lookup.

Indonesian Surface

Bahasa Indonesia · 200 million

Surface-form lookup today. Light morphology means hit rates are already good.

Malay Surface

Bahasa Melayu · 80 million

Surface-form lookup today for the language of Malaysia, Brunei, and Singapore.

Filipino / Tagalog Surface

Wikang Tagalog · 28 million

Surface-form lookup today for the national language of the Philippines.

Swahili Surface

Kiswahili · 80 million

Surface-form lookup today for East Africa's lingua franca.

Haitian Creole Surface

Kreyòl ayisyen · 12 million

Surface-form lookup today for the French-based creole of Haiti.

Vietnamese Surface

Tiếng Việt · 85 million

Surface-form lookup today. Analytic grammar means inflection is rarely an obstacle.

Esperanto Surface

Esperanto · constructed language

Surface-form lookup for the world's most successful constructed language.

Classical & Historical

Dead languages with living literatures. Coming as the CLTK analyzers land.

Latin Soon

Lingua Latina · liturgical & scholarly

Latin learners juggle Whitaker's Words, Alpheios, Logeion, and Anki. Laesi will bring reading, lookup, SRS export, and progress tracking into one place.

CLTK (planned) →

Old English Soon

Englisc · extinct (11th c.)

The language of Alfred, of Beowulf, of the Anglo-Saxon Chronicle. CLTK integration will handle Old English's strong and weak declensions.

CLTK (planned) →

Middle English Soon

Middel Englissh · extinct (15th c.)

Middle English spelling is famously inconsistent. Laesi's surface-form handling will collapse variants so you study vocabulary, not orthographic accidents.

CLTK (planned) →

Don't see your language?

Add it yourself — Laesi's custom-language tool handles minority dialects, conlangs, and niche historical varieties. Or email us: if there's a GiellaLT, CLTK, spaCy, or Stanza analyzer we can wire in, we'll prioritize it.
