Thursday, February 19, 2026

warda

I would like to correct one of my previous statements: it is not entirely the case that I only like Columbo; I also like other things, for example Maltese and just about everything Umberto Eco has ever written. And so when I found out that a Maltese translation of The Name of the Rose had been published, all I could say was tace et cape pecuniam meam. A few weeks back I had the opportunity to escape the Central European cold and go to Malta. Naturally, one of my priorities was to obtain a copy of Isem il-Warda. And now that I hold it in my hands (after having catalogued it and provided it with a protective cover), I am ... not really confident in the quality of the translation.

The first signs are right there on the cover, more specifically the back of it which contains a brief bio of the author. It informs us that 

Umberto Eco (1932-2016) twieled ġewwa Alessandria fil-Piemonte.
Umberto Eco (1932-2016) was born in Alessandria in Piedmont. 

I am of course not a native speaker of Maltese, but over the last sev... twel... ohmygodreally twenty-two years of my engagement with the language, I have developed a good feeling for it, and this phrase strikes me as strange. More specifically, it is the use of the preposition that is strange. As Stolz et al. (2017: 457) point out, Maltese exhibits toponymic zero marking, which is just a fancy way of saying that if you want to say something happened at a named place, you typically do not use a preposition. For example:

Dun Joe Caruana twieled il-Mellieħa nhar it-2 ta' Awissu, 1960.

Dun Joe Caruana was born in Mellieħa on the 2nd of August, 1960.

That this sentence also contains an adverbial of time without a preposition is just a happy coincidence I will look into later. For now, note that this being a human language, there is at least some variation, and so this is also perfectly good Maltese:

Patri Serafin twieled fix-Xewkija fit-23 ta' Awwissu tas-sena 1932.

Father Serafin was born in Xewkija on the 23rd of August of the year 1932.

There is a third option available, the preposition ġo also meaning 'in'. And it is a well-established option, as evident from the fact that it is featured in No. 69 from Ilg and Stumme's Maltesische Volkslieder (Leipzig 1909), p. 27. I am reproducing the verse in question in modern standard orthography.

ara x'ġara ġo Ħal-Qormi

look what happened in Ħal-Qormi 

Interestingly, ġo does not seem to be used with the verb twieled (or its feminine form twieldet). When I ran a corpus search on the two verb forms and extracted 300 random examples, only three options cropped up:

Preposition   Count
ø             37
f'            42
ġewwa         3

This low frequency of ġewwa + NOUN_PROP supports my feeling, and to check, I went to the local digital watercooler and asked native speakers. The vast majority of them shared my suspicion of it, some describing it as an Ingliżata, i.e. a calque from English (with a hint of negative sentiment conveyed by the suffix -ata). Many had never seen it and cried apage satanas itlaq ja xitan; some pointed out that this is a feature of (presumably bad) journalistic style - after all, the preposition actually means 'inside (of)' a 3-dimensional enclosed object. Corpus data partially bear these observations out: even if we just consider the two prepositions and the two most frequent toponyms - which, unsurprisingly, turn out to be the Maltese names for Malta and Gozo - ġewwa is by far the minority option:

Toponym    f'        ġewwa
Malta      105,214   702
Għawdex    43,065    1,273

So far so good. The text type (genre) analysis of the use of ġewwa with Malta, however, identifies a different culprit for this Ingliżata. As a shock to no one, it's the politicians who are responsible for this crime against the Maltese language.

Rank   Text type            Absolute frequency   %
1      parliament debates   548                  78.06%
2      newspaper            147                  20.94%
3      non-fiction          6                    0.85%
4      fiction              1                    0.14%
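
For anyone who wants to check my arithmetic, here is a minimal sketch of how the percentages in the table are obtained: each count is divided by the sum of the four counts and rounded to two decimal places. The variable names are mine and purely illustrative; the corpus tool did this for me, so this is not its actual code.

```python
# Counts of "ġewwa Malta" per text type, copied from the table above.
# The names `counts`, `total` and `shares` are illustrative assumptions.
counts = {
    "parliament debates": 548,
    "newspaper": 147,
    "non-fiction": 6,
    "fiction": 1,
}

total = sum(counts.values())

# Share of each text type as a percentage of all hits, two decimal places.
shares = {genre: round(100 * n / total, 2) for genre, n in counts.items()}

for genre, share in shares.items():
    print(f"{genre:<20}{counts[genre]:>5}{share:>8.2f}%")
```

Because every share is rounded separately, the four percentages do not have to add up to exactly 100.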

Excellent, that all makes sense, so it's bad Maltese spread by politicians. And as with all such things, the origin of this abomination is in the influence of English where the function of the English 'in' was calqued onto the Maltese ġewwa. Done, dusted, all explained. Except...

You see, the corpus I have been using so far is one designed to cover the language of the first two decades of the 21st century (plus or minus). As such, it does not contain older works of literature, such as those written by Ġużè Muscat Azzopardi and Anton Manwel Caruana, the latter of whom is especially noted for his purism (i.e. exclusion of words of non-Arabic origin). I do have a corpus that includes these works and a few clicks and key presses later, I can confirm that ġewwa is used with toponyms in works from the late 19th century as well. Like this one from Muscat Azzopardi's 1881 Viku Mason:

Wara jumejn, kien magħluq ġewwa l-Imdina...

After two days, he was locked in Mdina...

Or this one from his 1909 Nazju Ellul (where the term il-Belt 'the city' refers to the location that only laws and tourist guides call 'Valletta').

Imma ġewwa l-Belt ġriet ix-xniegħa bejn in-nies tagħna...

But in Valletta the rumour spread among our people...

Recall that the original meaning of ġewwa is 'inside (of)', i.e. within a 3-dimensional enclosed object; in fact, ġewwa also doubles as an adverb with that very meaning. The prototypical noun to be used with ġewwa is 'house', 'building' or 'school'. And its use makes perfect sense in both examples above when you consider the physical nature of the locations: Mdina is a walled city located on a hill, while Valletta is on a peninsula with a single point of entry. Both can thus be viewed as 3-dimensional enclosed objects.

The same is not the case for the following example from Caruana's 1889 Ineż Farruġ. Here ġewwa is used with a name of a locality that is not surrounded by walls or the sea:

... kien ġej ma' missieru minn ġewwa r-Rabat...

... he was coming with his father from inside of Rabat...

And of course, in this case, we are not dealing with location, but rather movement from. The use of ġewwa is still at the very least redundant - minn itself would suffice - but maybe it serves to indicate that the character came from the center of the village and not, say, from some farmhouse on its outskirts.

Be that as it may, I am now much less confident that the use of ġewwa with toponyms can solely be blamed on bad Maltese spoken by politicians or the influence of English. The current use we see might very well be just the extension of use that was first limited to specific contexts (as with Rabat) or even specific locations (as with Valletta or Mdina). So maybe my scepticism regarding the quality of the translation of Eco's The Name of the Rose into Maltese is misplaced.

But then I opened the book to the first page. As I'm sure you remember, it begins with the author's description of how he came across the absolutely 100% totes very real manuscript, naturally, that he then somehow lost and now translates for us from his notes and memory - incidentally, a very popular trope in modern literature that turns out to have a long history. This narrative is anchored by two dates: the Warsaw Pact invasion of Czechoslovakia and the date of the book in question. This is what we read in the Maltese translation:

Fis-16 t'Awwissu 1969 xi ħadd għaddieli ktieb ta' awtur jismu abate Vallet, Le manuscrit de Dom Adson de Melk, traduit en français d'après l'édition de Dom J. Mabillon (Aux Presses de l'Abbaye de la Source, Paris 1982).

The Prague Spring and the subsequent invasion took place in 1968. Eco's book translated here came out in 1980.

Here is how Weaver's English translation renders this passage (emphasis mine):

ON AUGUST 16, 1968, I WAS HANDED A BOOK WRITTEN by a certain Abbé Vallet, Le Manuscrit de Dom Adson de Melk, traduit en français d’après l’édition de Dom J. Mabillon (Aux Presses de l’Abbaye de la Source, Paris, 1842). 

Considering that this is the first page, this does not bode well...

Thursday, February 12, 2026

ishoyahb

In the history of the native Syriac linguistic tradition [1], Išoʕyahḇ Bar Malkōn (d. early 13th century) is the odd man out. It is not that he is unknown or forgotten: his grammatical works are preserved in a not insignificant number of manuscript copies and his name is listed with other grammarians in overviews of Syriac literature compiled by modern scholars, as well as his contemporaries. Of the latter, the testimony of ʕAbdīšōʕ Bar Brīḵā's (d. 1318) Catalogue of Books is particularly telling: where Eliya of Ṭirhan (d. 1049) and Yōḥannan Bar Zoʕbī (d. 13th century) are described as having composed grammars or grammatical treatises, of Išoʕyahḇ Bar Malkōn and his grammatical works we only learn the following:

ܡܳܪܝ ܝܶܫܘܽܥܰܝܲܗܒ ܒܰܪ ܡܰܠܟܳܘܢ ܕܰܨܘܒܳܐ ܐܝܺܬ ܠܶܗ ܫ̈ܘܽܐܳܠܐ ܓܪܰܡܡܰܛܝܺܩܳܝܶܐ

"Mār Išoʕyahḇ bar Malkōn of Ṣōḇā [Nisibis]: he has some grammatical questions..."

Whether this refers to a specific genre, is meant to be read generally or anything else, that's it as far as grammar is concerned. This lack of specificity with regard to Bar Malkōn’s work as a grammarian is also typical for modern sources. When consulting one, the reader typically learns no more than that he authored at least one treatise on points and one grammar (both unedited) [2], and that in his grammatical analysis, he followed the Arabic model [3]. One prominent example is Baumstark who describes Bar Malkōn’s grammar as “sachlich ganz die Methode der arabischen Grammatik befolgend” (“in terms of content, it entirely follows the methodology of Arabic grammar”) [4]. Over time, this simple observation - repeated uncritically - morphed into a judgment and finally into a condemnation: Talmon notes of Išoʕyahḇ bar Malkōn – and his contemporaries (or fellow travelers) like Yōḥannan bar Zoʕbī and Eliya of Ṭirhan – that they “exhibit either a servile attitude to Arabic grammar or poor coverage of grammatical issues.” [5]

Talmon's "poor coverage" remark is particularly silly. For one, the comparison made here is to Jacob of Edessa's grammar of Syriac which is notorious for - not to put too fine a point on it - BEING ALMOST ENTIRELY LOST. Secondly, "poor coverage" is a relative term, even in this day and age, doubly so in the 13th century. But most importantly, none of Išoʕyahḇ Bar Malkōn's works have been edited or analyzed in any detail, so there is simply no way for Talmon to know.

In fact, that Talmon's assessment of Bar Malkōn (and, by extension, that of those whose judgment he relies on) is wholly wrong can be gleaned from even the most cursory of interactions with the latter's grammatical works. This applies especially to Bar Malkōn's magnum opus, a grammar of Syriac titled Ktābā d-manhrānūṯā ba-gramaṭīqī sūryāytā/Kitāb al-ʔīḍāḥ fī naḥw as-suryānī (“Book of elucidation in Syriac grammar”, henceforth: Kitāb al-ʔīḍāḥ), extant in at least four manuscripts:

  1. Paris BnF Syr. 262 (1v-112r; 16th century) 
  2. Paris BnF Syr. 370 (2r-96r; 1569) (olim Seert 101)
  3. Berlin SBB Ms. or. quart. 1050 (2v-106v; 17th century)
  4. Florence Laur. Or. 419 (1r-96r; 1589) 

Four notes on this list:

Firstly, Stadel's entry on Bar Malkōn in his recent edition of bar Brīḵā's Catalogue (Stadel 2025: 213) lists the Berlin manuscript as located in Tübingen (as does Van Rompay). This is consistent with Assfalg's catalogue, but not with the online catalogue of the Tübingen collections (which, however, contains a work called Bülbüliye, huh). I am reliably informed the manuscript is indeed in Berlin at the Stabi; in fact, this is where I consulted it a few hours ago.


Secondly, Stadel does not list BnF Syr. 262, [Added note X] which is understandable: this manuscript does not give any author and its title is also different, namely

 ܟܬܐܒ ܐܠܢܚܘ ܡܦܣܪ ܡܢ ܐܠܣܪܝܐܢܝ ܐܠܝ ܐܠܥܪܒܝ ܐܠܡܥܪܘܦ ܥܢܕ ܐܠܣܪܝܐܢ  ܓܪܰܡܰܛܺܝܩܺܝ ܬܘܳܪܰܣ ܡܰܡܳܠܐ ܝܥܢܝ ܬܨܚܝܚ ܐܠܟܠܐܡ 
"The book of grammar translated from Syriac into Arabic known as 'Gramaṭīqī - Tūraṣ mamlō' which means 'Grammar - Correction of speech'".

The term ܬܳܘܪܽܨ ܡܰܡܠ̱ܠܳܐ (note the correct Syriac spelling here) tūraṣ mamllō lit. 'correction of speech' is generally used to mean 'grammar' and so one finds it in titles of grammatical works modern and medieval; the lost grammar by Jacob of Edessa is reported to have borne it. The phrase shows up even in the Syriac version of the title given by BnF Syr. 370 and SBB Ms. or. quart. 1050, although in those two, the first word is given as ܬܪܝܨܘܬܐ. The BnF catalogue refers to the work contained in BnF Syr. 262 as "Grammaire de la langue syriaque, divisée en quarante-cinq chapitres, par un auteur maronite". Why a Maronite is a mystery; it could be because it is written entirely in garšūnī or just because it uses Serto. Regardless, even a cursory comparison of BnF Syr. 262 to BnF Syr. 370 makes it clear that they are the same work containing 46 (BnF Syr. 370 and SBB Ms. or. quart. 1050) or 45 (BnF Syr. 262) chapters. Also, 46 chapters on some 100 folios of 18-20 lines each? So much for "poor coverage".

Thirdly, Stadel adds Vat. Syr. 150 (200r-215v, 1709). This identification is clearly not correct - as should be evident from the number of folios - and likely the result of undue reliance on Baumstark (a common affliction in Syriac scholarship). Assemani's catalogue describes the manuscript as "Jesujabi Episcopi Nisibeni ... Quaestiones Grammaticae & aenigmaticae" and sure enough, this is our Išoʕyahḇ. Baumstark incorrectly assumes that this is the same work as the previous ones he lists, i.e. Seert 99 (now lost), Seert 100 (also lost) and Seert 101 (our BnF Syr. 370) [6]. This work, however, is in Syriac only; moreover, it is indeed a list of questions - maybe even the one Bar Brīḵā refers to - and not Kitāb al-ʔīḍāḥ.

And finally, I was made aware of the existence of the Florence manuscript by Margherita Farina (see also her article), to whom I hereby extend my gratitude. The text seems to be identical with that of BnF Syr. 262, although interestingly, a colophon ascribes the authorship of the work to George (Gewārgīs) ʕAmīra, a Maronite scholar and bishop, the author of Grammatica Syriaca.

In summary, it may well be the case that there are not two, but three grammatical works written by Bar Malkōn:

  1. A treatise on points (BnF Syr. 369, 114v-125v; BnF Syr. 370, 174r-187v; London BL Add. 25,876, 276v-290v and likely many more, on which later). 
  2. Kitāb al-ʔīḍāḥ (see above)
  3. Grammatical questions (Vat. Syr. 150, 200r-215v)  

Turning back to the contents of the manuscripts of Kitāb al-ʔīḍāḥ, it is not the case – as Baumstark’s description (which most likely goes back to Scher's catalogue and which Van Rompay copies in his GEDSH entry on Bar Malkōn) would have it – that Kitāb al-ʔīḍāḥ is originally written in Syriac with a translation in Arabic in two columns ("... das syrische Original in einer Parallelkolumne mit einer arabischen Üb[er]s[etzung] ...") [7]. That is not true of any of the surviving mss Baumstark was aware of, i.e. the two Paris mss and the Berlin one.  Rather, the primary language of Kitāb al-ʔīḍāḥ is Arabic, but Syriac is employed throughout, in both examples and definitions of grammatical phenomena. Such Syriac text rarely constitutes a direct translation of any of the Arabic parts. As a rather straightforward example, consider this section from chapter 2 on parts of speech (BnF Syr. 370, fol. 9r-9v) with the Syriac portions highlighted in red (translation and numbered subsection division mine, underlined text is colored in the manuscript):


1

Chapter 2: On the division of parts of speech. Division of speech.

الباب الثابي في اقسام الكلام ܗ̄. ܦܘܲܠܓܲܐ ܕܡܡܠܠܵܐ


2

Among the Syrians, as well as the Arabs, speech is divided into three things: noun, verb and particle. That is, noun, verb and particle.

الكلام عند السريانيين والعرب. ينتظم من ثلثه اشيآ ܫܡܐ. ܘܡܸܠܬ݂ܐ ܗ̄ ܥܒ̣ܵܕܐ ܘܐܣܵܪܐ. ܗ̄ اسمٌ. وفعلٌ. وحرفٌ.

3

Some examples of nouns include: person, man, horse, mountain, command and similar.

فالسم نحو قولك ܒܪܢܫܐ. ܓܒܪܐ. ܣܘܣܝܵܐ. ܬܘܪܐ. ܐܸܡܪܐ. وما شاكل ذلك ܀

4

And know that everything that ends in an alif in the Syriac language is, for the most part, a noun. And a (word) that takes one of the four particles (lit. 'additions') BDWL BDWL is a noun.

واعلم ان كل ما اخره الف في لغه السريانيون فهو اسم علي الامر الاكثر وما يدخل عليه احدي الزوايد الاربع وهي بدول ܒܕܘܠ فهو اسم ܀

5

The definition of a noun among them [= Syriac grammarians]: sound with meaning that (is) without tense.

و حد الاسم عندهم ܀ ܩܠܐ ܡܫܘܕܥܵܢܐ ܒܫܠܡܘ̣ܬܐ ܕܠܵܐ ܙܲܒܢܵܐ. ...

6

And others define it as follows: the first part of speech designating a thing or an action.

ܐܚܪ̈ܢܐ. ܕܝܢ ܬܲܚܡܘ̣ܗܝ ܗܟܢܐ ܡܢܬܐ ܩܕܡܵܝܬܐ ܕܡܡܠܠܐ ܕܡܫܵܘܕܥܵܐ ܨܒ̣ܘ̣ܬ̣ܐ ܡܕܡ ܐܵܘ ܣܘܥܪܢܐ ܀

To be fair, the Arabic influence is indeed undeniable: it is clear, for example, from the division of the parts of speech into three classes (section 2), where the native Syriac linguistic tradition typically works with seven, i.e. the eight of Technē Grammatikē minus the definite article. I guess it is ok to be servile to the Greek model, although on the other hand, Bar Hebraeus divides his grammar into four treatises on, respectively, nouns, verbs, particles and orthography, so maybe he is servile to Arabic models as well... In any case, the influence of Arabic on Bar Malkōn's analysis is also evident from the choice of his examples: 'man' and 'horse', for example, are also given as examples of nouns in Sībawayh’s Kitāb.

The rest of the section, however, is anything but a servile copy of the Arabic method without any connection to the Syriac linguistic tradition. One such connection is the terminology: melṯā, his term for 'verb', is one that is well-established in the Syriac scientific terminology, though originally used as a translation for ῥῆμα in philosophical works. The term for 'particles', esārā, is also in common use in native Syriac linguistic tradition, although typically meaning 'conjunction', translating the Greek σύνδεσμος, both as a philosophical term, as well as the linguistic one (ch. 20 of Technē Grammatikē). Interestingly, Bar Hebraeus uses both terms in the same way Bar Malkōn does. [8] These and other items of Syriac linguistic terminology occur all over Kitāb al-ʔīḍāḥ, both as a result of dealing with matters specific to Syriac (and not only such obvious things as vowel points), but especially due to the bilingual nature of the work. This of course requires Bar Malkōn not only to engage with the Syriac tradition, but also attempt to harmonize it with the Arabic linguistic framework and even make attempts at comparative linguistics.

The major way in which Kitāb al-ʔīḍāḥ is undoubtedly a part of the native Syriac linguistic tradition - as opposed to a mindless copy of the Arabic one - is Bar Malkōn’s constant references to the same and his insistence on working within it. The introduction (BnF Syr. 370, ff. 2v-4v) contains a brief overview of the previous work by Syriac grammarians and scholars of language, including Jacob of Edessa (d. 708), Eliya of Ṭirhan and Yawsep̱ Hūzāyā (6th cent.), the purported translator of Technē Grammatikē into Syriac. The text of Kitāb al-ʔīḍāḥ then repeatedly refers to their work (the "among them" in section 5 above and "others" in section 6) and cites them by name regularly. The chapter on parts of speech cited above also contains one very telling example in section 5, i.e. the absence of time as a major criterion for the definition of a noun. This line of reasoning is unique to Syriac linguistic tradition and can be traced to Aristotle, e.g. De Interpretatione. In contrast, Technē Grammatikē opts for a morphological/semantic definition (English translation). Now Arabic tradition is complicated, but it involves morphological and syntactic criteria; a simplified contemporary grammar uses a definition that is heavy on the morphology. True, so does Bar Malkōn’s own definition of a noun in section 4, treating the particles BDWL as morphological properties. But then again, this is a fact of Syriac, obvious to anyone with even a passing familiarity with the language. So servile attitude towards Arabic models or sensible analysis of one’s language? The latter definitely applies to the entirety of what BnF Syr. 370 calls chapter 47 (96v-173v), missing in BnF Syr. 262 and SBB Ms. or. quart. 1050. [9] This chapter is sometimes treated as a separate work - or even genre - called De vocibus aequivocis, i.e. "On ambiguous words", and contains a Syriac-Arabic glossary of homographs.

None of this slavishly follows the Arabic model; in fact, the more I think about it, the more I am convinced that those who argue so have only ever read the section on parts of speech. The relationship of Bar Malkōn's analysis to the Arabic linguistic tradition reminds me of the way grammars of modern languages follow the Latin model: there is some inspiration, even a lot of it, and it may even be slavish now and then - just think of the concept of parts of speech and the terminological fustercluck that are Wolof conjugated pronouns. The Latin method, however, is not all there is.

As noted above, Bar Malkōn's work remains unedited and unpublished - hell, this hastily put-together post might be the most comprehensive study of his work to date. If anyone wishes to change it, for example as an MA thesis (his short treatise on points would be perfect) or even a PhD dissertation, hit me up.


Friday, February 06, 2026

dobra

Hans Stumme (1864-1936) was a German linguist whose work is probably known to anyone interested in Berber and North-African varieties of Arabic. Stumme travelled a lot and collected huge amounts of spoken data from - inter alia - Tunisians, Išelḥiyen and the Maltese. As far as I can tell, this is the more or less full list of his works containing such data:

Arabic

  1. Albert Socin and Hans Stumme. 1894. Der arabische Dialekt der Ho̮uwāra des Wād Sūs in Marokko. Hirzel, Leipzig. Text
  2. Albert Socin and Hans Stumme, editors. 1901. Diwan aus Centralarabien. B.G. Teubner, Leipzig. Text
  3. Hans Stumme, editor. 1893. Tunisische Märchen und Gedichte: Eine Sammlung prosaischer und poetischer Stücke im arabischen Dialecte der Stadt Tunis; nebst Einleitung und Übersetzung. Hinrichs, Leipzig. Vol. 1; Vol. 2
  4. Hans Stumme. 1894. Tripolitanisch-Tunisische Beduinenlieder. J. C. Hinrichs’sche Buchhandlung, Leipzig. Text
  5. Hans Stumme. 1896. Grammatik des tunisischen Arabisch nebst Glossar. Hinrichs, Leipzig. Text
  6. Hans Stumme, editor. 1898. Märchen und Gedichte aus der Stadt Tripolis in Nordafrika: Eine Sammlung transkribierter prosaischer und poetischer Stücke im arabischen Dialekte der Stadt Tripolis nebst Übersetzung, Skizze des Dialekts und Glossar. Hinrichs, Leipzig.
  7. Hans Stumme. 1915. Fünf arabische Kriegslieder des berühmten deutschen Kriegsfreiwilligen Fritz Klopfer: Tunisische Melodien mit arabischem und deutschen Text. Hinrichs, Leipzig. Text

Berber

  1. Hans Stumme. 1895a. Dichtkunst und Gedichte der Schluh. PhD Thesis, Zugl.: Leipzig, Univ., Habil.-Schr., 1895, Leipzig.
  2. Hans Stumme, editor. 1895b. Märchen der Schluḥ von Tázerwalt. Hinrichs, Leipzig. Text
  3. Hans Stumme. 1899. Handbuch des Schilhischen von Tazerwalt. Grammatik - Lesestücke - Gespräche - Glossar. Hinrichs, Leipzig. Text
  4. Hans Stumme, editor. 1900. Märchen der Berbern von Tamazratt in Südtunisien. Hinrichs, Leipzig. Text
  5. Hans Stumme. 1914. Eine Sammlung über den berberischen Dialekt der Oase Sîwe: Sitzung vom 12. September 1914. Teubner, Leipzig.

Maltese

  1. Bertha Kössler-Ilg and Hans Stumme, editors. 1909. Maltesische Volkslieder im Urtext mit deutscher Übersetzung. Hinrichs, Leipzig.
  2. Hans Stumme, editor. 1904a. Maltesische Märchen, Gedichte und Rätsel in deutscher Übersetzung. Hinrichs, Leipzig. Text
  3. Hans Stumme. 1904b. Maltesische Studien: Eine Sammlung prosaischer und poetischer Texte in maltesischer Sprache, nebst Erläuterungen. Hinrichs, Leipzig.

It is quite clear that Stumme was particularly interested in collecting folk literature, such as fairytales and songs, and his books remain an invaluable source of data for folklorists. At the same time, Stumme's work is extremely valuable for the study of the languages involved, since Stumme expended an enormous amount of effort on meticulously capturing the phonology of the varieties he studied. As a result, his work is regularly used by those studying the varieties he covered, in some cases being an object of study in itself.
 
This applies doubly to Maltese where there have been at least two major studies of the fairytales (1, 2). As far as I can tell, there is little focus on reevaluating Stumme's dialectological work (but that might change soon), which is a shame, because there is so much fascinating stuff in there. Like for example song no. 70 from the collection of Maltese songs (Kössler-Ilg and Stumme 1909, p. 27). I am reproducing the text below in standard Maltese orthography and Stumme's original German translation accompanied by my English one based on the Maltese text.

Ta' dobra sejrin jsiefru

kemm iħallu qlub miksura!

Kif ħarġu mill-port 'il barra,

tathom qalbhom, "erġgħu lura!"


Die Slawen wollen abreisen,

wie viele gebrochene Herzen lassen sie hier zurück!

Als sie aus dem Hafen hinausgefahren waren, 

gab ihnen ihr Herz ein: "Kehrt wieder um!"


The Slavs are about to leave,

how many broken hearts they leave behind!

As they left the port,

their hearts gave out, "Come on back!" 

A note here: the phrase tathom qalbhom is a bit of a mystery. It does bring to mind the idiom qata' + IO qalb + POSS 'be discouraged, lose faith', but the morphology does not make sense: the verb is PAST.3SGF - which works, since qalb 'heart' is feminine - and the noun bears the 3PL possessive marker. The -hom in tathom looks like the direct object (P argument) marker, but semantically it designates the recipient (R argument), so we have the IO component here. But then again, the form tat definitely looks like PAST.3SGF of ta 'to give' which recalls the idiom ta + DO ras + POSS 'to panic'. So maybe there is an entire class of such idioms to which ta + DO qalb + POSS belongs, I will need to look into that.

But that is not why we are here. We are here for the multi-word expression ta' dobra in the first line, which Stumme translates as the ethnonym "Slavs". The composition of the expression is clear: the element ta' is what Arabic dialectology refers to as a genitive exponent, i.e. possession marker, the equivalent of 'of'. In North African varieties, it usually takes the form mtāʕ/ntāʕ etc.; the apostrophe at the end of ta' is what remained of ʕ in Maltese. ta' (or tal- with a definite article) + NOUN is how Maltese creates group names: ta' Lejber 'Labourists', tal-PN 'nationalists' (lit. 'of Partit Nazzjonalista') are perhaps the most prominent examples. Similarly, in a version of the Maltese translation of Bandiera rossa, the first verse goes Tal-pinna o ħutna, ukoll tal-mazza where pinna is 'pen' and mazza is 'sledgehammer', the two expressions meaning 'intellectuals' and 'workers'.

What of the dobra? That is quite simple; as Stumme himself puts it on p. 11, we're dealing with "die Leute, die immer dobra 'gut!' sagen" ("the people who always say dobra 'good!'"). That we do so and that we are perceived as such I can attest to from personal experience, recalling for example an Albanian lady in a B&B in Italy who upon learning that I am Slovak went "Oh you are one of the dobre dobre people!" That this is also how the Maltese thought of us back in the late 19th century is fascinating. Now the question remains which Slavs are these, since the general adverb of agreement usually takes the form dobre/dobro. The only language I can think of where people use a form with an [a] at the end is Czech, but there the vowel is long and considering the geography of the region, it is more likely that the Maltese would encounter South Slavs. So probably not Czechs and definitely not Poles or Slovaks, otherwise it would either be ta' dopxe or, of course, ta' kurva.

Monday, February 02, 2026

work

So, anyway, been a while, right? How have y'all been the last *checks notes* few years? Yeah, I know,  interesting times... How about instead of focusing on that shit, I show you what I have been up to since 2015 or so. Let's start with some of the projects I have been working on that you might find interesting.

 

HunaynNET

Named after Ḥunayn ibn Isḥāq (807-873), a physician and prolific translator of classical philosophical and scientific literature, this ERC-supported project collected all texts of classical science that were translated into both Syriac and Arabic. The translations were then re-edited and aligned on the level of semantic and syntactic units that... Well, they are not quite sentences, but we tried to keep them as small as possible. The text is also tokenized and links to dictionaries and corpora are provided; and in some cases, we also provided aligned text of translations into modern languages. Although I did contribute to the editions, my main job was processing the data and building the interface. I know, I know, it now needs some updates, especially when it comes to the aforementioned links to various dictionaries, chief among them the Glossarium Graeco-Arabicum which is thankfully now back online, completely rebuilt. This project is a wonderful resource not just for those interested in the philosophical and scientific exchange between the East and the West, but also those learning/studying any/all of the languages involved.

 

Simtho

Despite its tagline "The Syriac Thesaurus" (it's a, um, user-friendliness thing), this is an electronic corpus - the only one worthy of the name - of the Syriac language. It contains ~25 million words and represents roughly 95% of all literature in Syriac. It is largely based on printed editions, although a few manuscripts snuck in. Simtho (a Syriac word meaning "treasure" pronounced according to West Syriac conventions) is the product of thousands of hours of work by hundreds of people assembling metadata, scanning books and checking OCRd texts. Simtho is the largest project run at Beth Mardutho led by the indomitable George Kiraz and it is done without any major grants or other financial support from research agencies or governments; a bootleg operation is what George calls it.  My job is being the last link in the chain, i.e. setting up and managing the entire processing pipeline, as well as the server(s) and all the software on it. This includes the installation (and its customization, for George has many ideas) of NoSketch Engine on which the whole corpus runs. In addition to that, I have been doing some language modelling and annotation, on which perhaps later. One by-product of the work on Simtho is a set of OCR models for the recognition of Syriac printed text. These models are trained using the open source Kraken platform and available on Zenodo.

 

Zoroastrian Middle-Persian Corpus and Dictionary

This DFG-supported ongoing project seeks to collect, annotate and analyze all available Zoroastrian texts written in Middle Persian to create a searchable corpus (in transcription) and finally an updated dictionary of Middle Persian. I was largely responsible for data processing, conversion and import, so none of what you see online is my work. The web application is still very much a work in progress, but once finished, it will be a one-stop shop for all your Zoroastrian Middle Persian needs, including manuscript images and comprehensive lexical resources.