Disyllabic roots in greyfolk language, 2: creating the database

In my end of February report, I mentioned that I was able to create a database to help me create disyllabic roots while preserving Hamming distance. One big part of that was devising a macro for Excel that would do the following:

Range A is a bank of possible (according to the rules of my language) disyllabic roots. I generated this using Zompist’s Gen.
Range B is where I input the roots I have chosen.
Range C is a bank of roots that conflict with the roots in Range B. I also generated this using Gen with some really roundabout tricks.
Rule 1: If a cell appears in Range A and Range C, it is highlighted yellow.
Rule 2: If a cell appears in Range A and Range B, it is highlighted green.
Rule 3: If a cell appears in Range B more than twice, it is highlighted red.
Rule 4: If I a cell appears in Range B and Range C, it is highlighted red.
Rule 5: Otherwise, a cell should not be highlighted.

Thus, any white cells in Range A were roots that could still be used because they didn’t conflict with anything else. Even though the highlighting was all done by a macro, there was still a significant portion of manual work that took a few hours. Much more time went into figuring out how to get the most efficient set of disyllabic roots. By that, I mean that I had to figure out how to get as many roots as possible that didn’t conflict with each other from the total bank of possible roots. It always comes back to Hamming distance!

[su_spoiler title=”Macro for disyllabic roots database” open=”no” style=”default” icon=”plus” anchor=”” class=””]

Sub HighlightDuplicates()

    'Keyboard Shortcut: Ctrl+Shift+D

    Application.ScreenUpdating = False

    Dim ws As Worksheet, t0 As Single, t1 As Single
    Set ws = ThisWorkbook.Sheets("Database")
    t0 = Timer

    'Rule 5: Otherwise, a cell should not be highlighted.
    ws.cells.Interior.Color = xlNone

    Const RANGE_A As String = "B1:E2300"
    Const RANGE_B As String = "G1:G2300"
    Const RANGE_C As String = "I1:AH2300"

    Dim dictA As Object, dictB As Object, dictC As Object
    Set dictA = CreateObject("Scripting.Dictionary")
    Set dictB = CreateObject("Scripting.Dictionary")
    Set dictC = CreateObject("Scripting.Dictionary")

    Call buildDict(dictA, ws.range(RANGE_A))
    Call buildDict(dictB, ws.range(RANGE_B))
    Call buildDict(dictC, ws.range(RANGE_C))

    'Rule 1: If a cell appears in Range A and Range C,
    'I want them highlighted yellow.
    'Rule 2: Then, if a cell appears in Range A and Range B,
    'I want them highlighted green.
    
    Dim cell As range, key As String
    For Each cell In ws.range(RANGE_A)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictC.exists(key) Then cell.Interior.Color = vbYellow
            If dictB.exists(key) Then cell.Interior.Color = vbGreen
        End If
    Next

    For Each cell In ws.range(RANGE_C)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictA.exists(key) Then cell.Interior.Color = vbYellow
        End If
    Next

    For Each cell In ws.range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictA.exists(key) Then cell.Interior.Color = vbGreen
        End If
    Next

    'Rule 3: Then, if a cell appears in Range B more than twice,
    'I want them highlighted red.
    'Rule 4: Then, if a cell appears in Range B and Range C,
    'I want them highlighted red.

    For Each cell In ws.range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictB.exists(key) Then
                If dictB.Item(key) > 1 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next

    For Each cell In ws.range(RANGE_B)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictC.exists(key) Then
                If dictC.Item(key) > 0 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next

    For Each cell In ws.range(RANGE_C)
        If Len(cell.Value) > 0 Then
            key = CStr(cell.Value)
            If dictB.exists(key) Then
                If dictC.Item(key) > 0 Then
                    cell.Interior.Color = vbRed
                End If
            End If
        End If
    Next
    
    t1 = Timer
    
    'MsgBox "Completed in " & Int(t1 - t0) & " seconds"

    Application.ScreenUpdating = True

End Sub

[/su_spoiler]

You can click on the plus icon or the name to expand that section to see the macro. It’s a biggun, so I decided that it was better to default to it being collapsed and not immediately assaulting any eyeballs.

Now, to even generate Range A, as I said, I used Zompist’s Gen tool. Making rules to create all possible disyllabic roots in greyfolk language was easy.

[su_spoiler title=”Categories for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]

C=mnptksylh
S=yl
A=a
T=mnl

[/su_spoiler]

[su_spoiler title=”Rewrite rules for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]

hl|h
hy|h
lh|l
ll|l
yy|y
mh|m
mm|m
nh|n
nn|n

[/su_spoiler]

[su_spoiler title=”Syllable types for all possible disyllabic roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]

CACA
CASA
SACA
SASA

CSACA
CSASA
CATCA
CATSA
SATCA
SATSA
CACSA
CACAT
CASAT
SACSA
SACAT
SASAT

CSATCA
CSATSA
CACSAT
SACSAT
CSACSA
CSACAT
CSASAT
CATCSA
CATCAT
CATSAT
SATCSA
SATCAT
SATSAT

CSACSAT
CATCSAT
SATCSAT
CSATCSA
CSATCAT
CSATSAT

[/su_spoiler]

Of course, the output type was for all possible syllables.

To generate Range C in Gen, I had to figure out some really roundabout tricks, and, even then, I still had to generate it one chunk at a time. Think of each disyllabic root as a STUVWXYZ map where each letter corresponds to a phoneme. S and W are the first position of their respective syllable and can be «m, n, p, t, k, s, y, l, h». T and X are the second position of their respective syllable and can be «y, l» or ‘-‘. U and Y are the third position of their respective syllable, but, when working with roots, both of them are always «a». Finally, V and Z are the fourth position of their respective syllable and can be «m, n, l» or ‘-‘. For example, the root «myaman» would be ‘mya-m-an’ because the fourth position of the first syllable and the second position of the second syllable are open. If you look at the above Gen rules, it should be clear that STUVWXYZ is essentially CSATCSAT.

Then, to figure out the roots that wouldn’t be compatible with other roots, which is what Range C is, I had to switch the process for ABCDEFGH. Each letter corresponds to the letter in the same position in STUVWXYZ, but each letter of ABCDEFGH take the values of phonemes that do not have enough Hamming distance from those in STUVWXYZ. So, if S=m, then A=mnp because «m, n, p» conflict with «m» according to my Hamming distance parameters. Oh, I also set a meaningless Q=xxxxxx so I could see the separation between ABCDEFGH and STUVWXYZ at a glance.

[su_spoiler title=”Categories for «myaman»” open=”no” style=”default” icon=”plus” anchor=”” class=””]

A=mnp
B=yl
C=a
D=mnl
E=mnp
F=yl
G=a
H=mnl
Q=xxxxxx
S=m
T=y
U=a
V=-
W=m
X=-
Y=a
Z=n

[/su_spoiler]

[su_spoiler title=”Syllable types for conflicting roots” open=”no” style=”default” icon=”plus” anchor=”” class=””]

ATUVWXYZ
SBUVWXYZ
STCVWXYZ
STUDWXYZ
STUVEXYZ
STUVWFYZ
STUVWXYH

[/su_spoiler]

Of course, the output type was for all possible syllables.

[su_spoiler title=”Output for «myaman»” open=”no” style=”default” icon=”plus” anchor=”” class=””]

mla-m-an
mya-m-al
mya-m-am
mya-m-an
mya-mlan
mya-myan
mya-n-an
mya-p-an
myalm-an
myamm-an
myanm-an
nya-m-an
pya-m-an

[/su_spoiler]

From there, it was a matter of removing the dashes. However, an interesting question popped back up. What is the Hamming distance between something like ‘mya-m-an’ and ‘myamm-an’? What about ‘myamh-an’? I went ahead and decided that they were all equivalent (which is actually why ‘myamh-an’ doesn’t generate as I had already taken that into consideration). Because it was a problem I had faced before, I knew how to deal with it. However, another question popped up. What is the Hamming distance between something like ‘myamy-a-‘ and ‘mya-mya-‘? Same thing, I ended up deciding that they were the same as well. Though, they do have enough Hamming distance between them. I just wanted to simplify things and continue to get rid of roots that sounded too alike.

However, I did get some conflicts that weren’t actually conflicts. On the surface, «katya» and «kalya» might seem like they conflict because «t» and «l» conflict in the same position. However, «katya» is «k-a-tya-» (pronounced /ka.tja/) and «kalya» is «k-aly-a-» (pronounced /kal.ja/). So, the «t» and the «l» aren’t actually in conflicting positions.

Anyway, I had to run every planned disyllabic root through Gen, put the output in the database, then format it and remove dashes. It took quite a bit of time to do that manually for over 166 roots—yes, it was more than 166 because I had to add other altered roots at the cost of others after having already processed roots that had to be removed. More than anything else, it was just really repetitive and boring, and I pushed myself so hard during this entire process that I ended up really fried, stressed, and anxious. Oops!

In the next part, I will finally reveal the disyllabic roots!

Disyllabic roots in greyfolk language, 2: creating the database

Comments

Leave a Reply Cancel reply