LibreMonk

joined 8 months ago
[–] LibreMonk@linkage.ds8.zone 1 points 3 months ago* (last edited 3 months ago)

This is the decode function if anyone is interested:

decoded_reference()
{
    # relies on symbolset, ln_symbolset and lookup_table_reverse being defined in the calling scope
    local yr_msd=${1:0:1}
    local yr_lsd=${1:1:1}
    local seq_enc_msd=${1:3:1}
    local seq_enc_lsd=${1:4:1}
    local seq_msd=${lookup_table_reverse[$seq_enc_msd]}
    local seq_lsd=${lookup_table_reverse[$seq_enc_lsd]}
    local seq_msd_index=$(typeset -p symbolset | grep -oP '[0-9]+(?=]="'"$seq_msd"'")')
    local seq_lsd_index=$(typeset -p symbolset | grep -oP '[0-9]+(?=]="'"$seq_lsd"'")')
    local seq=$((seq_msd_index * ln_symbolset + seq_lsd_index))
    local yr_msd_index=$(typeset -p symbolset | grep -oP '[0-9]+(?=]="'"$yr_msd"'")')
    local yr_lsd_index=$(typeset -p symbolset | grep -oP '[0-9]+(?=]="'"$yr_lsd"'")')
    local yr=$((ln_symbolset * ln_symbolset * 2 + yr_msd_index * ln_symbolset + yr_lsd_index)); # warning: the “2” is a dangerous hard-coding! With 31 symbols it only holds for years 1922 to 2882. Hopefully that bug manifests after I am dead

    printf '%s\n' "${yr}-$seq"
};#decoded_reference
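
The function reads `lookup_table_reverse` without showing how it is built. A minimal sketch, assuming it is simply the inverse of the encoder's `lookup_table`. The three pairs below are a stand-in subset implied by the first encoded outputs (a→w, b→y, c→2), not the full table:

```bash
#!/usr/bin/env bash
declare -A lookup_table=([a]=w [b]=y [c]=2)   # stand-in subset, not the full table
declare -A lookup_table_reverse=()

# swap every key/value pair of the forward table
for key in "${!lookup_table[@]}"
do
    lookup_table_reverse["${lookup_table[$key]}"]=$key
done

printf '%s\n' "${lookup_table_reverse[y]}"   # prints: b
```

Since every value in the forward table is used exactly once, the inversion loses nothing.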

[–] LibreMonk@linkage.ds8.zone 1 points 3 months ago* (last edited 3 months ago) (1 children)

I probably need a perfect hash function. This code seems to do the job:

encoded_reference()
{
    local -r yr=$1
    local -r seqno=$2
    
    local -ar symbolset=(a b c d e f g h   j k   m n   p q r s t u v w x y z     2 3 4 5 6 7 8 9)
    local -a seedset=("${symbolset[@]}")
    local -r ln_symbolset=${#symbolset[@]}; # 31
    local ln_seedset=${#seedset[@]}
    local -A lookup_table=()

    local sym pos; # keep the loop variables from leaking into the global scope
    for sym in "${symbolset[@]}"
    do
        pos=$((50 % ln_seedset)); # 50 is just an arbitrary static number
        lookup_table+=(["$sym"]=${seedset["$pos"]})
        seedset=(${seedset[@]/${seedset[$pos]}}); # remove used elements from the seedset (relies on every symbol being a single character)
        ln_seedset=${#seedset[@]}
    done
    
    local yr_enc=${symbolset[$(((yr / ln_symbolset) % ln_symbolset))]}${symbolset[$(($yr % ln_symbolset))]}
    local most_sig_fig=$((seqno / ln_symbolset))
    local least_sig_fig=$((seqno % ln_symbolset))
    
    # caution: if the seqno reaches ln_symbolset² (961 for 31 symbols), this calculation is out of range
    local seq_enc=${lookup_table[${symbolset[$most_sig_fig]}]}${lookup_table[${symbolset[$least_sig_fig]}]}
    
    printf '%s\n' "answer → ${yr_enc}-$seq_enc"
};#encoded_reference

for yr in 2024 2025 2026
do
    for seqno in {1..20}
    do
        encoded_reference "$yr" "$seqno"
    done
done

output:

answer → js-wy
answer → js-w2
answer → js-w4
answer → js-w6
answer → js-w8
answer → js-wa
answer → js-wd
answer → js-wg
answer → js-wk
answer → js-wp
answer → js-ws
answer → js-wv
answer → js-w3
answer → js-w9
answer → js-we
answer → js-wm
answer → js-wt
answer → js-w5
answer → js-wf
answer → js-wr
answer → jt-wy
answer → jt-w2
answer → jt-w4
answer → jt-w6
answer → jt-w8
answer → jt-wa
answer → jt-wd
answer → jt-wg
answer → jt-wk
answer → jt-wp
answer → jt-ws
answer → jt-wv
answer → jt-w3
answer → jt-w9
answer → jt-we
answer → jt-wm
answer → jt-wt
answer → jt-w5
answer → jt-wf
answer → jt-wr
answer → ju-wy
answer → ju-w2
answer → ju-w4
answer → ju-w6
answer → ju-w8
answer → ju-wa
answer → ju-wd
answer → ju-wg
answer → ju-wk
answer → ju-wp
answer → ju-ws
answer → ju-wv
answer → ju-w3
answer → ju-w9
answer → ju-we
answer → ju-wm
answer → ju-wt
answer → ju-w5
answer → ju-wf
answer → ju-wr

This is close to ideal, but I just thought of another problem: what if a year-seq pair were to derive an encoded number like “fy-ou” or “us-uk” or “sh-it”? A bias that nearly ensures a digit is used would help avoid generating offending words. But I guess I’m getting well into over-engineering territory.
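
One way to act on that idea, sketched under the assumption that a reference containing at least one digit is much less likely to read as a word: test each candidate and flag the all-letter ones. `has_digit` is a hypothetical helper name, and what to do with a flagged reference is left open.

```bash
#!/usr/bin/env bash
# succeeds if the reference contains any digit from the symbol set (2-9)
has_digit()
{
    [[ $1 == *[2-9]* ]]
}

for ref in js-wy js-w2 sh-it
do
    if has_digit "$ref"
    then
        printf '%s\n' "$ref: contains a digit"
    else
        printf '%s\n' "$ref: letters only, worth a second look"
    fi
done
```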

[–] LibreMonk@linkage.ds8.zone 1 points 3 months ago* (last edited 3 months ago)

That is certainly a winner from the standpoint of code simplicity. And it’s trivially reversible. But I’m also prioritizing simplicity for human recipients above code simplicity. Base64 output is case sensitive and someone writing back and referencing a ref number would not necessarily preserve case. It’s also intolerant of human errors like mistaking a “1” for an “l”.

(edit) I think base32 would avoid the case sensitivity problem. So here’s a sample:

for seq in {1..60}; do printf '%s → ' 2024-"$seq"; printf '%s\n' 2024-"$seq" | base32 | awk '{print tolower($1)}' | sed 's/=//g'; done

output:

2024-1 → giydenbngefa
2024-2 → giydenbngifa
2024-3 → giydenbngmfa
2024-4 → giydenbngqfa
2024-5 → giydenbngufa
2024-6 → giydenbngyfa
2024-7 → giydenbng4fa
2024-8 → giydenbnhafa
2024-9 → giydenbnhefa
2024-10 → giydenbngeyau
2024-11 → giydenbngeyqu
2024-12 → giydenbngezau
2024-13 → giydenbngezqu
2024-14 → giydenbnge2au
2024-15 → giydenbnge2qu
2024-16 → giydenbnge3au
2024-17 → giydenbnge3qu
2024-18 → giydenbnge4au
2024-19 → giydenbnge4qu
2024-20 → giydenbngiyau
2024-21 → giydenbngiyqu
2024-22 → giydenbngizau
2024-23 → giydenbngizqu
2024-24 → giydenbngi2au
2024-25 → giydenbngi2qu
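
Going the other way is a matter of undoing the `awk` and `sed` steps before decoding. A rough sketch, assuming GNU coreutils `base32` (with its `-d` decode flag) and bash 4 or later for `${1^^}`; `base32_ref_decode` is a made-up name:

```bash
#!/usr/bin/env bash
base32_ref_decode()
{
    local s=${1^^}                              # base32 -d expects upper case
    local -i pad=$(( (8 - ${#s} % 8) % 8 ))     # restore the '=' padding that sed removed
    local padding
    padding=$(printf '%*s' "$pad" '' | tr ' ' '=')
    printf '%s%s\n' "$s" "$padding" | base32 -d
}

base32_ref_decode giydenbngefa    # prints: 2024-1
```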

[–] LibreMonk@linkage.ds8.zone 1 points 3 months ago* (last edited 3 months ago)

The “js” example is just to encode the year which is a prefix to the encoded sequence number. So if 2024 gives “js”, then ref numbers would look like this:
js-aa
js-ab
js-ac
…etc.

And I do not reset the counter at the beginning of the year. So 2025 would be like:

jt-ad
jt-ae
jt-af
…etc.

(update)

Rereading, maybe I misunderstood - would the full string include the date? so 2024-js as a complete example?

Yes, but note that “js” in my example was for an encoding of the year, which helps shrink the reference number and mask the fact that the 2nd token is a sequence.

 

TL;DR → The main problem is coming up with a way to reorder an array non-randomly but without introducing bulky code. Like the effect of shuffling a deck of cards in a deterministic cheating way.


Full background:

I would like to generate reference numbers for letters sent via postal mail. An sqlite db is used to track the sequence numbers (but not the reference numbers). This is the bash code I have so far:

typeset -a symbolset=(a b c d e f g h   j k   m n   p q r s t u v w x y z     2 3 4 5 6 7 8 9)
ln_symbolset=${#symbolset[@]}; # 41 is the answer, not 42
itemseq=$(sqlite3 ltr_tracking.db "select max(counter) from $tbl;")
printf '%s\n' "next letter reference number is: $(date +%Y)-${symbolset[$((itemseq / ln_symbolset))]}${symbolset[$((itemseq % ln_symbolset))]}"

An array is defined with alphanumeric symbols, taking care to eliminate symbols that humans struggle to distinguish (e.g. 1l0o). Then integer div and mod operations produce a two-character number which is then prefixed with the year. So e.g. 2024-aa. Just two chars gives more numbers than would ever be generated in one calendar year.

This code mostly satisfies the need. But there’s a problem: a recipient who receives two letters can easily realise how many letters were sent in the time span of the two letters they receive. Most numbers will start with “a” “b” or “c”.

I do not need or want a cryptographic level of security which then leads to ungodly 16 byte numbers. Simplicity¹ is far more important than confidentiality. Just a small tweak to stifle the most trivial analysis would be useful.

One temptation is to simply manually mix up the order of chars in the symbolset array, hard-coded. But then that makes the code less readable. So I probably need to create a 2nd array “symbolseq” which arbitrarily reorders the symbolset array. I say arbitrary and not random because the sequence must be deterministic and static from one execution to the next.
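
A possible sketch of such a “symbolseq” array, using a fixed stride rather than anything from the code in this thread: walk the symbol set in steps of `stride`, wrapping around. This visits every element exactly once as long as `stride` shares no factor with the array length. The array as listed has 31 entries, which is prime, so any stride from 1 to 30 would do. `stride` and `offset` are arbitrary illustrative values.

```bash
#!/usr/bin/env bash
typeset -a symbolset=(a b c d e f g h   j k   m n   p q r s t u v w x y z     2 3 4 5 6 7 8 9)
ln_symbolset=${#symbolset[@]}
stride=7    # arbitrary; must share no factor with ln_symbolset
offset=3    # arbitrary starting point
typeset -a symbolseq=()

for ((i = 0; i < ln_symbolset; i++))
do
    symbolseq+=("${symbolset[(i * stride + offset) % ln_symbolset]}")
done

printf '%s ' "${symbolseq[@]:0:3}"; printf '\n'   # prints: d m u
```

The order is the same on every run, and the symbol set itself stays readable in its natural order.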

An associative array is one idea:

typeset -A symbolset_lookup_table=(
[a]=k
[b]=3
[c]=s
…

I’m just slightly put off by the fact that it’s not readily evident that the RHS values are all used from the same set as the LHS keys exactly once.
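
That property can at least be checked mechanically. A small sketch, using a hypothetical three-entry table in place of the real one: sort the keys and the values and compare the two lists.

```bash
#!/usr/bin/env bash
typeset -A tbl=([a]=c [b]=a [c]=b)   # hypothetical stand-in for symbolset_lookup_table

keys=$(printf '%s\n' "${!tbl[@]}" | sort)
vals=$(printf '%s\n' "${tbl[@]}" | sort)

if [[ $keys == "$vals" ]]
then
    result='values are a permutation of the keys'
else
    result='values are NOT a permutation of the keys'
fi
printf '%s\n' "$result"
```

A duplicated or missing value on the RHS makes the two sorted lists differ, so the check fails.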

I should probably encode the year as well. This would give a two char year:

printf '%s ' "$(((2024/41) % 41))" "$((2024 % 41))" "→ ${symbolset[$(((2024 / 41) % 41))]}" "${symbolset[$((2024 % 41))]}"

output:
8 15 → j s
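
The base does not have to be typed as a literal; it can be taken from the array length. A sketch of that variant follows. Note that with the array exactly as listed the length comes out as 31 (matching the `# 31` comment in encoded_reference), not 41, so this variant yields a different pair from the `j s` shown here.

```bash
#!/usr/bin/env bash
typeset -a symbolset=(a b c d e f g h   j k   m n   p q r s t u v w x y z     2 3 4 5 6 7 8 9)
ln_symbolset=${#symbolset[@]}
yr=2024

# two base-ln_symbolset digits of the year, most significant first
yr_enc=${symbolset[(yr / ln_symbolset) % ln_symbolset]}${symbolset[yr % ln_symbolset]}
printf '%s\n' "$ln_symbolset → $yr_enc"   # prints: 31 → dk
```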

(edit)
All the calculations must be easily reversible so a ref number can be converted back into a sequence number for DB queries.
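
For the plain (unshuffled) two-character scheme, the round trip could look roughly like this. `seq_to_ref`, `ref_to_seq` and `symbol_index` are names invented for this sketch, and the year prefix is left out.

```bash
#!/usr/bin/env bash
typeset -a symbolset=(a b c d e f g h   j k   m n   p q r s t u v w x y z     2 3 4 5 6 7 8 9)
ln_symbolset=${#symbolset[@]}

# symbol → position, so decoding needs no searching
typeset -A symbol_index=()
for i in "${!symbolset[@]}"
do
    symbol_index["${symbolset[i]}"]=$i
done

seq_to_ref()
{
    printf '%s\n' "${symbolset[$1 / ln_symbolset]}${symbolset[$1 % ln_symbolset]}"
}

ref_to_seq()
{
    local msd=${symbol_index[${1:0:1}]}
    local lsd=${symbol_index[${1:1:1}]}
    printf '%s\n' "$((msd * ln_symbolset + lsd))"
}

seq_to_ref 37    # prints: bg
ref_to_seq bg    # prints: 37
```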

¹ simplicity in both the code and in the numbers generated.

[–] LibreMonk@linkage.ds8.zone 1 points 4 months ago* (last edited 4 months ago)

Your client would make a difference. What you are probably seeing is the mirrored version of !tex@lemmy.sdfeu.org on lemm.ee. You cannot possibly be interacting with a non-existent community. If I post to https://linkage.ds8.zone/c/tex@lemmy.sdfeu.org, then I don’t suppose you would see it on https://lemm.ee/c/tex@lemmy.sdfeu.org.

(edit) just saw your test msg. Well, that’s interesting. Even though !tex@lemmy.sdfeu.org no longer exists, it seems the mirrored versions of it can still collaborate. I’m not sure how that works.

[–] LibreMonk@linkage.ds8.zone -1 points 4 months ago* (last edited 4 months ago)

If i build a shitty house and it collapses, I own it, I don’t write a manifesto about how it’s all lumber’s fault.

If you sell the house in a high-pressure sales tactic way (“buy in the next 5 min or deal is off the table”) and deny inspection to the buyer before it collapses, that would be as close as this stupid analogy can get to the JS scenario.

As does FOSS C

Nonsense. As you were told, C is not dynamically fetched and spontaneously executed upon visiting a website.

do you install linux from the source tree and build everything yourself? no, you download an .iso, so you are bound to the whims of the OS maintainer,

Nonsense. Have a look at gentoo. You absolutely can build everything from source. You can inspect it and you can also benefit from the inspection of others. Also, look into “reproducible builds”.

Literally every JS package I’ve ever used does this.

Nonsense. The web is unavoidably littered with unpublished JS that’s dynamically fetched every time you visit the page.

[–] LibreMonk@linkage.ds8.zone -1 points 4 months ago* (last edited 4 months ago) (2 children)

they attribute buggy sites to the company, not the underlying language (rightly so)

Precisely my point. Recall what I wrote about conflict of interest. I’m not talking about a problem of the language syntax and semantics. I’m talking about JavaScript products (in the mathematical sense of a product not in the commercial sense; the code artifacts, iow).

JS runs client side and you can see what scripts are downloaded and running

That does nothing to remedy the conflict of interest. They can also push obfuscated JS but that’s beside the point. The problem is users are not going to review that code even the first time they visit a site, much less every single time due to the nature of dynamically re-fetching the code every single time you visit a page. Even if some OCD nutty user had that level of motivation, they do not benefit from the reviews of others because the code is not being reviewed from a static centralised space. Your idea that software freedom will somehow escape the conflict of interest problem is nonsense. A site admin can do whatever they want to the code to serve themselves and you end up with users running code that is designed to serve someone else.

So open source projects written in C benefit the user, but open source projects written in JS do not?

FOSS C projects reliably benefit the user because of the distribution of the code. We do not fetch a dynamically changing version of unreviewable unverified C code every time we visit a website. Distribution of C code is more controlled than that.

FOSS JS depends on how it’s distributed. Someone can write JS in their basement with no public oversight, license it to pass the LibreJS plugin test, and technically it’s FOSS but because of how it’s reviewed and distributed the benefits are diminished. If the FOSS JS is in a public repo and statically downloadable (e.g. electronmail), then the conflict of interest is removed and the code is static (not fetched on-the-fly upon every execution which escapes a QA process).

Electronmail demonstrates FOSS JS that avoids the conflict of interest problem but that’s exceptional. That’s not how most JS is distributed. Most JS is distributed from a stakeholder, thus presents a conflict of interest.

 

I was thinking about the problem with JavaScript and the misery it brings to people. I think I’ve pinned it down to a conflict of interest.

Software is supposed to serve the user who runs it. That’s the expectation, and rightfully so. It’s not supposed to serve anyone else. Free software is true to this principle, loosely under the FSF “freedom 0” principle.

Non-free software is problematic because the user cannot see the code. The code only has to pretend to serve the user while in reality it serves the real master (the corporation who profits from it).

JavaScript has a similar conflict of interest. It’s distributed by the same entity who operates API services -- a stakeholder. Regardless of whether the JS is free software or not, there is an inherent conflict of interest whereby the JS is produced by a non-user party to the digital transactions. This means the software is not working for the user. It’s only pretending to.

[–] LibreMonk@linkage.ds8.zone 1 points 4 months ago (2 children)

I’m not sure what that is. vger.social and voyager.lemmy.ml don’t seem to have anything relevant. But I found !tex_typesetting@lemmy.sdf.org.

 

I just started using the LaTeX community (!tex@lemmy.sdfeu.org). Sad to see it go.

update


Just noticed it’s back up, but there are no communities. That’s bizarre. So if someone not on lemmy.sdfeu.org were to post to !tex@lemmy.sdfeu.org, I guess it’d still be like a ghost node because the post would have nowhere to go on the hosting node.