this post was submitted on 13 Aug 2024

Nursing student here!

So we get a shit load of reading assignments, and since everything's digital nowadays, I've been leaning a lot on text-to-speech software that effectively converts reading assignments to listening assignments.

The problem is textbooks have a LOT of just... noise. Every image has something like "FIGURE 13.5 SURGICAL DISASTERS!" "FIGURE 13.6 YOU GOT SUMMONED TO COURT!" etc. In-text citations are EVERYWHERE, copyright info is EVERYWHERE... reading the content, you just skip over all that crap, but pasting it into a TTS service, all that trash gets spoken aloud and adds up to a huge time sink every chapter, and distracts from the actual lesson.

Googling it, the best I've been able to come up with is doing a find and replace in MS Word for things like FIGURE **.*^13 with wildcards on and the replace field blank... but it's not very consistent - sometimes it works, sometimes not. Same with nuking parentheses and the text within them using \(*\)

All that said, I'm wondering if I'm approaching this wrong by using MS word in the first place. Would be absolutely amazing if I could save all the commands on standby, then run them at the same time. By end of the school program, we're talking like 100 chapters from multiple books, so anything that lets me just nuke huge batches of BS as quickly as possible and dive right into the listening would be a godsend.

Thanks all!!

[–] LANA_DEL_KARENINA@lemmy.world 9 points 3 months ago (1 children)

The good news: there is a tool built to solve this exact problem: regular expressions (aka regex)

The bad news: regular expressions are famously frustrating to read and write

Depending on how badly you want the problem solved and how patient you are, using online resources to craft some regular expressions would be the ticket
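To make that concrete, here is a minimal Python sketch of what two such regular expressions might look like. The caption shape ("FIGURE 13.5 ..." on its own line) comes from the examples in the post; the citation shape (parentheses containing a four-digit year) is only an assumption about how the textbook cites sources, and `strip_noise` is a made-up helper name.

```python
import re

# Assumed caption shape: a whole line starting with "FIGURE <number>.<number>".
FIGURE_CAPTION = re.compile(r"^FIGURE \d+\.\d+\b.*$\n?", re.MULTILINE)

# Assumed citation shape: parentheses that end in a four-digit year,
# e.g. "(Smith, 2019)". Parentheses without a year are left alone.
CITATION = re.compile(r"[ \t]*\([^()]*\b\d{4}[a-z]?\)")


def strip_noise(text: str) -> str:
    """Remove figure caption lines and year-style citations from text."""
    text = FIGURE_CAPTION.sub("", text)
    text = CITATION.sub("", text)
    return text


sample = (
    "Wash your hands before the procedure (Smith, 2019).\n"
    "FIGURE 13.5 SURGICAL DISASTERS!\n"
    "Document everything (see above).\n"
)
print(strip_noise(sample))
```

Requiring a year inside the parentheses is a deliberately cautious choice: it keeps ordinary asides like "(see above)" in the text, at the cost of missing citations that don't include a year.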

[–] Sterile_Technique@lemmy.world 4 points 3 months ago (1 children)

hmmm "famously frustrating", presumably to people who know what they're doing, very likely translates to "WAY outside of my skill level". Worth some digging though, especially now that I have a keyword! Thank you!!

[–] BearOfaTime@lemm.ee 1 points 3 months ago (1 children)

There are regex tutorials online, and you can test your regex there.

I'd say, since you're learning, this could be an opportunity that may be useful later.

Just start with one relatively simple thing, like maybe copyright stuff. Work on getting regex to match that properly throughout a doc, and enjoy the improvement. Then when ready, tackle the next thing.
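As a rough illustration of that "one simple thing first" approach, the Python sketch below handles only copyright lines. It assumes the notices sit on their own line and start with "Copyright" or the © sign, which may not match every textbook; the helper names and the sample publisher are invented for the example.

```python
import re

# Assumption: a copyright notice is a whole line that begins with
# "Copyright" (any capitalization) or the © sign.
COPYRIGHT_LINE = re.compile(
    r"^[ \t]*(?:copyright\b|©).*$\n?", re.IGNORECASE | re.MULTILINE
)


def count_copyright_lines(text: str) -> int:
    """Count matches first, to check the pattern before deleting anything."""
    return len(COPYRIGHT_LINE.findall(text))


def drop_copyright_lines(text: str) -> str:
    """Return the text with every matching copyright line removed."""
    return COPYRIGHT_LINE.sub("", text)


sample = (
    "Assess the airway first.\n"
    "Copyright © 2024 Example Publisher. All rights reserved.\n"
    "Then check breathing.\n"
)
print(count_copyright_lines(sample))
print(drop_copyright_lines(sample))
```

Counting the matches before removing them is a cheap sanity check: if the count looks far too high or too low for the chapter, the pattern needs another look.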

[–] Sterile_Technique@lemmy.world 1 points 3 months ago* (last edited 3 months ago)

I wish I had asked this sooner. I don't really know any code at all, but this might be the thing that pushes me to learn some. This looks crazy useful. Time is the enemy right now though - I've only got a few free evenings left before class starts, and I don't trust that I'd know it well enough not to shoot myself in the foot.

When the next break rolls around though, I think regex will be my project. Any foundation you'd recommend learning first? From the bit of searching I've done, regex seems to feed straight into conversations about Python or Java - I don't know any of that. Would it even make sense to try to learn regex without first knowing the basics of a coding language?

 

I did manage to fine-tune MS Word's find and replace commands... I've got a list of 10 or so find-and-replace searches that do close-enough-for-now to what I want them to do.
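For anyone wanting the "save the commands and run them all at once" workflow outside of Word, a saved rule list could look roughly like the Python sketch below. The two rules are placeholders rather than the actual Word searches, and the one-plain-text-file-per-chapter layout, the folder arguments, and the function names are all assumptions.

```python
import re
from pathlib import Path

# Placeholder rules: (pattern, replacement) pairs applied in order.
RULES = [
    (r"^FIGURE \d+\.\d+\b.*$\n?", ""),  # whole-line figure captions
    (r"[ \t]*\([^()]*\)", ""),          # parentheses and the text inside them
]
COMPILED = [(re.compile(pattern, re.MULTILINE), repl) for pattern, repl in RULES]


def apply_rules(text: str) -> str:
    """Run every saved rule over the text, one after another."""
    for pattern, replacement in COMPILED:
        text = pattern.sub(replacement, text)
    return text


def clean_folder(source_dir: str, output_dir: str) -> list[Path]:
    """Clean every .txt file in source_dir and write the results to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        cleaned = apply_rules(path.read_text(encoding="utf-8"))
        target = out / path.name
        target.write_text(cleaned, encoding="utf-8")
        written.append(target)
    return written
```

Writing the cleaned copies to a separate folder leaves the originals untouched, so a rule that turns out to be too aggressive can be fixed and the whole batch rerun.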

[–] theneverfox@pawb.social 2 points 3 months ago (1 children)

If you want to go down the regex route, I highly recommend Melody

You can click on one of the playground examples on the page, then modify it to your needs. It will spit out regex for you, but it's much more readable and has a smoother learning curve

[–] Sterile_Technique@lemmy.world 1 points 3 months ago (1 children)

Bookmarked! Yeah whichever option is the absolute most idiot-proof is the one I might stand a chance with. I'm surface-level computer literate, but never got into any coding... other than MySpace back when it was HTML and some Excel formula shenanigans.

We're days away from the next semester of nursing school kicking off, so I don't think diving into full regex would be feasible for me at this point. I'm tinkering with MS Word find/replace wildcards and have a nice list of commands going. Prof already assigned us 8 chapters of reading to be completed BY day 1, and I just stacked em all up in one doc and obliterated an insane amount of nonsense in like 30 mins (not including the several days of cobbling the commands together, but they're on standby for instant use now!).

I don't think it'll be any problem to hit entire textbooks from here out, so hell yeah!

Anyway, MS Word wildcards seem like a good start to dip my feet in now. Once the semester is done and I have some free time again, a real dive into regex is the next project!

[–] theneverfox@pawb.social 2 points 3 months ago

In that case, if wildcards aren't enough I'd use an LLM - ChatGPT or Llama can handle simple regex, and you can just try it out and see if it worked right

Honestly, as a programmer, I'd advise you to learn Python or JavaScript before diving into regex. If you could mess with HTML without guidance, you've passed the big gap that separates people who can code from those who can't: your eyes didn't glaze over when you looked at something you didn't understand. Writing a script to do custom string replacements isn't hard. It's less efficient, but it'll stick with you in a way that regex won't

I use regex when I need speed, but it's a very powerful one trick pony - the problem is it's extremely dense

You can write code that follows your thoughts, you write regex that matches your intentions
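A regex-free script of the kind described above might look something like this Python sketch. The list of noise prefixes is a guess at what the textbook contains, and the function names are invented; unlike a year-based citation pattern, this version removes everything in parentheses.

```python
# Assumption: lines starting with any of these are noise, not lesson content.
NOISE_PREFIXES = ("FIGURE ", "TABLE ", "Copyright")


def remove_parentheticals(line: str) -> str:
    """Drop parentheses and everything inside them, character by character."""
    kept = []
    depth = 0  # how many "(" are currently open
    for ch in line:
        if ch == "(":
            # Also drop the space right before the opening parenthesis.
            if depth == 0 and kept and kept[-1] == " ":
                kept.pop()
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def clean(text: str) -> str:
    """Skip noise lines, then strip parentheticals from the remaining lines."""
    lines = []
    for line in text.splitlines():
        if line.strip().startswith(NOISE_PREFIXES):
            continue
        lines.append(remove_parentheticals(line))
    return "\n".join(lines)


sample = (
    "Check vitals (Jones, 2020) hourly.\n"
    "FIGURE 13.6 YOU GOT SUMMONED TO COURT!\n"
    "Chart it (Lee, 2021)."
)
print(clean(sample))
```

One caveat of this approach: an opening parenthesis that is never closed will swallow the rest of its line, so it is worth spot-checking the output on a real chapter.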

[–] otter@lemmy.dbzer0.com -1 points 3 months ago

IMHO, this is one of the applications wherein "AI" in its current form can really shine. Even the low monthly cost of GPT could be worth it if only to be able to train your own bot on specifics for your own use-case. Hell, there might even be one already made that's close enough? If you'd like me to give a quick look, LMK. 🖖🏽

[–] mozz@mbin.grits.dev -4 points 3 months ago

Copy paste into an LLM, piece by piece; ask it to keep the text the same but strip out the crap you don’t need. Give it specific examples of what things you want it to strip out. Then feed that output to the text-to-speech.

My recommendations in order of which to try first would be:

  • ChatGPT free tier
  • Claude.ai $20/mo tier
  • ChatGPT $20/mo tier

Re-give the instructions with every page; it won’t remember what to do. Honestly you probably want to start a new chat for each page / 2 pages / 5 pages or whatever. With simple instructions like this it can handle quite a lot of text at once so I wouldn’t be shy about how much you paste unless you start to hit limits or something.
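If the copy-pasting gets tedious, the chunk-and-repeat-the-instructions routine can be prepared ahead of time. The Python sketch below only builds the text to paste and does not talk to any LLM service; the instruction wording, the 8000-character limit, and the function names are all assumptions to adjust.

```python
# Hypothetical instruction text; reword it with your own examples.
INSTRUCTIONS = (
    "Keep the text exactly the same, but remove figure captions, "
    "in-text citations, and copyright notices."
)


def split_into_chunks(text: str, max_chars: int = 8000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompts(text: str, max_chars: int = 8000) -> list[str]:
    """Prefix every chunk with the instructions, ready to paste into a new chat."""
    return [INSTRUCTIONS + "\n\n" + chunk for chunk in split_into_chunks(text, max_chars)]
```

Splitting on blank lines keeps paragraphs whole, so no sentence gets cut in half between two chats.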

Good luck, hope it helps