this post was submitted on 14 Dec 2023
14 points (100.0% liked)
LocalLLaMA
2237 readers
1 users here now
Community to discuss about LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Sure! You’ll probably want to look at train-text-from-scratch in the llama.cpp project, it runs on pure CPU. The (admittedly little docs) should help, otherwise ChatGPT is a good help if you show it the code. NanoGPT is fine too.
For dataset, maybe you could train on French Wikipedia, or scrape from a French story site or fan fiction or whatever. Wikipedia is probably easiest, since they provide downloadable offline versions that are only a couple gigs.