As AI (specifically Large Language Models) blew up thanks to the popularity of ChatGPT, I started to take a deeper look at it. At first sight, it seems like many products launched lately are just "stuff + OpenAI", which is not really a bad thing, but it makes me wonder: will OpenAI monopolize the AI field?
Then a project called llama.cpp came out, claiming to be able to run Facebook's LLM model using only the CPU. A quick look into the repository documentation blew my mind. As a believer in the ability to self-host applications, keeping privacy in check in the unknown world of Artificial Intelligence seems beneficial.
I dug through the repository, read some issues, and played around with it, and it runs really well, both on my M1 Pro MacBook and my Orange Pi 5. The accuracy is not the highest, but the speed is usable for a simple local "ChatGPT"-like application. From this experience, I think we could start considering building stuff on top of it more seriously.
Luckily, the llama.cpp project not only proved this possible, but also encouraged many other open source projects to build their own LLM models. One of them is GPT4All, which claims decent performance and accuracy compared to ChatGPT with the GPT-3.5 model. This project also has an official NodeJS binding, which is great since I'm quite confident developing tools and apps using NodeJS.
Idea: Personal Email Summarizer
A simple idea for an LLM is a summarizer. I'm using Spark (a really great mail client, by the way; you should definitely check it out), and it already does a great job of filtering out low priority email. Low priority email includes newsletters, which I'm sometimes curious about and want to learn more from.
With that said, I started tinkering with the GPT4All binding, mixing and matching it with other self-hosted tools that make sense for this use case. Then I remembered a great workflow tool called n8n, which is easily hosted and extended using NodeJS. n8n is basically a workflow tool that lets users define a workflow across many 3rd party integrations like SMTP, Gmail, Telegram, etc. Considering the LLM's capability, the main missing piece is access to data, and n8n provides exactly that bridge.
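At its core, the custom summarizer node just has to turn an incoming email into a prompt for the GPT4All binding. A minimal sketch of that piece (the function name and the email field names here are my own assumptions for illustration, not n8n's exact schema):

```javascript
// Hypothetical helper: turn an email payload into a summarization prompt.
// The fields (subject, from, text) mirror what an email trigger typically
// emits, but are assumptions, not a specific n8n node's schema.
function buildSummaryPrompt(email, maxChars = 4000) {
  // Truncate long newsletters so the prompt fits the model's context window.
  const body = email.text.slice(0, maxChars);
  return [
    "Summarize the following email in 3 short bullet points.",
    `Subject: ${email.subject}`,
    `From: ${email.from}`,
    "",
    body,
  ].join("\n");
}

const prompt = buildSummaryPrompt({
  subject: "Weekly JS Newsletter",
  from: "news@example.com",
  text: "This week: new releases, tips, and more...",
});
```

The resulting string is what gets handed to the model; everything else in the node is plumbing.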
Designing the Workflow
On n8n, I started by creating a workflow that listens to my Gmail inbox over IMAP, hooks it into my custom GPT4All node to do the AI summary, and then pushes the result to Telegram. All of this runs on the Orange Pi 5 behind Tailscale, so I can access it from all my devices and keep it secure.
At first it worked as expected: all emails got summarized and pushed to my Telegram bot. But then some of the jobs got stuck. It turns out a single event from the mail trigger can contain multiple emails, and n8n, by nature, runs all of them in parallel. Given the Orange Pi 5's compute resources (4 performance cores, 4 efficiency cores), the jobs easily piled up and took a long time to complete.

Based on my needs, I don't need a summary the moment each email arrives. So I updated my workflow to break the events into serial jobs in a queue, using a simple built-in n8n node called Split In Batches. I configured it to forward only one message per batch, process it, then send it to Telegram. With that, I had my own AI Email Summarizer.
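The effect of Split In Batches with a batch size of 1 is essentially the following sketch, where `summarize` is just a stub standing in for the GPT4All-backed node:

```javascript
// Sketch of what "Split In Batches" with size 1 achieves: instead of
// summarizing every email concurrently (which floods a small board like
// the Orange Pi 5), each email is processed one at a time.
async function summarizeSerially(emails, summarize) {
  const results = [];
  for (const email of emails) {
    // Awaiting inside the loop means the next email only starts
    // once the previous summary has finished.
    results.push(await summarize(email));
  }
  return results;
}

// Stub standing in for the actual GPT4All-backed summarizer node.
const fakeSummarize = async (email) => `summary of: ${email.subject}`;

summarizeSerially(
  [{ subject: "Newsletter #1" }, { subject: "Newsletter #2" }],
  fakeSummarize
).then((summaries) => console.log(summaries));
```

Running the jobs one at a time trades latency for predictability, which is the right trade on a machine with this little headroom.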
Conclusion
I personally believe that AI can help humanity going forward, but I also believe we need to learn from our past about data privacy. Giants like Google, Microsoft, Meta, and now OpenAI have all the compute resources it takes to build this awesome AI stuff, but we can't simply hand all our data over to them and let them monopolize this area. These AI models will be a great additional tool alongside other existing tools, and democratizing them (by letting anyone host their own AI with their own data) is the right way to move AI forward.