![](https://res.cloudinary.com/read-cv/image/upload/c_limit,h_2048,w_2048/v1/1/pages/fraozGmnpng3dN94Irr4Yu39HAD3/dfpcuVnoj4bKAVAxOqr5/c395467d-da64-445a-b5d6-85a89e191fa4.jpg?_a=DATAdtfiZAA0)
⏱️ So the 48-hour countdown has begun.
First, the plan for creating Persóna was the following:
- Create a web app (home page, chat UI, etc.).
- Clone the voice using ElevenLabs.
- Explore solutions for creating embeddings for GPT.
- Combine all of this into the product.
After constructing the plan, I began building the web app and the chat UI, thinking about the best ways for a user to interact with the app.
What technologies am I going to use?
Scrolling through Twitter, YouTube, and Hacker News, I keep a list of interesting technologies that I'd like to study or use in the future, and for this project I decided to use some of them.
So, the tech stack is the following:
- Next.js (pages)
- TypeScript
- Tailwind
- Prisma (ORM)
- tRPC (nice DX for APIs, client/server logic)
- Vercel
- PlanetScale (database)
- Clerk (authentication, login), since I also wanted to add auth to the app
- GPT-3.5-Turbo, OpenAI's model, as the LLM for chatting
To kickstart the project I used [create-t3-app](https://create.t3.gg) as a template, which comes with tRPC, Prisma, Tailwind, TypeScript, and Next.js already configured. I used [shadcn/ui](https://ui.shadcn.com) as my design system / UI library.
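Since the chat runs on tRPC with GPT-3.5-Turbo behind it, here is a minimal sketch of what a chat procedure could look like in a t3 project. This is an assumption for illustration, not Persóna's actual code: the `createTRPCRouter`/`protectedProcedure` helpers come from the create-t3-app scaffold (names vary slightly between versions), the `openai` v4 client is just one way to call the model, and the `ctx.prisma`/`ctx.auth` context fields and the `message` model are hypothetical.

```ts
// src/server/api/routers/chat.ts -- a minimal sketch, not Persóna's real implementation
import { z } from "zod";
import OpenAI from "openai";
import { createTRPCRouter, protectedProcedure } from "~/server/api/trpc";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const chatRouter = createTRPCRouter({
  send: protectedProcedure
    .input(z.object({ message: z.string().min(1) }))
    .mutation(async ({ ctx, input }) => {
      // Ask GPT-3.5-Turbo for a reply; a system prompt would set up the persona.
      const completion = await openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [
          { role: "system", content: "You are Persóna, answering as Arman." },
          { role: "user", content: input.message },
        ],
      });

      const reply = completion.choices[0]?.message.content ?? "";

      // Persist the exchange with Prisma (hypothetical `message` model;
      // ctx.prisma / ctx.auth come from the t3 + Clerk context, names assumed).
      await ctx.prisma.message.create({
        data: { userId: ctx.auth.userId, question: input.message, answer: reply },
      });

      return { reply };
    }),
});
```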
I wanted the chat interface and interactions to feel like a native app, so I used viewport units like svh, dvh, and lvh to make the app nice to use on mobile. To give the messages some motion I added transitions on send and receive, and to make the interaction a little more fun, a "tick" sound plays whenever the user sends or receives a message. The app also scrolls the view on each new message, so the user always sees the latest ones.
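To illustrate those details, here is a rough React + Tailwind sketch of how the dvh-based layout, the tick sound, and the scroll-to-latest behavior could fit together. The component shape, the `/tick.mp3` asset, and the `Message` type are assumptions made for the example, not the actual Persóna code.

```tsx
// A minimal sketch of the chat view behaviors described above (names and assets assumed).
import { useEffect, useRef } from "react";

type Message = { id: string; author: "user" | "persona"; text: string };

export function ChatView({ messages }: { messages: Message[] }) {
  const bottomRef = useRef<HTMLDivElement>(null);
  const tickRef = useRef<HTMLAudioElement | null>(null);

  useEffect(() => {
    if (messages.length === 0) return;
    // Play a short "tick" whenever a message is sent or received,
    // and keep the newest message in view.
    tickRef.current ??= new Audio("/tick.mp3"); // hypothetical asset path
    void tickRef.current.play().catch(() => undefined); // autoplay may be blocked
    bottomRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages.length]);

  return (
    // h-[100dvh] sizes the chat to the *dynamic* viewport on mobile,
    // so the browser chrome showing/hiding doesn't cut off the input.
    <div className="flex h-[100dvh] flex-col">
      <div className="flex-1 overflow-y-auto">
        <ul>
          {messages.map((m) => (
            // transition classes give send/receive a small animation
            <li key={m.id} className="transition-all duration-200">
              {m.text}
            </li>
          ))}
        </ul>
        {/* invisible anchor used to scroll to the latest message */}
        <div ref={bottomRef} />
      </div>
    </div>
  );
}
```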
Now let's talk about the voice cloning part.
I found out about ElevenLabs on Twitter and decided to clone Arman's voice there. Unfortunately, I couldn't find any of Arman's speeches in English (voice cloning only works in English) to use for cloning, but when I asked Arman about it, it turned out that he had some recorded videos from his past podcasts/talks, and he also shared his recordings with me (so I could use them as embeddings for the LLM).
I cut out the right parts of the video to clone the voice and got impressive results on the first try, so good that I even thought about calling it a day, starting to integrate the voice into the app, and skipping the embeddings. But I decided there was still plenty of time and I could still get through everything in the plan.
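For the integration step, the cloned voice gets turned into audio through ElevenLabs' text-to-speech API. The sketch below shows one way that call could look from the app's server side; the voice id, the helper name, and the chosen voice settings are assumptions for illustration, not Persóna's actual code.

```ts
// A minimal sketch: fetch speech for a chat reply from ElevenLabs (request shape assumed).
export async function speakReply(text: string): Promise<ArrayBuffer> {
  const voiceId = process.env.ELEVENLABS_VOICE_ID!; // id of the cloned voice (placeholder)

  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({
      text,
      model_id: "eleven_monolingual_v1", // English-only model, matching the cloned voice
      voice_settings: { stability: 0.5, similarity_boost: 0.75 }, // example values
    }),
  });

  if (!res.ok) throw new Error(`ElevenLabs TTS failed: ${res.status}`);
  // The endpoint returns MP3 audio bytes, which the chat UI can then play back.
  return res.arrayBuffer();
}
```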
to be continued...