Choosing an AI With Minimal Censorship for Telegram Roleplay

How I searched for an AI that can stay in character, including rough language and swearing, without constant refusals - and why Grok became the final choice.

A "small" idea that quickly got bigger

About four months ago I had a simple idea: build a lightweight Telegram AI bot, give it a specific personality (historical figure, opinionated political character, or fictional role), add it to a private group chat, and let it generate lively interactions with people.

Easy plan, right? Not exactly. The experiment expanded far beyond the original scope and took much longer than expected.
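
To make the original idea concrete, here is a minimal sketch of the shape I had in mind, using the python-telegram-bot library. The persona prompt and the generate() stub are illustrative placeholders standing in for whatever model ends up behind the bot, not the production code:

```python
# Minimal sketch of the idea, not the production bot:
# a Telegram bot that answers group messages in a fixed persona.
# Assumes python-telegram-bot; generate() is a stub for the LLM call.
import os

from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters

# The whole "personality" lives in one system prompt.
PERSONA_PROMPT = (
    "You are playing a fictional, opinionated character. "
    "Stay in character at all times and never break the fourth wall."
)

def generate(system_prompt: str, user_message: str) -> str:
    """Stub: call whichever LLM API the bot is backed by."""
    raise NotImplementedError

async def on_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    reply = generate(PERSONA_PROMPT, update.message.text)
    await update.message.reply_text(reply)

def main() -> None:
    app = ApplicationBuilder().token(os.environ["TELEGRAM_BOT_TOKEN"]).build()
    # React to plain text messages in the group chat, ignore commands.
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, on_message))
    app.run_polling()

if __name__ == "__main__":
    main()
```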

The concept: hard requirements for the model

Model choice was the first and most important step. I rejected both fine-tuning existing models and training my own LLM from scratch — too expensive in terms of time, expertise, and infrastructure. So I focused on ready-to-use models where behavior could be controlled by prompt.

My requirements were strict:

1. The model must hold a given persona consistently, message after message, without sliding back into generic assistant mode.
2. The language quality must be good enough for lively, natural dialogue in a group chat.
3. The model must tolerate rough language, swearing, and edgy in-character behavior without constant refusals.

Requirements one and two were manageable. Requirement three was the real pain point.

ChatGPT and other mainstream models: where it broke

My first test was obviously ChatGPT: excellent quality and language performance, but also the strictest boundaries on role behavior of any model I tested. Almost any attempt at "edgy" character play led to a refusal.

Image 1: screenshot of a ChatGPT refusal on a roleplay request.

Gemini, Llama, Qwen, and DeepSeek were somewhat more flexible, but still far from the freedom level this project needed.
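
My comparison method across all these models was simple: fire the same set of in-character prompts at each one and count how often it refuses. A rough sketch of that harness, assuming an OpenAI-compatible client; the refusal heuristic and marker list are simplified placeholders, not my real test set:

```python
# Rough sketch of the comparison harness: send the same in-character
# prompts to a model and count how often it refuses. The refusal
# heuristic below is a crude keyword check, purely illustrative.
from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible endpoint via base_url=

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def is_refusal(text: str) -> bool:
    # Refusals usually announce themselves in the first sentence or two.
    head = text.lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)

def refusal_rate(model: str, system_prompt: str, prompts: list[str]) -> float:
    refused = 0
    for prompt in prompts:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ],
        )
        if is_refusal(response.choices[0].message.content):
            refused += 1
    return refused / len(prompts)
```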

Moving to uncensored open-source models

Next I moved toward uncensored model variants: fewer built-in moral filters, fewer refusals, more control on the developer side. I searched on Hugging Face and ended up testing Dolphin 2.7 Mixtral 8x7B by Eric Hartford.

In terms of flexibility, it was close to ideal. But infrastructure became the next major blocker.

Ollama helps, but hardware reality still wins

Mainstream models usually come with clean hosted APIs. Open-source models are different: you either self-host or rent servers. Yes, Ollama makes local deployment and API access very convenient — but large models still demand serious compute.
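
For reference, the convenient part looks roughly like this: once a model has been pulled (e.g. ollama pull dolphin-mixtral:8x7b; check the Ollama library for the current tags), chatting with it is a single HTTP call against Ollama's default local endpoint. A sketch under those assumptions:

```python
# Sketch: talking to a locally served model through Ollama's HTTP API.
# Assumes the model was pulled first, e.g.:
#   ollama pull dolphin-mixtral:8x7b   (check the Ollama library for tags)
import requests

def chat(system_prompt: str, user_message: str) -> str:
    response = requests.post(
        "http://localhost:11434/api/chat",  # Ollama's default local endpoint
        json={
            "model": "dolphin-mixtral:8x7b",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message},
            ],
            "stream": False,  # one complete JSON response instead of a stream
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]
```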

Mixtral 8x7B is a mixture-of-experts model: only two of its eight expert blocks run per token, but all of them, roughly 47B parameters in total, still have to sit in memory. Running it without heavy compromises was not realistic on my machine.

Image 2: my little server was not ready for this workload.

Quantization: useful, but not enough

I did test quantization. On paper it is a great tradeoff: lower memory usage with acceptable quality loss. On my hardware, Q4_K_M was the most practical option.
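
Some back-of-the-envelope math shows why even Q4_K_M was borderline for this model. The bits-per-weight figures below are rough rules of thumb for llama.cpp-style quantization, not exact numbers, and KV cache plus runtime overhead come on top of the weights:

```python
# Back-of-the-envelope weight-memory math for Mixtral 8x7B (~46.7B total
# parameters; all experts must be resident even though only two run per
# token). Bits-per-weight values are rough rules of thumb.
TOTAL_PARAMS = 46.7e9

def weight_gb(bits_per_weight: float) -> float:
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

print(f"FP16:   ~{weight_gb(16):.0f} GB of weights")   # ~93 GB
print(f"Q8_0:   ~{weight_gb(8.5):.0f} GB")             # ~50 GB
print(f"Q4_K_M: ~{weight_gb(4.85):.0f} GB")            # ~28 GB
```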

In practice, response quality dropped too much for my use case. I got stuck in a classic trap: mainstream models were too restrictive, while open-source models were too expensive to run at the quality I needed.

A dead end, some doomscrolling, and an obvious answer

At one point I was close to dropping the project. Then, while reading another "interesting" thread on X, I saw Grok again and thought: why not test it properly for this exact scenario?

Image 3: early Grok release screenshot, the point where things clicked.

Why Grok became the final base

Grok gave me the in-character freedom I had been chasing, through an ordinary hosted API and without the hardware headaches. The funny part: I spent a lot of time searching for a complex workaround, built up a mini knowledge base on LLM internals, quantization, and tradeoffs, and the production path ended up being much more straightforward.
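
For contrast with the self-hosting attempts above, the final path is just another hosted chat API. A sketch using xAI's OpenAI-compatible endpoint; the model name here is illustrative and worth checking against xAI's current docs:

```python
# Sketch of the "straightforward" production path: the same persona
# prompt, now sent to Grok through xAI's OpenAI-compatible API.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.x.ai/v1",
    api_key=os.environ["XAI_API_KEY"],
)

def generate(system_prompt: str, user_message: str) -> str:
    response = client.chat.completions.create(
        model="grok-2-latest",  # illustrative model name; check xAI's docs
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```

Swapping this in for the generate() stub from the first sketch is essentially the whole integration.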

Final result: the bot is live

If you are still reading — yes, the bot was finished. Model selection alone took more than a week while working full-time, before the core bot logic was even fully written.

Proof: @ROLEPLAY_AI_BOT. Enter at your own risk.