The Origins of Kim
From the first idea to a productive AI of our own, it was a long journey. Kim did not emerge from simply connecting OFORK to external services, but from the clear goal of building an independent, technically robust and credible AI solution for OFORK.
Why We Developed Kim in the First Place
The idea behind Kim was never to connect OFORK to an external AI service and present it as our own solution. Our approach was different from the beginning: we wanted an AI that truly fits OFORK, is technically understandable, and is built on a foundation that we ourselves can understand, develop further and take responsibility for.
This exact standard made the path longer, more demanding and technically more complex. But that is also why Kim is now more than just a showcase for someone else’s services.
What Mattered to Us
- building an independent AI for OFORK
- not marketing a mere interface to large third-party providers
- keeping control over technology, training and further development
- creating a solution that is actually useful in everyday support work
- taking data security and credibility seriously
The Search for the Right Foundation
At the beginning, the key question was which base model would be suitable. With OpenHermes-2.5-Mistral-7B we found an initial technical foundation. It quickly became clear, however, that base models in this form are primarily intended for GPU environments.
Our goal, though, was a solution that also makes sense for customers who do not operate a large GPU infrastructure. That made one thing clear very early on: we needed a path that is both performant and flexible in deployment.
The First Stable CPU Tests
- Kim ran stably on a CPU server for the first time – still slow, but functional.
- However, this basis was not yet sufficient for productive use.
- Parallel processing, training effort and higher performance requirements made a move to GPU servers necessary.
- An important factor was a server location in Europe, in line with our requirements for security and availability.
- With our long-standing provider, we ultimately found a suitable solution.
The Right Hardware
Productive GPU Server:
- AMD EPYC™ 7313P
- Zen 3 (Milan)
- 16 C / 32 T
- 3.0–3.7 GHz
- 128 GB DDR4 ECC
- 960 GB NVMe SSD (2 × 960 GB, hardware RAID 1)
- NVIDIA® A10 GPU
The purchase price for such a server is around €7,000 to €15,000. On this foundation, Kim runs fast and can process many parallel requests.
CPU Operation with Smaller Hardware
Test Server with 32 GB:
- IX6-32 NVMe
- Intel® Xeon® E-2356G
- Rocket Lake
- 6 C / 12 T | 3.2–5.0 GHz
- 32 GB DDR4 ECC
- 512 GB NVMe SSD (2 × 512 GB, software RAID 1)
Kim also runs quickly on this server, although parallel requests are possible only to a limited extent.
Training Began
- the actual training of Kim began
- many training parameters proved to be interdependent
- time-consuming tests and constant fine-tuning were required
First Real Signs of Learning
- for the first time, Kim showed behavior that had actually been learned
- the results were encouraging, but not yet satisfactory
- datasets, training parameters and prompts had to work together precisely
- prompts became an important building block for quality
Refinement and Clarity
- our understanding of AI training had grown significantly in the months before
- the refinement of our own AI began
- at the same time, it became increasingly clear to us how many vendors in the market simply relabel third-party AI services
- that is exactly why our focus on independence remained central
Technical Facts About Kim
- Our base model is called Llama-3.1-8B-Instruct.
- The model has 8 billion parameters.
- It answers questions, supports spell checking and works in multiple languages.
- Our trained model is called Llama-OFORK.
- The CPU version for servers with at least 32 GB is called Llama-OFORK.Q4_K_M.gguf.
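The Q4_K_M suffix refers to a 4-bit quantization type from the llama.cpp GGUF family, which is what makes CPU operation on 32 GB machines plausible. As a rough illustration (the ~4.85 bits per weight is a commonly cited average for Q4_K_M, not a measured value for Llama-OFORK):

```python
# Back-of-envelope memory estimate for a quantized 8B-parameter model.
# The bits-per-weight figure for Q4_K_M is an approximate published
# average for llama.cpp quantization types, used here for illustration.
PARAMS = 8e9  # Llama-3.1-8B-Instruct has roughly 8 billion parameters

def model_size_gb(bits_per_weight: float) -> float:
    """Approximate in-RAM size of the model weights in GB."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp16 = model_size_gb(16)      # unquantized half precision: ~16 GB
q4_k_m = model_size_gb(4.85)  # Q4_K_M quantized: ~4.9 GB

print(f"FP16:   ~{fp16:.1f} GB")
print(f"Q4_K_M: ~{q4_k_m:.1f} GB")
```

The unquantized weights alone would already consume about half of a 32 GB server, before any room for the KV cache, Qdrant, and OFORK itself; the quantized file leaves comfortable headroom.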
Qdrant and Infrastructure
- In addition, we use Qdrant as a RAG component.
- This improves Kim’s answers and supports ticket search.
- Qdrant also searches attachments and runs locally on the same server where OFORK is installed.
- Questions in the Kim chat and spell-check requests are sent to our GPU server.
- Kim runs there productively – without permanent storage of the content.
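The retrieval step behind this setup can be sketched in a few lines: the question and the stored ticket texts are compared as embedding vectors, and the closest matches are handed to the model as context. The tiny 3-dimensional vectors and ticket titles below are invented stand-ins for real embeddings; Qdrant performs this nearest-neighbour search at scale and, as noted, also over attachments.

```python
import math

# Minimal sketch of the RAG retrieval principle: find the stored
# ticket whose embedding is closest (by cosine similarity) to the
# embedding of the incoming question.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

tickets = {
    "Ticket#1001: printer offline":  [0.9, 0.1, 0.0],
    "Ticket#1002: password reset":   [0.1, 0.9, 0.1],
    "Ticket#1003: invoice question": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the question "printer does not respond".
query = [0.85, 0.15, 0.05]

best = max(tickets, key=lambda t: cosine(query, tickets[t]))
print(best)  # the most relevant ticket becomes context for the answer
```

In production, the same idea runs against Qdrant's vector index on the local OFORK server instead of a Python dictionary, and only the generated answer comes back from the GPU server.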
Kim Was Built as an Independent Solution by Conviction
The journey was technically demanding and much more complex than simply connecting to external AI services. That is exactly why Kim now stands for something that matters to us: independence, transparency and a solution that truly fits OFORK.