<RAG> 1. Tuning LLM Using Langchain + LM Studio

Friday, May 2, 2025 | 1 minute read | Updated at Friday, May 2, 2025

Jun Yeop(Johnny) Na

I’m working on a story evaluation AI Agent. I’m going to tune the model using Langchain with a local LLM using LM Studio

1. Setting up LM Studio

Tool for downloading and running local LLM model that fits your local environment.

Download: https://lmstudio.ai/
Install and we’ll be asked to download Qwen B7 for onboarding.

onboarding Qwen B7

After downloading, we can run the model

onboarding2

Run the model and we’ll se a chatbot environment pop up.

click on load model on the bottom right pop up screen.

onboarding3

Now we can send messages to check that the model works well

onboarding4

Doing some vibe check, it seems like it doesn’t understand Korean well. From now I’ll proceed using English in the prompt.

onboarding5

2. Testing API

When we deploy our model locally, we can see from the script that it uses port 1234 as default:

port

We can use the OpenAI API to get response from our string

from openai import OpenAI

from tqdm import tqdm
import pickle

client = OpenAI(
    api_key="API_KEY", # any random string since we're hosting our own model
    base_url="http://localhost:1234"
)

movies_char_responses = []
tvs_char_responses = []

client.chat.completions.create(
	model="deepseek-r1-distill-qwen-7b",
    messages="test",
    temperature=0.2,
    max_tokens=15000
)