Integrating Claude Code with LM Studio
Power vs. Limits: Running LLMs on the new RTX 5070 Mobile architecture.
"Claude Code can talk to LM Studio through its Anthropic-compatible POST /v1/messages endpoint. However, you will usually run into problems if the model's context length is not configured correctly: if the window is too small, the CLI's own prompt overhead will cause the agent to fail."
1 How to Fix the Context Length in LM Studio
- Navigate to Server: Open the "AI Chat" or "Local Server" tab (the double arrows or server icon on the left).
- Select Your Model: Choose your active model (e.g., openai/gpt-oss-20b) from the dropdown.
- Adjust Context Length: On the right-hand sidebar, find Context Length (or n_ctx). It is likely set to 2048 or 4096.
- Increase the Value: Change this to at least 16384 (16k) or 32768 (32k).
Note: Be careful—if you go too high (like 131k), you may run out of VRAM/RAM.
- Apply Changes: Click "Reload Model" at the top to activate the new limit.
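The VRAM warning above can be made concrete with a back-of-the-envelope KV-cache estimate. The model dimensions below (layer count, KV heads, head dimension, fp16 cache) are illustrative assumptions, not the real shape of any particular model:

```shell
# Rough KV-cache size for a given context length:
#   2 (K and V) x layers x kv_heads x head_dim x bytes_per_value x tokens
# All model dimensions here are illustrative assumptions.
ctx=32768       # context length you plan to set in LM Studio
layers=24
kv_heads=8
head_dim=64
bytes=2         # fp16 cache
kv_bytes=$(( 2 * layers * kv_heads * head_dim * bytes * ctx ))
echo "KV cache at ${ctx} tokens: $(( kv_bytes / 1024 / 1024 )) MiB"
```

The cost scales linearly with context, so doubling the window doubles this figure; that is why jumping straight to 131k can exhaust VRAM/RAM.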
2 Recommended Models
For coding use cases with Claude Code:
- gpt-oss (20B)
- devstral-small-2 (24B)
- qwen3-coder (30B)
- glm-4.7-flash (30B)
- glm-4.7:cloud
- minimax-m2.1:cloud
3 Network & Dedicated Machine Setup
You aren't limited to your local machine: you can run the LM Studio server on a dedicated machine and offload the processing. Simply point your terminal at the server's IP address:
export ANTHROPIC_BASE_URL=http://[SERVER_IP]:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio
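As a sketch of how to wire this into a launch script, you can default the two variables and fail fast if either ends up empty before starting `claude` (the variable names are the ones used above; the localhost default and the guard itself are assumptions about your workflow):

```shell
# Defaults assume LM Studio on this machine at its default port 1234;
# override ANTHROPIC_BASE_URL with your server's address for a LAN setup.
export ANTHROPIC_BASE_URL="${ANTHROPIC_BASE_URL:-http://localhost:1234}"
export ANTHROPIC_AUTH_TOKEN="${ANTHROPIC_AUTH_TOKEN:-lmstudio}"

# Abort with a clear message if either variable is somehow unset or empty.
: "${ANTHROPIC_BASE_URL:?point this at your LM Studio server}"
: "${ANTHROPIC_AUTH_TOKEN:?LM Studio accepts any non-empty token}"
echo "Claude Code will use ${ANTHROPIC_BASE_URL}"
```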
💡 The 25k Rule
Official docs suggest using a model and server settings with a context length of more than ~25k tokens. Tools like Claude Code can consume a lot of context very quickly.
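One way to encode this rule in a launch script (the 32768 value is just an example of what you might have configured; 25000 is the floor from the rule above):

```shell
ctx=32768     # the context length you set in LM Studio (example value)
min_ctx=25000 # the ~25k floor suggested for Claude Code
if [ "$ctx" -ge "$min_ctx" ]; then
  echo "context OK for Claude Code"
else
  echo "context too small (${ctx} < ${min_ctx}); raise it in LM Studio" >&2
fi
```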