AIFinitee

All things AI & LLM


Distributed Inference

Posted in AMD, Distributed Inference

ComfyUI on AMD Ryzen AI MAX: 96 GB Unified Memory vs 16 GB NVIDIA

At AIFinitee, we have spent months benchmarking LLM inference on our dual-node AMD Ryzen AI Max+ 395 cluster. We have measured tokens per second across MiniMax-M2 and Qwen3.5-397B. We have…
Posted in AMD, Distributed Inference

Speed vs. Smarts: When Bigger Models Win for Local AI Coding

At AIFinitee, we've spent months chasing tokens per second. Our two-node AMD Ryzen AI MAX cluster hits 17-20 tok/s with MiniMax-M2. Our Linux-vs-Windows benchmarks showed how your OS quietly taxes…
Posted in AMD, Distributed Inference

Linux vs Windows for LLM Inference: More Tokens/Second on Linux

Same GPU, different OS = different performance. See why Linux delivers 5-30% more tokens/sec than Windows across NVIDIA, AMD, and Intel GPUs.
Posted in AMD, Distributed Inference, MiniMax

AMD Ryzen AI MAX Two-Node Cluster Guide | LLM Setup (17-20 tok/s)

One of the beliefs we hold at AIFinitee is that LLMs will only get bigger. Bigger LLMs will need more memory, whether they run on cloud compute or local compute.…
Copyright 2026 AIFinitee. All rights reserved.