Slickdeals is community-supported.  We may get paid by brands for deals, including promoted items.
Dr.W posted May 06, 2026 04:48 PM

Lenovo Legion Pro 7i: 16" 2.5K 240Hz OLED, Intel Ultra 9 275HX, RTX 5090, 32GB DDR5, 1TB SSD $2999

$2,999

$3,999

25% off
B&H Photo Video
22 Comments 5,751 Views
Get Deal at B&H Photo Video
Deal Details
Includes 2 Free Items:
  • Bitdefender Total Security (Download, 5 Devices, 6 Months)
  • NVIDIA GeForce RTX 50 Series Pragmata Game Bundle
SPECS:
  • 2.7 GHz Intel Core Ultra 9 24-Core
  • 32GB of DDR5 RAM | 2TB M.2 SSD Storage
  • 16" 2560 x 1600 OLED 240 Hz Display
  • NVIDIA GeForce RTX 5090 (24GB GDDR7)
  • Thunderbolt 4 | HDMI 2.1 | USB-C | USB-A
  • 2.5 GbE | Wi-Fi 7 (802.11be) | BT 5.4
  • 5MP Webcam with Privacy Shutter
  • RGB Backlit Keyboard | Touchpad
  • Windows 11 Home
https://www.bhphotovideo.com/c/pr...ro_7i.html

Community Voting
Deal Score: +18

22 Comments

Elon69 posted May 06, 2026 04:59 PM
2,546 Posts · Joined Nov 2010
Decent deal. I got the same machine with 2TB and 64GB RAM for, I think, $3,200 or $3,300 at Micro Center from a deal posted here; I forget the exact price.
pixelmaster posted May 06, 2026 05:09 PM
577 Posts · Joined Nov 2007
Quote from Elon69 :
Decent deal. I got the same machine with 2TB and 64GB RAM for, I think, $3,200 or $3,300 at Micro Center from a deal posted here; I forget the exact price.
Are you running local AI, overnight processes comfortably?
I'm leaning toward a desktop setup but this seems to be a good deal and portability is nice to have.
dars.box posted May 06, 2026 05:13 PM
67 Posts · Joined Dec 2015
Quote from pixelmaster :
Are you running local AI, overnight processes comfortably?
I'm leaning toward a desktop setup but this seems to be a good deal and portability is nice to have.
An RTX 3090 gets you 24GB of VRAM for ~$1K, but you'd still need the rest of the PC components, which would be another ~$1K. So maybe $1K cheaper overall.
grampyjoe posted May 06, 2026 05:38 PM
515 Posts · Joined Apr 2016
Quote from Elon69 :
Decent deal. I got the same machine with 2TB and 64GB RAM for, I think, $3,200 or $3,300 at Micro Center from a deal posted here; I forget the exact price.
Why did you get 64GB of RAM? Games barely benefit from that much, if at all.
BlownCamaro posted May 06, 2026 06:25 PM
941 Posts · Joined Jan 2021
How does a laptop cool with a 5090 in it? Either it sounds like a jet engine, or it's getting throttled.
CyanWriter8569 posted May 06, 2026 06:52 PM
752 Posts · Joined Dec 2021
Quote from BlownCamaro :
How does a laptop cool with a 5090 in it? Either it sounds like a jet engine, or it's getting throttled.
They nerf the power and it's honestly not worth it at that price.
Justyourdad posted May 06, 2026 09:14 PM
173 Posts · Joined Jan 2024
God promised this laptop to me 3000 years ago


crazygideon posted May 06, 2026 09:17 PM
467 Posts · Joined Apr 2009
Quote from BlownCamaro :
How does a laptop cool with a 5090 in it? Either it sounds like a jet engine, or it's getting throttled.
It's a 5080 (desktop) die but with 24GB VRAM and a 150W power limit.
Elon69 posted May 06, 2026 10:02 PM
2,546 Posts · Joined Nov 2010
Quote from pixelmaster :
Are you running local AI, overnight processes comfortably? I'm leaning toward a desktop setup but this seems to be a good deal and portability is nice to have.
Yes. But I ended up getting a 5090 desktop as well... lol. I'll explain later.
Elon69 posted May 06, 2026 10:03 PM
2,546 Posts · Joined Nov 2010
Quote from grampyjoe :
Why did you get 64GB of RAM? Games barely benefit from that much, if at all.
It came with it, and I didn't get it for games. 64GB comes in very handy since I max out VRAM and system RAM often enough.
lip008 posted May 07, 2026 01:23 AM
914 Posts · Joined Mar 2009
Quote from Elon69 :
It came with it, and I didn't get it for games. 64GB comes in very handy since I max out VRAM and system RAM often enough.
IDK if he ever considered running a few VMs or anything with AI....
Elon69 posted May 07, 2026 02:32 AM
2,546 Posts · Joined Nov 2010

Our community has rated this post as helpful.

Quote from pixelmaster :
Are you running local AI, overnight processes comfortably?
I'm leaning toward a desktop setup but this seems to be a good deal and portability is nice to have.
More than you asked for but I'm sharing this because I ran into issues you might face when building a local AI stack. I should have researched deeper before buying, but I learned a lot through trial and error. My goal was to avoid renting GPUs or buying an RTX 6000, so I optimized for consumer hardware (RTX 5090 Laptop + Desktop 5090 + M5 Max lol).


too lazy to re-read again and fix some of the formatting.

1. The Core Problem: Resource Contention in my use-case
Running local LLMs, image generation, and video generation simultaneously on a single machine creates severe bottlenecks, even with 24GB or 32GB of VRAM.

The Conflict: When running ComfyUI (media) and LM Studio (LLMs) together, they fight for VRAM and GPU cycles. Neither app respects the other's resource limits. Tuning GPU slices is largely ineffective in this scenario.

The "Land Grab": One process will seize VRAM, starving the other. If VRAM fills up, you hit an Out of Memory (OOM) crash.

LM Studio: Smartly stops loading models if VRAM is insufficient. However, once a model is loaded, LM Studio rarely releases VRAM when ComfyUI starts, causing ComfyUI to starve.

ComfyUI: If it crashes due to OOM, it often hangs or pauses (especially if launched via default batch files), killing the workflow until manual intervention.
The Queue Killer: Both apps have internal queues. When running separately, this works well. When running together, they don't share queues. They each launch one generation, slowing the machine to a crawl or triggering OOM errors.
Performance Note: Even with a Desktop RTX 5090, heavy generation (30s–600s+) blocks the GPU entirely. Nothing else can run on the GPU during this time.

2. Hardware Constraints & Model Limits
A. Laptop RTX 5090 (24GB VRAM)

LLMs: Limited to ~18–20B parameter models.
Qwen 3.5 (9B): Runs well.
Gemma 4 (20B): Usable but unsatisfying for complex tasks.
Qwen 3.6 (27B/35B): Will not fit in VRAM. If forced into system RAM, it becomes painfully slow, so I removed these models to prevent accidental loading.
Images/Video: Works, but you must use the --lowvram flag in ComfyUI to prevent OOM. I used this for LTX video models. While newer low-VRAM workflows exist, this was a necessary workaround at the time.

Concurrency: Local LLMs are significantly slower than online models (~100 tokens/sec for 35B). You cannot run multiple simultaneous coding sessions or sub-agents. For real development, I rely on online APIs for concurrency.
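For a rough sense of why the 27B/35B models don't fit, here's a back-of-envelope VRAM estimate (a sketch, not a measurement: the flat 2GB overhead figure for KV cache and activations is an assumption, and real usage varies with context length):

```python
def vram_needed_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance for
    KV cache and activations (the overhead figure is an assumption)."""
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

# A 9B model at 4-bit fits easily in 24GB; a 35B model at 8-bit does not.
print(round(vram_needed_gb(9, 4), 1))    # 6.5 GB
print(round(vram_needed_gb(35, 8), 1))   # 37.0 GB - way over 24GB
print(round(vram_needed_gb(35, 4), 1))   # 19.5 GB - fits, but tight once context grows
```

This lines up with the post's experience: 35B is only borderline even at aggressive 4-bit quantization, and anything less aggressive spills into system RAM.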



B. Desktop RTX 5090 (32GB VRAM)

Better Capacity: Can handle larger models than the laptop, but still faces OOM risks during long video generations.

System RAM is Critical: I observed my desktop using all 32GB of system RAM. Upgrade to 64GB+ if you can (this will drive the price to ~$4000). If local LLMs overflow VRAM, they will bleed into system RAM. More system RAM prevents crashes during these overflow events.

Performance: I hit ~200 tokens/sec on a 35B Qwen model. While fast for a single stream, it lacks the concurrency of online APIs.



How I overcame OOM hangs and other problems:

3. My Solution: The "Watchdog" Strategy
To run unattended tasks overnight without babysitting the machine, I implemented a custom watchdog wrapper.

Priority Hierarchy: I prioritize Image/Video generation over LLM requests. Local media generation is significantly faster on NVIDIA chips than on my Mac.

The Watchdog Script Workflow:

Monitor: Watches memory and process status.

Kill Switch: If a ComfyUI request starts, the script instantly kills LM Studio to free up VRAM. I don't care about pending LLM requests; they're negligible now (mostly used for meta-prompting).

Auto-Recovery: If ComfyUI crashes due to OOM (which happens occasionally with long video clips >30s), the script detects it and automatically restarts ComfyUI. This allows for smoother, longer generations without manual stitching.

Restore: Once ComfyUI finishes, it restarts LM Studio to handle any pending LLM requests.
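The workflow above can be sketched as a tiny state machine (a simplified illustration of the priority logic only; the real script would also poll memory and process status, and the action strings here stand in for actual kill/restart commands):

```python
class Watchdog:
    """Sketch of the watchdog priority logic: media generation preempts
    the LLM server. Actions are returned as strings instead of actually
    touching processes, so the logic can be shown in isolation."""

    def __init__(self):
        self.lm_studio_running = True
        self.comfy_running = True

    def on_event(self, event: str) -> list[str]:
        actions = []
        if event == "comfy_request_started" and self.lm_studio_running:
            # Kill switch: free VRAM for ComfyUI, dropping pending LLM requests.
            actions.append("kill lm_studio")
            self.lm_studio_running = False
        elif event == "comfy_crashed":
            # Auto-recovery: restart ComfyUI after an OOM hang.
            actions.append("restart comfyui")
            self.comfy_running = True
        elif event == "comfy_finished" and not self.lm_studio_running:
            # Restore: bring the LLM server back for queued requests.
            actions.append("start lm_studio")
            self.lm_studio_running = True
        return actions

wd = Watchdog()
print(wd.on_event("comfy_request_started"))  # ['kill lm_studio']
print(wd.on_event("comfy_crashed"))          # ['restart comfyui']
print(wd.on_event("comfy_finished"))         # ['start lm_studio']
```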





4. Why I Don't Use My Mac for LLMs (well, I do use it, just not every time, and definitely not for normal coding sessions)

Speed: Even with 128GB of RAM, running large models like Qwen Coder Next (80B) or Qwen 27B is painfully slow (~20 tokens/sec). QCN was faster, I think ~60 tokens/sec?

Frustration: Watching a local LLM process prompts and generate tokens one by one is too slow for active coding.

Strategy: I use the M5 Max for general tasks and lighter LLM inference, but I rely on online models for:
Real-time coding assistance.
Running sub-agents (Hermes/OpenClaw).
High-concurrency tasks (5+ simultaneous requests).


someone mentioned something about VMs. lol
VMs: I run VMs on my Mac (not the RTX machines) because the Mac has more CPU cycles and RAM available. For example, I host Hermes in a VM on the Mac, which calls the local LM Studio on the Mac for a semi-local setup. However, truly "local" agents are useless without internet access for external tools. My other true 24/7 Hermes agent will use an online model; I already have a general subscription and can use it until it runs out.


5. Is Local AI Gear Worth It?

For Learning: Yes. It allows for rapid iteration and experimentation.
For Cost: I generated ~3,000 video clips locally. Doing this via online APIs would have cost $1,000–$1,500.

As online subsidies decrease and prices rise, local hardware is becoming cost-effective.
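Using the post's own numbers, a rough break-even calculation (the $3,000 hardware figure is illustrative, and this ignores electricity, time, and the machine's other uses):

```python
clips_generated = 3_000
online_low, online_high = 1_000, 1_500  # API cost range quoted above, in dollars

per_clip_low = online_low / clips_generated     # ~ $0.33 per clip
per_clip_high = online_high / clips_generated   # $0.50 per clip

# Clips needed to offset a hypothetical $3,000 of hardware at those rates:
breakeven_cheap = 3_000 / per_clip_high
breakeven_pricey = 3_000 / per_clip_low
print(round(breakeven_cheap), round(breakeven_pricey))  # 6000 9000
```

In other words, at 3,000 clips the hardware isn't fully paid off yet by API savings alone, but it's within a factor of two or three of break-even, which is why falling subsidies tip the math toward local.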

For Privacy/Uncensored Use: Not necessary for me, as I don't fear data theft from online APIs and there's no porn coming out of my RTX GPUs.

For Coding: No. Online APIs still win on speed and concurrency.

6. Final Recommendation
If you need portability: The Laptop 5090 is great for coding on the go, but the battery dies instantly under GPU load.

If you need power: The Desktop 5090 is a workhorse, but ensure you have 64GB+ of system RAM.

Don't splurge blindly: If you plan to do overnight processing or heavy experimentation, the Watchdog strategy and proper hardware configuration are essential. Without them, you will spend more time fixing crashes than generating content. Unless someone else has a better strategy, I just decided to create something that works for me.

I also have an algorithm to load-distribute image and video requests across both machines, but that's something you can build yourself. All of my machines stay connected to my MacBook no matter where I am. I can generate a lot of nice graphics, for example, and I've used them for "real work" at times.
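A minimal version of that load-distribution idea could look like this least-busy dispatcher (a sketch under assumptions: the hostnames are made up, and real routing would track actual GPU load and VRAM rather than a simple pending-job count):

```python
class Dispatcher:
    """Least-busy dispatch of generation jobs across render machines."""

    def __init__(self, hosts):
        # Track how many jobs are currently queued on each host.
        self.pending = {h: 0 for h in hosts}

    def submit(self, job: str) -> str:
        # Pick the host with the fewest queued jobs (ties go to the first listed).
        host = min(self.pending, key=self.pending.get)
        self.pending[host] += 1
        return host

    def done(self, host: str):
        self.pending[host] -= 1

d = Dispatcher(["legion-laptop", "desktop-5090"])
print(d.submit("video-001"))  # legion-laptop
print(d.submit("image-002"))  # desktop-5090
d.done("legion-laptop")
print(d.submit("video-003"))  # legion-laptop
```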

Of course you could use nano banana and it would be cheaper, but sometimes I just auto-generate 10–100 concept prompt variations and go back later to pick the one I like best. That costs money and hits rate limits with Gemini or OpenAI's image models. There are also super cheap online image services, like 2 cents a picture, so the ROI is very slow. I also built app interfaces so I can create images/videos without going into ComfyUI. Again, existing online services are available; I just felt like replicating it as part of the learning, checking off a box on something I know how to do.

Yes, I typed out the key points and then AI-edited it, but it still needed some manual editing.


Oh, the only thing the Mac is good at for LLMs is very, very large models, but I really have no need for that. The 128GB of RAM is overkill; I could have gotten away with 64GB, since I doubt I'll go past 35B models on the Mac.

Any image/video generation will suck on the Mac compared to RTX GPUs; don't buy a Mac for content generation. I don't even bother installing ComfyUI on the MacBook, despite it being an M5 Max with a 40-core GPU, because I can just route image/video requests to my RTX machines. They're always connected as long as I have internet access on my Mac.
Last edited by Elon69 May 6, 2026 at 07:56 PM.
pixelmaster posted May 07, 2026 09:55 AM
577 Posts · Joined Nov 2007
Quote from Elon69 :
More than you asked for but I'm sharing this because I ran into issues you might face when building a local AI stack. I should have researched deeper before buying, but I learned a lot through trial and error. My goal was to avoid renting GPUs or buying an RTX 6000, so I optimized for consumer hardware (RTX 5090 Laptop + Desktop 5090 + M5 Max lol).too lazy to re-read again and fix some of the formatting. 1. The Core Problem: Resource Contention in my use-caseRunning local LLMs, image generation, and video generation simultaneously on a single machine creates severe bottlenecks, even with 24GB or 32GB of VRAM.The Conflict: When running ComfyUI (media) and LM Studio (LLMs) together, they fight for VRAM and GPU cycles. Neither app respects the other's resource limits. Tuning GPU slices is largely ineffective in this scenario.The "Land Grab": One process will seize VRAM, starving the other. If VRAM fills up, you hit an Out of Memory (OOM) crash.LM Studio: Smartly stops loading models if VRAM is insufficient. However, if LM Studio is already loaded, it rarely releases VRAM when ComfyUI starts, causing ComfyUI to starve.ComfyUI: If it crashes due to OOM, it often hangs or pauses (especially if launched via default batch files), killing the workflow until manual intervention.The Queue Killer: Both apps have internal queues. When running separately, this works well. When running together, they don't share queues. They each launch one generation, slowing the machine to a crawl or triggering OOM errors.Performance Note: Even with a Desktop RTX 5090, heavy generation (30s–600s+) blocks the GPU entirely. Nothing else can run on the GPU during this time.2. Hardware Constraints & Model LimitsA. Laptop RTX 5090 (24GB VRAM)LLMs: Limited to ~18–20B parameter models.Qwen 3.5 (9B): Runs well.Gemma 4 (20B): Usable but unsatisfying for complex tasks.Qwen 3.6 (27B/35B): Will not fit in VRAM. 
If forced into system RAM, it becomes painfully slow, so I removed these models to prevent accidental loading.Images/Video: Works, but you must use the --low-vram flag in ComfyUI to prevent OOM. I used this for LTX video models. While newer low-VRAM workflows exist, this was a necessary workaround at the time.Concurrency: Local LLMs are significantly slower than online models (~100 tokens/sec for 35B). You cannot run multiple simultaneous coding sessions or sub-agents. For real development, I rely on online APIs for concurrency.B. Desktop RTX 5090 (32GB VRAM)Better Capacity: Can handle larger models than the laptop, but still faces OOM risks during long video generations.System RAM is Critical: I observed my desktop using all 32GB of system RAM. Upgrade to 64GB+ if you can (this will drive the price to ~$4000). If local LLMs overflow VRAM, they will bleed into system RAM. More system RAM prevents crashes during these overflow events.Performance: I hit ~200 tokens/sec on a 35B Qwen model. While fast for a single stream, it lacks the concurrency of online APIs.How I overcome OOM hangs or other problems...3. My Solution: The "Watchdog" StrategyTo run unmanned tasks overnight without babysitting the machine, I implemented a custom Watchdog Wrapper.Priority Hierarchy: I prioritize Image/Video generation over LLM requests. Local media generation is significantly faster on NVIDIA chips than on my Mac.The Watchdog Script Workflow:Monitor: Watches memory and process status.Kill Switch: If a ComfyUI request starts, the script instantly kills LM Studio to free up VRAM. I don't care about pending LLM requests; they are negligible now (mostly used for Meta prompting only now).Auto-Recovery: If ComfyUI crashes due to OOM (which happens occasionally with long video clips >30s), the script detects it and automatically restarts ComfyUI. 
This allows for smoother, longer generations without manual stitching.Restore: Once ComfyUI finishes, it restarts LM Studio to handle any pending LLM requests.4. Why I Don't Use My Mac for LLMs - lol I mean I do use it but not every single time and definitely not for normal coding sessions!.Speed: Even with 128GB of RAM, running large models like Qwen Coder Next (80B) or Qwen 27B is painfully slow (~20 tokens/sec). QCN was faster, I think 60 tks?Frustration: Watching a local LLM process prompts and generate tokens one by one is too slow for active coding.Strategy: I use the M5 Max for general tasks and lighter LLM inference, but I rely on online models for:Real-time coding assistance.Running sub-agents (Hermes/OpenClaw).High-concurrency tasks (5+ simultaneous requests).someone mentioned something about VMs. lolVMs: I run VMs on my Mac (not the RTX machines) because the Mac has more CPU cycles and RAM available. ie: I host Hermes in a VM on the Mac, which calls local LM Studio on the Mac for a semi-local setup. However, true "local" agents are useless without internet access for external tools. my other true 24/7 hermes agent will use an online model. I have general sub already and I can use it until it's done.Is Local AI GEAR Worth It?For Learning: Yes. It allows for rapid iteration and experimentation.For Cost: I generated ~3,000 video clips locally. Doing this via online APIs would have cost $1,000–$1,500. As online subsidies decrease and prices rise, local hardware is becoming cost-effective.For Privacy/Uncensored Use: Not necessary for me, as I don't fear data theft from online APIs and there's no porn coming out of my RTX GPUs. For Coding: No. Online APIs still win on speed and concurrency.6. 
Final RecommendationIf you need portability: The Laptop 5090 is great for coding on the go, but the battery dies instantly under GPU load.If you need power: The Desktop 5090 is a workhorse, but ensure you have 64GB+ of system RAM.Don't splurge blindly: If you plan to do overnight processing or heavy experimentation, the Watchdog strategy and proper hardware configuration are essential. Without them, you will spend more time fixing crashes than generating content. Unless someone else has a better strategy, I just decided to create something that works for me.I have some algo to load distribute image and video requests to both machines also but that's something you can build. All of my machines are always connected to my Macbook no matter where I am also. I can generate a lot of nice graphics for example and I've them for "real work" at times. Of course you can use nano banana and it would be cheaper but sometimes I just auto-generate 10-100 concept prompt variations and I can go back and just pick a good one I liked best. It will cost money and hit rate limits with Gemini or OpenAI's image models. There are also super cheap online image services like 2 cents a picture so ROI is very very slow. I also built app interfaces so I can create image/videos (without going to Comfyui). Again, existing online services are available, I just feel like replicating it as part of the learning just checking off a box on something I know how to do.yes, I typed out the key points then AI-edited it but still needed some manual editing.oh the only thing Mac is good for LLM is very very large models but I really have no need for it that. the 128GB Ram is overkill, I could have gotten away with 64GB since I doubt I will go past 35B model use on the Mac.Any image / video generation will suck on the Mac compared to RTX GPUs. don't buy a Mac at all for content generation. 
I don't even bother installing ComfyUI on the Macbook despite it is M5 Max 40GPUs because I can just route image/video requests to my RTX machines - they are always connected together as long as I have internet access on my mac.
Thanks for the detailed reply. Real world account without the hype.
Much appreciated.
Elon69 posted May 07, 2026 11:05 AM
2,546 Posts · Joined Nov 2010
Quote from pixelmaster :
Thanks for the detailed reply. Real world account without the hype.
Much appreciated.
You can tune the LLM side a lot with llama.cpp, which I haven't done, to speed up generation. Various guides show 35B-parameter models running pretty fast even on an 8GB VRAM card.

One example here, where this guy gets 40 tokens/s on Qwen 3.6 35B with some value tweaks on a 3060 card with only 8GB of VRAM.


Also, someone gave me PC parts to build a 5080 desktop for them (and I get to use it first), but because it only has 16GB of VRAM, I've left it in the corner for a month already; I don't feel like building it just to deal with 16GB of VRAM. lol.

https://youtu.be/8F_5pdcD3HY?si=fs3hn9BgWdjafn_F


I just haven't bothered, because I'd still run into the concurrency issue and the competition with ComfyUI. There are also quantized KV-cache options to reduce VRAM usage that I haven't tried, since I don't need the local LLM heavily yet and I have enough VRAM for now.
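For context on why quantizing the KV cache helps, here's the standard cache-size arithmetic (the model dimensions below are hypothetical, chosen only to illustrate the halving from fp16 to 8-bit; real 35B-class models vary):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: float) -> float:
    """KV cache size = 2 (K and V) * layers * kv_heads * head_dim
    * cached tokens * bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Hypothetical config: 64 layers, 8 KV heads (GQA), head_dim 128, 32K context.
print(round(kv_cache_gb(64, 8, 128, 32768, 2), 2))  # fp16 cache: ~8.59 GB
print(round(kv_cache_gb(64, 8, 128, 32768, 1), 2))  # 8-bit cache: half of that
```

Several extra gigabytes recovered from the cache is often the difference between a long-context model fitting in VRAM or spilling into system RAM.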
Last edited by Elon69 May 7, 2026 at 09:39 AM.


SedN9142 posted May 07, 2026 11:26 AM
9 Posts · Joined Nov 2018
Quote from Elon69 :
More than you asked for but I'm sharing this because I ran into issues you might face when building a local AI stack. I should have researched deeper before buying, but I learned a lot through trial and error. …
Hmm, I thought the MacBook would run an LLM faster than that because of the SoC. What's the bottleneck there? I'm running an eGPU and was on the fence between the M5 128GB and the AMD Ryzen AI Max+ 395 128GB; both use an SoC.
