expired Posted by SehoneyDP • Mar 15, 2024 4:09 PM
NVIDIA GeForce RTX 3090 Founders Edition Dual Fan 24GB GDDR6X GPU Card (Refurb)
(Select Stores) + Free Store Pickup - $700
Micro Center
148 Comments
slimdunkin117 (post rated helpful by the community), replying to:
I can play with GPT to see what it spits out - I am curious to hear from someone that has done it.
Once you get things working, you should check out oobabooga's UI. It's really popular in the community: https://github.com/oobabooga/text...tion-webui
It allows you to do all sorts of things, from trying different models and adjusting parameters to even fine-tuning.
Check out /r/localllama on reddit.
Hardware minimums are pretty low to run a 7B-parameter LLM, but they ramp up substantially if you want to run a 30B or 60B-parameter model and get more than a couple of tokens/s.
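To put rough numbers on that, here's a back-of-the-envelope sketch (my own assumptions: the weights dominate VRAM use, plus roughly 20% overhead for the KV cache and activations):

```python
# Rough VRAM estimate for running a local LLM at various quantization levels.
# Assumption (mine): memory = parameters x bytes-per-parameter, plus ~20%
# overhead for the KV cache and activations.

def vram_needed_gb(params_billion: float, bytes_per_param: float,
                   overhead: float = 1.2) -> float:
    """Approximate GB of VRAM to hold the weights plus runtime overhead."""
    return params_billion * bytes_per_param * overhead

for params in (7, 13, 30, 60):
    for label, bpp in (("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
        print(f"{params}B @ {label}: ~{vram_needed_gb(params, bpp):.1f} GB")
```

By this estimate a 4-bit 7B model fits in ~4 GB, a 4-bit 30B model needs ~18 GB (hence 24 GB cards like this 3090), and a 60B model won't fit on a single 3090 without offloading.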
It's super efficient at idle, consuming only 3 W when not connected to any monitor.
HappyAccident (post rated helpful by the community):
If you know what is going on inside a transformer model (I barely have a rough idea), it computes by comparing different possibilities against its set of weights in order to ultimately find a response that fits within its constraints (the parameters you set when you load it -- not sure how LM Studio does it, but if you see a 'temperature' slider, that set of options is what I am talking about). To do this it has to run through millions of operations over and over, layer after layer, until it pops out a token. That requires vast amounts of data to be read constantly, and thus memory bandwidth is the key limiting factor for operation.
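To make the 'temperature' part concrete, here's a minimal sketch of temperature sampling (my own illustration, not LM Studio's actual code):

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    """Scale raw model scores by temperature, softmax them, and sample a token."""
    # Lower temperature sharpens the distribution (more deterministic output);
    # higher temperature flattens it (more varied/creative output).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

fake_logits = {"cat": 2.0, "dog": 1.5, "fish": 0.2}  # made-up scores
print(sample_with_temperature(fake_logits, temperature=0.7))  # usually "cat"
print(sample_with_temperature(fake_logits, temperature=2.0))  # more varied
```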
Hope I didn't get too much wrong with that description.
This is all related only to inference, by the way (having the models compute responses), not training (teaching the models how to compute responses), which has similar constraints but is a different process and can rely on different factors.
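A rough way to see the bandwidth limit in numbers (a rule-of-thumb sketch using the 3090's spec-sheet bandwidth, not measured benchmarks):

```python
# For each generated token, the GPU streams essentially all of the model's
# weights through its memory bus, so an upper bound on single-batch speed is
# roughly memory bandwidth divided by model size.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for single-batch inference."""
    return bandwidth_gb_s / model_size_gb

BW_3090 = 936  # GB/s, RTX 3090 GDDR6X spec-sheet bandwidth
print(max_tokens_per_sec(BW_3090, 13 * 0.5))  # 13B @ 4-bit: ~144 tok/s ceiling
print(max_tokens_per_sec(BW_3090, 30 * 0.5))  # 30B @ 4-bit: ~62 tok/s ceiling
```

Real-world speeds come in well below these ceilings, but it shows why bandwidth, not raw compute, is usually the bottleneck for local inference.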