New Node, Who is This?

If it isn’t clear, I have a GPU addiction. This post will render away all doubt about my technolust. I love hardware. I love exotic hardware that I’ve never seen before. I love to program or configure directly against this hardware to squeeze every last bit of performance out of it. Think about how some (even old) Xbox or PlayStation games still look amazing on platforms that are a decade old. They programmed down to the metal.

Where do you host your website? R730 running stacked on an old P40 desktop-hack with converted blowers

>>> tell the story of saddam huessein and the playstation super computer he built

 Saddam Hussein, the former President of Iraq, was known for many things, but building a PlayStation “supercomputer” is not one of them. There have been rumors and conspiracy theories that Hussein’s regime had developed a powerful gaming console-based computer for military purposes, but these claims have been widely debunked.

The story likely originated from a 2000 article in the Iraqi newspaper Al-Nasiriya Al-Gadida, which claimed that Iraqi scientists had built a supercomputer using 3,000 PlayStation 2 consoles. The computer was said to be used for weather forecasting and oil exploration, not military applications. However, there is no evidence that such a machine ever existed, and the claim has been dismissed as propaganda by many experts.

It’s worth noting that while the PlayStation 2 did have impressive technical specifications for its time, it was not designed to function as a supercomputer. Building a true supercomputer requires specialized hardware and software, as well as a deep understanding of computer science and engineering principles. It is highly unlikely that such a machine could be built using off-the-shelf gaming consoles.

In summary, while Saddam Hussein was known for his military ambitions and authoritarian rule, there is no evidence to support the claim that he built a supercomputer using PlayStation consoles. The story appears to have been exaggerated or fabricated for propaganda purposes.

Mixtral24

I guess I’ve been Snopes’d. Moving on. If you want ME to build a supercomputer out of PlayStation 2s, let me know.

Building a Better Large Language Model System

If you have been following along, I have a consumer-level machine learning rig and a PowerEdge R720. Things are going great, but I figured I’d work my way towards the newer server GPUs. I needed a new server, so why not try the R730? It is a little more expensive on eBay, but it looks a bit better spec-wise in some areas I find desirable.

What areas are those?

>>> what is difference from a PCIe bus between a r720 and an r730?

 Both the Dell PowerEdge R720 and R730 servers use Peripheral Component Interconnect Express (PCIe) slots to connect various expansion cards, such as graphics cards, network interface cards, and storage controllers. However, there are some differences in the PCIe bus architecture between these two server models.

The Dell PowerEdge R720 supports up to eight PCIe 3.0 slots with a maximum bandwidth of 8 GT/s per lane. The server has a total of 40 PCIe lanes available for expansion cards. The R720 also supports the use of mezzanine cards, which are smaller expansion cards that can be mounted directly onto the motherboard.

On the other hand, the Dell PowerEdge R730 supports up to twelve PCIe 3.0 slots with a maximum bandwidth of 8 GT/s per lane. The server has a total of 64 PCIe lanes available for expansion cards. The R730 also introduces support for PCIe Gen 3 x16 slot, which provides higher bandwidth than the previous generation.

In summary, while both servers use PCIe 3.0 slots with similar maximum bandwidth per lane, the R730 supports more PCIe lanes and slots compared to the R720. Additionally, the R730 introduces support for a higher bandwidth PCIe Gen 3 x16 slot, which is not available on the R720. These differences can affect the overall performance and expandability of the server depending on the specific workload requirements and expansion cards being used.

Mixtral24

Poweredge R730

  • Really struggled this time to get Ollama going; you can see the steps in the history (setup) below

This is all you should need. Starting point: fresh Kubuntu install.

  • sudo apt install nvtop
  • nvtop # you will see no gpus probably
  • sudo ubuntu-drivers --gpgpu install
  • sudo shutdown -r
  • nvtop # hopefully you see gpus!
  • # install ollama (hopefully seamless for you; a quick sanity-check sketch follows this list)
  • # looking at vllm next
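Once Ollama is running, here is a minimal Python sanity check against its local REST API (a sketch assuming the default localhost:11434 port; swap in whatever model you actually pulled):

import json
import urllib.request

# list the models the local Ollama daemon knows about
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    print([m["name"] for m in json.load(resp).get("models", [])])

# ask for a short, non-streaming completion
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama2",  # hypothetical: use a model you have pulled
        "prompt": "Say hello in five words.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])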

Nvidia Tesla T40 16gb

I’m spending more than I’d like on these, but I think the power savings will be worth it in the “Total Cost of Ownership” (TCO). Take a look:

Putting in the Nvidia Tesla T40 GPUs

It is amazing, the size difference between these two GPUs:

kinda scammy site: https://technical.city/en/video/Tesla-P40-vs-Tesla-T40-24-GB
https://www.topcpu.net/en/gpu-c/tesla-p40-vs-tesla-t40-24-gb

Rob Web Services

  • idle machines make me sad
  • soon, shoot me email/dm/tweet.
  • Baby chick incubator…

Still Here 2022

[old draft that got rehydrated]

So I figured I’d push something out this year to justify the bill of hosting this site…

Let’s just say I’ve lost something over the past few years in programming computers. The magic is gone. I know how it works, I know why it works, I’m not impressed.

Every time I use a piece of kit from someone, I feel I’m being funneled into something. Save on OneDrive, save on iCloud… Why can’t I find my actual hard drive in Finder anymore?

End key doesn’t work in editor on WordPress to go to end of line…

I’m grumpy.

Nobody respects the user. You know who calls their customers users? Drug Dealers.

Ok enough complaining.

What are you up to?

So my last post was written in an Airbnb in Pensacola, FL while I was selling my house in Seattle. That sale went well, and now I’m back in the panhandle.


From a narrator’s perspective, this was right before we shipped the final touches of “the rewrite” of the BAW PDK. I feel quite different about the above post than when I wrote it. I have my mojo back…

Technological Patina

The first time I recall hearing the idea of using “tried-and-true” technology, aka OLD tech, was when someone described Nintendo’s approach to hardware. After about a decade of experience, I often catch myself saying that I prefer my stack to have a patina.

patina (noun), plural patinas or patinae
1a: a usually green film formed naturally on copper and bronze by long exposure or artificially (as by acids) and often valued aesthetically for its color
1b: a surface appearance of something grown beautiful especially with age or use (“the beautiful patina of this antique table”)
https://www.merriam-webster.com/dictionary/patina

As a young engineer I kind of laughed at this idea because of course I knew everything and wanted to use the latest and greatest always. I wanted speed, I wanted new, I wanted it all.

Cutting Edge

Sometimes there is no choice: you must access a new feature or API endpoint that requires something new. Often, though, it is a choice. The risk, I think, is high. It has to be worth it.

Ok so some more AI gen:

If you would like to join my “Generative AI Group Chat” drop me a note in one of the many forms on this site.

Brain Trust

A “Brain Trust” refers to the collective intelligence and knowledge contained within a group or team in a company or organization. This valuable resource often consists of tribal knowledge, which is the information and expertise that is shared through informal channels and passed down from experienced members. By tapping into this Brain Trust, organizations can foster innovation, improve decision-making, and enhance overall performance. Encouraging open communication, collaboration, and continuous learning can help harness the power of a company’s Brain Trust and ensure that its tribal knowledge is preserved, shared, and leveraged to drive success.

or with a slightly different prompt

A Brain Trust is an informal group of experts who provide advice and insights to an individual or organization. The term was popularized during the Roosevelt administration, where a group of advisers, known as the “Brain Trust,” provided guidance on various policy matters.

Mixtral24

Primary Knowledge

Some people have knowledge that is not easily acquired through a typical or obvious learning process. They were there, they built it, they know why it was built that way. They understand the constraints, the tradeoffs, the baggage & luggage.

Tribal

Sometimes this knowledge is guarded. Not even in a secret internal wiki. Never written down. Why would you write it down? If you have nobody to share it with, why write it down? Enter the new age of auto-transcription. If you aren’t transcribing all your internal meetings into a curated corpus, you are either being blocked by legal or you are missing out.

Some will be actively hostile to sharing. Need to know lists (N2K) create vast arrays of silos of information. They may not be allowed to tell you.

Back to techno optimism.

I think we are just back to having secretaries

commandpattern.org – on LLMs and GenAI

Examples of Tribal Knowledge

  • Where the Water Main master shutoff valve is

Brain Trust as a Service. I think the only two companies you are competing against right now are Google and Microsoft. Good luck.

What are the consequences of this tribal knowledge concept in your organization?

Let’s suss out this architecture…

  • Hardware Requirements
  • Why is legal mad at us?
  • Software Stack

Others on the topic

Would you pay me to make this for you?


10,000 Lines of (Legacy) Code

I think it may be clear that you could think of this in Malcolm Gladwell’s metric: one line of code for one hour. It actually seems reasonable. I remember reading an article about Billy G’s company claiming the average Microsoft programmer added about 10 lines of code a day to the codebase.

It seems painfully slow. Obviously, yes, you can sling code quicker than this. But average it out over the lifetime of a project (at 10 lines a day, 10,000 lines is roughly 1,000 working days, about four years) and the actual code writing is minimal. You spend most of your time on marketing and support, or at least you should. That being said, 10,000 lines of code can do a lot, but as you approach this number the structure can get harder to extend and add features/fixes to.

Legacy

Some people look down on this piece of language. I love legacy codebases; they often make me the most money. They are easier to maintain and keep in production*. Rewriting in the latest trend is very uninteresting to me. Normally legacy projects have already hit that wall; they are on rewrite number 4. (The best big-number release of any software?)

Lines of code (LoC) may be the only true measurement of code I can reason about… It is arbitrary and subjective, but it is a metric. This magic number of 10k LoC is somewhat historic, and perhaps it should be adjusted based on language and style of application. It takes 2 lines to declare and set a variable in some variants of BASIC, for instance. That is a 2x in line count for really nothing.

What happens when you are in the ballpark of this many lines of code?  You start to have a mess! Unfortunately the garden must be pruned and rearchitected and this gets harder and harder as you approach this level of code mass.

It’s a UNIX system! I know this

Jurassic Park

UNIX

Famously, the original Unix source from Ritchie and Thompson clocs in at about 10k LoC. They had some assembly in there as well. I’d think if you can fit an entire operating system into that small a codebase, surely we can do more with less these days.

Codebases are divided into two sections: Luggage & Baggage. Luggage is what you need; baggage is what you carry because of the state of the world you are building upon.

Further Reading

(OOPs) I Made a Class

I learned to program in the 1990s, when terms like Object Oriented Programming (OOP) and Rapid Application Development (RAD) were where people thought the future was headed. The exact way I “learned” C++ was through a yellow “Learn C++ in 24 Hours” book.

I guess I got the title wrong, it was yellow for sure:

I quit programming after about 3-4 months with this book. I remember they taught OOP through making terminal GUIs for DOS. I went back to Photoshop4 and decided I wasn’t smart enough.

Perhaps teaching OOP to a BASIC programmer with mid 90s C++ is what ruined me.

Main Topic

Leaving behind the baggage, let’s discuss what Object Oriented Programming is:

Objects are like people. They’re living, breathing things that have knowledge inside them about how to do things and have memory inside them so they can remember things. And rather than interacting with them at a very low level, you interact with them at a very high level of abstraction, like we’re doing right here

Steve Jobs ’94 Rolling Stone

I tell you what, that sounds great, but also terrifying. I think his quote actually shows a deep understanding of what Objects or “Classes” are in OOP, and of the consequences.

Functions

Let’s look at my Ray Tracer Challenge starting point for primitives:

import math

EPSILON = 1e-6 # [0-1] - grid size or fidelity (smaller is finer)

tuple4 = tuple[float,float,float,float] #vector t[3]=0, point t[3]=1

# this can be replaced with math.isclose()
def float_is_equal(a:float, b:float, eps=EPSILON)->bool:
    return math.fabs(a-b) < eps
    
def tuple_is_equal(a:tuple4, b:tuple4, eps=EPSILON)->bool:
    return \
        float_is_equal(a[0], b[0], eps) and \
        float_is_equal(a[1], b[1], eps) and \
        float_is_equal(a[2], b[2], eps) and \
        float_is_equal(a[3], b[3], eps)

def tuple_neg(a:tuple4)->tuple4:
    return (-a[0], -a[1], -a[2], -a[3])

def tuple_add(a:tuple4,b:tuple4)->tuple4:
    return (a[0]+b[0], a[1]+b[1], a[2]+b[2], a[3]+b[3])

def tuple_sub(a:tuple4,b:tuple4)->tuple4:
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2], a[3]-b[3])

def tuple_is_point(t:tuple4)->bool: # renamed arg so we don't shadow the builtin tuple
    return int(t[-1]) == 1

def get_vector(x:float,y:float,z:float)->tuple4:
    return (x,y,z,0)

def get_point(x:float,y:float,z:float)->tuple4:
    return (x,y,z,1)

Ok, let’s try and use this:
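A minimal sketch of driving this functional API, assuming the functions above are saved as prim1.py:

import prim1 as pr

a = pr.get_point(3, -2, 5)   # w=1 marks a point
v = pr.get_vector(-2, 3, 1)  # w=0 marks a vector

b = pr.tuple_add(a, v)       # point + vector is still a point
print(b)                                    # (1, 1, 6, 1)
print(pr.tuple_is_point(b))                 # True
print(pr.tuple_is_equal(b, (1, 1, 6, 1)))   # True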

It’s not bad, but when we start adding more features to tuple4 it may become harder for a reasonable person to clearly see how to use this Application Programming Interface (API) for your tuple4 type.

Towards an Object

As you can see, when you start down this path the implementation can get annoying to program against. You could, with a small performance penalty*, wrap this functional interface in an object.

Let’s motivate this a bit by looking at this code here:

import prim1 as pr
import canvas_io as cvi

ball_pos = pr.get_point(1,1,0)
#ball_vel = pr.tuple_norm(pr.tuple_mul_scale(pr.get_vector(1,1,0), 1.25))
ball_vel = pr.get_vector(15,15,0)

env_gravity = pr.get_vector(0,-1,0)
#env_wind = pr.get_vector(-0.01,0,0)
env_wind = pr.get_vector(0,0,0)

def tick(ball_pos, ball_vel, env_gravity, env_wind):
    return (pr.tuple_add(ball_pos,ball_vel), pr.tuple_add(pr.tuple_add(ball_vel,env_gravity),env_wind))


def sim(ball_pos, ball_vel, max_iters=1e3):
    pos_t = []
    for t in range(int(max_iters)):
        ball_pos, ball_vel = tick(ball_pos,ball_vel,env_gravity, env_wind)
        pos_t.append(ball_pos)
        if (ball_pos[1] <= 0): 
            print('ball hit ground')
            break

    return pos_t


pos = sim(ball_pos,ball_vel)
canvas = cvi.get_canvas(900,550)
for p in pos:
    x = int(p[0])
    y = int(550-p[1]) #swap y axis
    #y = int(p[1])
    cvi.canvas_write_pixel(canvas, x, y, (1,2,3))

#for i in range(550):
#    cvi.canvas_write_pixel(canvas, 10, i, (0,3,0))

#for i in range(900):
#    cvi.canvas_write_pixel(canvas, i, 10, (0,0,3))

#for i in range(900):
    #cvi.canvas_write_pixel(canvas, i, i, (3,3,3))

cvi.canvas_write_pixel(canvas, 10, 10, (3,0,0))
cvi.canvas_write_pixel(canvas, 899, 539, (3,0,0))

print("writing to disk")
import time
tic = time.perf_counter()
# cvi.canvas_to_ppm(canvas, "ball2.ppm")
toc = time.perf_counter()
#print("done: " + str(toc-tic) + " seconds")

import matplotlib.pyplot as plt
plt.imshow(canvas.swapaxes(0,1))
plt.show()

Python Class

I have something to admit. I looked up how to do everything you are about to see. Take it with a grain of salt, and if you have feedback please check the form at the bottom of the post (click the title to break out of infinite scroll 🙂

class Tuple4:
    # builds on the functional helpers above (tuple4, tuple_is_equal, tuple_add)
    def __init__(self, x:float, y:float, z:float, p:float):
        self.x = x
        self.y = y
        self.z = z
        self.p = p

    @staticmethod
    def from_tuple4(t4:tuple4):
        return Tuple4(t4[0],t4[1],t4[2],t4[3])

    @staticmethod
    def get_point(x:float, y:float, z:float):
        return Tuple4(x,y,z,1)

    @staticmethod
    def get_vector(x:float, y:float, z:float):
        return Tuple4(x,y,z,0)

    def __repr__(self): #this will give us something nice in jupyter notebook when we eval in place
        return "(" + str(self.x) + ", " + str(self.y) + ", " + str(self.z) + ", " + str(self.p) + ")"

    def __eq__(self, other): # annotating self with Tuple4 here would NameError; the class isn't defined yet
        return tuple_is_equal((self.x, self.y, self.z, self.p), (other.x, other.y, other.z, other.p))

    def __add__(self, other):
        if isinstance(other, Tuple4):
            return Tuple4.from_tuple4(tuple_add((self.x, self.y, self.z, self.p), (other.x, other.y, other.z, other.p)))
        return NotImplemented
            

New Usage Pattern

Honestly, we need a part ][ of this post to continue 😉
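For a taste, though, here is a minimal sketch of the new usage pattern against the Tuple4 class above (assuming it lives alongside the earlier helper functions):

p = Tuple4.get_point(1, 1, 0)
v = Tuple4.get_vector(15, 15, 0)

print(p + v)                    # (16, 16, 0, 1) -- operator overloading via __add__
print(p == Tuple4(1, 1, 0, 1))  # True, thanks to __eq__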

I’m object oriented out. I ❤ Functions. Teaser for the next post: objects are entire programs unto themselves, and this can be troublesome.

Jupyter Notebook

Sorry about that… some bugs with ipython forbidden. Main GitHub project: https://github.com/robmurrer/pyrtc

Why do a Masters Degree?

It is honestly a question I ask myself often as I’m approaching the last 2 semesters of my own journey.

Maybe take a step back and think about your educational arc.

Why do a Bachelors Degree?

So you can get a good job and make some money! Most engineering jobs at large companies and governments require it. I made $60k when I got out of college. My highest-paying job before that was 60+ hour weeks running a kitchen for $40k. Your mileage may vary, and I was hired in the Gold Rush of tech. I feel for you people trying to get jobs now.

Why do engineering?

If you want to know how things are built, study engineering. It will force you to learn all the math you care to ever learn. Then you will be forced to apply it. This is the critical stage: going back to your fundamentals and organizing your thoughts. Hammock Time. Optimize for your particular problem, solving for a trade-off. It is how shit gets done. Da Vinci was an engineer as well as an artist. Who doesn’t want to be Da Vinci?

Wait I have to take 5 years of Math?

Yes. Next question. It’s why you should be homeschooled. You could get to calculus by 8th grade if you had a private tutor and no set schedule from kindergarten.

Wait it takes 5 years for me to get Bachelors Degree?

Yes, if you start in Intermediate Algebra like I did. It was worth it to take them semester after semester: Algebra, Trig, Pre-Calc, Calculus I-III, Differential Equations, Linear Algebra, Discrete Math. Computer Science 2 is essentially a math class, as are all your circuits classes, statics, dynamics…

Ok, I am done with this format I think. Back to the original question: why a Masters? Because as you get to the end of a bachelors, that is when all the interesting stuff happens. You finally got through the math to unlock the classes that you find interesting:

  • Programming Language Design
  • Artificial Intelligence
  • Robotics
  • Databases
  • 3D Graphics
  • Advanced Data Structures
  • Parallel Computing

By this time you are already quite burned out and you just need to get done with your 5 year tour and get a job and start putting money in the bank and pay off the student debts. But let’s talk about the content of these classes that are “new” in the sense that calculus and the laws of motion are hundreds of years old. We are talking things invented or commercialized in the last 50ish years.

The content of these classes is “soft” on science, I would say, and more on applied engineering. They can be a mixed bag. It often feels wrong while you are in the class if you are in the know.

Let’s talk about AI at the University of Central Florida in the early 2010s. Dr. Gomez taught the class. It was a cool class. But he basically taught us LISP for the first month of class. He was a Natural Language Processing (NLP) guy that adored Minsky and, as far as he said, was a professional acquaintance of his.

But to ignore neural networks in an artificial intelligence class in 2014 is kind of bonkers. To use LISP. I am glad I learned it, but this class was stuck right there in 1980. NLP was moving quickly to statistical methods rather than formal grammars. But we learned CDR and how to really run code in our heads with our Mind Compiler for LISP. Oatmeal and Fingernail Clippings (())()())

Symbolics LISP Machine

<clipped>

Overall, while Minsky’s NLP techniques were innovative and influential in their time, they have largely been superseded by modern statistical and machine learning approaches. However, some of the ideas behind dynamic predictive analysis are still used in some NLP systems today, and the focus on parallelism and efficiency remains relevant in the context of modern computing architectures.

Mixtral24b

Minsky famously shit all over the perceptron and killed public opinion of neural networks for 20 years because he said it couldn’t learn XOR. He used one neuron. Da Vinci Garfield doesn’t like this.
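A quick numpy sketch of the XOR point (not Minsky’s actual argument, just the gist): brute-force a grid of weights and biases for a single linear threshold unit, and it never gets more than 3 of the 4 XOR cases right.

import itertools
import numpy as np

X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0,1,1,0])  # XOR truth table

best = 0
grid = np.linspace(-2, 2, 41)  # candidate weight/bias values
for w1, w2, b in itertools.product(grid, repeat=3):
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)  # one neuron, step activation
    best = max(best, int((pred == y).sum()))
print(best, "of 4 correct")  # 3 of 4 -- XOR is not linearly separable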

Ok, so why go get a masters?

I wanted to finally take all the courses I missed in undergrad.

  • Machine Learning (was lame unfortunately; I’m replacing it with Andrew Ng’s Coursera series, 2.5/5 done)
  • Digital Signal Processing (recommend)
  • Aerial Robotics
  • Arm Robotics
  • Very Large Scale Integration VLSI (bucket list!)
  • Power Electronics
  • Advanced Applied Differential Equations and Friends (hardest math class yet; Matlab is the star)

Never stop learning. Most of the learning is done in addition to the classroom. It is a framework that you build out with your individual efforts.

There is also something to be said about networking and meeting new people and hearing new ideas. The University is a special place.

Sea Change

I’ve been promised Virtual Reality and Artificial Intelligence (AI) all my life. I bought the Power Glove. It was all mostly just hype and small toys that never stuck. But current iterations?

What we are seeing now with AI, with regards to Large Language Models (LLMs such as GPT) and Stable Diffusion (image generation), is nothing short of a change in how we use computers. Models, weights, and LoRAs? are now the “Programs” we run.

I’ve spent the last month with products like InvokeAI and Ollama. They are wonderful, but they aren’t even close to where we will be in 2 years for a consumer. But I can’t help but think of the giant foundational models trained on the entire human corpus being compressed into little tiny chips that can be queried anywhere, as some type of “Holographic” computing.

I can understand why that one Google engineer freaked out talking to the internal chatbot.

“I think, therefore I am” is a famous philosophical statement made by René Descartes in his Meditations on First Philosophy. The statement is often translated from the original French as “Cogito, ergo sum,” and it is meant to express the idea that the very act of doubting one’s existence serves as proof of one’s own consciousness.

In other words, if you are able to think about whether or not you exist, then you must exist in some form in order to have that thought. This statement has been interpreted and debated by philosophers for centuries, but at its core, it is a powerful reminder of the connection between thinking and being.

Mistral24b

Philosophy aside, let’s talk hardware.

We are in the “Mainframe” era of AI. These models literally take up rooms (rows of racks in a datacenter), and it seems it’s One Big GPU per User at a time. I’m trying to fathom what it takes to support a Copilot or ChatGPT at Microsoft and OpenAI’s scale.

As you can see now, the problem with size is a combination of power and memory bandwidth, but these will be solved with tricks like they always are. INT8 quantization is such a hilariously simple optimization that I’m quite shocked it works so well.
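The core trick really is tiny. Here is a minimal numpy sketch of symmetric per-tensor int8 quantization (a toy illustration, not how any particular runtime implements it):

import numpy as np

w = np.random.randn(4, 4).astype(np.float32)  # pretend these are fp32 weights

scale = np.abs(w).max() / 127.0                              # map the largest |weight| to 127
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # store as 8-bit integers
w_hat = q.astype(np.float32) * scale                         # dequantize to compare

print("max abs error:", np.abs(w - w_hat).max())  # small relative to the weights
print("bytes:", w.nbytes, "->", q.nbytes)         # 4x smaller storage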

Overview of quantization that I will spend the rest of the month fact checking…

Compressing the weights of neural networks from floating-point to int8 (also known as quantization) can significantly reduce the computational requirements and memory footprint of deep learning models, making them more suitable for deployment on low-power hardware such as mobile devices or embedded systems.

The general idea behind weight compression is to represent the weights of a neural network using fewer bits than their original floating-point representation. By reducing the precision of the weights, it is possible to significantly reduce the storage requirements and computational complexity of the model. In particular, int8 quantization involves representing the weights as 8-bit integers rather than 32-bit or 64-bit floating-point numbers.

There are several steps involved in compressing neural network weights from floating-point to int8:

  1. Weight normalization: This step involves scaling the weights of the neural network so that they fit within the range of an 8-bit integer. This is typically done by computing the maximum absolute value of the weights and then scaling them accordingly.
  2. Quantization: This step involves converting the normalized weights from floating-point to int8 format. There are several methods for doing this, including linear quantization and logarithmic quantization.
  3. Decomposition: In some cases, it may be possible to decompose the weight matrix of a neural network into smaller matrices that can be more easily compressed. This is known as low-rank decomposition or tensor decomposition.
  4. Pruning: This step involves removing unnecessary connections between neurons in the neural network, further reducing the computational requirements and memory footprint of the model.
  5. Codebook generation: In some cases, it may be possible to generate a codebook that maps int8 values back to their original floating-point values. This can help improve the accuracy of the compressed model.
  6. Model fine-tuning: After compressing the weights of a neural network, it is typically necessary to fine-tune the model to ensure that its performance is not significantly degraded. This can be done using techniques such as knowledge distillation or iterative quantization.

Overall, compressing neural network weights from floating-point to int8 format can greatly improve the performance and energy efficiency of deep learning models on low-power hardware. However, it requires careful consideration of factors such as weight normalization, quantization, decomposition, pruning, codebook generation, and model fine-tuning in order to ensure that the compressed model retains its accuracy and performance.

Mistral24b

Google sells the Coral TPU system, a USB accelerator capable of a few TOPS (Tera Operations Per Second). Great for old-school CNN-style networks, but pretty much useless for the current generation of Transformer models that want giant amounts of Video RAM (VRAM, 24GB+).

I’m awaiting the LLM/Stable Diffusion version of the Coral TPU or Jetson Nano (NVIDIA).

Make sure your board is waxed; this is going to be a giant wave of VR and AI coming in these next 3 years.

The Homelab

So here we are in 2024 and I’m quite deep into my 4th PC build in 2 years. This one is the first one for myself. It’s one of those things I used to do every 3-5 years, but… custom-building PCs is not really useful in the world of portable computing. I have a rack now for running various equipment.

But I’m kind of building a bit of what some are calling a “boondoggle”

The Build

Spared no expense: we have the top Intel i9 CPU with an NVIDIA RTX 4090 GPU running in a rack-mount case. Stable Diffusion runs sub-10s for most prompts/models. Compared to my buddy’s 4080 on the same prompt/model, the 4090 is 30% faster. Not bad.

Noice

Operating Systems

Started out with the idea I’d run the latest Windows Server; the problem here is Intel is really terrible at providing NIC drivers for that OS when it isn’t “server” gear 😦 – on to installing Kubuntu.

Sticking with Kubuntu for now, as it is working great and it is what I have on the PowerEdge R720, which is an old Xeon machine I got off eBay (more on that later).

Software Stack

Intel provides a version of Python 3.9.x, so that is what I’m basing my virtual environments on for local training and inference inside Jupyter notebooks. But off-the-shelf, most of the inference suites for Stable Diffusion (image generation) and Large Language Models ship with their own Docker image, etc.

InvokeAI

This is a great tool for local use, and they offer a cloud version for running Stable Diffusion image generation. I have this running and it is quite an interesting way to explore the models.

Models

Models come in a few flavors and are all based off of some “foundational?” model that was trained on very large datasets. Everything they generate is probably a copyright grey area…

  • RealVisXL_V3.0
  • juggernautXL_v8RunDiffusion

These are some of the models I’ve tried out. Excited to expand into other areas of image generation, but this gets us started.

Ollama

So Facebook released the original Llama model, which is kind of the standard “open” model for Large Language Models (LLMs), and Ollama Web UI provides a nice interface to it and other models.

Llama 2 runs quite well, but I am interested in the largest model I can run.

I am running Mixtral24B and asking it a question based on a fictional timeline that I also had it generate and that I printed to a PDF. Then I attach the PDF to a new context and ask a question based on it. Quite impressive!

If you notice, in the above I’m not using the wick1 system (the RTX 4090) for the Ollama Mixtral screenshot. I am using my main RTX 4090 system for image generation and this other system for the LLM, as the performance is great even with 8-year-old GPUs.

Old GPUs vs New GPUs

So as I was thinking about building a machine learning rig for the homelab, I really wanted to have a standard server for running some local development tasks and other stuff. I hit up eBay and got an R720 Dell PowerEdge for $400. Not a bad system, with quite the specs:

Ok this system ran fine, but what about adding some GPUs?

Look back in time to 2016 and you have the Tesla GPUs with 16-24GB of RAM for less than $200. These need a special power cable, but again, that’s easy these days.

The most shocking thing is that both GPUs work in parallel with Ollama. I think we have a $1k LLM machine!

What’s Next?

I have a chatbot in the works and am trying to figure out how to pipeline and use the 3 GPUs. I have many questions about keeping models in cache and then fully integrating a NSFW filter that is almost a requirement… Stay tuned and Happy Inferring.

The Tool Maker’s Dilemma

A humorous illustration entitled "The Tool Maker's Dilemma" with the main person not so freaked out, replace speech bubbles with nonsense slogans, make it higher resolution if possible, make it appear like an old style 70s poster, and make it more technology based

Programmers love their tools. It can be a fetish. How many tools is too many tools between your software and your customer?

Sometimes having more tools in your toolbox is a hindrance, not an asset. Organizational breakdowns. Lost tools. Normally there are a few key tools you always keep close by if needed.

As engineers or programmers it is often our bench software: Matlab, python, various editors and programming environments. Everyone loves good tools.

Editors and IDEs

Typing is so important: the speed of thought into your editor, the interface between man and machine. Keyboard shortcuts, specialized editors, beyond QWERTY… they all tie you to a configuration. It is not a deal-breaker, but it’s important to reduce the friction between you and your data entry point. But the more you configure, the harder it is to work on another’s system.

Do I have to mention emacs and vim? Yes. These ancient editors require months of training to use effectively and perhaps years to master.

VIM and EMACS fighting in a ring in a hellscape outside in a post-apocalyptic setting with a mortal kombat style

1970s computing was defined by big iron machines, or mainframes. Big servers, dumb clients. The PDP-10’s main storage was tape machines, and TECO was the editor of choice: a single-page editor. This was great for editing LISP programs, and eventually an Editor MACroS (Emacs) was developed in the mid-70s. Somehow it is still the preferred editor of many programmers.

On the other end of the editor spectrum were “line editors”, which were more attuned to working on teletypes. Teletypes are typewriters with serial links. Could it be over a phone line? Vi is a visual editor built on top of a line editor (ex), created shortly after Emacs. Its modern descendant, Vim, appeared in the early 1990s. Again, the editor is still widely used (in 2024) and there is a vim key-binding extension/plugin for most Integrated Development Environments (IDEs).

Is it worth it to learn emacs or vim? Yes, but that is my opinion, and I wouldn’t suggest trying to ship a product and learn vim at the same time. It is a lifestyle change that seems less and less important these days. It’s a fun distraction from real work.

Programming outside of a UNIX environment is largely done in an IDE, Visual Studio being the de-facto standard for C++ development. As opposed to makefiles and shell scripts, the idea of an IDE is that everything is in one spot: editor, compiler, debugger. Very convenient. You can set up IDE-like features in emacs and vim; you could say emacs is an IDE for LISP. Borland IDEs have a special place in my heart, as this is where it all started for me, outside of QBasic of course.

The point I’m trying to make is, you can approximate one with the other. Your customer doesn’t care which IDE or text editor you use. The language, framework, etc is a builder’s artifact. A tool mark.

Custom Keybindings & Hardware

If you listen intently you can hear the crackle of mechanical keyboards chording and creating arpeggiations of clicks echoing. Mechanical keyboards are nice, but if you really want to become inept on other people’s machines, just learn Dvorak or Colemak instead of QWERTY. Still awaiting learning how to use my ErgoDox.

The command interface, or how you move around the file, is what I would say splits the emacs and vim editors: modal (vim) vs normal. Let’s leave that for another post. This is all to say the mouse is not the most important tool of a programmer when writing code. Your hands are on your keyboard.

Some people (ahem) have even added a foot pedal for shift or other keyboard strokes.

Vendor Software

On a UNIX you are almost guaranteed to have a C compiler; I think that is in the POSIX standard. Your distribution will provide it. Microsoft provides Visual Studio. One thing you can do is make these IDEs emulate emacs or vim. I do this for vim in both Visual Studio (proper) and VS Code. It’s pretty close and I get some breakpoints for free. On Linux I’ve heard good things about Qt Creator as an IDE.

You may have other vendors outside of the main development toolkit provided by vendor #1. Every piece of software you add beyond this point is normally business related. Libraries are the most important consideration when writing software in a reasonable amount of time. I would vet your libraries thoroughly and “vendor them in” to your main source control.

Then there is software as a service (SaaS). This means you normally don’t host it yourself. GitHub is a perfect example of a legit reason to include a SaaS product in your development pipeline. But the real thing here is that with a remote service you run the risk of disruptions you have no control over. Sometimes the website goes bankrupt. I would avoid these in your flows if possible.

Of course there is our own customization of these tools on top of this: configurations, plugins, extensions, our own shell scripts. Often it’s the “dotfiles”. These build up over the years.

3 Levels

It is important for you to customize your development environment so that you can build and ship your product to the customer. Often these setups can be cumbersome and fragile, but that is ok; the product doesn’t suffer from this directly. Now, developer tool fatigue is a real thing. Developer ergonomics are important, but taken to the extreme it turns into a navel-gazing exercise.

  • Rolling Stock – Minimum settings changed, As it was intended
  • Custom Setup – Tinkering deeper, voiding warranty
  • Fired or Getting Paid – How many apps/scripts are between you and your product or customer?

I like to roll stock when I can. Just because you CAN customize and extend, doesn’t mean you should. Is it worth it to add this new complication to my codebase?

Fired or Getting Paid? I think this is an important question: will you create an unmaintainable mess while getting paid and then get fired because the business cannot support the liability you have created, or will you keep it running as best you can and improve the system reliability slowly over time?

endcap

I may or may not have written a kernel extension for Windows that maps Caps Lock to Escape. The AutoHotkey hack wasn’t fast enough. I love the idea of “jigs”, I love making tools, but sometimes it gets me into deep trouble…