Why do a Master's Degree?

It is honestly a question I ask myself often as I’m approaching the last 2 semesters of my own journey.

Maybe take a step back and think about your educational arc.

Why do a Bachelor's Degree?

So you can get a good job and make some money! Most engineering jobs at large companies and governments require it. I made 60k when I got out of college. My highest paying job before that was 60+ hour weeks running a kitchen for 40k. Your mileage may vary and I was hired in the Gold Rush of tech. I feel for you people trying to get jobs now.

Why do engineering?

If you want to know how things are built, study engineering. It will force you to learn all the math you care to ever learn. Then you will be forced to apply it. This is the critical stage. Going back to your fundamentals and organizing your thoughts. Hammock Time. Optimize for your particular problem, solving for a trade-off. It is how shit gets done. Da Vinci was an engineer as well as an artist. Who doesn’t want to be Da Vinci?

Wait, I have to take 5 years of Math?

Yes. Next question. It’s why you should be homeschooled. You could definitely get to calc by 8th grade if you had a private tutor and no fixed schedule from kindergarten.

Wait, it takes 5 years for me to get a Bachelor's Degree?

Yes, if you start in Intermediate Algebra like I did. It was worth it to take them semester after semester: Algebra, Trig, Pre-Calc, Calculus I-III, Differential Equations, Linear Algebra, Discrete Math. Computer Science 2 is essentially a math class too, as are all your circuits classes, statics, dynamics…

Ok, I am done with this format I think. Back to the original question: why a Master's? Because the end of the Bachelor's is when all the interesting stuff happens. You have finally gotten through the math to unlock the classes that you find interesting:

  • Programming Language Design
  • Artificial Intelligence
  • Robotics
  • Databases
  • 3D Graphics
  • Advanced Data Structures
  • Parallel Computing

By this time you are already quite burned out, and you just need to finish your 5-year tour, get a job, start putting money in the bank, and pay off the student debt. But let’s talk about the content of these classes, which are “new” in the sense that calculus and the laws of motion are hundreds of years old while this material was invented or commercialized in the last 50-ish years.

The content of these classes is “soft” on science, I would say, and leans more toward applied engineering. They can be a mixed bag. If you are in the know, something often feels wrong while you are sitting in the class.

Let’s talk about AI at the University of Central Florida in the early 2010s. Dr. Gomez taught the class. It was a cool class, but he basically taught us LISP for the first month. He was a Natural Language Processing (NLP) guy who adored Minsky and, as far as he said, was a professional acquaintance of his.

But to ignore neural networks in an artificial intelligence class in 2014 is kind of bonkers. As is using LISP. I am glad I learned it, but this class was stuck right there in 1980. NLP was moving quickly toward statistical methods rather than formal grammars. But we learned CDR and how to really run code in our heads with our Mind Compiler for LISP. Oatmeal and Fingernail Clippings: (())()())

Symbolics LISP Machine

<clipped>

Overall, while Minsky’s NLP techniques were innovative and influential in their time, they have largely been superseded by modern statistical and machine learning approaches. However, some of the ideas behind dynamic predictive analysis are still used in some NLP systems today, and the focus on parallelism and efficiency remains relevant in the context of modern computing architectures.

Mixtral24b

Minsky famously shit all over the perceptron and killed public opinion of neural networks for 20 years because he showed it couldn’t learn XOR. He used one neuron. Da Vinci Garfield doesn’t like this.
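Here is a minimal sketch of that claim in plain NumPy (illustrative only, nothing from the class): a single neuron can only draw one line through the four XOR points, which is not enough, but adding one small hidden layer solves it.

```python
# Minimal sketch: XOR cannot be learned by a single neuron (it is not linearly
# separable), but a tiny two-layer network handles it. Plain NumPy, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR truth table

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two layers: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # network output
    # Backpropagation of squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically converges to ~[0, 1, 1, 0]
```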

Ok, so why go get a Master's?

I wanted to finally take all the courses I missed in undergrad.

  • Machine Learning (was lame unfortunately; I’m replacing it with Andrew Ng’s Coursera series, 2.5/5 done)
  • Digital Signal Processing (recommend)
  • Aerial Robotics
  • Arm Robotics
  • Very Large Scale Integration VLSI (bucket list!)
  • Power Electronics
  • Advanced Applied Differential Equations and Friends (hardest math class yet, MATLAB is the star)

Never stop learning. Most of the learning is done in addition to the classroom; it is a framework that you build out with your individual effort.

There is also something to be said about networking and meeting new people and hearing new ideas. The University is a special place.

Sea Change

I’ve been promised Virtual Reality and Artificial Intelligence (AI) all my life. I bought the Power Glove. It was all mostly just hype and small toys that never stuck. But current iterations?

What we are seeing now with AI, in the form of Large Language Models (LLMs such as GPT) and Stable Diffusion (image generation), is nothing short of a change in how we use computers. Models, weights, and LoRAs are now the “Programs” we run.

I’ve spent the last month with products like InvokeAI and Ollama. They are wonderful, but they aren’t even close to where we will be in 2 years for a consumer. I can’t help but think of giant foundational models trained on the entire human corpus being compressed into tiny chips that can be queried anywhere, as some type of “Holographic” computing.

I can understand why that one Google engineer freaked out talking to an internal chatbot.

“I think, therefore I am” is a famous philosophical statement made by René Descartes in his Meditations on First Philosophy. The statement is often translated from the original French as “Cogito, ergo sum,” and it is meant to express the idea that the very act of doubting one’s existence serves as proof of one’s own consciousness.

In other words, if you are able to think about whether or not you exist, then you must exist in some form in order to have that thought. This statement has been interpreted and debated by philosophers for centuries, but at its core, it is a powerful reminder of the connection between thinking and being.

Mistral24b

Philosophy aside, let's talk Hardware

We are in the “Mainframe” era of AI. These models literally take up rooms (rows of racks in a datacenter), and it seems it’s One Big GPU per User at a time. I’m trying to fathom what it takes to support a Copilot or ChatGPT at Microsoft's and OpenAI’s scale.

As you can see, the problem with size is a combination of power and memory bandwidth, but these will be solved with tricks like they always are. INT8 quantization is such a hilariously simple optimization that I’m quite shocked it works so well.

Overview of quantization that I will spend the rest of the month fact-checking (a small runnable sketch of the core trick follows it)…

Compressing the weights of neural networks from floating-point to int8 (also known as quantization) can significantly reduce the computational requirements and memory footprint of deep learning models, making them more suitable for deployment on low-power hardware such as mobile devices or embedded systems.

The general idea behind weight compression is to represent the weights of a neural network using fewer bits than their original floating-point representation. By reducing the precision of the weights, it is possible to significantly reduce the storage requirements and computational complexity of the model. In particular, int8 quantization involves representing the weights as 8-bit integers rather than 32-bit or 64-bit floating-point numbers.

There are several steps involved in compressing neural network weights from floating-point to int8:

  1. Weight normalization: This step involves scaling the weights of the neural network so that they fit within the range of an 8-bit integer. This is typically done by computing the maximum absolute value of the weights and then scaling them accordingly.
  2. Quantization: This step involves converting the normalized weights from floating-point to int8 format. There are several methods for doing this, including linear quantization and logarithmic quantization.
  3. Decomposition: In some cases, it may be possible to decompose the weight matrix of a neural network into smaller matrices that can be more easily compressed. This is known as low-rank decomposition or tensor decomposition.
  4. Pruning: This step involves removing unnecessary connections between neurons in the neural network, further reducing the computational requirements and memory footprint of the model.
  5. Codebook generation: In some cases, it may be possible to generate a codebook that maps int8 values back to their original floating-point values. This can help improve the accuracy of the compressed model.
  6. Model fine-tuning: After compressing the weights of a neural network, it is typically necessary to fine-tune the model to ensure that its performance is not significantly degraded. This can be done using techniques such as knowledge distillation or iterative quantization.

Overall, compressing neural network weights from floating-point to int8 format can greatly improve the performance and energy efficiency of deep learning models on low-power hardware. However, it requires careful consideration of factors such as weight normalization, quantization, decomposition, pruning, codebook generation, and model fine-tuning in order to ensure that the compressed model retains its accuracy and performance.

Mistral24b
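To make steps 1 and 2 above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. It is illustrative only, not the exact scheme any particular inference library uses.

```python
# Minimal sketch of symmetric per-tensor int8 quantization (illustrative only,
# not the exact scheme used by any particular inference library).
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single scale = max(|w|) / 127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)  # pretend layer weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("max abs error:", float(np.max(np.abs(w - w_hat))))  # about scale / 2
print(f"memory: {w.nbytes} bytes -> {q.nbytes} bytes")      # 4x smaller than fp32
```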

Google sells the Coral TPU, a USB accelerator capable of a few TOPS (trillions of operations per second, integer ops rather than FLOPS). Great for old-school CNN-style networks, but pretty much useless for the current generation of transformer models that need giant amounts of video RAM (VRAM, 24GB+).

I’m awaiting the LLM/Stable Diffusion version of the Coral TPU or Jetson Nano (NVIDIA).

Make sure your board is waxed; there is a giant wave of VR and AI coming in the next 3 years.

The Homelab

So here we are in 2024, and I’m quite deep into my 4th PC build in 2 years. This one is the first one for myself. It’s one of those things I used to do every 3-5 years, but… custom-built PCs are not really useful in the world of portable computing. I have a rack now for running various equipment.

But I’m kind of building a bit of what some are calling a “boondoggle”

The Build

Spared no expense: we have the top Intel i9 CPU with an NVIDIA RTX 4090 GPU running in a rack-mount case. Stable Diffusion runs in under 10 seconds for most prompts/models. On my buddy's 4080 with the same prompt/model, the 4090 is about 30% faster. Not bad.

Noice

Operating Systems

I started out with the idea that I’d run the latest Windows Server. The problem is that Intel is really terrible at providing NIC drivers for that OS when the hardware isn’t “server” gear 😦 – on to installing Kubuntu.

Sticking with Kubuntu for now, as it is working great and it is what I have on the PowerEdge R720, which is an old Xeon machine I got off eBay (more on that later).

Software Stack

Intel provides a version of Python 3.9.x, so that is what I’m basing my virtual environments on for local training and inference inside Jupyter notebooks. For off-the-shelf use, most of the inference software suites for Stable Diffusion (image generation) and Large Language Models ship with their own Docker images, etc.
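As a quick sanity check inside one of those notebooks, I like to confirm which interpreter and GPU the kernel actually sees. A small sketch, assuming PyTorch with CUDA support is installed in the venv:

```python
# Quick environment sanity check inside a Jupyter notebook
# (assumes PyTorch with CUDA support is installed in this venv).
import sys
import torch

print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"gpu {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```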

InvokeAI

This is a great tool for running Stable Diffusion (image generation) locally, and they also offer a cloud version. I have this running, and it is quite an interesting way to explore the models.

Models

Models come in a few flavors and are all based off some “foundational” model that was trained on very large datasets. Everything they generate is probably a copyright grey area…

  • RealVisXL_V3.0
  • juggernautXL_v8RunDiffusion

These are some of the models I’ve tried out. Excited to expand into other areas of image generation, but this gets us started.
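For reference, here is a minimal sketch of what loading one of these SDXL-style checkpoints looks like outside of InvokeAI, using Hugging Face's diffusers library. The checkpoint path and prompt are placeholders, and InvokeAI handles all of this for you under the hood.

```python
# Minimal sketch: running an SDXL-style checkpoint with diffusers (illustrative only;
# the checkpoint path below is a placeholder, not a real file on my system).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "models/juggernautXL_v8RunDiffusion.safetensors",  # placeholder path
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a rack-mounted homelab server, dramatic lighting, photorealistic",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]

image.save("homelab.png")
```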

Ollama

So Facebook released the original Llama model, which has become kind of the standard “open” model for Large Language Models (LLMs), and the Ollama Web UI provides a nice interface to it and other models.

Llama 2 runs quite well, but I am interested in the largest model I can run.

I am running Mixtral24B and asking it questions based on a fictional timeline that I also had it generate and then printed to a PDF. I attach the PDF to a new context and ask a question based on it. Quite impressive!
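Under the hood the web UI is just talking to the local Ollama server's HTTP API, so you can also hit it directly. A minimal sketch; the model tag and prompt are placeholders for whatever you have pulled locally:

```python
# Minimal sketch of querying a local Ollama server over its HTTP API.
# The model tag is a placeholder; use whatever `ollama list` shows on your machine.
import json
import urllib.request

payload = {
    "model": "llama2",  # placeholder model tag
    "prompt": "Summarize the fictional timeline in two sentences.",
    "stream": False,    # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read().decode("utf-8"))

print(body["response"])
```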

If you notice in the above, I’m not using the wick1 system (the RTX 4090) in the Ollama Mixtral screenshot. I am using my main RTX 4090 system for image generation and this other system for LLMs, as the performance is great even with 8-year-old GPUs.

Old GPUs vs New GPUs

So as I was thinking about building a machine learning rig for the homelab, I really wanted a standard server for running local development tasks and other stuff. I hit up eBay and got an R720 Dell PowerEdge for $400. Not a bad system, with quite the specs:

Ok this system ran fine, but what about adding some GPUs?

Look back in time to 2016 and you have the Tesla GPUs with 16-24GB of VRAM for less than $200. These need a special power cable, but again, that is easy to sort out these days.

The most shocking thing is that both GPUs work in parallel with Ollama. I think we have a $1k LLM machine!
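A quick way to confirm a model is actually being split across both cards is to watch GPU memory while a prompt is in flight. A sketch that just shells out to nvidia-smi, so it assumes the NVIDIA driver is installed:

```python
# Minimal sketch: poll nvidia-smi to see that a model is spread across both Teslas
# while Ollama is generating (assumes the NVIDIA driver / nvidia-smi is installed).
import subprocess

def gpu_memory_usage():
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,name,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ], text=True)
    for line in out.strip().splitlines():
        idx, name, used, total = [field.strip() for field in line.split(",")]
        print(f"gpu {idx} ({name}): {used} MiB / {total} MiB used")

gpu_memory_usage()  # run while a prompt is in flight; both cards should show usage
```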

What’s Next?

I have a chatbot in the works and am trying to figure out how to pipeline across the 3 GPUs. I have many questions about keeping models in cache and then fully integrating an NSFW filter, which is almost a requirement… Stay tuned and Happy Inferring.