Just One More Prompt

Why do drug dealers and software companies both call their customers “users”?

Not all of us have been enamored with the token spewing monster that has taken over the entire discourse around “productivity”. I’ll admit that I’ve been waffling between the idea that “it’s so over” and “we’re so back” around generative artificial intelligence (GENAI). This will be a pseudo-scientific post on my current feelings and that I really do think we are approaching the limits of our current methods.

I am thankful to the HTMX discord community which has pushed back completely on vibe coding. I still think it might have its place, but I think in this post I hopefully will lay out a few reasons why I am bearish on this topic.

Whomever is telling you to not learn to code now… don’t listen to them. I am still putting in my reps daily on https://executeprogram.com! When I first demoed “vibe coding” to my wife and I thought this was the future of coding her words brought me right back to reality. I was mouthing that ‘oh it’s trying to do this’ I’ll need to prompt it to do this instead cause I _KNOW_ better.

“Oh, this is only for people who already know how to program…”

My wife

I think it is important to note that I am a “classically trained” computer engineer. I was fortunate to do my undergraduate degree at a solid school that still taught the C programming language. This was about a decade ago and after spending many years in industry, I’m finishing up a masters degree also in engineering. This being said, my dad taught me how to program in BASIC when I was 11 years old (I’m now 42).

I learned how to program before stack overflow, and I was exposed to all the fads and trends around rapid application development (RAD) and Agile methods. Let’s just say I’m a bit surprised the current fad of vibe coding took over my lizard brain and I was enthralled.

Original meme I shitposted on LinkedIn bastardizing the ikagai

This post is largely a penance around this hype wave I feel I added energy to. Let’s just go ahead and get started why I believe we are hitting the wall. I was waiting for this moment:

OpenAI famously only trains on your inputs/outputs to its system if you don’t pay them. Anthropic (Claude) said it never trained on your chats. I figured this was mostly a temporal thing. They could always revise their terms of service and do whatever they want. They increased their data retention policy also in this announcement to 5 years… How can they safely incorporate my chats into its training corpus!? The fact that I have to opt-out is interesting and I don’t believe everyone who clicks “agree” fully understands what this implies.

Revisiting how LLMs and Agents Work

The large unlock in the modern “Transformer” era was the fact that we could now do sequence to sequence translation for the equivalent of paragraphs of text. This is a very very hard problem and as the size of input increased it seemed impossible to scale with traditional approaches. This all changed with a simple training method of forcing the machine to predict the next word or token with a self-attention mechanism. The fact that it could translate between arbitrary domains was an incredible emergence of “intelligence”.

Slide from my Agents talk

This idea of knowledge compression is great and it shows that something is definitely happening in the “latent space” between your prompt and the AI’s output. Because we are training only on language and predicting the next token or word. Let us just say this is not a thinking or reasoning being. It is a really great parlor trick. In some contexts I believe it is a net positive, but it can be detrimental in others. It is a bit Orwellian to say this is thinking or reasoning. As you can see in the video no matter what people tell you, it is just producing the next token. Perhaps between 2 <thinking></thinking> tags 🙂

Predicting the next token. Transformer Explainer

So back to training on all the user data that Anthropic has been collecting… now we are going to be either doing a knowledge compression of this into the weights and biases or parameters of the model or they will be using it in the fine-tuning on using the chats as annotated datasets doing a reinforcement learning approach like the Chinese published. Or a combination of both. Who knows… Let’s just say this giant copyright infringement mixer is snowballing.

You cannot copyright anything that a large language model outputs if you don’t own the copyright to all its training corpus. It is a derivative work. This might change in the future think about how long things stay out of the public domain now? Who makes laws? Lawyers. Who likes licensing? Lawyers!

Agents as a Savior?

Setting aside the copyright issue. Because of how LLMs simply produce the next token it will happily just do that. We call it hallucinations but let’s be real, it is just doing what it has been trained to do. Produce the most likely next token.

Ok so let’s add another layer in the lasagna. I’m tired of babysitting this machine, let us write a program to do this, that is controlled by… another LLM. See a problem?

My man lost a mustache after being scolded

So the models seem to keep getting better, but you always seem to need a human in the loop. I was demoing something and I was sure to sandbag everything. But it still failed. I actually couldn’t figure it out on the spot. Take a gander at the screenshot below to see why it failed. Granted this is an older model but I feel we are dealing with something that is unsafe at any speed.

Even with a good system prompt the LLM emitted a valid python program that it read instead of the calculated response 😦

I will not be reading the slop, but godspeed

pop punk pelosi

Ok so in a reactive nature we will always be optimizing towards a better solution. But it will not be through anything magical, only feeding back in the failures. They have to be annotated by PEOPLE. This is why they want your chats to train on.

Context Window Non-Linearity

So this idea that it is somehow back to the prompter to add the right context and say the right words. We are right back to this idea of “programming”. The gaming of the metric around context window length… there is a non-linear quality of the results of the LLMs when the context window grows. Just because it says on the tin it has a context length of 32k tokens doesn’t mean it will produce quality output at that length. I’m actually a little bit disappointed in the meta game some of my colleagues are playing. Being stuck trying to make this context engineering machine magic work, instead of just solving the real problem. It is the bias of creating a game engine instead of creating a great game. You get stuck in this meta machine to make more machines and I’m sorry but this has never worked. It is a Ouroboros or a snake eating its own tail.

Driving can only be done safely while not using your cell phone. This “overflows” your context window and makes driving unsafe. Do LLMs produce quality content most of the time? Yes?… until this arbitrary window fills us and starts to get “dumber”

Somehow 50 years of software engineering and security practices are out the window. Running random code on your computer is dangerous. But that is what we are doing now. Its fast, fun, and dangerous. Please put your Agents in a Docker container at least. Ideally in Virtual Machine. I do like Google’s vibe coding approach with Jules.

Perception is Reality

What we feel really matters, but science doesn’t care about our feelings. I always get anecdotes from “users” and they all seem plausible but they are all based around feeling and the output that these people claim are rarely beyond a demos. Where are all the products? Also: Code is a liability, I wouldn’t brag about how large your codebase is. We need to be more grug brained.

I gain enough value from LLMs that I now deliberately consider this when picking a library—I try to stick with libraries with good stability and that are popular enough that many examples of them will have made it into the training data. I like applying the principles of boring technology—innovate on your project’s unique selling points, stick with tried and tested solutions for everything else.

Simon Willison

I feel like greenfield projects and using the LLM or agent as a cookie-cutter template is its sweet spot for value. But there have been a few studies and I think it is interesting the juxtaposition of these 2 studies:

  1. LLM Productivity Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
    • “Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.”
    • “This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.”
  2. Adderall Student Study: https://pmc.ncbi.nlm.nih.gov/articles/PMC6165228/
    • “These findings indicate that healthy college students experience substantive increases in emotional and autonomic activation in the period following Adderall consumption.”
    • “In summary, the present pilot study indicates that a moderate dose of Adderall has small to minimal effects on cognitive processes relevant to academic enhancement (i.e., on reading comprehension, fluency, cognitive functioning), in contrast with its significant, large effects on activated positive emotion…”
    • https://www.netflix.com/title/80117831

In both cases the people who have the active part of the experiment rather than the placebo “feel” like it is working… I do feel like the coding study is flawed because of the “warm-up” time it takes to learn how to prompt correctly etc. Also vibe coding excels at greenfield projects. I don’t think anyone can argue that vibe coding demos and MVPs won’t be faster upfront especially if you have little experience with the frameworks and platforms. But what happens when you have to actually depend on the slop?

It is fascinating to read about someone who is doing real science with NLP states what it takes to really “improve” your responses is jaw-dropping. 1000 annotated samples to prove 1% increase. No wonder Anthropic needs all our personal chat logs…

Why are we hitting a wall? Models can’t get any bigger, we can’t really train them and retain the information in the training because of something, something scaling laws:

 As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour.

https://arxiv.org/abs/2507.19703

DRAFT NOTICE

I am appreciative of your feedback and suggestions for fixing this article.

What was the point of all this? Why is there a limit? If we have to go say the magic words correctly to get the Agent to respond properly AND we also have to double check it. What is the point? Aren’t we just programming again?

Further Reading

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

https://blog.langchain.com/context-engineering-for-agents/

https://arxiv.org/abs/2506.02153

Security Tangent

Great breakdown of this new vector of attack. Unescaped user input (PR Title) injection to arbitrary code execution in the build system that lead 10k develops poisoned at the well:

https://www.kaspersky.com/blog/nx-build-s1ngularity-supply-chain-attack/54223/

A YEAR INTO THE HACKATHON `24

We started this project about a year ago. We thought it would be easy. There was an event GOAT CONF ONE where we were doing a hackathon for the GENAi Photo booth. Chuck er… Renan Barbosa commented this was too easy. Boasting we should do the Jetracer from NVIDIA also known as Project Leatherback. It is a really cool concept. An “Ai RC Racer” built around the Jetston Nano. The car hardware was built and assembled by the event. but the Simulation to Reality (Sim2Real) was still being developed. And still is…

We did the event. It was great for what it was, a gathering of like minded individuals talking about cool stuff. But we aren’t done yet and we had to kind of finish a few other things in the mean-time. The GENAI Photobooth ended up with 2-3 subsequent hackathons and many events put on and we also had it at Makerfaire.

An Interactive Generative Ai Photobooth is what?

  • Generate a background image based on request or canned theme response
  • We take a picture and remove the background, without using a green screen
  • We put you with that background and tag it with the event name / frame
  • You get QR code to download it [WONTFIX]

We actually started “vibe” coding this on the side to prepare for next hackathon. I picked this project because I knew it was possible in short amount of time. (ahem) Turns out vibe coding does work and it is only getting better. It is a Jupyter Notebook so it isn’t pretty and it isn’t finished but it works. It is a great teaching project.

We have to haul the 4090 machine round though. 50lbs to ship these Docker containers. It was fun but, now it is over. To get quality pictures you still need professional lighting and generally knowing what you are doing. This technology will exist in near realtime soon on your phone. Back to the workstations with the 4090s.

NVIDIA 4090 RTX MACHINE

Modern Workstations

Between fighting with GPU drivers and out of memory problems, we built a few of these systems as backups because they are hard to source. 4090s have been sold out for over 6 months. 3090s are available as the best bang for your buck right now. All of our development is against NVIDIA GPUs. This is a practical choice as all GPUs in the market are sub-standard in tooling compared to NVIDIA’s for AI training/inference definitely and probably graphics performance.

With its 24GB of RAM and over 10,000 CUDA cores, you can run an LLM or two and maybe do image and video generation flows in a reasonable amount of time. This is also a great system to run IsaacSim and to train robots in a simulator.

At the conference we had to provision cloud instances for everyone as each workstation costs upwards of $4K. We had the computers in the cloud with Infinite Compute‘s donation of credits to rent them by the hour. Having a local workstation is a must when doing IsaacSim work in my experience.

Racer Update May

Moving the lab 3 times has been three too many. But we landed at the college while we finish up our last semester. Plan is still to graduate in fall but hopefully the bulk of the work will be done in the next 60 days. The one year point of when we wanted to be racing. The hardware is all there and the reinforcement-learning training of the models is working. Still trying to figure out vision.

What’s Next?

With the RF Kill Switch now designed and tested we need to fabricate a few more cables and assemble the car “finally”. With this safety feature we will begin with teleoperation again which is just publishing commands over wifi to the car. We never ran the car not on blocks because we didn’t have the kill switch. We hope to start with teleop next week.

Sim2Real

Currently the flow is to use IsaacLab and SKRL a python library to train 2 models for a reinforcement learning policy to be trained. Once a policy is trained it can be inferred in the simulator. In the case of the above video it is simple way-point navigation. The next step is to transfer this model from the workstation to the Jetson Nano.

This is where the fun begins. ONNX, TensorRT. We are still stuck on vision. Stay Tuned!

Microwaving Code

Introduction

New coding tools always face the same reaction: immediate adoption by some, outright dismissal by coding purists. This pattern repeats with every technological leap.

Vibe coding is already here – and soon we’ll just call it coding. Just like we dropped “smart” from smartphones, the distinction between AI-assisted and traditional coding is rapidly disappearing. What matters isn’t the tool but knowing when to use it.

The Microwave Principle

Nobody microwaves a prime rib. A good cook knows when to use a microwave and when to use the oven. Microwaving popcorn makes perfect sense – the result is good, it’s convenient, and alternatives require special equipment. For certain tasks, the microwave is clearly best.

How Industries Work

Industries move toward tools that save time, effort, and money. That’s how economies function. Markets prefer “good enough and affordable” over “perfect but expensive.” Throughout history, craftspeople who adapted to new technologies often thrived, while those who resisted change sometimes found themselves with fewer opportunities.

Better Tools Make You Stronger

Good tools don’t replace skill—they amplify it. A beginner with a microwave can’t fix a complex meal. Automation tools work best when used by people who understand programming.

Code You Can “Microwave”

Before Product-Market Fit: When testing ideas, fast prototyping is essential. Get something working quickly, then refine if it has potential.

UI Work: Interfaces follow patterns that can be generated faster than coding from scratch.

Data Transformation: Moving data between formats is perfect for automation – it’s tedious and follows consistent patterns.

Boilerplate: Project scaffolding and configuration should be automated because they follow templates.

Single-Shot Prompts: Here’s the magic – translating vague ideas into working demos. Taking your boss’s intent and turning it into something real that you can iterate on later.

Code You Shouldn’t “Microwave”

The truth is, we don’t yet know the limits of vibe coding. The boundaries are shifting rapidly.

What seemed impossible for automation yesterday becomes routine today. Perhaps the only constant is that each new capability shifts our attention to the next level of complexity.

For now, use your judgment. What parts of your system need your deepest expertise? Focus your hands-on attention there, and let the microwave handle the rest.

Beyond the Microwave: Programming’s Next Evolution

Right now, we’re stuck in a transition phase. We express intent in human language, which gets translated into human-readable code that compilers then execute. This multi-step translation process introduces inefficiencies and vulnerabilities at each layer.

Imagine instead a different paradigm: What if we operated in a space where we had fundamental building blocks that were mathematically guaranteed to run correctly? Where AI systems understood how to assemble these components without introducing security flaws or runtime errors?

Someone will still need to verify these foundational components, but this verification only needs to happen once. Once library code is verified and trusted, we can build upon that foundation with confidence. The code becomes provably correct by construction.

Human-written programming languages will always exist for experts who need precise control. But to truly break through to the next level of software development, we may need to let machines operate in a different computational space entirely—one optimized for how AI thinks rather than how humans think.

This isn’t about tools becoming better than experts; the experts are the ones who created the tools in the first place. Rather, it’s about designing new computational environments where computers have the advantage—where we write the game for them. In domains like massive data processing and pattern recognition, machines already outperform us. By creating new abstraction layers that leverage these strengths, we can unlock capabilities beyond what traditional programming paradigms allow.

The Smart Approach

The best developers know when to use quick tools and when to craft by hand. They save their expertise for where it adds the most value.

Conclusion

The future belongs to those who use automation strategically. Knowing when to “microwave code” and when to craft it by hand isn’t about laziness—it’s about maximizing impact.

Nobody microwaves a prime rib. But nobody ignores their microwave either.

In the end, the distinction may fade entirely as we move toward computational systems that combine the best of human creativity with machine reliability. The goal isn’t replacing human programmers—it’s augmenting them with tools that handle the tedious and error-prone aspects of coding, freeing them to focus on the truly creative work of solving problems that matter.

https://charlespetzold.com/etc/DoesVisualStudioRotTheMind.html

The Command Pattern [BETA]

You would figure that the namesake of this website would have a post about it. Welcome to the Command Pattern. I first read about it in the famous Gang of Four book [Gamma et al]

Notes

This design pattern is useful to implement scripting and undo/redo behavior.

Updates:

Prerequisites

Explanations & Definitions

If I had to describe it to another programmer or technically adept person, I would say its just functions. Functions all the way down. But the special part is how you handle state and the ability to undo state changes. Or, script it!

Let us let the machine try and explain it:

Imagine you’re playing with a toy robot. Instead of controlling the robot directly, you have a special remote control. Each button on the remote is like a “command” that tells the robot what to do.

When you press a button, the remote doesn’t actually make the robot move. Instead, it sends a message to the robot saying “do this action”. The robot then follows that instruction.

This is kind of like how the Command pattern works in computer programs. Instead of one part of the program directly telling another part what to do, it creates a “command” that can be sent and followed later. This makes it easier to add new commands, undo actions, or even save a list of commands to do later.

– Claude3.5 Sonnet

Implementation Motivation

This is my implementation, there are many like this but this what I would consider an easy to teach and implement solution. You can always add features, ERRRR… complications, later 🙂

State or “World”

State is part of most developer’s lives, but really you can think about this as the data you care about in the program. It could be your air-line reservation, forum post, baby photos. You don’t want your data to be lost. State is your data or world, and often like the real-world the state is hierarchical.

Mutable World or Immutable World?

I’ll skip right past the mutable world. It is useful for large simulations, but that is another topic and I will leave this for another moment. Sometimes the memory overhead actually matters, because you cannot afford a machine with that much RAM. If you have limited hardware, consider a hybrid-mutable-immutable-world.

I consider Immutable World the cleanest approach and most ideal for scripting.

The Immutable World

new_world = get_next_state(world, command) 

So it is an agreement that any time the “World” will be mutated, we make a copy and return new state. We never mutate state.

We will also keep all of our state objects in a stack linked to their commands that have been performed.

all_states = [] # list of all worlds ever

Function Interfaces

Finite State Machines (FSM) are very important for system design. In this scenario we will ensure to call our functions on our state/world objects to maintain consistent design. From this point on we will refer to world as state as it is the preferred term of the author.

It helps to have an actual problem to solve to teach the command pattern. We will start with the humble calculator.

Command Processor

# CRL LAB - COMMAND PATTERN - A
# COPYRIGHT 2024 CRYPTIDE RESEARCH - ALL RIGHTS RESERVED
# LICENSED UNDER GPL3

# generic processor that is slightly tuned for us to teach
class CommandProcessor:
    def __init__(self, config):
        self.config = config
        self.history = [config.get_default_cmd()]
        self._value = config.get_default_value()

    def exec(self, op, a, b=None):
        self._value, cmd_link = self.config.exec(op, a, b)
        self.history.append((self._value, cmd_link))

        return self._value

    def clear(self):
        self.history.clear()
        self.config.clear()

    def undo(self):
        if len(self.history) == 0:
            raise ValueError("No operations to undo")

        undo_value, undo_cmd = self.history.pop()

        new_value, _ = self.history[-1] #look at last one and get value
        self._value = new_value

        #this following approach is half working for side-effect based systems example
        #new_value = self.config.undo(self.history[-1])
        #but what else do we need?

        return self._value


    def value(self):
        return self._value

Note that this processor implementation is just a starting point. The core ingredients. As we develop our application we may dream up new features of the interfaces.

You can note that there is no specific implementation baked into the design. It is meant to operate as a shim or a harness to run specific state/command patterns.

We have a configuration object passed in which contains the domain specific code we will be wrapping in the façade. A history list and helper value_history to aid in debugging.

This is the core function, this executes a command. a, b are arguments to the command. op is the operation to be performed.

History is preserved and we return the result of the execution is returned for easy of use of the API.

Calculator Commando

from enum import Enum

# simple calculator command processor
class Calculator:
    def __init__(self):
        self._value = 0

    # for referencing the payload
    class CALC_ROW(Enum):
        OP = 0
        A = 1
        B = 2
        RESULT = 3

    # public interface
    def exec(self, op, a, b=None):
      print(f"executing {op} {a} {b}")
      coms = self._get_command_map()
      if op not in coms:
        raise ValueError(f"Invalid operation: {op}")
      self._value = coms[op](a, b)

      return (self._value, (op, a, b, self._value))

    def value(self):
        return self._value

    def undo(self, tuple4):
        op, b, a, _ = tuple4 # pull arguments in reverse order!!!

        if a is None:
            a = self._value

        self._value = self._get_undo_map(op)(a, b)
        return (self._value, (op, a, b, self._value))

    def reset(self):
        self._value = self.get_default_value()

    def get_default_value(self):
        return 0
        
    def get_default_cmd(self):
        return (self.get_default_value(), ('+', 0, 0, 0))

    def clear(self):
        self._value = 0

    # private implementation

    def _add(self, a, b):
        if b is None:
            b = self._value
        return a + b

    def _multiply(self, a, b):
        if b is None:
            b = self._value
        return a * b

    def _subtract(self, a, b):
        if b is None:
            b = self._value
        return a - b

    def _divide(self, a, b):
        if b is None:
            b = self._value
        if b == 0:
            raise ValueError("Division by zero is not allowed")
        return a / b

    def _get_command_map(self):
      return {
          '+': self._add,
          '*': self._multiply,
          '-': self._subtract,
          '/': self._divide,
      }

    def _get_undo_map(self, op:str):

          switch = {
              '+': self._subtract,
              '*': self._divide,
              '-': self._add,
              '/': self._multiply
          }
          return switch[op]

This is the domain specific part of the code. It would change depending on the task that the application programmer might need. Example usage of the latest code:

base_calc = Calculator()
x0 = base_calc.exec('+', 1, 2)
print('using implementation itself')
print(x0) # (3, ('+', 1, 2, 3))
print(base_calc.value()) # 3
x1 = base_calc.exec('+', 1) 
print(x1) # (4, ('+', 1, None, 4))
print(base_calc.value()) # 4

At this point you might be wondering why we’ve done all this boiler plate. The final details:

# you can see the (value, (op, a, b, value))
# data structure here as an artifact to help with undo
# clear up and do the real Command Pattern

base_calc.clear()
print(base_calc.value())

print('wrap it in command processor')
calc = CommandProcessor(base_calc)
calc.exec('-', 10, 4)
print(calc.history[-1])
print(calc.value())

calc.exec('+', 1)
print(calc.history[-1])
print(calc.value())

print('undo last calc...')
calc.undo()
print(calc.value())
print('undo last calc...')

calc.undo()
print(calc.value())
0
wrap it in command processor
executing - 10 4
(6, ('-', 10, 4, 6))
6
executing + 1 None
(7, ('+', 1, None, 7))
7
undo last calc...
6
undo last calc...
0

Discussion

With the encoding of the inverse mapping of the operations between add/subtract and divide/multiply we can safely undo any operation but applying it in reverse. Not all commando’s will have this luxury. Think if there is a function that creates a file on disk. The undo of that command should do what?

DELETE THE FILE

Ok. we can end here for this section for the lab.

There will be an on-going discussion, but part 2 of this lab will transition to image manipulation and scripting support.

Thanks for reading!

What Curve Am I Riding?

Me teaching the youth how to wire up lasers in 2014

If you think programming is hard now. You are listening to the wrong gurus. Programming computers has never been easier. I will not justify that statement in a single breathe, and this is kind of a venting post, so please take with a grain of salt.


Exponential Complexity

Where are we in this sigmoid? What is a sigmoid? Lets ask our AI brethren.

A sigmoid is a special type of mathematical function that is often used in machine learning and data analysis. It looks like an “S” shape, with the middle part being steep and the two ends being flat. Sigmoids have a really useful property: they can take any number as input, but they always output a value between 0 and 1.

Here’s an example of a sigmoid function:

f(x) = 1 / (1 + e^-x)

This formula might look scary, but it’s actually pretty simple. The “e” in the formula is just a special number called Euler’s number (it’s about 2.71828). The “-x” inside the parentheses means that we take the opposite of whatever number we plug in for x. So if x is 5, then -x would be -5.

e^-x == 1/e^x

-rob

When we plug a negative number into this formula, we get a number very close to 0. When we plug in a positive number, we get a number very close to 1. And when we plug in zero, we get 0.5. So you can see how the sigmoid function maps any input number to a value between 0 and 1.

-Mixtral22b


Sigmoids & Friends

I ❤ functions. Sigmoids are very cool. I actually used a sigmoid back when I did my Genetic Algorithm Caterpillar thing for undergrad a decade ago. I used a slightly different but same effect.

Creepture circa 2013 – sigmoid activation for network between “muscles”
Excerpt from Creepture paper (appendix)

Ok so why is this useful [late update]

In neural networks in order for a neuron to become excited and “fire” the signals into the neuron through a “synapse” must be added and the fed to a sigmoid function. This will fire a zero or one to the other connected neurons. And then each layer this happens and is forward or backward propagated through the network to the output/input.

In this simple caterpillar example each joint represents a “motor” or “muscle” that is activated by the sigmoid. Zero or One. Move + Stay


So what does this have to do with the exponential complexity topic. Lets talk e^x

Put them into your graphing calculators. play around with sweep.


The Natural Number

e has a special feature. It is often called “Euler’s Number”, approximately 2.718…

Deriving it links to compound interest, so you know it’s good. It is irrational, meaning you cannot fully represent it with a terminating decimal or a ratio of integers. Go take a look at your precalculus books 🙂
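If you want to see the compound interest link for yourself: compounding 100% interest n times a year gives (1 + 1/n)^n, and as n grows, that expression closes in on e:

```python
# Compound 100% interest n times per year: (1 + 1/n)^n approaches e
for n in (1, 10, 1_000, 1_000_000):
    print(n, (1 + 1/n) ** n)
# n=1 gives 2.0; by n=1,000,000 you are at ~2.71828...
```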

(e^x) != (1 / (1 + e^-x))

You can be tricked into thinking you are on an exponential curve when you might actually be on a sigmoid 😦

Wolfram Alpha output: https://www.wolframalpha.com/input?i=plot+e%5Ex+vs+%281%2F%281%2Be%5E-x%29%29
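You can also see the trick numerically without leaving the terminal. For very negative x, 1/(1 + e^-x) = e^x / (e^x + 1) ≈ e^x, because e^x is tiny there; the sigmoid's left tail is indistinguishable from exponential growth, and the two curves only part ways once you cross zero:

```python
import math

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

# On the left tail the two curves nearly coincide...
for x in (-6, -4, -2):
    print(x, math.exp(x), sigmoid(x))

# ...but past zero, e^x explodes while the sigmoid saturates near 1
print(math.exp(6), sigmoid(6))
```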

Developing Now Has Never Been Easier

With the amount of documentation, tutorials, influencers, and companies shilling products… I get it. Your signal-to-noise ratio is approaching zero. But let me give you some signal in that numerator.


  • Go back to first principles when possible (the math!!!)
  • Avoid new language (versions), frameworks, fads/trends
    • Unless you understand this could be a complete waste of time and you are OK with it. Schedule your treks.
  • Remember programming has a fashion sine curve. If you stick around long enough it will become avant-garde again 🙂

Focus on product and use your product; programmers that are NOT project/product driven are doomed to make buttons that nobody clicks. Make and ship something so valuable that if it goes down, PEOPLE CARE.


Appendix

Copy Pasta Machine

“Dangerous Dave in Copyright Infringement”

Tried out GitHub’s Copilot. I was coding my way through the Ray Tracer Challenge. Turned on Copilot. Wow. Just tab to complete. Microsoft (M$) shipped this thing? The lawyers let them!?


The Crutch

Of all the dumb things I’ve talked about in interviews, calling integrated development environments a crutch is probably the dumbest. I think the special mix of jet lag, booze, and travel delays really set me up for anti-success in my Microsoft interview right out of undergrad. It couldn’t have been a skillissue.org.

Twasn’t good but here is how it went.

Ready to Move to Washington State

So how do you land a job interview at M$? Well… they come to your campus, if you are going to a college with a decent programming reputation. Once or twice a year they will be there, at the career fair, or sometimes they just set up in the Harris Building at UCF. Then they look at your resume and pass you on to a very, very, very easy interview on campus.

If you pass, you get a free trip to Washington! At this point I bought a copy of Cracking the Coding Interview and pretty much made plans for how I’d slice up the company when I was running it. I was surely hired. I’m the best programmer I know. Besides Arup, maybe.

Hiring Pipeline

Microsoft has one of these impressive hiring pipelines. They send their jesters out into the universities of the world and send them all back to the mothership. Right away I knew something was wrong: I was overdressed. Though my scrum master look was on point with the corduroy jacket. There were at least a HUNDRED other people interviewing at the same time as me. Of course they bussed us all around Redmond. Washington is beautiful.

The Lead Interview

Let’s just skip to the main dude for the coding. I got bussed to a building that was pretty nice and old. It reminded me of a 90s mall. Looked legit though; I could work here.

Meet guy, guy is nice. Indian (not that it matters) — impeccable skills/English/etc., a pro. I think he is the lead of his team. Ok, turns out he is running some division of OneNote. Due diligence had not been done, but I do know OneNote, well, some of it.

Pleasant meeting start. He is obviously passionate about the tool he creates. We talk multi-player editors and how OneNote did collaborative editing before Google Docs. I point out you can actually see the cursor change in Google’s implementation 🙂

Anyways. Leet code challenge.

Order and remove duplicates in a list. You can only pass through the list one time, and you cannot allocate any new memory.

Leet Code Challenge 1 – M$ Onenote 2014

Calculating the hash is too expensive (he says), can’t use dictionaries… ugh. I struggle through and think I get a decent whiteboard of the flow. But I pretty much try to get more help than the lead is willing to give me. This is my worst nightmare. Score 5/10 (my scoring) — I feel like there was another gotcha, but I blocked it out.
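I don’t remember his exact setup anymore, but the classic in-place version of this (assuming the list shows up already sorted) is a two-pointer pass: no hash, no dictionary, no new allocation:

```python
def dedupe_sorted(nums: list) -> int:
    """One pass over a sorted list: copy each newly-seen value over
    the last kept one. Returns the count of unique values."""
    if not nums:
        return 0
    write = 1  # everything before `write` is already deduplicated
    for read in range(1, len(nums)):
        if nums[read] != nums[write - 1]:
            nums[write] = nums[read]
            write += 1
    del nums[write:]  # trim the leftover tail in place
    return write

data = [1, 1, 2, 3, 3, 3, 5]
print(dedupe_sorted(data), data)  # 4 [1, 2, 3, 5]
```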

Coding on Demand

Coding on a whiteboard with no auto-complete, no Google (even back then), and now no Copilot. It is a different skill than anything you normally do on the job. Except when you teach people how to code, of course. This is what you are doing when you run code in your head with 2 people, 1 marker, and a whiteboard: The Mind Compiler.

This is the opposite of what I do when I try to hire a programmer BTW :*

The Nerd Interview

I get the sense that I failed the first interview, but I have 2 more. He moves me along to another person on the team. I am in the OneNote enclave, I’m realizing.

Ok so I get dropped off with another team member, an immigrant from western Europe. Nice older fella. This interview is more personal and we talk about my projects and OpenCV, and he is shocked that it has a C++ API. Dude is living under a rock. But really nice, and we both learned something from each other. The coding challenge was tough, but I can’t remember it now. I probably got 9/10 on it.

The Yeoman Interview

Ok so now I get dropped off in an office with no windows. This dude is cookin’ something up in here. Probably building serious shit. He worked in somewhat of a lab. He was closest to my age, although I may have been older than him. The other two were def 10+ years my senior.

He hits me with how to see if there is a cycle in a linked list. I wink at him and regurgitate it. We goto lunch. At this lunch I open up about some philosophies.
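For the record, the standard regurgitation here is Floyd’s tortoise and hare: run a slow pointer and a fast pointer, and if the list loops, the fast one eventually laps the slow one:

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head) -> bool:
    """Floyd's tortoise and hare: slow advances 1 node, fast advances 2;
    they can only meet again if the list loops back on itself."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
print(has_cycle(a))  # False: the list ends at c
c.next = a           # close the loop
print(has_cycle(a))  # True
```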

“I love IDEs but they are a crutch.” He nods in agreement, but you can see the look in his eyes. I will not fit in here. I can remember it turning quickly. Maybe it was my shit performance in the beginning with the lead. But I only had 3 interviews that day.

I went back to central FL with a story and a fire in my belly.


Back to Copilot

Please understand that I’m not an acc/dec guy, but this technology is not to be taken lightly. Ok so what is Copilot?

GitHub Copilot has multiple offerings for organizations and an offering for individual developers. All the offerings include both code completion and chat assistance. The primary differences between the organization offerings and the individual offering are license management, policy management, and IP indemnity.

Organizations can choose between GitHub Copilot Business and GitHub Copilot Enterprise. GitHub Copilot Business primarily features GitHub Copilot in the coding environment - that is the IDE and CLI. In early-2024, it will also include GitHub Copilot in GitHub Mobile. GitHub Copilot Enterprise includes everything in GitHub Copilot Business and adds an additional layer of customization for organizations as well as GitHub Copilot integrated into GitHub.com as a chat interface to allow developers to converse about their codebase and action buttons throughout the platform. GitHub Copilot Enterprise can index an organization’s codebase for a deeper understanding of the customer’s knowledge for more tailored suggestions and will offer customers access to fine-tuned custom, private models for code completion.

GitHub Copilot Individual is designed for individual developers, freelancers, students, educators, and open source maintainers. The plan includes all the features of GitHub Copilot Business except organizational license management, policy management, and IP indemnity.

I turned it on fully leaded at first. Is this fair use? Hell nah. GPL is viral. You can’t spoon-feed off an entire codebase like this. This isn’t clean-room reverse engineering, for sure.

Ok so let’s try turning on “don’t produce public code”, whatever that means.

I was going to make videos, and talk, and and and… just go use it.

Beware.

Admonitions

This copy pasta machine will need to be severely neutered to be useful in a codebase you want to maintain ownership of. With the virality of some of the licenses, you will lose control. For instance, at my job, we cannot use Copilot. Legal will not allow it!

  • Copy and pasting code is generally frowned on; you should go through the motions to actually type, think, and internalize it. Just like you are playing a song on a guitar 🙂
  • Remember: most of us old timers learned by copy pasta of BASIC listings from 1980s tech magazines.
  • Use the crutch, but walk without it sometimes!
A giant spaghetti monster that is realistic to a degree, not too cartoonish. At the end of the spaghetti arms that he is slinging are 0’s and 1’s

Around the Horn

Why do a Masters Degree?

It is honestly a question I ask myself often as I’m approaching the last 2 semesters of my own journey.

Maybe take a step back and think about your educational arc.

Why do a Bachelors Degree?

So you can get a good job and make some money! Most engineering jobs at large companies and governments require it. I made 60k when I got out of college. My highest paying job before that was 60+ hour weeks running a kitchen for 40k. Your mileage may vary and I was hired in the Gold Rush of tech. I feel for you people trying to get jobs now.

Why do engineering?

If you want to know how things are built, study engineering. It will force you to learn all the math you’d ever care to learn. Then you will be forced to apply it. This is the critical stage: going back to your fundamentals and organizing your thoughts. Hammock Time. Optimize for your particular problem, solving for a trade-off. It is how shit gets done. Da Vinci was an engineer as well as an artist. Who doesn’t want to be Da Vinci?

Wait I have to take 5 years of Math?

Yes. Next question. It’s why you should be homeschooled. You could def get to calc by 8th grade if you had a private tutor and no set schedule from kindergarten.

Wait it takes 5 years for me to get Bachelors Degree?

Yes, if you start in Intermediate Algebra like I did. It was worth it to take them semester after semester: Algebra, Trig, Pre-Calc, Calculus I–III, Differential Equations, Linear Algebra, Discrete Math. Computer Science 2 is essentially a math class, as are all your circuits classes, statics, dynamics…

Ok, I am done with this format I think. Back to the original question: why a Masters? Because the end of a bachelors is when all the interesting stuff happens. You finally got through the math to unlock the classes that you find interesting:

  • Programming Language Design
  • Artificial Intelligence
  • Robotics
  • Databases
  • 3D Graphics
  • Advanced Data Structures
  • Parallel Computing

By this time you are already quite burned out and you just need to finish your 5-year tour, get a job, start putting money in the bank, and pay off the student debt. But let’s talk about the content of these classes that are “new” in the sense that calculus and the laws of motion are hundreds of years old; we are talking things invented or commercialized in the last 50-ish years.

The content of these classes is “soft” on science, I would say, and leans more on applied engineering. They can be a mixed bag. It often feels wrong while you are in the class, if you are in the know.

Let’s talk about AI at the University of Central Florida in the early 2010s. Dr. Gomez taught the class. It was a cool class. But he basically taught us LISP for the first month of class. He was a Natural Language Processing (NLP) guy that adored Minsky and had a professional acquaintance with him, as far as he said.

But to ignore neural networks in an artificial intelligence class in 2014 is kind of bonkers. To use LISP. I am glad I learned it, but this class was stuck right there in 1980. NLP was moving quickly to statistical methods rather than formal grammars. But we learned CDR and how to really run code in our heads with our Mind Compiler for LISP. Oatmeal and Fingernail Clippings (())()())

Symbolics LISP Machine

<clipped>

Overall, while Minsky’s NLP techniques were innovative and influential in their time, they have largely been superseded by modern statistical and machine learning approaches. However, some of the ideas behind dynamic predictive analysis are still used in some NLP systems today, and the focus on parallelism and efficiency remains relevant in the context of modern computing architectures.

Mixtral24b

Minsky famously shit all over the perceptron and killed neural network public opinion for 20 years because he showed it couldn’t learn XOR. He used one neuron: a single neuron can only draw one straight line through the plane, and no single line separates XOR’s classes. Da Vinci Garfield doesn’t like this.
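The fix, of course, is one hidden layer. You don’t even need training to see it; hand-picked weights on two hidden neurons (an OR gate and an AND gate) with step activations already compute XOR, which the single neuron never can:

```python
def step(x: float) -> int:
    """Hard threshold activation: fire (1) if the input is positive."""
    return 1 if x > 0 else 0

def xor_net(x1: int, x2: int) -> int:
    """Two hand-wired hidden neurons feed one output neuron
    computing OR-and-not-AND, which is exactly XOR."""
    h_or = step(x1 + x2 - 0.5)    # fires unless both inputs are 0
    h_and = step(x1 + x2 - 1.5)   # fires only when both inputs are 1
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # matches a ^ b
```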

Ok so why go get masters?

I wanted to finally take all the courses I missed in undergrad.

  • Machine Learning (was lame unfortunately, I’m replacing it with Andrew Ng’s coursera series 2.5/5 done)
  • Digital Signal Processing (recommend)
  • Aerial Robotics
  • Arm Robotics
  • Very Large Scale Integration VLSI (bucket list!)
  • Power Electronics
  • Advanced Applied Differential Equations and Friends (hardest math class yet; Matlab is the star)

Never stop learning. Most of the learning is done in addition to the classroom. It is a framework that you build out with your individual efforts.

There is also something to be said about networking and meeting new people and hearing new ideas. The University is a special place.

Sea Change

I’ve been promised Virtual Reality and Artificial Intelligence (AI) all my life. I bought the Power Glove. It was all mostly just hype and small toys that never stuck. But current iterations?

What we are seeing now with AI, with regards to Large Language Models (LLMs such as GPT) and Stable Diffusion (image generation), is nothing short of a change in how we use computers. Models, weights, and LoRAs are now the “programs” we run.

I’ve spent the last month with products like InvokeAI and Ollama. They are wonderful, but they aren’t even close to where we will be in 2 years for a consumer. But I can’t help but think of giant foundational models trained on the entire human corpus being compressed into tiny little chips that can be queried anywhere, as some type of “holographic” computing.

I can understand why that one Google engineer freaked out talking to an internal chatbot.

“I think, therefore I am” is a famous philosophical statement made by René Descartes in his Meditations on First Philosophy. The statement is often translated from the original French as “Cogito, ergo sum,” and it is meant to express the idea that the very act of doubting one’s existence serves as proof of one’s own consciousness.

In other words, if you are able to think about whether or not you exist, then you must exist in some form in order to have that thought. This statement has been interpreted and debated by philosophers for centuries, but at its core, it is a powerful reminder of the connection between thinking and being.

Mistral24b

Philosophy aside, let’s talk hardware.

We are in the “Mainframe” era of AI. These models literally take up rooms (rows of racks in a datacenter) and it seems it’s One Big GPU per user at a time. I’m trying to fathom what it takes to support a Copilot or ChatGPT at Microsoft’s and OpenAI’s scale.

As you can see, the problem with size is a combination of power and memory bandwidth, but these will be solved with tricks like they always are. INT8 quantization is such a hilariously simple optimization that I’m quite shocked it works so well.

Overview of quantization that I will spend the rest of the month fact checking…

Compressing the weights of neural networks from floating-point to int8 (also known as quantization) can significantly reduce the computational requirements and memory footprint of deep learning models, making them more suitable for deployment on low-power hardware such as mobile devices or embedded systems.

The general idea behind weight compression is to represent the weights of a neural network using fewer bits than their original floating-point representation. By reducing the precision of the weights, it is possible to significantly reduce the storage requirements and computational complexity of the model. In particular, int8 quantization involves representing the weights as 8-bit integers rather than 32-bit or 64-bit floating-point numbers.

There are several steps involved in compressing neural network weights from floating-point to int8:

  1. Weight normalization: This step involves scaling the weights of the neural network so that they fit within the range of an 8-bit integer. This is typically done by computing the maximum absolute value of the weights and then scaling them accordingly.
  2. Quantization: This step involves converting the normalized weights from floating-point to int8 format. There are several methods for doing this, including linear quantization and logarithmic quantization.
  3. Decomposition: In some cases, it may be possible to decompose the weight matrix of a neural network into smaller matrices that can be more easily compressed. This is known as low-rank decomposition or tensor decomposition.
  4. Pruning: This step involves removing unnecessary connections between neurons in the neural network, further reducing the computational requirements and memory footprint of the model.
  5. Codebook generation: In some cases, it may be possible to generate a codebook that maps int8 values back to their original floating-point values. This can help improve the accuracy of the compressed model.
  6. Model fine-tuning: After compressing the weights of a neural network, it is typically necessary to fine-tune the model to ensure that its performance is not significantly degraded. This can be done using techniques such as knowledge distillation or iterative quantization.

Overall, compressing neural network weights from floating-point to int8 format can greatly improve the performance and energy efficiency of deep learning models on low-power hardware. However, it requires careful consideration of factors such as weight normalization, quantization, decomposition, pruning, codebook generation, and model fine-tuning in order to ensure that the compressed model retains its accuracy and performance.

Mistral24b
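The first two steps in that overview (abs-max scaling, then rounding) are simple enough to sketch in a few lines of plain Python. Real libraries do per-channel scales, calibration, and so on, but the core move is just this:

```python
def quantize_int8(weights):
    """Abs-max linear quantization: map floats onto [-127, 127]
    integers and keep the scale factor to invert the mapping."""
    scale = 127.0 / max(abs(w) for w in weights)
    q = [round(w * scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error is at most 0.5 / scale."""
    return [v / scale for v in q]

w = [0.91, -0.42, 0.003, -1.3]
q, scale = quantize_int8(w)
print(q)                     # small integers in [-127, 127]
print(dequantize(q, scale))  # close to the original weights
```

Each weight now takes 1 byte instead of 4, which is where the memory-bandwidth win comes from.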

Google sells the Coral TPU, a USB accelerator capable of a few TOPS (Trillion Operations Per Second; integer ops, not FLOPS). Great for old-school CNN-style networks, but pretty much useless for the current generation of Transformer models that want giant Video RAM (VRAM, 24GB+).

I’m awaiting the LLM/StableDiffusion version of the Coral TPU or Jetson Nano (NVIDIA)

Make sure your board is waxed; there is a giant wave of VR and AI coming in the next 3 years.

The Homelab

So here we are in 2024 and I’m quite deep into my 4th PC build in 2 years. This one is the first for myself. It’s one of those things I used to do every 3-5 years, but… custom-built PCs are not really useful in the world of portable computing. I have a rack now for running various equipment.

But I’m kind of building a bit of what some are calling a “boondoggle”

The Build

Spared no expense: we have the top Intel i9 CPU with an NVIDIA RTX 4090 GPU running in a rack-mount case. Stable Diffusion runs sub-10s for most prompts/models. Against my buddy’s 4080 on the same prompt/model, the 4090 is 30% faster. Not bad.

Noice

Operating Systems

Started out with the idea I’d run the latest Windows Server; the problem here is Intel is really terrible at providing NIC drivers for this OS when it isn’t “server” gear 😦 So, on to installing Kubuntu.

Sticking with Kubuntu for now, as it is working great and it is what I have on the PowerEdge R720, an old Xeon machine I got off eBay (more on that later).

Software Stack

Intel provides a version of Python 3.9.x, so that is what I’m basing my virtual environments on for local training and inference inside Jupyter notebooks. But for off-the-shelf use, most of the inference suites for Stable Diffusion (image generation) and Large Language Models ship with their own Docker image, etc.

InvokeAI

This is a great tool for running Stable Diffusion image generation locally (they also offer a cloud version). I have it running and it is quite an interesting way to explore the models.

Models

Models come in a few flavors and are all based off of some “foundational?” model that was trained on very large datasets. Everything they generate is probably a copyright grey area…

  • RealVisXL_V3.0
  • juggernautXL_v8RunDiffusion

These are some of the models I’ve tried out. Excited to expand into other areas of image generation, but this gets us started.

Ollama

So Facebook released the original Llama model, which is kind of the standard “open” model for Large Language Models (LLMs), and Ollama Web UI provides a nice interface to it and other models.

Llama 2 runs quite well, but I am interested in the largest model I can run.

I am running Mixtral24B and asking it a question based on a fictional timeline I also had it generate and print to a PDF. Then I attach the PDF to a new context and ask a question based on it. Quite impressive!

If you notice in the Ollama Mixtral screenshot above, I’m not using the wick1 system, which is the RTX 4090 box. I am using my main RTX 4090 system for image generation and this other system for LLMs, as the performance is great even with 8-year-old GPUs.

Old GPUs vs New GPUs

So as I was thinking about building a machine learning rig for the homelab, I really wanted a standard server for running local development tasks and other stuff. I hit up eBay and got an R720 Dell PowerEdge for $400. Not a bad system, with quite the specs:

Ok this system ran fine, but what about adding some GPUs?

Look back in time to 2016 and you have the TESLA GPUs with 16-24GB RAM for less than $200. These need a special power cable, but again, that’s easy to sort out these days.

The most shocking thing is that both GPUs work in parallel with Ollama. I think we have a $1k LLM machine!

What’s Next?

I have a chatbot in the works and am trying to figure out how to pipeline and use the 3 GPUs. I have many questions about keeping models in cache and then fully integrating an NSFW filter, which is almost a requirement… Stay tuned and Happy Inferring.