Just One More Prompt

Why do drug dealers and software companies both call their customers “users”?

Not all of us have been enamored with the token spewing monster that has taken over the entire discourse around “productivity”. I’ll admit that I’ve been waffling between the idea that “it’s so over” and “we’re so back” around generative artificial intelligence (GENAI). This will be a pseudo-scientific post on my current feelings and that I really do think we are approaching the limits of our current methods.

I am thankful to the HTMX discord community which has pushed back completely on vibe coding. I still think it might have its place, but I think in this post I hopefully will lay out a few reasons why I am bearish on this topic.

Whomever is telling you to not learn to code now… don’t listen to them. I am still putting in my reps daily on https://executeprogram.com! When I first demoed “vibe coding” to my wife and I thought this was the future of coding her words brought me right back to reality. I was mouthing that ‘oh it’s trying to do this’ I’ll need to prompt it to do this instead cause I _KNOW_ better.

“Oh, this is only for people who already know how to program…”

My wife

I think it is important to note that I am a “classically trained” computer engineer. I was fortunate to do my undergraduate degree at a solid school that still taught the C programming language. This was about a decade ago and after spending many years in industry, I’m finishing up a masters degree also in engineering. This being said, my dad taught me how to program in BASIC when I was 11 years old (I’m now 42).

I learned how to program before stack overflow, and I was exposed to all the fads and trends around rapid application development (RAD) and Agile methods. Let’s just say I’m a bit surprised the current fad of vibe coding took over my lizard brain and I was enthralled.

Original meme I shitposted on LinkedIn bastardizing the ikagai

This post is largely a penance around this hype wave I feel I added energy to. Let’s just go ahead and get started why I believe we are hitting the wall. I was waiting for this moment:

OpenAI famously only trains on your inputs/outputs to its system if you don’t pay them. Anthropic (Claude) said it never trained on your chats. I figured this was mostly a temporal thing. They could always revise their terms of service and do whatever they want. They increased their data retention policy also in this announcement to 5 years… How can they safely incorporate my chats into its training corpus!? The fact that I have to opt-out is interesting and I don’t believe everyone who clicks “agree” fully understands what this implies.

Revisiting how LLMs and Agents Work

The large unlock in the modern “Transformer” era was the fact that we could now do sequence to sequence translation for the equivalent of paragraphs of text. This is a very very hard problem and as the size of input increased it seemed impossible to scale with traditional approaches. This all changed with a simple training method of forcing the machine to predict the next word or token with a self-attention mechanism. The fact that it could translate between arbitrary domains was an incredible emergence of “intelligence”.

Slide from my Agents talk

This idea of knowledge compression is great and it shows that something is definitely happening in the “latent space” between your prompt and the AI’s output. Because we are training only on language and predicting the next token or word. Let us just say this is not a thinking or reasoning being. It is a really great parlor trick. In some contexts I believe it is a net positive, but it can be detrimental in others. It is a bit Orwellian to say this is thinking or reasoning. As you can see in the video no matter what people tell you, it is just producing the next token. Perhaps between 2 <thinking></thinking> tags 🙂

Predicting the next token. Transformer Explainer

So back to training on all the user data that Anthropic has been collecting… now we are going to be either doing a knowledge compression of this into the weights and biases or parameters of the model or they will be using it in the fine-tuning on using the chats as annotated datasets doing a reinforcement learning approach like the Chinese published. Or a combination of both. Who knows… Let’s just say this giant copyright infringement mixer is snowballing.

You cannot copyright anything that a large language model outputs if you don’t own the copyright to all its training corpus. It is a derivative work. This might change in the future think about how long things stay out of the public domain now? Who makes laws? Lawyers. Who likes licensing? Lawyers!

Agents as a Savior?

Setting aside the copyright issue. Because of how LLMs simply produce the next token it will happily just do that. We call it hallucinations but let’s be real, it is just doing what it has been trained to do. Produce the most likely next token.

Ok so let’s add another layer in the lasagna. I’m tired of babysitting this machine, let us write a program to do this, that is controlled by… another LLM. See a problem?

My man lost a mustache after being scolded

So the models seem to keep getting better, but you always seem to need a human in the loop. I was demoing something and I was sure to sandbag everything. But it still failed. I actually couldn’t figure it out on the spot. Take a gander at the screenshot below to see why it failed. Granted this is an older model but I feel we are dealing with something that is unsafe at any speed.

Even with a good system prompt the LLM emitted a valid python program that it read instead of the calculated response 😦

I will not be reading the slop, but godspeed

pop punk pelosi

Ok so in a reactive nature we will always be optimizing towards a better solution. But it will not be through anything magical, only feeding back in the failures. They have to be annotated by PEOPLE. This is why they want your chats to train on.

Context Window Non-Linearity

So this idea that it is somehow back to the prompter to add the right context and say the right words. We are right back to this idea of “programming”. The gaming of the metric around context window length… there is a non-linear quality of the results of the LLMs when the context window grows. Just because it says on the tin it has a context length of 32k tokens doesn’t mean it will produce quality output at that length. I’m actually a little bit disappointed in the meta game some of my colleagues are playing. Being stuck trying to make this context engineering machine magic work, instead of just solving the real problem. It is the bias of creating a game engine instead of creating a great game. You get stuck in this meta machine to make more machines and I’m sorry but this has never worked. It is a Ouroboros or a snake eating its own tail.

Driving can only be done safely while not using your cell phone. This “overflows” your context window and makes driving unsafe. Do LLMs produce quality content most of the time? Yes?… until this arbitrary window fills us and starts to get “dumber”

Somehow 50 years of software engineering and security practices are out the window. Running random code on your computer is dangerous. But that is what we are doing now. Its fast, fun, and dangerous. Please put your Agents in a Docker container at least. Ideally in Virtual Machine. I do like Google’s vibe coding approach with Jules.

Perception is Reality

What we feel really matters, but science doesn’t care about our feelings. I always get anecdotes from “users” and they all seem plausible but they are all based around feeling and the output that these people claim are rarely beyond a demos. Where are all the products? Also: Code is a liability, I wouldn’t brag about how large your codebase is. We need to be more grug brained.

I gain enough value from LLMs that I now deliberately consider this when picking a library—I try to stick with libraries with good stability and that are popular enough that many examples of them will have made it into the training data. I like applying the principles of boring technology—innovate on your project’s unique selling points, stick with tried and tested solutions for everything else.

Simon Willison

I feel like greenfield projects and using the LLM or agent as a cookie-cutter template is its sweet spot for value. But there have been a few studies and I think it is interesting the juxtaposition of these 2 studies:

  1. LLM Productivity Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
    • “Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.”
    • “This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.”
  2. Adderall Student Study: https://pmc.ncbi.nlm.nih.gov/articles/PMC6165228/
    • “These findings indicate that healthy college students experience substantive increases in emotional and autonomic activation in the period following Adderall consumption.”
    • “In summary, the present pilot study indicates that a moderate dose of Adderall has small to minimal effects on cognitive processes relevant to academic enhancement (i.e., on reading comprehension, fluency, cognitive functioning), in contrast with its significant, large effects on activated positive emotion…”
    • https://www.netflix.com/title/80117831

In both cases the people who have the active part of the experiment rather than the placebo “feel” like it is working… I do feel like the coding study is flawed because of the “warm-up” time it takes to learn how to prompt correctly etc. Also vibe coding excels at greenfield projects. I don’t think anyone can argue that vibe coding demos and MVPs won’t be faster upfront especially if you have little experience with the frameworks and platforms. But what happens when you have to actually depend on the slop?

It is fascinating to read about someone who is doing real science with NLP states what it takes to really “improve” your responses is jaw-dropping. 1000 annotated samples to prove 1% increase. No wonder Anthropic needs all our personal chat logs…

Why are we hitting a wall? Models can’t get any bigger, we can’t really train them and retain the information in the training because of something, something scaling laws:

 As a result, raising their reliability to meet the standards of scientific inquiry is intractable by any reasonable measure. We argue that the very mechanism which fuels much of the learning power of LLMs, namely the ability to generate non-Gaussian output distributions from Gaussian input ones, might well be at the roots of their propensity to produce error pileup, ensuing information catastrophes and degenerative AI behaviour.

https://arxiv.org/abs/2507.19703

DRAFT NOTICE

I am appreciative of your feedback and suggestions for fixing this article.

What was the point of all this? Why is there a limit? If we have to go say the magic words correctly to get the Agent to respond properly AND we also have to double check it. What is the point? Aren’t we just programming again?

Further Reading

https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect

https://blog.langchain.com/context-engineering-for-agents/

https://arxiv.org/abs/2506.02153

Security Tangent

Great breakdown of this new vector of attack. Unescaped user input (PR Title) injection to arbitrary code execution in the build system that lead 10k develops poisoned at the well:

https://www.kaspersky.com/blog/nx-build-s1ngularity-supply-chain-attack/54223/

Microwaving Code

Introduction

New coding tools always face the same reaction: immediate adoption by some, outright dismissal by coding purists. This pattern repeats with every technological leap.

Vibe coding is already here – and soon we’ll just call it coding. Just like we dropped “smart” from smartphones, the distinction between AI-assisted and traditional coding is rapidly disappearing. What matters isn’t the tool but knowing when to use it.

The Microwave Principle

Nobody microwaves a prime rib. A good cook knows when to use a microwave and when to use the oven. Microwaving popcorn makes perfect sense – the result is good, it’s convenient, and alternatives require special equipment. For certain tasks, the microwave is clearly best.

How Industries Work

Industries move toward tools that save time, effort, and money. That’s how economies function. Markets prefer “good enough and affordable” over “perfect but expensive.” Throughout history, craftspeople who adapted to new technologies often thrived, while those who resisted change sometimes found themselves with fewer opportunities.

Better Tools Make You Stronger

Good tools don’t replace skill—they amplify it. A beginner with a microwave can’t fix a complex meal. Automation tools work best when used by people who understand programming.

Code You Can “Microwave”

Before Product-Market Fit: When testing ideas, fast prototyping is essential. Get something working quickly, then refine if it has potential.

UI Work: Interfaces follow patterns that can be generated faster than coding from scratch.

Data Transformation: Moving data between formats is perfect for automation – it’s tedious and follows consistent patterns.

Boilerplate: Project scaffolding and configuration should be automated because they follow templates.

Single-Shot Prompts: Here’s the magic – translating vague ideas into working demos. Taking your boss’s intent and turning it into something real that you can iterate on later.

Code You Shouldn’t “Microwave”

The truth is, we don’t yet know the limits of vibe coding. The boundaries are shifting rapidly.

What seemed impossible for automation yesterday becomes routine today. Perhaps the only constant is that each new capability shifts our attention to the next level of complexity.

For now, use your judgment. What parts of your system need your deepest expertise? Focus your hands-on attention there, and let the microwave handle the rest.

Beyond the Microwave: Programming’s Next Evolution

Right now, we’re stuck in a transition phase. We express intent in human language, which gets translated into human-readable code that compilers then execute. This multi-step translation process introduces inefficiencies and vulnerabilities at each layer.

Imagine instead a different paradigm: What if we operated in a space where we had fundamental building blocks that were mathematically guaranteed to run correctly? Where AI systems understood how to assemble these components without introducing security flaws or runtime errors?

Someone will still need to verify these foundational components, but this verification only needs to happen once. Once library code is verified and trusted, we can build upon that foundation with confidence. The code becomes provably correct by construction.

Human-written programming languages will always exist for experts who need precise control. But to truly break through to the next level of software development, we may need to let machines operate in a different computational space entirely—one optimized for how AI thinks rather than how humans think.

This isn’t about tools becoming better than experts; the experts are the ones who created the tools in the first place. Rather, it’s about designing new computational environments where computers have the advantage—where we write the game for them. In domains like massive data processing and pattern recognition, machines already outperform us. By creating new abstraction layers that leverage these strengths, we can unlock capabilities beyond what traditional programming paradigms allow.

The Smart Approach

The best developers know when to use quick tools and when to craft by hand. They save their expertise for where it adds the most value.

Conclusion

The future belongs to those who use automation strategically. Knowing when to “microwave code” and when to craft it by hand isn’t about laziness—it’s about maximizing impact.

Nobody microwaves a prime rib. But nobody ignores their microwave either.

In the end, the distinction may fade entirely as we move toward computational systems that combine the best of human creativity with machine reliability. The goal isn’t replacing human programmers—it’s augmenting them with tools that handle the tedious and error-prone aspects of coding, freeing them to focus on the truly creative work of solving problems that matter.

https://charlespetzold.com/etc/DoesVisualStudioRotTheMind.html

The Command Pattern [BETA]

You would figure that the namesake of this website would have a post about it. Welcome to the Command Pattern. I first read about it in the famous Gang of Four book [Gamma et al]

Notes

This design pattern is useful to implement scripting and undo/redo behavior.

Updates:

Prerequisites

Explanations & Definitions

If I had to describe it to another programmer or technically adept person, I would say its just functions. Functions all the way down. But the special part is how you handle state and the ability to undo state changes. Or, script it!

Let us let the machine try and explain it:

Imagine you’re playing with a toy robot. Instead of controlling the robot directly, you have a special remote control. Each button on the remote is like a “command” that tells the robot what to do.

When you press a button, the remote doesn’t actually make the robot move. Instead, it sends a message to the robot saying “do this action”. The robot then follows that instruction.

This is kind of like how the Command pattern works in computer programs. Instead of one part of the program directly telling another part what to do, it creates a “command” that can be sent and followed later. This makes it easier to add new commands, undo actions, or even save a list of commands to do later.

– Claude3.5 Sonnet

Implementation Motivation

This is my implementation, there are many like this but this what I would consider an easy to teach and implement solution. You can always add features, ERRRR… complications, later 🙂

State or “World”

State is part of most developer’s lives, but really you can think about this as the data you care about in the program. It could be your air-line reservation, forum post, baby photos. You don’t want your data to be lost. State is your data or world, and often like the real-world the state is hierarchical.

Mutable World or Immutable World?

I’ll skip right past the mutable world. It is useful for large simulations, but that is another topic and I will leave this for another moment. Sometimes the memory overhead actually matters, because you cannot afford a machine with that much RAM. If you have limited hardware, consider a hybrid-mutable-immutable-world.

I consider Immutable World the cleanest approach and most ideal for scripting.

The Immutable World

new_world = get_next_state(world, command) 

So it is an agreement that any time the “World” will be mutated, we make a copy and return new state. We never mutate state.

We will also keep all of our state objects in a stack linked to their commands that have been performed.

all_states = [] # list of all worlds ever

Function Interfaces

Finite State Machines (FSM) are very important for system design. In this scenario we will ensure to call our functions on our state/world objects to maintain consistent design. From this point on we will refer to world as state as it is the preferred term of the author.

It helps to have an actual problem to solve to teach the command pattern. We will start with the humble calculator.

Command Processor

# CRL LAB - COMMAND PATTERN - A
# COPYRIGHT 2024 CRYPTIDE RESEARCH - ALL RIGHTS RESERVED
# LICENSED UNDER GPL3

# generic processor that is slightly tuned for us to teach
class CommandProcessor:
    def __init__(self, config):
        self.config = config
        self.history = [config.get_default_cmd()]
        self._value = config.get_default_value()

    def exec(self, op, a, b=None):
        self._value, cmd_link = self.config.exec(op, a, b)
        self.history.append((self._value, cmd_link))

        return self._value

    def clear(self):
        self.history.clear()
        self.config.clear()

    def undo(self):
        if len(self.history) == 0:
            raise ValueError("No operations to undo")

        undo_value, undo_cmd = self.history.pop()

        new_value, _ = self.history[-1] #look at last one and get value
        self._value = new_value

        #this following approach is half working for side-effect based systems example
        #new_value = self.config.undo(self.history[-1])
        #but what else do we need?

        return self._value


    def value(self):
        return self._value

Note that this processor implementation is just a starting point. The core ingredients. As we develop our application we may dream up new features of the interfaces.

You can note that there is no specific implementation baked into the design. It is meant to operate as a shim or a harness to run specific state/command patterns.

We have a configuration object passed in which contains the domain specific code we will be wrapping in the façade. A history list and helper value_history to aid in debugging.

This is the core function, this executes a command. a, b are arguments to the command. op is the operation to be performed.

History is preserved and we return the result of the execution is returned for easy of use of the API.

Calculator Commando

from enum import Enum

# simple calculator command processor
class Calculator:
    def __init__(self):
        self._value = 0

    # for referencing the payload
    class CALC_ROW(Enum):
        OP = 0
        A = 1
        B = 2
        RESULT = 3

    # public interface
    def exec(self, op, a, b=None):
      print(f"executing {op} {a} {b}")
      coms = self._get_command_map()
      if op not in coms:
        raise ValueError(f"Invalid operation: {op}")
      self._value = coms[op](a, b)

      return (self._value, (op, a, b, self._value))

    def value(self):
        return self._value

    def undo(self, tuple4):
        op, b, a, _ = tuple4 # pull arguments in reverse order!!!

        if a is None:
            a = self._value

        self._value = self._get_undo_map(op)(a, b)
        return (self._value, (op, a, b, self._value))

    def reset(self):
        self._value = self.get_default_value()

    def get_default_value(self):
        return 0
        
    def get_default_cmd(self):
        return (self.get_default_value(), ('+', 0, 0, 0))

    def clear(self):
        self._value = 0

    # private implementation

    def _add(self, a, b):
        if b is None:
            b = self._value
        return a + b

    def _multiply(self, a, b):
        if b is None:
            b = self._value
        return a * b

    def _subtract(self, a, b):
        if b is None:
            b = self._value
        return a - b

    def _divide(self, a, b):
        if b is None:
            b = self._value
        if b == 0:
            raise ValueError("Division by zero is not allowed")
        return a / b

    def _get_command_map(self):
      return {
          '+': self._add,
          '*': self._multiply,
          '-': self._subtract,
          '/': self._divide,
      }

    def _get_undo_map(self, op:str):

          switch = {
              '+': self._subtract,
              '*': self._divide,
              '-': self._add,
              '/': self._multiply
          }
          return switch[op]

This is the domain specific part of the code. It would change depending on the task that the application programmer might need. Example usage of the latest code:

base_calc = Calculator()
x0 = base_calc.exec('+', 1, 2)
print('using implementation itself')
print(x0) # (3, ('+', 1, 2, 3))
print(base_calc.value()) # 3
x1 = base_calc.exec('+', 1) 
print(x1) # (4, ('+', 1, None, 4))
print(base_calc.value()) # 4

At this point you might be wondering why we’ve done all this boiler plate. The final details:

# you can see the (value, (op, a, b, value))
# data structure here as an artifact to help with undo
# clear up and do the real Command Pattern

base_calc.clear()
print(base_calc.value())

print('wrap it in command processor')
calc = CommandProcessor(base_calc)
calc.exec('-', 10, 4)
print(calc.history[-1])
print(calc.value())

calc.exec('+', 1)
print(calc.history[-1])
print(calc.value())

print('undo last calc...')
calc.undo()
print(calc.value())
print('undo last calc...')

calc.undo()
print(calc.value())
0
wrap it in command processor
executing - 10 4
(6, ('-', 10, 4, 6))
6
executing + 1 None
(7, ('+', 1, None, 7))
7
undo last calc...
6
undo last calc...
0

Discussion

With the encoding of the inverse mapping of the operations between add/subtract and divide/multiply we can safely undo any operation but applying it in reverse. Not all commando’s will have this luxury. Think if there is a function that creates a file on disk. The undo of that command should do what?

DELETE THE FILE

Ok. we can end here for this section for the lab.

There will be an on-going discussion, but part 2 of this lab will transition to image manipulation and scripting support.

Thanks for reading!

What Curve Am I Riding?

Me teaching the youth how to wire up lasers in 2014

If you think programming is hard now. You are listening to the wrong gurus. Programming computers has never been easier. I will not justify that statement in a single breathe, and this is kind of a venting post, so please take with a grain of salt.


Exponential Complexity

Where are we in this sigmoid? What is a sigmoid? Lets ask our AI brethren.

A sigmoid is a special type of mathematical function that is often used in machine learning and data analysis. It looks like an “S” shape, with the middle part being steep and the two ends being flat. Sigmoids have a really useful property: they can take any number as input, but they always output a value between 0 and 1.

Here’s an example of a sigmoid function:

f(x) = 1 / (1 + e^-x)

This formula might look scary, but it’s actually pretty simple. The “e” in the formula is just a special number called Euler’s number (it’s about 2.71828). The “-x” inside the parentheses means that we take the opposite of whatever number we plug in for x. So if x is 5, then -x would be -5.

e^-x == 1/e^x

-rob

When we plug a negative number into this formula, we get a number very close to 0. When we plug in a positive number, we get a number very close to 1. And when we plug in zero, we get 0.5. So you can see how the sigmoid function maps any input number to a value between 0 and 1.

-Mixtral22b


Sigmoids & Friends

I ❤ functions. Sigmoids are very cool. I actually used a sigmoid back when I did my Genetic Algorithm Caterpillar thing for undergrad a decade ago. I used a slightly different but same effect.

Creepture circa 2013 – sigmoid activation for network between “muscles”
Excerpt from Creepture paper (appendix)

Ok so why is this useful [late update]

In neural networks in order for a neuron to become excited and “fire” the signals into the neuron through a “synapse” must be added and the fed to a sigmoid function. This will fire a zero or one to the other connected neurons. And then each layer this happens and is forward or backward propagated through the network to the output/input.

In this simple caterpillar example each joint represents a “motor” or “muscle” that is activated by the sigmoid. Zero or One. Move + Stay


So what does this have to do with the exponential complexity topic. Lets talk e^x

Put them into your graphing calculators. play around with sweep.


The Natural Number

e has this special feature. It is often called “Euler’s Number” approximately 2.718…

Deriving it links to compound interest. So you know it’s good. It is irrational meaning you cannot represent it fully with a decimal or fraction. Go take a look at your precalculus books 🙂

(e^x) != (1 / (1 + e^-x))

You can be tricked you are in an exponential function. You might be in a sigmoid 😦

Wolframe output: https://www.wolframalpha.com/input?i=plot+e%5Ex+vs+%281%2F%281%2Be%5E-x%29%29

Developing Now Has Never Been Easier

With the amount of documentation, tutorials, influencers, companies shilling products…. I get it. Your signal to noise ratio is approaching zero. But let me give you some signal in that numerator.


  • Go back to first principles when possible (the math!!!)
  • Avoid new language (versions), frameworks, fads/trends
    • Unless you understand this could be completely a waste of time and you are ok with it. schedule your treks.
  • Remember programming has a fashion sine curve. If you stick around long enough it will become advant-garde again 🙂

Focus on product, use your product, programmers that are NOT project/product driven are doomed to make buttons that nobody clicks. Make something so valuable and ship it that so that if it goes down, PEOPLE CARE.


Appendix

Why do a Masters Degree?

It is honestly a question I ask myself often as I’m approaching the last 2 semesters of my own journey.

Maybe take a step back and think about your educational arc.

Why do a Bachelors Degree?

So you can get a good job and make some money! Most engineering jobs at large companies and governments require it. I made 60k when I got out of college. My highest paying job before that was 60+ hour weeks running a kitchen for 40k. Your mileage may vary and I was hired in the Gold Rush of tech. I feel for you people trying to get jobs now.

Why do engineering?

If you want to know how things are built, study engineering. It will force you to learn all the math you care to ever learn. Then you will be forced to apply it. This is the critical stage. Going back to your fundamentals and organize your thoughts. Hammock Time. Optimize for your particular problem, solving for a trade-off. It is how shit gets done. Da Vinci was an engineer as well as an artist. Who doesn’t want to be Da Vinci?

Wait I have to take 5 years of Math?

Yes. Next question. It’s why you should be homeschooled. You could get to calc def by 8th grade if you had private tutor and no schedule from kindergarten.

Wait it takes 5 years for me to get Bachelors Degree?

Yes. If you start in Intermediate Algebra like I did. It was worth it to take them semester after semester. Algebra, Trig, Pre-Calc, Calculus I-III, Differential Equations, Linear Algebra, Discrete Math, Computer Science 2 is essentially a math class as well as all your circuit classes, statics, dynamics…

Ok I am done with this format I think. Back to original question. Why Masters? Because as you get to the end of bachelors that is when all the interesting stuff happens. You finally got through the math to unlock the classes that you find interesting

  • Programming Language Design
  • Artificial Intelligence
  • Robotics
  • Databases
  • 3D Graphics
  • Advanced Data Structures
  • Parallel Computing

By this time you are already quite burned out and you just need to get done with your 5 year tour and get a job and start putting money in the bank and pay off the student debts. But let’s talk about the content of these classes that are “new” in the sense that calculus and the laws of motion are hundreds of years old. We are talking things invented or commercialized in the last 50ish years.

The content of these classes that are “soft” on science I would say and more on applied engineering. They can be a mixed bag. It often feels wrong while you are in the class if you are in the know.

Let’s talk about AI at the University of Central Florida in the early 2010s. Dr. Gomez taught the class. It was cool class. But he basically taught us LISP for first month of class. He was Natural Language Processing (NLP) guy that adored Minsky and had professional acquiantance with him as far as he said.

But to ignore neural networks in an artificial intelligence class in 2014 is kind of bonkers. To use LISP. I am glad I learned it, but this class was stuck right there in 1980. NLP was moving quickly to statistical methods rather than formal grammars. But we learned CDR and how to really run code in our heads with our Mind Compiler for LISP. Oatmeal and Fingernail Clippings (())()())

Symbolics LISP Machine

<clipped>

Overall, while Minsky’s NLP techniques were innovative and influential in their time, they have largely been superseded by modern statistical and machine learning approaches. However, some of the ideas behind dynamic predictive analysis are still used in some NLP systems today, and the focus on parallelism and efficiency remains relevant in the context of modern computing architectures.

Mistral24b

Minsky famously shit all over the perceptron and killed neural network public opinion for 20 years cause he said it couldn’t learn XOR. He used one neuron. Da Vinci Garfield doesn’t like this.
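Here is a minimal sketch of what that XOR argument looks like in code. It is my own toy illustration, not anything from Minsky's book: the function names, the hand-picked weights, and the little brute-force grid are all assumptions. One threshold neuron can't get all four XOR cases right, but stack two layers and it falls out immediately.

```python
from itertools import product

XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def step(x):
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    """A single perceptron unit: weighted sum plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_two_layer(a, b):
    """Two hidden neurons (OR and AND) feeding one output neuron.
    The weights are hand-picked for illustration, not learned."""
    h_or = neuron((a, b), (1, 1), -0.5)    # fires if a OR b
    h_and = neuron((a, b), (1, 1), -1.5)   # fires only if a AND b
    return neuron((h_or, h_and), (1, -1), -0.5)  # OR but not AND

def single_neuron_can_do_xor(grid):
    """Brute-force every (w1, w2, bias) on a small grid for ONE neuron
    and report whether any setting gets all four XOR cases right."""
    for w1, w2, b in product(grid, repeat=3):
        if all(neuron(x, (w1, w2), b) == y for x, y in XOR_CASES):
            return True
    return False

grid = [i * 0.5 for i in range(-6, 7)]  # -3.0 .. 3.0 in steps of 0.5
two_layer_outputs = [xor_two_layer(a, b) for (a, b), _ in XOR_CASES]
one_neuron_works = single_neuron_can_do_xor(grid)
```

The grid search is only a demonstration, not a proof. The real reason is that XOR is not linearly separable, so no single line (one neuron) can split the four points correctly.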

Ok so why go get masters?

I wanted to finally take all the courses I missed in undergrad.

  • Machine Learning (was lame unfortunately, I’m replacing it with Andrew Ng’s Coursera series, 2.5/5 done)
  • Digital Signal Processing (recommend)
  • Aerial Robotics
  • Arm Robotics
  • Very Large Scale Integration VLSI (bucket list!)
  • Power Electronics
  • Advanced Applied Differential Equations and Friends (hardest math class yet, Matlab is star)

Never stop learning. Most of the learning is done outside the classroom. The classroom is a framework that you build out with your individual efforts.

There is also something to be said about networking and meeting new people and hearing new ideas. The University is a special place.

Sea Change

I’ve been promised Virtual Reality and Artificial Intelligence (AI) all my life. I bought the Power Glove. It was all mostly just hype and small toys that never stuck. But current iterations?

What we are seeing now in AI with Large Language Models (LLMs such as GPT) and Stable Diffusion (image generation) is nothing short of a change in how we use computers. Models, weights, and LoRAs are now the “Programs” we run.

I’ve spent the last month with products like InvokeAI and Ollama. They are wonderful, but they aren’t even close to where we will be in 2 years for a consumer. But I can’t help but think of the giant foundational models trained on the entire human corpus being compressed into little tiny chips that can be queried anywhere as some type of “Holographic” computing.

I can understand why that one Google engineer freaked out talking to an internal chatbot.

“I think, therefore I am” is a famous philosophical statement made by René Descartes in his Meditations on First Philosophy. The statement is often translated from the original French as “Cogito, ergo sum,” and it is meant to express the idea that the very act of doubting one’s existence serves as proof of one’s own consciousness.

In other words, if you are able to think about whether or not you exist, then you must exist in some form in order to have that thought. This statement has been interpreted and debated by philosophers for centuries, but at its core, it is a powerful reminder of the connection between thinking and being.

Mistral24b

Philosophy aside, let’s talk Hardware

We are in the “Mainframe” era of AI. These models literally take up rooms (rows of racks in a datacenter) and it seems it’s One Big GPU per User at a time. I’m trying to fathom what it takes to support a Copilot or ChatGPT at Microsoft’s and OpenAI’s scale.

As you can see, the problem with size is a combination of power and memory bandwidth, but these will be solved with tricks like they always are. INT8 quantization is such a hilariously simple optimization that I’m quite shocked it works so well.

Overview of quantization that I will spend the rest of the month fact checking…

Compressing the weights of neural networks from floating-point to int8 (also known as quantization) can significantly reduce the computational requirements and memory footprint of deep learning models, making them more suitable for deployment on low-power hardware such as mobile devices or embedded systems.

The general idea behind weight compression is to represent the weights of a neural network using fewer bits than their original floating-point representation. By reducing the precision of the weights, it is possible to significantly reduce the storage requirements and computational complexity of the model. In particular, int8 quantization involves representing the weights as 8-bit integers rather than 32-bit or 64-bit floating-point numbers.

There are several steps involved in compressing neural network weights from floating-point to int8:

  1. Weight normalization: This step involves scaling the weights of the neural network so that they fit within the range of an 8-bit integer. This is typically done by computing the maximum absolute value of the weights and then scaling them accordingly.
  2. Quantization: This step involves converting the normalized weights from floating-point to int8 format. There are several methods for doing this, including linear quantization and logarithmic quantization.
  3. Decomposition: In some cases, it may be possible to decompose the weight matrix of a neural network into smaller matrices that can be more easily compressed. This is known as low-rank decomposition or tensor decomposition.
  4. Pruning: This step involves removing unnecessary connections between neurons in the neural network, further reducing the computational requirements and memory footprint of the model.
  5. Codebook generation: In some cases, it may be possible to generate a codebook that maps int8 values back to their original floating-point values. This can help improve the accuracy of the compressed model.
  6. Model fine-tuning: After compressing the weights of a neural network, it is typically necessary to fine-tune the model to ensure that its performance is not significantly degraded. This can be done using techniques such as knowledge distillation or iterative quantization.

Overall, compressing neural network weights from floating-point to int8 format can greatly improve the performance and energy efficiency of deep learning models on low-power hardware. However, it requires careful consideration of factors such as weight normalization, quantization, decomposition, pruning, codebook generation, and model fine-tuning in order to ensure that the compressed model retains its accuracy and performance.

Mistral24b
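To show how simple the core trick is, here is a minimal sketch of the first two steps from that list (scale by the max absolute value, then round to 8-bit integers). This is the plain symmetric, per-tensor flavor written by me in pure Python. The function names and sample weights are made up for illustration, and real toolchains add per-channel scales, calibration, and the other steps above.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127].
    Returns the integer codes plus the scale needed to map them back."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

# Hypothetical weights, just to exercise the round trip.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code means each weight is off by at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now costs 1 byte instead of 4 (fp32), plus one shared float for the scale. The price is that tiny weights like 0.003 collapse to zero, which is why the fine-tuning step exists.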

Google sells the Coral, a USB accelerator capable of a few TOPS (trillion operations per second; about 4 TOPS of int8 math, not floating-point TeraFlops). Great for old school CNN style networks but pretty much useless for the current generation of Transformer models that want giant Video RAM (VRAM, 24GB+).
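A back-of-the-envelope sketch of why memory is the wall. It only counts the bytes needed to hold the weights, and the parameter counts are hypothetical round numbers I picked, not any specific model. Activations, caches, and runtime overhead are ignored, so real usage is higher.

```python
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}

def weight_gigabytes(num_params, fmt):
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

def fits(num_params, fmt, memory_gb):
    """True if the weights alone fit in the given memory budget."""
    return weight_gigabytes(num_params, fmt) <= memory_gb

# Hypothetical model sizes against a 24 GB card.
for params in (7_000_000_000, 13_000_000_000, 70_000_000_000):
    for fmt in BITS_PER_WEIGHT:
        size = weight_gigabytes(params, fmt)
        verdict = "fits" if fits(params, fmt, 24) else "too big"
        print(f"{params / 1e9:.0f}B params @ {fmt}: {size:.1f} GB ({verdict} in 24 GB)")
```

By this math a 7-billion-parameter model needs 28 GB at fp32 but 7 GB at int8, which is the whole reason quantization matters for consumer hardware.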

I’m awaiting the LLM/Stable Diffusion version of the Coral TPU or Jetson Nano (NVIDIA).

Make sure your board is waxed, this is going to be a giant wave of VR and AI coming in these next 3 years.