When GUIDs Collide

What happens when 2 identifiers or keys collide?  Bad stuff.  Databases originally used integers for their record identifiers.  If you wanted to add a new record you must query or ask database what was the last integer and add one to it.

This is fine in isolation, but what happens if you want to insert a bunch of records into a remote database that you can only connect to once per hour?  Oh and there are multiple users.  You can’t use Integers anymore you must use something like a UUID/GUID.

Globals Unique Identifiers (GUID) or Universally Unique Identifiers provide a “random” 128bit value that looks like the following:

https://sessionlink.herokuapp.com/?id=c42db039-ba38-4ce8-a5c0-c997894bcade

They work great, and once you switch from Integers to GUIDs your database is truly distributed and can be updated on the moon!  They are a bit ugly though.  Also they take a huge amount of storage compared to the 64 bits of an integer.  Data size matters sometimes.  Wait, GUIDs are 128 bits!

But still we have a slight probability of generating or guessing the same ID with the birthday problem.

Screen Shot 2019-10-27 at 6.42.43 AM.png

If you are FAANG then you have to start worrying about stuff like this.  There will be collisions.  What happens if you add another GUID or two, increasing the bits to 512?

So there is still the GUID Soup to consider…  Start putting multiple GUIDs in an URL and it is pretty gross.

Enter CUID, which has some interesting objectives:

Collision-resistant ids optimized for horizontal scaling and binary search lookup performance.

Screen Shot 2019-10-27 at 6.29.57 AM.png

How do they do this?  with a string 25 characters.  Timestamping is critical to its binary search capabilities.  Ordering by timestamp allows the lookup to be logarithmic.  So if you need to find one record in a billion.  It would take about 30 tries to find it.  Sorting is key to speed here.

If you need speed and no creation time privacy CUIDs may be a viable option.  Anonymous systems must never use CUIDs.

Only slightly larger in storage size than a GUID… 25 Characters * 8 Bits/Character = 200 Bits.  Guessing CUIDs I feel would be easier…

Screen Shot 2019-10-27 at 6.50.42 AM.png

I’ll stick with my GUIDs soup for now.

Until Next Time. Happy Merging Distributed Datasets,

Rob

 

P.S. Microsoft trying to brand it, er recapitulate:

Screen Shot 2019-10-27 at 7.12.13 AM.png

Coded into a Corner

Creating software is easy, creating software that is sustainable to maintain and extend is hard.  There are not many mediums in which the consequences of a simple change or design decision can render the complete system useless.  The way in which the software is built is just as critical as to who is building it.

Honey why is the toilet flushing now when I flip the light switch?

Whoa! Cool! I didn’t realize changing the light bulb would do that.

Poorly designed systems are interconnected in ways that are not necessary.  It is often easier to write the prototype in this manner, but products often need to be refactored to use time tested design patterns.

If you ever find yourself scared to change something or can’t figure out how to add the next killer feature.  It is certainly time you reevaluate what you have and don’t be afraid to delete your code.   You may be surprised how much of it is no longer needed.

More code, more bugs!

Recommended Reading

61iOSX4WNiL._SX404_BO1,204,203,200_.jpg

 

Blaming the User

My favorite super-pattern in developing and supporting software.  Always assume the user is wrong.  Remember, you are User Number Zero, so… blame yourself.

Screen Shot 2018-07-08 at 8.51.40 AM.png

Defensive Programming is a technique similar to Defensive Driving. Always assume the user will break your program. Defensive programming will save you time in the long run by reducing support and increasing user satisfaction.

Its aim is to reduce the risk of collision by anticipating dangerous situations, despite adverse conditions or the mistakes of others – Defensive Driving – Wikipedia 2018

We must have a balance of making programs flexible enough to allow the user to use it for intentions unforeseen by the tool-makers themselves, and to ensure they don’t fall off the rails and lose data because they entered a word instead of a number.

Input Validation

  • Never trust Input Text Boxes
    • Always convert numerical values using ParseFloat or equivalent
    • Never  Eval() or Shell() with input string
    • Never use string to build SQL Query without proper escaping. see SQL Injection
    • Always trim whitespace at beginning and end of string
  • Give clear and actionable error messages when providing feedback to user on their incorrect inputs
  • … This post could go on for a while, TBF

 

Further Reading

Demo or Die

The amount of information conveyed with a picture is said to be worth a thousand words. What is the information density of a demo?  Pictures, flow arrows, and hand waiving can only get you so far.  Demo or die, or sometimes… demo and die.

5b6af7b0-efd9-4d94-b7d7-3319270f2662.jpg

Don’t you love it when someone in audience gets smart and tries to move you off script… the movie magic ends real quick.

There is something about the medium in which you can poke and paw a slider or a button and have something update in realtime.  It either works or it crashes and burns.  A working demo is the baseline test of competency, but the only problem is can the demo make it to 1.0?  Shipping software is incredibly hard and the history of PowerPoint 1.0 is facinating.

In the early days they had trouble convincing people of what PowerPoint was and what problem it solved.  Their mockups were impeccable.  The amount of planning and discipline of these 1980s software startups is inspiring.  Sweating Bullets is on the must read list for anybody who studies software development.

Suggested Reading

PDF of Sweating Bullets by Robert Gaskins

Screen Shot 2018-05-15 at 9.13.21 PM.png