When GUIDs Collide

What happens when 2 identifiers or keys collide?  Bad stuff.  Databases originally used integers for their record identifiers.  If you wanted to add a new record you must query or ask database what was the last integer and add one to it.

This is fine in isolation, but what happens if you want to insert a bunch of records into a remote database that you can only connect to once per hour?  Oh and there are multiple users.  You can’t use Integers anymore you must use something like a UUID/GUID.

Globals Unique Identifiers (GUID) or Universally Unique Identifiers provide a “random” 128bit value that looks like the following:

https://sessionlink.herokuapp.com/?id=c42db039-ba38-4ce8-a5c0-c997894bcade

They work great, and once you switch from Integers to GUIDs your database is truly distributed and can be updated on the moon!  They are a bit ugly though.  Also they take a huge amount of storage compared to the 64 bits of an integer.  Data size matters sometimes.  Wait, GUIDs are 128 bits!

But still we have a slight probability of generating or guessing the same ID with the birthday problem.

Screen Shot 2019-10-27 at 6.42.43 AM.png

If you are FAANG then you have to start worrying about stuff like this.  There will be collisions.  What happens if you add another GUID or two, increasing the bits to 512?

So there is still the GUID Soup to consider…  Start putting multiple GUIDs in an URL and it is pretty gross.

Enter CUID, which has some interesting objectives:

Collision-resistant ids optimized for horizontal scaling and binary search lookup performance.

Screen Shot 2019-10-27 at 6.29.57 AM.png

How do they do this?  with a string 25 characters.  Timestamping is critical to its binary search capabilities.  Ordering by timestamp allows the lookup to be logarithmic.  So if you need to find one record in a billion.  It would take about 30 tries to find it.  Sorting is key to speed here.

If you need speed and no creation time privacy CUIDs may be a viable option.  Anonymous systems must never use CUIDs.

Only slightly larger in storage size than a GUID… 25 Characters * 8 Bits/Character = 200 Bits.  Guessing CUIDs I feel would be easier…

Screen Shot 2019-10-27 at 6.50.42 AM.png

I’ll stick with my GUIDs soup for now.

Until Next Time. Happy Merging Distributed Datasets,

Rob

 

P.S. Microsoft trying to brand it, er recapitulate:

Screen Shot 2019-10-27 at 7.12.13 AM.png

Duck Typing

If it walks like a duck and talks like a duck it must be a duck.

Recently I started a project with Pure “Vanilla” Javascript, and it was quick going.  With the React as the starter pack, I had a basic prototype with proof of concept multi-user editing: https://twitter.com/robmurrer/status/1169118440122015744

That was a month ago and I ported it to Typescript which is a superset of Javascript (JS) that “compiles” to JS.  It plays well with React’s JSX becoming TSX.  Tip to cast in Typescript:

 blockchain = await this.state.store.getItem(id) as BlockProps;

To me, having to do the work of a compiler, is maddening and that is why I actually really love Typescript (TS).  Coming from any traditional language it is a nice safety cushion under you.

TS has excellent tooling to enable confident refactors, which is the number one issue with my JS codebase.  Refactoring “blind” in pure JS is nightmare.  Refactoring is by far the most important task, so I’ve gone full TS.  But I can see if you are just learning JS you should absolutely not start with TS.

Start pure Vanilla JS and build something. Then when you get stuck trying to refactor.  Revert and start fresh with TS.  I like the quick rewrite and throwaway approach for rapid prototyping.  JS is the best rapid prototyping environment I’ve ever seen.

Ok on to code… The most important part of an editor is the data the user enters.  You cannot lose it.  We are using the Local Forage package from the folks at Mozilla for the data storage layer.  Under the hood it will use IndexedDB on modern browsers.

Screen Shot 2019-10-21 at 6.59.49 AM.png

Ok so what are we looking at here?  The genesis for the post.  What the heck is Hydrate and Dehydrate?  It’s just the cool term for serialize and deserialize.  I like it. The whole point of these names is to communicate what it is they do.

Update:  Um it writes the data in a format it can send over wire or store in database.

But there is something about naming it what it has been called in the past.  Hence this post.

The tour de force of programming design patterns:

  • Finite State Machine (FSM)
    • React’s fundamental concept
    • Start here!
  • Command Pattern
    • Ahem…
  • Classes
    • Only to hold state!!!
    • Of course React uses inheritance 😀
  • Composite
    • Just a tree, or
    • Now with more Blockchain

I’ll write/link up some more about the above at some point.

Until next time, Happy Coding!

 

Screen Shot 2019-10-21 at 7.38.09 AM