Blob storage is a solved problem: what about compute?
Cheap, simple blob storage is widely available and easy to access and use. Applying the right kind of compute, however, is still laborious: too many complex barriers, arcane commands, hard-to-understand costs, and the kind of anxiety that interrupts flow state when prototyping, analyzing, and generally doing data-intensive research and engineering.
Storage is a solved problem
Storing blobs of data for a web application used to be more involved; I had to think about it. Now I don’t (much). When I build an app or website that needs some blob storage, my thought process is roughly this (sketched in code after the list):
- create a bucket (or whatever a given cloud provider calls it); it doesn’t have to be the biggest name, they’re all very reliable
- put stuff in
- get stuff out
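In practice that is a few lines against any S3-compatible API. A minimal sketch, assuming the AWS SDK v3 (@aws-sdk/client-s3) and an S3-compatible provider; the endpoint, bucket name, key, and credential variables are placeholders, not a specific recommendation:

```ts
// Minimal put/get against any S3-compatible provider (AWS, DigitalOcean Spaces, etc.).
// Endpoint, bucket, key, and credentials are placeholders.
import {
  S3Client,
  PutObjectCommand,
  GetObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({
  region: "us-east-1",
  endpoint: "https://nyc3.digitaloceanspaces.com", // any S3-compatible endpoint
  credentials: {
    accessKeyId: process.env.STORAGE_KEY!,
    secretAccessKey: process.env.STORAGE_SECRET!,
  },
});

// put stuff in
await s3.send(
  new PutObjectCommand({
    Bucket: "my-bucket",
    Key: "results/run-42.json",
    Body: JSON.stringify({ ok: true }),
  })
);

// get stuff out
const { Body } = await s3.send(
  new GetObjectCommand({ Bucket: "my-bucket", Key: "results/run-42.json" })
);
console.log(await Body?.transformToString());
```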
I don’t care much where it is. Why? Because it’s a solved problem. Remote blob storage is:
- very cheap, and getting cheaper
- reliable [1]
At scale, however, you do have to think about cost, but those are pretty straightforward business calculations. At anything below very large data volumes, you don’t have to think much about it, and that is what makes it a solved problem.
It’s solved in a similar way to how nature solved storing information: replication, with automated mechanisms for damage repair and reconciliation [2].
Compute is not a solved problem
Because you have to think about it. What I want:
I give you some application or workflow, for example a machine learning agent, a program I have created, some tool, and from time to time it requires some level of computing power. You are able to automatically and safely connect that application to the right level of compute resources as needed.
- If the program runs in the browser, I might be able to use the GPU, but only while the tab is open and running (see the sketch after this list).
- If the program is downloaded and installed, it has access to the entire computer, but because of that, security and parasitic programs become a problem.
- If I bring in cloud computing, things suddenly become complex, with lots of decisions, but also lots of scalable compute resources of different useful types.
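To make the browser case concrete, here is a minimal sketch of feature-detecting the GPU before deciding where a workload runs. It assumes a WebGPU-capable browser and the @webgpu/types definitions; the function name and the CPU fallback are illustrative, not an existing API:

```ts
// Browser case: detect whether a GPU is available to this tab via WebGPU.
// If not, the application has to fall back to CPU (or remote compute).
async function pickCompute(): Promise<"webgpu" | "cpu"> {
  // navigator.gpu is only present in WebGPU-capable browsers.
  const adapter = await navigator.gpu?.requestAdapter();
  if (adapter) {
    const device = await adapter.requestDevice();
    console.log("GPU available while this tab is open");
    device.destroy(); // just probing here, not running work
    return "webgpu";
  }
  return "cpu";
}

pickCompute().then((target) => console.log("running on:", target));
```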
Computing resources are inherently valuable, and can often be converted to $$ efficiently via automation.
Obviously storage !== compute
but if compute were as easy as storage, I could distribute complex scientific simulations, revive them years later, and they would “just work”.
The team at https://metapage.io aims to solve that problem: compute as a simple commodity.
When we can treat compute as a commodity, we have more leverage over compute providers. When a provider manages to make its system difficult to move away from, we lose bargaining power.
Personally, I default to https://www.digitalocean.com/. This isn’t a paid plug! They just do a great job of offering plenty of options at the right complexity/resolution: not too many, not too few.
For my full stacks, I’m using AWS but not directly: my choice of platform vendor (https://nhost.io) makes that decision for me.
References
[1] How data is lost in the cloud
https://spanning.com/blog/how-data-is-lost-in-the-cloud/