Unix in the Cloud

Ignorance, Stagnation, Obsolescence

Synopsis

cloud in the broad sense, as an ideology
not quite about running BSD on EC2
strictly limited to the skills and experience of yours truly

Multi-core

installation?
configuration management?
load balancing?

Multi-node

installation?
configuration management?
load balancing?
why multi-node?

Large Computing Needs

Facebook, Google, ...
more than any OS can provide

Happy Hardware Vendor Law

The number of nodes needed to solve a given task doubles every now and again.

OS Scalability Limit

1 node only
multi-socket and stacks approaching NUMA
E25K, z10, etc.: fail for most purposes

Operating System — ?

traditional definition no longer relevant
the notion itself on the brink of obsolescence
field heavily eroded by current distributed apps

Distributed Applications

forced to be an OS unto themselves
huge overlap
huge opportunity for sharing and consolidation

Anti-Patterns

virtualization
chefs and puppets
thick abstraction

Attempts

z/OS
Plan 9, Inferno
Clustrx, E1, DYSEAC, ...
OpenStack (~~)

Species Survival Plan

Freeze the bodies and leave them for future generations to fix.

Don't Panic: Incremental

perfection vs. done
a good AI is still a decade or more away
no practical need for POSIX over a cloud

Mindful Approach

immediate practicality
long-term perspective
sustained, integrally rich effect

Operating System

major abstraction repository
overlapping code distillery
pre-production architecture research

Increments

Machine Generated Data

logs, error messages, status monitors
meant for humans... no longer
rethinking for better aggregation and analysis
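
One way to read "rethinking for better aggregation and analysis" is to emit every event as a self-describing record instead of a sentence written for a human. A minimal sketch, assuming JSON lines as the format; the field names (ts, host, event) and both helper functions are hypothetical, not taken from the talk:

```python
import json
import socket
import time


def make_record(event, **fields):
    """Build one machine-readable log record as a JSON line."""
    record = {
        "ts": time.time(),             # seconds since the epoch
        "host": socket.gethostname(),  # node that emitted the record
        "event": event,                # stable event name, not prose
    }
    record.update(fields)              # event-specific key/value pairs
    return json.dumps(record, sort_keys=True)


def count_by_event(lines):
    """Aggregate records by event name without any regex parsing."""
    counts = {}
    for line in lines:
        name = json.loads(line)["event"]
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Because each line parses on its own, records from many nodes can be concatenated and counted without knowing which program wrote them.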

Identity and Authentication

YP, LDAP outdated and poorly supported
no distributed model
passwd in git as a first stab
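
If passwd lives in git, a check run before each commit can catch clashes between edits made on different nodes. A minimal sketch, assuming the classic 7-field passwd(5) layout (name:passwd:uid:gid:gecos:home:shell); the function names are hypothetical, and the talk does not say how conflicts should be handled:

```python
def parse_passwd(text):
    """Parse passwd-style lines into (name, uid) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        entries.append((fields[0], int(fields[2])))  # name, uid
    return entries


def find_conflicts(entries):
    """Return the login names and UIDs that appear more than once."""
    seen_names, seen_uids = set(), set()
    dup_names, dup_uids = set(), set()
    for name, uid in entries:
        if name in seen_names:
            dup_names.add(name)
        if uid in seen_uids:
            dup_uids.add(uid)
        seen_names.add(name)
        seen_uids.add(uid)
    return dup_names, dup_uids
```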

Remote Procedure Call

ssh losing relevance, HPN or not
almighty agent daemon is worse than rsh
capabilities, RBAC, WoT
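
To illustrate the capability idea as opposed to an almighty agent: the caller presents a token that authorizes exactly one action, and the node verifies it without consulting anyone. A minimal sketch, assuming a shared secret and an HMAC tag as the token; this is one possible reading, not the design proposed in the talk, and it has no expiry or revocation:

```python
import hashlib
import hmac


def issue_capability(key, subject, action):
    """Sign a (subject, action) pair; the hex tag is the capability."""
    msg = f"{subject}\n{action}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def check_capability(key, subject, action, tag):
    """Verify that the tag authorizes this subject for this action only."""
    expected = issue_capability(key, subject, action)
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

A tag issued for ("node7", "restart:nginx") does not verify for any other subject or action.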

Hardware Failures

no culture for low-level fault-tolerance
watchdogd as state-of-the-art self-healing
focus on self-diagnostics: disk error counters, etc.
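
Self-diagnostics on error counters can be as simple as comparing two snapshots and flagging what grew. A minimal sketch; the counter names in the usage line are hypothetical, and how the counters are collected (SMART, kernel statistics, etc.) is left out:

```python
def growing_counters(previous, current, threshold=0):
    """Return counters whose value grew by more than threshold."""
    flagged = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0)  # unseen counters start at 0
        if delta > threshold:
            flagged[name] = delta
    return flagged
```

For example, growing_counters({"ada0.read_errors": 3}, {"ada0.read_errors": 9}) flags ada0.read_errors with a delta of 6.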

Distributed Configuration

current anti-patterns worsen the problem
role-aware configuration
/ in git as a second stab
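
Role-aware configuration can be pictured as a shared base plus per-role overlays, resolved on each node from the list of roles it holds. A minimal sketch under that assumption; the role names and keys in the usage line are hypothetical:

```python
def resolve_config(base, role_overlays, node_roles):
    """Apply role overlays to the base config, in the order given."""
    config = dict(base)  # copy, so the shared base stays untouched
    for role in node_roles:
        config.update(role_overlays.get(role, {}))  # later roles win
    return config
```

For example, with base {"httpd": "off"} and overlay {"web": {"httpd": "on"}}, a node holding the role "web" resolves httpd to "on" while every other node keeps the base value.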

Storage

intra-node redundancy irrelevant
no appropriate local multi-disk FS
no fast path for data exchange
nginx + curl + dispatcher
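
The dispatcher in "nginx + curl + dispatcher" has to answer one question: which node holds a given object. A minimal sketch, assuming rendezvous (highest-random-weight) hashing for the placement and plain HTTP paths served by nginx; the hashing scheme and the URL layout are assumptions, not details from the talk:

```python
import hashlib


def pick_node(key, nodes):
    """Rendezvous hashing: the node with the highest score owns the key."""
    def score(node):
        digest = hashlib.sha256(f"{node}/{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def object_url(key, nodes):
    """URL a client such as curl would fetch the object from."""
    return f"http://{pick_node(key, nodes)}/{key}"
```

Every client computes the same answer from the node list alone, and removing a node moves only the keys that node owned.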

Error Handling

cf. Machine Generated Data and Hardware Failures
software is 10x more failure-prone than hardware
serious problem at scale

☺