Dispatching Petabytes with PostgreSQL
Andrew Pantyukhin
andrew@dreamindustries.ru
15M media objects
3PB raw data
storage, streaming, processing
HDFS? Isilon?
custom solution
1000s of hard drives
file system per drive
filename = sha256(file)
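
The naming rule above (filename = sha256 of the file contents) can be sketched in Python. This is a minimal illustration, not the code from the talk: the function name `content_name` and the chunk size are assumptions, and only the rule itself comes from the slide.

```python
import hashlib


def content_name(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file's contents (hypothetical helper).

    The file is read in fixed-size chunks so that large media files
    never have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With names derived from content, two copies of the same object get the same filename on any drive, and a copy can be verified by rehashing it and comparing the result to its name.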
dispatching
ingestion, rebalancing
encoding, analysis
PostgreSQL!
(of course)
entities
sha (asset), hdd, chassis
metadata, actions, status
15M master objects
25M derivatives
70M copies
200GB core
500GB XML processing
2TB+ overall
custom types
enum
native/wrappers
hashtypes
shatypes
+ crc32, bugfixes
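
One likely reason to use dedicated hash types rather than text columns is size: a SHA-256 digest is 32 bytes in binary but 64 characters as hex. The sketch below only illustrates that arithmetic in Python. It assumes fixed-width binary storage is the motivation, which the slides do not state, and the payload and the helper `saved_bytes` are hypothetical.

```python
import hashlib
import zlib

data = b"example media payload"  # hypothetical content

hex_digest = hashlib.sha256(data).hexdigest()  # text form: 64 hex characters
raw_digest = bytes.fromhex(hex_digest)         # binary form: 32 bytes
crc = zlib.crc32(data)                         # unsigned 32-bit checksum


def saved_bytes(rows, hex_len=64, raw_len=32):
    """Bytes saved per hash column by storing raw digests instead of hex.

    Ignores per-value headers and index overhead, so this is a rough
    lower-level estimate rather than a measured figure.
    """
    return rows * (hex_len - raw_len)
```

For a table on the scale of the 70M copies mentioned above, `saved_bytes(70_000_000)` gives the rough per-column difference, before counting any index on the same column.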
actions
fully async, fail-over
dumb polling
smart locking
update set t = now() where t is old
update returning
XML
third-party metadata
stored, processed in PG
research
large-scale action logging
production
aggregated views of dispatcher
distributed logic
dispatcher, XML processing, production, research
full-mesh data exchange
table data transfer
slow or inflexible
simple custom scripts, diff
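
A diff-based table transfer can be sketched as a comparison of two snapshots keyed by primary key. The slide says only "simple custom scripts, diff", so whether this means the Unix tool or a keyed comparison is not stated; the Python below assumes the latter, and the function names are hypothetical.

```python
def diff_tables(source, target):
    """Compare two table snapshots given as {primary_key: row} dicts.

    Returns the changes needed to make target match source:
    rows to insert, rows to update, and keys to delete.
    """
    inserts = {k: v for k, v in source.items() if k not in target}
    updates = {
        k: v for k, v in source.items() if k in target and target[k] != v
    }
    deletes = sorted(k for k in target if k not in source)
    return inserts, updates, deletes


def apply_diff(target, inserts, updates, deletes):
    """Return a copy of target with the computed changes applied."""
    result = dict(target)
    result.update(inserts)
    result.update(updates)
    for key in deletes:
        result.pop(key, None)
    return result
```

Only the changed rows cross the wire, which is the appeal of this approach when most of a large table is identical on both sides.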
dream industries
disruptive innovation lab
funding, collaborating
inviting, hiring