![[Personal Takes] Network Storage: Principles](/static/c0f37741e6625b48332a538ed0da398e/2a4de/ibm_first_drive.png)
Warning: This post is mostly about Ceph, but the principles apply to most distributed storage systems. Your mileage may vary with other solutions.
The Reality Check
After years of building, breaking, and rebuilding network storage systems, I've learned that most documentation glosses over the harsh realities. The vendor presentations show perfect scenarios with uniform hardware and infinite budgets. The real world? Well, that's messier. Here are the hard-earned lessons that'll save you some sleepless nights.
1. Why Busy the Network if We Don't Have To?
This might sound obvious, but you'd be amazed how often people overcomplicate their storage architecture. Every byte that crosses the network is a byte that could cause latency, congestion, or worse - complete failure during peak loads.
The principle is simple: keep data local whenever possible. If your application can work with local storage, don't force it through the network just because you have a fancy distributed storage system. Reserve network storage for when you actually need the redundancy, accessibility, or shared state.
I've seen perfectly good applications hobbled by unnecessary network hops. Sometimes the best network storage solution is no network storage at all.
2. How to Move Data Fast? Don't Move It at All
The fastest data transfer is the one that never happens. This isn't just philosophical - it's practical architecture.
Instead of designing systems that constantly shuffle data around, design them to minimize data movement from the start. Use techniques like:
- Data locality: Keep related data on the same nodes
- Smart caching: Cache frequently accessed data locally
- Lazy replication: Only replicate when necessary, not because you can
When you absolutely must move data, batch it. Move large chunks during off-peak hours rather than constant tiny transfers that create network chatter.
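Here's a rough back-of-the-envelope sketch of why batching pays off. It assumes a fixed overhead per request and strictly serial requests with no pipelining. The function name, the 1 ms overhead, and the 10 GbE line rate are all illustrative assumptions of mine, not measurements.

```python
def transfer_time_s(total_bytes: int, chunk_bytes: int,
                    per_request_overhead_s: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Estimate wall-clock time to move total_bytes in chunks of chunk_bytes.

    Assumes requests are issued one after another (no pipelining).
    """
    # Ceiling division: a partial last chunk still costs a full request.
    requests = -(-total_bytes // chunk_bytes)
    return requests * per_request_overhead_s + total_bytes / bandwidth_bytes_per_s


GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed figures: 1 ms overhead per request, ~1.25 GB/s (10 GbE line rate).
small = transfer_time_s(10 * GIB, 4 * 1024, 0.001, 1.25e9)
large = transfer_time_s(10 * GIB, 64 * MIB, 0.001, 1.25e9)

print(f"4 KiB chunks:  {small:.1f} s")
print(f"64 MiB chunks: {large:.1f} s")
```

Under these assumptions the payload time is identical in both cases. The difference is entirely per-request overhead, which is exactly what batching removes.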
3. Fast Local Storage is Easy and Cheap. Fast Shared Storage is Hard and Expensive
This is the fundamental economics of storage that marketing teams don't want you to understand. You can build blazing-fast local storage with consumer SSDs for a fraction of what enterprise shared storage costs.
But the moment you want that performance shared across multiple nodes? Welcome to complexity hell. Now you need:
- High-speed networking (expensive)
- Specialized protocols (complex)
- Redundancy mechanisms (more expensive)
- Synchronization overhead (performance killer)
Don't build shared storage unless you genuinely need it. And if you do need it, budget accordingly - both in money and complexity.
4. Your Cluster Storage System is Only as Good as Its Network
I cannot stress this enough: your storage cluster will perform like your worst network link. It doesn't matter if you have the fastest SSDs money can buy if they're connected with gigabit Ethernet from 2010.
Network considerations for storage clusters:
- Bandwidth: More helps, but consistent throughput matters more than peak numbers
- Latency: Low latency beats high bandwidth for small operations
- Reliability: One flaky switch can bring down your entire cluster
Invest in your network infrastructure. It's not glamorous, but it's the foundation everything else depends on.
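To put rough numbers on it, here's a minimal sketch that treats a storage node as capped by whichever is slower: its disks combined or its network link. The four-SSD node and the 500 MB/s per-disk figure are assumptions for illustration, and protocol overhead and replication traffic are ignored.

```python
def node_throughput_mb_s(disks: int, disk_mb_s: float, nic_gbit_s: float) -> float:
    """Upper bound on what one storage node can serve, in MB/s."""
    disk_total = disks * disk_mb_s
    # Gbit/s -> MB/s (decimal units), ignoring protocol overhead.
    nic_total = nic_gbit_s * 1000 / 8
    return min(disk_total, nic_total)


# Assumed node: 4 SATA SSDs at roughly 500 MB/s each.
for nic in (1, 10, 25):
    limit = node_throughput_mb_s(4, 500, nic)
    print(f"{nic:>2} GbE: {limit:.0f} MB/s")
```

With these assumed numbers the node tops out at 125 MB/s on gigabit Ethernet and 1250 MB/s on 10 GbE. Only at 25 GbE do the disks (2000 MB/s combined) become the limit.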
5. Mixing Drive Capacities: Proceed with Extreme Caution
"It's okay to mix drive capacities, but keep them as uniform as possible." This sounds contradictory, but here's the reality:
When you mix drive sizes, several problems emerge:
- Uneven distribution: Smaller disks fill up faster, creating hotspots
- Placement group imbalance: Larger disks handle more placement groups
- Uneven wear: Some disks work harder than others, failing sooner
- Performance degradation: The cluster performance drops to accommodate the differences
If you must mix capacities (and sometimes you must), try to keep ratios reasonable. Don't put 1TB drives next to 10TB drives in the same pool. Your future self will thank you.
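A simplified model shows both failure modes from the list above. It assumes data lands on each disk in proportion to a weight, which is a stand-in for real placement logic and not Ceph's actual CRUSH algorithm. The disk names and sizes are hypothetical.

```python
def expected_share(weights: dict[str, float]) -> dict[str, float]:
    """Fraction of data (and, roughly, of I/O) each disk receives
    when placement is proportional to weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def fill_percent(capacity_tb: dict[str, float],
                 weights: dict[str, float],
                 stored_tb: float) -> dict[str, float]:
    """Expected fill level of each disk after storing stored_tb in total."""
    shares = expected_share(weights)
    return {name: 100 * shares[name] * stored_tb / capacity_tb[name]
            for name in capacity_tb}


# Hypothetical pool: two 1 TB disks next to one 10 TB disk.
capacity_tb = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 10.0}
stored_tb = 3.0

# Equal weights: every disk gets the same amount, so the small ones fill first.
equal = fill_percent(capacity_tb, {name: 1.0 for name in capacity_tb}, stored_tb)

# Capacity-proportional weights: fill levels even out,
# but the big disk now receives most of the data and I/O.
weighted = fill_percent(capacity_tb, capacity_tb, stored_tb)
io_share = expected_share(capacity_tb)

print("equal weights:   ", equal)
print("capacity weights:", weighted)
print(f"osd.2 share of I/O: {io_share['osd.2']:.0%}")
```

In this model, equal weights leave the two small disks completely full while the large one sits at 10%. Weighting by capacity evens out the fill levels, but the 10 TB disk then carries about five sixths of the traffic, even though a bigger disk is not necessarily a faster one.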
6. Don't Exceed 80% Capacity if You Want Nice Weekend Trips
This is the "golden rule" of storage administration, and it exists for good reasons:
- Performance degradation: Most storage systems slow down significantly above 80% capacity
- Rebalancing headaches: Less free space means longer recovery times
- Emergency buffer: You need space for temporary operations and emergency growth
- Peace of mind: Nothing ruins a weekend like a full storage cluster
I've learned this the hard way. That unused 20% seems wasteful until you're frantically trying to free up space at 2 AM because someone decided to upload their entire video collection to the company share.
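There's also simple arithmetic behind the headroom: when a node dies, its data has to be re-replicated onto the survivors. The sketch below assumes equally sized nodes and evenly spread data, which real clusters only approximate. The function name is mine.

```python
def utilization_after_failures(nodes: int, utilization: float, failed: int = 1) -> float:
    """Utilization once data from failed nodes is re-replicated onto survivors.

    Assumes equally sized nodes and evenly distributed data.
    """
    if failed >= nodes:
        raise ValueError("no surviving nodes to hold the data")
    return utilization * nodes / (nodes - failed)


for nodes in (4, 5, 10, 20):
    after = utilization_after_failures(nodes, 0.80)
    print(f"{nodes:>2} nodes at 80% -> {after:.0%} after losing one node")
```

Under these assumptions a five-node cluster at 80% has nowhere to put the data from a lost node, and a four-node cluster is already past full. In small clusters, 80% can be too generous a ceiling.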
7. We'll Figure It Out on the Fly
This isn't defeatism - it's pragmatism. No matter how much you plan, real-world usage will surprise you. Applications will behave differently than expected. Users will find creative ways to stress your system. Hardware will fail in ways the vendor said were impossible.
The key is building systems that are observable and adjustable. You need:
- Good monitoring: Know what's happening in real-time
- Flexible architecture: Be able to adapt without complete rebuilds
- Documentation: Write down what you learn for next time
Don't aim for perfection on day one. Aim for "good enough to start" and "flexible enough to improve."
8. Don't Sabotage Your Beautiful Cluster with Terrible Frontend Networking
This is the classic mistake: spending thousands on high-performance storage nodes, then connecting clients through whatever network infrastructure was lying around.
Your storage cluster might be capable of hundreds of thousands of IOPS, but if clients connect through old gigabit switches with oversubscribed uplinks, users will experience the performance of those old gigabit switches.
The client experience is what matters. Build your network architecture from the user backward, not from storage forward.
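A quick way to sanity-check the client side is to work out the oversubscription ratio of the access switches. This is a minimal sketch with made-up port counts. It models the worst case where every client is busy at the same time.

```python
def oversubscription_ratio(clients: int, access_mbit_s: float, uplink_mbit_s: float) -> float:
    """Total client-facing bandwidth divided by uplink bandwidth."""
    return clients * access_mbit_s / uplink_mbit_s


def per_client_mbit_s(clients: int, access_mbit_s: float, uplink_mbit_s: float) -> float:
    """Worst-case bandwidth per client when all clients are active at once."""
    return min(access_mbit_s, uplink_mbit_s / clients)


# Hypothetical switch: 48 clients on 1 Gbit/s ports.
for label, uplink in (("1 x 1 Gbit/s uplink", 1_000), ("2 x 10 Gbit/s uplinks", 20_000)):
    ratio = oversubscription_ratio(48, 1_000, uplink)
    share = per_client_mbit_s(48, 1_000, uplink)
    print(f"{label}: {ratio:.1f}:1 oversubscribed, {share:.1f} Mbit/s per client")
```

With a single gigabit uplink the ratio is 48:1, so each client gets about 21 Mbit/s when everyone is active, no matter how fast the storage cluster behind it is.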
The Bottom Line
Network storage is about trade-offs, not absolutes. Every decision has consequences, and those consequences compound over time. The best storage architecture is the one that:
- Meets your actual needs (not your imagined ones)
- Fits your budget (including operational costs)
- Can evolve as requirements change
- Doesn't keep you up at night
Remember: perfect is the enemy of good, but "good enough" is the enemy of excellent. Find your balance, document your decisions, and always keep learning.
Because in six months, you'll need to explain to your future self why you made these choices. Make sure it's a conversation you'll enjoy having.