How Ruangguru Scaled to 22M Students: Tech Architecture Deep Dive
Ruangguru by the Numbers
From 100K to 22M users — this is how the technology scaled.
Ruangguru grew from a startup to Indonesia's largest education technology platform, serving 22+ million students. This is the story of how we partnered with them in 2016 — when they had 100,000 users — and helped build technology that scaled to 22 million.
The Challenge: Scaling Education Technology
When Ruangguru first engaged us in 2016, they were a fast-growing EdTech startup with 100,000 registered students and big ambitions. Their initial infrastructure was adequate at that scale, but it wasn't designed for the 100x growth they were targeting.
Initial State (2016)
- Users: ~100,000 registered students (early-stage growth)
- Content: 10,000+ learning videos
- Peak load: 5,000 concurrent users
- Issues: Infrastructure not built for scale, monolithic architecture, no CDN strategy
The Growth Trajectory
User Growth:
├── 2016: 100K registered users (partnership start)
├── 2017: 500K users
└── 2018: 1M users
Concurrent User Peaks:
├── Normal day: 200,000 concurrent
├── Exam prep season: 800,000 concurrent
├── National exam day: 2M+ concurrent
└── Challenge: 10x swing between a normal day and national exam day
Technical Partnership Approach
Our engagement evolved through several phases as Ruangguru's needs changed:
Phase 1: Stabilization (2016 — 3 months)
Before we could scale, we had to stabilize.
Initial Issues Identified:
├── Database bottleneck (single PostgreSQL instance)
├── Video delivery (origin server overloaded)
├── Session management (in-memory, not distributed)
├── No auto-scaling (manual capacity management)
└── Limited monitoring (reactive, not proactive)
Immediate Actions:
├── Database read replicas + connection pooling
├── CDN implementation for video content
├── Distributed session management (Redis)
├── Auto-scaling configuration
└── Comprehensive monitoring setup
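In today's terms, the auto-scaling piece can be expressed declaratively. A minimal Kubernetes HorizontalPodAutoscaler sketch along those lines (the service name, replica bounds, and threshold here are illustrative, not the production config):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: content-api-hpa        # illustrative service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: content-api
  minReplicas: 4               # floor for a normal school day
  maxReplicas: 200             # headroom for exam-season spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out before saturation
```

The point is the shape of the decision, not the numbers: capacity follows load automatically instead of being provisioned by hand.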
Results (30 days):
├── 99.5% uptime (from 94%)
├── Page load time: 6s → 2.1s
├── Video start time: 8s → 1.5s
└── Zero exam-period outages
Phase 2: Architecture Evolution (2017 — 6 months)
With stability achieved, we rebuilt for scale.
Architecture Transformation:
Before (Monolithic):
┌─────────────────────────────────────────────────────────────────────┐
│ Single Application │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Auth + Content + Video + Payment + Analytics + Admin ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ Single Database │
└─────────────────────────────────────────────────────────────────────┘
After (Microservices):
┌─────────────────────────────────────────────────────────────────────┐
│ API Gateway │
├─────────────────────────────────────────────────────────────────────┤
│ ┌──────┐ ┌───────┐ ┌──────┐ ┌───────┐ ┌─────────┐ ┌─────────────┐│
│ │ Auth │ │Content│ │Video │ │Payment│ │Analytics│ │ Live Class ││
│ └──────┘ └───────┘ └──────┘ └───────┘ └─────────┘ └─────────────┘│
│ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────┐ ┌───────┐ ┌──────┐ ┌───────┐ ┌─────────┐ ┌─────────────┐│
│ │UserDB│ │Content│ │Video │ │Payment│ │Analytics│ │ Real-time ││
│ │ │ │ DB │ │ CDN │ │ DB │ │ Store │ │ Infra ││
│ └──────┘ └───────┘ └──────┘ └───────┘ └─────────┘ └─────────────┘│
└─────────────────────────────────────────────────────────────────────┘
Key Technical Decisions
| Decision | Rationale | Result |
|---|---|---|
| Kubernetes for orchestration | Auto-scaling, self-healing, consistent deployment | Can scale to 10x in minutes |
| Multi-CDN strategy | Redundancy + regional optimization for Indonesia | 99.9% video availability |
| Event-driven architecture | Decouple services, handle spikes | 2M+ events/second capacity |
| Separate read/write paths | Optimize for different access patterns | 10x read throughput |
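The event-driven decision ran on Kafka in production; the decoupling idea itself fits in a few lines. An in-process sketch using Go channels as a stand-in broker (the topic name and payload are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a minimal stand-in for a Kafka message.
type Event struct {
	Topic   string
	Payload string
}

// Bus fans each published event out to every subscriber of its topic.
// Producers never call consumers directly, which is the decoupling
// that lets each service absorb spikes at its own pace.
type Bus struct {
	mu   sync.Mutex
	subs map[string][]chan Event
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]chan Event)} }

func (b *Bus) Subscribe(topic string) <-chan Event {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan Event, 64) // buffering absorbs short bursts
	b.subs[topic] = append(b.subs[topic], ch)
	return ch
}

func (b *Bus) Publish(e Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs[e.Topic] {
		ch <- e
	}
}

func main() {
	bus := NewBus()
	analytics := bus.Subscribe("video.watched")
	bus.Publish(Event{Topic: "video.watched", Payload: "lesson-42"})
	fmt.Println((<-analytics).Payload)
}
```

Swap the channel for a Kafka topic and the same pattern handles millions of events per second.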
Phase 3: Feature Development (2018–Ongoing)
Beyond infrastructure, we built new capabilities:
Features Developed:
Live Learning Platform:
├── Real-time video streaming (100K+ concurrent viewers)
├── Interactive Q&A during sessions
├── Whiteboard collaboration
├── Recording and playback
└── Bandwidth adaptation for varied connections
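Bandwidth adaptation ultimately reduces to picking the best rendition a connection can sustain. A minimal selection sketch; the bitrate ladder here is illustrative, not the actual encoding ladder:

```go
package main

import "fmt"

// rendition pairs a video quality label with the minimum sustained
// bandwidth (kbps) it needs. Values are illustrative.
type rendition struct {
	label   string
	minKbps int
}

var ladder = []rendition{
	{"720p", 2500},
	{"480p", 1200},
	{"360p", 600},
	{"240p", 250},
}

// pickRendition returns the highest quality the measured bandwidth can
// sustain, falling back to the lowest rung for 2G-class connections.
func pickRendition(measuredKbps int) string {
	for _, r := range ladder {
		if measuredKbps >= r.minKbps {
			return r.label
		}
	}
	return ladder[len(ladder)-1].label
}

func main() {
	fmt.Println(pickRendition(3000)) // fast connection: 720p
	fmt.Println(pickRendition(400))  // 2G-class connection: 240p
}
```

Re-measuring bandwidth every few seconds and re-running this selection is the essence of adaptive streaming on Indonesia's mixed networks.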
Adaptive Learning Engine:
├── Student performance tracking
├── Personalized content recommendations
├── Difficulty adjustment based on progress
├── Weakness identification and targeted practice
└── Learning path optimization
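Difficulty adjustment can be sketched as a feedback loop on recent accuracy. This is a deliberately simple stand-in for the adaptive engine's real model, with invented thresholds:

```go
package main

import "fmt"

// nextDifficulty nudges a student's difficulty level (1..10) based on
// accuracy over a recent window of exercises. Thresholds are illustrative.
func nextDifficulty(level int, recentAccuracy float64) int {
	switch {
	case recentAccuracy >= 0.85 && level < 10:
		return level + 1 // mastering the level: step up
	case recentAccuracy < 0.50 && level > 1:
		return level - 1 // struggling: step down and rebuild
	default:
		return level // in the productive zone: hold steady
	}
}

func main() {
	fmt.Println(nextDifficulty(5, 0.90)) // steps up to 6
	fmt.Println(nextDifficulty(5, 0.40)) // steps down to 4
	fmt.Println(nextDifficulty(5, 0.70)) // holds at 5
}
```

The production engine layered recommendations and weakness detection on top, but the core loop is this: observe performance, adjust, repeat.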
Assessment System:
├── Large-scale exam delivery (500K simultaneous)
├── Anti-cheating measures
├── Instant grading and feedback
├── Performance analytics for teachers
└── Question bank management
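At its core, "instant grading" is a submission scored against an answer key the moment it arrives, with wrong answers feeding targeted practice. A minimal multiple-choice sketch (question IDs and keys are invented):

```go
package main

import "fmt"

// gradeMCQ scores a multiple-choice submission against an answer key
// and collects the IDs of missed questions for targeted practice.
func gradeMCQ(key, submission map[string]string) (score int, wrong []string) {
	for qID, correct := range key {
		if submission[qID] == correct {
			score++
		} else {
			wrong = append(wrong, qID)
		}
	}
	return score, wrong
}

func main() {
	key := map[string]string{"q1": "B", "q2": "D", "q3": "A"}
	sub := map[string]string{"q1": "B", "q2": "C", "q3": "A"}
	score, wrong := gradeMCQ(key, sub)
	fmt.Println(score, len(wrong)) // 2 correct, 1 missed
}
```

Running this per submission is cheap; the hard part at 500K simultaneous exams is ingesting submissions reliably, which is where the event-driven backbone comes in.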
Results and Impact
Technical Metrics
| Metric | Before (2016) | After (2024) | Improvement |
|---|---|---|---|
| Peak concurrent users | 5,000 | 2,000,000+ | 400x |
| System availability | 94% | 99.95% | 120x less downtime |
| Page load time | 6 seconds | 1.2 seconds | 5x faster |
| Video start time | 8 seconds | 0.8 seconds | 10x faster |
| API response time (p95) | 2.5 seconds | 200ms | 12x faster |
Business Impact
Growth Metrics:
├── User base: 100K → 22M (220x growth)
├── Content library: 100K → 1M+ items
├── Live classes delivered: 10K/month → 500K/month
├── Revenue growth: 15x over partnership period
└── Market position: #1 EdTech in Indonesia
Student Outcomes:
├── 10M+ students prepared for national exams
├── 85% of users report improved grades
├── 2M+ scholarship assessments processed
└── 500K+ hours of live instruction delivered
COVID-19 Response
When schools closed in March 2020, Ruangguru had to scale overnight:
March 2020 Scaling Event:
Before (Feb 2020):
├── 200K daily active users
└── 50K peak concurrent
After (April 2020):
├── 2M daily active users (10x)
├── 400K peak concurrent (8x)
└── Constraint: two weeks to reach that scale
Our Response:
├── Emergency capacity planning (48 hours)
├── Additional infrastructure provisioning (72 hours)
├── Performance optimization sprint
├── Free tier launch for all Indonesian students
└── Result: Zero downtime during transition
Lessons from the Partnership
What Worked
- Embedded team model: Our engineers worked alongside Ruangguru's team, building internal capability
- Incremental migration: Moved to microservices piece by piece, not big bang
- Load testing obsession: Tested at 3x expected peak before every major event
- Local optimization: Indonesia-specific CDN and infrastructure choices
- Knowledge transfer: Documented everything, trained internal team
Challenges Overcome
- Indonesia's infrastructure variety: Users on everything from 5G to 2G connections
- Peak load unpredictability: Viral content could 10x traffic in hours
- Regulatory compliance: Data localization and content requirements
- Rapid feature demands: Business moved faster than typical enterprise
Technology Stack
| Layer | Technology | Why We Chose It |
|---|---|---|
| Container orchestration | Kubernetes (GKE) | Managed, auto-scaling, reliable |
| Backend services | Go, Node.js | Performance + developer productivity |
| Databases | PostgreSQL, MongoDB, Redis | Right tool for each data type |
| Message queue | Apache Kafka | High throughput, durability |
| Video delivery | Multi-CDN (Akamai, Cloudflare, local) | Redundancy + regional performance |
| Real-time | WebSocket + custom signaling | Low latency for live classes |
| Analytics | ClickHouse, Apache Spark | Fast queries on large datasets |
Partnership Evolution
Engagement Model Over Time:
2016–2017: Foundation & Stabilization
├── Infrastructure assessment and rebuild
├── CDN strategy for Indonesia
└── Monolith → first modular services
2018–2019: Embedded team
├── Engineers embedded in Ruangguru's team
├── Microservices migration (piece by piece)
└── Knowledge transfer and internal capability build
2020–2021: Scale for COVID-19
├── Emergency capacity response (March 2020)
├── 10x traffic in 2 weeks — zero downtime
└── Live class platform for 100K+ concurrent viewers
2022–Present: Strategic advisory
├── Architecture reviews for new product lines
├── Scaling guidance as they expand across Southeast Asia
├── Ongoing support relationship
└── Ruangguru's internal team handles day-to-day
Conclusion
Ruangguru's journey from 100,000 to 22 million students — which we've been part of since 2016 — demonstrates what's possible when technology scales with business ambition. The keys to success were pragmatic architecture decisions, obsessive focus on reliability, and a partnership model that built lasting capability.
Today, Ruangguru's internal team handles most development, exactly as planned. Our ongoing role is supporting their continued growth and tackling new technical challenges as they expand across Southeast Asia.
Building an EdTech platform or scaling an existing one? Contact us to discuss how we can help.