By Ezhilarasan
Published: February 2026 | Updated: March 2026 | Reading Time: 6 minutes

How Ruangguru Scaled to 22M Students: Tech Architecture Deep Dive

Ruangguru by the Numbers

22M+
Active Students
65K+
Video Lessons
34
Provinces in Indonesia
100x
Scale Achieved

From 100K to 22M users — this is how the technology scaled.

Ruangguru grew from a startup to Indonesia's largest education technology platform, serving 22+ million students. This is the story of how we partnered with them in 2016 — when they had 100,000 users — and helped build technology that scaled to 22 million.

The Challenge: Scaling Education Technology

When Ruangguru first engaged us in 2016, they were a fast-growing EdTech startup with 100,000 registered students and big ambitions. Their infrastructure was adequate at that scale, but it was not designed for the 100x growth they were targeting.

Initial State (2016)

  • Users: ~100,000 registered students (early-stage growth)
  • Content: 10,000+ learning videos
  • Peak load: 5,000 concurrent users
  • Issues: Infrastructure not built for scale, monolithic architecture, no CDN strategy

The Growth Trajectory


User Growth:
2016: 100K registered users (partnership start)
2017: 500K users
2018: 1M users

Concurrent User Peaks:
├── Normal day: 200,000 concurrent
├── Exam prep season: 800,000 concurrent
├── National exam day: 2M+ concurrent
└── Challenge: 10x swing between a normal day and exam day

Technical Partnership Approach

Our engagement evolved through several phases as Ruangguru's needs changed:

Phase 1: Stabilization (2016 — 3 months)

Before we could scale, we had to stabilize.


Initial Issues Identified:
├── Database bottleneck (single PostgreSQL instance)
├── Video delivery (origin server overloaded)
├── Session management (in-memory, not distributed)
├── No auto-scaling (manual capacity management)
└── Limited monitoring (reactive, not proactive)

Immediate Actions:
├── Database read replicas + connection pooling
├── CDN implementation for video content
├── Distributed session management (Redis)
├── Auto-scaling configuration
└── Comprehensive monitoring setup

Results (30 days):
├── 99.5% uptime (from 94%)
├── Page load time: 6s → 2.1s
├── Video start time: 8s → 1.5s
└── Zero exam-period outages
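The first two fixes, read replicas with connection pooling, boil down to sending reads and writes to different places. A minimal Python sketch of that routing layer, with hypothetical DSNs standing in for real pooled connections (this is illustrative, not Ruangguru's production code):

```python
import itertools

class ReadWriteRouter:
    """Route read queries to replicas (round-robin) and writes to the primary.

    A stand-in for the pooling layer; real code would hold pooled
    connections (e.g. via pgbouncer or a driver-level pool) rather
    than plain DSN strings.
    """

    def __init__(self, primary_dsn, replica_dsns):
        self.primary = primary_dsn
        self._replicas = itertools.cycle(replica_dsns)

    def target_for(self, sql):
        # Writes, and anything ambiguous, go to the primary.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word in ("SELECT", "SHOW", "EXPLAIN"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("postgres://primary", ["postgres://r1", "postgres://r2"])
print(router.target_for("SELECT * FROM students"))  # one of the replicas
print(router.target_for("UPDATE students SET plan = 'premium'"))  # postgres://primary
```

The round-robin cycle spreads read load evenly; a production router would also track replica lag and health before handing out a connection.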

Phase 2: Architecture Evolution (2017 — 6 months)

With stability achieved, we rebuilt for scale.


Architecture Transformation:

Before (Monolithic):
┌─────────────────────────────────────────────────────────────────────┐
│                     Single Application                               │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │  Auth + Content + Video + Payment + Analytics + Admin           ││
│  └─────────────────────────────────────────────────────────────────┘│
│                              │                                       │
│                              ▼                                       │
│                    Single Database                                   │
└─────────────────────────────────────────────────────────────────────┘

After (Microservices):
┌─────────────────────────────────────────────────────────────────────┐
│                      API Gateway                                     │
├─────────────────────────────────────────────────────────────────────┤
│  ┌──────┐ ┌───────┐ ┌──────┐ ┌───────┐ ┌─────────┐ ┌─────────────┐│
│  │ Auth │ │Content│ │Video │ │Payment│ │Analytics│ │ Live Class  ││
│  └──────┘ └───────┘ └──────┘ └───────┘ └─────────┘ └─────────────┘│
│     │         │        │         │          │             │         │
│     ▼         ▼        ▼         ▼          ▼             ▼         │
│  ┌──────┐ ┌───────┐ ┌──────┐ ┌───────┐ ┌─────────┐ ┌─────────────┐│
│  │UserDB│ │Content│ │Video │ │Payment│ │Analytics│ │ Real-time   ││
│  │      │ │  DB   │ │  CDN │ │  DB   │ │  Store  │ │   Infra     ││
│  └──────┘ └───────┘ └──────┘ └───────┘ └─────────┘ └─────────────┘│
└─────────────────────────────────────────────────────────────────────┘
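One way to picture the API Gateway at the top of that diagram is as a longest-prefix routing table. The service names come from the diagram; the path prefixes themselves are illustrative assumptions:

```python
# Longest-prefix routing table for the API gateway (prefixes assumed).
ROUTES = {
    "/auth": "auth-service",
    "/content": "content-service",
    "/video": "video-service",
    "/payment": "payment-service",
    "/analytics": "analytics-service",
    "/live": "live-class-service",
}

def route(path):
    """Return the backing service for a request path, or None if unmatched."""
    # Longest matching prefix wins, so nested paths resolve correctly.
    best = max(
        (p for p in ROUTES if path == p or path.startswith(p + "/")),
        key=len,
        default=None,
    )
    return ROUTES[best] if best else None

print(route("/video/lessons/42/stream"))  # video-service
```

A real gateway also handles authentication, rate limiting, and retries at this layer, which is what lets each service behind it stay small.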

Key Technical Decisions

| Decision | Rationale | Result |
| --- | --- | --- |
| Kubernetes for orchestration | Auto-scaling, self-healing, consistent deployment | Can scale to 10x in minutes |
| Multi-CDN strategy | Redundancy + regional optimization for Indonesia | 99.9% video availability |
| Event-driven architecture | Decouple services, handle spikes | 2M+ events/second capacity |
| Separate read/write paths | Optimize for different access patterns | 10x read throughput |
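The "scale to 10x in minutes" result rests on Kubernetes' Horizontal Pod Autoscaler, whose documented scaling rule is desired = ceil(current x currentMetric / targetMetric). A sketch with illustrative pod counts and CPU targets:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=200):
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to the configured min/max bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))

# Exam-day spike: CPU at 90% against a 60% target, currently 40 pods.
print(desired_replicas(40, 90, 60))  # 60
```

The min/max bounds matter as much as the formula: they cap runaway scale-ups during viral traffic and keep a warm floor of capacity for the next spike.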

Phase 3: Feature Development (2018–Ongoing)

Beyond infrastructure, we built new capabilities:


Features Developed:

Live Learning Platform:
├── Real-time video streaming (100K+ concurrent viewers)
├── Interactive Q&A during sessions
├── Whiteboard collaboration
├── Recording and playback
└── Bandwidth adaptation for varied connections
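The bandwidth adaptation item above, important on Indonesia's mix of 2G-to-5G connections, comes down to picking the highest bitrate rung that fits measured throughput with a safety margin. The ladder values and headroom factor here are illustrative assumptions:

```python
# Bitrate ladder in kbps (illustrative rungs, from 2G-class links to broadband).
LADDER = [144, 360, 720, 1500, 3000]

def pick_bitrate(measured_kbps, headroom=0.8):
    """Pick the highest ladder rung that fits within a safety margin
    of measured throughput, falling back to the lowest rung."""
    budget = measured_kbps * headroom
    fitting = [b for b in LADDER if b <= budget]
    return max(fitting) if fitting else LADDER[0]

print(pick_bitrate(5000))  # 3000
print(pick_bitrate(200))   # 144
```

Real players re-run this selection continuously against a moving throughput estimate, stepping down quickly on congestion and up cautiously.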

Adaptive Learning Engine:
├── Student performance tracking
├── Personalized content recommendations
├── Difficulty adjustment based on progress
├── Weakness identification and targeted practice
└── Learning path optimization
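Difficulty adjustment based on progress can be sketched as a rolling-accuracy rule. This is a deliberately simplified stand-in with assumed thresholds; a production adaptive engine would use a proper ability model such as item response theory:

```python
from collections import deque

class DifficultyAdjuster:
    """Nudge question difficulty based on a rolling accuracy window.

    Levels run 1 (easiest) to 5 (hardest); window size and
    thresholds are illustrative.
    """

    def __init__(self, level=3, window=10):
        self.level = level
        self.results = deque(maxlen=window)

    def record(self, correct):
        """Record one answer (True/False) and return the new level."""
        self.results.append(correct)
        if len(self.results) < self.results.maxlen:
            return self.level  # not enough signal yet
        accuracy = sum(self.results) / len(self.results)
        if accuracy > 0.85:
            self.level = min(5, self.level + 1)
        elif accuracy < 0.5:
            self.level = max(1, self.level - 1)
        return self.level
```

Keeping the adjustment bounded to one step per window avoids whiplash when a student has one bad session.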

Assessment System:
├── Large-scale exam delivery (500K simultaneous)
├── Anti-cheating measures
├── Instant grading and feedback
├── Performance analytics for teachers
└── Question bank management
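Instant grading for multiple-choice items is the simplest piece of the system above. A minimal sketch, with a hypothetical answer-key format:

```python
def grade(answer_key, submission):
    """Score a multiple-choice submission against the key.

    Returns (score_percent, list of wrongly answered question ids).
    Unanswered questions count as wrong.
    """
    wrong = [q for q, correct in answer_key.items()
             if submission.get(q) != correct]
    score = 100 * (len(answer_key) - len(wrong)) / len(answer_key)
    return round(score, 1), wrong

key = {"q1": "B", "q2": "D", "q3": "A", "q4": "C"}
print(grade(key, {"q1": "B", "q2": "D", "q3": "C"}))  # (50.0, ['q3', 'q4'])
```

At 500K simultaneous test-takers the hard part is not this function but fanning it out: grading runs as stateless workers so it scales horizontally with the exam load.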

Results and Impact

Technical Metrics

| Metric | Before (2016) | After (2024) | Improvement |
| --- | --- | --- | --- |
| Peak concurrent users | 5,000 | 2,000,000+ | 400x |
| System availability | 94% | 99.95% | ~120x less downtime |
| Page load time | 6 seconds | 1.2 seconds | 5x faster |
| Video start time | 8 seconds | 0.8 seconds | 10x faster |
| API response time (p95) | 2.5 seconds | 200ms | ~12x faster |
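The latency rows in that table are percentile figures. As a refresher, p95 can be read off raw samples with a nearest-rank computation; this is a simplified sketch, not how a production metrics pipeline stores histograms:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at
    least `pct` percent of samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Ten raw response times in milliseconds, one slow outlier:
latencies_ms = [120, 150, 180, 200, 210, 190, 175, 160, 900, 140]
print(percentile(latencies_ms, 95))  # 900
print(percentile(latencies_ms, 50))  # 175
```

This is also why p95 is the number worth tracking: the median hides the slow tail that students on weak connections actually experience.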

Business Impact


Growth Metrics:
├── User base: 100K → 22M over the partnership (220x growth)
├── Content library: 100K → 1M+ items
├── Live classes delivered: 10K/month → 500K/month
├── Revenue growth: 15x over partnership period
└── Market position: #1 EdTech in Indonesia

Student Outcomes:
├── 10M+ students prepared for national exams
├── 85% of users report improved grades
├── 2M+ scholarship assessments processed
└── 500K+ hours of live instruction delivered

COVID-19 Response

When schools closed in March 2020, Ruangguru had to scale overnight:


March 2020 Scaling Event:

Before (Feb 2020):
├── 200K daily active users
└── 50K peak concurrent

After (April 2020):
├── 2M daily active users (10x)
├── 400K peak concurrent (8x)
└── Timeline: scaled within 2 weeks

Our Response:
├── Emergency capacity planning (48 hours)
├── Additional infrastructure provisioning (72 hours)
├── Performance optimization sprint
├── Free tier launch for all Indonesian students
└── Result: Zero downtime during transition
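Emergency capacity planning of that kind starts with back-of-the-envelope math: projected peak concurrency, per-instance capacity, plus headroom. All figures below are illustrative, not the actual 2020 numbers:

```python
import math

def instances_needed(peak_concurrent, sessions_per_instance, headroom=0.25):
    """Back-of-the-envelope capacity plan: instances required for a
    projected peak plus a 25% headroom buffer (both assumed here)."""
    return math.ceil(peak_concurrent * (1 + headroom) / sessions_per_instance)

# Projected 400K concurrent at ~2,000 sessions per instance:
print(instances_needed(400_000, 2_000))  # 250
```

The headroom term is what turns a forecast into a plan: it buys time to provision more capacity if the projection turns out to be low.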

Lessons from the Partnership

What Worked

  • Embedded team model: Our engineers worked alongside Ruangguru's team, building internal capability
  • Incremental migration: Moved to microservices piece by piece, not big bang
  • Load testing obsession: Tested at 3x expected peak before every major event
  • Local optimization: Indonesia-specific CDN and infrastructure choices
  • Knowledge transfer: Documented everything, trained internal team

Challenges Overcome

  • Indonesia's infrastructure variety: Users on everything from 5G to 2G connections
  • Peak load unpredictability: Viral content could 10x traffic in hours
  • Regulatory compliance: Data localization and content requirements
  • Rapid feature demands: Business moved faster than typical enterprise

Technology Stack

| Layer | Technology | Why We Chose It |
| --- | --- | --- |
| Container orchestration | Kubernetes (GKE) | Managed, auto-scaling, reliable |
| Backend services | Go, Node.js | Performance + developer productivity |
| Databases | PostgreSQL, MongoDB, Redis | Right tool for each data type |
| Message queue | Apache Kafka | High throughput, durability |
| Video delivery | Multi-CDN (Akamai, Cloudflare, local) | Redundancy + regional performance |
| Real-time | WebSocket + custom signaling | Low latency for live classes |
| Analytics | ClickHouse, Apache Spark | Fast queries on large datasets |
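Kafka absorbs spikes while keeping per-student event ordering, which in practice means partitioning by a stable key. A sketch of that idea (Kafka's own default partitioner hashes keys with murmur2; crc32 here is just an illustrative stand-in):

```python
import zlib

def partition_for(student_id, num_partitions=64):
    """Stable partition assignment keyed on student id, so each
    student's events land in one partition and stay ordered.
    crc32 is a stand-in for Kafka's murmur2 key hashing."""
    return zlib.crc32(student_id.encode()) % num_partitions
```

Because the key is stable, a student's quiz-answer events always share a partition, so consumers can update that student's progress without cross-partition coordination.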

Partnership Evolution


Engagement Model Over Time:

2016–2017: Foundation & Stabilization
├── Infrastructure assessment and rebuild
├── CDN strategy for Indonesia
└── Monolith → first modular services

2018–2019: Embedded team
├── Engineers embedded in Ruangguru's team
├── Microservices migration (piece by piece)
└── Knowledge transfer and internal capability build

2020–2021: Scale for COVID-19
├── Emergency capacity response (March 2020)
├── 10x traffic in 2 weeks — zero downtime
└── Live class platform for 100K+ concurrent viewers

2022–Present: Strategic advisory
├── Architecture reviews for new product lines
├── Scaling guidance as they expand across Southeast Asia
├── Ongoing support relationship
└── Ruangguru's internal team handles day-to-day

Conclusion

Ruangguru's journey from 100,000 to 22 million students — which we've been part of since 2016 — demonstrates what's possible when technology scales with business ambition. The keys to success were pragmatic architecture decisions, obsessive focus on reliability, and a partnership model that built lasting capability.

Today, Ruangguru's internal team handles most development, exactly as planned. Our ongoing role is supporting their continued growth and tackling new technical challenges as they expand across Southeast Asia.

Building an EdTech platform or scaling an existing one? Contact us to discuss how we can help.
