By Emachalan
Published: April 2026|Updated: April 2026|Reading Time: 9 minutes


How Ruangguru Scaled to 22M Students: The Tech Stack

Published: April 3, 2026 | Reading Time: 14 minutes 

About the Author

Emachalan is a Full-Stack Developer specializing in MEAN & MERN Stack, focused on building scalable web and mobile applications with clean, user-centric code.

Key Takeaways

  • Scaling EdTech requires stabilization before growth — we spent the first 3 months fixing reliability before adding a single new feature.
  • Monolith-to-microservices migration works best incrementally — migrating piece-by-piece, not in a "big bang," was the key to zero-downtime migration.
  • Load testing at 3× expected peak before every major event was the single most important reliability practice across the entire 8-year partnership.
  • Embedded team model builds lasting capability — today Ruangguru's internal team handles day-to-day development, exactly as planned from day one.
  • COVID-19 proved the architecture worked — 10× traffic in 2 weeks with zero downtime validated every infrastructure decision made since 2016.
  • Indonesia's infrastructure variety (users on everything from 5G to 2G) required locale-specific CDN and bandwidth adaptation decisions that global defaults could not provide.
  • A well-architected event-driven platform can handle 2M+ concurrent users with peak load variations of 40× — if designed for it from the start.

Ruangguru by the Numbers

Metric | Value
Active Students | 22 Million+
Video Lessons | 65,000+
Provinces in Indonesia | 34
Scale Achieved | 100×

From 100K to 22M users — this is how the technology scaled.

Ruangguru grew from a startup to Indonesia's largest education technology platform, serving 22+ million students. This is the story of how we partnered with them in 2016 — when they had 100,000 users — and helped build technology that scaled to 22 million.

Learn how AgileSoftLabs architects and builds enterprise-grade platforms for education, healthcare, logistics, and e-commerce — from early-stage startups to national-scale deployments.

The Challenge: Scaling Education Technology

When Ruangguru first engaged us in 2016, they were a fast-growing EdTech startup with 100,000 registered students and big ambitions. Their infrastructure was adequate for that scale but was not designed for the 100× growth they were targeting.

Initial State (2016)

Dimension | Status
Users | ~100,000 registered students (early-stage growth)
Content | 10,000+ learning videos
Peak load | 5,000 concurrent users
Issues | Infrastructure not built for scale, monolithic architecture, no CDN strategy

The Growth Trajectory

Explore AgileSoftLabs Education Platform Solutions — including Education Management and AI-Powered Academic Program Management Software — built on the same scalable architecture principles applied at Ruangguru.

Technical Partnership Approach

Our engagement evolved through several phases as Ruangguru's needs changed:

Phase 1: Stabilization (2016 — 3 Months)

Before we could scale, we had to stabilize.

Initial Issues Identified:
  • Database bottleneck (single PostgreSQL instance)
  • Video delivery (origin server overloaded)
  • Session management (in-memory, not distributed)
  • No auto-scaling (manual capacity management)
  • Limited monitoring (reactive, not proactive)

Immediate Actions:

  • Database read replicas + connection pooling
  • CDN implementation for video content
  • Distributed session management (Redis)
  • Auto-scaling configuration
  • Comprehensive monitoring setup
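To make the session change concrete, here is a minimal sketch of moving sessions out of process memory and into a shared store with a time-to-live. It is an illustration under stated assumptions, not Ruangguru's code: `InMemoryTTLStore` is a hypothetical stand-in so the example runs without a server, the `session:` key prefix is invented, and the 30-minute TTL is borrowed from the FAQ below. In a real deployment a Redis client exposing `setex`, `get`, and `delete` would be passed in instead.

```python
import json
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class InMemoryTTLStore:
    """Hypothetical stand-in for a Redis client (setex/get/delete with expiry)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class SessionManager:
    """Keeps session state in a shared store so any app server can serve any user."""

    def __init__(self, store, ttl_seconds: int = 1800):
        self._store = store
        self._ttl = ttl_seconds

    def create(self, user_id: str) -> str:
        session_id = uuid.uuid4().hex
        payload = json.dumps({"user_id": user_id})
        self._store.setex(f"session:{session_id}", self._ttl, payload)
        return session_id

    def load(self, session_id: str) -> Optional[dict]:
        key = f"session:{session_id}"
        raw = self._store.get(key)
        if raw is None:
            return None
        # Sliding expiry: every successful read extends the session.
        self._store.setex(key, self._ttl, raw)
        return json.loads(raw)
```

Because the manager only depends on three store methods, the same code can be pointed at a local fake in tests and at a shared cache in production.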

Results (30 days):

  • 99.5% uptime (from 94%)
  • Page load time: 6s → 2.1s
  • Video start time: 8s → 1.5s
  • Zero exam-period outages

See how AgileSoftLabs Cloud Development Services stabilize infrastructure through CDN strategy, auto-scaling configuration, and distributed session management — the same interventions that transformed Ruangguru's reliability in 30 days.

Phase 2: Architecture Evolution (2017 — 6 Months)

With stability achieved, we rebuilt for scale.

Architecture Transformation

Before (Monolithic):

After (Microservices):

Key Technical Decisions

Decision | Rationale | Result
Kubernetes for orchestration | Auto-scaling, self-healing, consistent deployment | Can scale to 10× in minutes
Multi-CDN strategy | Redundancy + regional optimization for Indonesia | 99.9% video availability
Event-driven architecture | Decouple services, handle spikes | 2M+ events/second capacity
Separate read/write paths | Optimize for different access patterns | 10× read throughput
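As an illustration of the read/write split, the sketch below routes `SELECT` statements to read replicas in round-robin order and sends everything else to the primary. The class name, the connection labels, and the verb-based detection are assumptions made for this example. A production router would also need to handle cases such as `SELECT ... FOR UPDATE`, transactions, and reads that must see a just-committed write.

```python
import itertools
from typing import Iterator, List, Optional


class ReadWriteRouter:
    """Send reads to replicas (round-robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # cycle() loops over the replicas forever; None means no replicas exist.
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replicas) if replicas else None
        )

    def route(self, sql: str) -> str:
        # Use the first keyword of the statement to classify it.
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

With `ReadWriteRouter("primary", ["replica-1", "replica-2"])`, consecutive reads alternate between the two replicas while inserts and updates always reach the primary.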

Explore how AgileSoftLabs Custom Software Development Services approach monolith-to-microservices migration — incremental, low-risk, and designed to build internal team capability throughout the process.

Phase 3: Feature Development (2018–Ongoing)

Beyond infrastructure, we built new capabilities:

Live Learning Platform:

  • Real-time video streaming (100K+ concurrent viewers)
  • Interactive Q&A during sessions
  • Whiteboard collaboration
  • Recording and playback
  • Bandwidth adaptation for varied connections
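Bandwidth adaptation usually comes down to choosing a video rendition that fits the viewer's measured throughput. The sketch below shows one simple way to do that. The bitrate ladder and the 0.8 safety factor are hypothetical values chosen for illustration, not figures from the Ruangguru platform.

```python
from typing import Sequence

# Hypothetical rendition ladder in kilobits per second, lowest to highest.
LADDER_KBPS = (150, 400, 800, 1500, 3000)


def pick_bitrate(
    measured_kbps: float,
    ladder: Sequence[int] = LADDER_KBPS,
    safety: float = 0.8,
) -> int:
    """Return the highest rendition that fits within a safety margin.

    Only a fraction of the measured throughput is budgeted so that short
    dips in bandwidth do not immediately stall playback.
    """
    budget = measured_kbps * safety
    candidates = [b for b in sorted(ladder) if b <= budget]
    # On very slow links nothing fits, so fall back to the lowest rendition.
    return candidates[-1] if candidates else min(ladder)
```

A viewer on a weak connection is always served the lowest rendition rather than being refused, which matters when users range from 5G down to 2G.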

Adaptive Learning Engine:

  • Student performance tracking
  • Personalized content recommendations
  • Difficulty adjustment based on progress
  • Weakness identification and targeted practice
  • Learning path optimization
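One simple way to picture difficulty adjustment based on progress is a rolling accuracy window: raise the level when recent answers are mostly correct, lower it when they are mostly wrong. The sketch below is a toy model under assumed thresholds (window of 5, raise at 80%, lower at 40%, levels 1 to 5). The source does not describe the actual algorithm, so none of these values should be read as the real engine.

```python
from collections import deque


class DifficultyTracker:
    """Adjust a difficulty level from a rolling window of recent answers."""

    def __init__(self, level: int = 3, min_level: int = 1, max_level: int = 5,
                 window: int = 5, raise_at: float = 0.8, lower_at: float = 0.4):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.raise_at = raise_at
        self.lower_at = lower_at
        self._recent = deque(maxlen=window)

    def record(self, correct: bool) -> int:
        """Record one answer and return the (possibly updated) level."""
        self._recent.append(correct)
        # Only decide once a full window of answers is available.
        if len(self._recent) == self._recent.maxlen:
            accuracy = sum(self._recent) / len(self._recent)
            if accuracy >= self.raise_at and self.level < self.max_level:
                self.level += 1
                self._recent.clear()  # start fresh at the new level
            elif accuracy <= self.lower_at and self.level > self.min_level:
                self.level -= 1
                self._recent.clear()
        return self.level
```

Clearing the window after each change prevents a single streak from moving a student several levels at once.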

Assessment System:

  • Large-scale exam delivery (500K simultaneous)
  • Anti-cheating measures
  • Instant grading and feedback
  • Performance analytics for teachers
  • Question bank management

See how AgileSoftLabs AI & Machine Learning Development Services build adaptive learning engines — personalization algorithms, recommendation systems, and real-time performance analytics at scale.

Results and Impact

Technical Metrics

Metric | Before (2016) | After (2024) | Improvement
Peak concurrent users | 5,000 | 2,000,000+ | 400×
System availability | 94% | 99.95% | ~120× less downtime
Page load time | 6 seconds | 1.2 seconds | 5× faster
Video start time | 8 seconds | 0.8 seconds | 10× faster
API response time (p95) | 2.5 seconds | 200ms | 12.5× faster
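The arithmetic behind these rows can be checked with a few lines. The sketch below converts an availability percentage into expected downtime per year and computes before/after ratios. The function names are illustrative, and the year is assumed to have 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Expected unavailable hours per year at a given availability."""
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


def improvement_factor(before: float, after: float) -> float:
    """How many times smaller the 'after' value is than the 'before' value."""
    return before / after
```

At 94% availability the platform would be unavailable for roughly 525.6 hours a year. At 99.95% that falls to roughly 4.4 hours, a reduction of about 120×. The same ratio applied to p95 API latency (2.5 seconds to 200 ms) gives 12.5×.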

Business Impact

I. Growth Metrics:

  • User base: 1M → 28M (28x growth)
  • Content library: 100K → 1M+ items
  • Live classes delivered: 10K/month → 500K/month
  • Revenue growth: 15x over partnership period
  • Market position: #1 EdTech in Indonesia

II. Student Outcomes:

  • 10M+ students prepared for national exams
  • 85% of users report improved grades
  • 2M+ scholarship assessments processed
  • 500K+ hours of live instruction delivered

Review more enterprise-scale technology outcomes in the AgileSoftLabs Case Studies — including platforms across healthcare, logistics, and consumer applications.

COVID-19 Response: 10× Traffic in 2 Weeks — Zero Downtime

When schools closed in March 2020, Ruangguru had to scale overnight:

March 2020 Scaling Event:

Before (Feb 2020):

  • 200K daily active users
  • 50K peak concurrent

After (April 2020):

  • 2M daily active users (10x)
  • 400K peak concurrent (8x)
  • Required: 2-week timeline to scale

Our Response:

  • Emergency capacity planning (48 hours)
  • Additional infrastructure provisioning (72 hours)
  • Performance optimization sprint
  • Free tier launch for all Indonesian students
  • Result: Zero downtime during transition

The COVID-19 response was the ultimate proof-of-concept for every architecture decision made since 2016. The event-driven, Kubernetes-orchestrated, multi-CDN infrastructure absorbed 10× normal traffic with no user-facing outages — a result that would have been impossible on the 2016 monolithic stack.

Lessons from the Partnership

What Worked

  • Embedded team model: Our engineers worked alongside Ruangguru's team, building internal capability
  • Incremental migration: Moved to microservices piece by piece, not a big bang
  • Load testing obsession: Tested at 3x expected peak before every major event
  • Local optimization: Indonesia-specific CDN and infrastructure choices
  • Knowledge transfer: Documented everything, trained internal team
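The "3× expected peak" rule translates directly into a load-test plan. The sketch below derives ramp stages from an expected peak and checks a run against pass criteria. The four-step ramp and the 1% error budget are assumptions for illustration. The 200 ms p95 threshold echoes the API response figure reported above, but using it as a gate is also an assumption.

```python
from typing import List


def load_test_stages(expected_peak: int, multiplier: int = 3, steps: int = 4) -> List[int]:
    """Concurrency levels for a stepped ramp up to multiplier x expected peak."""
    target = expected_peak * multiplier
    # Integer arithmetic keeps the final stage exactly equal to the target.
    return [target * i // steps for i in range(1, steps + 1)]


def meets_slo(p95_ms: float, error_rate: float,
              max_p95_ms: float = 200.0, max_error_rate: float = 0.01) -> bool:
    """True when a test stage stays within the latency and error budgets."""
    return p95_ms <= max_p95_ms and error_rate <= max_error_rate
```

For an expected peak of 400,000 concurrent users, `load_test_stages(400_000)` yields stages of 300,000, 600,000, 900,000, and 1,200,000, so any bottleneck shows up at a known step rather than only at the final target.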

Challenges Overcome

  • Indonesia's infrastructure variety: Users on everything from 5G to 2G connections
  • Peak load unpredictability: Viral content could 10x traffic in hours
  • Regulatory compliance: Data localization and content requirements
  • Rapid feature demands: Business moved faster than typical enterprise

Technology Stack

Layer | Technology | Why We Chose It
Container orchestration | Kubernetes (GKE) | Managed, auto-scaling, reliable
Backend services | Go, Node.js | Performance + developer productivity
Databases | PostgreSQL, MongoDB, Redis | Right tool for each data type
Message queue | Apache Kafka | High throughput, durability
Video delivery | Multi-CDN (Akamai, Cloudflare, local) | Redundancy + regional performance
Real-time | WebSocket + custom signaling | Low latency for live classes
Analytics | ClickHouse, Apache Spark | Fast queries on large datasets

Explore AgileSoftLabs Web Application Development Services — our engineering teams apply the same Go, Node.js, Kubernetes, and Kafka stack principles across enterprise platform builds for global clients.

Partnership Evolution: 8 Years, 4 Phases

Engagement Model Over Time:

Phase | Years | Mode | Primary Deliverable
Foundation & Stabilization | 2016–2017 | Active build | Infrastructure rebuild, CDN, first services
Embedded Team | 2018–2019 | Collaborative | Microservices migration, knowledge transfer
Scale for COVID-19 | 2020–2021 | Emergency + product | 10× scale, live class platform
Strategic Advisory | 2022–Present | Advisory | Architecture review, Southeast Asia expansion

2016–2017: Foundation & Stabilization

  • Infrastructure assessment and rebuild
  • CDN strategy for Indonesia
  • Monolith → first modular services

2018–2019: Embedded Team

  • Engineers embedded in Ruangguru's team
  • Microservices migration (piece by piece)
  • Knowledge transfer and internal capability build

2020–2021: Scale for COVID-19

  • Emergency capacity response (March 2020)
  • 10x traffic in 2 weeks — zero downtime
  • Live class platform for 100K+ concurrent viewers

2022–Present: Strategic Advisory

  • Architecture reviews for new product lines
  • Scaling guidance as they expand across Southeast Asia
  • Ongoing support relationship
  • Ruangguru's internal team handles day-to-day development

Conclusion

Ruangguru's journey from 100,000 to 28 million registered students, which we have been part of since 2016, demonstrates what is possible when technology scales with business ambition. The keys to success were pragmatic architecture decisions, obsessive focus on reliability, and a partnership model that built lasting capability rather than lasting dependency.

Today, Ruangguru's internal team handles most development, exactly as planned from the beginning. Our ongoing role is supporting their continued growth and tackling new technical challenges as they expand across Southeast Asia.

The numbers — 400× peak concurrency, 99.95% availability, 12× API response improvement — are not the story. The story is of 22 million Indonesian students accessing quality education that wasn't previously available to them. The technology made that possible. The partnership made the technology sustainable.

Building an EdTech platform or scaling an existing one? AgileSoftLabs brings the same partnership model and architecture expertise to your platform. Browse our product portfolio, explore our case studies, and contact our team to discuss how we can help.

Frequently Asked Questions (FAQs)

1. How did Ruangguru grow from 100K to 22M students technically?

In 2016 the platform had 100K users on a Node.js monolith serving 10K videos. It migrated to Kubernetes on GKE in 2018 to handle 40× exam spikes. By 2022 it reached peaks of 2M concurrent users through auto-scaling across multi-region clusters with intelligent load distribution.

2. What Kubernetes HPA settings managed Ruangguru's 40x spikes?

The Horizontal Pod Autoscaler targeted 70% CPU utilization and could scale pods 10× within 2 minutes. The Cluster Autoscaler provisioned nodes dynamically. Self-healing automatically replaced the roughly 5% of pods that failed daily during exam seasons.
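For reference, the Kubernetes documentation describes the Horizontal Pod Autoscaler's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that formula with the 70% CPU target mentioned here. The replica bounds are hypothetical, and the real controller also applies stabilization windows and pod-readiness rules that are not modelled.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float = 70.0,
    tolerance: float = 0.1,
    min_replicas: int = 1,      # hypothetical bound
    max_replicas: int = 1000,   # hypothetical bound
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_cpu_pct / target_cpu_pct
    # Small deviations from the target are ignored to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 10 pods averaging 140% of requested CPU, the formula asks for 20 pods. At 72% it stays at 10 because the deviation is inside the tolerance band.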

3. Why migrate 90% of the backend from Node.js to Go microservices?

Go delivered 10× the throughput per instance compared with Node.js, which was constrained by its event loop. Single-binary deployments reduced Docker image layer complexity. Goroutines handled 400K concurrent WebSocket connections efficiently.

4. How does Kafka handle Ruangguru's 2M events/second throughput?

A 12-node Kafka cluster runs with 3× replication across 3 availability zones. Exam analytics and transactional streams are kept separate. Consumer-lag alerts trigger automatic partition rebalancing, keeping end-to-end latency under 100 ms.

5. What multi-CDN routing ensures 99.9% video delivery uptime?

Cloudflare, Akamai, and 3 Indonesian providers are combined with latency-based steering. Dynamic origin failover switches traffic in under 3 seconds. Exam content is pre-cached regionally to prevent origin overload during peaks.
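Latency-based steering with failover can be sketched as "pick the fastest healthy provider". In the example below the provider names, the probe format (a latency in milliseconds, or `None` for a failed health check), and the 500 ms cutoff are all assumptions made for illustration. Real traffic steering typically happens at the DNS or player level.

```python
from typing import Dict, Optional


def pick_cdn(
    latency_ms: Dict[str, Optional[float]],
    max_latency_ms: float = 500.0,
) -> Optional[str]:
    """Return the lowest-latency healthy CDN, or None if all are unhealthy.

    A value of None marks a provider whose health probe failed.
    """
    healthy = {
        name: ms
        for name, ms in latency_ms.items()
        if ms is not None and ms <= max_latency_ms
    }
    if not healthy:
        return None  # caller decides how to fall back, e.g. to the origin
    return min(healthy, key=healthy.get)
```

If the fastest provider starts failing probes, the next call simply returns the runner-up, which is the failover behaviour in miniature.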

6. How does Redis Cluster manage sessions for 22M distributed users?

A 6-node Redis Cluster (3 masters, 3 replicas) distributes keys across hash slots. Sessions use a 30-minute TTL with asynchronous multi-region replication. Jakarta-Singapore reads stay under 50 ms via cross-region read replicas.
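For context on how keys spread across such a cluster: Redis Cluster maps every key to one of 16,384 hash slots by taking a CRC16 (XMODEM variant) of the key modulo 16384, and each master owns a range of slots. If the key contains a non-empty `{...}` hash tag, only the tag is hashed, so related keys land on the same node. The sketch below follows that published rule using Python's `binascii.crc_hqx`. The session key names are hypothetical.

```python
import binascii

HASH_SLOTS = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM checksum.
    return binascii.crc_hqx(raw, 0) % HASH_SLOTS
```

Keys such as `session:{user42}:profile` and `session:{user42}:cart` hash only `user42`, so both resolve to the same slot and can be read together without crossing nodes.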

7. What sharding strategy supports Ruangguru's mixed read/write patterns?

PostgreSQL is sharded by user_id (28M registered users) with a 1:5 read replica ratio. A ClickHouse analytical cluster serves exam reports. Write throughput rose from 2K TPS to 20K TPS after sharding.

8. How was page load reduced from 6s to 2.1s serving 400K DAU?

React 18 micro-frontends use code splitting and critical CSS extraction. Images are optimized with Cloudflare Polish, and responses are cached at the GKE service mesh. TTFB dropped 60% through edge compute and preconnect optimization.

9. What monitoring helped Ruangguru avoid violating its 99.9% SLA?

Prometheus and Grafana scraped 10K metrics per second cluster-wide. Datadog APM traced 90% of microservices. PagerDuty escalated 5xx error rates above 5% to the on-call rotation within 2 minutes.
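The escalation rule above (page when more than 5% of responses are 5xx) can be modelled as a sliding-window error rate. The sketch below is a simplified in-process illustration. The class name is hypothetical, and using the 2-minute figure as the window length is an assumption. In practice such a rule would normally be expressed in the monitoring system's own alerting configuration rather than in application code.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateAlert:
    """Track the share of 5xx responses over a sliding time window."""

    def __init__(self, threshold: float = 0.05, window_seconds: float = 120.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, is_5xx)
        self._errors = 0

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, was_error = self._events.popleft()
            self._errors -= was_error

    def record(self, timestamp: float, status: int) -> None:
        is_error = 500 <= status <= 599
        self._events.append((timestamp, is_error))
        self._errors += is_error
        self._evict(timestamp)

    def should_page(self, now: float) -> bool:
        self._evict(now)
        if not self._events:
            return False
        return self._errors / len(self._events) > self.threshold
```

Keeping a running error count makes each check constant-time, so the rule stays cheap even at high request rates.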

10. How did Ruangguru survive exam-day 2M concurrent surges reliably?

Clusters were pre-scaled to 80% capacity during exam week. Per-user rate limits and progressive circuit breakers contained overload. CDN capacity was provisioned at 3× the forecasted peak. GKE preemptible nodes handled non-critical workloads cost-effectively.
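Per-user rate limiting is commonly implemented with a token bucket, which allows short bursts while capping the sustained request rate. The sketch below is a single-process illustration with hypothetical class names and limits. The source does not say which algorithm was used, and a production limiter shared across many servers would keep its counters in a shared store.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill according to elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerUserLimiter:
    """Keep one independent token bucket per user id."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

Because each user has a separate bucket, one client retrying aggressively during an exam cannot consume the capacity meant for everyone else.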
