The LMS industry was built for access, not for accountability

Written by CYPHER Learning | Jun 10, 2026 6:47:12 AM

And 275 million students just paid the price

When ShinyHunters claimed responsibility for the Canvas breach, they said they had taken 3.65 terabytes of data. Names. Email addresses. Student IDs. Messages between users.

Most of the coverage treated that number as a measure of damage. Evidence of how bad the breach was. It's worth treating it as something else entirely: evidence of a design philosophy.

Data belonging to thousands of institutions was held as a single whole, retrievable across institutional boundaries, rather than partitioned so that one institution's exposure ends where another's begins.

The breach didn't create that amalgamation of data. It just exposed it.

How did the LMS become a data hoarder?

The early learning management system was a gradebook with a website attached. Over two decades, it became something far more ambitious: a central hub for course delivery, communication, assessment, attendance, engagement tracking, integration with institutional systems, and analytics. Each new capability required new data. Each new data point was stored because storage was cheap and deleting data felt like a risk.

The implicit design philosophy was more is better: collect everything that might be useful, store it indefinitely, and figure out later what you actually need. This wasn't malicious. It was how enterprise software was built for a generation. The assumption was that more data meant more value, for the platform, for the institution, for the student.

What that philosophy produced, at scale, was a target.

A monolithic LMS serving thousands of institutions doesn't store 9,000 separate, bounded datasets. It stores one large, interconnected corpus where a single set of credentials (or a single compromised integration) can reach across institutional boundaries. The attacker who gets in doesn't find one school's records. They find everyone's.

That is not a security failure. It is a predictable consequence of building a platform that was never designed to limit its own exposure.

The question behind the question

Ask why a learning management system needs to store student messages in a retrievable database, and you run into a revealing silence.

Messages between students and instructors serve an educational function. That function requires the ability to send and receive them, but not necessarily to archive them indefinitely in a centrally accessible data store. The decision to store them was an architectural choice, one that prioritized features (message history, audit trails, analytics) over the question of who can access the data, and from where. It's a reasonable choice in isolation. At the scale of 275 million users, it becomes a liability that dwarfs whatever value the archive provided.

The same question applies to every data category in that 3.65 terabytes.

Student IDs: necessary for management, but should a single compromised credential make them retrievable across thousands of accounts?
Activity logs: valuable for learning analytics, but are they partitioned to the organization that generated them, or pooled in a shared layer?
Custom fields: useful for institutional customization, but does each organization’s data stay within its own boundary, or does it flow into the same retrievable whole as everyone else's?
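To make the partitioning question concrete, here is a minimal sketch of a store in which every credential is bound to one organization and every read is checked against that boundary before any lookup happens. The names (Credential, PartitionedStore, CrossTenantAccessError) are hypothetical and do not describe any real platform's internals; an in-memory dictionary stands in for whatever storage a real system would use.

```python
from dataclasses import dataclass, field


class CrossTenantAccessError(PermissionError):
    """Raised when a credential tries to read outside its own organization."""


@dataclass(frozen=True)
class Credential:
    # Hypothetical credential: bound to exactly one organization.
    user_id: str
    org_id: str


@dataclass
class PartitionedStore:
    # One isolated mapping per organization, rather than one pooled table.
    _partitions: dict = field(default_factory=dict)

    def put(self, cred: Credential, record_id: str, record: dict) -> None:
        # Writes always land in the caller's own partition.
        self._partitions.setdefault(cred.org_id, {})[record_id] = record

    def get(self, cred: Credential, org_id: str, record_id: str) -> dict:
        # The boundary check happens before any lookup.
        if org_id != cred.org_id:
            raise CrossTenantAccessError(f"{cred.user_id} cannot read {org_id}")
        return self._partitions[org_id][record_id]

    def list_records(self, cred: Credential) -> list:
        # Listing is scoped to the caller's partition only.
        return sorted(self._partitions.get(cred.org_id, {}))
```

In this sketch a compromised credential for one organization can list and read only that organization's records. A request for another organization's data fails at the boundary check instead of returning rows.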

Privacy by design, the principle codified in GDPR's Article 25 and articulated by privacy scholars long before regulation caught up, starts by asking these questions before building the system, not after something goes wrong. It treats data minimization not as a compliance burden but as a design constraint: collect only what you need, retain it only as long as you need it, and separate data categories so that a breach of one does not expose all.

What a privacy-by-design learning platform actually looks like

The concept is less exotic than it sounds. It comes down to a handful of architectural commitments that a legacy LMS cannot make because they would require dismantling what the system is built on.

Data minimization as a default, not an option. A privacy-by-design platform collects the data required to deliver the educational service, such as enrollment records, course progress, assessment results, and account credentials. It treats anything beyond that as requiring explicit justification. It does not accumulate a communication archive because communication archiving was easy to build.
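As an illustration only, minimization by default can be expressed as an allow-list applied at the point of collection. The field names and the minimize helper below are assumptions made for this sketch, loosely following the categories named above; they are not a real schema.

```python
# Hypothetical allow-list: the fields needed to deliver the educational service.
ALLOWED_FIELDS = frozenset({
    "student_id",
    "enrollment",
    "course_progress",
    "assessment_results",
})


def minimize(payload: dict, justified_extras: frozenset = frozenset()) -> dict:
    """Keep only allow-listed fields, plus any extras with explicit justification."""
    permitted = ALLOWED_FIELDS | justified_extras
    # Anything not permitted is dropped before it is ever stored.
    return {key: value for key, value in payload.items() if key in permitted}
```

The design choice is that dropping is the default. A field outside the allow-list is stored only when a caller passes it in justified_extras, which makes each exception visible in code review.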

Retention limits with teeth. Data retained indefinitely is data that exists to be stolen. A privacy-by-design platform defines retention periods tied to the educational relationship between the student and the institution and deletes data when that relationship, or any obligation to keep the data, ends, not when it gets around to it.
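A retention rule of this kind might be sketched as follows. The one-year window, the StudentRecord shape, and the function names are assumptions chosen for illustration; real retention periods depend on the institution's obligations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed retention window after the educational relationship ends.
RETENTION_AFTER_END = timedelta(days=365)


@dataclass
class StudentRecord:
    student_id: str
    # None while the student is still enrolled.
    relationship_ended: Optional[date]


def is_expired(record: StudentRecord, today: date) -> bool:
    """A record expires once the retention window after the end date has passed."""
    if record.relationship_ended is None:
        return False
    return today > record.relationship_ended + RETENTION_AFTER_END


def purge(records: list, today: date) -> list:
    """Return only the records that are still within their retention period."""
    return [record for record in records if not is_expired(record, today)]
```

Run on a schedule, a purge like this makes deletion the result of a rule rather than a manual cleanup someone has to remember.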

Prohibition on high-risk data categories. A platform designed with privacy in mind builds contractual and technical barriers against the ingestion of data it has no business holding: government identifiers, health information, financial credentials, sensitive personal categories. These belong in specialized systems with specialized protections, not in a general-purpose learning platform.
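One simple form of technical barrier is to reject, at ingestion, any payload that carries a prohibited field. The field names and the validate_ingest function here are hypothetical. A name-based check like this is only one layer and would not catch sensitive values stored under an innocuous field name.

```python
# Hypothetical names for categories the platform refuses to hold.
PROHIBITED_FIELDS = frozenset({
    "government_id",
    "health_information",
    "financial_credentials",
})


class ProhibitedDataError(ValueError):
    """Raised when a payload contains a field the platform refuses to ingest."""


def validate_ingest(payload: dict) -> dict:
    """Reject the whole payload if it carries any prohibited field."""
    found = sorted(PROHIBITED_FIELDS.intersection(payload))
    if found:
        raise ProhibitedDataError("prohibited fields: " + ", ".join(found))
    return payload
```

Rejecting the whole payload, rather than silently stripping the field, tells the sending system that this data belongs somewhere else.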

Organizational data sovereignty. The organization determines what data is collected, not the learning platform. The organization controls its own data environment. A vendor operating as a data processor (executing the organization’s instructions rather than making independent decisions about data use) cannot accumulate a multi-tenant dataset of the kind that made the Canvas breach possible.

None of this makes a platform immune to attack. Determined, sophisticated threat actors find ways into systems that are carefully designed and vigilantly maintained. The question privacy by design addresses isn't whether a breach can happen. It's what a breach finds when it gets in.

A platform that collects minimally, separates data categories, expires what it no longer needs, and keeps organizational data under organizational control is a fundamentally different target than one that spent two decades accumulating everything it could.

3.65 terabytes got there one design decision at a time. Privacy by design is the discipline of making different decisions before the breach, not after.

CYPHER Learning was built on the principle that the platform serves the institution. Not the other way around. Learn more about our data protection commitments.
