Student Privacy at Kiddom (Part 1)

Val Tenyotkin
Published in Kiddom Engineering
3 min read · Aug 15, 2017


Everyone talks about privacy in terms so refined and abstract they’re incomprehensible and thus unimpeachable. Mostly because no one wants to be left typing up a resignation letter after the personal information of hundreds of thousands of users ends up on every torrent bulletin board.

“Technology does not fail people. People fail technology.” — Unknown

Almost without fail, software comes under suspicion not because of its inherent failings, but rather because of the (un)conscious decisions made by the people building and using it. At times reason prevails, but by and large known security holes remain unplugged. Security through obscurity is the term which most aptly describes the counter-penetration measures in place at most organizations.

Kiddom takes a slightly different approach: if it cannot be done securely in the open, it cannot be done.

Part I: The Database

You have a secure(d) database in a VPC, walled off from the WWW and accessible only by the production cluster. One day, a product manager asks for the age distribution of the users, so you oblige. They are impressed. So the requests for more data keep coming. At a certain point, engineering decides that running trivial SQL is not their job, and either hands over read-only access or, worse, a copy of the database on a USB drive. “Here, knock yourselves out!”

“You want leaks? Because that’s how you get leaks.” — Sterling Mallory Archer

Every individual with access to the production database(s), regardless of the restrictions placed on their credentials, is an unacceptable gaping security hole. A company with any inclination to safeguard their users’ data shall minimize the number of entities with access to the production database(s).

But how then, you ask, can we study the use patterns to improve the software? The situation becomes more complicated when third-party services such as Periscope are used. The PR nightmare of an EdTech startup handing over personally identifiable information (PII) on students to a company which has likely never even heard of FERPA is enough of an impetus to not do it at all.

Lessons From FOIA

The U.S. government is required, within reason, to release any document in its possession to anyone requesting it. However, if the document contains sensitive information, that information is blacked out. What remains is cleansed data which gives a decent impression of what transpired sans names, places, and other top secret information.

Product and Data Science

With some frequency, Kiddom’s production database is replicated, cleansed of all PII, and transferred to a different secure database to which, among others, Periscope has access. Names, usernames, emails, security tokens, password hashes, user image keys, IPs, and many more columns are set to NULL. Not all tables are replicated. What remains is an anonymous pile of information: user 238764 submitted assignment 3904913 in class 1039 with a score of 89.3 two days before it was due. Everything the product team needs can be extracted from this sterilized version of the database without increasing the risk of Kiddom earning a spot in the data breach hall of fame.
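The cleansing step described above can be sketched in a few lines. This is a minimal illustration, not Kiddom's actual pipeline: it uses Python's built-in sqlite3 module as a stand-in for the replica, and the table names, column names, and the PII_COLUMNS map are all hypothetical, since the article does not show the real schema.

```python
import sqlite3

# Hypothetical map of PII columns per table; the real schema is not shown.
PII_COLUMNS = {
    "users": ["name", "username", "email", "password_hash", "last_ip"],
}


def scrub(conn: sqlite3.Connection) -> None:
    """Set every listed PII column to NULL. Run on the replica, never on production."""
    for table, columns in PII_COLUMNS.items():
        assignments = ", ".join(f"{col} = NULL" for col in columns)
        conn.execute(f"UPDATE {table} SET {assignments}")
    conn.commit()


def demo() -> list:
    """Build a tiny in-memory 'replica', scrub it, and return what analysts would see."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, username TEXT,"
        " email TEXT, password_hash TEXT, last_ip TEXT)"
    )
    conn.execute(
        "CREATE TABLE submissions (user_id INTEGER, assignment_id INTEGER,"
        " class_id INTEGER, score REAL)"
    )
    conn.execute(
        "INSERT INTO users VALUES (238764, 'Ada Lovelace', 'ada',"
        " 'ada@example.com', 'not-a-real-hash', '203.0.113.7')"
    )
    conn.execute("INSERT INTO submissions VALUES (238764, 3904913, 1039, 89.3)")
    scrub(conn)
    # Non-PII facts survive the scrub; names and emails come back as None.
    return conn.execute(
        "SELECT u.id, u.name, u.email, s.assignment_id, s.class_id, s.score"
        " FROM users u JOIN submissions s ON s.user_id = u.id"
    ).fetchall()
```

Calling demo() returns the submission joined to a user row whose name and email are None, which is the kind of anonymous record the paragraph above describes. Keeping the column list in one place makes it easy to review what is considered PII whenever the schema changes.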

Development and QA

At times it becomes necessary to test our code on live data, which means giving access to a potentially buggy piece of software operated by individuals without extensive security training, thus potentially exposing the data. While security tokens and other auxiliary information can be erased easily enough, our code does not do well with NULLs in place of names and emails in particular. Luckily, anyone who’s ever created a pirate name for themselves knows what to do:

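-- Give every user a random three-part pirate name (MySQL).
UPDATE
    users
SET
    name = CONCAT(
        -- 1 + FLOOR(RAND() * 3) picks an integer index from 1 to 3 for ELT
        ELT(1 + FLOOR(RAND() * 3), 'Jack', 'Roger', 'Henry'), ' ',
        ELT(1 + FLOOR(RAND() * 3), 'Sparrow', 'Blackbeard', 'Morgan'), ' ',
        ELT(1 + FLOOR(RAND() * 3), 'Ph.D.', 'M.D.', 'Esq.')
    );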

The query we use provides quite a bit more entropy, but you get the idea. Emails are a bit trickier because unlike names they serve more than just a cosmetic purpose: users are notified of an assignment due, for instance. With nominal trickery, emails can be anonymized, unique, and useful:

-- Derive a plus-addressed QA email from the generated name (MySQL).
UPDATE
    users
SET
    email = CONCAT('qa+', REPLACE(name, ' ', '_'), '@kiddom.co');

Where name is the result of the previous query.
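The short lists shown above yield only 27 distinct names, so emails derived from the name alone would collide; the article notes that the real query has more entropy. As a rough sketch of one way to guarantee uniqueness regardless of entropy, the Python below appends the user's id to the address. The id suffix and the helper names (pirate_name, qa_email, anonymize) are assumptions made for illustration, not something the original queries do.

```python
import random

FIRST = ["Jack", "Roger", "Henry"]
LAST = ["Sparrow", "Blackbeard", "Morgan"]
TITLE = ["Ph.D.", "M.D.", "Esq."]


def pirate_name(rng: random.Random) -> str:
    """Pick one entry from each list, mirroring the ELT/RAND query."""
    return " ".join(rng.choice(part) for part in (FIRST, LAST, TITLE))


def qa_email(user_id: int, name: str) -> str:
    """Build a plus-addressed email; the id suffix (an assumption) keeps it unique."""
    return f"qa+{name.replace(' ', '_')}_{user_id}@kiddom.co"


def anonymize(user_ids, seed=0):
    """Return {user_id: (fake_name, fake_email)} for the given ids."""
    rng = random.Random(seed)  # seeded so a QA data set can be reproduced
    result = {}
    for uid in user_ids:
        name = pirate_name(rng)
        result[uid] = (name, qa_email(uid, name))
    return result
```

For example, qa_email(7, "Jack Sparrow Esq.") produces qa+Jack_Sparrow_Esq._7@kiddom.co. On mail providers that support plus addressing, such addresses are typically delivered to the base qa mailbox, so notifications can still be checked during testing.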

Postscriptum

This is the first installment of our Privacy Series, where we discuss the steps taken in order to safeguard the information of thousands of users entrusted to us by teachers, parents, principals, and school districts.
