The personal data of about 146,000 IU students and recent graduates, which IU stored in an insecure location for 11 months, was accessed during that period by webcrawlers from Google, a science-specific search engine called Scirus and a Chinese search engine called Baidu, IU official Mark Bruhn said.
According to an IU news release, IU officials notified the Indiana attorney general’s office Tuesday of the potential exposure of personal data, including names and Social Security numbers, for students enrolled across seven IU campuses from 2011 to 2014.
Webcrawlers are automated programs that search engines use to traverse the web, downloading pages and files so they can be indexed.
Similar to Google, Baidu generates revenue by offering online marketing services, according to the company’s website. Scirus has been retired, according to its website.
James Kennedy, associate vice president of university student services and systems, said staff members who access files on the IU system, including the files that were exposed, are normally authenticated through the Central Authentication Service (CAS).
Bruhn, the IU associate vice president of public safety and institutional assurance, said the exposure was discovered by a staff member who accessed the files and realized she had not been asked for a password.
“Those aren’t files we look at every day,” Kennedy said. “She right away saw there wasn’t that layer of security there.”
The files were immediately moved to a different, more secure location, Kennedy said.
Bruhn said the University had logs to track access to the data for all 11 months the files were exposed.
“Logging is so important for that exact reason,” he said. “The logs showed nine or 10 accesses to those files during that time period.”
He said that apart from the accesses by Google, Baidu and Scirus, all of the logged accesses came from department staff members who were authorized to view the files.
Nathaniel Husted, a doctoral student specializing in security informatics in the School of Informatics and Computing, said the University would not be able to tell whether a third party had accessed the exposed data through a search engine’s cache unless the companies that operate the webcrawlers report that the data was accessed.
“But we wouldn’t know without asking them if someone had found that on their search engine and downloaded it,” Husted said.
Bruhn said the University submitted forms to Google and Baidu requesting that the companies remove the files from their caches. He said he was unsure whether the University would receive information about who had accessed those caches.
“There’s more work to be done,” Bruhn said. “We’re still investigating what happened and how.”
IU spokesperson Mark Land said that, as part of normal University safeguards, the files were given names and file extensions that do not indicate the type of data they contain.
Husted said this could make the information harder to find.
“We can’t just search for ‘Social Security numbers’ or ‘IU financial data’ and have Google bring it up,” he said. “If someone got lucky and typed in some information that had shown up within the file, it’s potential that those would show up.”
Kennedy said the files were saved in a zipped folder.
If the files were in a zipped folder, Husted said, the information would look garbled to a webcrawler.
Husted said the files would have to be unzipped before the information could be added to a search index and before individuals could see the contents.
Some zipped files also require a password before they can be unzipped. Husted said password protection would reduce the threat, because a strong password would make the file difficult to open.
“If a webcrawler has accessed it, it means someone downloaded it,” Husted said.
“We’re just assuming that these people are nice people and they’re not going to do anything nefarious with it. But in the end, we have released 146,000 pieces of personal information about students, and that is a problem.”
Kennedy said the University will begin notifying students who may have been affected by the exposure on Friday. A call center staffed with experts will also be available to answer students’ questions by Friday morning.
“We’re deeply concerned about student information,” he said.
Bruhn said there was no evidence in the University’s logs that an individual viewed the files on the University’s site and downloaded them.
However, Husted said it would not be possible for the University to control what happens to the data that was exposed and cached.
“In some ways, this is just as problematic as if someone stole the laptop with 146,000 names on it,” he said. “The point being, it has gotten out and into the world and like Pandora’s Box, you can’t really put everything back in after it’s been opened up ... we just have to hope nobody gets ahold of them that will use them improperly.”
Follow reporter Tori Fater on Twitter @vrfater.