Everything you do online is being used to track you and guess what you’ll want to do next. Should you be scared yet?
The owner of the corner shop has known you for a long time. He
knows what you eat, that you like to drink Italian wine, and that you
usually watch action movies on Sundays. That’s how he can offer you
things that you need, like a new crime thriller, the perfect bottle for
your next party, and reserved bags of your favorite snacks when you
forget to order them.
What
sounds like a pleasant community store from the past happens every day on
the Internet. Our “corner shop owner” is not behind the counter, but
instead runs a successful online business that offers exactly what his
customers need. He might have had to know you personally
30 years ago, but today the business’s computers simply have to analyze
your online browsing habits.
Now
imagine that man in the store is following you around, reminding you to
buy a gift for the party you said you’d be attending on Facebook last
week. Or imagine that he’s seen your status updates about starting a
diet, and starts telling you about the store’s low-fat foods section.
This is pretty much what’s happening online these days. CHIP shows how
advanced Deep Packet Inspection can be used to screen customers so that
online shops can offer exactly what each customer wants. We also give
you the lowdown on how behavioral targeting makes behavior-based
advertisements work.
Online shops collect data en masse
Online
shops such as Amazon swear by one rule: get to know everything about
your customers. The more information a shop has, the more specific its
user profiles will be, and the more effective its advertisements. Thus,
the products a customer has viewed on Amazon influence which others are
displayed. For instance, someone who buys a Wii game console will be
offered accessories for it in the future.
Many
find this invasive, and Amazon has had to face criticism from
individuals, activists, and even the media. German TV host Günther Jauch
famously called out the store after he received a package containing
something he called “erotic”, which had not been meant for him. Since
then, he has constantly received pornographic recommendations. Though
Jauch’s surprise has given rise to plenty of jokes about his supposed
gifting ideas, it also exposes a weakness of this system: Amazon
has no way of knowing that the erotic product was never supposed to
become part of Jauch’s profile.
Amazon
also often displays products that are of no interest to the customer,
which wastes advertising space. In one such example, Amazon displayed two
different types of refill packs for a coffee machine it was selling, in
the advertising module titled “Customers who bought this product also
bought…” The packs did not work with this machine at all! This is
annoying for those who spot what looks like a good offer and buy it
quickly, assuming that it fits.
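How a “Customers who bought this product also bought…” module might arrive at its suggestions can be illustrated with a simple co-purchase count. This is only a minimal sketch and not Amazon’s actual method: the function names and the sample orders are invented for illustration. It also shows the weakness described above, because a single gift purchase ends up in the suggestions just like any other item.

```python
from collections import Counter
from itertools import combinations


def build_co_purchases(orders):
    """Count how often each pair of products appears in the same order."""
    pair_counts = {}
    for order in orders:
        # sorted(set(...)) removes duplicates within one order
        for a, b in combinations(sorted(set(order)), 2):
            pair_counts.setdefault(a, Counter())[b] += 1
            pair_counts.setdefault(b, Counter())[a] += 1
    return pair_counts


def also_bought(pair_counts, product, top_n=3):
    """Return the products most often bought together with `product`."""
    neighbours = pair_counts.get(product, Counter())
    return [item for item, _ in neighbours.most_common(top_n)]


# Hypothetical order history; every product name is made up.
orders = [
    ["wii console", "wii controller"],
    ["wii console", "wii controller", "sports game"],
    ["coffee machine", "refill pack"],
    ["wii console", "erotic gift"],  # a one-off present for someone else
]

counts = build_co_purchases(orders)
suggestions = also_bought(counts, "wii console")
print(suggestions)
```

The counter cannot tell a present from a personal interest, so the gift shows up among the suggestions for everyone who looks at the console.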
Analyzing surfing habits
Behavioral
targeting techniques are an evolution of this idea, and many
marketing professionals consider them a wonder weapon. Behavior-based
advertisement displays take into account where the user comes from,
which websites he has visited previously, and what he has clicked on.
For
a long time, Google’s AdWords service has been displaying
advertisements after detecting keywords on a web page. However, since
March 2009, the search giant has also been offering behavioral targeting
and can display specific advertisements to groups of people. For
instance, if a user has been browsing through a sportswear website for a
football shirt in August, he might be shown ads for another website
with Christmas offers on similar products in December. Google itself describes its technique as using cookies which save tracking information on users' computers.
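The cookie mechanism can be pictured with a small sketch. It is not Google’s implementation: the campaign table, the category names and both functions are assumptions made for this example. The idea is simply that a random ID stored in the cookie ties later visits to the same interest profile.

```python
import uuid
from collections import Counter, defaultdict

# Hypothetical mapping from a site category to an ad campaign.
CAMPAIGNS = {
    "sportswear": "Christmas offers on football shirts",
    "wine": "Italian wine selection",
}

# cookie ID -> how often each category was visited
profiles = defaultdict(Counter)


def record_visit(cookie_id, category):
    """Issue a cookie ID on the first visit and count the category seen."""
    if cookie_id is None:
        cookie_id = uuid.uuid4().hex
    profiles[cookie_id][category] += 1
    return cookie_id


def pick_ad(cookie_id):
    """Choose the campaign matching the visitor's most frequent interest."""
    interests = profiles.get(cookie_id)
    if not interests:
        return "generic ad"
    top_category, _ = interests.most_common(1)[0]
    return CAMPAIGNS.get(top_category, "generic ad")


visitor = record_visit(None, "sportswear")  # August: browsing football shirts
record_visit(visitor, "sportswear")
record_visit(visitor, "wine")

print(pick_ad(visitor))        # December, on a different website
print(pick_ad("no-such-id"))   # cookies blocked or deleted: no profile
```

A surfer who refuses or deletes the cookie appears as a stranger again, which is why blocking cookies works against this kind of targeting.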
The
possibilities available to a shop through behavior-based display are as
endless as the creativity of search and marketing providers. If a
customer only clicks on special offers, the online shop can even
discourage him by directing him to a slow server in the future and
spoiling the fun of bargain shopping. In addition, the dealer can put the
customer at a disadvantage by not displaying advertisements for
special offers. These are shown only to customers the dealer wants to
reward!
Online
shops also apply marketing tips from the real world. For instance, if a
retailer wants to attract only well-to-do customers, leaflets with
attractive offers are only put in mailboxes in upmarket areas with
well-off residents. Similarly, one can use geolocation information
to determine where a surfer is located and recommend specific offers
to him or her. The coordinates obtained through IP address identification on the Internet are not very fine-grained, but modern
cellphones and certain desktop browsers can now supply precise
locations, which can even be used to guess the financial behavioral
pattern of any surfer.
In-depth analysis divulges too much information
Deep Packet Inspection, or DPI for short, is a technological continuation of this personalized advertisement strategy. While
theoretically a surfer can avoid behavioral targeting by not allowing
any cookies, DPI traces a user’s activities on the Internet as if he or
she is under surveillance. In theory, every website that is called up
can be recorded; every mail can be scanned in real time—and with the
help of keywords found in these, an individual profile can be created
through which advertisers can send users specific offers. For instance,
if an advertiser detects a number of messages to a car dealer from a
customer inquiring about certain accessories, advertisements for those
very products can be inserted in advertisement spaces as he or she
browses the Web. However, online shops cannot use DPI by themselves;
they need Internet service providers to offer it, and the providers seem
to be cautious about violating user privacy agreements. Governments will
soon be forced to formulate policies to regulate this practice.
The technology is not new. It is already being used for things like filtering viruses and spam. ISPs
normally look at the IP headers of data packets in transit (in which
the sender and the recipient IP addresses are mentioned), which means
they can easily use the same techniques to search through an entire
packet. This way the provider gets an insight into the actual data that
is being sent and received.
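The difference between looking at the header and looking at the whole packet can be shown in a few lines. This is a simplified sketch under stated assumptions: the packet is built by hand with a plain 20-byte IPv4 header, the text follows the IP header directly (a real packet would carry a TCP header in between), and the addresses, keywords and function names are invented for the example.

```python
import socket
import struct


def build_packet(src, dst, payload):
    """Assemble a minimal IPv4 packet (20-byte header, checksum left at 0)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length 5 * 4 = 20 bytes
    total_length = 20 + len(payload)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,    # version/IHL, type of service, length
        0, 0,                            # identification, flags/fragment offset
        64, 6, 0,                        # TTL, protocol 6 (TCP), checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload


def shallow_inspect(packet):
    """What routing needs: only the sender and recipient addresses."""
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)


def deep_inspect(packet, keywords):
    """Look past the header into the payload and report matching keywords."""
    header_len = (packet[0] & 0x0F) * 4
    payload = packet[header_len:].decode("utf-8", errors="replace").lower()
    return [word for word in keywords if word in payload]


packet = build_packet(
    "192.0.2.1", "198.51.100.7",
    b"Do you stock roof racks for this car?",
)
print(shallow_inspect(packet))
print(deep_inspect(packet, ["roof rack", "tyres"]))
```

Both functions read the same bytes. The only difference is how far into the packet the provider chooses to look.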
DPI
can be misused, but there are no cases that could be cause for any
alarm at present. It is possible for providers to analyze data traffic,
and to manipulate it as well, just as cybercriminals do when they attempt
to send malicious code to a victim.
When
a user calls up a website, he receives more than just its source code.
The Internet service provider can use DPI to slip in JavaScript that
displays an advertisement, even if the website owner designs his/her
website advertisement-free. Spyware installed on your computer can also
do this. In the worst case scenario, a website owner is not even aware
that an advertisement has been embedded into his site. ISPs could also
determine which users are generating the most peer-to-peer file sharing
traffic, and which are using their service mostly for email, leading to
bandwidth throttling.
In
2009, British Telecom started the first tests with a DPI service
provider, Phorm, without informing its customers. In Germany, T-Mobile
and Vodafone allegedly manipulate cellular data traffic and assign
identifying codes to users. The argument is that this enables faster
access to frequently used websites, but the downside is that the
injected JavaScript can sometimes lead to errors in displaying sites in a
browser. In addition, DPI has a negative connotation since it is used
for monitoring and manipulating specific Web content—countries such as
China and Iran use it to filter and censor the Web, which is alarming
for free-speech advocates and political campaigners.
Advertisers
do not always need ingenious techniques to get information about
surfers from the Internet. Users voluntarily give away plenty of
information too. For instance, Amazon users can create wishlists in
which they save products they desire but do not own yet. Friends
can have a look, order the products, and send them to the creator of
the list as gifts. What many do not know is that if one isn’t careful
with his or her Amazon settings, the wishlists become public and the
whole world can access them through search engines.
Web 2.0 follows specific identities
Social
networks are also ideal data sources for marketing professionals. Data
collectors have been known to make the most of Facebook with its open
API. One can program applications that convince users to grant them
access to personal information, including details about their other
friends. Other less ethical means include persuading people to add a
fake profile as a “friend”, thereby granting it access to more of your
user profile, which most people leave totally visible to their friends.
Through the Facebook API, programmers can access information about
members, including details such as their employers, religious
affiliations, and sexual orientation. According to the Facebook
developer Wiki, applications can access over 50 sets of user
information—which is interesting for marketers and hackers alike.
Students
at MIT in the US succeeded in programming a “radar” system for
Facebook, which can analyze the information stored in a user’s friends’
profiles to draw conclusions about that person, even if his own settings
make all information private. This should be a warning sign for users
of a community not to publish too much of their real lives online. Most
importantly, people need to be cautious about the kinds of applications,
games and quizzes they click on, since doing so grants all of them
access to one’s personal information.
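The article does not describe how the MIT system works, so the following is only a stand-in that shows the general idea with a simple majority vote: if most of a person’s friends publicly share the same value for some attribute, that value is guessed for the person as well. All names, the attribute and the threshold are assumptions chosen for illustration.

```python
from collections import Counter


def infer_attribute(user, friends_of, public_profiles, attribute, min_share=0.5):
    """Guess a hidden attribute from the most common value among friends."""
    values = [
        public_profiles[friend][attribute]
        for friend in friends_of.get(user, [])
        if attribute in public_profiles.get(friend, {})
    ]
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    # Only guess when a clear majority of friends share the value.
    return value if count / len(values) > min_share else None


# Hypothetical network: alice keeps her own profile completely private.
friends_of = {"alice": ["bob", "carol", "dave"], "erin": []}
public_profiles = {
    "bob": {"employer": "Acme"},
    "carol": {"employer": "Acme"},
    "dave": {"employer": "Globex"},
}

guess = infer_attribute("alice", friends_of, public_profiles, "employer")
print(guess)
```

Alice never published anything herself. The guess comes entirely from what her friends chose to make visible.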
While
Facebook is a superb example, all of this also applies to other
services that identify individuals, such as OpenID and Google Accounts.
These let users log into dozens of websites with a single username and
password. For example, with a valid Facebook account, members can use
the Facebook Connect system to log in to the video sharing portal Vimeo,
which also lets them publish their “likes” on their wall. This is easy
for users and, provided companies behave ethically, opens up new ways
for them to court customers. Online shops are experimenting with ways to display products
that friends have bought or looked at often (although this famously
spoiled many people’s Christmas shopping surprises when Facebook
demonstrated the capability with its highly criticized and short-lived
Beacon advertising program in late 2007).
Another
example is a promotional online trailer for the videogame Prototype,
which came out in mid 2009. Those who used Facebook Connect suddenly
found themselves becoming part of the trailer! It accessed
users’ names, photos and professional backgrounds through their
Facebook profiles and integrated this information into scenes in the
trailer.
Users become advertising figures
People
are more receptive to recommendations from friends than from strangers,
so companies try reaching customers personally by creating so-called
fansites. Any user can, for instance, become a fan of products, people,
companies, and even designs. With Facebook’s Open Graph tool, companies
even have the opportunity to put advertisements on external websites to
receive testimonials from members of the fansite, and gain advertising
exposure through the profile picture.
People
who recommend products of their own accord are particularly of interest
to online shops. Economists have conducted research on filtering out
these opinion makers in the populations of online networks through
community analysis. There are plenty of scenarios for such
identification services to thrive in, when advertisers start linking
information from the digital world with the real.
Data and the Google juggernaut
Of
course no discussion of privacy online is complete without analyzing
Google’s data-mining habits. The search giant is in a position to use
its multiple online properties to gather amazing amounts of information,
and possibly even link these profiles to individuals in the real world.
The company’s motto has long been “Don’t be evil”, but it’s difficult
to ascertain what exactly the company considers to be within this limit
and what is too much. Incidents of anti-Google dissent are growing more
common, from strangers being able to follow you on Google Wave, to
protests in the publishing industry against the mass digitization of
books, to rumblings of antitrust cases because of the company’s
dominance in online advertising. Jeff Jarvis, blogger and author of the
book “What would Google do?” sharply criticizes the company for a
product called Sidewiki which collects user comments about websites and
saves them on Google servers. The site operators themselves, and the
furious Jeff Jarvis, have no control over it. Google copies entire
libraries, and has detailed photographs of the entire planet, covering
all countries and cities, many streets and houses, the oceans, the Moon
and Mars. Google offers an operating system for mobile phones, and soon
there will also be one for netbooks. Google says: “It is our mission to
organize the information of the world and to make it accessible and
usable worldwide”.
The
information of the world also includes health data. Google has for
example invested in the start-up 23andMe, run by Sergey Brin's wife Anne
Wojcicki, which offers genetic analysis for anyone. Will we one day be
able to run a search to find out which illnesses we are predisposed to?
One
can also see it as part of a strategy to be omnipresent on the Web. In
order to submit and read comments in Sidewiki, the Google toolbar must
be installed. This piece of software doesn't have a very good
reputation, and continuously provides Google with user information,
linking information on sites you surf to Google services, such as
addresses in Google Maps. On the sidelines of the 2009 Frankfurt Book
Fair, Google announced its entry into the digital book business. Google
soon intends to digitize every book in the world!
Even
the Chrome browser doesn’t have a clean record when it comes to
privacy—it identifies each user with a unique ID. As of version 4.1, the
ID is purged when a user first downloads an update, but it should not
be there at all.
Brilliant
ideas underlie most Google services. They are easy to use, technically
solid, and best of all, they’re nearly all free. Google does a lot of
good as a company by investing in alternative energy production and
giving employees an allowance if they buy a hybrid car. But Google is
also greedy for data. It commands the largest Web index available, and
has insight into every website, photo and video.
“We
are building a mirror world,” said Marissa Mayer, head of Google
Search, a few years ago at the Digital Life Design Conference in Munich.
The company is setting up a digital copy of our world. It records, down
to the last detail, how we move in it.
Google tracks 80 percent of all websites
One
can hardly elude Google today. It does not help if you stop using
Google Search, YouTube, Picasa or even the services requiring
registration like Gmail, Docs and Calendar. With its astoundingly wide
network, Google is present on 80 percent of all websites—for lay persons
often invisibly. After its acquisition of advertising network
DoubleClick, around half of all ad banners on the Web originate from
Google servers. The less conspicuous but even more widespread
text ads come from Google AdWords as well. The Google Analytics service
works entirely behind the scenes, allowing website operators to analyze the
click-paths of their visitors. Whenever a surfer lands on a site that
uses this service, Google sets a cookie with a unique ID and records his
or her IP address. Thanks to its super dense network, Google can then
see exactly who moves how on the Internet. Every click or search query
generates a log entry with an IP address and unique cookie
ID as well as a time stamp. The log file of YouTube until mid 2008
alone was over 12 Terabytes in size.
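What such log entries make possible can be sketched as follows. The record layout mirrors the three fields named above (IP address, cookie ID and time stamp) plus the page requested. The class, the function and the sample data are assumptions for illustration and say nothing about how Google actually stores or processes its logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: int    # seconds since some starting point
    ip_address: str
    cookie_id: str
    url: str


def click_paths(entries):
    """Group entries by cookie ID and order each visitor's pages by time."""
    paths = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.timestamp):
        paths[entry.cookie_id].append(entry.url)
    return dict(paths)


# Hypothetical entries collected from unrelated sites that embed the same service.
entries = [
    LogEntry(1000, "203.0.113.5", "cookie-a", "https://shop.example/football-shirts"),
    LogEntry(1200, "203.0.113.5", "cookie-a", "https://news.example/sports"),
    LogEntry(1100, "203.0.113.9", "cookie-b", "https://video.example/watch"),
    LogEntry(1300, "198.51.100.2", "cookie-a", "https://blog.example/diet-tips"),
]

paths = click_paths(entries)
for cookie_id, urls in paths.items():
    print(cookie_id, urls)
```

Because the cookie ID stays the same, the last entry is attributed to the same visitor even though the IP address has changed.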
For
database security as well as privacy reasons, the different databases
for each Google service are not necessarily tied to each other, but
linking them is technically possible and Google certainly has the know-how.
Even when there are no actual names, the records contain enough pieces to
put together a picture of the person who is sitting at a PC, where he
lives, what interests he has, and how much money he spends. Google
justifies its passion for collecting by saying that the data lets it
improve its services. Only in this way can it know how to show
personalized, relevant search results, or ads that users are more likely
to click on.
How much is too much?
One
can pick up interesting tidbits from the official company blogs, such
as the fact that some employees are excited about the idea of building a
3D model of every building ever built on the planet. Google Building
Maker already makes the required tools available. On an academic level,
most of those working at high levels in the company are IT pros,
mathematicians and statisticians, most of them top graduates of prestigious
universities. For them the masses of data collected are like toys with
which they can run riot. They work on them as if possessed, to write
algorithms which recognize patterns and structures in what seems like
random chaos. There are no limits. Suggestions for projects which might
seem outlandish are particularly welcome at Google. Lars Reppesgaard
quotes a Google software engineer in his book The Google Empire: “One
day, someone suggests some wild endeavor for which he needs a few
thousand computers, and you say ‘OK, you’ve got it’.” Usually it takes
new employees a couple of months to get that far, but the moment can come
anytime.
Deep Packet Inspection
China
uses it for Internet monitoring, as do Tunisia and Iran. Deep
Packet Inspection (DPI) has become an explosive topic since it
started being used not only for Internet security, but also for
on-the-fly manipulation of websites, be it to silence political
dissidents or to display personalized advertisements.
Personal
privacy becomes a concern when companies start matching individuals to
the profiles they generate online. Serving advertisements by harnessing
this knowledge is a questionable practice in terms of data protection
regulation, and most countries’ legal systems see this as a gray area.
However, certain cases have come up in courts of law. The EU Commission
has initiated proceedings against Great Britain because the country failed
to prevent British Telecom from violating its user privacy guidelines. The
fact that the provider managed to display advertisements through Phorm
without users’ consent implies that the UK’s own laws do not have this kind of
protection mechanism in place. Most countries’ laws have only limited
control over what people do with the data floating out there, since they
can hardly keep pace with the development of new technology.
However,
at least some countries, for example Germany, are becoming aware of the
problem and have begun to enact laws precluding general-purpose
monitoring of citizens through DPI.
What it means for users
Privacy
sometimes takes a backseat when it could slow down innovative thinking.
In the midst of protests about Google parsing its users’ email to show
related ads, founders Larry Page and Sergey Brin answered: “That is
automated. No one watches, so we don’t believe that personal privacy is
affected.”
Data
that doesn’t include specific private information can still be enough
to personally identify you. One does not need to read crude conspiracy
theories to imagine how interesting such data could be for the world’s
governments, which are already overzealous about protecting their
national security. Some agencies already monitor the eating preferences
of airline passengers to filter cultural influences. What if new laws
compel Amazon and Google to disclose their log files to prosecutors and
intelligence agencies? Each innocent-looking mouse click would gain even
greater importance, far beyond individual privacy concerns.
source-chip.in