Handout Recsys Sac2010


Introduction to Recommender Systems

Tutorial at ACM Symposium on Applied Computing 2010, Sierre, Switzerland, 22 March 2010

Markus Zanker

Dietmar Jannach

TU Dortmund

Markus Zanker
  Assistant professor at University Klagenfurt
  CEO of ConfigWorks GmbH

Dietmar Jannach
  Professor at TU Dortmund, Germany

Research background and interests
  Application of Intelligent Systems technology in business
  Recommender systems implementation & evaluation
  Product configuration systems
  Web mining
  Operations research

Dietmar Jannach and Markus Zanker

What are recommender systems for?
  Introduction
How do they work?
  Collaborative Filtering
  Content-based Filtering
  Knowledge-Based Recommendations
  Hybridization Strategies
How to measure their success?
  Evaluation techniques
  Case study on the mobile Internet
Selected recent topics
  Attacks on CF Recommender Systems
  Recommender Systems in the Social Web
What to expect?






Recommendation systems (RS) help to match users with items
  Ease information overload
  Sales assistance (guidance, advisory, persuasion, ...)
Different system designs/paradigms
  Based on availability of exploitable data
  Implicit and explicit user feedback
Goal: identify good system implementations
  But: multiple optimality criteria exist



Different perspectives/aspects
  Depends on domain and purpose
  No holistic evaluation scenario exists
Retrieval perspective
  Reduce search costs
  Provide correct proposals
  Users know in advance what they want
Recommendation perspective
  Serendipity: identify items from the Long Tail
  Users did not know about their existence



Prediction perspective
  Predict to what degree users like an item
  Most popular evaluation scenario in research
Interaction perspective
  Educate users about the product domain
  Convince/persuade users, explain
Finally, conversion perspective
  Commercial situations
  Increase "hit", "clickthrough", "lookers to bookers" rates
  Optimize sales margins and profit



RS seen as a function [AT05]
  Given:
    User model (e.g. ratings, preferences, demographics, situational context)
    Item
  Find:
    Relevance score
  Scores responsible for ranking
In practical systems, usually not all items will be scored; the task is rather to find the most relevant ones (selection task)



Recommender systems reduce information overload



Recommendations



Collaborative: "Tell me what's popular among my peers"



Content-based: "Show me more of the same what I've liked"



Knowledge-based: "Tell me what fits based on my needs"



Hybrid: combinations of various inputs and/or composition of different mechanisms



Collaborative
  Pros: nearly no ramp-up effort, serendipity of results, learns market segments
  Cons: requires some form of rating feedback, cold start for new users and new items
Content-based
  Pros: supports comparisons between items
  Cons: content descriptions necessary (and have to be acquired), cold start for new users, no surprises
Knowledge-based
  Pros: deterministic recommendations, assured quality, no cold start, can resemble sales dialogue
  Cons: knowledge engineering effort to bootstrap, basically static, does not react to short-term trends


Goals: ..., online conversion, cost of ownership, ...
Improvement
Evaluation
Explore combinations of collaborative and knowledge-based methods
  Hybridization designs
  Feedback loop by empirical evaluations



The most prominent approach to generate recommendations
  used by large, commercial e-commerce sites
  well-understood, various algorithms and variations exist
  applicable in many domains (books, movies, DVDs, ...)
Approach
  use the "wisdom of the crowd" to recommend items
Basic assumption and idea
  Users give ratings to catalog items (implicitly or explicitly)
  Customers who had similar tastes in the past will have similar tastes in the future



The basic technique:
  Given an "active user" (Alice) and an item I not yet seen by Alice
  find a set of users (peers) who liked the same items as Alice in the past and who have rated item I
  use, e.g., the average of their ratings to predict whether Alice will like item I
Some first questions
  How do we measure similarity?
  How many neighbors should we consider?
  How do we generate a prediction from the neighbors' ratings?

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1



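Similarity measure: Pearson correlation
  a, b: users
  r_{a,p}: rating of user a for item p
  P: set of items rated both by a and b
  \bar{r}_a, \bar{r}_b: average ratings of users a and b
  Possible similarity values between -1 and 1

$$sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4
User4     1      5      5      2      1     sim = -0.79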
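The following is a minimal Python sketch of the Pearson computation on the rating matrix above. It assumes that user averages are taken over the co-rated items P only (implementations differ on this point) and that a zero denominator gives similarity 0. The names `RATINGS` and `pearson` are illustrative only.

```python
from math import sqrt

# Rating matrix from the slide; None marks Alice's unknown rating for Item5.
RATINGS = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def pearson(a, b):
    """Pearson correlation of two rating lists over the items P rated by both."""
    common = [p for p, (x, y) in enumerate(zip(a, b)) if x is not None and y is not None]
    if not common:
        return 0.0
    # Assumption: user averages are computed over the co-rated items only.
    mean_a = sum(a[p] for p in common) / len(common)
    mean_b = sum(b[p] for p in common) / len(common)
    num = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    den = sqrt(sum((a[p] - mean_a) ** 2 for p in common)) * sqrt(
        sum((b[p] - mean_b) ** 2 for p in common)
    )
    return num / den if den else 0.0


for user in ("User1", "User2", "User3", "User4"):
    print(user, round(pearson(RATINGS["Alice"], RATINGS[user]), 2))
```

The printed values are roughly 0.85, 0.71, 0.0 and -0.79. The slide's 0.70 for User2 is the same value (about 0.707), truncated rather than rounded.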



Pearson correlation takes differences in rating behavior into account

[Figure: ratings of Alice, User1 and User4 for Item1 to Item4]

Works well in usual domains, compared with alternative measures such as cosine similarity



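A common prediction function:

$$pred(a,p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a,b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)}$$

  N: set of neighbors of user a that have rated item p
  Calculate whether the neighbors' ratings for the unseen item p are higher or lower than their average
  Combine the rating differences, using the similarity with a as a weight
  Add/subtract the neighbors' bias from the active user's average and use this as a prediction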
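This minimal Python sketch applies the prediction function to Alice and Item5. It rests on three assumptions:
  The neighborhood N is the k = 2 most similar users who rated the item.
  Similarities use co-rated means, as in the Pearson sketch.
  Each neighbor's bias is measured against the mean of all of that neighbor's ratings.
The helper names are hypothetical.

```python
from math import sqrt

RATINGS = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def mean(ratings):
    """Average over all known ratings of a user."""
    known = [r for r in ratings if r is not None]
    return sum(known) / len(known)


def pearson(a, b):
    """Pearson correlation over co-rated items (means over co-rated items)."""
    common = [p for p, (x, y) in enumerate(zip(a, b)) if x is not None and y is not None]
    ma = sum(a[p] for p in common) / len(common)
    mb = sum(b[p] for p in common) / len(common)
    num = sum((a[p] - ma) * (b[p] - mb) for p in common)
    den = sqrt(sum((a[p] - ma) ** 2 for p in common)) * sqrt(sum((b[p] - mb) ** 2 for p in common))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """pred(a, p) = mean_a + sum(sim * (r_bp - mean_b)) / sum(sim) over the k most similar raters."""
    active = RATINGS[user]
    raters = [
        (pearson(active, ratings), ratings)
        for name, ratings in RATINGS.items()
        if name != user and ratings[item] is not None
    ]
    neighbors = sorted(raters, key=lambda pair: pair[0], reverse=True)[:k]
    num = sum(sim * (ratings[item] - mean(ratings)) for sim, ratings in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return mean(active) + num / den if den else mean(active)


print(round(predict("Alice", 4), 2))  # Item5 has index 4
```

With User1 (mean 2.4) and User2 (mean 3.8) as neighbors, the result is 4 + (0.85 * 0.6 + 0.71 * 1.2) / (0.85 + 0.71), which is about 4.87.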



Not all neighbor ratings might be equally "valuable"
  Agreement on commonly liked items is not as informative as agreement on controversial items
  Possible solution: give more weight to items that have a higher variance
Value of number of co-rated items
  Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
Case amplification
  Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1.
Neighborhood selection
  Use similarity threshold or fixed number of neighbors



User-based CF is said to be "memory-based"
  the rating matrix is directly used to find neighbors/make predictions
  does not scale for most real-world scenarios
  large e-commerce sites have tens of millions of customers and millions of items
Model-based approaches
  based on an offline pre-processing or "model-learning" phase
  at run-time, only the learned model is used to make predictions
  models are updated/re-trained periodically
  large variety of techniques used
  model-building and updating can be computationally expensive



Basic idea:
  Use the similarity between items (and not users) to make predictions
  Look for items that are similar to Item5
  Take Alice's ratings for these items to predict the rating for Item5

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1



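Cosine similarity
  Produces better results in item-to-item filtering
  Ratings are seen as vectors in n-dimensional space
  Similarity is calculated based on the angle between the vectors

$$sim(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \times |\vec{b}|}$$

Adjusted cosine similarity
  take average user ratings into account, transform the original ratings
  U: set of users who have rated both items a and b

$$sim(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2} \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}$$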
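This minimal Python sketch computes the adjusted cosine similarity between Item5 and the other items of the example matrix, with each user's mean taken over all of that user's ratings. The prediction step shown here is an assumption, chosen because it is common but not given on this slide: a weighted average of Alice's own ratings over the positively similar items. All names are hypothetical.

```python
from math import sqrt

RATINGS = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def user_mean(ratings):
    known = [r for r in ratings if r is not None]
    return sum(known) / len(known)


def adjusted_cosine(i, j):
    """sim(i, j) over users U who rated both items, ratings centred on each user's mean."""
    num = sq_i = sq_j = 0.0
    for ratings in RATINGS.values():
        if ratings[i] is None or ratings[j] is None:
            continue  # user is not in U
        m = user_mean(ratings)
        di, dj = ratings[i] - m, ratings[j] - m
        num += di * dj
        sq_i += di * di
        sq_j += dj * dj
    den = sqrt(sq_i) * sqrt(sq_j)
    return num / den if den else 0.0


def predict_item_based(user, item):
    """Assumed prediction: similarity-weighted average of the user's ratings for similar items."""
    ratings = RATINGS[user]
    pairs = [(adjusted_cosine(j, item), ratings[j])
             for j in range(len(ratings)) if j != item and ratings[j] is not None]
    pairs = [(s, r) for s, r in pairs if s > 0]  # keep positively similar items only
    den = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / den if den else None


for j in range(4):
    print(f"sim(Item{j + 1}, Item5) = {adjusted_cosine(j, 4):.2f}")
print("prediction:", round(predict_item_based("Alice", 4), 2))
```

Item1 (about 0.80) and Item4 (about 0.43) are the only items positively similar to Item5. Alice's ratings of 5 and 4 for them yield a prediction of about 4.65.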



Item-based filtering does not solve the scalability problem itself
Pre-processing approach by Amazon.com (in 2003)
  Calculate all pair-wise item similarities in advance
  The neighborhood to be used at run-time is typically rather small, because only items are taken into account which the user has rated
  Item similarities are supposed to be more stable than user similarities
Memory requirements
  Up to N^2 pair-wise similarities to be memorized (N = number of items) in theory
  In practice, this is significantly lower (items with no co-ratings)
  Further reductions possible
    Minimum threshold for co-ratings
    Limit the neighborhood size (might affect recommendation accuracy)



Pure CF-based systems only rely on the rating matrix
Explicit ratings
  Most commonly used (1 to 5, 1 to 7 Likert response scales)
  Research topics
    "Optimal" granularity of scale; indication that a 10-point scale is better accepted in the movie domain
    Multidimensional ratings (multiple ratings per movie)
  Challenge
    Users not always willing to rate many items; sparse rating matrices
    How to stimulate users to rate more items?
Implicit ratings
  clicks, page views, time spent on some page, demo downloads ...
  Can be used in addition to explicit ones; question of correctness of interpretation



Cold start problem
  How to recommend new items? What to recommend to new users?
  Ask/force users to rate a set of items
  Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
Alternatives
  Use better algorithms (beyond nearest-neighbor approaches)
  Example:
    In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
    Assume "transitivity" of neighborhoods



Recursive CF
  Assume there is a very close neighbor n of u who, however, has not rated the target item i yet.
  Idea:
    Apply the CF method recursively and predict a rating for item i for the neighbor

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     sim = 0.85
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Predict rating for User1


"Spreading activation"
  Idea: Use paths of lengths > 3 to recommend items
  Length 3: Recommend Item3 to User1
  Length 5: Item1 also recommendable



Plethora of different techniques proposed in the last years, e.g.,
  Matrix factorization techniques, statistics
    singular value decomposition, principal component analysis
  Association rule mining
    compare: shopping basket analysis
  Probabilistic models
    clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
  Various other machine learning approaches
Costs of pre-processing
  Usually not discussed
  Incremental updates possible



Latent Semantic Indexing
  developed in the information retrieval field; aims at detection of hidden "factors" (topics) of a document and the reduction of the dimensionality
  based on Singular Value Decomposition (SVD)
SVD-based recommendation
  decompose matrix / find factors
    factors can be genre, actors but also non-understandable ones
    only retain the n most important factors
    can also help to remove noise in the data
  make recommendation in the lower-dimensional space
    e.g., use nearest neighbors
Heavily used in Netflix prize competition; specific methods proposed



Commonly used for shopping behavior analysis
  aims at detection of rules such as
  "If a customer purchases baby food then he also buys diapers in 70% of the cases"
Association rule mining algorithms
  can detect rules of the form X => Y (e.g., baby food => diapers) from a set of sales transactions
  measure of quality: support, confidence
    used as a threshold to cut off unimportant rules



Transform 5-point ratings into binary ratings (1 = above user average)

        Item1  Item2  Item3  Item4  Item5
Alice     1      0      0      0      ?
User1     1      0      1      0      1
User2     1      0      1      0      1
User3     0      0      0      1      1
User4     0      1      1      0      0

Mine rules such as
  Item1 => Item5: support (2/4), confidence (2/2) (without Alice)
Make recommendations for Alice (basic method)
  Determine "relevant" rules based on Alice's transactions (the above rule will be relevant as Alice bought Item1)
  Compute union of Y's not already bought by Alice
Different variations possible
  dislike statements, user associations, ...
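This minimal Python sketch covers the basic method on the binary table above. It computes support and confidence, then recommends items reachable from Alice's basket through single-item rules. The thresholds `min_support` and `min_confidence` are assumptions for illustration, not values from the slides.

```python
# Binary "transactions" from the table (1 = rated above the user's average).
# Alice is left out, as in the slide's support/confidence figures.
TRANSACTIONS = {
    "User1": {1, 3, 5},
    "User2": {1, 3, 5},
    "User3": {4, 5},
    "User4": {2, 3},
}


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(itemset <= basket for basket in TRANSACTIONS.values())
    return hits / len(TRANSACTIONS)


def confidence(x, y):
    """Confidence of the rule X => Y, i.e. support(X u Y) / support(X)."""
    return support(x | y) / support(x)


def recommend(basket, min_support=0.25, min_confidence=0.8):
    """Items Y not in the basket that follow from a relevant rule {x} => {Y}."""
    all_items = set().union(*TRANSACTIONS.values())
    scores = {}
    for x in basket:
        if support({x}) == 0:
            continue  # no rule can start from an item nobody "bought"
        for y in all_items - basket:
            if support({x, y}) >= min_support and confidence({x}, {y}) >= min_confidence:
                scores[y] = max(scores.get(y, 0.0), confidence({x}, {y}))
    # Highest confidence first; ties broken by item number.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(support({1, 5}), confidence({1}, {5}))  # 0.5 1.0
print(recommend({1}))  # Alice "bought" Item1 -> [3, 5]
```

In this small table, Item1 => Item3 also reaches confidence 1.0, so Item3 is recommended alongside Item5.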



Basic idea (simplistic version for illustration):
  given the user/item rating matrix
  determine the probability that user Alice will like an item i
  base the recommendation on these probabilities
Calculation of rating probabilities based on Bayes' Theorem
  How probable is rating value "1" for Item5 given Alice's previous ratings?
  Corresponds to conditional probability P(Item5=1 | X), where
    X = Alice's previous ratings = (Item1 = 1, Item2 = 3, Item3 = 3, Item4 = 2)
  Can be estimated based on Bayes' Theorem

$$P(Y|X) = \frac{P(X|Y) \times P(Y)}{P(X)}$$

  Assumption: Ratings are independent (?)

$$P(Y|X) = \frac{\prod_{i=1}^{d} P(X_i|Y) \times P(Y)}{P(X)}$$



        Item1  Item2  Item3  Item4  Item5
Alice     1      3      3      2      ?
User1     2      4      2      2      4
User2     1      3      3      5      1
User3     4      5      2      3      3
User4     1      1      5      2      1

Zeros (smoothing required), computationally expensive; like/dislike simplification possible
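This minimal Python sketch scores each possible rating y for Alice's Item5 with the independence assumption. It uses relative frequencies without smoothing and drops P(X), because P(X) is the same for every y. The names are illustrative.

```python
from math import prod

# Ratings of the other users for Item1..Item5 (from the table above).
USERS = [
    [2, 4, 2, 2, 4],  # User1
    [1, 3, 3, 5, 1],  # User2
    [4, 5, 2, 3, 3],  # User3
    [1, 1, 5, 2, 1],  # User4
]
X = [1, 3, 3, 2]  # Alice's ratings for Item1..Item4
TARGET = 4        # index of Item5


def score(y):
    """Unnormalised P(Item5 = y | X) = P(Y = y) * prod_i P(X_i | Y = y), no smoothing."""
    same_class = [u for u in USERS if u[TARGET] == y]
    if not same_class:
        return 0.0  # no user gave Item5 the rating y
    prior = len(same_class) / len(USERS)
    likelihood = prod(
        sum(u[i] == x for u in same_class) / len(same_class) for i, x in enumerate(X)
    )
    return prior * likelihood


scores = {y: score(y) for y in range(1, 6)}
print(scores)  # only y = 1 is non-zero: 0.5 * 0.125 = 0.0625
print(max(scores, key=scores.get))  # 1
```

P(X | Item5 = 1) = 1 * 1/2 * 1/2 * 1/2 = 0.125, taken from User2 and User4. Every other rating value gets probability 0, either because no user gave it or because a single factor is zero. This shows concretely why smoothing is needed.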



Use a cluster-based approach
  assume users fall into a small number of subgroups (clusters)
  Make predictions based on estimates
    probability of Alice falling into cluster c
    probability of Alice liking item i given a certain cluster and her previous ratings
  Number of classes and model parameters have to be learned from data in advance (EM algorithm)
Others:
  Bayesian Networks, Probabilistic Latent Semantic Analysis, ...
Empirical analysis shows:
  Probabilistic methods lead to relatively good results (movie domain)
  No consistent winner; small memory footprint of network model



Pros:
  well-understood, works well in some domains, no knowledge engineering required
Cons:
  requires user community, sparsity problems, no integration of other knowledge sources, no explanation of results
What is the best CF method?
  In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)
How to evaluate the prediction quality?
  MAE / RMSE: What does an MAE of 0.7 actually mean?
  What about multi-dimensional ratings?



While CF methods do not require any information about the items, it might be reasonable to exploit such information, e.g. to recommend fantasy novels to people who liked fantasy novels in the past
What do we need:
  some information about the available items such as the genre ("content")
  some sort of user profile describing what the user likes (the preferences)
The task:
  locate/recommend items that are "similar" to the user preferences



What is the "content"?
  The genre is actually not part of the content of a book
  Most CB-recommendation methods originate from Information Retrieval (IR):
    goal is to find and rank interesting text documents (news articles, web pages)
    the item descriptions are usually automatically extracted (important words)
  Fuzzy border between content-based and "knowledge-based" RS
  Here:
    classical IR-based methods based on keywords
    no means-ends recommendation knowledge involved



Simple approach
  Compute the similarity of an unseen item with the user profile based on the keyword overlap
  Or use and combine multiple metrics



Simple keyword representation has its problems
  in particular when automatically extracted, as
    not every word has similar importance
    longer documents have a higher chance to have an overlap with the user profile
Standard measure: TF-IDF
  Encodes text documents in multi-dimensional Euclidian space
    weighted term vector
  TF: Measures how often a term appears (density in a document)
    assuming that important terms appear more often
    normalization has to be done in order to take document length into account
  IDF: inverse document frequency, lowers the weight of keywords that appear in many documents
  Given a keyword i and a document j
    TF-IDF(i,j) = TF(i,j) * IDF(i)
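This minimal Python sketch builds TF-IDF vectors and compares them with cosine similarity. It assumes one common pair of choices, not necessarily the tutorial's exact definitions. TF(i,j) is the frequency of keyword i in document j divided by the highest frequency of any keyword in j. IDF(i) = log(N / n(i)), where n(i) is the number of documents containing i. The documents are made up for illustration.

```python
from collections import Counter
from math import log, sqrt


def tf_idf(docs):
    """One {term: weight} vector per document, TF normalised by the most frequent term."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)  # n(i): docs containing term i
    vectors = []
    for c in counts:
        max_freq = max(c.values())
        vectors.append(
            {t: (f / max_freq) * log(n_docs / doc_freq[t]) for t, f in c.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


docs = [
    "fantasy novel with dragons and magic",
    "magic school fantasy novel",
    "cookbook with vegetarian recipes",
]
vecs = tf_idf(docs)
print(round(cosine(vecs[0], vecs[1]), 2), round(cosine(vecs[0], vecs[2]), 2))
```

The two fantasy texts score clearly higher against each other (about 0.22) than the first text does against the cookbook (about 0.05). A term that occurs in every document gets IDF log(1) = 0 and therefore contributes nothing.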



Vectors are usually long and sparse
Improvements
  remove stop words ("a", "the", ...)
  use stemming
  size cut-offs (only use top n most representative words, e.g. around 100)
  use lexical knowledge, use more elaborate methods for feature selection
  detection of phrases as terms (such as United Nations)
Limitations
  semantic meaning remains unknown
  example: usage of a word in a negative context
    "there is nothing on the menu that a vegetarian would like..."
Usual similarity metric to compare vectors: Cosine similarity (angle)



Simple method: nearest neighbors
  Given a set of documents D already rated by the user (like/dislike)
    Find the n nearest neighbors of a not-yet-seen item i in D
    Take these ratings to predict a rating/vote for i
    (Variations: neighborhood size, lower/upper similarity thresholds, ...)
  Used in combination with a method to model long-term preferences
Query-based retrieval: Rocchio's method
  The SMART System: Users are allowed to rate (relevant/irrelevant) retrieved documents (feedback)
  Queries are then automatically extended with additional terms/weights of relevant documents



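Document collections D+ and D-

$$Q_{i+1} = \alpha \cdot Q_i + \beta \cdot \left(\frac{1}{|D^+|} \sum_{d^+ \in D^+} d^+\right) - \gamma \cdot \left(\frac{1}{|D^-|} \sum_{d^- \in D^-} d^-\right)$$

α, β, γ used to fine-tune the feedback; often only positive feedback is used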
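This minimal Python sketch applies one Rocchio update to a query vector stored as a {term: weight} dictionary. The defaults α = 1.0, β = 0.75 and γ = 0.15 are assumptions; they are often quoted in IR textbooks but are not given on the slide. Dropping terms whose weight becomes negative is likewise an assumed, commonly used convention.

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Q' = alpha*Q + beta*centroid(D+) - gamma*centroid(D-), on {term: weight} dicts."""
    terms = set(query).union(*relevant, *irrelevant)

    def centroid(docs, term):
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for t in terms:
        w = alpha * query.get(t, 0.0) + beta * centroid(relevant, t) - gamma * centroid(irrelevant, t)
        if w > 0:  # assumption: negative weights are clipped (dropped)
            updated[t] = w
    return updated


query = {"fantasy": 1.0}
liked = [{"fantasy": 1.0, "dragons": 1.0}]   # D+
disliked = [{"romance": 1.0}]                # D-
print(rocchio(query, liked, disliked))  # fantasy 1.75, dragons 0.75; romance dropped
```

Passing an empty `disliked` list turns this into the positive-feedback-only variant mentioned on the slide.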



Side note: Conditional independence of events does in fact not hold
  "New York", "Hong Kong"
  Still, good accuracy can be achieved
Boolean representation simplistic
  positional independence assumed
  keyword counts lost
More elaborate probabilistic methods
  e.g., estimate probability of term v occurring in a document of class C by relative frequency of v in all documents of the class
Support Vector Machines, ...
Use other information retrieval methods (used by search engines, ...)



Keywords alone may not be sufficient to judge quality/relevance of a document or web page
  up-to-dateness, usability, aesthetics, writing style
  content may also be limited/too short
  content may not be automatically extractable (multimedia)
Ramp-up phase required
  Some training data is still required
  Web 2.0: Use other sources to learn the user preferences
Overspecialization
  Algorithms tend to propose "more of the same"
  Or: too similar news items



Conversational interaction strategy
  Opposed to one-shot interaction
  Elicitation of user requirements
  Transfer of product knowledge (educating users)
Explicit domain knowledge
  Requirements elicitation from domain experts
  System mimics the behaviour of experienced sales assistant
  Best-practice sales interactions
  Can guarantee correct recommendations (determinism)

Knowledge base
  Usually mediates between user model and item properties
  Variables
    User model features (requirements), Item features (catalogue)
  Set of constraints
    Logical implications (IF user requires A THEN proposed item should possess feature B)
    Hard and soft/weighted constraints
    Solution preferences
Derive a set of recommendable items
  Fulfilling set of applicable constraints
  Applicability of constraints depends on current user model
  Explanations: transparent line of reasoning

Several different recommendation tasks:
  Find a set of user requirements such that a subset of items fulfills all constraints
    Ask user which requirements should be relaxed/modified such that some items exist that do not violate any constraint (e.g. Trip@dvice [MRB04])
  Similar to finding a maximally succeeding subquery (XSS) [McSherry05]
    All proposed items have to fulfill the same set of constraints (e.g. [FFJ+06])
    Compute relaxations based on predetermined weights
  Rank items according to weights of satisfied soft constraints
    Rank items based on the ratio of fulfilled constraints
    Does not require additional ranking scheme

Knowledge base (constraints with Weight, LHS, RHS), e.g.
  C2: 20, Motives = Landscape, Low. foc. length =28mm and Price > 350 EUR
Product catalogue, e.g. Powershot XY:
  Lower focal length 35
  Upper focal length 140
Computation of minimal revisions of requirements
  Possibly guided by some predefined weights or past community behavior

Two maximally succeeding subqueries
  XSS = {{C1}, {C2, C3}}
Selection can be based on the constraints' weights
  Relax C1 and recommend Lumix
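This minimal Python sketch finds the maximally succeeding subqueries by brute force over subsets of the applicable constraints, then keeps the heaviest one. Which item satisfies which constraint comes from the ranking slide. The weights are an assumption: C2 = 20 appears in the example, while C1 = 25 and C3 = 15 are inferred from the 25/60 and 35/60 scores on the ranking slide. All names are hypothetical.

```python
from itertools import combinations

CONSTRAINTS = ("C1", "C2", "C3")
# Constraints each catalogue item satisfies, as listed on the slides.
SATISFIED = {"Powershot": {"C1"}, "Lumix": {"C2", "C3"}}
# Hypothetical weights (C2 from the example; C1 and C3 inferred from 25/60 and 35/60).
WEIGHTS = {"C1": 25, "C2": 20, "C3": 15}


def succeeds(query):
    """A sub-query succeeds if at least one item satisfies all of its constraints."""
    return any(set(query) <= sat for sat in SATISFIED.values())


def maximally_succeeding_subqueries():
    succeeding = [
        frozenset(q)
        for size in range(1, len(CONSTRAINTS) + 1)
        for q in combinations(CONSTRAINTS, size)
        if succeeds(q)
    ]
    # Keep only sub-queries that are not strictly contained in another succeeding one.
    return [q for q in succeeding if not any(q < other for other in succeeding)]


xss = maximally_succeeding_subqueries()
print(sorted(sorted(q) for q in xss))  # [['C1'], ['C2', 'C3']]

# Keep the XSS with the highest total weight, i.e. relax the cheapest constraints.
keep = max(xss, key=lambda q: sum(WEIGHTS[c] for c in q))
print("relax:", sorted(set(CONSTRAINTS) - keep))                                   # ['C1']
print("recommend:", [item for item, sat in SATISFIED.items() if keep <= sat])    # ['Lumix']
```

Brute force is exponential in the number of constraints. It is fine for a few constraints but not for large knowledge bases.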

Applicable: LHS(c) is satisfied by user model, i.e. {C1, C2, C3}
Satisfied: not applicable or RHS(c) is satisfied by catalogue item, i.e. {C1} for Powershot and {C2, C3} for Lumix
Only items that satisfy all hard constraints receive a positive score (fulfilled for both)
Ratio of penalty values of satisfied constraints
  Ranked #1: Lumix 35/60
  Ranked #2: Powershot 25/60
Cut off recommendation list after n items
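This minimal Python sketch implements the ratio ranking: the weight of the satisfied applicable constraints divided by the weight of all applicable constraints. It uses the same hypothetical weights as above (C1 = 25 and C3 = 15 inferred, C2 = 20 from the example). The hard-constraint filter is omitted because, per the slide, both items fulfil the hard constraints.

```python
# Hypothetical weights and per-item satisfied constraints (see the ranking slide).
WEIGHTS = {"C1": 25, "C2": 20, "C3": 15}
SATISFIED = {"Powershot": {"C1"}, "Lumix": {"C2", "C3"}}


def rank(satisfied, weights, n=None):
    """(item, satisfied weight, total weight), best first, cut off after n items."""
    total = sum(weights.values())  # all three constraints are applicable here
    scored = [(item, sum(weights[c] for c in sat), total) for item, sat in satisfied.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:n] if n is not None else scored


for pos, (item, got, total) in enumerate(rank(SATISFIED, WEIGHTS), start=1):
    print(f"Ranked #{pos}: {item} {got}/{total}")
```

This prints "Ranked #1: Lumix 35/60" followed by "Ranked #2: Powershot 25/60". Passing `n` cuts the list after the first n items.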

More variants of constraint representation:
  Interactive acquisition of solution preferences
    To explore cheaper variants of current proposal
    Max. cost of 350 EUR for Canon brand initially specified, higher price sensitivity for Panasonic brand?
  Aging/outdating of older preferences
  Construction of decision model/trade-off analysis
  Disjunctive RHS
    IF location requ. = nearby THEN location = Ktn OR location = Stmk
    E.g. thermal spa should be located either in Carinthia or Styria

More variants of recommendation task
  Find diverse sets of items
    Notion of similarity/dissimilarity
    Idea that users navigate a product space
    If recommendations are more diverse, then users can navigate via critiques on recommended entry points more efficiently (fewer steps of interaction)
  Bundling of recommendations
    E.g. travel packages, skin care treatments or financial portfolios
    RS for different item categories, CSP restricts configuring of bundles

Find optimal sequence of conversational moves
  Recommendation is less about optimal algorithms, but more about ...
  Asking for requirements, proposing items (that can be critiqued) or showing explanatory texts are all conversational moves
  Interaction process towards preference elicitation and maybe also user persuasion

Cost of knowledge acquisition
  From domain experts
  From users
  From web resources
Accuracy of preference models
  Very fine-granular preference models require many interaction cycles
Independence assumption can be challenged
  Preferences are not always independent from each other
  E.g. asymmetric dominance effects and decoy items

All three base techniques are naturally incorporated by a good sales assistant (at different stages of the sales act) but have their shortcomings
Idea of crossing two (or more) species/implementations
  hybrida [lat.]: denotes an object made by combining two different elements
  Avoid some of the shortcomings
  Reach desirable properties not (or only inconsistently) present in parent individuals
Different hybridization designs
  Parallel use of several systems
  Monolithic exploiting different features
  Pipelined invocation of different systems

Only a single recommendation component
Hybridization is virtual in the sense that
  Features/knowledge sources of different paradigms are combined

Combination of several knowledge sources
  E.g.: Ratings and user demographics or explicit requirements and needs used for similarity computation
Hybrid content features:
  Social features: Movies liked by user
  Content features: Comedies liked by user, dramas liked by user
  Hybrid features: user likes many movies that are comedies, ...
  "the common knowledge engineering effort that involves inventing good features to enable successful learning" [BHC98]

Content-boosted collaborative filtering [MMN02]
  Based on content features, additional ratings are created
  E.g. Alice likes Items 1 and 3 (unary ratings)
    Item7 is similar to 1 and 3 by a degree of 0.75
    Thus Alice likes Item7 by 0.75
  Significance weighting and adjustment factors
    Peers with more co-rated items are more important
    Higher confidence in content-based prediction, if higher number of own ratings
Citations interpreted as collaborative recommendations

Output of several existing implementations combined
Least invasive design
Some weighting or voting scheme
  Weights can be learned dynamically

Compute weighted sum:

$$rec_{weighted}(u,i) = \sum_{k=1}^{n} \beta_k \times rec_k(u,i)$$

Item    Recommender 1   Recommender 2   Recommender weighted (0.5:0.5)
        score (rank)    score (rank)    score (rank)
Item1   0.5 (1)         0.8 (2)         0.65 (1)
Item2   0               0.9 (1)         0.45 (2)
Item3   0.3 (2)         0.4 (3)         0.35 (3)
Item4   0.1 (3)         0               0.05 (4)
Item5   0               0               0.00
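The weighted column of the table can be reproduced with a few lines of Python. In this sketch, the recommenders are plain dictionaries of scores taken from the table, and `weighted` is a hypothetical helper name.

```python
REC1 = {"Item1": 0.5, "Item2": 0.0, "Item3": 0.3, "Item4": 0.1, "Item5": 0.0}
REC2 = {"Item1": 0.8, "Item2": 0.9, "Item3": 0.4, "Item4": 0.0, "Item5": 0.0}


def weighted(recs, betas):
    """rec_weighted(u, i) = sum_k beta_k * rec_k(u, i) for every item of the first recommender."""
    return {i: sum(b * rec[i] for b, rec in zip(betas, recs)) for i in recs[0]}


scores = weighted([REC1, REC2], [0.5, 0.5])
ranking = sorted(scores, key=scores.get, reverse=True)
for pos, item in enumerate(ranking, start=1):
    print(pos, item, round(scores[item], 2))
```

The sketch assumes that all recommenders score the same item set on a comparable 0..1 scale. That assumption is exactly what the following slides question.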

BUT, how to derive weights?
  Estimate, e.g. by empirical bootstrapping
    Historic data is needed
    Compute different weightings
    Decide which one does best
  Dynamic adjustment of weights
    Start with, for instance, uniform weight distribution
    For each user adapt weights to minimize error of prediction

Let's assume Alice actually bought/clicked on items 1 and 4
  Identify the weighting that minimizes the Mean Absolute Error (MAE)

β1    β2    Item    rec1   rec2   error   MAE
0.1   0.9   Item1   0.5    0.8    0.23    0.61
            Item4   0.1    0.0    0.99
0.3   0.7   Item1   0.5    0.8    0.29    0.63
            Item4   0.1    0.0    0.97
0.5   0.5   Item1   0.5    0.8    0.35    0.65
            Item4   0.1    0.0    0.95
0.7   0.3   Item1   0.5    0.8    0.41    0.67
            Item4   0.1    0.0    0.93
0.9   0.1   Item1   0.5    0.8    0.47    0.69
            Item4   0.1    0.0    0.91
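This minimal Python sketch of the empirical weight search measures the error of the combined score β1·rec1 + β2·rec2 against Alice's actual behavior, which is 1 for both items. It reproduces the table's error and MAE columns. The grid of β1 values is the one shown in the table, and β2 = 1 - β1 is assumed.

```python
REC1 = {"Item1": 0.5, "Item4": 0.1}
REC2 = {"Item1": 0.8, "Item4": 0.0}
ACTUAL = {"Item1": 1.0, "Item4": 1.0}  # Alice bought/clicked on both items


def mae(beta1):
    """MAE of the weighted score beta1*rec1 + (1 - beta1)*rec2 against ACTUAL."""
    beta2 = 1.0 - beta1
    errors = [abs(beta1 * REC1[i] + beta2 * REC2[i] - r) for i, r in ACTUAL.items()]
    return sum(errors) / len(errors)


GRID = [0.1, 0.3, 0.5, 0.7, 0.9]
for b in GRID:
    print(f"beta1={b:.1f} beta2={1 - b:.1f} MAE={mae(b):.2f}")
best = min(GRID, key=mae)
print("best beta1:", best)  # 0.1, i.e. rec2 weighted most
```

The search favours rec2 (β1 = 0.1). The next slide points out that rec1 actually ranks Items 1 and 4 higher.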


BUT: didn't rec1 actually rank Items 1 and 4 higher?

Item    Recommender 1   Recommender 2
        score (rank)    score (rank)
Item1   0.5 (1)         0.8 (2)
Item2   0               0.9 (1)
Item3   0.3 (2)         0.4 (3)
Item4   0.1 (3)         0
Item5   0               0

Be careful when weighting!
  Recommenders need to assign comparable scores over all users and items
    Some score transformation could be necessary
  Stable weights require several user ratings

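Successor's recommendations are restricted by predecessor

$$rec_{cascade}(u,i) = rec_n(u,i)$$

where for all k > 1:

$$rec_k(u,i) = \begin{cases} rec_k(u,i) & \text{if } rec_{k-1}(u,i) \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

Subsequent recommender may not introduce additional items
Thus produces very precise results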

Item    Recommender 1   Recommender 2   Recommender cascaded (rec1, rec2)
        score (rank)    score (rank)    score (rank)
Item1   0.5 (1)         0.8 (2)         0.80 (1)
Item2   0               0.9 (1)         0.00
Item3   0.3 (2)         0.4 (3)         0.40 (2)
Item4   0.1 (3)         0               0.00
Item5   0               0               0.00

Recommendation list is continually reduced
First recommender excludes items
  Remove absolute no-go items (e.g. knowledge-based)
Second recommender assigns score
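This minimal Python sketch implements the cascade definition for any number of recommenders. Each successor keeps its own score only where the already-cascaded predecessor score is non-zero, so it can never add items. The recommender scores come from the table, and `cascade` is a hypothetical name.

```python
REC1 = {"Item1": 0.5, "Item2": 0.0, "Item3": 0.3, "Item4": 0.1, "Item5": 0.0}
REC2 = {"Item1": 0.8, "Item2": 0.9, "Item3": 0.4, "Item4": 0.0, "Item5": 0.0}


def cascade(*recs):
    """rec_k keeps its score only where the (cascaded) predecessor's score is non-zero."""
    result = dict(recs[0])
    for rec in recs[1:]:
        result = {i: (rec[i] if result[i] != 0 else 0.0) for i in result}
    return result


scores = cascade(REC1, REC2)
for item, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(item, f"{s:.2f}")
```

Item2 drops out even though rec2 scores it highest (0.9), because rec1 had already excluded it.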

Successor exploits a model built by predecessor
Examples:
  Fab:
    Online news domain
    CB recommender builds user models based on weighted term vectors
    CF identifies similar peers based on these user models but makes recommendations based on ratings
  Collaborative constraint-based meta-level RS
    Collaborative filtering learns a constraint base
    Knowledge-based RS computes recommendations

Only few works that compare strategies from the meta-perspective
  Like for instance, [Burke02]
  Most datasets do not allow to compare different recommendation paradigms, i.e. ratings, requirements, item features, domain knowledge, critiques rarely available in a single dataset
Monolithic: some preprocessing effort traded in for more knowledge included
Parallel: requires careful matching of scores from different predictors
Netflix competition: stacking recommender systems
  Adaptive switching of weights based on user model, context and meta-features

A myriad of techniques has been proposed, but
  Which one is best in a given application domain?
  What are the success factors of different techniques?
  Comparative analysis based on an optimality criterion?
Research questions are:
  Is a RS efficient with respect to a specific criterion like accuracy, user satisfaction, response time, serendipity, online conversion, ramp-up efforts, ...
  Do customers like/buy recommended items?
  Do customers buy items they otherwise would not have?
  Are they satisfied with a recommendation after purchase?

Different perspectives/aspects
  Depends on domain and purpose
  No holistic evaluation scenario exists
Retrieval perspective
  Reduce search costs
  Provide correct proposals
  Users know in advance what they want
Recommendation perspective
  Serendipity
  Users did not know about their existence

Prediction perspective
  Predict to what degree users like an item
  Most popular evaluation scenario in research
Interaction perspective
  Educate users about the product domain
  Persuade users as an intentional, planned effect!?
Finally, conversion perspective
  Commercial situations
  Increase "hit", "clickthrough", "lookers to bookers" rates
  Optimize sales margins and profit

Characterizing dimensions:
  Who is the subject that is in the focus of research?
  What research methods are applied?
  In which setting does the research take place?
Subject: Online customers, students, historical online sessions, computers, ...
Research method: Experiments, quasi-experiments, non-experimental research
Setting: Lab, real-world scenarios

Experimental vs. non-experimental (observational) research methods
  Experiment (test, trial):
    An experiment is a study in which at least one variable is manipulated and units are randomly assigned to different levels or categories of manipulated variable(s).
    Units: users, historic sessions, ...
    Manipulated variable: type of RS, recommended items, ...
    Categories of manipulated variable(s): content-based RS, collaborative RS

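Mean Absolute Error (MAE) computes the deviation between predicted ratings and actual ratings

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|$$

Root Mean Square Error (RMSE) is similar to MAE, but places more emphasis on larger deviation

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}$$

  p_i: predicted rating, r_i: actual rating, n: number of ratings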
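The following minimal Python sketch shows both metrics side by side. The two made-up prediction lists have the same MAE, but RMSE penalises the single large miss more heavily.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute deviation between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error; squaring emphasises large deviations."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


actual = [4, 3, 5, 4]
one_big_miss = [4, 3, 5, 2]          # a single error of 2 points
many_small = [4.5, 3.5, 5.5, 4.5]    # four errors of 0.5 points
print(mae(one_big_miss, actual), rmse(one_big_miss, actual))  # 0.5 1.0
print(mae(many_small, actual), rmse(many_small, actual))      # 0.5 0.5
```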

Precision and Recall, two standard measures from Information Retrieval, are two of the most basic methods for evaluating recommender systems
E.g. consider the movie predictions made by a simplified recommender that classifies movies as good or bad
  They can be split into four groups:

                           Reality
                           Actually Good          Actually Bad
Prediction   Rated Good    True Positive (tp)     False Positive (fp)
             Rated Bad     False Negative (fn)    True Negative (tn)



Precision: a measure of exactness, determines the fraction of relevant items retrieved out of all items retrieved
  E.g. the proportion of recommended movies that are actually good

$$Precision = \frac{tp}{tp + fp}$$

Recall: a measure of completeness, determines the fraction of relevant items retrieved out of all relevant items
  E.g. the proportion of all good movies recommended

$$Recall = \frac{tp}{tp + fn}$$
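This minimal Python sketch computes both measures from a recommendation list and a set of actually good items. The movie ids are made up, and the function name is illustrative.

```python
def precision_recall(recommended, relevant):
    """Precision = tp / (tp + fp), recall = tp / (tp + fn) on sets of item ids."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and actually good
    fp = len(recommended - relevant)   # recommended but actually bad
    fn = len(relevant - recommended)   # good but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


p, r = precision_recall(["m1", "m2", "m3", "m4"], ["m1", "m3", "m5"])
print(p, round(r, 2))  # 0.5 0.67
```

Here two of the four recommended movies are good (precision 0.5), and two of the three good movies were recommended (recall of about 0.67).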



Offline experimentation                           Online experimentation
Historic session                                  Live interaction
Ratings, transactions                             Ratings, feedback
Unrated items unknown, interpreted as dislikes    ... determined
Better for estimating Recall                      Better for estimating Precision



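Rank Score extends the recall metric to take the positions of correct items in a ranked list into account
  Particularly important in recommender systems as lower ranked items may be overlooked by users
Rank Score is defined as the ratio of the Rank Score of the correct items to the best theoretical Rank Score achievable for the user, i.e.

$$rankscore = \frac{rankscore_p}{rankscore_{max}}, \quad rankscore_p = \sum_{i \in h} 2^{-\frac{rank(i) - 1}{\alpha}}, \quad rankscore_{max} = \sum_{i=1}^{|T|} 2^{-\frac{i - 1}{\alpha}}$$

where
  h is the set of correctly recommended items, i.e. hits
  rank returns the position (rank) of an item
  T is the set of all items of interest
  α is the ranking half life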
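This minimal Python sketch implements the Rank Score formula. It assumes 1-based positions and an illustrative half-life of α = 2, which the slide does not fix. The two calls show how the same number of hits scores lower when the hits appear further down the list.

```python
def rank_score(ranked, relevant, alpha=2.0):
    """rankscore_p / rankscore_max with half-life alpha (positions are 1-based)."""
    relevant = set(relevant)  # T: all items of interest
    hits = [pos for pos, item in enumerate(ranked, start=1) if item in relevant]  # ranks of h
    score = sum(2 ** (-(rank - 1) / alpha) for rank in hits)
    best = sum(2 ** (-(i - 1) / alpha) for i in range(1, len(relevant) + 1))
    return score / best if best else 0.0


ranking = ["a", "b", "c", "d"]
print(rank_score(ranking, {"a", "b"}))            # hits at the top: 1.0
print(round(rank_score(ranking, {"c", "d"}), 2))  # same hits, two places lower: 0.5
```

With α = 2, moving every hit down by two positions halves the score, which is what "half life" means here.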



Netflix competition
  Web-based movie rental
  Prize of $1,000,000 for accuracy improvement of 10% compared to own Cinematch system.
Historical dataset
  ~480K users rated ~18K movies on a scale of 1 to 5
  ~100M ratings
  Last 9 ratings/user withheld
    Probe set: for teams for evaluation
    Quiz set: evaluates teams' submissions
    Test set: used by Netflix to determine winner

Winning team combined more than 100 different predictors
Small group of controversial movies responsible for high share of error rate
  E.g. singular value decomposition, a technique for deriving the underlying factors that cause viewers to like or dislike movies, can be used to find connections between movies
Interestingly, most teams used similar prediction techniques
Very complex and specialized models
  Switching of model parameters based on user/session features
    Number of rated items
  Content features added noise



Quasi-experiments
  Lack random assignments of units to different treatments
Non-experimental/observational research
  Longitudinal research
    Observations over long period of time
    E.g. customer lifetime value, returning customers
  Case studies
  Focus group
    Interviews
    Think-aloud protocols



SkiMatcher Resort Finder introduced by Ski-Europe.com to provide users with recommendations based on their preferences
Conversational RS
  question and answer dialog
  matching of user preferences with knowledge base
Delgado and Davidson evaluated the effectiveness of the recommender over a 4-month period in 2001
  Classified as a quasi-experiment, as users decide for themselves if they want to use the recommender or not



                            July     August   September   October
Unique Visitors             10,714   15,560   18,317      24,416
  SkiMatcher Users           1,027    1,673    1,878       2,558
  Non-SkiMatcher Users       9,687   13,887   16,439      21,858
Requests for Proposals         272      506      445         641
  SkiMatcher Users              75      143      161         229
  Non-SkiMatcher Users         197      363      284         412
Conversion                   2.54%    3.25%    2.43%       2.63%
  SkiMatcher Users           7.30%    8.55%    8.57%       8.95%
  Non-SkiMatcher Users       2.03%    2.61%    1.73%       1.88%
Increase in Conversion        359%     327%     496%        475%
  (SkiMatcher conversion rate as a percentage of the non-SkiMatcher rate)

[Delgado and Davidson, ENTER 2002]
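The conversion rows can be recomputed from the raw counts. This short Python check shows that each conversion rate is the number of requests for proposals divided by the number of users. It also shows that the "increase" row is the ratio of the two groups' rates, expressed as a percentage.

```python
# month: (SkiMatcher users, their RFPs, non-SkiMatcher users, their RFPs)
DATA = {
    "July": (1027, 75, 9687, 197),
    "August": (1673, 143, 13887, 363),
    "September": (1878, 161, 16439, 284),
    "October": (2558, 229, 21858, 412),
}


def rates(month):
    """Conversion = requests for proposals / users, per group, plus their ratio."""
    sm_users, sm_rfp, other_users, other_rfp = DATA[month]
    sm_rate = sm_rfp / sm_users
    other_rate = other_rfp / other_users
    return sm_rate, other_rate, sm_rate / other_rate


for month in DATA:
    sm, other, ratio = rates(month)
    print(f"{month}: {sm:.2%} vs {other:.2%} -> ratio {ratio:.0%}")
```

Recomputing the ratio is only arithmetic on the published counts. It says nothing about causality, as the next slide explains.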



The nature of this research design means that questions of causality cannot be answered, such as
  Are users of the recommender system more likely to convert?
  Does the recommender system itself cause users to convert?
However, there is a significant correlation between using the recommender system and making a request for a proposal
Size of effect has been replicated in other domains!
  Electronic consumer products



Evaluation designs ACM TOIS 2004-2008
  In total 12 articles on RS
  50% movie domain
  75% offline experimentation
  2 user experiments under lab conditions
  qualitative research
Availability of data heavily biases what is done
  Many tag recommenders proposed recently
  Tenor at RecSys'09 to foster live experiments
  Public infrastructures to enable A/B tests





The MovieLens dataset, others
  Focus on improving the Mean Absolute Error
Nearly no real-world studies
  Exceptions, e.g., Dias et al., 2008.
    e-Grocer application
    CF method
    Short term: below one percent
    Long term, indirect effects important
This study
  Measuring impact of different RS algorithms in a Mobile Internet scenario
  More than 3% more sales through personalized item ordering



Game download platform of telco provider
  Access via mobile phone
  direct download, charged to monthly statement
  low-cost items (0.99 cent to few Euro)
Extension to existing platform
  "My recommendations"
  In-category personalization (where applicable)
Control group
  natural or editorial item ranking
  no "My Recommendations"


6 recommendation algorithms, 1 control group (A/B test)
  CF (item-item, SlopeOne), Content-based filtering, Switching CF/Content-based hybrid, top rating, top selling
Test period:
  4 weeks evaluation period
  About 150,000 users assigned randomly to different groups
  Only experienced users
H1: Pers. recommendations stimulate more users to view items
H2: Pers. recommendations turn more visitors into buyers
H3: Pers. recommendations stimulate individual users to view more items
H4: Pers. recommendations stimulate individual users to buy more items



Click and purchase behavior of customers
  Customers are always logged in
  All navigation activities stored in system
Measurements taken in different situations
  My Recommendations, start page, post sales, in categories, overall effects
Metrics:
  item viewers/platform visitors
  item purchasers/platform visitors
  item views per visitor
  purchases per visitor
Implicit and explicit ratings
  Item view, item purchase, explicit ratings

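The four metrics above can be computed per test group from logged visits, together with the relative lift of a personalized group over the control group. The sketch below shows one way to do this. The `VisitorLog` record and the toy numbers are hypothetical: they only illustrate the arithmetic and are not data from the study.

```python
from dataclasses import dataclass

@dataclass
class VisitorLog:
    """Hypothetical per-visitor summary for one test group."""
    views: int      # number of item detail views by this visitor
    purchases: int  # number of item purchases (downloads) by this visitor

def group_metrics(logs):
    """Compute the four metrics listed above for one group of visitors."""
    n = len(logs)
    return {
        "viewers_per_visitor": sum(1 for v in logs if v.views > 0) / n,
        "purchasers_per_visitor": sum(1 for v in logs if v.purchases > 0) / n,
        "views_per_visitor": sum(v.views for v in logs) / n,
        "purchases_per_visitor": sum(v.purchases for v in logs) / n,
    }

def relative_lift(treatment, control):
    """Relative change of a treatment metric compared with the control group."""
    return (treatment - control) / control

# Toy data, not from the study
control = [VisitorLog(2, 0), VisitorLog(0, 0), VisitorLog(3, 1), VisitorLog(1, 1)]
personalized = [VisitorLog(4, 1), VisitorLog(1, 0), VisitorLog(3, 2), VisitorLog(2, 1)]
m_c, m_p = group_metrics(control), group_metrics(personalized)
print(relative_lift(m_p["purchases_per_visitor"], m_c["purchases_per_visitor"]))
```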

(Figure: item views per customer; purchases per customer)
Item views:
Except Slope One, all personalized RS outperform non-personalized techniques
Item purchases:
RS measurably stimulate users to buy/download more items
Content-based method does not work well here
Conversion rates: No strong effects



(Figure: rates per visitor)
Note: Only 2 demos in top 30 downloads
Demos and non-free games:
Previous figures counted all downloads
Figure shows:
Personalized techniques comparable to top-seller list
However, can stimulate interest in demo games
Note: Rating possible only after download



(Figure: item views per visitor; purchases per visitor)
Findings: "…" does not work well
Top Rating and Slope One nearly exclusively stimulate demo downloads (Not
Top Seller and control group sell no demos


Overall number of downloads (free + non-free games)
In-category measurements
Notes: Pay games only shown here.
Content-based method outperforms others in different categories (half price, new games, erotic games)
Effect: 3.2 to 3.6% sales increase!



Most probably caused by size of displays
In addition: Particularity of platform; rating only after download
Insufficient coverage for standard CF methods
Implicit ratings
Also count item views and item purchases
Increase the coverage of CF algorithms
MAE however not a suitable measure anymore for comparing algorithms
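The sketch below shows how counting item views and purchases as implicit ratings can enlarge the data available to CF. It is a minimal illustration with two assumptions. First, it uses hypothetical scores of 1 for a view and 2 for a purchase, because the slides do not say how these signals were weighted. Second, it treats "users with at least two rated items" as a crude proxy for CF coverage.

```python
# Hypothetical implicit scores; the study does not specify these values
VIEW_SCORE = 1.0
PURCHASE_SCORE = 2.0

def build_ratings(explicit, views, purchases):
    """Merge explicit ratings with implicit view/purchase signals.

    explicit:         dict mapping (user, item) -> explicit rating
    views, purchases: iterables of (user, item) events
    Explicit ratings take precedence; otherwise the strongest implicit signal is kept.
    """
    ratings = {}
    for pair in views:
        ratings[pair] = max(ratings.get(pair, 0.0), VIEW_SCORE)
    for pair in purchases:
        ratings[pair] = max(ratings.get(pair, 0.0), PURCHASE_SCORE)
    ratings.update(explicit)  # explicit feedback overrides implicit scores
    return ratings

def users_with_profile(ratings, min_items=2):
    """Users with at least `min_items` rated items (a crude coverage proxy)."""
    counts = {}
    for user, _item in ratings:
        counts[user] = counts.get(user, 0) + 1
    return {u for u, c in counts.items() if c >= min_items}

explicit = {("u1", "g1"): 5}
views = [("u1", "g2"), ("u2", "g1"), ("u2", "g3"), ("u3", "g2")]
purchases = [("u1", "g1"), ("u2", "g3")]
combined = build_ratings(explicit, views, purchases)
print(users_with_profile(explicit), users_with_profile(combined))
```

The merged scores no longer lie on a single rating scale. For that reason, an error measure such as MAE says little about which algorithm actually serves the platform best.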

Summary
Significant sales increase can be reached! (max. 1% in past with other activities)
More studies needed; value of MAE measure
Recommendation in navigational context




Attacks on CF Recommender Systems

Individuals may be interested in pushing some items by manipulating the recommender system
Individuals might be interested in decreasing the rank of other items
Some simply might want to sabotage the system
Not a new issue
A simple strategy?
(Automatically) create numerous fake accounts/profiles
Issue high or low ratings to the "target item"
==> Will not work for neighbor-based recommenders
==> More elaborate attack models required
==> Goal is to insert profiles that will appear in the neighborhood of many users


Push vs. nuke attacks: not the same effects observed
How costly is it to make an attack?
How many profiles have to be inserted?
Is knowledge about the ratings matrix required?
Usually it is not public, but estimates can be made
Algorithm dependability
Is the attack designed for a particular recommendation algorithm?
How easy is it to detect the attack?
General scheme of an attack profile

Attack models mainly differ in the way the profile sections are filled
Random attack model
Take random values for filler items
Typical distribution of ratings is known, e.g., for the movie domain (average 3.6, standard deviation around 1.1)
Limited effect compared with more advanced models
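The sketch below builds one fake profile under the random attack model. It assumes the profile layout commonly used in the attack literature: a target item, optional "selected" items, "filler" items, and items left unrated. The random attack uses only the target and filler sections. The 1-5 scale and the function and item names are assumptions for illustration. The mean of 3.6 and standard deviation of 1.1 are the movie-domain values quoted above.

```python
import random

R_MIN, R_MAX = 1, 5  # assumed 5-point rating scale

def clip_round(x):
    """Round a real value to the nearest valid rating on the 1..5 scale."""
    return int(min(R_MAX, max(R_MIN, round(x))))

def random_attack_profile(target, filler_items, push=True, mean=3.6, std=1.1, rng=None):
    """Build one fake profile for the random attack model.

    Filler items get ratings drawn from the overall rating distribution;
    the target item gets the maximum rating (push) or minimum rating (nuke).
    Items that are not listed stay unrated.
    """
    rng = rng or random.Random()
    profile = {item: clip_round(rng.gauss(mean, std)) for item in filler_items}
    profile[target] = R_MAX if push else R_MIN
    return profile

rng = random.Random(7)
fake_profiles = [random_attack_profile("i42", rng.sample(range(100), 10), rng=rng)
                 for _ in range(5)]
```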


The Average Attack
Use the individual item's rating average for the filler items
Intuitively, there should be more neighbors
Additional cost involved: find out the average rating of an item
More effective than Random Attack in user-based CF
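The average attack differs from the random attack only in how the filler items are rated: each one gets its own item average. The sketch below shows this. The `averages` dictionary stands for the attacker's knowledge or estimate of item averages, which is the extra cost mentioned above. The names, the toy ratings and the 1-5 scale are assumptions.

```python
from statistics import mean

R_MIN, R_MAX = 1, 5  # assumed 5-point rating scale

def item_averages(ratings):
    """Average rating per item, from a dict user -> {item: rating}."""
    per_item = {}
    for profile in ratings.values():
        for item, r in profile.items():
            per_item.setdefault(item, []).append(r)
    return {item: mean(rs) for item, rs in per_item.items()}

def average_attack_profile(target, filler_items, averages, push=True):
    """Fake profile that rates each filler item at its (estimated) average."""
    profile = {item: min(R_MAX, max(R_MIN, round(averages[item])))
               for item in filler_items}
    profile[target] = R_MAX if push else R_MIN
    return profile

ratings = {"u1": {"a": 4, "b": 2, "c": 4},
           "u2": {"a": 5, "b": 2, "c": 3},
           "u3": {"a": 3, "b": 3, "c": 4}}
profile = average_attack_profile("t", ["a", "b", "c"], item_averages(ratings))
print(profile)
```

Because each filler rating sits close to what typical users gave, the fake profile tends to look similar to many genuine profiles. This matches the intuition above that it gains more neighbors than a random profile.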

By the way: what does effective mean?
Possible metrics to measure the introduced bias
Robustness: deviation in general accuracy of algorithm
Stability: change in prediction for a target item (before/after attack)
In addition: rank metrics
How often does an item appear in Top-N lists (before/after)
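Two small helpers can compute the stability and rank metrics above. The sketch assumes that predictions and top-N lists for the target item are collected for the same test users before and after the attack. All names and numbers are hypothetical.

```python
def prediction_shift(before, after):
    """Average change of the target item's predicted rating (stability).

    before/after: dict mapping test user -> predicted rating for the target item.
    """
    users = before.keys() & after.keys()
    return sum(after[u] - before[u] for u in users) / len(users)

def hit_ratio(top_n_lists, target):
    """Share of users whose top-N list contains the target item (rank metric)."""
    hits = sum(1 for items in top_n_lists.values() if target in items)
    return hits / len(top_n_lists)

before = {"u1": 2.5, "u2": 3.0, "u3": 3.5}
after = {"u1": 4.0, "u2": 4.5, "u3": 5.0}
print(prediction_shift(before, after))
print(hit_ratio({"u1": ["x", "y"], "u2": ["t", "x"]}, "t"),
      hit_ratio({"u1": ["t", "y"], "u2": ["t", "x"]}, "t"))
```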


Bandwagon Attack
Exploits additional information about the community ratings
Simple idea:
Add profiles that contain high ratings for "blockbusters" (in the selected items); use random values for the filler items
Will intuitively lead to more neighbors
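The sketch below builds one bandwagon attack profile. The list of blockbusters is assumed to be given, for instance from the attacker's knowledge of the community ratings. Filler items are drawn as in the random attack. Names, the 1-5 scale and the default distribution parameters are assumptions.

```python
import random

R_MIN, R_MAX = 1, 5  # assumed 5-point rating scale

def bandwagon_attack_profile(target, blockbusters, filler_items, push=True,
                             mean=3.6, std=1.1, rng=None):
    """Fake profile: maximum ratings for blockbusters, random filler, rated target."""
    rng = rng or random.Random()
    # Selected items: widely liked blockbusters get the maximum rating,
    # so the fake profile overlaps with many genuine users.
    profile = {item: R_MAX for item in blockbusters}
    # Filler items: random ratings from the overall distribution, as in the random attack.
    for item in filler_items:
        profile[item] = int(min(R_MAX, max(R_MIN, round(rng.gauss(mean, std)))))
    profile[target] = R_MAX if push else R_MIN
    return profile

p = bandwagon_attack_profile("t", ["b1", "b2"], ["f1", "f2", "f3"], rng=random.Random(1))
print(p)
```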

Segment Attack
Find items that are similar to the target item, i.e., are probably liked by the same group of people (e.g., other fantasy novels)
Inject profiles that have high ratings for fantasy novels and random or low ratings for other genres
Thus, the item will be pushed within the relevant community
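The sketch below builds a segment attack profile. It assumes the segment, that is, the items similar to the target, is already known, for example the other fantasy novels in the slide's example. The slide names two options for the remaining items (random or low ratings); this version uses the minimum rating. The names and the 1-5 scale are assumptions.

```python
R_MIN, R_MAX = 1, 5  # assumed 5-point rating scale

def segment_attack_profile(target, segment_items, other_items):
    """High ratings for the target's segment (e.g., fantasy novels), low elsewhere."""
    profile = {item: R_MAX for item in segment_items}
    profile.update({item: R_MIN for item in other_items})
    profile[target] = R_MAX  # push the target within its segment
    return profile

p = segment_attack_profile("new_fantasy_novel", ["fantasy_1", "fantasy_2"], ["thriller_1"])
print(p)
```

Every fake profile rates the target and the segment items highly together. This raises the item-item similarity between them, which is consistent with the segment attack also succeeding against item-based methods.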


In general
Effect depends mainly on the attack size (number of fake profiles inserted)
Bandwagon/Average Attack: bias shift of 1.5 points on a 5-point scale at 3% attack size (3% of profiles are faked after the attack)
Average Attack slightly better but requires more knowledge
1.5 points shift is significant; 3% attack size however means inserting, e.g., 30,000 profiles into a one-million rating database
Item-based recommenders
Far more stable; only 0.15 points prediction shift achieved
Exception: Segment attack successful (was designed for item-based method)
Hybrid recommenders and other model-based algorithms cannot be easily biased (with the described/known attack models)
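The attack-size arithmetic above can be made precise. The attack size is defined as the share of fake profiles after the attack. For N genuine profiles, the number n of fake profiles therefore satisfies n / (N + n) = p, which gives n = pN / (1 - p). The sketch below reads the "one-million rating database" as one million genuine user profiles; this is an illustrative assumption. It also uses an integer percentage so that the arithmetic stays exact.

```python
def fake_profiles_needed(genuine_profiles, attack_percent):
    """Fake profiles needed so that they form `attack_percent` % of all
    profiles *after* insertion: n / (N + n) = p / 100  =>  n = p*N / (100 - p).
    """
    if not 0 <= attack_percent < 100:
        raise ValueError("attack_percent must be in [0, 100)")
    # Ceiling of an integer division, avoiding floating-point rounding issues
    return -(-attack_percent * genuine_profiles // (100 - attack_percent))

print(fake_profiles_needed(1_000_000, 3))  # 30928
```

The result, 30,928 profiles, is consistent with the rounded figure of about 30,000 quoted on the slide.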


Use model-based or hybrid algorithms
Increase profile injection costs
Captchas
Low-cost manual insertion
Detect groups of users who collaborate to push/nuke items
Monitor development of ratings for an item
changes in average rating
changes in rating entropy
time-dependent metrics (bulk ratings)
Use machine learning methods to discriminate real from fake profiles
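The monitoring ideas above (changes in average rating, rating entropy and bulk ratings) can be checked per item and time window. The sketch below is a minimal illustration. Its thresholds are hypothetical and would need tuning on real data. A real detector might instead feed such features into the machine-learning methods mentioned above.

```python
import math
from collections import Counter

def rating_entropy(ratings):
    """Shannon entropy (bits) of the rating value distribution for one item."""
    counts = Counter(ratings)
    n = len(ratings)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def window_stats(ratings):
    """Average rating, entropy and volume for one time window."""
    return {"avg": sum(ratings) / len(ratings),
            "entropy": rating_entropy(ratings),
            "count": len(ratings)}

def suspicious(prev, curr, avg_jump=1.0, entropy_drop=0.5, bulk_factor=3.0):
    """Flag the current window if the average jumps, the entropy drops,
    or the number of ratings spikes (bulk ratings). Thresholds are hypothetical."""
    a, b = window_stats(prev), window_stats(curr)
    return (abs(b["avg"] - a["avg"]) >= avg_jump
            or a["entropy"] - b["entropy"] >= entropy_drop
            or b["count"] >= bulk_factor * a["count"])

normal = [1, 2, 3, 4, 5, 3, 4, 2]
attack = [5] * 30  # many identical top ratings in a short time
print(suspicious(normal, attack), suspicious(normal, normal))
```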


Not discussed here: privacy-ensuring methods
Distributed collaborative filtering, data perturbation
Vulnerability of some existing methods shown
Specially designed attack models may also exist for up to now rather stable methods
Incorporation of more knowledge sources/hybridization may help
No public information on large-scale real-world attacks available
Attack sizes are still relatively high
More research and industry collaboration required



Recommender Systems in the Social Web
The Web 2.0 / Social Web

Facebook, Twitter, Flickr, …
People actively contribute information and participate in social networks
Impact on recommender systems
More information about users and items available
demographic information about users
friendship relationships
tags on resources
New application fields for RS technology
Recommend friends, resources (pictures, videos), or even tags to users
==> Currently, lots of papers published on the topic


Explicit trust statements between users
can be expressed on some social web platforms (epinions.com)
could be derived from relationships on social platforms
Trust is a multi-faceted, complex concept
Goes however beyond an "implicit" trust notion based on rating similarity
Exploiting trust information in RS
to improve accuracy (neighborhood selection)
to increase coverage
could be used to make RS robust against attacks


Input
rating matrix
explicit trust network (ratings between 0 (no trust) and 1 (full trust))
Prediction
based on usual weighted combination of ratings of the nearest neighbors
similarity of neighbors is however based on the trust value
Note: Assume standard Pearson CF with min. 3 peers and similarity threshold = 0.5
No recommendation for A possible
However: Assuming that trust is transitive, also the rating of E could be used
Good for cold-start situations
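The trust-based prediction described above can be sketched as follows. The prediction uses the familiar mean-centred weighted combination of neighbour ratings, with trust values taking the place of Pearson similarities. Transitivity is modelled by multiplying trust values along a path of limited length; this is only one of several possible propagation schemes and is an assumption here. Users A, B and E and all numbers are hypothetical, and the slide's example figure is not reproduced.

```python
def propagated_trust(trust, source, target, max_depth=2):
    """Best trust of `source` in `target` along paths of at most `max_depth` edges.

    Trust along a path is the product of the edge values (0 = no trust,
    1 = full trust), so it decays with every hop.
    """
    best = 0.0
    stack = [(source, 1.0, 0, {source})]
    while stack:
        node, value, depth, seen = stack.pop()
        for nxt, t in trust.get(node, {}).items():
            if nxt in seen:
                continue
            v = value * t
            if nxt == target:
                best = max(best, v)
            elif depth + 1 < max_depth:
                stack.append((nxt, v, depth + 1, seen | {nxt}))
    return best

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def predict(ratings, trust, active, item, max_depth=2):
    """Mean-centred weighted prediction with trust values as neighbour weights."""
    num = den = 0.0
    for user, profile in ratings.items():
        if user == active or item not in profile:
            continue
        w = propagated_trust(trust, active, user, max_depth)
        if w > 0:
            num += w * (profile[item] - avg(profile.values()))
            den += w
    return avg(ratings[active].values()) + num / den if den else None

ratings = {"A": {"i1": 4, "i2": 2}, "B": {"i1": 5, "i2": 3}, "E": {"i1": 3, "i3": 5}}
trust = {"A": {"B": 0.9}, "B": {"E": 0.8}}
print(predict(ratings, trust, "A", "i3", max_depth=1))  # no trusted rater of i3
print(predict(ratings, trust, "A", "i3", max_depth=2))  # uses E via B
```

With direct trust only, A has no trusted neighbour who rated `i3`, so no prediction is possible. Allowing one propagation step brings in E's rating, which mirrors the cold-start argument on the slide.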


Trust propagation
Various algorithms and propagation schemes possible (including global "reputation" metrics)
Recommendation accuracy
hybrids combining similarity and trust shown to be more accurate in some experiments
Symmetry and Distrust
Trust is not symmetric
How to deal with explicit distrust statements?
If A distrusts B and B distrusts C, what does this tell us about A's relation to C?
Evaluation
Accuracy improvements possible; increase of coverage
Not many publicly available datasets


Collaborative tagging in the Web 2.0
Users add tags to resources (such as images)
Folksonomies are based on freely used keywords (e.g., on flickr.com)
Note: not as formal as ontologies, but easier to acquire
Folksonomies and Recommender Systems?
Use tags to recommend items
Use RS technology to recommend tags


Tags as content annotations
use content-based algorithms to recommend interesting tags
Possible approach:
determine keywords/tags that the user usually uses for his highly rated movies
find unrated movies having similar tags
Metrics:
take keyword frequencies into account
compare tag clouds (simple overlap of movie tags and user cloud; weighted comparison)
Possible improvements:
tags of a user can be different from community tags (plus: synonym problem)
add semantically related words to existing ones based on WordNet information
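The approach above can be sketched in a few lines. It builds a tag cloud from the movies the user rated highly and ranks the unrated movies against it, using either the simple overlap or a weighted comparison. Cosine similarity of tag-frequency vectors serves as one possible weighted comparison. The threshold `min_rating=4`, the function names and the toy data are assumptions.

```python
import math
from collections import Counter

def user_tag_cloud(user_ratings, movie_tags, min_rating=4):
    """Tag frequencies over the movies the user rated highly (the user's cloud)."""
    cloud = Counter()
    for movie, rating in user_ratings.items():
        if rating >= min_rating:
            cloud.update(movie_tags.get(movie, []))
    return cloud

def simple_overlap(cloud, tags):
    """Number of distinct tags shared by the user cloud and a movie."""
    return len(set(cloud) & set(tags))

def weighted_similarity(cloud, tags):
    """Cosine similarity of tag-frequency vectors (one possible weighted comparison)."""
    vec = Counter(tags)
    dot = sum(cloud[t] * vec[t] for t in vec)
    norm = (math.sqrt(sum(v * v for v in cloud.values()))
            * math.sqrt(sum(v * v for v in vec.values())))
    return dot / norm if norm else 0.0

def recommend_movies(user_ratings, movie_tags, score=weighted_similarity, min_rating=4):
    """Rank the user's unrated movies by similarity of their tags to the user cloud."""
    cloud = user_tag_cloud(user_ratings, movie_tags, min_rating)
    candidates = [m for m in movie_tags if m not in user_ratings]
    return sorted(candidates, key=lambda m: score(cloud, movie_tags[m]), reverse=True)

movie_tags = {"m1": ["space", "robots", "classic"], "m2": ["space", "aliens"],
              "m3": ["romance", "paris"], "m4": ["robots", "space", "funny"],
              "m5": ["paris", "comedy"]}
user_ratings = {"m1": 5, "m2": 4, "m3": 1}
print(recommend_movies(user_ratings, movie_tags))
```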


Difference to content-boosted CF
tags/keywords are not "global" annotations, but local for a user
remember, in user-based CF:
similarity of users is used to make recommendations
here: view tags as additional items (0/1 rating, if user used a tag or not); thus similarity is also influenced by tags
likewise: in item-based CF, view tags as additional users (1, if item was labeled with a tag)
Predictions
combine user-based and item-based predictions in a weighted approach
experiments show that only combination of both helps to improve accuracy
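The user-based half of this idea can be sketched as follows: each tag a user applied becomes an extra "item" rated 1, so two users can become similar through shared tags even without co-rated items. The item-based half would be the transposed construction, with tags added as extra "users" that rate the items they label with 1. Mixing 1-5 ratings and 0/1 tag entries in one vector, the cosine similarity, and the default weight of 0.5 are simplifying assumptions.

```python
import math

def augment_with_tags(ratings, user_tags, prefix="tag:"):
    """User-based variant: treat each tag a user applied as an extra item rated 1.

    ratings:   dict user -> {item: rating}
    user_tags: dict user -> iterable of tags the user has used
    Tags the user never used are simply absent, i.e. implicitly rated 0.
    """
    augmented = {}
    for user, profile in ratings.items():
        row = dict(profile)
        for tag in user_tags.get(user, ()):
            row[prefix + tag] = 1
        augmented[user] = row
    return augmented

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (missing entries = 0)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def combine(user_based, item_based, weight=0.5):
    """Weighted combination of a user-based and an item-based prediction."""
    return weight * user_based + (1 - weight) * item_based

ratings = {"alice": {"m1": 5}, "bob": {"m2": 4}}
user_tags = {"alice": ["sci-fi"], "bob": ["sci-fi"]}
augmented = augment_with_tags(ratings, user_tags)
print(cosine(ratings["alice"], ratings["bob"]), cosine(augmented["alice"], augmented["bob"]))
```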


Item retrieval in Web 2.0 applications
often based on overlap of query terms and item tags
insufficient for retrieving the "long tail" of items
think of possible tags of a car: "Volkswagen", "beetle", "red", "cool"
One approach: Social Ranking
use CF methods to retrieve ranked list of items for given query
compute user and tag similarities (e.g., based on co-occurrence)
extend user query with similar tags (improves coverage)
rank items based on
relevance of tags to the query
similarity of taggers to the current user
leads to measurably better coverage and long-tail retrieval
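The sketch below is a simplified version of the Social Ranking idea. Tag similarity is the cosine similarity of the items each tag was applied to, which is a co-occurrence measure. User similarity is the cosine similarity of tag-usage vectors. An item's score sums, over its tag assignments, the tag's relevance to the query multiplied by (1 + similarity of the tagger to the current user). This scoring rule and the threshold `min_tag_sim` are illustrative assumptions, not the published formula.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def profiles(assignments):
    """Tag vectors over items and user vectors over tags, from (user, item, tag) triples."""
    tag_items = defaultdict(lambda: defaultdict(int))
    user_tags = defaultdict(lambda: defaultdict(int))
    for user, item, tag in assignments:
        tag_items[tag][item] += 1
        user_tags[user][tag] += 1
    return tag_items, user_tags

def social_rank(assignments, current_user, query, min_tag_sim=0.3):
    tag_items, user_tags = profiles(assignments)
    # 1) Query expansion: a tag's relevance is its best similarity to any query tag
    relevance = {}
    for tag in tag_items:
        rel = max(1.0 if tag == q else cosine(tag_items[tag], tag_items.get(q, {}))
                  for q in query)
        if rel >= min_tag_sim:
            relevance[tag] = rel
    # 2) Score items: tag relevance, boosted by the tagger's similarity to the current user
    scores = defaultdict(float)
    for user, item, tag in assignments:
        if tag in relevance:
            user_sim = cosine(user_tags[current_user], user_tags[user])
            scores[item] += relevance[tag] * (1.0 + user_sim)
    return sorted(scores, key=scores.get, reverse=True)

assignments = [("u1", "car1", "volkswagen"), ("u1", "car1", "beetle"),
               ("u2", "car2", "beetle"), ("u2", "car2", "vw"),
               ("u3", "car3", "red")]
print(social_rank(assignments, "me", ["volkswagen"]))
```

An exact tag match for "volkswagen" would return only `car1`. The expanded query also reaches `car2` through the co-occurring tag "beetle", which illustrates the coverage gain claimed above.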


Remember: Users annotate items very differently
RS technology can be used to help users find appropriate tags
thus, making the annotations of items more consistent
Possible approach:
Derive two-dimensional projections of User × Tag × Resource data
Use nearest-neighbor approach to predict item rating
use one of the projections
Evaluation
User-Tag similarity better than User-Resource
differences on different datasets; always better than "most popular (by resource)" strategy
FolkRank:
View folksonomy as graph and apply PageRank idea
Method outperforms other approaches
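The sketch below implements nearest-neighbour tag recommendation on the user-tag projection, the projection reported above to work better than user-resource. Users are compared by the cosine similarity of their tag-usage counts. Among the users who tagged the target resource, the `k` most similar ones vote for their tags on that resource, with votes weighted by similarity. Function names, `k`, `n` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def user_tag_projection(triples):
    """Project (user, tag, resource) triples onto a user x tag count matrix."""
    proj = defaultdict(lambda: defaultdict(int))
    for user, tag, _resource in triples:
        proj[user][tag] += 1
    return proj

def recommend_tags(triples, user, resource, k=2, n=3):
    """Recommend up to n tags for `user` on `resource` from k nearest neighbours."""
    proj = user_tag_projection(triples)
    taggers = {u for u, _t, r in triples if r == resource and u != user}
    sims = sorted(((cosine(proj[user], proj[v]), v) for v in taggers), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, v in sims:
        if sim <= 0:
            continue  # ignore neighbours with no tag overlap
        for u, t, r in triples:
            if u == v and r == resource:
                scores[t] += sim
    return sorted(scores, key=scores.get, reverse=True)[:n]

triples = [("ann", "python", "r1"), ("ann", "tutorial", "r1"), ("ann", "python", "r2"),
           ("bob", "python", "r3"), ("bob", "code", "r3"), ("bob", "python", "r4"),
           ("bob", "howto", "r4"), ("cat", "cooking", "r5"), ("cat", "recipe", "r4")]
print(recommend_tags(triples, "ann", "r4"))
```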



What to expect?


RS research will become much more diverse
Less focus on explicit ratings
But various forms of feedback mechanisms and knowledge
Social and Semantic Web, automated knowledge extraction
Context awareness (beyond geographical positions)
Less focus on algorithms
Explaining and trust building
Persuasive aspects
Less focus on offline experimentation
But live experiments, real-world case studies
More focus on causal relationships
When, where and how to recommend?
Consumer/Sales psychology
Consumer decision-making theories


http://recsys.acm.org
Questions?

http://www.recommenderbook.net
Dietmar Jannach
e-Services Research Group
Department of Computer Science
TU Dortmund, Germany
P: +492317557272
Markus Zanker
Intelligent Systems and Business Informatics
Institute of Applied Informatics
University Klagenfurt, Austria
M: [email protected]
P: +4346327003753
Recommender Systems: An Introduction
by Dietmar Jannach, Markus Zanker, Alexander Felfernig and Gerhard Friedrich
Cambridge University Press, to appear 2010/11


[AT05] Adomavicius & Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE TKDE, 17(6), 2005, pp. 734-749.
[BHC98] Basu, Hirsh & Cohen. Recommendation as classification: using social and content-based
[Burke02] Burke. Hybrid Recommender Systems: Survey and Experiments. UMUAI 12(4), 2002, pp. 331-370.
[FFJ+06] Felfernig, Friedrich, Jannach & Zanker. An Integrated Environment for the Development of Knowledge-Based Recommender Applications, IJEC, 11(2), 2006, pp. 11-34.
[HJ09] Hegelich & Jannach. Effectiveness of different recommender algorithms in the mobile internet: A case study, ACM RecSys, 2009.
strategies. Hypertext 2009, pp. 73-82.
[McSherry05] McSherry. Retrieval Failure and Recovery in Recommender Systems, AIR 24(3-4), 2005, pp. 319-338.
[MRB04] Mirzadeh, Ricci & Bansal. Supporting User Query Relaxation in a Recommender System. EC-Web, 2004, pp. 31-40.
[PF04] Pu & Faltings. Decision Tradeoff using example critiquing, Constraints 9(4), 2004, pp. 289
