AFINN scoring on different dimensions
My AFINN word list, which associates words with a valence score, may be used for sentiment analysis. Usually, when presented with a text (a sequence of words), I look up the valence of each individual word and then either average the values, sum them, or divide the sum by the square root of the number of words to get a combined value for the valence/sentiment of the text.
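The lookup step can be sketched roughly as follows. This is only an illustration, not the code behind the word list: the two-entry `TOY_WORD_LIST`, the function name `word_valences` and the regular-expression tokenizer are all assumptions made for the example, and words missing from the list are assumed to score 0.

```python
import re

# Toy stand-in for the AFINN word list (hypothetical, two entries only).
TOY_WORD_LIST = {"exaggerate": -2, "thanks": 2}


def word_valences(text, word_list=TOY_WORD_LIST):
    """Return one valence per word; words not in the list score 0."""
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    return [word_list.get(word, 0) for word in words]


valences = word_valences("Thanks, though I exaggerate it so much")
print(valences)       # [2, 0, 0, -2, 0, 0, 0]
print(sum(valences))  # 0
```

A real tokenizer would need to handle apostrophes, emoticons and multi-word entries, which this sketch ignores.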
In some cases I compute further numbers beyond the valence/sentiment: arousal, ambivalence, positive and negative. I compute the arousal as the sum of the absolute values of the word valences. The ambivalence is the difference between the arousal and the absolute value of the summed valence. 'Positive' is the sum of all the positive valences, while 'negative' is the absolute value of the sum of all the negative valences.
These values are really not optimal, but I guess they are the best one can get when each word is labeled with only a single value. An example of a problematic word is 'surprised', which I would say should have an arousal value, but since the word could indicate either a negative or a positive surprise, I have not added it to my word list.
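The definitions above can be written down as a minimal sketch. The function name `score_dimensions` and the input valences are made up for illustration; the input is assumed to be a list with one valence per word, using the sum as the combined valence.

```python
def score_dimensions(valences):
    """Compute sum-based scores from a list of per-word valences."""
    valence = sum(valences)
    arousal = sum(abs(v) for v in valences)
    positive = sum(v for v in valences if v > 0)
    negative = sum(-v for v in valences if v < 0)  # reported as a positive number
    ambivalence = arousal - abs(valence)
    return {
        "valence": valence,
        "arousal": arousal,
        "positive": positive,
        "negative": negative,
        "ambivalence": ambivalence,
    }


# Made-up valences for a three-word text.
scores = score_dimensions([3, -2, 0])
print(scores)
# {'valence': 1, 'arousal': 5, 'positive': 3, 'negative': 2, 'ambivalence': 4}
```

With these definitions the arousal always equals positive plus negative, and the ambivalence is zero whenever all scored words point in the same direction.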
The table below shows the scoring of a micropost from http://rainbowdash.net/notice/2764341
Word          Valence  Arousal  Positive  Negative  Ambivalence
zeldatra          0.0      0.0       0.0       0.0
I                 0.0      0.0       0.0       0.0
m                 0.0      0.0       0.0       0.0
surprised         0.0      0.0       0.0       0.0
you               0.0      0.0       0.0       0.0
got               0.0      0.0       0.0       0.0
my                0.0      0.0       0.0       0.0
hair              0.0      0.0       0.0       0.0
spot              0.0      0.0       0.0       0.0
on                0.0      0.0       0.0       0.0
though            0.0      0.0       0.0       0.0
considering       0.0      0.0       0.0       0.0
how               0.0      0.0       0.0       0.0
I                 0.0      0.0       0.0       0.0
exaggerate       -2.0      2.0       0.0       2.0
it                0.0      0.0       0.0       0.0
so                0.0      0.0       0.0       0.0
much              0.0      0.0       0.0       0.0
Thanks            2.0      2.0       2.0       0.0
Although          0.0      0.0       0.0       0.0
I                 0.0      0.0       0.0       0.0
dunno             0.0      0.0       0.0       0.0
why               0.0      0.0       0.0       0.0
Total             0.0      4.0       2.0       2.0          4.0
Here the sum is the result. If the average (over words) should be the result, then I would normalize by 23, the number of words, so that a total of 4 becomes 4/23=0.17. If I report with the square root normalization, it becomes 4/sqrt(23)=0.83.
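The arithmetic can be checked with a few lines of Python. The function `normalize` and its `method` names are hypothetical, chosen only for this sketch; the total of 4 and the word count of 23 come from the table.

```python
import math


def normalize(total, n_words, method="sum"):
    """Combine a summed score with the word count (method names are illustrative)."""
    if method == "sum":
        return total
    if method == "average":
        return total / n_words
    if method == "sqrt":
        return total / math.sqrt(n_words)
    raise ValueError(f"unknown method: {method}")


total, n_words = 4.0, 23  # total and word count taken from the table
for method in ("sum", "average", "sqrt"):
    print(method, round(normalize(total, n_words, method), 2))
# sum 4.0
# average 0.17
# sqrt 0.83
```

The sketch does not guard against an empty text, where the word count is zero and the two normalized variants are undefined.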
Code: https://gist.github.com/fnielsen/5949814


















