Item2486: LogarithmicTagCloud
Priority: Enhancement
Current State: Closed
Released In:
Target Release: n/a
The current implementation of the TagCloudPlugin seems to use a linear approach to bin the data.
If some words have many hits and others have very few, one can no longer tell the differences between the low-frequency entries.
I propose that the TagCloudPlugin take the logarithm of each count before binning, or at least offer this as an option.
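To illustrate the proposal, here is a minimal sketch in Python (the plugin itself is written in Perl). The function name `assign_bucket`, its parameters, and the sample counts are hypothetical and chosen only for illustration; this is not the TagCloudPlugin code.

```python
import math


def assign_bucket(count, lo, hi, buckets, method="linear"):
    """Map a raw count (lo <= count <= hi, all >= 1) to a bucket in 1..buckets.

    Hypothetical helper: with method="logarithmic" the count and both
    range ends are log-transformed before the equal-width binning.
    """
    if method == "logarithmic":
        count, lo, hi = math.log(count), math.log(lo), math.log(hi)
    if hi == lo:
        return 1  # all terms have the same count; a single bucket suffices
    fraction = (count - lo) / (hi - lo)
    # The maximum count yields fraction == 1.0; clamp it into the last bucket.
    return 1 + min(buckets - 1, int(fraction * buckets))


# Made-up counts: one dominant term and several rare ones.
counts = {"lorem": 1, "amet": 3, "ipsum": 10, "dolor": 150}
lo, hi = min(counts.values()), max(counts.values())

for term, n in counts.items():
    linear = assign_bucket(n, lo, hi, 40, "linear")
    logarithmic = assign_bucket(n, lo, hi, 40, "logarithmic")
    print(f"{term:6} count={n:3} linear={linear:2} logarithmic={logarithmic:2}")
```

With these counts, linear binning puts "lorem" (1 hit) and "amet" (3 hits) into the same bucket, whereas logarithmic binning separates them.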
-- DanielOderbolz - 09 Dec 2009
Cool idea. Go for it.
-- MichaelDaum - 09 Dec 2009
OK, I implemented this. We tested it against 1.0.7, but we did not test for side effects.
Here is the patch to implement this functionality.
If you have this code
%TAGCLOUD{"Lorem ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor"
header="<div style=\"text-align:center; padding:15px;line-height:180%\">"
format="<span style=\"font-size:$weightpx;line-height:90%\"><a style=\"color:$fadeRGB(104,144,184,0,102,255);text-decoration:none\" title=\"$count\">$term</a></span>"
footer="</div>"
buckets="40"
offset="0"
lowercase="on"
stopwords="on"
plural="off"
min="0"
map="bucket=pail"
filter="on"
method="logarithmic"
}%
%TAGCLOUD{"Lorem ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor ipsum dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor dolor"
header="<div style=\"text-align:center; padding:15px;line-height:180%\">"
format="<span style=\"font-size:$weightpx;line-height:90%\"><a style=\"color:$fadeRGB(104,144,184,0,102,255);text-decoration:none\" title=\"$count\">$term</a></span>"
footer="</div>"
buckets="40"
offset="0"
lowercase="on"
stopwords="on"
plural="off"
min="0"
map="bucket=pail"
filter="on"
method="linear"
}%
you get:
It is clear that in the linear case the many occurrences of "dolor" dominate the picture, while on the log scale you see more structure.
-- DanielOderbolz - 10 Dec 2009
I renamed method to normalize and made logarithmic normalization the default, because that's obviously superior.
-- MichaelDaum - 10 Dec 2009
Released as v2.20.
-- MichaelDaum - 10 Dec 2009
Ah, we need to keep the original counts to display them in the tooltip appropriately.
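A rough Python sketch of that point, not the plugin's Perl code: the log-transformed value is stored next to the raw count instead of replacing it, so the tooltip can still show the real number of hits. The `Tag` class and the `make_tags` and `tooltip_link` helpers are hypothetical names used only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Tag:
    term: str
    count: int     # raw number of hits; this is what the tooltip should show
    weight: float  # normalized value, used only to choose the bucket/size


def make_tags(counts):
    """Build tags that carry both the raw count and its logarithm."""
    return [Tag(term, n, math.log(n)) for term, n in counts.items()]


def tooltip_link(tag):
    """Render a link whose title attribute shows the untransformed count."""
    return f'<a title="{tag.count}">{tag.term}</a>'


tags = make_tags({"ipsum": 10, "dolor": 150})
print(tooltip_link(tags[1]))
```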
-- MichaelDaum - 10 Dec 2009