Super Volatile

Krzysztof Szafranek's link blog

Hi, I'm Krzysztof and I make websites.
When I'm not making websites, I read these.
Dec 15 / 3:59pm

Bug Prediction at Google

In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range.

How Google uses statistics to find troublesome places in the code.

Filed under: google   software development   statistics  
Dec 1 / 2:50pm

What if academics were as dumb as quacks with statistics?

You can say that there is a statistically significant effect for your chemical reducing the firing rate in the mutant cells. And you can say there is no such statistically significant effect in the normal cells. But you cannot say that mutant cells and normal cells respond to the chemical differently. To say that, you would have to do a third statistical test, specifically comparing the “difference in differences”, the difference between the chemical-induced change in firing rate for the normal cells against the chemical-induced change in the mutant cells.

Analysis of a very common mistake in neuroscience research papers.
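
To make the "third test" concrete, here is a minimal sketch in Python. It assumes the two groups are independent and that a normal approximation (a simple z-test) is acceptable. The effect sizes and standard errors are made-up illustrative numbers, not data from the article, and the function names are my own.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def effect_test(effect, se):
    """z-test of a single effect against zero: returns (z, p)."""
    z = effect / se
    return z, two_sided_p(z)

def difference_test(effect_a, se_a, effect_b, se_b):
    """z-test of the difference between two independent effects."""
    diff = effect_a - effect_b
    # Variances of independent estimates add.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return effect_test(diff, se_diff)

# Hypothetical change in firing rate (and its standard error) per group.
mutant = (-3.0, 1.2)
normal = (-1.5, 1.2)

print("mutant:", effect_test(*mutant))
print("normal:", effect_test(*normal))
print("mutant vs normal:", difference_test(*mutant, *normal))
```

With these numbers the mutant effect is significant at the 5% level and the normal effect is not, yet the direct comparison between them is not significant either, which is exactly the trap the article describes.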

Filed under: science   statistics  
Nov 20 / 1:36pm

One Per Cent: Occupy vs Tea Party: what their Twitter networks reveal

Those tweeting about the Tea Party emerge as a tight-knit "in crowd", following one another's tweets. By contrast, the network of people tweeting about Occupy consists of a looser series of clusters, in which the output of a few key people is being vigorously retweeted.

Analysis of Twitter usage by people from both sides of the American political spectrum reveals interesting patterns. The article doesn't try to draw conclusions about what these patterns actually say about the two groups.

Filed under: politics   statistics   twitter   usa  
Mar 5 / 12:34am

Measure Anything, Measure Everything

We’ve found that tracking everything is key to moving fast, but the only way to do it is to make tracking anything easy.

On tracking what's going on with a web application using UDP.
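
A minimal sketch of what fire-and-forget tracking over UDP can look like, in Python. It assumes a StatsD-style plaintext format (`name:value|c` for counters) and a collector listening on a hypothetical `127.0.0.1:8125`; the function names are my own, not from the linked post.

```python
import socket

def format_counter(name, value=1, sample_rate=1.0):
    """Build a StatsD-style counter line, e.g. 'logins:1|c'."""
    payload = f"{name}:{value}|c"
    if sample_rate < 1.0:
        # Tell the collector that only a fraction of events is reported.
        payload += f"|@{sample_rate}"
    return payload

def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one datagram and forget about it (host/port are assumptions)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()

if __name__ == "__main__":
    send_metric(format_counter("logins"))
```

The appeal of UDP here is that the application never waits for a reply: if the collector is down, the datagram is simply lost and the request carries on.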

Filed under: statistics   web analytics  
Jan 22 / 8:57pm

Internet 2010 in numbers

89.1% – The share of emails that were spam.

Some interesting statistics about internet usage in 2010.

Filed under: internet   statistics  
Jul 20 / 11:27pm

"Experts" misunderestimate our traffic, and we don't know why

The problem is, advertisers generally don't trust Google Analytics numbers. They have their own preferred sources of traffic information that they put their faith in. Let's take a look at some of them.

Reddit compares their traffic data from logs and Google Analytics with estimates coming from Alexa, compete.com and Quantcast.

Filed under: statistics   web traffic  
May 30 / 9:47pm

The Density of Smart People - Creative Class

San Francisco and New York are far and away the leaders in human capital density with 7,031 and 6,357 college degree holders per square mile, respectively. Boston (3,871), Washington, D.C. (3,395), Seattle (2,853), and Chicago (2,543) all have human capital densities in the range of 2,500 to 3,500 degree holders per square mile. Silicon Valley has a human capital density of 1,259 degree holders per square mile.

Stats on the density of college graduates per square mile in different American cities.

Filed under: statistics   usa  
May 14 / 11:42pm

How Not To Sort By Average Rating

Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.

On the correct formulas for ratings. Also, see another article for an improved formula that works much better with smaller numbers of votes.
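
One commonly used fix for this problem is to sort by the lower bound of the Wilson score confidence interval instead of the raw average. The Python sketch below assumes simple positive/negative votes and a 95% confidence level (z = 1.96); the function name is my own.

```python
import math

def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval for the positive fraction."""
    n = positive + negative
    if n == 0:
        return 0.0  # no votes yet, so rank at the bottom
    p = positive / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)

# The two items from the quote above: (positive, negative) votes.
items = {"item 1": (2, 0), "item 2": (100, 1)}
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
print(ranked)
```

With the plain average, item 1 (2 out of 2) beats item 2 (100 out of 101). With the lower bound, item 1 scores about 0.34 and item 2 about 0.95, so the item with lots of evidence ranks first.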

Link via person who “had to justify [his] expensive education”.

Filed under: statistics  
Feb 17 / 1:01am

Tableau Public

Excel on steroids. Unfortunately, the editor is only for Windows.

Filed under: statistics   visualization