Excellent find, corrected now!
The sample size was met! In both DOD1 and DOD2 we have N₁ + N₂ = 34 (the sample listening to the original track plus the sample listening to the corrupted one). Having posted this here, on other forums and on IRC, I expected a bit more (though dodecaphonic lovers are not easy to come by), but we reached the threshold. The ‘N’s are in the «descriptive statistics» tables; I will edit the article to make it clearer!
A minor correction: we didn’t test H₀≠H₁, but H₀>H₁ (abusing notation; I should have said P(X>Y)). This is of course for efficiency reasons (again, fewer observations are needed for the same power!). You can check it in the ‘p’ of DOD2, which is extremely high even though the original and corrupted pieces have somewhat different distributions, because the corrupted version was rated higher!
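To make the one-sided point concrete, here is a rough sketch in Python. It assumes a Mann-Whitney-style rank test (suggested by the P(X>Y) wording, but not necessarily the exact test used in the article), uses a normal approximation without tie correction, and runs on made-up ratings, not the DOD2 data. The names `original` and `corrupted` are purely illustrative.

```python
from math import erf, sqrt


def mann_whitney_u(x, y):
    """U statistic for x: count pairs with x_i > y_j, ties count as 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def p_values(x, y):
    """Return (U, one-sided p for 'x tends to exceed y', two-sided p).

    Normal approximation, no tie or continuity correction: a sketch only.
    """
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p_greater = normal_sf(z)
    p_two_sided = min(1.0, 2.0 * normal_sf(abs(z)))
    return u, p_greater, p_two_sided


# Hypothetical ratings (NOT the article's data): the corrupted version
# happens to be rated higher than the original.
original = [3, 4, 2, 5, 3, 4, 3, 2]
corrupted = [5, 6, 4, 6, 5, 7, 5, 4]

u, p_greater, p_two_sided = p_values(original, corrupted)
print(f"U = {u}, one-sided p = {p_greater:.4f}, two-sided p = {p_two_sided:.4f}")
```

With data like these the two distributions clearly differ, so the two-sided p is small, yet the one-sided p for "original rated higher" comes out close to 1 because the difference points the other way. That is the same pattern as the high ‘p’ in DOD2.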
Thanks for the comments, if you have any more don’t be shy!