StochNed<br />
<br />
Scientific fraud<br />
Posted by stochned, 2011-09-19<br />
<br />
As a mathematician, a statistician and a scientist, I must admit that my main reaction to the recent news that the famous Dutch social psychologist Prof. Diederik Stapel, of whom I had never heard before, had faked the data in many of his publications, was one of <i>Schadenfreude</i>. Especially when he started dragging some of his co-authors down with him in his <a href="http://uvtapp.uvt.nl/fsw/spits.npc.ShowPressReleaseCM?v_id=4082238588785510">spectacular fall</a>.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVaoJ60CS511Rwle9SYgHVom7bHVLut38d0phgwC-ibiBxEK1h-49aFfEy-N9CBcQ-YqbFC7nHapZOQzbA0qaC2kNsmwL6Bd76v3NwOMg3GGsX79vRFN3vl3u5TOFi9-KsAiILNf78/s1600/si-Stapel.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVaoJ60CS511Rwle9SYgHVom7bHVLut38d0phgwC-ibiBxEK1h-49aFfEy-N9CBcQ-YqbFC7nHapZOQzbA0qaC2kNsmwL6Bd76v3NwOMg3GGsX79vRFN3vl3u5TOFi9-KsAiILNf78/s1600/si-Stapel.jpg" /></a></div>
<div style="font: 12.0px Helvetica; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<br />
<br />
<br />
I had quite a long email discussion with Leiden University newspaper editor Bart Braun about the case. His article, which I think is good, appeared in "Mare". The national newspaper "NRC Handelsblad" also published some rather good articles on the issue. Here I quote in full one of the best reactions, by statistician Han Oud, which was originally published in Dutch. I'll add some of my personal observations too.<br />
<br />
<br />
<div style="background-color: #efefef; font: 20.0px Georgia; margin: 0.0px 0.0px 0.0px 0.0px;">
<a href="http://archief.nrc.nl/index.php/2011/September/13/Overig/14/">NRC</a></div>
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td style="width: 967.0px;" valign="top">
<div style="font: 28.0px Georgia; margin: 0.0px 0.0px 28.0px 0.0px;">
Fraud is too easy in the social sciences</div>
<div style="color: #999999; font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
Opinion <span style="color: #bbbbbb;">|</span> <a href="http://archief.nrc.nl/index.php/2011/September/13/"><span style="color: #073399;">Tuesday 13-09-2011</span></a> <span style="color: #bbbbbb;">|</span> Section: <a href="http://archief.nrc.nl/index.php/2011/September/13/Overig/"><span style="color: #073399;">Other</span></a> <span style="color: #bbbbbb;">|</span> Page: <a href="http://archief.nrc.nl/index.php/2011/September/13/Overig/14/"><span style="color: #073399;">14</span></a> <span style="color: #bbbbbb;">|</span> Han Oud</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
The social sciences are more susceptible to fraud than the natural sciences because of a lack of repeatability. Yet fraud can be prevented by making the data public, argues Han Oud.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
The success of a very productive and highly praised Tilburg professor, the 'golden boy' of social psychology, turns out to be partly based on fraud. The rector magnificus of Tilburg University confirms that the man fabricated data on a large scale. Is this an isolated phenomenon, or is much more going on?</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
Colleagues in the social sciences are doing everything they can to limit the damage to this one case. A fellow professor from Nijmegen, who himself took part in the fraudulent research, hastened to speak of the extensive misstep of one single colleague. A committee chaired by former KNAW president Levelt will chart the extent of the fraud committed by this one person. And Robbert Dijkgraaf, the current KNAW president, was also quick off the mark to isolate the seat of the fire: 'Fraud affects Stapel, not science' (Volkskrant, 9 Sept.)</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
Fraud is of course not confined to the social sciences. But the chance of fraud through sloppy and superficial handling of the data is many times greater in the social sciences than in the natural sciences. Why? An important brake on fraud in the natural sciences is repeatability. In the social sciences, repeatability is negligible. Social science research is based on samples, and every sample has its own deviations, with the result that it is extremely difficult to prove that data have been manipulated.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
A second reason why social scientists can mislead outsiders for years almost with impunity is the relatively small size of most data files and the fact that researchers usually regard them as a kind of private property. It is extremely difficult to ask someone outside your own research group, with whom you do not collaborate very intensively, for access to his data files. Because of the enormous pressure to produce, the immediate reaction of the person concerned is not so much the feeling of being checked up on as the fear that 'his data' will be used in a publication of which he is not a co-author.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
What can be done to improve the situation? No more than in the natural sciences can fraud be eradicated in the social sciences. That is partly a consequence of the enormous appetite of the publicity media for research results of the calibre 'Meat eaters are more loutish and selfish than vegetarians'. A great clean-up could, however, be achieved by introducing two relatively simple measures.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
First: the data files of all studies that have been reported in the publicity media and in journals are public domain. Objections to this cannot be substantiated. Whoever wants to report in the public domain must also make the data available in the public domain. Second: the joint faculties of social sciences appoint an experienced researcher whose task is to check, by random sampling, whether the results of doctoral research follow from the available data files in the manner stated in the thesis.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
The data are regarded as a kind of private property</div>
<div style="color: #999999; font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
Info: Dr. J.H.L. Oud is a mathematician affiliated with the Behavioural Science Institute of Radboud University Nijmegen.</div>
<div style="font: 14.0px Georgia; margin: 0.0px 0.0px 13.0px 0.0px;">
This article is copyrighted by NRC Handelsblad BV and/or the original author.</div>
</td>
</tr>
</tbody>
</table>
<br />
<br />
<br />
Mathematics in a Blog<br />
Posted by stochned, 2011-08-25<br />
<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript">
</script><br />
<br />
I'm experimenting with MathJax, <a href="http://www.mathjax.org/">www.mathjax.org</a>. It is a way to write LaTeX formulas in HTML documents so that the reader sees them displayed just as they ought to be. Well, that's the theory.<br />
<br />
$$<br />
\sqrt{\vphantom{I}} n \bigl(\hat\theta_{\text{MLE}}-\theta_0\bigr)~\Rightarrow ~ \mathcal N\,\bigl(\, 0\,, \mathcal I(\theta_0)^{-1}\bigr)<br />
$$<br />
<br />
<br />
$$<br />
\Pr(T_E\gg t_E)~=~\prod_{A\subseteq E}\,\,\prod_{s_A\in(0_A,t_A]}\, \Biggl(\prod_{B\subseteq A}\Pr\Bigl(T_{A\setminus B}\gg<br />
s_{A\setminus B}\Bigm|T_A\ge s_A\Bigr)^{(-1)^{|B|}}\Biggr)<br />
$$<br />
<br />
So I'd like to hear from you, dear reader. What do you see? Does it look OK? Did you have to wait a long time?<br />
<br />
So, how is it done? The LaTeX formulas are typed in completely standard LaTeX, surrounded by double dollar signs. At the top of the HTML source for this page the following code is included: < script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" ><br />
< /script > (I added a space after each opening angle bracket, so that the HTML tags for starting and ending a JavaScript block are not recognised as such).<br />
<br />
Finally, let me add as images what real LaTeX makes of the two formulas:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUrUpeHeonugPp-7inyn9DkMeZWODUHFFYUNk1dsfvXn7pPmlSf6xYoxX-LZ9QwzkkiRnnb0ym39qfAuKc0ew5nzd1cmPH7hyd5Np7Vd7C438ZrCxKiYo-uNWOxDWrjXVU0gppe88O/s1600/formula1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="61" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUrUpeHeonugPp-7inyn9DkMeZWODUHFFYUNk1dsfvXn7pPmlSf6xYoxX-LZ9QwzkkiRnnb0ym39qfAuKc0ew5nzd1cmPH7hyd5Np7Vd7C438ZrCxKiYo-uNWOxDWrjXVU0gppe88O/s320/formula1.png" width="320" /></a></div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5c9Ytw8D9YjFVsD83oaiK7zN4f59XnVbQiHBTYl5oVuXe1B4R3EmLbWHsB4lfg0vth4aHwu4LlCmbuJ91-Ao9nW1ESZBIOEoWOMGFNdeJgLTE53vIz44BOy317dX7UZS-LxXyx9HA/s1600/formula2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="97" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5c9Ytw8D9YjFVsD83oaiK7zN4f59XnVbQiHBTYl5oVuXe1B4R3EmLbWHsB4lfg0vth4aHwu4LlCmbuJ91-Ao9nW1ESZBIOEoWOMGFNdeJgLTE53vIz44BOy317dX7UZS-LxXyx9HA/s640/formula2.png" width="640" /></a></div>
<br />
The true story of the VvS+OR logo<br />
Posted by stochned, 2011-08-23<br />
<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
The logo as we see it on the cover of Statistica Neerlandica was drawn sometime around 1970, freehand, by CWI resident artist Tobias Baanders. Presumably he was inspired by <a href="http://www.flickr.com/photos/gill1109/sets/72157620859202372/with/3681928516/">earlier VVS graphic design</a>, which almost always featured a standard normal probability density as the recognisable trade mark of all statisticians. The role of operations research, especially deterministic operations research, is perhaps encapsulated in the left-hand part of the logo (associations of optimization, efficiency?). Years later the design was scanned and converted to a PostScript file. For a number of years I gave students the exercise: fit a family of smooth curves to the logo, and if possible come up with a statistical (or mathematical) story of the image. No one succeeded. Then suddenly I had a vision that the logo was a three-dimensional object viewed in perspective: on the left-hand side it shows nine parallel race tracks receding into the distance, and on the right-hand side we see the race tracks close by, almost from above, as they go over a hill.<br />
<br />
Now it was just a question of drawing the curves in three dimensions, in R, and viewing them from a well chosen distance and direction.<br />
<br />
The hill on the right hand side is based on a (mirrored) gamma density with shape parameter 7 (my favourite number).<br />
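<br />
As a rough illustration of that construction (not the R script actually used for the logo), here is a minimal Python sketch of nine parallel tracks running over a hill shaped like a mirrored gamma density with shape parameter 7. The unit scale, the plotting range, the height factor and the names <code>gamma_density</code>, <code>hill</code> and <code>track</code> are all assumptions made for the example.<br />

```python
import math

SHAPE = 7  # shape parameter mentioned in the text; unit scale is an assumption


def gamma_density(x, shape=SHAPE):
    """Gamma(shape, scale=1) density, zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)


def hill(x, shape=SHAPE):
    """Mirrored gamma density: the hill sits on the negative half-line."""
    return gamma_density(-x, shape)


def track(k, n_points=50, x_min=-20.0, x_max=0.0, height=10.0):
    """3-D polyline (x, y, z) for track number k (hypothetical layout).

    The track runs along the x-axis at constant y = k and its
    elevation z follows the hill profile, scaled by `height`.
    """
    step = (x_max - x_min) / (n_points - 1)
    xs = [x_min + i * step for i in range(n_points)]
    return [(x, float(k), height * hill(x)) for x in xs]


# Nine parallel tracks, as in the logo.
tracks = [track(k) for k in range(9)]
```

The resulting lists of (x, y, z) points would still have to be drawn with a three-dimensional plotting tool and viewed from a suitable distance and direction.<br />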
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_ZVO0qN7NRux6Q_4tlRGTZO9y1gzy8PIMxFL23vI2QL715Z2I11hg1YfTsbVIDTfSD4dQhW9mL0927z_52R4WG6e75x6ecUo8tlvoLKLDTIukheKeThvvLEOM-MxSqbWIxXAvkll8/s1600/3dlogoVVS-SMS.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh_ZVO0qN7NRux6Q_4tlRGTZO9y1gzy8PIMxFL23vI2QL715Z2I11hg1YfTsbVIDTfSD4dQhW9mL0927z_52R4WG6e75x6ecUo8tlvoLKLDTIukheKeThvvLEOM-MxSqbWIxXAvkll8/s400/3dlogoVVS-SMS.jpg" width="400" /></a></div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The R code for the “waves” part of this picture can be found <a href="http://www.math.leidenuniv.nl/%7Egill/3dlogo.txt">here</a>. I use the “rgl” package to create and view a three-dimensional plot. The image is then saved in SVG (Scalable Vector Graphics) format.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
In three dimensions the notion of a “filled closed path” doesn't make sense. Surfaces are represented in rgl by wire frame or similar piecewise linear objects. I therefore used rgl only to draw the boundaries of the nine strips, as nine closed polygonal paths.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Unfortunately, in the transition via rgl from R to SVG, what originally were 9 closed polygonal paths (each with about 500 vertices) are broken up into a large number of smaller open polygonal paths, collected together in one graphical object. In a graphical editor (Adobe Illustrator or Inkscape) I first break up the object into its constituents, then I have the constituents joined into one path. Finally I convert the closed path into a filled closed path (most easily done by a one-word replacement in the SVG source text file).</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
In order for this to work, the many polygonal line segments must be joinable into a single closed path without any new connections, since otherwise extra lines are added, with surprising and pretty but unintended results. I satisfied this criterion by adding a strip perpendicular to the nine strips in the image, connecting the nine strips together. That part of the image lies outside the clipped area, to the right.</div>
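<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The joining condition can be sketched in a few lines of Python. This is only an illustration of the idea, not what Illustrator or Inkscape do internally: the function name <code>join_fragments</code> is hypothetical, fragments are assumed to be lists of (x, y) tuples, and endpoints are assumed to match exactly.</div>

```python
def join_fragments(fragments):
    """Chain open polylines end-to-end into one closed path.

    Each fragment is a list of (x, y) tuples and may be stored in
    either direction. Raises ValueError if continuing the path would
    require a new line segment, or if the result is not closed.
    """
    remaining = [list(f) for f in fragments]
    path = remaining.pop(0)
    while remaining:
        end = path[-1]
        for i, frag in enumerate(remaining):
            if frag[0] == end:
                # Fragment starts where the path ends: append it as is.
                path.extend(frag[1:])
            elif frag[-1] == end:
                # Fragment is stored backwards: append it reversed.
                path.extend(reversed(frag[:-1]))
            else:
                continue
            del remaining[i]
            break
        else:
            raise ValueError("no fragment continues the path; "
                             "a new line segment would be needed")
    if path[0] != path[-1]:
        raise ValueError("fragments join up but do not form a closed path")
    return path


# A unit square broken into three fragments, one of them reversed.
square = join_fragments([
    [(0, 0), (1, 0)],
    [(1, 1), (1, 0)],
    [(1, 1), (0, 1), (0, 0)],
])
```

<div class="separator" style="clear: both; text-align: left;">
A fragment that cannot be attached without drawing a new line raises an error here, which corresponds to the case where a graphical editor would silently add an unintended connecting line.</div>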
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The final image is composed in Keynote, Apple's presentation editor. This preserves scalable images as scalable images, including characters from fonts. So one can finally export a PDF file consisting entirely of scalable components ... except that at very high resolution one will see that the curves of the nine race tracks are actually polygonal lines. This needs to be fixed by replacing the polygonal lines by spline curves, which I believe can easily be done in Illustrator or Inkscape, or alternatively in the SVG source. The main problem will be to keep the sharp corners at the ends of the strips.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The letters VvS+OR are (mostly) typeset in URW++ <b>Bauhaus 93</b>. Under the name <b>Blippo Black</b>, it was designed by Joe Taylor in 1969, inspired by Herbert Bayer’s 1925 experimental “universal typeface”. Bayer was director of printing and advertising for Walter Gropius’ Bauhaus, and in his minimalistic font lowercase and uppercase letters were scaled versions of one another. The font reminds me of publications from the early days of the VVS (late forties, Dutch graphic design: modernistic and minimalistic).</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
However, the ‘S’ and the ‘R’ come from another font: Neufville Digital <b>Futura</b>. I found the ‘S’ and the ‘R’ of Bauhaus 93 both a little too outspoken, while Futura is a more bland typeface, lending itself well to combination with more outspoken characters.</div>
<br />
<b>Futura</b> again goes back to the Bauhaus movement, being designed in 1927 by Paul Renner. From Wikipedia: <i>Futura has an appearance of efficiency and forwardness. The typeface is
derived from simple geometric forms (near-perfect circles, triangles and
squares)</i>. <br />
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<a href="http://cg.scs.carleton.ca/%7Eluc/fonts.html">Luc Devroye of McGill University</a> gave me a lot of good advice on this part of the logo project (though I did not follow all of it!).</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The script letters SMS are typeset in <b>TeX Gyre Chorus</b>: an open source version of ITC <b>Zapf Chancery</b>, which was designed by Hermann Zapf in 1979, inspired by Italian renaissance papal chancery writing, and is included as a system font in Apple's Mac OS.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Both typefaces are built of a bare minimum of simple strips or brush strokes, resembling the waves in the image, yet each has a very distinctive character.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
I find the combination of two contrasting typefaces, <b>Bauhaus</b> (<b>Futura</b>) and <b>Chancery</b>, each with historical and cultural connotations, together with the dynamic fluidity of the waves of the logo, rather pleasing. But that’s a matter of taste.</div>
<br />
StochNed goes OnLine<br />
Posted by stochned, 2011-03-30<br />
<br />
The sexy Mathematical Statistics section of the Dutch Statistical Society is now online with a Twitter account (stochned) and this blog. More to come.