{"id":3755,"date":"2020-07-07T08:26:41","date_gmt":"2020-07-07T00:26:41","guid":{"rendered":"https:\/\/www.merieuxnutrisciences.com.cn\/?p=3755"},"modified":"2022-03-12T13:23:02","modified_gmt":"2022-03-12T05:23:02","slug":"statistics-in-lines-of-code-so-what-3","status":"publish","type":"post","link":"https:\/\/www.merieuxnutrisciences.com.cn\/?p=3755&lang=en","title":{"rendered":"Statistics in lines of code \u2026 so what?"},"content":{"rendered":"<hr \/>\n<p><span style=\"font-size: 14px;\"><img loading=\"lazy\" class=\" size-full wp-image-2567\" style=\"height: 100%; width: 100%;\" src=\"https:\/\/www.merieuxnutrisciences.com.cn\/wp-content\/uploads\/2020\/07\/thumbnails_image_statistics-in-lines-of-code-1.jpg\" alt=\"\" width=\"1122\" height=\"472\" \/><\/span><\/p>\n<p><strong><span style=\"font-size: 14px;\">Conducting a clinical study requires several steps and tools.<\/span><\/strong><\/p>\n<p><span style=\"font-size: 14px;\">Depending on the scientific &amp; biological question to be addressed, the sponsor has to design the study, define the number of volunteers, recruit them, monitor the study and collect the data. Finally, once the data are available, the sponsor enters the mysterious world of statistics, which has its own \u201cstatistical programming\u201d languages.<\/span><\/p>\n<p><span style=\"font-size: 16px;\"><span style=\"color: #d28f1b;\"><strong>What is a statistical programming language, and why do we need it?<\/strong><\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\">Statistical software is needed to handle, check and analyze data. Existing software falls into two categories:<\/span><\/p>\n<ul>\n<li><span style=\"font-size: 14px;\">\u201cready-to-use\u201d systems<\/span><\/li>\n<li><span style=\"font-size: 14px;\">systems based on lines of code<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 14px;\">Ready-to-use software is easier to get started with and is an attractive option for medical professionals. 
However, statistics professionals generally prefer the command-language option, for several reasons. Saving the lines of code that generate the statistical outputs makes the results traceable, and it also makes it easy to change an option by editing a single element of the program. In addition, even if writing code takes time, many analyses can then be run much faster. Finally, ready-to-use software offers less flexibility in terms of data manipulation and analysis options.<\/span><\/p>\n<p><span style=\"color: #d28f1b;\"><span style=\"font-size: 16px;\"><strong>Are there different languages used in statistical programming?<\/strong><\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\">As in many fields of science, the landscape of statistical programming is constantly changing. Ten years ago, SAS<sup>\u00ae<\/sup>\u00a0and R were the most widely used statistical programming languages. SPSS, Stata, Matlab and many other packages existed but had fewer users. More recently, we have observed a rise in R and Python users and a decline in SAS<sup>\u00ae<\/sup>\u00a0customers. 
All these programming languages have one thing in common: to non-specialists, they are completely incomprehensible.<\/span><\/p>\n<p><span style=\"font-size: 14px;\"><img loading=\"lazy\" class=\" size-full wp-image-2568\" style=\"height: 100%; width: 100%;\" src=\"https:\/\/www.merieuxnutrisciences.com.cn\/wp-content\/uploads\/2020\/07\/thumbnails_image_statistics-in-lines-of-code-2.jpg\" alt=\"\" width=\"1654\" height=\"591\" \/><\/span><\/p>\n<p class=\"rtecenter\"><span style=\"font-size: 14px;\"><em>SAS\/R\/Python\u2026 different languages of statistical programming<\/em><\/span><\/p>\n<p><span style=\"color: #d28f1b;\"><span style=\"font-size: 16px;\"><strong>What are the current key drivers for choosing a statistical language in clinical studies?<\/strong><\/span><\/span><\/p>\n<p><strong><span style=\"font-size: 14px;\">\u00a0\u2014 FDA guidance \u2014<\/span><\/strong><\/p>\n<p><span style=\"font-size: 14px;\">Let&#8217;s focus on the pharmaceutical industry. SAS<sup>\u00ae<\/sup>\u00a0has long been the clear leader in this area, largely because of US Food and Drug Administration (FDA) guidance. It is well known that most submissions have used SAS<sup>\u00ae<\/sup>, and many people believe SAS<sup>\u00ae<\/sup>\u00a0is the only software that can be used to analyze clinical trial data. However, in 2015, the FDA published a Statistical Software Clarifying Statement, opening the door to the use of R in clinical trials: \u201cFDA does not require use of any specific software for statistical analyses [\u2026]. However, the software package(s) used for statistical analyses should be fully documented in the submission, including version and build identification\u201d. 
This position is reinforced by \u201cA Guidance Document for the Use of R in Regulated Clinical Trial Environments\u201d, published by the R Foundation for Statistical Computing in March 2018.<\/span><\/p>\n<p><strong><span style=\"font-size: 14px;\">\u2014 Distrust of freeware \/ Validation \u2014<\/span><\/strong><\/p>\n<p><span style=\"font-size: 14px;\">So why do so many pharmaceutical and CRO companies still pay for expensive SAS<sup>\u00ae<\/sup>\u00a0licenses when the same work could be done with R? SAS<sup>\u00ae<\/sup>\u00a0is a commercial product whose vendor vets all its code to ensure it returns correct results. R, on the other hand, is an open-source language to which anyone can contribute by writing a package. Are results from R therefore reliable? Those produced by the most widely used R packages certainly are, whereas those produced by more esoteric packages should be treated with caution. In addition, SAS<sup>\u00ae<\/sup>\u00a0makes it easier to keep results traceable (for example, by time-stamping the outputs) and stable (independent of package versions). For all these reasons, SAS<sup>\u00ae<\/sup>\u00a0remains widely used in pharma.<\/span><\/p>\n<p><strong><span style=\"font-size: 14px;\">\u2014 Tradition \/ Habits \/ Change \u2014<\/span><\/strong><\/p>\n<p><span style=\"font-size: 14px;\">When an organization has been using SAS<sup>\u00ae<\/sup>\u00a0for many years and has developed plenty of working macros, translating this code from one programming language to another can be very costly in time and money, potentially more than a year of SAS<sup>\u00ae<\/sup>\u00a0renewal fees. Traditions change slowly. However, because SAS<sup>\u00ae<\/sup>\u00a0charges universities large license fees, fewer and fewer students learn it, and many newcomers entering the workforce today already know R but not SAS<sup>\u00ae<\/sup>. 
Add to this the fact that R is a more intuitive language, and therefore easier to learn than SAS<sup>\u00ae<\/sup>, as well as better suited to modern analytical practices (such as microbiome analysis), and we can expect SAS<sup>\u00ae<\/sup>\u00a0programming skills to become increasingly rare in the next few years. Several pharmaceutical and CRO companies have already launched projects to migrate from SAS<sup>\u00ae<\/sup>\u00a0to open-source software, and many more will follow.<\/span><\/p>\n<p><span style=\"font-size: 14px;\"><img loading=\"lazy\" class=\" size-full wp-image-2569\" style=\"height: 100%; width: 100%;\" src=\"https:\/\/www.merieuxnutrisciences.com.cn\/wp-content\/uploads\/2020\/07\/thumbnails_image_statistics-in-lines-of-code-3.jpg\" alt=\"\" width=\"1890\" height=\"777\" \/><\/span><\/p>\n<p class=\"rtecenter\"><span style=\"font-size: 14px;\"><em>Declining interest in SAS versus rising interest in R<\/em><\/span><\/p>\n<p><span style=\"font-size: 16px;\"><span style=\"color: #d28f1b;\"><strong>What is the future of statistical programming languages?<\/strong><\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\">Since the early 2010s, machine learning has become increasingly popular thanks to greater computing power and larger amounts of available data. Python and R are the two most commonly used languages for this technology today. Both are open source and completely free to use. Python was first released in 1991 and, unlike R, is not designed for statistics only. It has historically been used by software engineers and is recognized as well suited to mathematical computation, with an elegant syntax. 
However, Python offers fewer statistics-specific libraries than R, and producing reporting tables and data visualizations is more convoluted, so Python and R are both suited to data science, each with its own advantages.<\/span><\/p>\n<p><span style=\"font-size: 14px;\"><img loading=\"lazy\" class=\" size-full wp-image-2570\" style=\"height: 100%; width: 100%;\" src=\"https:\/\/www.merieuxnutrisciences.com.cn\/wp-content\/uploads\/2020\/07\/thumbnails_image_statistics-in-lines-of-code-4_0-scaled.jpg\" alt=\"\" width=\"2560\" height=\"1185\" \/><\/span><\/p>\n<p class=\"rtecenter\"><span style=\"font-size: 14px;\"><em>Python for the future, with machine learning?<\/em><\/span><\/p>\n<p><span style=\"font-size: 14px;\">Finally, a more recent programming language is attracting more and more data scientists: Julia. Its use is not yet commonplace, but many data scientists are drawn to the speed, efficient memory management and strong parallelism that Julia offers.<\/span><\/p>\n<p><span style=\"color: #d28f1b;\"><span style=\"font-size: 16px;\"><strong>SAS, R, Python, Julia \u2026<\/strong><\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\">In conclusion, several statistical programming languages currently coexist, and their aim is always the same: to analyze data sets and extract knowledge from them. 
The specialists in statistical programming always have to monitor the constantly evolving landscape of these languages: the language used today for your project is not necessarily the one that will be used tomorrow.<\/span><\/p>\n<p class=\"rtecenter\"><em><span style=\"color: #d28f1b;\"><strong><span style=\"font-size: 14px;\">\u2013 Benoit Douillard, Statistical Programmer &amp; Biostatistician, Biofortis M\u00e9rieux NutriSciences \u2013<\/span><\/strong><\/span><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Conducting a clinical study requires several steps and  [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[104],"tags":[],"modified_by":"amyadmin","_links":{"self":[{"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/posts\/3755"}],"collection":[{"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3755"}],"version-history":[{"count":1,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/posts\/3755\/revisions"}],"predecessor-version":[{"id":3756,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=\/wp\/v2\/posts\/3755\/revisions\/3756"}],"wp:attachment":[{"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3755"},{"taxonomy":"post_tag","embed
dable":true,"href":"https:\/\/www.merieuxnutrisciences.com.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}