If you have ever used a spreadsheet program, you are well acquainted with the frustration of entering one thing and having it auto-formatted into something else. In a commercial setting, formatting errors that go unnoticed can have serious repercussions for company finances, and in sciences such as genetics the consequences can be far more damaging.
A recent study published in the journal Genome Biology found that 19.6 percent, roughly 1 in 5, of the genetics papers it screened that came with supplemental spreadsheets contained such errors. The main cause of the problem is the way gene names are written. For example, the gene called “Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase” is written as MARCH1 in shorthand. Under Excel’s default settings, that symbol is automatically converted into 1-Mar or some other calendar date format.
Similarly, when scientists enter genetic ID numbers, these can be converted into floating-point numbers: the identifier 2310009E13, for example, became 2.31E+13. According to Popular Mechanics, scientists cannot simply reformat their existing Excel files to undo the damage. Instead, they have to format a blank document first and then re-enter their data cell by cell. Fortunately, these errors do not affect a paper’s original findings, but they may pose a serious problem for scientists who want to replicate the study.
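To make the two failure modes above concrete, here is a minimal Python sketch that flags values likely to be misread by a spreadsheet. The month list, the regular expressions and the function name `conversion_risk` are assumptions made for illustration; they approximate the kind of pattern that triggers a conversion and are not Excel's actual parsing rules.

```python
import re

# Assumption: month-like prefixes a spreadsheet may interpret as a date.
# Illustrative only; this is not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. MARCH1 or SEPT1.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, the letter E, then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(value):
    """Return "date", "number", or None for a gene symbol or identifier."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date"
    if SCI_LIKE.match(text):
        return "number"
    return None


if __name__ == "__main__":
    for name in ["MARCH1", "SEPT1", "2310009E13", "TP53"]:
        print(name, conversion_risk(name))
```

Running a gene list through a check like this before pasting it into a spreadsheet would show which entries need to be entered as text rather than left to the default formatting.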
Here is some more insight on the issue:
At a recent convention for geneticists, a speaker opened with the snarky remark that if you think autocorrect is a bummer for your WhatsApp texts, you should talk to a geneticist. Automated features and productivity tweaks are supposed to save people time and work, but they are often counterproductive because they insert errors into substantial pieces of work. Sadly, that is the case for an alarming number of academic papers in genetics.
Many academic papers come with supplemental files containing charts, tables and other tabular data. Ideally, such files support the data and other aspects of the research, and they help fellow researchers take the work further. In practice, some automated features in Excel reformat important data such as scientific names, floating-point numbers and dates, causing considerable trouble and confusion for the scientific community. This automatic reformatting happens all the time: scientists identified the issue back in 2004, and it has persisted ever since.
Extensive research on gene name conversion by Mark Ziemann, Yotam Eren and Assam El-Osta revealed that about 20 percent of all papers with supplemental spreadsheets contain such errors. The researchers examined more than 35,000 supplemental Excel files attached to published genetics studies. They used automated software to search for anything that resembled a list of genes, which narrowed the field to 3,597 papers with supplemental files. They then screened for the 10 most common false positive cases and found gene name errors in files attached to 704 of those papers. That is 19.6 percent of all the research papers they screened.
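The screening idea and the headline percentage can be sketched in a few lines of Python. This is not the software the researchers used; the patterns and the helper names `find_converted_cells` and `error_rate` are hypothetical, chosen only to show what scanning a gene column for already-converted values might look like and how 704 out of 3,597 comes to 19.6 percent.

```python
import re

# Assumption: converted gene names show up as day-month text such as "1-Mar",
# and converted identifiers as scientific notation such as "2.31E+13".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
CONVERTED_DATE = re.compile(r"^\d{1,2}-(?:%s)$" % MONTHS, re.IGNORECASE)
CONVERTED_NUMBER = re.compile(r"^\d(?:\.\d+)?E\+\d+$", re.IGNORECASE)


def find_converted_cells(cells):
    """Return the cells that look like a date or a number, not a gene name."""
    flagged = []
    for cell in cells:
        text = cell.strip()
        if CONVERTED_DATE.match(text) or CONVERTED_NUMBER.match(text):
            flagged.append(cell)
    return flagged


def error_rate(affected, screened):
    """Share of affected papers as a percentage, rounded to one decimal."""
    return round(100.0 * affected / screened, 1)


if __name__ == "__main__":
    column = ["TP53", "1-Mar", "2.31E+13", "BRCA1", "2-Sep"]
    print(find_converted_cells(column))
    print(error_rate(704, 3597))
```

A real screen would also need to decide which columns hold gene names in the first place and to weed out false positives by hand, which is the part the sketch leaves out.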
Many of us have been the victim of autocorrect changing the meaning of our text messages, sometimes with hilarious results, but in genetic studies this is no laughing matter. These papers are assets for the scientific community and are often used by new generations of researchers to study the subject further. Errors on this scale can slow things down and make it harder for science to advance.
Science has already lost years of progress in the past to human interference, when government or authoritarian censorship put obstacles in the way of free thinking and questioning and ate away at people’s ideas of genius.
To make matters worse, there is no way to turn off this autocorrect feature in Microsoft Excel permanently. Fortunately, the researchers found that Google Sheets does not perform such automatic conversions, and that when content was copied from Google Sheets into other spreadsheet programs, the formatting of the data was preserved.
So, until the prominent spreadsheet software makers figure out a way to let people switch off such autocorrect functions, it will probably fall to some young, unfortunate research assistant to double-check this massive amount of data and correct the lists of gene names.
To learn more about common applications of MS Excel and some nifty tricks for spreadsheet software, take up an advanced Excel course in Gurgaon from DexLab Analytics, the premier analytics training institute in India.