Useful links:
CleverCSV on GitHub
CleverCSV on PyPI
Demo of CleverCSV on Binder (interactive!)
Paper (PDF)
Paper (HTML)
Reproducible Research Repo
Blog post on messy CSV files (NEW!)

Introduction

CSV files are awesome! They are lightweight, easy to share, human-readable, version-controllable, and supported by many systems and tools!
CleverCSV is a Python package that aims to solve some of the pain points of CSV files, while maintaining many of the good things. The package automatically detects (with high accuracy) the format (dialect) of CSV files, thus making it easier to simply point to a CSV file and load it, without the need for human inspection. In the future, we hope to solve some of the other issues of CSV files too.
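A minimal sketch of what "point to a CSV file and load it" might look like. It assumes CleverCSV exposes a `Sniffer` class and a `reader` function that mirror the standard library's `csv` module. The sample data, the `load_rows` helper, and the fallback to `csv` when the package is not installed are illustrative choices, not something stated above.

```python
import io

try:
    import clevercsv as csv_impl  # assumed to mirror the stdlib csv interface
except ImportError:
    import csv as csv_impl  # stdlib fallback so the sketch still runs

# Hypothetical sample: a semicolon-delimited file held in memory.
SAMPLE = "name;age;city\nAlice;30;Paris\nBob;25;Rome\n"


def load_rows(text):
    """Detect the dialect of `text`, then parse it into a list of rows."""
    dialect = csv_impl.Sniffer().sniff(text)
    rows = list(csv_impl.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows


delimiter, rows = load_rows(SAMPLE)
print("detected delimiter:", repr(delimiter))
for row in rows:
    print(row)
```

No delimiter is passed by hand here: the detected dialect is handed straight to the reader, which is the workflow the paragraph above describes.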
CleverCSV is based on science. We investigated thousands of real-world CSV files to find a robust way to automatically detect the dialect of a file. This may seem like an easy problem, but to a computer a CSV file is simply a long string, and every dialect will give you some table. In CleverCSV we use a technique based on the patterns of the parsed file and the data type of the parsed cells. With our method we achieve high accuracy for dialect detection, with an improvement on non-standard (messy) CSV files.
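To illustrate why detection is not trivial, the sketch below parses one hypothetical sample with three candidate delimiters using the standard library's `csv` module. Each candidate produces some table. The `toy_score` function is an assumption made purely for illustration and is not CleverCSV's actual measure: it only rewards tables whose rows have a consistent, multi-column shape, and ignores cell data types entirely.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: semicolon-delimited, with commas inside some cells.
SAMPLE = "id;price;note\n1;2,50;ok\n2;3,75;late, damaged\n"


def parse_with(text, delimiter):
    """Parse `text` with the given delimiter; for this sample every choice yields a table."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def toy_score(rows):
    """Toy heuristic (not CleverCSV's measure): reward consistent, multi-column rows."""
    if not rows:
        return 0.0
    width, count = Counter(len(row) for row in rows).most_common(1)[0]
    share = count / len(rows)  # fraction of rows that have the most common width
    return share * (1 - 1 / max(width, 1))  # a single-column table scores zero


candidates = [";", ",", "|"]
tables = {d: parse_with(SAMPLE, d) for d in candidates}
for d in candidates:
    print(repr(d), tables[d], round(toy_score(tables[d]), 3))

best = max(candidates, key=lambda d: toy_score(tables[d]))
print("best guess:", repr(best))
```

The comma parse is a valid table too, just a ragged one, which is why some measure of row patterns is needed to rank the candidates.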
We think this kind of work can be very valuable for working data scientists and programmers, and we hope that you find CleverCSV useful (if there's a problem, please open an issue!). Since the academic world counts citations, please cite the paper if you use the package:

title = {Wrangling Messy {CSV} Files by Detecting Row and Type Patterns},
author = {{van den Burg}, G. J. J. and Naz{\'a}bal, A. and Sutton, C.},
journal = {Data Mining and Knowledge Discovery},
year = {…},
volume = {…},
number = {6},
pages = {…},
issn = {…},
doi = {…},

And of course, if you like the package, please spread the word!
You can do this by Tweeting about it (#CleverCSV) or clicking the ⭐️ on GitHub!