On 21/11/10 11:20, Philip Whateley wrote:
It's sad that you have to key from hard copy produced originally from electronic data; I really thought the days of re-keying data were gone, which shows how much I know. Could you try OCR scanning to at least get the data back into some form of electronic data as quickly as possible (although OCR is not foolproof)?

On Sun, 2010-11-21 at 08:07 +0000, tom wrote:

On 20/11/10 15:46, Philip Whateley wrote:

On Sat, 2010-11-20 at 11:15 +0000, tom wrote:

On 19/11/10 14:22, Philip Whateley wrote:

Anyone know of a good editor for delimited text data files (csv, tab delimited etc.)? I need it for directly creating data files for R. I can export from a spreadsheet, but that is cumbersome. I am aware that CSVed - which would be exactly what I am looking for - will run under Wine, and I think there is a mode for emacs (although only beta), but I'm looking for something native to Linux (unlike CSVed) and easy to learn (unlike emacs). I have looked at google-refine, but that won't create files, although it looks very good for correcting data errors. I am happy with either gui or command line.

Many thanks,
Phil

Can I inquire as to how the data is generated?

Tom te tom te tom

The data is either marketing data from small surveys, or industrial process control data. In either case the data is collected manually and entered from paper reports. There is no option to collect the data electronically at source. The marketing data is a mixture of numeric, categorical and ordinal categorical data. The process control data is mainly numeric. I also analyse data from designed experiments, but that is usually small enough and static enough to enter directly into a data frame in R. Categorical data I usually analyse using a mixture of R and Mondrian.

Phil

I'll rephrase that: where does it come from? Can it not be processed directly into a format suitable for R by scripts etc.? If it comes from process control data then it may be just a case of parsing the file carefully, though any post-19th-century control data should be made available in PC-readable form anyway. 99.99% of data need never go near any 'office' style program.

Tom te tom te tom

Ok. The process control data mainly comes from shop floor control charts; problem-solving tools which are maintained by operators on paper. There is also some process control data which is maintained electronically by the company, but as a consultant I am not allowed to have access to the company network, so I can only receive this data as an Excel printout, or occasionally .xlsx on a CD. In any case the company-wide electronic process control data is usually useless for problem solving because of the time lag between the change and its identification. The survey-type data (from a different organisation) is also received on paper and entered manually.

In order to get the data into R/Mondrian/GGobi etc. I need to enter it manually. This could be scripted, I guess, but the script would be complex as I need to enter some data by row (for example data points at particular factor levels) and some data by columns (for example blocks of factors). The main problems for me are a) needing to edit data afterwards (adding new columns, etc.) and b) ensuring that the data entered by row lines up with the correct factors entered earlier by column. Certainly using an office package would be easier and more efficient than using a script, as far as I can see, although a dedicated csv / tab-delimited data editor would be even better. Many thanks for all suggestions, though.

Phil
Is the data cumulative or does it change radically? If it's cumulative I would go with storing it in a DB of some sort and mining the cumulative differences. Also, for cumulative data you could run a diff over the previous and current "batch" of data; this would highlight the changes so that you could concentrate on them.
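For instance, something along these lines would flag what has changed between two batches. It is only a rough Python sketch (Perl would do the same job), and the file names and the assumption that each line is a complete record are made up purely for illustration:

#!/usr/bin/env python3
# Rough sketch: compare a previous and a current batch of CSV data and
# report the rows that were added or removed between them.
import csv
import sys

def load_rows(path):
    # Read a CSV file and return its rows as a set of tuples.
    with open(path, newline="") as f:
        return set(tuple(row) for row in csv.reader(f))

if __name__ == "__main__":
    previous = load_rows(sys.argv[1])   # e.g. last month's batch
    current = load_rows(sys.argv[2])    # e.g. this month's batch

    for row in sorted(current - previous):
        print("ADDED:  ", ", ".join(row))
    for row in sorted(previous - current):
        print("REMOVED:", ", ".join(row))

Run it over the old and new files and it prints only the added and removed rows, which are the ones worth a close look.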
If you can build in the "intelligence", scripting would be your best option. Script once, use many times. From what you are describing, Perl would probably eat it, but Perl is not the easiest tool to get on with (and that is coming from someone who has used it on a less-than-frequent basis for the last 15 years).
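Just to show the sort of thing I mean, here is a rough sketch (in Python rather than Perl, and with every file name and column in it invented, so treat it purely as an illustration). The factor blocks are keyed in once by column, the measurements are keyed in by row against a run identifier, and the script joins them up, complains if anything fails to line up, and writes a tab-delimited file that R can read straight in with read.delim():

#!/usr/bin/env python3
# Rough sketch: join factor levels entered by column with measurements
# entered by row, checking that every run lines up, and write a
# tab-delimited file for R. All file and column names are invented.
import csv

# Factor block entered "by column": one line per experimental run,
# e.g. columns: run, temp, speed
with open("factors.csv", newline="") as f:
    factors = list(csv.DictReader(f))

# Measurements entered "by row": run identifier followed by the readings.
with open("measurements.csv", newline="") as f:
    readings = {row[0]: row[1:] for row in csv.reader(f)}

with open("for_r.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["run", "temp", "speed", "y1", "y2"])
    for rec in factors:
        run = rec["run"]
        if run not in readings:
            raise SystemExit("No measurements for run %s - check alignment" % run)
        writer.writerow([run, rec["temp"], rec["speed"]] + readings[run])

The point of keying everything against a run identifier is that the alignment worry goes away, and adding a new column later is just another field in the factor file rather than an editing job.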
Of course it all sounds easy to a strictly hands-off bystander, but if the processes you are talking about have a long time span (i.e. something you will be doing for the rest of your working life) then I would invest in some scripts.
Having said all that, it would be really nice to see a good csv / xml editor in open source. Ah, if only my programming skills were 100% better than they are now :-(
Tom.

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/listfaq