Consider Simplicity, Verily.

If you’re a perfectionist, it’s really hard to limit the efforts you put into developing code. A part of you wants to write the perfect code, while another reminds you that you haven’t time for that, and you will have to settle for good enough code. Today’s entry is exactly this: an ambitious design that was reduced to merely OK code.

I needed an exporter (but no importer) to CSV format for C++. One of the first things that came to mind was a variant-like hierarchy that can store arbitrary values, each specific class having its own to_string function, with some engine on top that can scan a data structure and spew it to disk as CSV. That’s very general, but ridiculously complicated.
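For contrast, here is a rough sketch of what that variant-like hierarchy might have looked like. Every name in it (value, int_value, text_value, row, export_row) is hypothetical and made up for illustration; it is not code from the actual project, and it leaves CSV quoting out entirely.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout: a sketch of the rejected design,
// not code from the actual project.
struct value
{
  virtual ~value() = default;
  virtual std::string to_string() const = 0;
};

struct int_value : value
{
  int v;
  explicit int_value(int v_) : v(v_) {}
  std::string to_string() const override { return std::to_string(v); }
};

struct text_value : value
{
  std::string v;
  explicit text_value(std::string v_) : v(std::move(v_)) {}
  // CSV quoting is deliberately omitted in this sketch
  std::string to_string() const override { return v; }
};

using row = std::vector<std::unique_ptr<value>>;

// The "engine": walks one row and joins the fields with commas.
inline std::string export_row(const row & r)
{
  std::string line;
  for (std::size_t i = 0; i < r.size(); ++i)
  {
    if (i != 0) line += ',';
    line += r[i]->to_string();
  }
  return line + '\n';
}
```

Even this stripped-down version needs one class per value type plus heap allocation for every field, before any quoting or file handling is written.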

The simplest approach is just to have a stream (either a specialization of, or having-a, std::ostream) that overloads the different operators <<, inserts commas between elements and knows about the end of line.

The hard part is figuring out how to bind to the stream a state variable that resets itself at each end of line (so that commas are inserted between fields, but not before the first one). Simply put, we overload the endl function to reset the comma insertion state variable. The code looks like this:

#ifndef __MODULE_CSV_OSTREAM__
#define __MODULE_CSV_OSTREAM__

#include <iostream>
#include <string>

namespace csv 
 {

class csv_ostream //: public std::ostream
{
 private:

     std::ostream & out;
     bool first;

     // this function prints a comma unless
     // it's the first field since the last std::endl
     void comma()
      {
       if (!first) out << ',';
       first=false;
      }

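     // this function wraps strings according
     // to CSV escape mechanisms: embedded quotes
     // are doubled, and the field is enclosed in
     // quotes if it contains a comma, a quote,
     // or a line break.
     std::string csv_quote(const std::string & to_quote)
      {
       bool needs_quotes=false;
       std::string quoted;
       for (char c : to_quote)
        {
         needs_quotes |= (c==',' || c=='\"' || c=='\n' || c=='\r');
         if (c=='\"')
          quoted.append("\"\""); // ""
         else
          quoted.push_back(c);
        }

       return
        needs_quotes ? "\""+quoted+"\"" : quoted;
      }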

 public:

     ////////////////////////////
     // Ends the CSV line (and syncs
     // streams)
     void flush()
      {
       out.flush();
       first=true;
      }

    /////////////////////////////////////
    // for everything that is not a string,
    // we can use the default operator<<
    template <typename T>
    inline csv_ostream & operator<<( const T & t )
     {
      comma();
      //(*(std::ostream*)this) << t;
      out << t;
      return *this;
     }

    /////////////////////////////////////
    inline csv_ostream & operator<<(const std::string & s)
     {
      comma(); 
      out << csv_quote(s);
      return *this;
     }

    /////////////////////////////////////
    inline csv_ostream & operator<<(const char * s)
     {
      comma(); 
      out << csv_quote(s);
      return *this;
     }

    /////////////////////////////////////
    // this one is needed to get std::endl
    // to work properly (without making it
    // friend)
    void put(char x) { out << x; }


 // the ctor binds the CSV stream
 // over some kind of ostream (so
 // it also works for files and pipes)
 csv_ostream (std::ostream & csv_stream )
   : out(csv_stream),first(true)
  {}

 virtual ~csv_ostream() { flush(); }


 // these are stub definitions for
 // iomanip-like operators (such as
 // std::endl)
 typedef csv_ostream & (*csv_ostream_manip)(csv_ostream &);
 csv_ostream & operator<<(csv_ostream_manip manip) { return manip(*this); }
};

} // namespace csv

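////////////////////////////////////////
//
// Overloads endl for csv_ostream
// (note: the C++ standard formally does
// not allow adding overloads to namespace
// std, so this relies on the compiler
// tolerating it)
//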
namespace std
 {
  inline csv::csv_ostream & endl(csv::csv_ostream & out)
  {
   out.put('\n');
   out.flush();
   return out;
  }
} // namespace std

#endif
  // __MODULE_CSV_OSTREAM__
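One detail of the listing worth checking is which operator<< a given argument actually selects, since the generic template sits right next to the two string overloads. The probe below is a hypothetical stand-in (not part of csv_ostream): each overload only reports its own name, so the choice made by overload resolution becomes visible.

```cpp
#include <string>

// Hypothetical probe: same overload set as csv_ostream, but each
// overload just reports which one was selected.
struct probe
{
  template <typename T>
  std::string operator<<(const T &) const { return "template"; }

  std::string operator<<(const std::string &) const { return "std::string"; }

  std::string operator<<(const char *) const { return "const char *"; }
};

// The same probe, minus the const char * overload.
struct probe_without_cstr
{
  template <typename T>
  std::string operator<<(const T &) const { return "template"; }

  std::string operator<<(const std::string &) const { return "std::string"; }
};

// A string literal picks the const char * overload...
inline std::string pick_literal() { return probe() << "a,b"; }

// ...a std::string picks the std::string overload...
inline std::string pick_string() { return probe() << std::string("a,b"); }

// ...and anything else falls through to the template.
inline std::string pick_number() { return probe() << 42; }

// Without the const char * overload, the literal goes to the
// template rather than being converted to std::string.
inline std::string pick_literal_no_cstr() { return probe_without_cstr() << "a,b"; }
```

The last case is the one that matters: in csv_ostream, a literal caught by the template would be written as-is, without any quoting.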

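There are two explicit overloads, for std::string and for const char *, because a string literal such as "a string" is first an array of const char that decays to const char *; it becomes a std::string only if an implicit conversion through a converting constructor applies in its context (and it might not). Without the const char * overload, literals would be caught by the generic template and written without quoting. You create an instance of the csv_ostream, binding it on an already opened stream: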

csv::csv_ostream thingie(std::cout);

then you use it as a normal stream: operator (and endl) overloading does the rest for you, inserting commas, escaping things, and picking up the types for which you’ve provided an extra operator<< (which you’ll need to write anyway to accommodate a new data type).
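To see what this looks like in practice, here is a small usage sketch. So that it compiles on its own, it uses a condensed stand-in for the class in a hypothetical namespace sketch, a made-up point type, and a sketch::endl manipulator kept in its own namespace instead of an overload of std::endl; all three are assumptions for the example, not part of the code above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace sketch
{
// Condensed stand-in for csv_ostream (assumption: same behavior as
// the full class for the cases used below).
class csv_ostream
{
  std::ostream & out;
  bool first = true;

  void comma() { if (!first) out << ','; first = false; }

  // Doubles embedded quotes; encloses the field in quotes if it
  // holds a comma, a quote, or a line break.
  static std::string quote(const std::string & s)
  {
    bool needs_quotes = false;
    std::string q;
    for (char c : s)
    {
      needs_quotes |= (c == ',' || c == '"' || c == '\n' || c == '\r');
      if (c == '"') q += "\"\""; else q += c;
    }
    return needs_quotes ? '"' + q + '"' : q;
  }

 public:
  explicit csv_ostream(std::ostream & o) : out(o) {}

  template <typename T>
  csv_ostream & operator<<(const T & t) { comma(); out << t; return *this; }

  csv_ostream & operator<<(const std::string & s)
  { comma(); out << quote(s); return *this; }

  csv_ostream & operator<<(const char * s) { return *this << std::string(s); }

  typedef csv_ostream & (*manip)(csv_ostream &);
  csv_ostream & operator<<(manip m) { return m(*this); }

  void end_line() { out << '\n'; out.flush(); first = true; }
};

// Hypothetical manipulator living in our own namespace.
inline csv_ostream & endl(csv_ostream & s) { s.end_line(); return s; }

// Hypothetical user type: an ordinary std::ostream operator<< is
// enough for the template overload to pick it up.
struct point { int x, y; };

inline std::ostream & operator<<(std::ostream & o, const point & p)
{ return o << p.x << ';' << p.y; }

// Writes a header and one record into a string buffer.
inline std::string demo()
{
  std::ostringstream buffer;
  csv_ostream csv(buffer);
  csv << "name" << "score" << "where" << endl;
  csv << "Doe, Jane" << 12.5 << point{3, 4} << endl;
  return buffer.str();
}
} // namespace sketch
```

Calling sketch::demo() yields two lines, name,score,where and "Doe, Jane",12.5,3;4: only the field containing a comma gets quoted. Note that a user type goes through the template unquoted, so its own operator<< should not emit commas.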

*
* *

Yes, you remembered: a while ago, I did something quite similar, if maybe a little more complicated. There also, the idea was to provide a familiar abstraction with slightly modified semantics: you shove things in a file, they get structured and time-stamped. Here, they get CSVified: commas are added, strings are quoted according to the (unofficial) CSV standard, and the output can be imported correctly in, say, Excel (or Gnumeric). But to use it, if you already know about C++ streams, you don’t really have to learn anything more than how to call the constructor.

*
* *

One could argue that the first idea (having a full variant hierarchy and export engine) is a better solution because it provides potentially more generality. That could be right if I needed it, but I didn’t. I merely needed to export to CSV, to create a CSV file that can be imported by a database engine with no fuss. All the better if the solution takes the form of a familiar metaphor, disguises itself as something we’re all already familiar with, a C++ std::ostream.

Sometimes, solving a problem in a good enough way is better than solving it perfectly.
