November 18, 2008

Parsing CSV files that have embedded commas

Use a regular expression to parse each line of a CSV file so that quoted fields can contain embedded commas:

VB:

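While Not sr.EndOfStream
    ' VB escapes a quote inside a string literal by doubling it, not with a backslash
    Dim matches As MatchCollection = Regex.Matches(sr.ReadLine(), "(?:^|,)(""(?:[^""]+|"""")*""|[^,]*)")
    For Each match As Match In matches
      ' Group 1 is the field without the leading comma (quoted fields keep their quotes)
      Dim sItem As String = match.Groups(1).Value
    Next
End While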

C#:

while (!sr.EndOfStream){
  MatchCollection matches = Regex.Matches(sr.ReadLine(), "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)");
  foreach(Match match in matches){
    // Group 1 is the field without the leading comma (quoted fields keep their quotes)
    string sItem = match.Groups[1].Value;
  }
}
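
The fields captured above still carry their surrounding quotes and doubled inner quotes. As a rough sketch of one way to tidy them up (the CsvLineSketch class and SplitLine method are names made up for this illustration, not part of the original snippet), the same pattern can be wrapped in a small helper. It is worth testing edge cases, such as a line that starts with an empty field, before relying on it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it reuses the pattern from the post.
public static class CsvLineSketch
{
    private static readonly Regex FieldPattern =
        new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)");

    public static List<string> SplitLine(string line)
    {
        List<string> fields = new List<string>();
        foreach (Match match in FieldPattern.Matches(line))
        {
            // Group 1 excludes the leading comma that group 0 would include.
            string field = match.Groups[1].Value;

            // Strip the surrounding quotes and collapse doubled quotes.
            if (field.Length >= 2 && field.StartsWith("\"") && field.EndsWith("\""))
            {
                field = field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(field);
        }
        return fields;
    }
}
```

Calling CsvLineSketch.SplitLine on each line returned by sr.ReadLine() would then give a list of cleaned field values.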

November 17, 2008

Custom configuration section

Code references for this post are taken from the following demonstration projects:
C#: http://www.4shared.com/file/78081512/11c27b29/Demo_-_Custom_Configuration_-_C.html
VB: http://www.4shared.com/file/78081513/66c54bbf/Demo_-_Custom_Configuration_-_VB.html

After coming back to custom configuration files from a previous application, where I'd haphazardly thrown in a custom configuration in the "standard" format, I decided that in my current application I would spend some time designing my custom configuration exactly as I wanted it, rather than how "it was supposed to be" according to the MSDN documentation, which for those interested can be found at http://msdn.microsoft.com/en-us/library/2tw134k3.aspx.

So I want to condense from my current extremely verbose format:

<MyCustomSection title="My custom configuration section">
    <SectionsCollection>
        <Section name="Section 1">
            <SubSectionCollection>
                <SubSection name="SubSection 1">
                    <ItemCollection>
                        <Item name="Item1" />
                        <Item name="Item2" />
                    </ItemCollection>
                </SubSection>
                <SubSection name="SubSection 2">...
            </SubSectionCollection>
        </Section>
        <Section name="Section 2">...
    </SectionsCollection>
</MyCustomSection>

down to:

<MyCustomSection title="This is my custom section">
    <Section name="Section1">
        <SubSection name="SubSection1">
            <Item name="Item1" />
            <Item name="Item2" />
        </SubSection>
        <SubSection name="SubSection2">
            <Item name="Item1" />
            <Item name="Item2" />
        </SubSection>
    </Section>
    <Section name="Section2">
        <SubSection name="SubSection1">
            <Item name="Item1" />
            <Item name="Item2" />
        </SubSection>
    </Section>
</MyCustomSection>

Okay, so I've defined what my custom section needs to look like... now I need to rewrite my custom section handler.

Well, as with the standard documentation, we must still have an entry in configSections so that the ConfigurationManager can determine which handler should be used to read our configuration section.  This will look somewhat like the following:

<configSections>
    <section name="MyCustomSection" 
      type="HandlerAssembly.HandlerNamespace.HandlerClass, HandlerAssembly" />
    <section name=....

HandlerAssembly = The name of your application assembly
HandlerNamespace = The namespace that contains your custom configuration handler
HandlerClass = The name you gave your custom configuration handler

Be aware that the type name is written slightly differently depending on the language and the kind of application.  In a VB Windows application, you can reference your configuration handler without specifying the assembly name; in an ASP.NET web application, however, you are required to provide it.  For the sake of consistency, it is a good rule of thumb to always specify the assembly name in the configuration file.  This way you never have to remember which one requires it and which one doesn't: both types of application will work if you include it, and only one will work if you don't.  In C#, you specify the reader as just HandlerNamespace.HandlerClass before the comma, and the assembly name after the comma:

<section name="CustomSection" 
            type="ReaderNamespace.MySectionReader, MyTestApplication" />

So cutting the configuration file back to the bare bones to demonstrate - we should have something (ignoring all the default settings that are provided by .NET) that looks like the following:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="CustomSection" 
             type="MyTestApplication.MySectionReader, MyTestApplication" />
  </configSections>  
  <CustomSection name="Custom Section Header">
    <Section name="Section 1">
      <SubSection name="Subsection 1">
        <Item name="Item 1" />
        <Item name="Item 2" />
      </SubSection>
      <SubSection name="Subsection 2">
        <Item name="Item 1" />
        <Item name="Item 2" />
      </SubSection>
    </Section>
    <Section name="Section 2">
      <SubSection name="Subsection 1">
        <Item name="Item 1" />
        <Item name="Item 2" />
      </SubSection>
    </Section>
  </CustomSection>
</configuration>

So, on to writing our custom section handler.  We're going to need a few classes built - one that handles the collection of sections within our custom section, one that handles each section in the collection, one that handles each of the individual subsections and one that handles each item within our subsection.  Normally you see this demonstrated from the top down detailing the whole process, but I personally think it's easier to understand if you work from the bottom up.  Partly because our bottom level item doesn't contain any collections to deal with, so it's the easiest to write.

Imaginatively, I called the bottom level item "Item" - I know, hold the applause.  It's a relatively simple class:

VB:

Public Class Item
    Inherits ConfigurationElement

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            ' Cast explicitly so the class also compiles with Option Strict On
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property
    
End Class

C#:

class Item : ConfigurationElement
{
    [ConfigurationProperty("name")]
    public string Name
    {
        get { return (string)this["name"]; }
    }
}

Glancing at this class, you can see that it inherits from the ConfigurationElement class, which requires you to add a reference to the System.Configuration assembly and the necessary using/Imports statement for the System.Configuration namespace.

The ConfigurationProperty attribute tells the configuration system that this property maps to an attribute in the configuration file, so its value is read from the file rather than stored in the class itself.

So now that we've got our base element set up, we need to create our SubSection structure.   Our SubSection, unlike our Item, inherits from ConfigurationElementCollection.

VB:

Public Class SubSection
    Inherits ConfigurationElementCollection

    Public Sub New()
        AddElementName = "Item"
    End Sub

    Protected Overloads Overrides Function CreateNewElement() As ConfigurationElement
        Return New Item
    End Function

    Protected Overrides Function GetElementKey(ByVal element As ConfigurationElement) As Object
        Return DirectCast(element, Item).Name
    End Function

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property

End Class

C#:

class SubSection : ConfigurationElementCollection
{
    protected override ConfigurationElement CreateNewElement()
    {
        return new Item();
    }

    protected override object GetElementKey(ConfigurationElement element)
    {
        return ((Item)element).Name;
    }

    //Constructor
    public SubSection()
    {
        AddElementName = "Item";
    }

    [ConfigurationProperty("name")]
    public string Name
    {
        get { return (string)this["name"]; }
    }
}

Because we're inheriting from the ConfigurationElementCollection, we don't have to actually implement much of our collection at all.  We do have to define how items are added to our collection and any attributes that our collection may have.  In this demonstration, I've only given my collection a single attribute called name, but you can add as many as you wish giving them each names relevant to your application.

In order to use a custom node name rather than the default "add", we must change the AddElementName property.  This can be done in a number of ways, but depending on which you choose there are limitations that aren't documented.  I prefer to set it in my constructor, as this works consistently across configuration element collections.  Mostly you will find that the documentation handles this with the ConfigurationCollection attribute:

VB:

<ConfigurationCollection(GetType(Item), _
    AddItemName:="Item")> _
Public Class SubSection...

C#:

[ConfigurationCollection(typeof(Item), 
    AddItemName = "Item")]
public class SubSection...

While using this format in the standard documented way will work, if you stray from the original structure and start nesting collections directly within other collections, you will notice very quickly that your application crashes with the following message: Unrecognized element 'Item'.  When you define the node name in the constructor by setting AddElementName, it works in all cases.  It appears that when nesting collections directly within collections, .NET will only recognize the standard "add", "remove" and "clear" elements unless you set the AddElementName, RemoveElementName and ClearElementName properties directly in the constructor of your collection class.

You will also notice that each of our collection class methods returns information or works with information regarding the items held within the class, rather than itself as a collection.   The exception being the property that returns the value held by the name attribute of the collection class.

Onwards and upwards.  On to the Section class.  This class holds the collection of Section nodes that sit directly inside our custom section.  Much like the SubSection class, it sets the AddElementName property, this time to the node name "Section", and it creates a SubSectionCollection for each Section node it reads.

VB:

Public Class Section
    Inherits ConfigurationElementCollection

    Public Sub New()
        AddElementName = "Section"
    End Sub

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property

    Protected Overloads Overrides Function CreateNewElement() As System.Configuration.ConfigurationElement
        Return New SubSectionCollection
    End Function

    Protected Overrides Function GetElementKey(ByVal element As System.Configuration.ConfigurationElement) As Object
        Return DirectCast(element, SubSectionCollection).Name
    End Function

End Class

C#:

class Section : ConfigurationElementCollection
{
    public Section()
    {
        AddElementName = "Section";
    }

    [ConfigurationProperty("name")]
    public string Name
    {
        get { return (string)this["name"]; }
    }

    protected override ConfigurationElement CreateNewElement()
    {
        return new SubSectionCollection();
    }

    protected override object GetElementKey(ConfigurationElement element)
    {
        return ((SubSectionCollection)element).Name;
    }

}

So we have our Section, SubSection and Item classes, but you may have noticed that our Section class references a SubSectionCollection rather than SubSection directly.  We need to create a simple helper class, which is basically a container for a collection of sub-sections; imaginatively, I'll call it SubSectionCollection. I know, I'm a genius :oP So let's create our container class:

VB:

Public Class SubSectionCollection
    Inherits ConfigurationElementCollection

    Public Sub New()
        AddElementName = "SubSection"
    End Sub

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property

    Protected Overloads Overrides Function CreateNewElement() As System.Configuration.ConfigurationElement
        Return New SubSection
    End Function

    Protected Overrides Function GetElementKey(ByVal element As System.Configuration.ConfigurationElement) As Object
        Return DirectCast(element, SubSection).Name
    End Function

End Class

C#:

class SubSectionCollection : ConfigurationElementCollection
{
    //Constructor
    public SubSectionCollection()
    {
        AddElementName = "SubSection";
    }

    [ConfigurationProperty("name")]
    public string Name
    {
        get { return (string)this["name"]; }
    }

    protected override ConfigurationElement CreateNewElement()
    {
        return new SubSection();
    }

    protected override object GetElementKey(ConfigurationElement element)
    {
        return ((SubSection)element).Name;
    }
}

Now, we can nest as many collections inside each other as we wish without having to add extra property nodes to contain our collections, which simplifies the design of our configuration file...as well as simplifying our class structure.

The last step is to reference the collection held directly by our custom section.   This is referenced exactly the same way as we normally would with one very minor difference.  Where normally we would specify our configuration property by name, we instead pass in an empty string.

Replace:

VB:

Public Class MySectionReader
    Inherits ConfigurationSection

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property

    <ConfigurationProperty("OldNeedlessNodeName", _
      IsDefaultCollection:=True, _
      IsRequired:=True)> _
    Public ReadOnly Property Items() As DefaultCollection
        Get
            Return DirectCast( _
              MyBase.Item("OldNeedlessNodeName"), DefaultCollection)
        End Get
    End Property

End Class

With:

VB:

Public Class MySectionReader
    Inherits ConfigurationSection

    <ConfigurationProperty("name")> _
    Public ReadOnly Property Name() As String
        Get
            Return DirectCast(MyBase.Item("name"), String)
        End Get
    End Property

    <ConfigurationProperty("", _
      IsDefaultCollection:=True, _
      IsRequired:=True)> _
    Public ReadOnly Property Items() As DefaultCollection
        Get
            Return DirectCast(MyBase.Item(""), DefaultCollection)
        End Get
    End Property

End Class
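
Only the VB version of this class is shown above. A C# equivalent might look something like the sketch below. Note one assumption: the DefaultCollection type used in the VB listing is not defined anywhere in this post, so the sketch uses the Section collection class in that role, since its elements are the SubSectionCollection objects that the reading code iterates over.

```csharp
using System.Configuration;

class MySectionReader : ConfigurationSection
{
    [ConfigurationProperty("name")]
    public string Name
    {
        get { return (string)this["name"]; }
    }

    // An empty property name plus IsDefaultCollection means the child nodes
    // sit directly under the section element, with no wrapper node.
    // Assumption: Section stands in for the post's undefined DefaultCollection.
    [ConfigurationProperty("", IsDefaultCollection = true, IsRequired = true)]
    public Section Items
    {
        get { return (Section)this[""]; }
    }
}
```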

So now we've finished building our custom configuration section handler. All we need to do is provide a reference to it in our main application and we can treat it as we would any normal class:

VB:

Dim oSR As MySectionReader = _
   DirectCast(ConfigurationManager.GetSection("CustomSection"), MySectionReader)

Dim Title As String = oSR.Name
' Console is a type, not an object, so it cannot be the target of a With block
Console.WriteLine(Title)
Console.WriteLine(New String("-"c, Title.Length))

' Each Section node is read as a SubSectionCollection
For Each section As SubSectionCollection In oSR.Items
    Dim OuterTitle As String = section.Name
    Console.WriteLine(Space(2) & OuterTitle)
    Console.WriteLine(Space(2) & New String("-"c, OuterTitle.Length))

    For Each SubSection As SubSection In section
        Dim InnerTitle As String = SubSection.Name
        Console.WriteLine(Space(4) & InnerTitle)
        Console.WriteLine(Space(4) & New String("-"c, InnerTitle.Length))

        For Each Item As Item In SubSection
            Console.WriteLine(Space(6) & Item.Name)
        Next
        Console.WriteLine()
    Next
    Console.WriteLine()
Next

C#:

MySectionReader oSR =
  (MySectionReader)ConfigurationManager.GetSection("CustomSection");

string Title = oSR.Name;
Console.WriteLine(Title);
Console.WriteLine(new String('-', Title.Length));

foreach(SubSectionCollection section in oSR.Items)
{
    string OuterTitle = section.Name;
    string startChars = new String(' ', 2);
    Console.WriteLine(startChars + OuterTitle);
    Console.WriteLine(startChars + new String('-', OuterTitle.Length));

    foreach (SubSection subSection in section)
    {
        string InnerTitle = subSection.Name;
        startChars = new String(' ', 4);
        Console.WriteLine(startChars + InnerTitle);
        Console.WriteLine(startChars + new String('-', InnerTitle.Length));

        foreach (Item item in subSection)
        {
            Console.WriteLine(new String(' ', 6) + item.Name);
        }

        Console.WriteLine();
    }
    Console.WriteLine();
}

So there we have it: our custom condensed configuration section, our reader, the method by which to read the custom node names, and how to access the default collection without wrapping it inside a property element.

November 12, 2008

Predicates / List.FindAll()

When I came across the term predicates just after the advent of .NET 2.0, it took a great deal of my time to understand the term given Microsoft’s atrocious “EndsWithSaurus” example (http://msdn.microsoft.com/en-us/library/fh1w7y8z.aspx) which I referenced in my previous post. The basic concept is that given a list of dinosaurs: List<string>/List(Of String) we need to obtain a list of items that end with the string “saurus”. We will ignore the details of how Microsoft did that for the moment…

The 1.1 way…

So to demonstrate the original way of finding all matching items in our list (pretend our list of dinosaurs is called Dinosaurs and is already populated with a bunch of names):

VB:

Dim Sauruses As New List(Of String)
Dim SearchString As String = "saurus"
For Each Item As String In Dinosaurs
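    ' The length check avoids an exception on names shorter than the search string
    If Item.Length >= SearchString.Length AndAlso _
      Item.Substring(Item.Length - SearchString.Length) = SearchString Then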
        Sauruses.Add(Item)
    End If
Next

C#:

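List<string> Sauruses = new List<string>();
string SearchString = "saurus";
foreach(string item in Dinosaurs){
    // The length check avoids an exception on names shorter than the search string
    if(item.Length >= SearchString.Length &&
       item.Substring(item.Length - SearchString.Length) == SearchString){
           Sauruses.Add(item);
       }
}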

As you can see, the 1.1 way simply iterates over the list and pulls out the matching items, which isn’t a particularly huge chore, although it would be nice if it were more elegant.

The 2.0 Way…

In 2.0 the concept of predicates has been added along with the FindAll method in our List class. What on earth are Predicates? In grammatical terms, a sentence is made up of two parts – the subject which is the object of discussion; and the predicate which provides some reflection on the subject perhaps in the form of a description.

Imagine the sentence: “My hair is brown”. Given my previous description, you’ve probably already created two containers in your head, Subject and Predicate, and you’ve most likely already attributed “Hair” as the subject and “Brown” as the predicate. In programming terms, a predicate is a method that examines an object and returns true or false depending on whether the object fits the description we have coded.

I’ll simplify Microsoft’s predicate definition because I firmly believe that if an example is easier to read, it makes the concept easier to understand…

VB:

Private Function EndsWith(ByVal s As String) As Boolean
    Return s.EndsWith("saurus")
End Function

What’s going on here? Okay, think for a moment of a single item in our list of dinosaurs and forget the rest of the list for the moment. We are passing a string value into the method which is evaluated as either ending in saurus… or not. This will be used as our predicate and is no different than any regular function. In this case it provides us with a description of our string – stating that it either ends in “saurus” or doesn’t.

One thing to note is that a function used as a predicate can only have a single argument, whose type must match the type of the objects being tested. In our case, each list item is a string, so the argument must be of type string. The return type is always Boolean: FindAll takes a Predicate(Of T) delegate, which is defined as accepting one T and returning a Boolean.

In .NET 2.0, the List.FindAll method iterates through each of the items in the list and passes the item into the predicate we defined, so the s parameter refers to the name of the dinosaur currently being checked. Sadly, nobody had the foresight to allow extra arguments to be passed to the predicate as it is called on each iteration, which leaves us three options. The worst is to hard-code what we’re looking for, which is the example provided by Microsoft (I’ll bite my tongue here and refer to my previous post). The best is to wrap the search in a class, which allows for thread safety but is more work than necessary in most cases. In between is a class-level static field, which isn’t great programming practice and is not thread safe, but unfortunately these are the cards we’ve been dealt and it is with those that we must play…

In the example below, we set the _SearchTerm field to “saurus” and then tell the FindAll method that the EndsWith method is the predicate to be called for each of the items in the list. Think of the FindAll method in a similar manner to ForEach: it iterates through the list and runs the predicate method for each item. The s parameter in our predicate method becomes the currently referenced item in the iteration.

Don’t be distracted by the use of AddressOf if you’re not familiar with it. The AddressOf keyword is really just telling the FindAll() method to use the EndsWith() method on each iteration to evaluate whether the subject (in this case, the name of our dinosaur) meets the find condition. The EndsWith() method becomes the predicate container, the search condition (ends with saurus) becomes the content of that container, and when the predicate evaluates, it returns true or false, notifying FindAll whether the current item should be in the return list or not… (Note: there is a better way to do this in 3.0 and I go on to describe it further down…)

VB:

Private _SearchTerm As String

Private Function EndsWith(ByVal s As String) As Boolean
    Return s.EndsWith(_SearchTerm)
End Function

Sub Main()
    _SearchTerm = "saurus"
    Dim sauruses As List(Of String) = _
          dinosaurs.FindAll(AddressOf EndsWith)
End Sub

C#:

private static string _SearchTerm = null;

private static bool EndsWith(string s)
{
    // EndsWith avoids the exception Substring throws when s is shorter than the search term
    return s.EndsWith(_SearchTerm);
}

static void Main(string[] args)
{
    _SearchTerm = "saurus";
    List<string> sauruses = dinosaurs.FindAll(EndsWith);
}

I can’t wrap my head around why nobody thought that the ability to pass a search term into the FindAll method was necessary, but somehow it was missed… and the inability to use the search in multiple concurrent threads is ridiculous. If you want to use it in different threads, you have to build a wrapper class to do the search for you, which instead of saving you code actually causes far more than necessary! Where we attempted to sidestep list iteration (with For…Each, While Iter.MoveNext, or however else we passed over the list), we now have to write a whole extra search class that lets us pass in a search term, and the FindAll method, which had the potential to be extremely useful, becomes useless in many cases.
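
For completeness, the wrapper class mentioned above could look roughly like the following sketch. The EndsWithMatcher and WrapperDemo names are invented for this illustration. Each search gets its own matcher instance, so no shared static field is involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: it carries the search term so the predicate needs no static field.
public class EndsWithMatcher
{
    private readonly string _suffix;

    public EndsWithMatcher(string suffix)
    {
        _suffix = suffix;
    }

    // One argument in, bool out: this method can be used as a Predicate<string>.
    public bool IsMatch(string s)
    {
        return s.EndsWith(_suffix, StringComparison.Ordinal);
    }
}

public static class WrapperDemo
{
    public static List<string> FindEndingWith(List<string> items, string suffix)
    {
        // A fresh matcher per call means concurrent searches do not share state.
        EndsWithMatcher matcher = new EndsWithMatcher(suffix);
        return items.FindAll(matcher.IsMatch);
    }
}
```

It works, but it is a whole class and a helper method for what amounts to a one-line test, which is exactly the complaint.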

The 3.0 Way…

In 3.0, there are a few ways this could be achieved; the most useful, it seems, is another new concept called lambda functions. The lambda function is resurrected from mathematics. The concept is basic, although it is another big paradigm shift for VB programmers, who have never had anything like it to grasp previously. Think of it as an inline function: it’s more or less the same as the predicate we’ve just discussed, but it allows for more flexibility. The predicate function was a separate method and so had no access to the local variables of the calling method, whereas a lambda function is written inline and has access to the variables and objects that fall within the scope of the current method, as demonstrated below…

VB:

Dim SearchString As String = "saurus"
Dim sauruses As List(Of String) = _
    dinosaurs.FindAll( _
        Function(item) item.EndsWith(SearchString) _
    )

C#:

string SearchString = "saurus";
List<string> sauruses =
    dinosaurs.FindAll(
        item => item.EndsWith(SearchString)
    );

Now you can see, we don’t need to hard code the search term as our inline (lambda) expression has access to our search criteria.

So how does it work? Well, just like before, the FindAll method does the iterative work for you. For each item in the list (item, the current dinosaur in this case), the lambda function tests whether the item ends with the search criteria and returns true or false to the FindAll() method, notifying it whether or not the current item belongs in the return list.
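
Because the lambda can see whatever is in scope, it can just as easily capture a method parameter. The sketch below (LambdaSketch and EndingWith are names made up for this illustration) wraps the search in a reusable method where each call has its own search term, so there is no shared field for concurrent callers to trip over.

```csharp
using System;
using System.Collections.Generic;

public static class LambdaSketch
{
    public static List<string> EndingWith(List<string> items, string suffix)
    {
        // The lambda captures the 'suffix' parameter of this particular call.
        return items.FindAll(item => item.EndsWith(suffix, StringComparison.Ordinal));
    }
}
```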

November 11, 2008

What's with the title?

Recently, a colleague of mine has been persistently harassing me to start a blog. I secretly think it’s so that he can spare himself much of the leg work I’ve already done… after much resistance, I've finally caved and succumbed to the blogosphere... I hate that term almost as much as "Google it!"...

I have spent countless hours trawling through the many disparate corners of the internet, attempting to understand various concepts and paradigms as they’re introduced into .NET. So often, the examples I find are poorly coded and have even worse descriptions. This is my attempt to give back to the programming community and demonstrate that these examples can be simple to understand when explained properly. So here, I will deliberate on the trials and tribulations of my own .NET insights, inadequacies and musings and try and collate all my thoughts and understandings on various topics in a single place so that others won’t be destined to repeat the same work that I have.

"EndsWithSaurus"?

Some time ago – it feels like a lifetime ago now, but it can’t have been much earlier than last year… I needed to do something that programmers have been doing since the dawn of programming; something we do on every trip to the grocery store - search for matching items in a list. Before I even started typing, I wondered if there’s something more elegant than looping over the list a la for…each. Enter stage left the List.FindAll() method which looks like it should do this – and it does. But it does so using a concept which at the time was alien to me – predicates (which are the subject of my next article).

A quick search on the MSDN website for List.FindAll will bring you across an article attempting to demonstrate this concept (http://msdn.microsoft.com/en-us/library/fh1w7y8z.aspx) – which as far as I’m concerned deserves eternal recognition on Jeff Atwood’s Coding Horror website (http://www.codinghorror.com/blog/). To this day I cannot comprehend why anything as mundane as finding items in a list would end with an example so poorly conceived that while it demonstrates the concept, is a leap to understand and promotes a coding practice so poor that it should horrify even the most novice programmer.

In the example we are provided with a method called “EndsWithSaurus” which… wait for it… tests whether or not a string ends with “saurus”… imaginative eh? The fact that the search term is hard coded into the method still astounds me…

Don’t hard code anything that could be variable!

That such a basic tenet of programming has been blatantly ignored by such a major player in this industry is abhorrent in my eyes. Now any time I come across examples that promote such bad coding practices, I can’t help but think “ends with saurus” and chuckle to myself…