XML Serialize/Deserialize
Simplifying XML serializing/deserializing for fun and profit is (relatively) easy and somewhat useful in certain circumstances. It need not be complex: it is well documented and relatively straightforward to figure out yourself. But I like code re-use, so I wrapped the common case where we are not concerned with XML namespaces and other complexities. Here is a wrapper around the XmlSerializer class which handles serializing/deserializing through the generic type parameter of the Serializer class:
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    public abstract class SerializerBase
    {
        // Shared by every Serializer<T>: serializers built with extra types, keyed by root type.
        protected static readonly Dictionary<Type, XmlSerializer> Serializers = new Dictionary<Type, XmlSerializer>();
    }

    public class Serializer<T> : SerializerBase
    {
        // XmlSerializer works by generating an assembly on the fly;
        // if you use the XmlSerializer(Type) constructor, it caches it
        // and looks it up; but for any other constructor it doesn't.
        // And assemblies can't (usually) be unloaded. So you just get
        // more and more and more assemblies eating your memory.
        // Hence the need to cache the serializer if you are running this in a loop.

        public static string Serialize(T obj)
        {
            return Serialize(obj, new Type[] { });
        }

        public static string Serialize(T obj, Type[] extraTypes)
        {
            return _serialize(obj, _makeSerializer(typeof(T), extraTypes));
        }

        public static T Deserialize(string xml)
        {
            return Deserialize(xml, new Type[] { });
        }

        public static T Deserialize(string xml, Type[] extraTypes)
        {
            if (string.IsNullOrEmpty(xml)) throw new ArgumentNullException("xml");
            XmlSerializer serializer = _makeSerializer(typeof(T), extraTypes);
            using (StringReader stream = new StringReader(xml))
            {
                try
                {
                    return (T)serializer.Deserialize(stream);
                }
                catch (Exception ex)
                {
                    throw new InvalidOperationException("Failed to create object from XML string.", ex);
                }
            }
        }

        private static XmlSerializer _makeSerializer(Type type, Type[] extraTypes)
        {
            // The XmlSerializer(Type) constructor is cached by the framework itself.
            if (extraTypes.Length <= 0) return new XmlSerializer(type);

            // The cache key is the root type only, so the first set of extra types
            // seen for a given type wins. The lock guards the shared dictionary
            // against concurrent callers.
            lock (Serializers)
            {
                XmlSerializer xmlSerializer;
                if (!Serializers.TryGetValue(type, out xmlSerializer))
                {
                    xmlSerializer = new XmlSerializer(type, extraTypes);
                    Serializers.Add(type, xmlSerializer);
                }
                return xmlSerializer;
            }
        }

        private static string _serialize(T obj, XmlSerializer serializer)
        {
            using (StringWriter stringWriter = new StringWriter())
            {
                try
                {
                    using (XmlTextWriter xmlWriter = new XmlTextWriter(stringWriter) { Formatting = Formatting.None })
                    {
                        // Writing an empty raw string moves the writer out of its start state,
                        // so no XML declaration ends up in the output (see the sample output below).
                        xmlWriter.WriteRaw("");
                        // An empty prefix keeps the default xsi/xsd namespace declarations out.
                        XmlSerializerNamespaces xmlNameSpace = new XmlSerializerNamespaces();
                        xmlNameSpace.Add("", null);
                        serializer.Serialize(xmlWriter, obj, xmlNameSpace);
                        stringWriter.Flush();
                        return stringWriter.ToString();
                    }
                }
                catch (Exception ex)
                {
                    throw new InvalidOperationException("Failed to create XML string from object.", ex);
                }
            }
        }
    }
What Does Usage Look Like
Suppose we have a trivial class of objects Customer.
    using System.Xml.Serialization;

    [XmlRoot(ElementName = "customer")]
    public class Customer
    {
        [XmlAttribute(AttributeName = "id")]
        public int ID;

        [XmlElement]
        public string Name;

        [XmlElement]
        public string Surname;
    }
Then using Serializer<T> for Customer would look like this:
    Customer customer = Serializer<Customer>.Deserialize(
        "<customer id=\"123\"><Name>Joe</Name><Surname>Soap</Surname></customer>");
    Console.WriteLine(customer.ID); // 123
We can create an instance of an object of type Customer from a hard-coded string as above, and reverse the process by using Serialize as follows:
    string strCustomerXml = Serializer<Customer>.Serialize(customer);
    Console.WriteLine(strCustomerXml);
    // <customer id="123"><Name>Joe</Name><Surname>Soap</Surname></customer>
So far this is all pretty straightforward and simple in the trivial cases, where serialization is not really needed all that much. Which raises the question: how is any of this useful?
The short answer is "it isn't". As the old gibes about XML go: "if you have a problem you might be tempted to solve it with XML - now you have two problems" and "XML is like violence - if it's not working then you are not using enough".
I forget who these quotes are attributed to, but it matters little: as with all knee-jerk reactions, each is both truth and deception, depending on the circumstances we are dealing with and how we deal with them.
In my experience, XML is a Good Enough™ way of representing hierarchical data structures in systems where some human-readable abstraction is helpful for verification. In web CMS systems, for example, it's a good way of storing and translating discrete data which can later be converted to and from HTML (which, in its XHTML form at least, is an application of XML).
Let's expand on the Customer class and make it slightly more complex so we can see why the Deserialize/Serialize methods are overloaded. Suppose the Customer class contains a member whose type is another custom class, like so:
    using System.Collections.Generic;
    using System.Xml.Serialization;

    [XmlRoot(ElementName = "customer")]
    public class Customer
    {
        [XmlAttribute(AttributeName = "id")]
        public int ID;

        [XmlElement]
        public string Name;

        [XmlElement]
        public string Surname;

        [XmlElement(ElementName = "invoice")]
        public List<Invoice> Invoices = new List<Invoice>();
    }

    [XmlRoot(ElementName = "invoice")]
    public class Invoice
    {
        [XmlAttribute(AttributeName = "id")]
        public int ID;

        [XmlArray(ElementName = "products")]
        [XmlArrayItem(ElementName = "product")]
        public List<Product> Items = new List<Product>();
    }

    [XmlRoot(ElementName = "product")]
    public class Product
    {
        [XmlAttribute(AttributeName = "id")]
        public int ID;

        [XmlElement]
        public string Name;

        [XmlElement]
        public string Description;
    }
We can see here that the hierarchy has expanded: several levels of nesting are now represented in the single class type Customer.
We are representing complex relations which, in an RDBMS SQL structure, would be represented by normalized tables Customer, Product and Invoice, plus a linking table InvoiceProduct containing foreign keys to the Invoice and Product tables. The Invoice table would also hold a foreign key to the Customer primary key, in order to represent the [0,*] relationship between customers and invoices (in this case Product doesn't represent stock but rather the product on offer).
This is a complex relational structure which would need to be resolved programmatically in some way.
If we are using OO, hierarchical data as above is one way of resolving it. Another would be object-relational mapping (storing data in RDBMS and manually populating object properties from SQL table fields or using an automated ORM system e.g. NHibernate, Entity Framework).
Where this approach is helpful (even with object-relational mapping storage) is when presenting the data as XML via a web service: some form of conversion from object data types to XML would need to take place, and this approach simplifies that final step of outputting the data in XML format.
So a meaningful use would be if you have a particular set of requirements where the object class definitions serve as a programmatic intermediary for computation but also for XML output.
Unrelated side note: I also store settings with a utility class which uses this to deserialize objects from an XML file in the ~/App_Data folder. That lets me keep app-specific settings that are not tied to an environment or server identical across deployment servers (testing, staging and production). I don't have to worry about overwriting environment-specific settings in web.config (e.g. different connection strings and location/@path security), while the XML stays in a human-readable and, for an HTML techy, familiar markup format (editable on the production server, if need be, with any text editor).
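A minimal sketch of what such a settings loader could look like. The AppSettings class, its members and the SettingsFile helper are hypothetical names invented for this illustration, and the demo writes to a temp file instead of ~/App_Data (in an ASP.NET app the path would typically be resolved with something like Server.MapPath).

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings class, for illustration only.
[XmlRoot(ElementName = "settings")]
public class AppSettings
{
    [XmlElement] public string SiteName;
    [XmlElement] public int PageSize;
}

public static class SettingsFile
{
    // Reads an XML file and deserializes it into T.
    // XmlSerializer(Type) is cached by the framework, so this is safe to call repeatedly.
    public static T Load<T>(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StreamReader reader = new StreamReader(path))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Stand-in for a file that would normally live in ~/App_Data.
        string path = Path.Combine(Path.GetTempPath(), "settings-example.xml");
        File.WriteAllText(path,
            "<settings><SiteName>Demo</SiteName><PageSize>25</PageSize></settings>");

        AppSettings settings = Load<AppSettings>(path);
        Console.WriteLine(settings.SiteName + " " + settings.PageSize); // Demo 25
    }
}
```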
OK, so let's actually deal with the problem of deserializing the Customer class. The XML now might look like this:
    <root>
      <customer id="123">
        <Name>Joe</Name>
        <Surname>Soap</Surname>
        <invoice id="456">
          <products/>
        </invoice>
        <invoice id="789">
          <products>
            <product id="456">
              <Name>SoapX</Name>
              <Description>Soap X: The soap-on-a-rope that makes soaping "dope"!</Description>
            </product>
          </products>
        </invoice>
      </customer>
      <customer id="90210">
        <Name>Jane</Name>
        <Surname>Doe</Surname>
      </customer>
    </root>
There are two customers: Soap, Joe (id 123) and Doe, Jane (id 90210). Jane has no invoices and Joe has two, 456 and 789. Invoice 456 contains an empty list of products and invoice 789 a list with one item: product SoapX (id 456).
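In order to convert to and from the XML representation of Customer, the Serializer<Customer> utility is passed the complex types of the Customer members (namely Invoice) as well as the complex types of the members of those members (so for Invoice, Product). The rabbit hole stops at some sub-sub-... sub-member. For this example it stops at Product, but if Product contained other complex types (e.g. Stock), those would need to be included in the array. As far as I can tell, XmlSerializer already discovers member types that are declared statically (List<Invoice>, for instance), so the extra types matter most for types it cannot see from the declarations, such as a subclass stored in a base-typed member; passing the whole chain covers both cases.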
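To make the role of the extra types concrete, here is a small sketch of the case where XmlSerializer cannot work out a type from the declarations alone: a subclass stored in a list declared with the base type. Shape, Circle, Drawing and ExtraTypesDemo are hypothetical names made up for this example; they are not part of the Customer hierarchy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types, for illustration only.
public class Shape { public string Label; }
public class Circle : Shape { public double Radius; }

public class Drawing
{
    // Declared with the base type, so Circle is not visible from the declaration.
    public List<Shape> Shapes = new List<Shape>();
}

public static class ExtraTypesDemo
{
    // Builds a fresh serializer on every call, which is fine for a demo but is
    // exactly the uncached constructor the Serializer<T> wrapper guards against.
    public static string ToXml(Drawing drawing, Type[] extraTypes)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Drawing), extraTypes);
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, drawing);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Drawing drawing = new Drawing();
        drawing.Shapes.Add(new Circle { Label = "c1", Radius = 2.5 });

        try
        {
            ToXml(drawing, new Type[] { });
        }
        catch (InvalidOperationException)
        {
            // Circle is unknown to the serializer, so serializing fails.
            Console.WriteLine("Without extra types: serialization failed");
        }

        // With Circle listed, the element is written with an xsi:type attribute.
        Console.WriteLine(ToXml(drawing, new[] { typeof(Circle) }));
    }
}
```

Listing a type that the serializer would have found anyway does no harm, which is why passing the whole chain is a reasonable default.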
Before we see this illustrated with code, let's store the XML in a separate file, Example.xml, and load the object instances from that file.
    System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
    xmlDoc.Load("[full-path-to]/Example.xml");
    var nodes = xmlDoc.DocumentElement.SelectNodes("customer");
    for (int i = 0; i < nodes.Count; i++)
    {
        var node = nodes.Item(i);
        // Every call has to repeat the full list of nested complex types.
        Customer customer = Serializer<Customer>.Deserialize(node.OuterXml, new Type[] { typeof(Invoice), typeof(Product) });
        Console.WriteLine(customer.ID);
        string strCustomerXml = Serializer<Customer>.Serialize(customer, new Type[] { typeof(Invoice), typeof(Product) });
        Console.WriteLine(strCustomerXml);
    }
    123
    <customer id="123"><Name>Joe</Name><Surname>Soap</Surname><invoice id="456"><products /></invoice><invoice id="789"><products><product id="456"><Name>SoapX</Name><Description>Soap X: The soap-on-a-rope that makes soaping "dope"!</Description></product></products></invoice></customer>
    90210
    <customer id="90210"><Name>Jane</Name><Surname>Doe</Surname></customer>
So it is fairly simple and straightforward. It is also rather cumbersome to keep track of all the additional types on every method call: tedious and, more alarmingly, error prone.
Going forward, we might introduce additional types and then we'd have to update every call site. We would not get a compile-time error if we missed any and, worse, we might not get a runtime exception either, as data can simply be omitted when serializing to or deserializing from XML.
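One way data can go missing without any exception is on the deserializing side: by default XmlSerializer skips elements it has no mapping for. The sketch below uses trimmed-down copies of Customer and Invoice and a made-up element name, "bill", to show the effect; SilentOmissionDemo is a hypothetical helper for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Trimmed-down copies of the classes above, for illustration only.
public class Invoice
{
    [XmlAttribute(AttributeName = "id")] public int ID;
}

[XmlRoot(ElementName = "customer")]
public class Customer
{
    [XmlAttribute(AttributeName = "id")] public int ID;

    [XmlElement(ElementName = "invoice")]
    public List<Invoice> Invoices = new List<Invoice>();
}

public static class SilentOmissionDemo
{
    public static Customer Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        using (StringReader reader = new StringReader(xml))
        {
            return (Customer)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // "bill" is not mapped to any member of Customer, so it is skipped.
        Customer customer = Parse("<customer id=\"123\"><bill id=\"456\" /></customer>");
        Console.WriteLine(customer.ID);             // 123
        Console.WriteLine(customer.Invoices.Count); // 0
    }
}
```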
For these reasons I introduce another level of abstraction, which collects the additional types automatically whenever any class in the hierarchy changes. This way, adding a member of a new type to any class in the chain automatically propagates to all the points where serializing and deserializing happen.
    using System;
    using System.Linq;
    using System.Xml.Serialization;

    [Serializable]
    public abstract class SimpleSerializable<T> where T : SimpleSerializable<T>, new()
    {
        private readonly T _raw = default(T);

        // Static Fields In a Generic Type
        // It's ok to have a static field in a generic type
        // so long as we want one field per type argument.
        // In the vast majority of cases, having a static field
        // in a generic type is a sign of an error.
        // The reason for this is that a static field in a generic type
        // will not be shared among instances of different closed constructed types.
        // This means that for a generic class SimpleSerializable<T>, values of
        // SimpleSerializable<Customer>.types and SimpleSerializable<Invoice>.types
        // have different, independent values. In this case, this is not only preferable
        // but actually unavoidable.
        private static readonly Type[] types = new Type[] { };

        // Runs once per closed type, the first time that type is used.
        static SimpleSerializable()
        {
            types = typeof(T).Subtypes(new Type[] { }).ToArray();
        }

        // Used when the object is built in code (and by the serializer itself):
        // the instance is its own raw object.
        protected SimpleSerializable()
        {
            if (_raw == null) _raw = (T)Convert.ChangeType(this, typeof(T));
        }

        // Used when the object is built from its XML representation.
        protected SimpleSerializable(string representation)
        {
            _raw = Serializer<T>.Deserialize(representation, types);
        }

        public T Object()
        {
            return _raw;
        }

        public override string ToString()
        {
            return Serializer<T>.Serialize(_raw, types);
        }
    }
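The comment about static fields in generic types is easy to check in isolation. A minimal sketch, with a hypothetical Counter<T> class that exists only for this demonstration: each closed constructed type gets its own copy of the static field.

```csharp
using System;

// Hypothetical class, for illustration only.
public class Counter<T>
{
    // One independent field per closed type: Counter<int>, Counter<string>, ...
    public static int Count;
}

public static class StaticPerClosedTypeDemo
{
    public static void Main()
    {
        Counter<int>.Count = 1;
        Counter<string>.Count = 5;

        Console.WriteLine(Counter<int>.Count);    // 1
        Console.WriteLine(Counter<string>.Count); // 5
    }
}
```

This is the behavior SimpleSerializable<T> relies on: the types array computed for one class never leaks into another.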
The key benefit of this abstraction is the Subtypes extension, which collects all the relevant complex member types once per class, the first time that class is used.
It's worthwhile disclosing that I'm out of my depth when it comes to reflection bindings and suchlike core .NET/IL/CLR... er... stuff, so take this extension with a pinch of salt and modify it as you encounter bugs (especially the excluded variable), which is how I got it working to begin with.
It works but has not been field tested in any other field except my own UoD of web dev and web services. Nonetheless, here's a single-field-tested version:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using System.Runtime.Serialization;
    using System.Xml;

    public static class TypeExtensions
    {
        // Returns state plus the class-typed members of typ that pass the filter,
        // recursing into the members that satisfy the ISerializable check below.
        public static IEnumerable<Type> Subtypes(this Type typ, Type[] state)
        {
            Type[] excluded = new Type[] { typeof(Array), typeof(Type[]), typeof(Object), typeof(string), typeof(XmlNodeList) };
            Type[] initial = Subtypes(typ)
                .Where(itm => !itm.IsPrimitive
                    && (itm.TypeInitializer == null || itm.TypeInitializer.IsConstructor)
                    && !itm.IsCOMObject
                    && !itm.IsArray
                    && itm.IsClass
                    && !itm.IsEnum
                    && !excluded.Contains(itm))
                .ToArray();

            foreach (Type type in initial)
                // As written, this recurses only into types that have a member whose
                // type ISerializable can be assigned to (ISerializable itself or object).
                if (!state.Contains(type) && type.Subtypes().Any(t => t.IsAssignableFrom(typeof(ISerializable))))
                {
                    initial = initial
                        .Concat(Subtypes(type, initial.Concat(state).Distinct().ToArray()))
                        .Distinct()
                        .ToArray();
                }

            return state.Concat(initial);
        }

        // Declared types of all public instance properties and fields.
        // Select/Concat stands in for a MergeWith helper that is not shown in this post.
        private static IEnumerable<Type> Subtypes(this Type typ)
        {
            const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
            return typ.GetProperties(flags).Select(prp => prp.PropertyType)
                .Concat(typ.GetFields(flags).Select(fld => fld.FieldType));
        }
    }
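For comparison, here is a simplified sketch of the same idea: walk the public fields and properties of a class and collect the custom class types reachable from it. It is not a drop-in replacement for the extension above. MemberTypeWalker is a hypothetical name, the classes are trimmed-down copies, and unwrapping generic arguments and array element types (so that List<Invoice> yields Invoice) is my assumption about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Trimmed-down copies of the classes above, for illustration only.
public class Product { public int ID; public string Name; }
public class Invoice { public int ID; public List<Product> Items = new List<Product>(); }
public class Customer { public int ID; public string Name; public List<Invoice> Invoices = new List<Invoice>(); }

public static class MemberTypeWalker
{
    // Returns the class types reachable from root through public members, excluding root.
    public static List<Type> Collect(Type root)
    {
        List<Type> found = new List<Type>();
        Visit(root, found);
        found.Remove(root);
        return found;
    }

    private static void Visit(Type type, List<Type> found)
    {
        // Look through arrays and generic containers at what they hold.
        if (type.IsArray) { Visit(type.GetElementType(), found); return; }
        if (type.IsGenericType)
        {
            foreach (Type arg in type.GetGenericArguments()) Visit(arg, found);
            return;
        }

        // Skip simple types, structs, and anything already seen (which also stops cycles).
        if (type.IsPrimitive || type.IsEnum || !type.IsClass
            || type == typeof(string) || type == typeof(object)
            || found.Contains(type)) return;

        found.Add(type);

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        IEnumerable<Type> memberTypes = type.GetProperties(flags).Select(p => p.PropertyType)
            .Concat(type.GetFields(flags).Select(f => f.FieldType));
        foreach (Type memberType in memberTypes) Visit(memberType, found);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect(typeof(Customer)).Select(t => t.Name)));
        // Invoice, Product
    }
}
```

The sketch does not look inside user-defined generic classes and applies none of the COM or type-initializer filters, so treat it as a starting point rather than a finished utility.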
So now we are free to define the Customer class as a subclass of SimpleSerializable, which allows us to use its constructor and ToString to set the object instance from, and get it back as, its XML representation without worrying about omitting complex subtypes in the members. This means modifying the Customer class as follows:
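    using System.Collections.Generic;
    using System.Xml.Serialization;

    [XmlRoot(ElementName = "customer")]
    public class Customer : SimpleSerializable<Customer>
    {
        // Parameterless constructor: required by the new() constraint and by XmlSerializer.
        public Customer() { }

        // Builds the instance from its XML representation.
        public Customer(string strObjXml) : base(strObjXml) { }

        [XmlAttribute(AttributeName = "id")]
        public int ID;

        [XmlElement]
        public string Name;

        [XmlElement]
        public string Surname;

        [XmlElement(ElementName = "invoice")]
        public List<Invoice> Invoices = new List<Invoice>();
    }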
The usage can now be:
    System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
    xmlDoc.Load("[full-path-to]/Example.xml");
    var nodes = xmlDoc.DocumentElement.SelectNodes("customer");
    for (int i = 0; i < nodes.Count; i++)
    {
        var node = nodes.Item(i);
        // No type lists at the call site any more: the base class supplies them.
        Customer customer = new Customer(node.OuterXml).Object();
        Console.WriteLine(customer.ID);
        string strCustomerXml = customer.ToString();
        Console.WriteLine(strCustomerXml);
    }