Referential Architectures and Data Concerns
Current technologies reflect what the business needed at a particular time. The need started with managing data effectively, and transactional and relational databases responded. Analytics were required, and OLAP became mainstream. Businesses balked at dealing with slow IT, and behold QlikView and Tableau. Cell phones, social media and the Internet made demands of data that nobody had anticipated. Volume and velocity suddenly became important. There is talk that the data warehouse's days are numbered. I don't believe that to be true. Data warehouses just are not set up to deal with the current need to get data quickly to the point where analytics can take place. Time to analytics is key, and the standard path looks like this: the data source sends data to an external location. Data integration takes place and business rules are applied. Aggregation happens in the OLAP component, and then the business can look at it. There is a production window for this that usually starts when the business closes for the night. The data is offloaded and manipulated as discussed, then sent to the data warehouse and then aggregated in OLAP (not always, but assume so for this lesson). At the end it's available for analysts to come in and slice and dice it. All of this has to finish within the production window, before morning. Companies usually underestimate the amount of data that needs to be managed daily.
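The standard path can be sketched in a few lines of Python. This is only a toy illustration: the function names, the sales rows and the two business rules (drop non-positive amounts, normalize store names) are all assumptions made up for the example, and plain lists stand in for the staging area, the warehouse and the OLAP layer.

```python
from collections import defaultdict

def extract(source_rows):
    """Offload raw rows from the operational source (here, just copy a list)."""
    return list(source_rows)

def integrate(rows):
    """Apply business rules. Both rules here are hypothetical examples."""
    cleaned = []
    for row in rows:
        if row["amount"] <= 0:          # rule 1: ignore non-positive amounts
            continue
        cleaned.append({
            "store": row["store"].strip().upper(),  # rule 2: normalize names
            "amount": row["amount"],
        })
    return cleaned

def aggregate(warehouse_rows):
    """Stand-in for the OLAP step: total amount per store."""
    totals = defaultdict(float)
    for row in warehouse_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)

def nightly_run(source_rows):
    """Run the whole path in order, as the production window would."""
    staged = extract(source_rows)
    warehouse = integrate(staged)
    return aggregate(warehouse)
```

Every step waits for the one before it, which is why the whole chain has to fit inside the production window.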
Assume that within the next few years the amount of data managed by these systems will increase to 30 TB for midsize companies. How do you manage that in the context of the current architecture?
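A quick back-of-the-envelope calculation shows why the question matters. The sketch below assumes, purely for illustration, that the full 30 TB has to pass through the pipeline in one night and that the production window is 8 hours. Neither number comes from a real system, and the function name is made up.

```python
def required_throughput_mb_per_s(volume_tb, window_hours):
    """Sustained rate needed to move volume_tb through the pipeline in window_hours."""
    volume_bytes = volume_tb * 10**12       # decimal terabytes
    window_seconds = window_hours * 3600
    return volume_bytes / window_seconds / 10**6   # megabytes per second

# Hypothetical case: 30 TB in an assumed 8-hour window.
rate = required_throughput_mb_per_s(30, 8)
print(f"{rate:,.0f} MB/s")
```

Under those assumptions every stage of the path has to sustain roughly 1 GB per second all night, with no room for a rerun if something fails.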
Welcome to Big Data and the new architectures: Hadoop, columnar databases and correlational databases. A brief introduction to each is given here. Note that these architectures complement the old ones; they do not replace them. The lesson to remember, however, is similar to what occurred in India with cellular infrastructure. Laying the original hard wiring was so difficult in areas where the zoning was nearly impossible to understand that a lot of people skipped land lines and went directly to the new tech. This is very similar to what Google, Amazon and Facebook are doing with technologies such as Hadoop, which previously did not need to exist.
Small and midsize companies can benefit from this technology and become extremely competitive in a short time. The advantage the larger players will have relates to their ability to manage their own internal knowledge. Knowledge management is crucial for companies that want to stay ahead of smaller companies that can now seemingly come out of nowhere. Let's look at some of that tech.
Hadoop was brought about by the need to look up information across massive libraries of data at a fast rate. This can only be done by parceling out the work and letting each server take on a piece of it. The intelligence lies in knowing how to split the work across servers. You can maximize this by keeping servers close to each other (in the same rack, for example), or at least by understanding which servers are close and which are far apart. This is why I think the cloud is not the best architecture for Hadoop: you remove a critical element of the original design, which is knowing where the servers are in relation to each other.
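To make the idea concrete, here is a minimal sketch in plain Python of parceling out a word count and of preferring a worker that sits close to the data. It does not use Hadoop or its API. Threads stand in for servers, and the `distance` scale (0 for same host, 1 for same rack, 2 for anything else) and all function names are assumptions chosen for the illustration, not Hadoop's actual scheduler.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(lines, n_chunks):
    """Parcel the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))   # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def count_words(chunk):
    """Each worker counts words in its own piece only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def merge_counts(partials):
    """Combine the partial answers into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, n_workers=3):
    """Split the work, run the pieces side by side, then merge."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)

def distance(worker, replica):
    """Toy closeness scale: 0 = same host, 1 = same rack, 2 = different rack."""
    if worker["host"] == replica["host"]:
        return 0
    if worker["rack"] == replica["rack"]:
        return 1
    return 2

def pick_worker(workers, replicas):
    """Prefer the worker closest to any copy of the data."""
    return min(workers, key=lambda w: min(distance(w, r) for r in replicas))
```

For example, `word_count(["a b a", "b c", "a"], 2)` counts `a` three times, `b` twice and `c` once, no matter how the lines were split. `pick_worker` only works if you know each server's host and rack, which is exactly the knowledge the paragraph above says you need.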
Columnar databases keep data by column instead of by row. The advantage is this: on a row-based relational database without a suitable index, most SQL statements would need to go through the entire table to find everyone who lives on the same street. In a columnar database, if you keep the street data in alphabetical order, the search knows where to start and knows when it has read the last matching record, so it might only need to read 100 records from a very large set. This reduces the work significantly.
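The street example can be sketched with Python's standard `bisect` module. This is a simplified illustration, not how any particular columnar product stores data: the street column is kept on its own in sorted order, with the original row number beside each value, and the function names are made up for the example.

```python
from bisect import bisect_left, bisect_right

def build_street_column(rows):
    """Keep the street column by itself, sorted, with each row id alongside."""
    pairs = sorted((row["street"], row_id) for row_id, row in enumerate(rows))
    streets = [street for street, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return streets, row_ids

def rows_on_street(streets, row_ids, street):
    """Binary search for the first and last match, then read only that slice."""
    start = bisect_left(streets, street)
    end = bisect_right(streets, street)
    return row_ids[start:end]

def full_scan(rows, street):
    """Row-store comparison: look at every row in the table."""
    return [i for i, row in enumerate(rows) if row["street"] == street]
```

Both functions return the same row ids, but `rows_on_street` touches only the matching stretch of the sorted column while `full_scan` reads every row.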
Correlational databases are the minimalists of the database world: think of them as storing each unique value only once. For instance, if someone is called James, his name is stored once in the name record, and he is stored in the table as 1. If someone else is called James, you don't save the name again; you just save him as 2 in the table. To illustrate: in a relational database you would have James stored twice, each time as a string type with an allocated size. In a correlational database you would have James, 1, 2: the first as a string and the next two as integers (possibly a considerably smaller footprint than the original). As the database grows, the savings grow with it, because every repeated value costs only an integer.
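The James example can be sketched as a small Python class. This is an assumption-laden toy, not any vendor's storage format: the class name `ValueStore` and its methods are hypothetical, and the technique shown is plain dictionary encoding, where each distinct value is kept once and every record holds only a small integer pointing at it.

```python
class ValueStore:
    """Store each distinct value once; records hold only integer references."""

    def __init__(self):
        self._value_ids = {}   # value -> integer id
        self._values = []      # integer id -> value (each value stored once)
        self._records = []     # position (record number - 1) -> integer id

    def add(self, value):
        """Add a record and return its 1-based record number."""
        value_id = self._value_ids.get(value)
        if value_id is None:                 # first time this value is seen
            value_id = len(self._values)
            self._value_ids[value] = value_id
            self._values.append(value)
        self._records.append(value_id)
        return len(self._records)

    def get(self, record_number):
        """Look up the value behind a record number."""
        return self._values[self._records[record_number - 1]]

    def distinct_count(self):
        """How many values are physically stored."""
        return len(self._values)
```

Adding "James" twice returns record numbers 1 and 2, while `distinct_count()` stays at 1: the string is held once and the second James costs only an integer.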
So those are the newer types of architectures that you should expect to see. If you are at all concerned about what sort of trouble the business is going to get you into next, make sure that you get some proofs of concept (POCs) done and have some good consultants help you with the vendor selection.











