For the last few months, I have been working with the Elite Dangerous data set for my project at https://elitewebgl.com, which is currently very beta and very slow, but fun. Going through my blog and other posts, you will see that I talk about the EDDN data frequently, along with humanities, physics, and everything else under the sun.
The dataset for this game, served by the EDDN group, is quite rich and great for experimenting with different types of analytics. While working on that and WebGL 3D canvases, the ideas spawned for the post: Quantum Strings Exist In The Third Dimension We Are Actually in the Fourth Dimension.
Then I started to think, how do I know how to do this? I was never trained to do this. I only went to college for two semesters of EE and then struck out into the world. So how do I know how to do this? Or rather, I should say how you, me, and we all know how to do this.
Humans are natural pattern matchers. Our brains are wired to find patterns in everything we experience as we flow through this thing called life. We are the most powerful computers in the world. Even the smartest AI will never have wisdom or form conclusions as fast as the human brain can. That is the spark that we cannot give computers, the human spark. Don’t ever think any computer will be smarter than any human brain. We can make split-second decisions based solely on what we know about the current situation, in most cases faster than the network lag a computer would experience trying to find the same answer. We also have empathy and compassion, which computers do not and never will have. Computers are bit-beasts, 1 or 0, on or off, yes or no, death or life, nothing else. Even the theories on quantum computing show that, though very fast, they will still be bit bound at some point. On the other hand, we have an analog brain, no bit limits here. If we can think it, we can do it, good or bad.
The world of big data and analytics has grown at an astounding rate over the last decade. Back then, a 50GB per week data source was considered a gold mine for analytics processing. Now that is just a drop in the bucket. In some high traffic situations, more than 50GB per hour of data is stored, analyzed, enriched, analyzed again, over and over every minute of every day, all day.
Along with this massive increase in the need to store and analyze this data has come a rash of certifications and training that, I believe, has muddied the waters on some level and may make some think that this kind of work is now out of reach. With the current paradigm shift of people from the hospitality and foodservice industries into other career paths, I want to offer hope to those who are looking down this road but may feel overwhelmed by the multitude of titles, training, and available certifications.
Some would have you believe you need a degree in calculus to understand how this works. But what is calculus? At the fundamental level, calculus is a combination of many other math disciplines, with the result being a way to calculate the continuous change of a system. It does not matter what the system is. It could be aerodynamics, thermodynamics, particle physics, or even the continuously changing price of grain; calculus is used everywhere. Actually, in most modern analytics use cases, the full name would be multivariable differential calculus. Sounds brain-numbing, but it is not.
Our brains are naturally wired to look at things this way. Unfortunately, looking at a calculus equation scares many people away. It for sure did scare me away for a long time. And then, while working on the project mentioned at the beginning of this post, I thought, we do it naturally all day, every day. The equations are just one way of explaining it to others. It simply comes down to taking a snapshot of a specific data set range and comparing it to other ranges in the same data set.
Let us take a simple look at this on the human level. You pinch yourself. It hurts. You know not to do that again. On the most basic level, this is calculus. You have gathered data; pinching yourself hurts. This data tells you not to do that again because it will hurt. That is also a pattern; pinching hurts. Simple but effective. Calculus falls right into this as simple as it is. You compared two conditions of the same system and figured out that one is preferable to the other.
Let’s kick the thought process up a little bit, to say, a paycheck. You know you will receive a paycheck at some point, whether weekly, bi-weekly, or monthly. You expect that your paycheck will generally be the same throughout the year. One day you receive your paycheck, and it is lower than what you expect. Regardless of what you do about it, you have noticed that the pattern did not match what you expected. Again, the same system, two different outcomes. One is preferable to the other. Because of how our mind is wired for patterns, you would notice this even if the difference was only 1 cent.
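The paycheck check can be sketched in a few lines of Python. This is only an illustration: the function name, the sample amounts, and the choice of the median as the "expected" paycheck are my assumptions for the example, not a fixed rule. Amounts are kept in whole cents so that a 1 cent difference is not lost to rounding.

```python
from statistics import median

def unexpected_paychecks(paychecks_cents):
    """Return (index, amount, difference) for every paycheck that breaks the pattern.

    Assumption for this sketch: the expected paycheck is the median of all
    the paychecks seen, and any difference at all counts as a mismatch.
    """
    expected = median(paychecks_cents)
    return [
        (index, amount, amount - expected)
        for index, amount in enumerate(paychecks_cents)
        if amount != expected
    ]

# Hypothetical data: five paychecks in cents, one of them short by 1 cent.
paychecks = [152_000, 152_000, 152_000, 151_999, 152_000]
print(unexpected_paychecks(paychecks))  # [(3, 151999, -1)]
```

Same system, two outcomes, and the code reports the one that does not match what was expected.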
Ah, there is that word: “difference.” That is what we are looking for in analyzing data, a difference. It does not matter what the difference is. The ultimate goal of data analytics and calculus is to find all differences between data ranges, no matter how minute. In general terms, differential calculus compares differences in the same data set.
Calculus can become very complex, bringing in multitudes of different data sets to compare to other data sets. Still, in the end, it is just looking for patterns or anything that breaks the known pattern, much like the paycheck example above.
Let’s follow another path and see how manufacturing uses analytics based on calculus to find hot spots in their business. We will go the route of a project I am currently working on to refine analytics used in the auto parts manufacturing sector. I have a lot of data and insight here, as I have pored over pages and pages of this type of production data.
This design is an analytics platform that tracks everything from order entry to end-of-the-line product delivery. I use Elasticsearch, but the platform is inconsequential to the math that is used in the end.
The process starts with order entry. Once the order is entered, the system automatically checks for available raw material to make the requested parts. If the raw materials, in this case, rolls of steel, are available, then the order is created and assigned to a press operator. If the raw materials are unavailable, the system automatically sends an order materials request to the department that handles procuring raw materials. If needed, when the raw materials are delivered, they are entered into the system, and the order shows up as a “new delayed” order which is then assigned to the press operator.
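The order entry flow above can be sketched as a small piece of Python. Everything named here (the `Order` class, the status strings other than “new delayed”, and the action names) is hypothetical and only meant to show the branching, not how the real platform is built.

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    steel_rolls_needed: int
    status: str = "entered"

def route_order(order, rolls_in_stock):
    """Decide what happens to a freshly entered order.

    Returns the next action as a string (names are illustrative only).
    """
    if rolls_in_stock >= order.steel_rolls_needed:
        # Raw material is available: create the order and hand it to a press operator.
        order.status = "new"
        return "assign_to_press_operator"
    # Raw material is missing: ask the procurement department to order it.
    order.status = "waiting_on_material"
    return "request_material_from_procurement"

def receive_material(order):
    """Called once the missing raw material has been delivered and entered."""
    order.status = "new delayed"
    return "assign_to_press_operator"

order = Order(order_id="A-1001", steel_rolls_needed=2)
print(route_order(order, rolls_in_stock=0), order.status)
print(receive_material(order), order.status)
```

Each status change is also a timestamped event worth storing, since the time an order spends waiting on material is exactly the kind of range you will want to compare later.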
Once the order is created and assigned, the operator will pick it up and scan it in at their machine. They will retrieve the roll of steel and load it onto their machine, scanning the QR code on the roll of steel so the system can track work order times and used material.
This system assumes that all machines are network capable or are using an add-on module to access the serial diagnostic port of the machines and retrieve runtime information that way. Many machines report a great deal of information about what they are doing while they are running. This data is very useful for analytics: it can help spot hotspots in production and help automate machine maintenance when needed.
As the parts move around the facility, they will be scanned in and out of the different departments to help track the time each department spends on the order and every other minute piece of data that can be collected. Many companies have to have trained metallurgists on staff to check parts at the microscopic level. With newer analysis equipment, all of this data could also be stored in the analytics platform for deep troubleshooting if needed. Many parts have to go through cycles of heat treatment before they are complete. Heat treatment ovens can also send real-time data streams for analysis.
So, where does analytical calculus come into all of this? In a lot of places, actually.
One use is the ability to track problems with raw materials by using a moving average of defective parts caused by bad material. This data is grouped by the supplier since we are looking for bad suppliers. Analyzed correctly, it could pinpoint one supplier, even down to postulating that this only happens with Supplier #1 in the first week of June every year. Generating this report is nothing more than just taking the averages from June last year and comparing them to every other month in the year. This kind of analytics is basically differential calculus.
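As a rough sketch of that supplier report, here is the idea in plain Python. The record layout, the supplier names, and the defect counts are all made up for illustration; in practice an aggregation in the analytics platform would do the same grouping.

```python
from collections import defaultdict
from statistics import mean

def monthly_defect_averages(records):
    """Group (supplier, month, defective_parts) records and average per month."""
    grouped = defaultdict(lambda: defaultdict(list))
    for supplier, month, defects in records:
        grouped[supplier][month].append(defects)
    return {
        supplier: {month: mean(values) for month, values in months.items()}
        for supplier, months in grouped.items()
    }

def month_vs_rest(averages, supplier, month):
    """Difference between one month's average and the average of the other months.

    Needs at least one other month of data for that supplier.
    """
    months = averages[supplier]
    others = [avg for m, avg in months.items() if m != month]
    return months[month] - mean(others)

def moving_average(values, window):
    """Simple moving average over consecutive values."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]

# Hypothetical data: (supplier, month number, defective parts in one batch).
records = [
    ("Supplier #1", 5, 2), ("Supplier #1", 6, 9), ("Supplier #1", 6, 11),
    ("Supplier #1", 7, 3), ("Supplier #2", 6, 2), ("Supplier #2", 7, 2),
]
averages = monthly_defect_averages(records)
print(month_vs_rest(averages, "Supplier #1", 6))  # 7.5
print(month_vs_rest(averages, "Supplier #2", 6))  # 0
```

A large positive number for one supplier in one month is the broken pattern; a number near zero means that month looks like every other month.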
Another use is to compare workers’ times to complete an order. It could easily be shown that one worker is more efficient at a specific machine than another worker. This would allow shifting machine operators to keep the most efficient person at each machine. You could go even further and look at which raw materials wear out part dies faster. The list goes on and on; it is only limited by the data you have to work with.
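The operator comparison follows the same pattern: group, average, compare. Here is a minimal sketch, assuming each completed order is recorded as an (operator, machine, minutes) tuple; the names and numbers are invented for the example.

```python
from collections import defaultdict
from statistics import mean

def fastest_operator_per_machine(completions):
    """Return {machine: operator with the lowest average completion time}."""
    times = defaultdict(lambda: defaultdict(list))
    for operator, machine, minutes in completions:
        times[machine][operator].append(minutes)

    best = {}
    for machine, by_operator in times.items():
        averages = {op: mean(minutes) for op, minutes in by_operator.items()}
        best[machine] = min(averages, key=averages.get)
    return best

# Hypothetical data: (operator, machine, minutes to complete an order).
completions = [
    ("Ana", "Press 1", 42), ("Ana", "Press 1", 38), ("Ben", "Press 1", 51),
    ("Ana", "Press 2", 60), ("Ben", "Press 2", 47),
]
print(fastest_operator_per_machine(completions))
# {'Press 1': 'Ana', 'Press 2': 'Ben'}
```

With only a handful of orders per person, an average like this can easily mislead, so treat it as a starting point for questions rather than a verdict.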
The process is the same for any analytics that compares one range of data against another range of data in the same system. Now, if you want to take your averages and compare them to the averages of another company for some reason, this would become advanced multivariable calculus because you are comparing two similar but different systems.
Math majors, do not fret. I am in no way trying to diminish what you accomplish daily. I understand calculus concepts but am better at finding the functions in whatever programming language I use to do these calculations than at writing them out manually. It is to you that I tip my hat for providing the knowledge that others have used to write these math functions. The only thing that I am trying to explain is that it does not have to be as difficult as it may seem for those wanting to venture into data science.
Here are the tips I have. If you can balance a checkbook, find an average of anything (2 + 1 + 3 + 4 = 10, so the average is 10 / 4 = 2.5), or even know the basic math functions of addition, subtraction, multiplication, and division, you can do this. Also, of course, 10/4 is a fraction; see, it all works together. If you wanted to express this as a fraction instead of a decimal, it is 2 1/2. It is all the same in the end.
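The same arithmetic in Python, as a small sketch using the standard library's `fractions` module (the helper names are just for this example):

```python
from fractions import Fraction

def average_as_fraction(values):
    """Average of whole numbers, kept as an exact fraction."""
    return Fraction(sum(values), len(values))

def as_mixed_number(fraction):
    """Format a non-negative fraction such as 5/2 as the mixed number '2 1/2'."""
    whole, remainder = divmod(fraction.numerator, fraction.denominator)
    if remainder == 0:
        return str(whole)
    return f"{whole} {remainder}/{fraction.denominator}"

values = [2, 1, 3, 4]
average = average_as_fraction(values)  # 10/4 reduces to 5/2
print(float(average))                  # 2.5
print(as_mixed_number(average))        # 2 1/2
```

Decimal or fraction, it is the same number; the language just gives you a function for each way of writing it.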
The main thing that I pass on to people looking into this is knowing your data and your platform. With the right ideas, platform, and guidance, the only limits are the richness of your data and the capabilities of your platform. Explore your data, find the patterns before you design or choose a platform.
The final note is that most analytics processes use historical data to find patterns in current data and attempt to explain what might happen tomorrow based on what happened yesterday. The bigger the sample set grows, the better your analytics will be. But always remember, the more complex a system becomes, the more chaotic it becomes, possibly providing misleading information. Keep it lean and clean, and you will go far.
-Analyze your heart out. Everything can be analyzed-