R Basics – R Programming Language Introduction


I love programming languages. I started a long time ago with BASIC on my Timex 1000. Then I moved on to Pascal, some Assembler, really began to take off in Java, and graduated with Python and Ruby. I have looked into other languages like C, Lisp, and Perl, but as my chores at the company I worked for kept me from learning more and delving deeper, I started to leave those languages behind.

Late in 2014 I came back to code as my need to delve into Data Analytics began to take off. I could do much of the Statistics in EXCEL, but I couldn’t pass up the opportunity to add some geeky collaterals with code of my own. The analyses I started to make were difficult enough that I could explain to IT why EXCEL wasn’t enough. First, I really needed to move big quantities of data, enough to render EXCEL useless. Second, I really needed to apply enough statistical formulas to make the IT people give up (most business programmers I know are terrible at math…)

That’s when I came across two new finds: the Python Pandas library and the R language. Pandas is big and frightening, but I found several R courses on www.udemy.com and decided on a small free seminar on R called R Basics – R Programming Language Introduction by Martin Heissenberger.

Martin is both a Scientist and a Biostatistician. Martin not only knows, he knows enough to make my data analysis for selling more shoes look like child’s play. Martin knows Statistics, and above all, he knows R from a Statistician’s point of view, not a programmer’s. And I say this because Mr. Martin gave me something precious: a programming language where the main challenge is not learning about control flow, classes and syntax, but rather about easily solving Statistics problems. It has been a first for me. Delving into R opened a new panorama I had never explored before.

In case you have never heard about R, let me give you a quick intro blatantly copied from Wikipedia. R is a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls and surveys of data miners are showing R’s popularity has increased substantially in recent years.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.

R is a GNU project. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; however, several graphical user interfaces are available for use with R.

The course I would recommend lives here. Most free courses on www.udemy.com are short versions of longer courses that you can breeze through in an hour to get a general feeling for the topic. This is not the case with R Basics. The course will not only get you started, but it is complete enough to put you behind R Visual Studio coding, solving data problems and generating graphs like a pro. The power of R is impressive. You can read a flat file with a million rows of data (something I cannot do in Excel) and start applying statistical formulas in no time with simple commands. No need to highlight ranges. R knows intuitively what you want because the language operates like a mathematician would. You are analysing and solving data problems instead of fighting the language and reviewing where you misplaced a curly bracket (Java) or left a missing white space (Python). You are treating data frame structures and testing null hypotheses instead of trying to get your collections to iterate correctly. If anything, you will be looking at your Statistics book, not your R manual.

The way Martin Heissenberger approaches teaching feels different. This is not a language teacher. He feels like a math teacher guiding you along this nifty tool called R to help you get much more done. After the 19 lessons you feel like you can accomplish a lot. Take a week to learn R with this tutorial and you could probably get some very sophisticated analysis done. Take a week to learn the powerful Pandas library and you will be wrestling with the Python language underneath. You can learn Python in a week, but you will not master Python in a week. Python is strong and well suited for math, but it is a broader language. Everything in R feels so much more focused.

I have now moved to a new course by Martin called R Level 1 – Data Analytics with R. This is an in-depth walkthrough of the language, with many hours of lecture and a bit more complicated. It goes beyond Statistics and into more general math topics, such as matrices and the like. It also involves functions, control flow and other generalities of R. It’s the kind of web seminar you take over the course of several weekends, but so far I am having a blast getting a lot done and learning at a much faster pace than with other languages. R is winning over my heart and Martin Heissenberger has a lot to do with it. He not only teaches a subject, he teaches purpose, and that has made R much more accessible and easy to take on from the very beginning.

Do yourself a favour and visit Martin’s course at https://www.udemy.com/r-basics/#/

I guarantee you, you will not be disappointed.


Solving Project Euler Problem 18


I love problem 18 of Project Euler. Those not familiar only need to know that Project Euler is a website filled with mathematical riddles best solved by programming (although I imagine you can still use pencil and paper, and some people do). I never got far, since the complexity of the problems usually outdid my own abilities, both in math and code.

Problem number 18 was the first problem in my life where I felt I made the jump from a mere algorithm to an incredibly smart algorithm. It was the first time I witnessed how a simple change of perspective could turn up an elegant solution that made immense number crunching possible.

The problem is as follows:

By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3

7 4

2 4 6

8 5 9 3

That is, 3 + 7 + 4 + 9 = 23.

Find the maximum total from top to bottom of the triangle below:

75

95 64

17 47 82

18 35 87 10

20 04 82 47 65

19 01 23 75 03 34

88 02 77 73 07 63 67

99 65 04 28 06 16 70 92

41 41 26 56 83 40 80 70 33

41 48 72 33 47 32 37 16 94 29

53 71 44 65 25 43 91 52 97 51 14

70 11 33 28 77 73 17 78 39 68 17 57

91 71 52 38 17 14 91 43 58 50 27 29 48

63 66 04 68 89 53 67 30 73 16 69 87 40 31

04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

NOTE: As there are only 16384 routes, it is possible to solve this problem by trying every route. However, Problem 67, is the same challenge with a triangle containing one-hundred rows; it cannot be solved by brute force, and requires a clever method! ;o)

Before that, I would brute-force all solutions since, in my mind, that was what computers were for. Elegant solutions were for computer scientists working in FORTRAN… Yes, I was young and naive.

For the general public, yes, you can pretty much brute-force this solution but it becomes painful if the pyramid grows to 100 tiers, and near impossible using today’s computers if the pyramid has 1,000 tiers. The right algorithm can handle millions of tiers, and I suppose there is a limit depending on your choice of compiler, language, or OS.
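To make the brute-force idea concrete, here is a minimal sketch that tries every route through the small four-row example from the problem statement. The function name `brute_force_max` is my own, chosen for illustration, and this is not the code I originally wrote. It only shows why the number of routes doubles with each extra tier.

```python
from itertools import product

# The four-row example triangle from the problem statement.
small = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]

def brute_force_max(triangle):
    """Try every top-to-bottom route; return (best_total, routes_tried)."""
    best = None
    tried = 0
    # At each step down we either keep the same index (0) or move one right (1).
    for steps in product((0, 1), repeat=len(triangle) - 1):
        col = 0
        total = triangle[0][0]
        for row, step in zip(triangle[1:], steps):
            col += step
            total += row[col]
        tried += 1
        if best is None or total > best:
            best = total
    return best, tried

best, tried = brute_force_max(small)
print(best, tried)  # 23 8
```

Four rows give 2^3 = 8 routes and fifteen rows give 2^14 = 16384, which is still fine. One hundred rows would give 2^99, which is where this approach stops being an option.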

Back to the problem, common sense would dictate starting at the top, adding either the left or right number, and proceeding in a loop. And this is exactly what I thought before I lost myself for days in messy code, out-of-bounds arrays, and the complication of having to deal with the pyramid as a two-dimensional array, which does not exist in Python. And no, at that time I had no idea NumPy has a matrix structure.

Now, back to the problem at hand, I kept trying to start from the pyramid top and work my way down. Makes sense, no? After all, that’s what the problem says. Keeping track of the route was messy. And don’t even mention that the number of routes doubles with every tier: 2^14 = 16,384 for this pyramid, and 2^99 for one with 100 tiers, which is a very big number, and if you don’t believe me, Google it since some very smart person did a Javascript calculator that does really big numbers.

After repeating, changing to a C compiler, and watching my computer crash repeatedly, I gave up. How could someone come up with an algorithm smart enough to get around the brute force method? I admit it. I am not smart enough. I grew stressed. I needed to solve the problem.

No big breakthrough for me. I cheated and found a way to explore pyramid numbers on the web. I did not copy code; I actually read the mathematical explanation and worked my way from there. It is all about reduction. You start from the bottom. The bottom of an unknown pyramid might be big, but it is still a finite number of elements. The work doesn’t explode with every tier; the branches implode and diminish. The critical path of summation is given by any two adjacent numbers and the one immediately above and between them. You have two numbers below, let’s call them A and B. And you have a number above, say C. If A+C is bigger than B+C, stick A+C into C; otherwise stick B+C into C. Then keep reducing. After one iteration you have folded the lowest tier of N elements into the tier above, which now holds N-1 elements, each the biggest sum reachable from that position down. Continue reducing tier by tier and pretty soon you will have only one possible reduction left: the two elements of the second tier and the single number at the top. The larger of those two sums with the number at the top is the right answer.
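The reduction can be sketched in a few lines of Python. This is a compact illustration of the same bottom-up idea, not my original solution. The name `max_path_total` is made up for the example, and it runs on the small four-row triangle from the problem statement.

```python
def max_path_total(triangle):
    """Collapse the triangle from the bottom up and return the best total."""
    # Start with a copy of the bottom tier so the input is left untouched.
    best = list(triangle[-1])
    # Fold each tier into the one above: C becomes C + max(A, B).
    for row in reversed(triangle[:-1]):
        best = [c + max(best[i], best[i + 1]) for i, c in enumerate(row)]
    return best[0]

small = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_total(small))  # 23
```

On the small triangle the tiers collapse as [8, 5, 9, 3], then [10, 13, 15], then [20, 19], then [23]. Each tier is visited once, so the work grows with the number of elements in the pyramid rather than with the number of routes.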

Pretty ingenious. Very simple and elegant. It would make beautiful code if Python had matrix data structures and my multiply nested lists didn’t look so… Pythonish?

This is the source code for the solution. I have seen many algorithms, some involving just three or four lines of R, J, C or Ruby. Mine is a bit longer.

# To solve the problem, the first thing is to load our pyramid of numbers
# Of course, we know the size so we will simply recreate a pyramid 15 x 15
# I must admit that Python's lack of a matrix structure forces us to sort of improvise with a nested list

myList = [75,95,64,17,47,82,18,35,87,10,20,4,82,47,65,19,1,23,75,3,34,88,2,77,73,7,63,67,99,65,4,28,6,16,70,92,41,41,26,56,83,40,80,70,33,41,48,72,33,47,32,37,16,94,29,53,71,44,65,25,43,91,52,97,51,14,70,11,33,28,77,73,17,78,39,68,17,57,91,71,52,38,17,14,91,43,58,50,27,29,48,63,66,4,68,89,53,67,30,73,16,69,87,40,31,4,62,98,27,23,9,70,98,73,93,38,53,60,4,23]
pyramid = [[0 for j in range(15)] for i in range(15)]

i=0 #the ubiquitous counter
#x=0 #initial step in the pyramid
var_y = 1 #the steps of the pyramid are given and fixed, but the rows will vary from top (1) to bottom (15)

for x in range(0, 15):
    for n in range(0, var_y):
        #print("pyramid [", x, "][", n, "] =", myList[i])  # uncomment for debugging purposes
        pyramid[x][n] = myList[i]
        i = i + 1
    var_y = var_y + 1

# The secret here is to cycle from the bottom up, not the top down, because we know the base length and the tier height, so the maximum sum path
# gets smaller, not bigger, with each step we rise. We are sort of flattening the pyramid with successive sums of highest results
var_y = 14

for x in range(14,0,-1):
    for y in range(0, var_y):
        # Add the larger of the two children below to the parent above
        if (pyramid[x-1][y] + pyramid[x][y]) > (pyramid[x-1][y] + pyramid[x][y+1]):
            pyramid[x-1][y] = pyramid[x-1][y] + pyramid[x][y]
        else:
            pyramid[x-1][y] = pyramid[x-1][y] + pyramid[x][y+1]
    var_y = var_y - 1
print(pyramid[0][0])

And the answer is 1074.


Book Review: Enterprise Sales and Operations Planning


Enterprise Sales and Operations Planning is a special book for me to review. First and foremost, it was given to me by a colleague as part of an exercise to implement a better Sales and Operations Planning process in our company. Little did that person know about my propensity to read books and later review them in my blog, especially anything related to business books. Several books were given to several people that day. I gave this particular author plenty of time to digest the ideas, processes presented, and above all the way the story unfolds.

Enterprise Sales and Operations Planning is a dry subject, make no mistake. Not a bit as charming as marketing strategy, not as alluring as financial ops, nor even close to the delicacies of system architecture implementation. At its very core, it is the process of carefully aligning sales expectations to actual production and sourced capacity, with the hope of delivering to the customer in a timely fashion, to the exact specification, while trying to keep the costs involved within certain boundaries. This is not the book’s definition, it is my very own. To the author SOP is all about organized common sense. This is true except for the well-known fact that common sense is usually the least common of all senses.

The author tries to get his point across by telling a story. The story is about a fictional company with a fictional General Manager whose new position is at a company manufacturing industrial goods. The company itself is underperforming. Sales are not necessarily bad, but the communication between sales, manufacturing, planning and purchasing is poor. Thus the production line is ineffective and deliveries are broken; the company suffers from high inventories and low customer service, and everyone generally blames it on everybody else.

This situation is not atypical of many companies. Forecasting is tricky. Coordinating sales and manufacturing is tricky. I do not know of a company that sells things that doesn’t have the problem. That is where SOP comes in. It is an organized process to align demand planning with manufacturing in order to deliver on promises to customers. In the story plot, a consulting firm helps the company establish an SOP process and the theory is explained through the narrative as the GM and consultants try to get buy-in from the management team in a series of seminars. Albeit somewhat simplistic, it is an easier way to explain a subject which by itself is dull, no matter how important it is to an organization as a success factor.

And that is the real problem here. Sales and Operations Planning is very important. It is also a very lengthy topic. If this book is big, it would take a much bigger book to explain the intricacies of forecasting models, purchasing, procurement, the systems that tie it all together, demand planning versus forecasting, etc. Maybe that is why the author prefers to touch on the main parts of the process and leave the detailed mechanics to others. In the narrative, the consultants give ample references to technical books for additional reading on obscure topics such as master planning in manufacturing.

The process proposed is broken down into four review cycles: product, demand, supply and financial. Everything starts from an 18-month sales forecast, which is more of a demand plan. The process also involves a lot of trust in management agreeing on truly practical deliverables, and a commitment to meet surges in demand and delivery times with the resources needed to achieve them.

I get the sensation the methodology and processes work better in industrial than in commercial organizations. I might be wrong, but the narrative does take place in a factory after all. The roles of marketing and retail are minimized to the point of being almost non-existent. Yes, for the most part all of us involved in operations and sales work alongside marketing and other similar departments, but the role feels so alienated that the this-is-for-manufacturing-companies feeling is heightened.

The overall process flow is well-designed and straightforward to follow. Anyone reading the book will find the strategic implementation easy. However, don’t expect fool-proof solutions. The book uses guidelines to work with forecasts, but not one forecasting method is explained. The same goes for all other functions. The process is more about corporate glue (tying all functions together so they all work in harmony) as opposed to specific methods on any particular issue. You get a scaffold: a skeleton process where you can layer your own, but there will be lots of areas to cover in any given implementation that the team itself will have to make up for.

This is not necessarily an evil. As I said, there is ample reference to secondary bibliography on the less abstract and more complicated matters of production, production scheduling, etc. The book is all about the SOP process and not the SOP processes that make up the SOP process. But many will feel the lack of more practical guidelines.

George Palmatier does an effective job in delivering a framework. Inside the plot the consultants are clear when they confess that implementation should be rather quick, avoiding delays, and that results should be forthcoming in no more than three ninety-day cycles. The faster the system is in place and people are coordinating their needs, the faster the company will reap the rewards. How you fill in the gaps on the SOP scaffold will make the difference. The book sometimes feels more about company communication than anything else, because the characters are drawn with immense differences and friction among team members. Some people will relate, and it’s very hard not to draw parallels to one’s own company. But at times the plot seems rigid and only distracts from the theory behind SOP.

Overall this is a nice book to have in your repertoire, if very narrow in its topic of interest. Engineers, accountants and others working for manufacturing companies will have no trouble grasping the issue. For other types the topic remains dense, but that does not diminish the importance of Sales and Operations Planning.