Bimajolt: 2014

Monday, 22 September 2014

WELCOME TO R SIDE OF LIFE

There are many programming languages, each built with different capabilities, but we can all agree that most of them are multipurpose (e.g. you can build a simple calculator in almost any programming language). When it comes to statistical programming, however, R is the king. What is R? R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. I know you may be bored by this talk of R and S, so here is my deal: follow along as we do a simple statistical analysis on TB burden estimates obtained from the WHO. To be specific, we will analyze the estimated incidence of all types of TB in Kenya from 1990 to 2012.

This is what we are going to do, in steps.

1) Download the data from the WHO website.
2) Load it into R.
3) Do some statistical analysis with the data.

Downloading the data

You can import data into R from many different formats, which means data saved by other programs such as SPSS and Excel can easily be brought into R. In this case I will download the data to my disk first and then load it into R. Note that it is also possible to read the data directly into R from the web (shown at the end of this section). My data is in CSV (comma-separated values) format, which is one of the best formats to work with. Go to https://extranet.who.int/tme/generateCSV.asp?ds=estimates and download the data.

If you have installed R on Windows, the default working directory (folder) is Documents, so if you have problems specifying a file path I recommend moving the file to Documents. You can get the working directory by typing this command:

> getwd()

You can also set your working directory using this command:

> setwd("specify directory (folder) path in here")

For those who would love to read the data directly from the web, try this:

> data.all = read.csv("https://extranet.who.int/tme/generateCSV.asp?ds=estimates", header = TRUE, sep = ",")
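If you would like to try the reading step before downloading anything, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back. The column names and values are made up for illustration and are not WHO data.

```r
# Write a tiny CSV to a temporary file, then read it back with read.csv().
# The rows below are invented for illustration; they are NOT WHO estimates.
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,year,e_inc_num",
             "Exampleland,1990,100",
             "Exampleland,1991,120"), tmp)

getwd()                              # prints the current working directory
toy <- read.csv(tmp, header = TRUE)  # sep = "," is already the default for read.csv
str(toy)                             # one row per line of the file, three columns
```

The same read.csv call works on the real file once it sits in your working directory.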

Uploading data in R

Now that the data is downloaded, let us load it into R before we dive into the fun of statistical analysis. Before that, let us look at the data. (Comma-separated files (.csv) can be opened using Microsoft Excel or any other spreadsheet program.) You will see that it has 4905 rows and 39 columns. All the countries of the world are represented, from Afghanistan to Zimbabwe, with one row per country per year (23 rows for a country with estimates for every year from 1990 to 2012).

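If you have moved your data into your working directory the code is very simple. (I use read.csv rather than read.table because some country names contain apostrophes, which read.table treats as quote characters by default.)

> data.all = read.csv("TB_burden_countries_2014-09-23.csv", header = TRUE, sep = ",")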

If not, it is still simple. Assuming you never moved the file from Downloads (on Windows), you can use this code, replacing YourUserName with your own Windows user name.

> data.all = read.csv("C:/Users/YourUserName/Downloads/TB_burden_countries_2014-09-23.csv", header = TRUE, sep = ",")
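Long paths are easy to mistype, so one option is to build the path from its parts and check that the file exists before reading it. This is only a sketch: "YourUserName" is a placeholder for your Windows account name, and csv.path is a name I made up for this example.

```r
# Build the path piece by piece instead of typing one long string.
# "YourUserName" is a placeholder; replace it with your own Windows user name.
csv.path <- file.path("C:/Users", "YourUserName", "Downloads",
                      "TB_burden_countries_2014-09-23.csv")

if (file.exists(csv.path)) {
  data.all <- read.csv(csv.path, header = TRUE)
} else {
  # Nothing is read if the file is missing; we only report the path we tried.
  message("File not found: ", csv.path)
}
```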

Doing statistical analysis on the data

Good, now we have our data, so let me show you why I call R the king of statistical programming. Since we have downloaded the whole world's data (TB burden estimates), let us pull out the Kenyan data specifically. Newcomers can just follow along; for those already familiar with R, I should mention that R imports data with many variables as a data frame. If you look carefully you will find that the Kenyan data is on rows 2284 to 2306. This is how I get it:

> data.kenya = data.all[2284:2306, ]

I have used a technique called indexing to pick out the rows which contain Kenyan data from data.all and assigned them to the object data.kenya. We can check the dimensions and structure of our new data.kenya using these commands.
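Row numbers can shift whenever the WHO file is updated, so a more robust variant is to index by the country name instead. The sketch below uses a tiny made-up data frame, and it assumes the real file has a column called country; check with names(data.all) before relying on that.

```r
# Toy stand-in for data.all; the numbers are invented, not WHO estimates.
toy.all <- data.frame(
  country   = rep(c("Kenya", "Uganda"), each = 3),
  year      = rep(1990:1992, times = 2),
  e_inc_num = c(10, 12, 15, 7, 8, 9)
)

# Logical indexing: keep the rows where the condition is TRUE, keep all columns.
toy.kenya <- toy.all[toy.all$country == "Kenya", ]

# subset() expresses the same selection a little more readably.
toy.kenya2 <- subset(toy.all, country == "Kenya")

dim(toy.kenya)   # number of rows and columns in the selection
```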

> dim(data.kenya)

> str(data.kenya)

We can also view the data using this command:

> View(data.kenya)

Note: I am using the RStudio data viewer in a large window.

The real work

My objective is to determine whether there is a relationship between years and Estimated number of incident cases (all forms) which is the variable (column) abbreviated as e_inc_num. With all data analysis the first step is always to explore the data. In this case, scatter plots are very useful in determining whether or not the relationships between the variables are linear.

Let us plot a simple scatter plot using this code:

>plot(data.kenya$year, data.kenya$e_inc_num)

Before proceeding further, let us attach our data so that we can refer to our variables without having to specify the data frame they come from:

>attach(data.kenya)

We can plot the same scatter plot using this code:

>plot(year, e_inc_num)

The above command gives a simple scatter plot with the first variable on the horizontal axis and the second on the vertical axis.
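attach() is convenient, but attached variables can be masked by other objects with the same name. As an alternative sketch, the formula interface and with() let you name the data frame once per call. The toy data frame below is invented for illustration.

```r
# Invented numbers for illustration only.
toy <- data.frame(year = 1990:1994, e_inc_num = c(10, 12, 15, 19, 24))

# Formula interface: y ~ x, with the data frame passed through data =.
plot(e_inc_num ~ year, data = toy)

# with() evaluates the expression using the columns of toy.
with(toy, plot(year, e_inc_num))
```

Both calls draw the same scatter plot as plot(toy$year, toy$e_inc_num).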

The scatter plot deserves a nicer presentation. Copy and paste this code into R and press Enter; I will explain it below.

> plot(year, e_inc_num, main = "Relationship Btn Yrs and No of TB Incidences", col.main = "red", sub = "Kenya", col.sub = "blue", xlab = "Years", ylab = "No of TB Incidences", col.lab = "red", pch = 13, col = "red")

Here main = title of the plot, col.main = color of the title, sub = subtitle of the plot, col.sub = color of the subtitle, xlab and ylab = labels of the x axis and y axis respectively, col.lab = color of the axis labels, pch = shape of the points (try any number from 0 to 25), col = color of the points.

Now that we have our plot let us try to fit a linear model to this data of ours

Fitting linear model

The function lm is used to perform linear modeling in R. Use the code below to create the linear model.

> linear.kenya = lm(e_inc_num ~ year)

Displaying the model by typing 'linear.kenya' gives limited information. To get more information, one can look at the attributes of this model, its summary and attributes of its summary.

>attr(linear.kenya,"names")

 [1] "coefficients"  "residuals"     "effects"
 [4] "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "xlevels"
[10] "call"          "terms"         "model"

There are 12 attributes. Most of them can be displayed with the summary function.

> summary(linear.kenya)

Call:
lm(formula = e_inc_num ~ year)

Residuals:
   Min     1Q Median     3Q    Max
-25554  -9658  -1761  11272  21500

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.037e+07  8.588e+05  -12.08 6.44e-11 ***
year         5.228e+03  4.292e+02   12.18 5.51e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13650 on 21 degrees of freedom
Multiple R-squared:  0.876, Adjusted R-squared:  0.8701
F-statistic: 148.4 on 1 and 21 DF,  p-value: 5.51e-11

The first section of the summary shows the formula that was 'called'. The second section gives the distribution of the residuals. The pattern is not quite symmetric: the minimum (-25554) and the maximum (21500) lie at similar distances from the median (-1761), but the third quartile (11272) is about 13000 above the median while the first quartile (-9658) is only about 7900 below it. For those who are not statisticians, I know this is boring, so let me leave it there. Statisticians can go further and analyze these results in more depth.
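To check the spread on either side of the median yourself, you can compute the five-number summary directly. The sketch below uses the 23 residuals copied from the output printed further down in this post, so it runs without the WHO file; res and q are names I made up for the example.

```r
# Residuals of linear.kenya, copied from the output shown later in this post.
res <- c(2467.391, -1760.870, -5989.130, -8217.391, -9445.652,
         -10673.913, -9902.174, -8130.435, -4358.696, 1413.043,
         6184.783, 11956.522, 16728.261, 21500.000, 16271.739,
         21043.478, 15815.217, 10586.957, 5358.696, -9869.565,
         -15097.826, -20326.087, -25554.348)

q <- quantile(res)   # minimum, 1st quartile, median, 3rd quartile, maximum
round(q)

# Distance from the median down to the 1st quartile and up to the 3rd quartile.
unname(q["50%"] - q["25%"])
unname(q["75%"] - q["50%"])
```

If the two distances differ a lot, the middle half of the residuals is lopsided around the median.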

Regression line, fitted values and residuals

A regression line can be added to the scatter plot with the following command:

>abline(linear.kenya)

The expected (fitted) value is the number of TB incident cases estimated from the regression line for a specific year. The fitted values can be added to the plot as blue points:

>points(year, fitted(linear.kenya), pch=18, col="blue")

A residual is the difference between the observed and expected value. The residuals can be drawn by the following command.

> segments(year, e_inc_num, year, fitted(linear.kenya), col = "pink")
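Here is the whole sequence on a tiny made-up data set, as a sketch you can run without the WHO file. It also shows predict() for getting the expected value for one specific year, and confirms that each residual is the observed value minus the fitted value. The names toy and fit and the numbers are invented for illustration.

```r
# Invented numbers for illustration only.
toy <- data.frame(year = 1990:1994, e_inc_num = c(10, 12, 15, 19, 24))
fit <- lm(e_inc_num ~ year, data = toy)

# Expected value from the regression line for one specific year.
predict(fit, newdata = data.frame(year = 1992))

# A residual is observed minus fitted; this comparison should report TRUE.
all.equal(unname(residuals(fit)), toy$e_inc_num - unname(fitted(fit)))

# Scatter plot, regression line, fitted points and residual segments.
plot(e_inc_num ~ year, data = toy)
abline(fit)
points(toy$year, fitted(fit), pch = 18, col = "blue")
segments(toy$year, toy$e_inc_num, toy$year, fitted(fit), col = "pink")
```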

The actual values of the residuals can be checked from the specific attribute of the defined linear model.

>kenyamodelresduals = residuals(linear.kenya)

> kenyamodelresduals

         1          2          3          4          5
  2467.391  -1760.870  -5989.130  -8217.391  -9445.652
         6          7          8          9         10
-10673.913  -9902.174  -8130.435  -4358.696   1413.043
        11         12         13         14         15
  6184.783  11956.522  16728.261  21500.000  16271.739
        16         17         18         19         20
 21043.478  15815.217  10586.957   5358.696  -9869.565
        21         22         23
-15097.826 -20326.087 -25554.348

The sum of the residuals and the sum of their squares can be checked.

> sum(kenyamodelresduals)

[1] -2.728484e-12

> sum(kenyamodelresduals^2)

[1] 3914228261

The sum of the residuals is close to zero. The sum of their squares, divided by the 21 residual degrees of freedom and then square-rooted, gives the residual standard error (about 13650) displayed in the summary of the model. If the model fits well, the distribution of the residuals should be approximately normal. A common-sense approach is to look at the histogram.
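The link between the sum of squared residuals and the residual standard error in the summary can be checked with a few lines. This sketch reuses the 23 residuals printed above; the 2 subtracted from the number of observations accounts for the two estimated coefficients (intercept and slope).

```r
# Residuals of linear.kenya, copied from the output printed above.
res <- c(2467.391, -1760.870, -5989.130, -8217.391, -9445.652,
         -10673.913, -9902.174, -8130.435, -4358.696, 1413.043,
         6184.783, 11956.522, 16728.261, 21500.000, 16271.739,
         21043.478, 15815.217, 10586.957, 5358.696, -9869.565,
         -15097.826, -20326.087, -25554.348)

rss <- sum(res^2)         # residual sum of squares
df  <- length(res) - 2    # 23 observations minus 2 coefficients = 21
rse <- sqrt(rss / df)     # residual standard error
rse
```

The result agrees with the 13650 in the summary once it is rounded to four significant digits.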

>hist(kenyamodelresduals)

With only 23 observations it is hard to say whether the residuals are normally distributed. I will leave you at this point to draw your own conclusions, and come back later to explain how to interpret the results clearly and accurately.
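If you want something more than the histogram in the meantime, two common checks are a normal Q-Q plot and the Shapiro-Wilk test. This is only a sketch of one possible approach, not a verdict on the model; it again uses the residuals printed earlier.

```r
# Residuals of linear.kenya, copied from the output printed earlier.
res <- c(2467.391, -1760.870, -5989.130, -8217.391, -9445.652,
         -10673.913, -9902.174, -8130.435, -4358.696, 1413.043,
         6184.783, 11956.522, 16728.261, 21500.000, 16271.739,
         21043.478, 15815.217, 10586.957, 5358.696, -9869.565,
         -15097.826, -20326.087, -25554.348)

# Normal Q-Q plot: points close to the line suggest approximate normality.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(res)
sw
```

With so few observations the test has little power, so read the p-value together with the plots.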

Monday, 18 August 2014

CHILD EDUCATION POLICY

The child education policy is a life insurance product specially designed as a savings tool to provide an amount of money when your child reaches the age for entry into college (18 years and above). The funds can be used to pay for your child's higher education expenses. Under this policy, the child is the life assured, while the parent/legal guardian is the policy owner.

If you opt for a payor benefit rider, the education policy also provides assurance that, in the event of the policy owner's untimely demise, the child will have access to the funds to help finance his or her studies.

Why do you need a child education plan?

The cost of higher education is increasing. The need for access to higher education and the cost will put a financial strain on you and your family. That is why it is important to start planning for your child's education as soon as possible, because the earlier you begin, the more time you allow your money to grow. The child education policy will provide the funds needed by your child to pursue further education and assures that whatever happens in the future, your child will still have the means to pursue some of his/her goals in life.

When choosing a plan:

When choosing a policy, always:

1) Consider how much money you want to set aside for your child’s education.

2) Make sure that the premium is affordable.

3) Choose a policy that gives you flexibility so you can gradually increase the savings in the future.

4) Ensure you opt for the payor benefit rider.

Basic types of policies

1) Endowment policy

a) An endowment policy combines a savings component with protection coverage

b) Endowment policies may be either participating or non-participating

· Non-participating policies do not participate in the life insurance fund's profits, but all insurance benefits are fully guaranteed

· Participating policies have a portion of insurance benefits guaranteed; however, the total amount of benefits at maturity is not guaranteed because it depends on the performance of the insurance company’s life insurance fund

2)Investment-linked policy

a) An investment-linked policy combines the elements of investment and protection based on your requirement as the policy owner

b) It offers flexibility as you are able to increase or top-up your monthly premium contribution as your income improves. You may also be more aggressive with your investment

c) An investment-linked policy will allow you to choose the types of funds your money will be invested in. However, like any other investment, there are risks involved and there is no guarantee on the returns, which may be higher or lower than the amount projected

Saturday, 21 June 2014

Income Inequality

Today I want to discuss wealth and income inequality here in Kenya, using the arguments of "Capital in the Twenty-First Century", a book by French economist Thomas Piketty. It focuses on wealth and income inequality in Europe and the US since the 18th century. It was initially published in French in 2013, with an English translation released in April 2014. Though the book uses data from Europe and the US, the points it makes suggest that the situation here in Kenya is not very different, and we can comfortably use it to understand the sources and effects of wealth and income inequality.

The general presumption of most inequality researchers has been that earned income, usually salaries, is where all the action is, and that income from capital is neither important nor interesting. Piketty shows, however, that even today income from capital, not earnings, predominates at the top of the income distribution. He also shows that in the past (during Europe’s Belle Époque and, to a lesser extent, America’s Gilded Age) unequal ownership of assets, not unequal pay, was the prime driver of income disparities. And he argues that we’re on our way back to that kind of society. Nor is this casual speculation on his part. For all that Capital in the Twenty-First Century is a work of principled empiricism, it is very much driven by a theoretical frame that attempts to unify discussion of economic growth and the distribution of both income and wealth. Basically, Piketty sees economic history as the story of a race between capital accumulation and other factors driving growth, mainly population growth and technological progress.

Consider how this worked in Belle Époque Europe. At the time, owners of capital could expect to earn 4–5 percent on their investments, with minimal taxation; meanwhile economic growth was only around one percent. So wealthy individuals could easily reinvest enough of their income to ensure that their wealth and hence their incomes were growing faster than the economy, reinforcing their economic dominance, even while skimming enough off to live lives of great luxury.

And what happened when these wealthy individuals died? They passed their wealth on—again, with minimal taxation—to their heirs. Money passed on to the next generation accounted for 20 to 25 percent of annual income; the great bulk of wealth, around 90 percent, was inherited rather than saved out of earned income. And this inherited wealth was concentrated in the hands of a very small minority: in 1910 the richest one percent controlled 60 percent of the wealth in France; in Britain, 70 percent.

No wonder, then, that nineteenth-century novelists were obsessed with inheritance. Piketty discusses at length the lecture that the scoundrel Vautrin gives to Rastignac in Balzac’s Père Goriot, whose gist is that a most successful career could not possibly deliver more than a fraction of the wealth Rastignac could acquire at a stroke by marrying a rich man’s daughter. And it turns out that Vautrin was right: being in the top one percent of nineteenth-century heirs and simply living off your inherited wealth gave you around two and a half times the standard of living you could achieve by clawing your way into the top one percent of paid workers.

Capital still matters; at the very highest reaches of society, income from capital still exceeds income from wages, salaries, and bonuses. Piketty estimates that the increased inequality of capital income accounts for about a third of the overall rise in US inequality. But wage income at the top has also surged. Real wages for most US workers have increased little if at all since the early 1970s, but wages for the top one percent of earners have risen 165 percent, and wages for the top 0.1 percent have risen 362 percent. Piketty is unconvinced that this surge simply reflects what top earners contribute. As he notes, conservative economists love to talk about the high pay of performers of one kind or another, such as movie and sports stars, as a way of suggesting that high incomes really are deserved. But such people actually make up only a tiny fraction of the earnings elite. What one finds instead is mainly executives of one sort or another, people whose performance is, in fact, quite hard to assess or give a monetary value to.

Who determines what a corporate CEO is worth? Well, there’s normally a compensation committee, appointed by the CEO himself. In effect, Piketty argues, high-level executives set their own pay, constrained by social norms rather than any sort of market discipline. And he attributes skyrocketing pay at the top to an erosion of these norms. In effect, he attributes soaring wage incomes at the top to social and political rather than strictly economic forces.

The key point is that when we make the crucial comparison between the rate of return on wealth and the rate of economic growth, what matters is the after-tax return on wealth. So progressive taxation—in particular taxation of wealth and inheritance—can be a powerful force limiting inequality. Indeed, Piketty concludes his masterwork with a plea for just such a form of taxation. Unfortunately, the history covered in his own book does not encourage optimism.

Piketty ends Capital in the Twenty-First Century with a call to arms—a call, in particular, for wealth taxes, global if possible, to restrain the growing power of inherited wealth. It’s easy to be cynical about the prospects for anything of the kind. But surely Piketty’s masterly diagnosis of where we are and where we’re heading makes such a thing considerably more likely. So Capital in the Twenty-First Century is an extremely important book on all fronts. Piketty has transformed our economic discourse; we’ll never talk about wealth and inequality the same way we used to.

The central thesis is that wealth will concentrate if the rate of return on capital (r) is greater than the rate of economic growth (g). Over the long term, Piketty argues, this will lead to the concentration of wealth and economic instability. Piketty proposes a global system of progressive wealth taxes to help create greater equality and avoid the vast majority of wealth coming under the control of a tiny minority. The central thesis of the book is that inequality is not an accident, but rather a feature of capitalism, and can only be reversed through state interventionism. The book thus argues that unless capitalism is reformed, the very democratic order will be threatened. Piketty proposes that an annual global wealth tax of up to 2%, combined with a progressive income tax reaching as high as 80%, would reduce inequality.

Friday, 11 April 2014

Greatest Discovery in Mathematics.

Mathematics: this 11-letter word brings trouble to most people in the world today. If you ask 10 people whether they hate mathematics, there is a high chance that more than 6 of them will respond with a big capital YES. Now, if we all went back 1500 years, all of us would hate mathematics except a few geniuses (I am saying a fraction of 0.0000001 of the current world population of around 7bn, so around 700 individuals in the whole world). I want to explain this, and the one wonderful discovery that changed the whole situation into one where an average person can easily learn mathematics.

We can do maths now

What mathematical discovery more than 1500 years ago: Is one of the greatest, if not the greatest, single discovery in the field of mathematics? Which involved three subtle ideas that eluded the greatest minds of antiquity, even geniuses such as Archimedes? Was fiercely resisted in Europe for hundreds of years after its discovery? Even today, in historical treatments of mathematics, is often dismissed with scant mention, or else is ascribed to the wrong source? The answer is our modern system of positional decimal notation with zero, together with the basic arithmetic computational schemes, which were discovered in India about 500 CE.

As the 19th century mathematician Pierre-Simon Laplace explained: “It is India that gave us the ingenious method of expressing all numbers by means of ten symbols, each symbol receiving a value of position as well as an absolute value; a profound and important idea which appears so simple to us now that we ignore its true merit. But its very simplicity and the great ease which it has lent to all computations put our arithmetic in the first rank of useful inventions; and we shall appreciate the grandeur of this achievement the more when we remember that it escaped the genius of Archimedes and Apollonius, two of the greatest men produced by antiquity.”

As Laplace noted, the scheme is anything but “trivial,” since it eluded the best minds of the ancient world, even extraordinary geniuses such as Archimedes. Archimedes saw far beyond the mathematics of his time, even anticipating numerous key ideas of modern calculus and numerical analysis. He was also very skilled in applying mathematical principles to engineering and astronomy. Nonetheless he used the traditional Greek-Roman numeral system for performing calculations. It is worth noting that Archimedes’ computation of π was a tour de force of numerical interval analysis performed without either positional notation or trigonometry.

Archimedes never nailed it!!

This is not true old man!

Perhaps one reason this discovery gets so little attention today is that it is very hard for us to appreciate the enormous difficulty of using Roman numerals, counting tables and abacuses. As Tobias Dantzig (father of George Dantzig, the inventor of linear programming) wrote, “Computations which a child can now perform required then the services of a specialist, and what is now only a matter of a few minutes [by hand] meant in the twelfth century days of elaborate work.” Michel de Montaigne, Mayor of Bordeaux and one of the most learned men of his day, confessed in 1588 (prior to the widespread adoption of decimal arithmetic in Europe) that in spite of his great education and erudition, “I cannot yet cast account either with penne or Counters.” That is, he could not do basic arithmetic.

In a similar vein, at about the same time a wealthy German merchant, consulting a scholar regarding which European university offered the best education for his son, was told the following: “If you only want him to be able to cope with addition and subtraction, then any French or German university will do. But if you are intent on your son going on to multiplication and division, assuming that he has sufficient gifts, then you will have to send him to Italy.”

The best source currently available on the history of our modern number system is by French scholar Georges Ifrah. He chronicles in encyclopedic detail the rise of modern numeration from its roots in primitive hand counting and tally schemes, to the Babylonian, Egyptian, Greek, Roman, Mayan, Indian and Chinese systems, and finally to the eventual discovery of full positional decimal arithmetic with zero in India, and its belated, kicking-and-screaming adoption in the West. Ifrah describes the significance of this discovery in these terms: “Now that we can stand back from the story, the birth of our modern number-system seems a colossal event in the history of humanity, as momentous as the mastery of fire, the development of agriculture, or the invention of writing, of the wheel, or of the steam engine.”

Indeed, the development of this system hinged on three key abstract (and certainly non-intuitive) principles:

(a) The idea of attaching to each basic figure graphical signs which were removed from all intuitive associations, and did not visually evoke the units they represented;

(b) The idea of adopting the principle according to which the basic figures have a value which depends on the position they occupy in the representation of a number; and

(c) The idea of a fully operational zero, filling the empty spaces of missing units and at the same time having the meaning of a null number.
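Principles (b) and (c) can be seen at work in a few lines of R. This is just an illustrative sketch with a number I picked myself: each digit is multiplied by a power of ten given by its position, and the zero holds a place open while contributing nothing to the sum.

```r
# Digits of an example number, most significant digit first.
digits <- c(2, 0, 1, 4)

# Place values by position: 1000, 100, 10, 1.
place <- 10^((length(digits) - 1):0)

# The zero in the hundreds position adds nothing but keeps the 2 in the thousands.
sum(digits * place)   # 2*1000 + 0*100 + 1*10 + 4*1 = 2014
```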

It is astonishing how many years passed before this system finally gained full acceptance in the rest of the world. There are indications that Indian numerals reached southern Europe perhaps as early as 500 CE, but with Europe mired in the Dark Ages, few paid any attention. Similarly, there is mention in Sui Dynasty (581–618 CE) records of Chinese translations of the Brahman Arithmetical Classic, although sadly none of these copies seem to have survived. The Indian system (also known as the Indo-Arabic system) was introduced to Europeans by Gerbert of Aurillac in the tenth century. He traveled to Spain to learn about the system first-hand from Arab scholars, prior to being named Pope Sylvester II in 999 CE. However, the system subsequently encountered stiff resistance, ranging from accountants who did not want their craft rendered obsolete to clerics who were aghast to hear that the Pope had traveled to Islamic lands to study the method. It was widely rumored that he was a sorcerer, and that he had sold his soul to Lucifer during his travels. This accusation persisted until 1648, when papal authorities reopened Sylvester’s tomb to make sure that his body had not been infested by satanic forces.

The Indo-Arabic system was reintroduced to Europe by Leonardo of Pisa, also known as Fibonacci, in his 1202 CE book Liber Abaci. However, usage of the system remained limited for many years, in part because the scheme continued to be considered “diabolical,” due in part to the mistaken impression that it originated in the Arab world (in spite of Fibonacci’s clear descriptions of the “nine Indian figures” plus zero). Decimal arithmetic began to be widely used by scientists beginning in the 1400s, and was employed, for instance, by Copernicus, Galileo, Kepler and Newton, but it was not universally used in European commerce until after the French Revolution in 1793. In limited defense of the Roman system, it is harder to alter Roman entries in an account book or the sum payable in a cheque, but this does not excuse the continuing practice of using Roman numerals and counting tables for arithmetic. The Arabic world, by comparison, was much more accepting of the Indian system; in fact, as mentioned briefly above, the West owes its knowledge of the scheme to Arab scholars.

So who exactly discovered the Indian system? Sadly, there is no record of the individual who first discovered the scheme, who, if known, would surely rank among the greatest mathematicians of all time. The very earliest document clearly exhibiting familiarity with the decimal system is the Indian astronomical work Lokavibhaga (“Parts of the Universe”). Here, for example, the number 13,107,200,000 is written as panchabhyah khalu shunyebhyah param dve sapta chambaram ekam trini cha rupam cha (“five voids, then two and seven, the sky, one and three and the form”), i.e. 0 0 0 0 0 2 7 0 1 3 1, which, when written in reverse order, is 13,107,200,000. One section of this same work gives detailed astronomical observations that confirm to modern scholars that this was written on the date it claimed to be written: 25 August 458 CE (Julian calendar). As Ifrah points out, this information not only allows us to date the document with precision, but it also proves its authenticity. Methods for computation were not explicitly mentioned in this work, although the frequent appearance of large numbers suggests that advanced arithmetic schemes were being used. Fifty-two years later, in 510 CE, the Indian mathematician Aryabhata explicitly described schemes for various arithmetic operations, even including square roots and cube roots, which schemes likely were known earlier than this date. Aryabhata’s actual algorithm for computing square roots is described in greater detail in a 628 CE manuscript by a faithful disciple named Bhaskara I. Additionally, Aryabhata gave a decimal value of π = 3.1416. From these and other sources there can be no doubt that our modern system of arithmetic, differing only in variations on the symbols used for the digits and minor details of computational schemes, originated in India at least by 510 CE and quite possibly by 458 CE.

Adapted from "The Greatest Mathematical Discovery?" by David H. Bailey and Jonathan M. Borwein (May 12, 2010), http://www.davidhbailey.com/dhbpapers/decimal.pdf
