Data Management and Visualization - Module 3
Leave out missing data from the evaluation:
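The original code for this step is not shown. Below is a minimal pandas sketch of one common way to drop missing values. It assumes the data sits in a pandas DataFrame and that empty cells arrive as blank strings. The country names and numbers are hypothetical toy data, not Gapminder values.

```python
import pandas as pd

# Hypothetical toy data; the blank string mimics an empty cell in the CSV.
data = pd.DataFrame({
    "country": ["Aland", "Borduria", "Carpania", "Dunland"],
    "incomeperperson": ["12000.5", " ", "4300", "7600"],
})

# Convert to numbers; anything that cannot be parsed (e.g. a blank) becomes NaN.
data["incomeperperson"] = pd.to_numeric(data["incomeperperson"], errors="coerce")

# Keep only the rows where an income value is present.
clean = data.dropna(subset=["incomeperperson"])
print(clean)
```

`errors="coerce"` is what turns unparsable cells into NaN, so that `dropna` can find and remove them.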
I've created a new column, 'RankOfIncome', to group the values of 'incomeperperson' into ranges:
Ranks:
A: incomeperperson > 10000
B: incomeperperson in 5000 - 10000
C: incomeperperson < 5000
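The ranking above could be expressed as a small row-wise function. This is a sketch assuming pandas, not the author's code. The boundary handling is also an assumption: both 5,000 and 10,000 are treated as part of B, because the source gives that range as "5000 - 10000". Missing values stay NaN so that they can be handled separately.

```python
import numpy as np
import pandas as pd

def rank_income(value):
    """Map an income value to 'A', 'B' or 'C'; NaN is passed through unchanged."""
    if pd.isna(value):
        return np.nan
    if value > 10000:
        return "A"
    if value >= 5000:  # assumption: 5000 and 10000 both count as B
        return "B"
    return "C"

# Hypothetical income values, including one missing entry.
income = pd.Series([12000.5, 4300.0, 7600.0, 10000.0, np.nan], name="incomeperperson")
ranks = income.apply(rank_income)
print(ranks.tolist())
```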
Replace empty cells with a placeholder category (NaN -> 0):
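In pandas, this replacement could be done with `fillna`. The short sketch below uses a hypothetical rank column. It assumes the placeholder is the integer 0, as the heading suggests, so the column ends up holding mixed strings and integers.

```python
import numpy as np
import pandas as pd

# Hypothetical rank column with one missing category.
ranks = pd.Series(["A", np.nan, "C", "B"], name="RankOfIncome")

# Replace the missing category with the placeholder 0.
ranks_filled = ranks.fillna(0)
print(ranks_filled.tolist())  # ['A', 0, 'C', 'B']
```

A string placeholder such as "0" would keep the column homogeneous. Either choice works for counting.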
Frequency distribution (counts and percentages) of categories A, B and C:
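Counts and percentages per category can be computed with `value_counts`. This is a minimal pandas sketch, and the five labels below are hypothetical.

```python
import pandas as pd

# Hypothetical category labels for five observations.
ranks = pd.Series(["A", "C", "B", "C", "C"], name="RankOfIncome")

counts = ranks.value_counts()                       # absolute frequencies
percent = ranks.value_counts(normalize=True) * 100  # relative frequencies in %

summary = pd.DataFrame({"count": counts, "percent": percent})
print(summary)
```

`value_counts` ignores NaN by default (`dropna=True`). Replacing missing cells with a placeholder beforehand is what makes them appear as their own category in this table.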
Recoding C->A and A->C into a new column, 'RankOfIncomeReversed':
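One way to swap two labels without touching the others is a lookup that falls back to the original value. This is a pandas sketch on hypothetical data, not the author's code.

```python
import pandas as pd

# Hypothetical ranks, including the placeholder 0 for a missing value.
data = pd.DataFrame({"RankOfIncome": ["A", "B", "C", "C", 0]})

# Swap A and C; any other value (B, the placeholder 0) is kept unchanged.
swap = {"A": "C", "C": "A"}
data["RankOfIncomeReversed"] = data["RankOfIncome"].map(lambda v: swap.get(v, v))
print(data)
```

Passing the dictionary directly to `map` would turn every unmapped value (here B and 0) into NaN. The `swap.get(v, v)` fallback avoids that.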
Frequency distribution of the reversed categories (A and C swapped):
I performed the same evaluation for 'urbanrate', using the following categories:
urbanrate > 66% : 'A'
urbanrate in 33-66% : 'B'
urbanrate < 33% : 'C'
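Because urbanrate uses the same three-band scheme with different thresholds, a reusable helper that takes the thresholds as arguments could be used. This is a sketch assuming pandas and percentages stored as plain numbers (e.g. 40.5 for 40.5%). The helper name `three_band` is hypothetical. As with income, the boundary values (33 and 66) are assumed to belong to B.

```python
import numpy as np
import pandas as pd

def three_band(value, low, high):
    """Hypothetical helper: 'A' above high, 'B' within [low, high], 'C' below low; keeps NaN."""
    if pd.isna(value):
        return np.nan
    if value > high:
        return "A"
    if value >= low:
        return "B"
    return "C"

# Hypothetical urban rates in percent, including one missing entry.
urban = pd.Series([85.0, 40.5, 12.3, np.nan], name="urbanrate")
urban_rank = urban.apply(three_band, args=(33, 66))
print(urban_rank.tolist())
```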
As a third variable, I performed the same evaluation for 'lifeexpectancy', using the following categories:
lifeexpectancy > 75 : 'A'
lifeexpectancy in 60-75 : 'B'
lifeexpectancy < 60 : 'C'
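A vectorized alternative to a row-wise function is `numpy.select`, which checks the conditions in order and takes the first match. This sketch assumes pandas and NumPy and uses hypothetical life-expectancy values. Treating 60 and 75 as part of B is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical life-expectancy values in years; NaN marks a missing entry.
life = pd.Series([81.2, 68.0, 52.4, 75.0, np.nan], name="lifeexpectancy")

# First matching condition wins. NaN fails every comparison, so on its own
# it would fall through to the default "C"...
labels = np.select([life > 75, life >= 60], ["A", "B"], default="C")

# ...which is why missing inputs are masked back to NaN afterwards.
life_rank = pd.Series(labels, index=life.index).where(life.notna())
print(life_rank.tolist())
```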
Summary:
I evaluated income per person and ranked it into three categories: A for more than 10,000, B for between 5,000 and 10,000, and C for less than 5,000. I created similar categories for urbanrate and lifeexpectancy, as described above.
With these variables in place, I counted the number of observations falling into each category and calculated their percentages.
Next, following the sample, I reversed the order of the income categories by swapping A and C, so that in the new column A denotes the lowest income category and C the highest. Afterwards, I calculated the counts and percentages for the reversed categories.