Scorecard : Data Visualization – Banking Case

Scorecard : Data Visualization – Banking Case

标签(空格分隔): Scorecard


Microsoft Excel – use Secondary Axis to Create Two Y Axes

Analytics Lab

Malcolm Gladwell in his book ‘Outliers‘ says that it takes approximately 10,000 hours of training or practice for anybody to master a subject. There is nothing in the world that could beat hands-on practice for learning. Keeping this in mind let me start a new series on YOU CANalytics called ‘Analytics Lab’ where we will practice some of the case studies we have discussed in the past on YOU CANalytics. We will use R and Microsoft Excel (both easily available tools) for this.

It’s a bird…It’s a plane…It’s Excel man! Did I ever tell you what a super hero Excel is. In my opinion Excel is the best tool to start learning advanced analytics! Today we will start the ‘Analytics Lab’ using concepts we have discussed in the banking case study. This time we will create a two Y axes plot by using the secondary axis function in Excel. In the next article we will create the same plot on R.

Banking Case

Recall the banking case: in that you were playing the role of the Chief Risk Officer (CRO) for CyndiCat bank. The bank had disbursed 60816 auto loans in the quarter between April–June 2012. Additionally, you had noticed around 2.5% of overall bad rate for these loans. The idea was to identify customer segments with distinct bad rates. Bad rate, by the way, is percentage of customer defaulted on their payments. You did some exploratory data analysis (EDA) using tools of data visualization and found a relationship between age with bad rates (Part 1). If you recall, you have observed the following age groups wise population histogram and bad rate trend.

Presentation1
Presentation1

Using Secondary Axis to Create Two Y Axes in Excel

A good practice is to present these two graphs in a single chart. This is useful because you and your audience don’t need to jump from one graph to another during analysis. Let me present the data we are going to use for our purpose. I recommend you copy past this data on Excel and evaluate for yourself the merits of a single chart in comparison to two.

Age Groups Number of Loans Bad Loans Good Loans Bad Rate
21-24 310 14 296 4.5%
24-27 511 20 491 3.9%
27-30 4000 172 3828 4.3%
30-33 4568 169 4399 3.7%
33-36 5698 188 5510 3.3%
36-39 8209 197 8012 2.4%
39-42 8117 211 7906 2.6%
42-45 9000 216 8784 2.4%
45-48 7600 152 7448 2.0%
48-51 6000 84 5916 1.4%
51-54 4000 64 3936 1.6%
54-57 2000 26 1974 1.3%
57-60 788 9 779 1.1%

The last column i.e. ‘Bad Rate’ is the percentage of ‘Bad Loans’ to ‘Number of Loans’ i.e. for the age group 21-24 it is 14/310=4.5%.

Slide1
Slide1

The first thing you need to do is create a bar plot of the first two columns i.e. ‘Age Groups’ and ‘Number of Loans’. Use Column in Charts within Insert tab (as shown above) to create a vertical bar plot. The default excel bar plot will look different. I have done some additional formatting for the bar plot to make it appropriate for my purpose. You could leave the default excel format for most part. The only thing you want to change is the color of the bars from default excel blue to something mellow like grey.

Slide3
Slide3

The next step is to add the trend line for the last column i.e. ‘Bad Rate’ on top of the above chart. For this, right click you mouse anywhere on top of the above bar chart in Excel and then choose Select Data… You will notice a pop up window similar to the one shown below

Slide4
Slide4

In the above pop-up window click on Add below Legend Entries (Series). You will get a new pop-up window similar to the one shown below. For series name choose the cell $E$1 (the cell that contain the name ‘Bad Loans’) and for series values select cells $E$2 to $E$14 i.e. the values of ‘Bad Rate’ for all the ‘Age Groups’.

Slide5
Slide5

Once you will click OK in the above Edit Series pop-up window, you will get additional bars for ‘Bad Loans’. Notice, these bars are really tiny in comparison to ‘Number of Loans’ bars.

![Presentation4(https://i1.wp.com/ucanalytics.com/blogs/wp-content/uploads/2014/05/Presentation4.jpg)

Select any of these tiny bars for ‘Bad Rate’ by a mouse click. Then right click your mouse and choose Format Data Series… You will see a pop-up window similar to the one shown below.

Slide6
Slide6

In the Plot Series On section choose Secondary Axis and press Close in the pop-up window. You will get a bar plot similar to the one shown below. Notice the secondary Y Axis at the right side of the plot with ‘Bad Rate’. Remember, I have re-formatted the default Excel plot so your plot will have different appearance. Now we are left with just one final thing.

Presentation2
Presentation2

We need to now convert the above orange bars for ‘Bad Rate’ to a line plot. This is really simple, click on the orange ‘Bad Rate’ bars and convert them to line by choosing **Line ** in Charts from Insert tab

Slide2
Slide2
Presentation3
Presentation3

Great! We are done. You might appreciate how useful it is to present these two plots in one single chart with 2 Y axes or the secondary axis in Excel. In the above chart, it is much easier to see where you have thinner data and modify your bad rates accordingly. Now, you could easily create coarse classes as discussed in the banking case study.

2 Axes Risk Plots in R

Analytics Lab – R

Welcome back to Analytics Lab on YOU CANalytics! In our last article, we learned the procedure to visualize risk across a parameter (age groups) in Excel. In this part we will generate the same visualization in R. This lab exercise is a part of the banking case study we have previously worked on (you will find links to the banking case series at the bottom of this article).

Banking Case Study

Recall the banking case: in that you were playing the role of the Chief Risk Officer (CRO) for CyndiCat bank. The bank had disbursed 60816 auto loans in the quarter between April–June 2012. Additionally, you had noticed around 2.5% of overall bad rate for these loans. The idea was to identify customer segments with distinct bad rates. Bad rate, by the way, is percentage of customer defaulted on their payments. You did some exploratory data analysis (EDA) using tools of data visualization and found a relationship between age with bad rates Part 1. If you recall, in the previous article we have generated the following plot on Excel. The plot collectively depicts age groups wise population distribution and bad rate trend. As mentioned above this time we will create a similar plot on R.

Presentation3
Presentation3

Pep Talk before Jumping into R

If this is the first time you are using R & R studio then don’t get intimidated with this brilliant tool for analysis – you will love it once you get familiar with it. R is a language and computing environment for statistics, analysis, data mining, and graphics. If you have used Excel functions (such as =vlookup(), =sum() etc.) then you will find R commands / functions somewhat similar.

R & R Studio

First of all (if you don’t have R and R Studio), you need to download R (link) and R Studio (link) from the given links and install them on your computer. By the way R Studio is an excellent free editor to work with R language. You will find good documentation for R Studio at this link. Additionally you will need the following CVS (data) file to work on this example.

click on the link to download the file

Moreover, you can find the entire R code in the following text file. But wait and read on before you check out this text file.

data<-read.csv("YOU CANalytics R Visualization.csv") #change the directory name#
x <- data$Age.Group
y1 <- data$Number.of.Loans
y2 <- data$Bad.Rate
par(mar=c(5,5,2,5)+.1)
barplot(y1,col="grey",border=0,names.arg=x,angle=45,xlab="Age Groups",ylab="Number of Loans",cex.lab=1.7, cex.main=1.7, cex.sub=1.7,cex.axis=1.2,cex.names=1.2)
par(new=TRUE)
plot(y2,type="l",col="Orange",lwd=5,xlab="",ylab="",xaxt="n",yaxt="n")
axis(4,cex.axis=1.2)
mtext("Bad Rate",side=4,line=3,cex=1.7)
legend("topright",col=c("orange"),lty=1,lwd=5,legend=c("Bad Rate"),cex=1.25)

R Coding

I recommend that you copy and paste the individual lines of code given below in R Studio console (located at the bottom left of R Studio panel as highlighted in the below picture), and execute them one-by-one. In R (unlike C, Java, and other similar languages) one doesn’t need to compile the entire code to produce results – each command in R is executed independently.

R Studio
R Studio

The first line of the code is importing or reading the CVS file in the R environment. You will need to modify the path of the CVS file based on the location of the file on your computer. The extracted CVS file is named ‘data’ in the R environment.

data<- read.csv("YOU CANalytics Visualization R.csv")

Now in the next three lines of code we are going to assign variable names x, y1 and y2 to ‘Age Groups’, ‘Number of Loans’, and ‘Bad Rate’. These three variables are present in the ‘data’ file. Also, you could view this data in R using the command View(data)

x <- data$Age.Group
y1 <- data$Number.of.Loans
y2 <- data$Bad.Rate

‘Par’ function in R is used to realign plots. In this case mar operator is used to realign margins in the subsequent plot(s). You should play around with the numbers in the function to see the change in the plot, although you will see the impact of this command after you plot the bar plot through the next line of code. By changing the values inside ‘c()’ plots will move according to c(bottom, left, top, right) on the graph canvas.

par(mar = c(5,5,2,5) +.1)

Now, we will plot our first bar plot using barplot function. Notice you could create a bare bones bar plot using this command barplot(y1, names.arg=x). The other arguments within the barplot function in the following command are used to produce cosmetic effects in the plot. I recommend that you play around with the other arguments in the barplot function.

barplot(y1, col="grey", border=0, names.arg=x, angle=45, xlab="Age Groups", ylab="Number of Loans", cex.lab=1.7, cex.main=1.7, cex.sub=1.7, cex.axis=1.2, cex.names=1.2)

The following plot is generated using the above command.

Rplot
Rplot

Again we will use ‘par’ function to overlay a line plot on top of the bar plot. We will use the following command:

par(new=TRUE)

Now, we will draw a line plot using the following command. This new plot will be overlaid on top of the previous bar plot because of the above par statement. In the plot command type=”l” argument tells R to draw a line graph, with orange color (col=”Orange”) and thicker line width (lwd=5).

plot(y2, type="l", col="Orange", lwd=5, xlab="", ylab="", xaxt="n", yaxt="n")

The following plot will be generated using the above set of commands.

Rplot01
Rplot01

Penultimately, we need to format the new Y axis and label it “Bad Rate”. This is precisely what we are doing through the following two commands.

axis(4, cex.axis=1.2)
mtext("Bad Rate",side=4, line=3, cex=1.7)

The result of the above two commands is labels for the Y axis for ‘Bad Rate’on the right as shown below.

Rplot03
Rplot03

Ultimately, we will place a legend on the top right corner of the graph to make it more friendly to read. We will do it using the legend function as shown below.

legend("topright", col=c("orange"), lty=1, lwd=5, legend= c("Bad Rate"), cex=1.25)

This has completed our exercise and we have the final plot that we were looking for.

Rplot04
Rplot04
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 200,302评论 5 470
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 84,232评论 2 377
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 147,337评论 0 332
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 53,977评论 1 272
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 62,920评论 5 360
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,194评论 1 277
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,638评论 3 390
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,319评论 0 254
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,455评论 1 294
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,379评论 2 317
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,426评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,106评论 3 315
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,696评论 3 303
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,786评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,996评论 1 255
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,467评论 2 346
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,043评论 2 341

推荐阅读更多精彩内容

  • **2014真题Directions:Read the following text. Choose the be...
    又是夜半惊坐起阅读 9,298评论 0 23
  • 今天是中秋,不想读尼采,感觉自己今天读的书不符合自己的主题,但是今天过节不是么?当作是自己读书的瓶颈挑战吧(偶像的...
    白小二阅读 174评论 0 0
  • 曾经在Dreamore上有过这样一个项目,一个用手机记录生活的影展,《飘流手机:168小时生活影展》,发起人是西芹...
    Dreamore阅读 270评论 0 2
  • 上一章:小小二三事(六) 目录 当时说不感动也是假的,没想到雷公电母还能做到如此地步。苏小小打开门邀请雷公电母进小...
    蜜豆喵阅读 400评论 0 1
  • 建议学习时长:60分钟学习方式:了解 学习目标 了解软件生产的各个周期。 了解软件生产中常用的工具。 详细介绍 什...
    知行社阅读 1,526评论 0 4