MSExcel, despite its bad name for data analysis, can still be a very good tool to reach out to for one-off analysis jobs, especially when the output needs to be interactive. After all, nothing comes close to the reactivity – as in reactive programming – simple MSExcel formulae provide that do not require hardcore development skills.1
Getting our hands dirty
Let’s say the requirement is to rank the customers’ YTD Trade Count and MTD Trade Count. One method is to do that by hand:
A | B | C | |
---|---|---|---|
1 | Customer | YTD Trade Count | MTD Trade Count |
2 | ABC | 190 | 7 |
3 | DEF | 288 | 38 |
4 | GHI | 69 | 32 |
5 | JKL | 168 | 38 |
6 | |||
7 | |||
8 | |||
9 | Customer | YTD Rank | MTD Rank |
10 | ABC | 2nd | 4th |
10 | DEF | 1st | 1st |
11 | GHI | 4th | 3rd |
12 | JKL | 3rd | 1st |
Yeah, it works! But if the number of customers is in the hundreds, doing it by hand is a huge waste of time. We can do better than that.
MSExcel has one handy function LARGE
2 which the
official documentation says:
Returns the k-th largest value in a data set. You can use this function to select a value based on its relative standing. For example, you can use LARGE to return the highest, runner-up, or third-place score.
and its syntax:
LARGE(array, k)
The LARGE function syntax has the following arguments:
- Array Required. The array or range of data for which you want to determine the k-th largest value.
- K Required. The position (from the largest) in the array or cell range of data to return.
Great! Type out the formula =LARGE(B$2:B$5,ROW()-9)
, copy and paste it
across the range B10:C13
, and we’ll get:
A | B | C | |
---|---|---|---|
1 | Customer | YTD Trade Count | MTD Trade Count |
2 | ABC | 190 | 7 |
3 | DEF | 288 | 38 |
4 | GHI | 69 | 32 |
5 | JKL | 168 | 38 |
6 | |||
7 | |||
8 | |||
9 | Customer | YTD Rank | MTD Rank |
10 | ABC | 2nd | 4th |
10 | DEF | 1st | 1st |
11 | GHI | 4th | 3rd |
12 | JKL | 3rd | 1st |
13 | |||
14 | |||
15 | |||
16 | YTD Ordered | MTD Ordered | |
17 | =LARGE(B$2:B$5,ROW()-16) | =LARGE(C$2:C$5,ROW()-16) | |
18 | =LARGE(B$2:B$5,ROW()-16) | =LARGE(C$2:C$5,ROW()-16) | |
19 | =LARGE(B$2:B$5,ROW()-16) | =LARGE(C$2:C$5,ROW()-16) | |
20 | =LARGE(B$2:B$5,ROW()-16) | =LARGE(C$2:C$5,ROW()-16) |
And that yields the following:
A | B | C | |
---|---|---|---|
1 | Customer | YTD Trade Count | MTD Trade Count |
2 | ABC | 190 | 7 |
3 | DEF | 288 | 38 |
4 | GHI | 69 | 32 |
5 | JKL | 168 | 38 |
… | … | … | |
16 | YTD Ordered | MTD Ordered | |
17 | 288 | 38 | |
18 | 190 | 38 | |
19 | 168 | 32 | |
20 | 69 | 7 |
ROW()
returns the row number of the cell the function is applied to,
thus ROW()-16
at B17
signals to LARGE
to return the first largest value;
at B18
to return second largest…
Although that simply sorts the two columns in descending order, it beats the manual steps in the following ways:
- Typing out the formula, copy and paste takes only 2 steps, but manual
sorting takes 4:
- Copy values from
B2:B5
and paste them atB17:B20
- Sort values in
B17:B20
- Copy values from
C2:C5
and paste them atC17:C20
- Sort values in
C17:C20
- Copy values from
- MSExcel automatically apply the ordering when any values in
B2:C5
change
Type out the rank at column D (actually, for ranks 4th and beyond, using
the formula =ROW()-15&"th"
can save some typing with some copy & paste
laziness):
A | B | C | D | |
---|---|---|---|---|
16 | YTD Ordered | MTD Ordered | Rank | |
17 | 288 | 38 | 1st | |
18 | 190 | 38 | 2nd | |
19 | 168 | 32 | 3rd | |
20 | 69 | 7 | 4th |
Replace the hand-coded rank at cell B10
with the formula
=VLOOKUP(B2,B$17:$D$17,5-COLUMN(),FALSE)
, then copy and paste the cell across
B10:C12
yields the same result to the do-by-hand version but with an
important advantage: any changes to the values in B2:C5
automatically
refresh the ranks.
COLUMN
returns the column number the function is applied to.
The 5-COLUMN()
fragment in the VLOOKUP
formula
is to determine automatically the lookup index. For example, at cell B10
,
5-COLUMN()
resolves to 3, at cell C10 resolves to 2.
Wrapping up
With no coding, some creative use of MSExcel functions, and lots of copy-and-paste, MSExcel’s efficiency is unbeatable. Yes, using pandas or R makes the work reproducible, but sometimes not everything is a nail. MSExcel has its place in data analysis; knowing what tool to use appropriately is far more important and flexing the scripting muscles.