Array Formulas in Excel
Aug. 14th, 2009 10:56 am![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
Array Formulas in Microsoft Excel
These things are wonderful.
Harry, Ron and Hermione have received various amounts of Sickles, and these are recorded in a spreadsheet thusly:
A | B | |
1 | Ron | 13 |
2 | Ron | 2 |
3 | Harry | 88 |
4 | Harry | 67 |
5 | Hermione | 76 |
6 | Hermione | 28 |
7 | Harry | 3 |
8 | Hermione | 78 |
9 | Harry | 5 |
10 | Ron | 42 |
First off, how many entries (not Sickles) has Ron got in our list?
Find a spare cell in the spreadsheet. For reasons that will become clear later on, I'm going to semi-randomly *cough* pick cell F3.
Now, the IF statement in Excel works like this: =IF(test, value if test is true, value if test is false).
So, in cell F3, we enter the following BUT DO NOT PRESS ENTER:
=SUM(IF($a$1:$a$10="Ron",1,0))
To enter this formula, press CTRL+SHIFT+ENTER. The code WILL NOT WORK if you just press 'Enter' as you might usually do.
What this expression does is it walks down the spreadsheet, from cell A1 to cell A10 (the dollars are absolute cell references, whose utility will become apparent later on), and for each cell, it tests whether the value contained therein is "Ron".
If the value is "Ron", then it takes the 'value if test is true' part of the IF expression, and adds it to its running total (the SUM bit of the formula). If the value of the cell is something different to "Ron", then it adds the 'value if test is false' to the running total.
In the above example, if the value is true, it adds 1 to the sum, if the value is false, it adds zero (in other words, leaves the SUM unchanged).
So, this means that for EACH occurrence of "Ron" in the cells from A1 to A10, the formula is going to add 1, and thus we get a COUNT of how many times "Ron" occurs.
Now, we're going to modify the formula, so that instead of merely COUNTING Ron's entries, we're going to tally his Sickles...
So this time, we're going to EDIT cell F3 and change it from
=SUM(IF($a$1:$a$10="Ron",1,0))
to
=SUM(IF($a$1:$a$10="Ron",$b$1:$b$10,0))
Remember: CTRL+SHIFT+ENTER
Just the one change, changing the 'value if test is true' part of the IF statement, so that instead of adding '1' to our SUM, as we walk down the column of cells, we're looking along the corresponding row and adding the number of Sickles to our running total.
It is VITAL that your cell ranges, $a$1:$a$10 and $b$1:$b$10 MATCH UP. They need to be the same 'length' of cells (in this case, 10 cells), and Excel will walk through the formula, taking the first cell in the 'A' range, and adding the first cell in the 'B' range's values as appropriate. If you have different numbers of cells in your 'test' (A) and 'value' (B) columns, then Excel's going to complain that it doesn't know what to do with the formula once it runs out of cells to match.
Trust me, although you CAN have offset ranges (comparing, say, A1:a10 to B21:B30), life is much easier if you keep your rows aligned (so both ranges start and end on the same row).
In summary, an Array Formula works its way through its cell range, performing its calculation on a Row by Row basis.
So now, in a single formula, we've scanned down a whole column of values, and summed JUST THOSE THAT PERTAIN to Ron.
But that's not ALL!
We started in Cell F3. Now go to the cell above, F2, and enter Ron, so that it matches our star Gryffindor's name as it appears in column A. And then, in G2 add Hermione, and in H2 add Harry, so that we have the glorious Trio in adjacent columns in row 2.
Now we edit our formula in F3 again, and change it from
=SUM(IF($a$1:$a$10="Ron",$b$1:$b$10,0))
to
=SUM(IF($a$1:$a$10=F$2,$b$1:$b$10,0))
Remember: CTRL+SHIFT+ENTER
IMPORTANT: note that it's F$2: there's only one $ reference, fixing the Row at 2, but leaving the column free to wander.
Now, drag the formula (the little black handle at the bottom right hand corner of cell F3) ACROSS to cover cells G3 and H3, the cells immediately below Hermione and Harry's monikers.
And lo! Verily doth the formula copy itself across, adjusting to pick up the Sickle totals for Hermione and Harry without you having to change a thing!!!.
If that isn't cause for w00t, then perhaps the next step will be.
Let's go back to our original data, and add a third column, House, putting the values in column C.
A | B | C | |
1 | Ron | 13 | Slytherin |
2 | Ron | 2 | Slytherin |
3 | Harry | 88 | Hufflepuff |
4 | Harry | 67 | Ravenclaw |
5 | Hermione | 76 | Ravenclaw |
6 | Hermione | 28 | Gryffindor |
7 | Harry | 3 | Hufflepuff |
8 | Hermione | 78 | Slytherin |
9 | Harry | 5 | Ravenclaw |
10 | Ron | 42 | Gryffindor |
We can nest IF statements. That is, put one IF inside another, like this:
=IF(test1, IF(test2, value_if_test2_is_true, value_if_test2_is_false), value_if_test1_is_false)
In the above, what's happening is that the whole red block of code is ONLY executed if test1 is true, because if test1 isn't true, then the outer IF statement will immediately jump to the 'value_if_test1_is_false' bit, and ignore test2 entirely.
So, to COUNT how many times Ron got Sickles from Slytherin, we'd need something like this:
=SUM(IF($a$1:$a$10="Ron", IF($c$1:$c$10="Slytherin",1,0),0))
Remember: CTRL+SHIFT+ENTER
In the above, each time a ROW has a "Ron" in column A and a "Slytherin" in column C, it's going to add 1 to its running total, or zero otherwise. If column A isn't Ron, it simply adds the zero that's at the end of the expression. If column A is Ron, then we go into the red part of the formula, and work out what to do next.
If the value in column C is Slytherin, then we add 1, otherwise, even though we've got a Ron in column A, because we don't have a match for our second, Slytherin condition, we add zero, or, effectively, the row gets ignored in our count.
Naturally, we can modify the formula to SUM the Sickles:
=SUM(IF($a$1:$a$10="Ron", IF($c$1:$c$10="Slytherin",$b$1:$b$10,0),0))
Remember: CTRL+SHIFT+ENTER
Just like when we were only testing for "Ron", instead of adding '1' for each time we get a match, we're now telling Excel to add the value it finds in column B to our running total if we have a match, or to 'add zero' (do nothing).
Now, we've been entering this forumla into cell F3. Don't paste it across Hermione and Harry's cells just yet, because it only references Slytherin.
Working downwards, in cells E3, E4, E5 and E6, put in the Houses in whatever order you want - make sure, obviously, that the spelling matches the way they've been entered in Column C, since we're going to be matching these in our formula:
Now edit F3 again:
=SUM(IF($a$1:$a$10="Ron", IF($c$1:$c$10=$e3,$b$1:$b$10,0),0))
Remember: CTRL+SHIFT+ENTER
Note that, again, we're mixing relative and absolute cell references, this time the $ is fixing the COLUMN, column E, but leaving the Row (row 3) free to change as we use auto fill.
NOW take Ron's formula in F3, and drag down through columns F4, F5 and F6. You'll see now we have a summary of Ron's Sickle takings, by House.
With all four cells selected, F3, F4, F5 and F6, drag the autofill handle across to the right to cover Harry and Hermione's cells too, to get the Trio's collections sorted by person and by House.
All from entering a single formula once!
ps: You don't need to just use 'SUM', you can use 'MIN', 'MAX', 'AVERAGE' as well - and probably other functions, though those I'm not sure about.
no subject
Date: 2009-08-14 12:58 pm (UTC)And for a moment, I thought you had a spycam in my office as I have spent this morning wrestling with arrays. But I'm trying to deal with arrays in VBA, so slightly different. But still coincidental!
no subject
Date: 2009-08-14 02:00 pm (UTC)Of course, here we're letting Excel handle the arrays, whereas in VBA I'm assuming that you're dealing with them directly.
Hope you're having fun in VBA land :-)
no subject
Date: 2009-08-14 02:17 pm (UTC)no subject
Date: 2009-08-14 02:28 pm (UTC)no subject
Date: 2009-08-14 02:33 pm (UTC)no subject
Date: 2009-08-14 02:35 pm (UTC)Anyway, the power behind Array Formulas, apart from the fact that you don't need a new sheet, is that you can then reference their results in other formulas and stuff. Plus you can nest and nest and nest, and do all sorts of tricksy things.
I'm not pretending that there aren't better ways to do things, mind... just that This Works For Me!
no subject
Date: 2009-08-14 02:42 pm (UTC)Actually I think they use array formulas... couldn't say, really, I'm no excel expert. (mostly because I rarely use formulas since I don't need them with those magic table things).
I think you don't need to create a new sheet if you prefer, you can keep it all on the same (it's much like graphs, I prefer to have them on separate sheets but I know other people don't).
no subject
Date: 2009-08-14 02:46 pm (UTC)I just like Array Formulas because you can see what's actually happening... although they can get... verbose when you get enthusiastic levels of nesting!
no subject
Date: 2009-08-14 02:58 pm (UTC)Do you have a special technique to type your formulas? I admire people who use formulas in excel without getting frustrated because I don't see anything in the formula bar and I keep forgetting brackets and stuff. Ah, gotta love SAS :D
no subject
Date: 2009-08-14 03:04 pm (UTC)Yeah, you can 'only' nest IF statements seven times, I think, but that's never been an issue for me. Yet.
And if I found myself needing that amount of differentiation, I'd write something in C#, probably...
no subject
Date: 2009-08-14 06:40 pm (UTC)=IF(AND(test1, test2, ..., testx), value_if_test_is_true, value_if_test_is_false) :o)
no subject
Date: 2009-08-14 08:10 pm (UTC)I don't use Excel much at all, but will try and get a handle on AND, because that could be extremely useful! :D
no subject
Date: 2009-08-14 06:18 pm (UTC)But you probably know that if you have only one condition like in the beginning, the functions COUNTIF and SUMIF do this without the need for array magic :o) (And more than one condition could be evaluated in an additional cell first and then used with COUNTIF/SUMIF)
Especially when you do this for a large amount of data, array formulas tend to get slow (they're loops, after all), so you might want to look for alternatives sometimes...
But of course this is not the point of this example :o)
no subject
Date: 2009-08-14 06:35 pm (UTC)So this counts the occurences of Ron getting money from Slytherin:
=SUMPRODUCT((A1:A10="Ron")*(C1:C10="Slytherin"))
And this would give the amount of money he got from there:
=SUMPRODUCT((A1:A10="Ron")*(C1:C10="Slytherin")*(B1:B10))
*geek*
no subject
Date: 2009-08-14 08:11 pm (UTC)Thanks! :D
no subject
Date: 2009-08-14 08:13 pm (UTC)And you can embed them in other sums and all the rest. The really magic bit is carefully constructing your key (top left cell) formula, and then auto-filling to complete the matrix.
however, I'm definitely going to have to investigate COUNTIF and SUMIF now, because they look pretty darned useful too.
* especially, it seems, when it's me doing the doing!
more complex logical tests?
Date: 2011-06-06 05:28 pm (UTC)=SUM(IF("599"<$A$27:$A$615<"651",1,0))
This logical test works against a single value, but not against an array. Any ideas if the problem is something silly, like formatting, or is this a limitation to the array formula itself? Thanks for your help!!
Re: more complex logical tests?
Date: 2011-06-08 03:54 pm (UTC)=SUM(IF($2$27:$A$615 > "599", IF($a$27:$a$615 < "651",1,0),0))
nesting the conditions means that if the first condition fails, the formula returns zero. If the first condition is met, the formula then goes on to evaluate the inner IF statement, which checks the other half of the inequality.