Friday, December 23, 2011

W. Edwards Deming and American mathematics education

Just in case you forgot, I'm still here, gentle readers.  In the finest traditions of hyperbolic understatement, I will say that this just-completed Fall semester was a little busy.  The good news is that I have returned, and the world won't end for at least another week if you believe those Mesoamerican calendar enthusiasts.

The good news

As it turns out, I have managed to spend a good bit of time reflecting on the educational process over the last semester-- just not blogging about it until today. Rather than keep you in suspense-- because I know you could be watching YouTube videos of cats in amusing situations if you weren't reading this-- I will cut to the chase.

Fortunately, I have figured out what the problem with the American mathematics education system is.  In fairness, I should point out that I am not unique in this.  Or even ahead of the curve, actually.  To wit, have a look at the following TEDx talk by Gary Stager.

Additionally, hop on over to Generation YES and have a gander at Sylvia's thoughts on the Khan Academy.

So, what's the deal?

Simply put, the problem that we have with American mathematics is this.  From my bridge, I see basically two reasons that motivate people to learn about any particular thing.  The first, and most natural, is that the learning process is exciting and enjoyable.  Ever wonder why teenagers who won't learn to factor trinomials can effortlessly survive more than 20 waves of a zombie attack?

I have played Nazi Zombies into the 30s, and I can tell you that factoring trinomials is much easier.  So how do guys like Syndicate get good at it?  The same way you get to Carnegie Hall.  They practice.  A lot.  Jazz musicians call it the woodshed-- supposedly named after the location of Charlie Parker's intensive practice sessions.  So, here's the big question.  Why do they practice so much?

The obvious, yet often overlooked answer is because it's fun.  People who obsessively practice their craft do so partially because they want to get better, but also because the process of getting better is fun and exciting to them.  You tell me, is the following fun and exciting?

Notice a few things about the preceding video.  First, Mr. Jones does an excellent job of presenting his example in a clear, concise, and easily understood manner. Even though voluntary response sampling is rather suspect, the comments below the video certainly don't contradict this conclusion.

Second, notice that the example is presented almost entirely out of context.  We can reasonably assume that Mr. Jones has provided some of that either in class or other videos, but there really isn't anything motivating the mechanics of this factoring technique other than an amorphous desire to find the monomial factors of the indicated polynomial.

And that brings me to the second thing that motivates people to learn something. What will I be able to do with this?  Here's one of my favorite examples.

One of the nice things about this presentation is that Sean purposely avoids the gory details of data mining and exponential/logarithmic regression.  This is a good decision on his part for multiple reasons.  First, it would take way too long to provide enough detail to mean anything to the audience.  Second, these kinds of details are really boring.  They would completely cut the legs out from under his presentation.  The only purpose those details would serve would be to convince the audience that it's a complicated process.  Guess what.  They can already tell.

The problem with mathematics education

The problem I'm dancing around is that our educational system completely ignores the two things that would motivate students to learn mathematics.  Our American teacher-centered, didactic-ueber-alles approach designed for the sole purpose of appeasing the Gods of Standardized Testing is not only educationally unsound, it also gives mathematics a bad name.

But how did this happen?  Simple.  We allowed it to happen.  The more proper question is why did this happen?  Unfortunately, the answer to that question is also simple.  We are lazy.  It is easier to assess whether students can factor trinomials than to assess how well they can creatively solve real world problems that involve the solution of a second-degree polynomial equation.  Perhaps more importantly, it is much easier to assess such things in a standardized testing environment.

Our misplaced focus

According to Don Small, writer of one of the more highly regarded reform textbooks for college algebra, the Problem-Solving/Modeling Process consists of three steps illustrated like so.

The three steps are model creation, analysis, and interpretation.  In Sean Gourley's TED talk, he focused on model creation and interpretation.  And these are the exciting steps.  This is the sort of thing that causes people to say, "That's why we learned this?  Awesome!"

But we have chosen to focus on the analysis step in our classrooms.  While this is an important step, we are ignoring both steps that involve interfacing with the real world.  The result is all over our contemporary mathematics classrooms.  Ask any high school math student to come up with a real world example where he or she would need to use algebra to solve a problem, and watch the blank stare you get in return.

And the worst part is that the analysis step is the part that people are worst at. Consider the following Wolfram Alpha widget created by yours truly.

If you have a job factoring polynomials, guess what.  You've just been replaced by a computer.  But guess what else.  Nobody has a job factoring polynomials.  In reality, factoring polynomials is a small part even of an algebra teacher's job.  Similarly, it's a small part of the analysis step of any real life problem involving a polynomial function.  And yet, we act as though this skill is the linchpin of higher mathematics.
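To make the point concrete, here is a toy sketch (in Python; the function name is my own invention, not anything standard) of just how little code it takes to do this kind of factoring:

```python
# A toy trinomial factorer: x^2 + bx + c factors over the integers
# exactly when two integers sum to b and multiply to c.  That search is
# the whole "skill," and a computer exhausts it instantly.
def factor_trinomial(b, c):
    """Return (p, q) with x^2 + bx + c = (x + p)(x + q), or None.

    Toy sketch: assumes c != 0 and searches integer factor pairs only.
    """
    for p in range(-abs(c), abs(c) + 1):
        if p != 0 and c % p == 0:
            q = c // p
            if p + q == b:
                return (p, q)
    return None

print(factor_trinomial(5, 6))  # (2, 3): x^2 + 5x + 6 = (x + 2)(x + 3)
print(factor_trinomial(0, 4))  # None: x^2 + 4 has no integer factorization
```

The Wolfram Alpha widget does vastly more than this, of course, but the point stands: this is mechanizable work.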

Imagine if we took the same approach with teaching English.  A person who still muddles up the difference between the nominative and objective cases would never be allowed to write her memoirs.  But then the world would be a safe place for people who say, "It is only I."

And now for the bad news

Of course, the natural question is "What do we do to fix this problem?"  The bad news is that it won't be easy.  We have allowed this twisted view of mathematics to grandfather itself into our culture as well as our educational hierarchy.  Being an industrial engineer, I am a strong advocate of Deming's principles for transformative change.  Rather than summarize all 14 points, here are three that make for a good conversation starter.

Point 1: Create constancy of purpose towards improvement. This is the big one. "As they say on TV, the mere fact that you realize you need help indicates that you are not too far gone."

You see, it will be hard to convince the powers-that-be that our system is a) broken and b) can't be fixed by more standardized testing.  But we as educators have to accept that we can't simply put our heads down, change our own classrooms, and expect the system to fix itself.

Point 3: Cease dependence on inspection.  The goal of total quality management (TQM) is to reduce variation and constantly improve the product.  If we are honest with ourselves, we will accept that the purpose of standardized testing is to identify "defective products"--  that is, deficient students.  There will be no need to waste countless hours of instructional time on standardized testing if we build quality into the product.  Identifying and quantifying mistakes is not a way of life in a system that is designed to produce a quality product.  Of course, a certain amount of testing is needed for assessment purposes, but it shouldn't be the tail that wags the dog.

Point 10: Eliminate slogans.  Deming had a rather cynical view of management, it would seem.  In the world of sub-par management, people make mistakes and need to be hassled until they straighten up and fly right-- or replaced with different people.  In Deming's worldview, most mistakes are results of a poorly designed system.  But hassling employees is easier than improving the system.  The short version is, "Blame the system, not the people."

The punchline

Rather than wallow in helplessness, it is my intention to contribute to the current dialogue in reform of the US mathematics education system.  It will be a long, slow road to modernizing our system, but it is incumbent on us as educators, parents, and citizens to insist on nothing less than a first-class education system.

What we have in our education system-- particularly in K-12, but colleges are not exempt-- is a culture where "bad at math" is not only socially acceptable but also the cultural norm.  Simply put, we have decided as a society that we are willing to accept our substandard results.  And the reason is that we are unwilling to agree that the system is fundamentally flawed.

Monday, August 15, 2011

Why are we here? - Redux

I spent the day waiting to hear what was wrong with my motor vehicle.  Fortunately, I was able to cross a big item off my list while working at home.  Said item was the creation of my Prezi introduction to College Algebra.  I used the popular web series Red vs. Blue as a loose theme in the creation of this introduction.  I also plan to incorporate many of the same ideas into my workshop for our Freshman orientation series.

Teaser: My new teaching philosophy is, "Math is like Jesus."

Here's a link to my Prezi, Why are we here - College Algebra Ed.  Feel free to leave comments either here or the Prezi website.

Incidentally, Red vs. Blue is decidedly not safe for work due to language, so don't look it up unless you can do so on a home computer.

Monday, August 8, 2011

Outlier - Textbook example

I have a short post today, gentle readers.  It appears that The Joy of Teaching... Algebra had a noticeable spike in pageviews on 2 Aug 2011 (last Tuesday).  And if there's one thing I love, it is real data.  The following is a line graph of my daily views over the last month.


There are cases where an experienced data analyst can spot an outlying data point without the use of quality control charts.  I am a journeyman data analyst, and in my professional opinion, the 2 Aug number of 68 pageviews is a clear outlier.  By comparison, the second largest point is 20 on 11 July.  Since I wrote an entry on 11 July, it is entirely possible that many of those 20 views are my own.  I've since turned on the option to avoid counting my own views, so the 2 Aug views are unlikely to be mine.  At least, I hope they aren't, since I find narcissism rather unbecoming.

The significance of this statistical anomaly is a mystery to me at this point. Consequently, I would love to know if any of my gentle readers can shed some light on this occurrence.  If it might help, the views all appear to have occurred over a two hour time frame beginning approximately at 9:00 pm Central on Tuesday 2 Aug.

Upcoming items

I've not forgotten my promises to report on my linear algebra classroom blogging and my qualitative comparison of letter grades.  These are ongoing items, and I will be prepared to write about them by the time classes start on 22 Aug.

Sunday, July 31, 2011

Grading standards and linear algebra blogging

I have a short update today.  Enjoy the relief from my usual cavalcade of verbiage.

Thanks to our Chief Academic Officer, I just saw the article College-Wide Grading Standards on the Foundation for Critical Thinking website.  The article described almost exactly what I was planning to post for part 2 of my series on assigning grades.  Instead of posting part 2 of my series, I will modify my entry to incorporate this excellent resource.

The article provides the qualitative descriptions of grades A through F, so my current plan is to discuss the differences in each grade category.  I hope to report on my results of developing an operational grading strategy based on these standards.

In other news, I've decided that my Linear Algebra course next semester will include a requirement that students create a blog of their learning progress in lieu of turning in homework problems.  I will still provide homework problems for the students to work on, but they will blog their progress instead of turning their solutions in for a grade.

All that and much more is forthcoming, so don't touch that dial.

Friday, July 29, 2011

Grading with flair - Part 1 - 15 is the minimum

Today's blog entry is the first in a series of at least two where I will discuss my philosophy of evaluating student work and the assigning of grades.  As Tolkien would say, it is a tale that grew in the telling.  Thus, I've decided to allow myself multiple writings as I refine the Sinkhorn Rubric.

One of my all-time favorite quips goes a little something like "If you think your teachers are tough, wait until you get a boss."  Too true.  You see, all good teachers need to be patient with learners.  As I mentioned in my last post, Yoda was patient with Luke Skywalker.  I purposely did not go into great detail as to why great teachers are patient with their students, because I did not wish to steal any thunder from this blog entry.

The link here between Yoda and evaluation of students is that honest, constructive assessment of student work requires many things: among them are clear expectations, acceptance of creativity, and patience.  And the most important of these is patience.  Students, like children and employees, benefit from high expectations.  But it is counterproductive to pretend that an instructor's evaluation of a student does not include at least a certain level of opinion.

In this first post, I discuss the biggest concern I have with evaluating student performance.  I will beg your indulgence, since I take a longer time than usual to get to the point.  So please, be patient.

But first, we need to talk about your flair

If you have never seen Mike Judge's feature length directorial debut, you should stop reading this and watch Office Space as soon as safety and decency allow.  In the movie, Jennifer Aniston's character, Joanna, is a waitress working in one of those restaurant chains.  You've been there.  It's the place where members of the waitstaff are evaluated solely on their willingness to behave in the most obnoxious and ingratiatingly sycophantic manner allowed by law.

The emblem of this behavior is flair.  In the Mike Judge world of a Chotchkies restaurant (Judge himself plays Joanna's supervisor), servers are required to wear at least 15 employee-supplied buttons-- the so-called flair-- on their uniform suspenders. "People can get a burger anywhere.  They come to Chotchkies for the atmosphere."  Suspender flair is presumably a big part of this atmosphere.

Joanna thinks, correctly, that the "atmosphere" of Chotchkies both debases its employees and insults the intelligence of its customers.  Stan, the supervisor character, picks the wrong day to criticize Joanna for wearing only the minimum 15 pieces of flair, and her response is honest, priceless, and a textbook example of a person railing against a divisive and poorly designed system of assessment.
Joanna - "You know what, Stan?  If you want me to wear 37 pieces of flair, like your pretty boy, Brian, over there, why don't you just make the minimum 37 pieces of flair?"
Stan - "Well, I thought I remembered you saying you wanted to express yourself."
Joanna - "Yeah.  You know what?  Yeah, I do.  I do want to express myself, okay.  And I don't need 37 pieces of flair to do it."
There may be children who read my blog, so I won't include a video of what happens next, but picture Jennifer Aniston dressing up as Stone Cold Steve Austin for Halloween.

Suffice it to say, Stan is not a born leader.  Mike Judge's character is a person who apparently got ahead by scoring high on the poorly designed Chotchkies rubric of evaluating employees.  The Chotchkies rubric does not identify and reward leadership, since leadership is apparently not something valued in the mid-level management at this particular establishment.

Guerrilla Grading

In our accreditation-driven education system, exhaustively detailed grading rubrics have become a holy grail of sorts.  In my opinion, it is extremely important that grading be both as fair and objective as possible.  Thus, a rubric is a natural means of establishing a standard to which students can compare their work as a part of their own self-assessment.

This is a good thing for students and instructors.  But I like to remember that professional educators are, well, professionals.  And one of the most important things that all professionals share is the need to operate with some degree of autonomy.  Simply put, educators earn a living at least in part because they know a good paper when they see one.

For me, the difficulty in grading student work is not determining right answers from wrong answers-- or even flawless logic from an argument that needs to be refined. My difficulty is in assigning points.  More properly, my difficulty is in deciding how many points to subtract for a specific error or omission.  Is a sign error a one point mistake or a three point mistake?  The answer to that depends on the problem. Suppose I ask for the roots of x^2+4, a sum of perfect squares.

This is what I'm looking for: x^2 + 4 = 0 has no real roots, since x^2 = -4 forces x = 2i or x = -2i.

If I instead get x = 2 and x = -2 (the roots of x^2 - 4), the difference is merely a sign error.  But it is exactly what I don't want.

A question of this nature is specifically designed to test if a student can recognize that a certain quadratic equation has no real roots. And I want the student to do this without having been provided the graph.  Students can discover this algebraically by solving the equation, visually by plotting the graph themselves, or intuitively by recognizing that x^2+4 is a vertical translation of x^2.
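The algebraic route can even be mechanized.  Here is a minimal sketch (my own helper function, nothing standard) that classifies a quadratic by its discriminant:

```python
# Classify ax^2 + bx + c = 0 by the discriminant b^2 - 4ac:
# positive -> two distinct real roots, zero -> one, negative -> none.
def real_root_count(a, b, c):
    """Number of distinct real roots of ax^2 + bx + c = 0 (assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:
        return 2
    return 1 if disc == 0 else 0

print(real_root_count(1, 0, 4))   # 0 -- x^2 + 4 = 0 has no real roots
print(real_root_count(1, 0, -4))  # 2 -- the sign-error version x^2 - 4 = 0
```

The question, of course, is whether the student understands why the discriminant settles the matter, not whether the arithmetic can be cranked out.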

The Sinkhorn Rubric[TM] - Version 1.0

As it turns out, life is a pass/fail course.  According to Jorge Cham, this is also a reasonable characterization of life in graduate school.

The Sinkhorn Rubric is based on the pass/fail system.  A Jorge Cham pass/fail system is kind of like getting either a C or an F.  For me, an A is good, a C is good enough, and an F is not good enough.  The first and most basic version of the Sinkhorn Rubric is as follows.
A - The student successfully completed the assignment with a substantive addition of desirable elements including, but not limited to, at least one of the following: clarity, brevity, insight, professionalism, innovation, or creativity.
C - The student successfully completed the assignment with no catastrophic errors or significant omissions.
F - The student did not complete the assignment, failed to complete a significant portion of the assignment, or did not provide significant justification for his or her conclusions.
At Chotchkies, Version 1.0 would look like this.
A - A full 37 pieces of flair.  And a terrific smile.
C - The minimum 15 pieces of flair.
F - Flipped off the boss.  And a line cook who just happened to be standing there.
In my studies of student success in courses like college algebra and developmental math, the DFW rate is considered one of the most important performance measures.  The DFW rate is the fraction of students attempting a course who do not complete the course with at least a C.  That is, they earn a D, an F, or withdraw.  Consequently, I submit for your consideration that there is a peer-reviewed precedent of sorts for Version 1.0.
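For the record, the DFW computation itself is trivial; here is a sketch with made-up grade data (the grade list below is hypothetical, purely for illustration):

```python
# DFW rate: the fraction of students attempting a course who earn a
# D, an F, or a W (withdrawal) rather than at least a C.
from collections import Counter

grades = ["A", "C", "B", "D", "F", "W", "C", "A", "W", "B"]  # hypothetical
counts = Counter(grades)
dfw_rate = (counts["D"] + counts["F"] + counts["W"]) / len(grades)
print(dfw_rate)  # 0.4 -- a 40% DFW rate for this hypothetical section
```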

Beer, exams, and beauty contests

In a previous life, I was a blue-ribbon homebrewer.  My summer wheat took first in the specialty category at the Kentucky State Fair many moons ago.  I was also a beer judge.  Judging homebrew is kind of like reviewing wine for a magazine, but homebrewers will drink pretty much anything that is both non-toxic and made with malted barley.

In the Homebrew division of the Kentucky State Fair, there are three rounds.  In round one, each beer is judged against a style recognized by the American Homebrewers Association (AHA).  Points are awarded based on how well the entry adheres to the style guidelines in categories such as flavor, bouquet, and appearance.  A fixed number of the highest-scoring beers move on to round two.

Round two is the medal (ribbon, actually) round.  Judges in each category decide on their three favorite beers, then arrange the top three in order of first, second, and third.  In round three, the blue ribbon winners in each category compete for the coveted Best in Show award.

You see, round one is like grading exams.  You have to decide what to take points off for, and you also decide how many points to take off for each mistake.  Even though the AHA makes every effort to make the process as objective as possible, it is still highly subjective.  In fact, all judges in each category of round one are required to make certain that their scores fall within a certain tolerance of each other.  And just like grading exams, judging beer for long periods of time has a tendency to dull the senses.  For some strange reason...

Rounds two and three are like judging a beauty contest.  In round two, all judges in, for example, the light ale category get together and decide which of the beers in their category are the three best.  These are the ribbon winners.  Then the judges place the ribbon winners in order of preference-- first, second, and third place.

Now let me ask you, my gentle readers.  Would you rather judge round one or round two?  That is, would you rather grade papers or judge a beauty contest?

AC/DC and fully ordered sets

From a mathematical perspective, the major difficulty associated with schemes for scoring homebrew entries is that the scores are generally whole numbers.  As a subset of the real numbers, the whole numbers are what is called a fully ordered (or totally ordered) set.  That means that whenever two numbers in a fully ordered set are compared, either the numbers are equal or one is larger than the other.  So, whenever two distinct (i.e. non-equal) whole numbers are considered, one of them must be larger than the other.

The simplicity of a fully ordered set is quite beautiful to a decision maker.  For example, consider that you wish to buy a 2010 Buick LaCrosse.  If Buick A costs $19,000 and Buick B costs $19,500, then Buick A is less expensive than Buick B. You will never need an expert to tell you that a $19,500 car is more costly than a $19,000 car.

And that's the beauty of a fully ordered set.  But as soon as there is a matter of opinion, things get a little more dicey.  We can easily look up the numbers to see that Australian rock band AC/DC's 1981 album For Those About to Rock reached a higher maximum position on the Billboard album charts than Back in Black.  For Those About to Rock reached number 1.  Back in Black did not reach number 1 despite becoming the second best-selling album in history.

But if you sit down and talk to any serious (or even a casual) AC/DC fan, you will be in for a long discussion of which of these is the better album.  An album grading rubric might help, especially if all manner of AC/DC experts were involved in the creation of this rubric.  But the determination of which is the better album ultimately must be, by nature, a matter of opinion.  That's why music departments teach Music Appreciation instead of Music Scoring and Ranking.

Partially ordered sets and a cheap, used Buick

When the rubber hits the road, any kind of sophisticated decision comes down to a matter of opinion.  A critical and informed expert opinion is preferable to a monkey throwing a dart, but it is still an opinion.  It's also not easy, which is why systems analysts get away with charging exorbitant consulting fees.

The big issue that I've spent so much time introducing is the concept of a partially ordered set.  In a partially ordered set, two non-equal elements can't always be neatly compared.  Let's go back to the Buick example. Buick A costs $19,000 and Buick B costs $19,500.  A is cheaper, so we should buy Buick A if the only thing we care about is the cost of the car.  This is a fully ordered set, but how realistic is it?

A businessman once told me that the smartest thing you can do for a customer who cares only about cost is to refer him immediately and enthusiastically to your closest competitor.  The implication is that cost is only one factor to consider when purchasing an automated material handling system.  Just like a material handling system, the cost of your family car is important.  But a cheap car isn't all that great if the wheels fall off as soon as you drive off the lot.  So let's consider the mileage of the vehicles.

Suppose that Buick A has 50,000 miles on the odometer to go with the $19,000 price tag.  If Buick B costs $19,500 and sports 5,000 miles, it's quite likely that you would happily pay an extra $500 to get a vehicle with fewer miles.  But what if Buick B costs $19,500 with 45,000 miles on the odometer?  Now you have a decision to make, because the Buick problem of cost and mileage involves two distinct elements of a partially ordered set.
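If you like, the componentwise comparison can be written down directly.  This little sketch (hypothetical Buick numbers; the helper name is my own) shows why the two-criteria problem is only a partial order:

```python
# Compare cars on (price, mileage) pairs, where lower is better on both.
# One car "dominates" another only if it is no worse on BOTH criteria;
# otherwise the pair is incomparable and genuine judgment is required.
def dominates(car_a, car_b):
    """True if car_a is at least as good as car_b on price and mileage."""
    return car_a[0] <= car_b[0] and car_a[1] <= car_b[1]

buick_a = (19000, 50000)  # $19,000 with 50,000 miles
buick_b = (19500, 45000)  # $19,500 with 45,000 miles

# Neither car dominates the other: a partially ordered set in action.
print(dominates(buick_a, buick_b), dominates(buick_b, buick_a))  # False False
```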

The simplicity of Version 1.0 is a great strength.  A novice instructor could easily assign any sample of student work into one of the three categories in this first incarnation of the Sinkhorn Rubric.  The Version 1.0 rubric does not imply that two students who earn a C have turned in identical work.  What the rubric implies is that each student who earns a C did well enough to pass but not well enough to earn an A.

For next time

I doubt any dean or chief academic officer would be fond of a grading scale with only A, C, and F, so I will expand the Sinkhorn Rubric to allow for a B and a D.  If you have something to add to the discussion, feel free to use the comments below.  I will decide on the numbering scheme for my versions and get back to you next time.

In the meantime, let me ask you a question.  What is the difference between a B and a B+?  I don't want to know the numerical difference.  What I'm getting at is the qualitative difference between B and B+.  What does a student have to do to get a B+ that is significantly better than a B but not worthy of an A or an A-? Would an employer who hired a B+ student of accounting expect significantly more than that of a B student?