It’s Not About the Score; It’s the Cut-Score

***As this information continues to change and develop, this post will be updated.

There has been a lot of news about cut-scores in the last few days, but the only thing people really seem to understand is that they don’t really understand cut scores. So, let’s break it down.

What is a “cut score”?
This one is pretty straightforward: it is a cut-off point. If you had a number line and divided it into the ranges of scores that count as advanced, proficient, et cetera, those dividing points are the “cuts.”
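If it helps to see the number line as code, here is a minimal sketch. The cut values and the 0-100 scale are made up for illustration (they are NOT actual TCAP cut scores), and the function name is my own.

```python
from bisect import bisect_right

# Hypothetical cut scores on a made-up 0-100 scale; not actual TCAP values.
CUTS = [40, 60, 80]
LEVELS = ["below basic", "basic", "proficient", "advanced"]

def performance_level(score, cuts=CUTS, levels=LEVELS):
    """Return the level whose range contains the score.

    A score exactly equal to a cut counts as reaching the higher level.
    """
    # bisect_right counts how many cuts the score has met or passed.
    return levels[bisect_right(cuts, score)]

print(performance_level(59))  # just under the second cut
print(performance_level(60))  # exactly on the second cut
```

Notice that the student’s score never changes in this sketch. Move a number in the list of cuts and the label attached to the very same score changes.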

What is a “quick score”?
A quick score is a temporary score. It is delivered quickly, after the tests have been administered, to give students and teachers a score that can be used for final grades and/or placement in the next year’s courses.

If these tests are as carefully created and reliable as they claim to be, why can’t they deliver the real score, rather than a “quick score”?
That’s a good question. Keep asking it. (I’m pretty sure you already know the answer.) If anyone gives you an answer that doesn’t involve manipulating test data, question that, too.

What is an “equated” score?
This is just taking a score and making it comparable to another score. For example, we might have one test that produces a score of 1 to 5, and a similar test that produces a score of 0% to 100%. Comparing those, a score of 4 would mean two very different things. To make a comparison, we might “equate” (translate) the 4 to 80%.

That sounds easier than it is. When creating an “equated” score, additional factors are taken into consideration, beyond just the number. One of those tests may have questions that are far more difficult than the other, for example. So, to make an equated score, the difficulty of each test is also considered, along with the number of questions, the kinds of questions, and other factors, to get down to what a score on one of these tests would “equate” to, on the other.
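To make the difference concrete, here is a rough sketch of both ideas. The first function is the simple translation from the 1-to-5 example. The second is one textbook approach called linear equating, which lines up the average and the spread of two test forms. I am not claiming this is the method Tennessee or any vendor uses; the function names and the raw scores are made up for illustration.

```python
from statistics import mean, pstdev

def naive_translate(score, max_score=5):
    """Translate a score out of max_score to a percentage by simple proportion."""
    return 100 * score / max_score

def linear_equate(x, form_x_scores, form_y_scores):
    """Map score x on form X to the form Y scale by matching mean and spread.

    This is linear equating, shown only as an illustration of the idea.
    """
    mean_x, mean_y = mean(form_x_scores), mean(form_y_scores)
    sd_x, sd_y = pstdev(form_x_scores), pstdev(form_y_scores)
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Made-up raw scores: form X was the harder test, so its raw scores ran lower.
form_x = [10, 12, 14, 16, 18]
form_y = [14, 16, 18, 20, 22]

print(naive_translate(4))
print(linear_equate(14, form_x, form_y))
```

In this made-up data, an average score on the harder form lands on the average score of the easier form, which is the whole point: the equated number accounts for difficulty, not just the raw count of right answers.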

What are “pre-equated” and “post-equated” scores?
“Pre-equated” scores are scores that are equated BEFORE students are tested.

“Post-equated” scores are equated AFTER students are tested.

Why does it matter whether cut scores are equated before or after testing?
Now we are in a tricky area. It looks simple; but it is not.

(If you would like to read a paper on the topic, A Comparison of Pre-Equating and Post-Equating using Large-Scale Assessment Data explains things well.)

Most organizations use PRE-equated scores, because they are better able to justify where the cut-offs occur. If a test creator has carefully considered their questions, the course content those questions map to, and the difficulty of the questions, they should be able to reasonably set the cut-off points where students should be expected to perform if they have “basic” knowledge, “proficient” knowledge, or “advanced” knowledge.

Since the Tennessee Department of Education uses post-equated scores, they can willy-nilly set the cut scores to whatever they want, AFTER the tests have been taken.

This changes EVERYTHING your students and teachers have been told. They were given curriculum to use and scores to work toward. Their “quick scores” showed that they mastered the material. But the Department of Education gets to go back and CHANGE THEIR MINDS about the cut-scores.

Can you imagine being in a class where the teacher gives you a grade and then, over the summer, sends you a note that he re-calculated the grades and your grade is completely different?!  That is what the Tennessee Department of Education can do by determining cut-scores AFTER the tests have been taken and scored.

A local reporter told me that she is trying to do a story on this, but is having trouble getting information from the Department of Education!  How is that acceptable? Why aren’t parents on the doorstep of the TN DOE?

 

The Tennessee Education Association has also been trying to get answers and posted this information on May 27, 2015, on their Facebook page:

TCAP Update:
Following the state’s conference call, we now know that the state did change its methodology for calculating quick scores for students in grades 3-8. It is now using the cubed-root method the state has been using for high school EOCs. This change in methodology resulted in apparent grade inflation, leading parents and educators to believe students had performed better than in previous years. The change resulted in about a 4-point increase in cut scores from the method used in 2014.

Please visit the link below for documents provided by the state in its attempt to explain these changes. TEA still has many, many questions about the reliability of both the quick and cut scores, why these changes were made and how proficiency levels are determined. We will continue our efforts to get more answers from the state and insist that they #showthemath.

State Documentation of TCAP scores

Below are some of the official answers TEA has been able to get, so far. Please note that TEA has been doing their due diligence on this issue and there has been more information, each time I have looked at their page.  Please use the link above, to follow their findings.

Quick Score/Proficiency level correlation:
We have not changed the mark or expectation for student proficiency on TCAP; there have been no changes to cut scores for proficiency levels. I’d also like to clarify that quick scores are no longer tied to TCAP performance levels. For example, a quick score of 85 is not equivalent to the cut score for proficient. We compare student performance each year based on the scale scores.  The scale scores determine the cut points for performance levels (i.e. below basic, basic, proficient, advanced). We always produce equating tables in the fall that clearly define the raw score equivalent cut points based on the scale score. This is designed to help teachers know what to expect early in the school year. The equating tables for 3-8 achievement can be found here.  The equating tables for EOC can be found here.

Student performance expectations for the proficiency threshold have not changed.  They are exactly the same as last year, and these expectations are exactly the same as the equating tables which we published online in the fall for teachers to access. Quick scores do not determine proficiency levels. I have attached a FAQ – A Guide to Understanding Quick Scores – that we created to help explain the purpose for quick scores.  In addition, please see the attached TCAP Scoring Flow Chart that shows how and where quick scores fall into the scoring process.  It is clear from the flow chart that quick scores have no relationship to performance levels.  Quick scores are used only to calculate a 100-point grading scale. There are various methodologies that can be used to create a 100-point grading scale from the raw score, and, this year, we used the cube root method for grades 3-8, as we have done for EOCs over the past several years.

Quick Score Calculation:
What was the rationale for making this change to the cube root method? Is it possible to see the formula used for this calculation?

The rationale for making the change was to create a consistent methodology for generating quick scores and one that was not dependent upon TCAP performance levels like the interval scaling method used in 3-8 achievement since 2012. We updated the methodology to be consistent with what we are doing for End of Course exams.  We will be engaging directors of schools in more conversations about quick scores for 2015-16.

I have attached (linked above) a memo from April 2012, TCAP Quick Score Conversion Guidance, which includes the interval scaling methodology for generating quick scores in grades 3-8.  I have also attached the Cube Root Quick Score Calculation guidance that details the cube root method used this year for all grades.

Proficiency Levels:
What are the proficiency level ranges for Below Basic, Basic, Proficient, and Advanced for the various assessments? How do these ranges compare to previous years?

The equating tables for 3-8 achievement and EOCs are posted online, and they show the scale score ranges for each performance level.  These scale score ranges are the same for 2015 as they were in 2014.

 

The fact that teachers, districts, parents, and communities are having difficulty getting timely and adequate answers from the State Department of Education should be very concerning. It certainly makes things look fishy.

Next Up: High School EOC Cut Scores, Predicted Scores, and Misuse in Teacher Data

Florida Fails to Deliver Useful Testing Schedule

Florida is now saying they will not have test results until sometime next fall, partially due to the process used in computing “cut scores.”

I thought these tests were supposed to be used to “inform instruction.” Isn’t that what they keep telling us? How do they “inform instruction” when the results aren’t available for MONTHS???

When I take a computer based certification test, I get my score IMMEDIATELY – on the screen, with a copy sent to a printer.  The difference?  No Dept. of Ed, scamming the scores.  Ahem… I mean, “computing cut scores.”

New Florida law will delay school test results for months

“Once the cut scores are set, Stewart said, the reporting of results should return to the early summer, as in the past.”

How is that better?  Those results still do NOTHING to “inform instruction,” and can only be used to sort students and build data banks.

Education Commissioner Pam Stewart says test scores could be delayed eight months.

FL Education Commissioner Pam Stewart

Transcript: Datapalooza 2012

How much data is currently being stored about YOUR children? Do you have any idea who is storing that data, what they know, or how that information can be used?

This is a work in progress – as a slightly improved version of the transcript of the Knewton data video below:

So the human race is about to enter a totally data mined existence and it’s going to be really fun to watch. It’s going to be one of those things where our grandkids are going to tell our kids, “I can’t believe you grew up in a world like that” just the way our kids complained that we went to record stores.

You know, when Tom Cruise walks through the mall in Minority Report and the ad beams right to his eyes and says “Hey Mr. Cruise, you should go on that Caribbean vacation you’ve been thinking about.” I know some entrepreneurs who work on that technology right now. And, um, I’m still waiting for the day when my refrigerator’s going to know when I’m running out of milk and it’s ordered for me automatically on Fast Track. I think that day’s coming in a few years; it’s not far off.

The world in 30 years is going to be unrecognizably data mined. So what does that mean for education?

Well, education happens to be, today, the world’s most data minable industry by far, and it’s not even close. So maybe, one day, healthcare will be up there – when they have little nanobots that are in your bloodstream that are doing real time analysis, but until then it’s not close. Education beats everything else, hands down.

So let’s look at other big data industries:

The really big data industries in the world right now are, not surprisingly, on the internet because that’s where it’s easy to grab the data and that’s also where there’s a congregation of talent that understands data.

So, um, well, let’s just look at it by the numbers – because the name of the game is “Data Per User.”

Okay, so, one of the things that fakes us out about data and education is: education, because it’s so big (it’s like the fourth biggest industry in the world), produces incredible quantities of data.

But data that just produces one or two points per user, per day, is not really all that valuable to an individual user. It might be valuable to like a school district administrator, but maybe not even then. So let’s just compare. Netflix and Amazon get in the ones of data points per user per day. Google and Facebook get in the tens of data points per user per day. So you do 10 minutes of messing around in Google, you produce about a dozen data points for Google. Okay great.

So Knewton today gets five to ten million actionable data points, per student, per day. Now we do that because we get people (if you can believe it) to tag every single sentence of their content (we have a large publishing partnership with Pearson, and they tag all their content) and we’re an open standard so anyone can tag to us.

If you tag all your content and you do it down to the atomic concept level, down to the sentence, down to the clause, you unlock an incredible amount of trapped hidden data.

Why do you do that?

Well if you use programmatic taxonomy models and item response theory and that thing at the bottom (we haven’t given that a name yet), what you figure out is: everything in education is correlated to everything else down to the concept.

[Slide: underlying science]

 

Now this is where education’s different from search or social networking. If someone tagged every single line, every single sentence of all the world’s web pages for Google, or every single line of dialogue from Netflix, which no one’s done, but even if they had, there are not really a whole lot of interesting correlations there.

Everything in education is correlated to everything else. Every single concept is correlated in a predictable way to everything else using psychometrics, right?

So if you do 10 minutes of work in Google you produce a dozen data points for Google. Because everything that we do is tagged at such a granular level, if you do 10 minutes of work for Knewton you cascade out lots and lots of other data, and here’s why. When you took the SAT there might be 40 different concepts about equilateral triangles that are tested on all the SATs ever given in any one year.

But you didn’t get all 40 questions, you got two questions on equilateral triangles, because, they figure, if you’re in the Top 14th percentile at those two questions, 13th percentile on this one and 15th percentile on that one… If you’re in the Top 14th percentile on those two questions in equilateral triangles, there’s a 98% chance that you’re in the Top 14th percentile at every concept on equilateral triangles. And there’s a 96% chance that you’re in the Top 15th percentile at all triangle concepts: three-four-five, 30-60-90, isosceles, etc., etc.

You did a little bit of work for Knewton and we just used the established science of psychometrics to cascade out hundreds of other data points.

So we can produce incredible quantities of data per user, per day. It’s really, really hard to get that, okay? But, if you can get all that tagging done…

({refers to slide} …and that’s one of our tags. That’s a small part of our overall taxonomy. That’s just part of one course and we have dozens of taxonomies), then you can do this.

Granular understanding

 

What you can do with the data, if you actually do all that work, is you can figure out exactly what students know and how well they know it. You can figure it out down to the percentile versus the rest of the population.

So, Knewton students today: we have about 180,000 right now, by December it’ll be 650,000, early next year it’ll be in the millions and the next year it’ll be closer to 10 million, and that’s just through our Pearson partnership.

So for every one of the students, we can figure out, within a few hours, what they’re strong at and what they’re weak at, at the beginning of the course. So we can produce a unique syllabus for each student each day, literally unique.

There’s not enough time in the universe for any two students to have the same syllabus on any one day, that’s how many there are. So it’s optimized for each kid down to the atomic concept. And then we can figure out things like: well, here’s your homework tomorrow night, and you’re going to struggle with that homework or you’re going to fail it, because there are concepts in that homework for which we know you haven’t mastered the previous concepts that build up to them. Or there’s concepts in that homework that [inaudible 04:53] very highly concepts always have trouble with.

So we know you’re going to fail, we know it in advance and we can prevent it in advance. We go grab some content from somewhere else in the portfolio and we’re going to seamlessly blend that into your homework tonight. So every kid gets a perfectly optimized textbook, except it’s also video and other rich media dynamically generated in real time. And it also uses the combined data power of the entire network. So here’s what I mean by that: like I said, next year we’ll have close to 10 million students; a few years from now we’ll have 100 million.

The hundred-million-and-first shows up to learn something like rules of exponents or subject-verb agreement, whatever. We take the combined data power of all hundred million to figure out exactly how to teach every concept to each kid. So the hundred-million-and-first shows up to learn the rules of exponents; great, let’s go find a group of people who are psychometrically equivalent to that kid. They learn the same ways, they have the same learning style, they know the same stuff, because Knewton can figure out things like: you learn math best in the morning between 8:40 and 9:13 am. You learn science best in 42-minute bite sizes; at the 44-minute mark you click right [inaudible 05:47], you start missing questions you would normally get right. You learn social studies best with video clips, or 22% video to 78% text, or whatever your optimal cocktail is. We can tell when we should return content to you for optimal retention.

We literally know everything about what you know and how you learn best, everything, because we have five orders of magnitude more data about you than Google has.

We literally have more data about our students than any company has about anybody else about anything, and it’s not even close. That’s why we can do all that stuff right.

So then what we can do is take that profile, the 100 million kids (next year it’ll be 10 million). We can go figure out: okay, who’s exactly like that kid? Whose learning styles up and down the line are just the same? Who knew the same stuff at the same level of mastery when they had [inaudible 06:24]? Great.

Statistically speaking, it has to be the case that some 5% or 10%, through shared bad luck, did the absolute wrong thing for themselves without knowing it. They did questions that were too hard, they got discouraged, they bounced. They accessed text when they should have gotten the video, whatever. It also has to be a fact of statistics that, through pure blind luck, some Top 1% did the absolute perfect thing for themselves without realizing it.

And we go take the whole combined data power of that network of millions, soon to be tens of millions, eventually it’ll be hundreds of millions of people. And for every single concept that your child learns (2,000 concepts in a particular semester-long math course), for every single atomic concept, we take the combined data power of that vast network and use it to find the perfect plan forward for that kid for that concept. So that’s what we do right now.

Let me give you a couple of examples. This is one student. There’s a few hundred learning clusters there, there’s a few tens of thousands of atomic learning objects there. That’s one student’s path, this is a real student in a US college right now. And you’ll see that each student has a totally different path. Some students have short paths, some have long paths; in this particular course there were students who finished it in 14 days, there were students who finished it in two semesters.

This is a course at ASU. They had to change their semester structure to a modular semester structure because we were suddenly telling them things like: if you give this woman here the final right now she’ll get an A, and it’s only 14 days into the course. I promise you she’ll get an A. You can keep her in that seat if you want, and that’s what we’ve always done; now we don’t have to.

So let’s show you this. This is 150 students in one class, and they kind of all look like fleas, but each one is an individual learning path. Notice that some of them are going really fast, some of them are going really slow, and then they’ll all kind of speed up when the test comes. It’s kind of like organic, and so those different color coded things are like concept clusters. Like some test obviously just happened, that’s why they all started working.

And you can look at some of those students and think, boy, that poor schmuck is really in a lot of trouble because they’re going too slowly. So where we think we’re going with this: obviously it’s in market right now. We’re going to be in K-12 starting next year and it’s an open platform; anyone can plug it in and use it by APIs. And where we think we’re going with the data side of it, which is the really fun stuff for today, is we think within a few years we’ll be able to start predicting grade performance.

So if a teacher grades consistently, year in and year out, we can match up the student profiles down to the atomic concept level versus grade performance. We can tell you you’re on track to get a B- in this course right now. Either that, or if your teacher is totally inconsistent we can’t tell you that, but that’s another problem.

If your teacher grades consistently we can tell you what your grade’s going to be based on what you know and how fast you’re learning it. But if you do another 30 minutes a day for three days a week you can get it up to an A-. We can tell you things like that.

We’re really excited to correlate with other people’s datasets by open API things like, something we’ve talked about as kind of a joke but it really should work, is like the food diary. You tell us what you had for breakfast every morning at the beginning of the semester, by the end of the semester we should be able to tell you what you had for breakfast because you always do better on the days you have scrambled eggs or whatever. And more importantly we should be able to tell you what you should have for breakfast.

So the power of data when you unlock millions of data points per user per day you can accomplish things that people aren’t even conceiving of right now.

But that world is coming. We’re trying to bring it to you, and we’re going to be an open system to allow anyone to just plug that data in, take it out, and then plug it back in.

Thanks very much.

 

More Tests!

From the “Dumbest Thing I’ve Heard All Day” department:

Dr. Jared Bigham is Director of College and Career Readiness at the State Collaborative on Reforming Education (SCORE) and a leader of the Expect More, Achieve More Coalition:
“We can’t afford to wait until students are 17 yrs old to determine if they are college & career ready.”

Umm…whut?

When should we determine whether students have reached the point we expect them to reach by age 17 or 18?  Maybe when they are in third grade?  Sure, why not?  It seems that we are headed that way, so why not just go ahead and evaluate third graders, to see whether they have reached the level of college and career readiness we expect to see from 17-18 year olds. If their teachers haven’t prepared these third graders to succeed in college – well – they are clearly not doing their jobs.

WHY, oh why, did I start writing, before I finished reading the whole article?  It was just so ridiculous, that I couldn’t wait to put my parody in print.

and then…

…I read the rest of it:

“With an assessment that matches our standards and the way students are learning in class, we could know as early as third grade if a student needs academic support to stay on track for college and career readiness.”

If that isn’t a “WTH” moment, I don’t know what is.

Today’s third graders are around 8 years old.  When they start college – ten years from now – we don’t even know what kinds of careers will be available.  At best, we are guessing.  How much more sense it would make, to educate students to be flexible, thinking, citizens.  Instead, we are setting them up for a multitude of missed opportunities.

 

Consider a few jobs that barely existed 10 years ago:

iOS Developer: makes cool apps for your phone
Android Developer: makes cool apps for my phone
Zumba Instructor: you thought aerobics died – nope
Social Media Manager: LinkedIn, Facebook, Twitter, YouTube: all founded 2003-2006
Data Scientist: bringing all those test scores together
UI/UX Designer: making the “user experience” pleasant on our digital devices
Big Data Architect: sits between organizational needs, data scientists, & data engineers
Beach Body Coach: distributors of BeachBody LLC products
Cloud Services Specialist: specializing in always-available technical services
Digital Marketing Specialist: marketing an array of digital services to customers

 

An interesting irony:  a number of these new careers will continue to experience huge growth as the testing frenzy continues.  The data YOUR KIDS produce through inappropriate testing are inherent to their job growth.

Let that sink in.

But what can we expect from a site so wrapped up in test scores that they are STILL misrepresenting Tennessee’s ginormous ACT “gains”?  They even point out that “most state averages on ACT only increase 0.1 points in a year…” WHICH IS EXACTLY WHAT WE GAINED, after the last two years’ losses are figured. (Remember, three-year averages are used to show growth.)

So kids, forget recess.  You’ve got calculus homework to finish.

Parents of infants and toddlers might want to order some of the items below. Everyone else is already behind.

     

Essay Test Scoring

The more we find out about the scoring of essay tests, the uglier it gets…

“After three years working as a scorer, Dan DiMaggio says he’s a skimming machine. ‘It’s ugly,’ he says. ‘You just go as fast as possible.’”

 

“Eventually, DiMaggio got used to not asking questions. He got used to skimming the essays as fast as possible, glancing over the responses for about two minutes apiece before clicking a score.”

Read the full article here.

Teachers Opt Own Children Out of Testing

“Some states have laws and policies that allow parents to opt their children out.  Tennessee does not.  Yet…  

There is currently a Bill in the Legislature that, if it passes, would allow parents to legally Opt-Out of testing for their children without penalties (HB 1841 / SB 2221).  The Bill’s sponsor, Rep. Gloria Johnson, is also a teacher.

 

(Unfortunately, a half a BILLION dollar fiscal note has been attached to the bill, and the bill has been rolled to the final calendar to prevent it from passing.  Contact Governor Haslam if you’re not happy about that.  His phone # is 615-741-2001 and his email is: bill.haslam@tn.gov).”

To read about opting Tennessee students out of testing, please click this link.

 

More Money for Pearson & Gates – Since 2011

If you still think “Common Core” standards were developed by anyone in the education field, or that the standards are a new idea, you might want to look at this press release from April 27, 2011:

“NEW YORK–The Pearson Foundation today announced a partnership with the Bill & Melinda Gates Foundation to support America’s teachers by creating a full series of digital instructional resources. Online courses in math and reading/English language arts will offer a coherent and systemic approach to teaching the new Common Core State Standards. Common Core Standards were developed by the National Governors Association, in partnership with the Council of Chief State School Officers. Forty-one states, two territories, and the District of Columbia have adopted the standards.”
