EA's Qs: Metacritic Harsh Dose of Reality Edition
It looks like Wall Street wasn't as excited by John Riccitiello's Jerry Maguire moment as I was. The stock is down over 10% since the announcement. The daily stock price is not the be-all and end-all, and every CEO will tell you they do not let the stock price dictate corporate decisions, but stock hits make access to capital harder, making things like purchases of Take Two relatively more expensive. The decline is probably more attributable to the losses disclosed on last week's earnings call, or the expiration of the Take Two tender offer, but the guidance of no quarterly guidance coupled with likely delays did not help. I say “likely delays” even though the company didn't; Riccitiello was very specific about products slipping from a quarter but staying in the fiscal year. Spore in the first quarter of 2009 would still be a fiscal 2008 release.
The point I highlighted in the earlier post was the commitment to quality against a backstop of an objectively measurable deliverable. Wall Street loves predictability. The game business hardly has any. EA committed to improve quality and endeavored to provide a measure for improvement. At first blush, to me at least, Metacritic made sense. I always knew the numbers were kind of squishy, but it kind of made sense. Higher critical scores mean higher quality and, like I said above, companies are not managed to stock price. Well. . . they are not managed to daily stock price, but they are managed to maximize shareholder value, or long term stock price. Before we become the only form of entertainment in history to allow critics to influence the creative process, let's consider the source. The Metacritic score is not a valid measure because it is indicative of neither quality nor sales.
On its face, John’s reference to Metacritic seems to make sense. The underlying theory for the site draws upon the wisdom of crowds. A popular anecdote of crowd wisdom is attributed to Sir Francis Galton. In 1906 he held a contest at a county fair to guess the weight of a cow. The guesses of livestock experts varied widely, and none were close. The average of the guesses made by 1,000 people in the crowd came within a single pound. The same experiment has been repeated over and over with jars of jelly beans. The theory is accurate when all data is given equal weight. When it comes to critics, you should be able to put all critics in, good and bad, and the average should be the right score. The right score means the best objective indicator of quality, and quality means sales. Unfortunately, Metacritic injects subjectivity into the equation.
Some people indicate there is a correlation between Metacritic scores and sales. They go so far as to identify a correlation between scores and market value. From my gut, I call BULLSHIT. The timing is way off. Metacritic scores post on the date of release. Sales forecasts and analyst reports come from channel checks at launch. Ask Michael Pachter whether it was the Metacritic score or a call to Wal-Mart that gave him better insight into the success of BioShock. Moreover, orders are placed well before Metacritic scores are calculated and are clearly not influenced by the scores. Orders are influenced by a buyer’s review of the game, the publisher’s marketing commitment to the game and the publisher’s pipeline of future titles. The hundred million dollars in movie marketing money will drive more orders than a 90 for Psychonauts. If the big marketing budget is coupled with a product from the company about to release Call of Duty 4, even better. Large marketing plus a strong publisher pipeline means a large order. If those buyer commitments are big, projected revenue increases and with it, projected marketing, sometimes leading to incremental orders, most of the time leading to stronger consumer awareness and therefore stronger sales. All of these events occur while Metacritic still has an N/A next to the game. In case it is hard to visualize, think of a John Woo choreographed gunfight where the buyer and the sales guy each has a gun to the other's head and a free hand on the other's balls, and the camera starts spinning faster and faster around the scene.
You don't have to believe me, though. Listen to Metacritic’s founder, in an interview earlier this year:
Have you heard of specific instances where a Metacritic score has affected the sales of a game - for better or worse?
Not specifically. Of course friends and users of the site have informed me that they haven't purchased games (or seen movies or bought albums) with low Metascores, but I've never been told by a publisher or developer that they've been able to definitively make a causal connection between poor sales and low scores from my site.
However, at least two major publishers have conducted comprehensive statistical surveys through which they've been able to draw a correlation between high metascores and stronger sales (and vice versa), but with a much tighter correlation in specific genres of games than in others. (emphasis added)
One of the publishers he refers to is Activision, and the study was highlighted in Nick Wingfield’s September 2007 article pointing to Metacritic's correlation to sales:
Activision Chief Executive Robert Kotick says the link was especially notable for games that score above 80% on Game Rankings, which grades games on a 1-to-100 percentage basis, with 100% being a perfect score. For every five percentage points above 80%, Activision found sales of a game roughly doubled. Activision believes game scores, among other factors, can actually influence sales, not just reflect their quality.
Despite its falsity, this meme grew and was embraced by the industry, until Robin Kaminsky, head of Global Brand Management for Activision, repeated the quote and corrected the meme in her DICE talk this year. (The whole thing is up online and you should watch it, it is very good.) She provided the context for the quote. It seems Metacritic scores are one factor in determining sales. She explained that high Metacritic scores, coupled with strong marketing and sell-in, mean high sales. The findings were supported by a breakdown of high scoring products in 2006 and 2007. Two thirds of the 18 products scoring 90 or above sold less than 2 million units, the break even point for a USD 20 million product. Only 2 products would sell in excess of 7 million. The largest grouping of products, 7, would sell less than a million. In case this is not persuasive enough, we can look at the other side of the score box. Until Call of Duty 3, the highest selling Call of Duty was Finest Hour, with over 4 million units sold and a Metacritic score of 76. We can also look to Mario Party 8's score of 62, or Wii Fit's score of 80 for a product which retailers cannot keep on the shelf. Did anyone think a pasty, sofa sitting, 24 hour a day controller holding, dark room sitting, Mountain Dew drinking, talking to d3vi1b0y007 through Xbox Live, breaking only to see Iron Man critic was going to give a fitness title a 90?
If Metacritic worked as objectively as Sir Francis Galton’s analysis of every piece of data, we would quite possibly have an indicator of quality, and therefore an accurate measure of our gaming cow. Sadly, it does not. Metacritic does not include the entire data set, only the sources selected by Doyle:
This overall score, or METASCORE, is a weighted average of the individual critic scores. Why a weighted average? When selecting our source publications, we noticed that some critics consistently write better (more detailed, more insightful, more articulate) reviews than others. In addition, some critics and/or publications typically have more prestige and weight in the industry than others. To reflect these factors, we have assigned weights to each publication (and, in the case of film, to individual critics as well), thus making some publications count more in the METASCORE calculations than others.
I get why he does it, and he likely had the best intentions, but it doesn’t work. The critical view is subjective. Doyle's determination of the critic's value is also subjective. So we are really getting a third generation facsimile of a subjective view of the quality of a title. If you factor in the uncertainty of the gallant, but flawed effort to convert A to F scales to numerical equivalents, Sir Francis Galton would certainly call foul.
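To make the difference between Galton's equal-weight average and a weighted average concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: the outlet names, the scores and especially the weights are invented, and the function names are mine, not anything Metacritic publishes or uses.

```python
def plain_average(scores):
    """Galton-style average: every review counts equally."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, weights):
    """Weighted average: each review counts in proportion to its weight."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical review scores on a 0-100 scale (not real data).
scores = {"outlet_a": 90, "outlet_b": 80, "outlet_c": 40}

# Hypothetical weights: whoever picks these decides that outlet_c,
# the low outlier, only counts half as much as the others.
weights = {"outlet_a": 1.0, "outlet_b": 1.0, "outlet_c": 0.5}

print(plain_average(scores))              # 70.0
print(weighted_average(scores, weights))  # 76.0
```

Same three reviews, six points of difference, and the only thing that changed is one person's opinion of how much one outlet should count.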
Doyle justifies elimination or moderation as a way to guarantee reviews from the best reviewers. Even this doesn't really make sense. If you really think about it, the most likely consumer of a review is the uninformed, mainstream buyer. The people who bought GTA IV at midnight knew they were going to buy it and knew it was coming out. The person who buys only one game a year may consider the same factors as the Entertainment Weekly or Variety reviewer in their purchasing decision. By limiting the reviews to hardcore gamers, we are further restricting accessibility to the market and putting one more lock on the door to our mother’s basement where we all sit and play games. Worse still, once Doyle injects himself, the service becomes an observational study of game scores and not Galton’s objective measure. It should be noted that Doyle never said it wasn’t, but people who utilize the data must understand they are seeing his analysis of market data, not an objective measure.
As pointed out by Scott Miller in his book Developmental Research Methods:
A . . . general problem [of observational studies] is observer bias. Expectations researchers bring to research can sometimes bias their results, moving outcomes in the direction of what was expected or desired. In observational study the danger is that observers may see and record what they expect to occur, rather than what actually happens.
A study by Kent, O'Leary, Diament and Dietz (1974) provides an example. . . . The findings of the Kent et al. study suggests one way to reduce the probability of observer bias: Make the scoring categories as specific and objective as possible. The greater the leeway for interpretation in scoring, the greater the opportunity for the observer to inject their own biases.
Metacritic not only injects its own interpretation for the scoring, it determines the very field from which it will draw. Do you think there is any conscious or unconscious bias prior to a game's release? Is the next GTA going to be good? Is the next movie based game going to be bad? If I know GTA is going to sell a billion, do I consider whether a higher Metacritic score will influence Take Two to use it in their ads, thereby elevating my brand? How about if it gets a perfect score and the media, which loves measures and lists, uses my score as a new angle on the game? I am not saying any of this happened, but we can certainly see the potential. Of course, the scores are kept within a margin by the market. If a bad game scores too highly, the site will lose credibility. But if the 1UP score in the 20s is ignored, a game could earn a few more points and the site maintains credibility. The value of being embraced by the CEO of the number one publisher in the industry as a standard? Priceless.
The observational influence is not limited to the observer. The observed are influenced as well.
The behaviors recorded in an observational study may be a function of any number of antecedent or contemporary factors. One factor we do not wish to have influence the behavior, however, is the mere presence of the observer. Yet the presence of the observer, and the concomitant knowledge that one is being observed may alter behavior in various ways.
Post Metacritic reviews seem to have more outliers. Critics know they will get attention if they give a shockingly low review of a game. They also know they will get more clicks if they are the lowest review on Metacritic than if they are one of many in the fat part of the bell curve. There was always influence from the publisher's pipeline; now there is additional influence from Metacritic. Critics may be inclined to appeal to Metacritic: the weaker the pipeline, the stronger the Metacritic influence. If the publisher has a strong pipeline, the reviewer not only caters to the publisher, but can gain disproportionate influence by appealing to Metacritic. Being a Metacritic reviewer is like being a Nielsen family, only better because the publishers know who you are. Doyle has said Ben Fritz of Variety is not considered for Metacritic scores. Do you think he is going to get any exclusives?
So, back to the Jerry Maguire moment. I guess we really shouldn’t rely on Metacritic as an indicator of quality or shareholder value. Quality should be measured the old-fashioned way: sales. For this thought, I go back to an interview given by a really smart guy last February:
EA's Riccitiello wants to avoid the trap of just pursuing a good Metacritic score, a mindset he said frequently leads to too much executive meddling.
"The process often gets in the way more than it helps," he said. "That sort of circus has unfortunately sort of defined our company for too long. And it's not a good process.". . .
That's one view, but Riccitiello has another: "You don't cash Metacritic, you cash checks."