Textual analysis for better risk management
Most banks use credit-rating models to help them make decisions about lending to companies. Such models are indeed a requirement for banks using Basel II’s internal-ratings-based approach. But these models often have significant shortcomings. First, they are frequently backward-looking. Second, they rely on borrowers’ formal financial reporting, which means that data are always at least 6 months old; toward the end of the fiscal year, data are nearly 18 months old. Third, qualitative assessments of borrowers are often simplistic. And finally, many banks rely on their credit-rating models to provide both a current snapshot and a longer-term view, with the result that they do neither well.
Textual information can help banks overcome some of these challenges and improve their credit-risk assessment, in particular their approach to qualitative assessment. This information includes professionally produced content such as analysts’ reports and business journalism, as well as informal texts such as blogs and posts on social networks. Compared with the financial information available about small and midsize enterprises (SMEs) or corporates, the amount of textual content about companies is immense and provides a wealth of information. News articles describe the latest developments at companies; analysts’ reports provide insightful analyses of companies’ strategies, competitive positioning, and outlook; product ratings on online-shopping sites provide unfiltered views of customer satisfaction; and microblogs such as Twitter distribute the latest news (and sometimes gossip) with unprecedented speed.
Enormous quantities of textual information are available. This information offers banks a deep look at companies’ health and performance, but it is notoriously difficult to use. We have developed a prototype model that can identify and quantify sentiment within a trove of textual information. Across a wide range of performance measures, it has performed extremely well in pinpointing default risks at a very early stage. We argue that if banks can put even a portion of this information to use in their systems, the accuracy, timeliness, and forward-looking character of their credit-risk-assessment systems would all improve. Textual analysis can also help banks in other areas, for example by improving their traditional analyses of industries and sectors.
Challenges of textual data
The challenges of mining this information and separating the signal from the noise are substantial. To use textual data, banks must first face a practical challenge: computational capacity. The amount of text-based information available is already enormous, and it’s getting bigger. Banks’ computers would strain to read and analyze it in daily operations or even to batch process it for model development. A database with news articles on about 1,000 companies easily exceeds 20 GB, orders of magnitude more than a financial database on these companies. Storing this much data is not difficult, but any kind of statistical analysis becomes an “overnight job,” even with optimized algorithms and systems.
Second, textual data are unstructured. While it is relatively easy to analyze financial data in a statistical way—figures become meaningful at a certain size and in relation to sample averages—texts are a priori meaningless to a computer. There are no standard or statistical procedures for a machine to analyze and interpret texts.
Third, texts are often ambiguous. In particular, the meaning of short messages in social media is difficult to interpret, even for humans. Computers can be taught to parse complicated sentence structures, but recognizing sarcasm or irony is extremely difficult for them. In fact, almost all the semantic subtleties of written language pose immense problems for machines.
Companies in other sectors have already begun to employ a new technique, sentiment analysis, that banks can also use to get around these obstacles. The basic idea is simple and elegant: textual information in any form (words, sentences, paragraphs, articles, or books) is assigned a “sentiment index”—a number that represents a kind and degree of opinion expressed by the writer, such as optimism, trust, skepticism, mistrust, pessimism, and so on. Gauging sentiment with an index makes it possible for machines to analyze the information; it can be converted, aggregated, and compared. And the index can be used with statistical analysis to build prediction models. Obviously, the difficulties come in the details of assigning the sentiment index.
At the core of the process is a lexicon that lists words or phrases that represent a certain kind of sentiment and, importantly, reflect the specific context in which the text appears. Phrases that a novel’s heroine might use to describe her pessimism might not be the same as those—such as “legal proceedings,” “owner disputes,” or “financing problems”—that indicate financial concerns about companies. The lexicon must be contextualized appropriately if it is to have the kind of accuracy banks need.1
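The following is a minimal sketch of lexicon-based scoring that makes the idea of a sentiment index concrete. It is not the prototype model described here.

- **Lexicon:** it assumes a tiny, hypothetical credit-context lexicon. The three negative phrases come from the paragraph above; the positive phrases are invented for illustration.
- **Scoring rule:** also an assumption. The rule is (positive hits minus negative hits) divided by all hits, which gives a number between -1 and 1.

A production lexicon would be far larger and statistically validated.

```python
import re

# Hypothetical credit-context lexicon (phrase -> polarity). The negative
# phrases come from the text; the positive ones are illustrative only.
CREDIT_LEXICON = {
    "legal proceedings": -1,
    "owner disputes": -1,
    "financing problems": -1,
    "record profit": 1,
    "new contract": 1,
}


def count_phrase(text: str, phrase: str) -> int:
    """Count whole-phrase occurrences of `phrase` in `text`, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def sentiment_index(text: str, lexicon: dict[str, int] = CREDIT_LEXICON) -> float:
    """Return (positive - negative) / (positive + negative), a value in [-1, 1].

    Returns 0.0 when no lexicon phrase occurs (no signal).
    """
    pos = neg = 0
    for phrase, polarity in lexicon.items():
        hits = count_phrase(text, phrase)
        if polarity > 0:
            pos += hits
        else:
            neg += hits
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


article = ("The supplier reported financing problems and faces legal proceedings, "
           "although it also signed a new contract.")
print(sentiment_index(article))
```

For the sample sentence, the sketch finds two negative phrases and one positive phrase, so the index is (1 - 2) / 3, roughly -0.33.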
Just as important as properly defining the lexicon is selecting and filtering data sources. A broad search will yield more potentially relevant articles to be analyzed, but it will also pull in much irrelevant material. That poses a problem when seeking information about companies with ambiguous or generic names. When stories about Berkshire Hathaway, the US multinational conglomerate, are wanted, stories about Anne Hathaway, the US film actress, are not, as one hedge fund found out to its cost. With smart search tags and additional text filters, these challenges can be overcome.
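The filtering problem can be illustrated with a short sketch that compares a naive surname search with a filter that requires a company alias and rejects known homonyms. This is only a sketch: the alias and exclusion lists are hypothetical, and real "smart search tags" would be maintained per company.

```python
import re


def mentions(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def naive_filter(text: str) -> bool:
    # A bare surname search: matches the company and the actress alike.
    return mentions(text, "Hathaway")


def smart_filter(text: str,
                 aliases: tuple[str, ...] = ("Berkshire Hathaway",),
                 exclusions: tuple[str, ...] = ("Anne Hathaway",)) -> bool:
    """Keep a story only if it names the company and contains no exclusion term.

    Both lists are hypothetical examples; a real system would hold tickers,
    subsidiaries, and known homonyms for each company.
    """
    if not any(mentions(text, a) for a in aliases):
        return False
    return not any(mentions(text, e) for e in exclusions)


stories = [
    "Anne Hathaway wins praise for her new film",
    "Berkshire Hathaway raises its stake in an insurer",
]
print([naive_filter(s) for s in stories])  # [True, True]: both stories match
print([smart_filter(s) for s in stories])  # [False, True]: only the company story
```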
Applications and benefits
Sentiment analysis and the information it yields can improve banks’ credit-rating models. They can also help with two other important tasks: building early-warning systems and managing portfolios.
In rating models, banks can use the sentiment index as an additional rating factor. Information gleaned from text searches is aggregated quarterly into a sentiment index for each company. After statistical analysis, the index is integrated into the rating system at an appropriate weight. This can be particularly valuable in assessing new corporate customers, for which banks typically have only limited information, most of it provided by the customer. A systematic screening of public information can reveal important additional insights. In emerging markets, where reliable customer data are scarce, the analysis of textual information can yield insights as well.
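The two steps described above are quarterly aggregation and integration at a weight. They can be sketched as follows. The sketch makes several assumptions:

- Each article already carries a sentiment score in [-1, 1].
- The financial rating score sits on a 0-100 scale.
- The weight of 0.15 is a placeholder. In practice it would come from the statistical analysis the text mentions, for example a fit on historical default data.
- The company name and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. '2013Q1'."""
    return f"{day.year}Q{(day.month - 1) // 3 + 1}"


def quarterly_index(records):
    """Average article-level sentiment per (company, quarter).

    records: iterable of (company, publication_date, sentiment in [-1, 1]).
    """
    buckets = defaultdict(list)
    for company, day, score in records:
        buckets[(company, quarter_of(day))].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def combined_rating_score(financial_score: float, sentiment: float,
                          weight: float = 0.15) -> float:
    """Blend a 0-100 financial score with the sentiment index mapped to 0-100.

    The default weight is a hypothetical placeholder, not a calibrated value.
    """
    sentiment_score = (sentiment + 1.0) * 50.0  # map [-1, 1] onto [0, 100]
    return (1.0 - weight) * financial_score + weight * sentiment_score


records = [
    ("Acme GmbH", date(2013, 1, 15), -0.5),
    ("Acme GmbH", date(2013, 2, 20), -0.1),
    ("Acme GmbH", date(2013, 4, 2), 0.4),
]
index = quarterly_index(records)
print(index)
print(combined_rating_score(70.0, index[("Acme GmbH", "2013Q1")]))
```

A negative quarter pulls the blended score below the purely financial score of 70. The size of that effect is governed entirely by the weight.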
Using textual data to perform sentiment analysis can help banks develop more accurate early-warning systems, improve their credit-rating models, and create better portfolio-management systems. It also has the potential to help banks better understand customer needs, improve customer satisfaction, and ultimately, shape long-term strategy.
70% of Consumers Trust Brand Recommendations From Friends
Seventy percent of consumers trust brand recommendations from friends, but only 10% trust advertising, according to a new report from Forrester Research.
The study, based on 58,000 responses, also found that 46% of consumers trust consumer reviews and 9% trust text messages from brands. The findings come after at least one Facebook partner affirmed that the social network’s Sponsored Stories (which are based on friend recommendations on behalf of a brand) are more effective than standard banner ads.
Forrester’s report advocates branded content, which analyst Tracy Stokes writes “has the ability to create brand differentiating by bridging the gap between TV’s emotive power and digital media’s efficient reach.” Stokes views branded content as a “pull” model vs. advertising’s traditional “push” approach. Forrester defines branded content as:
Content that is developed or curated by a brand to provide added consumer value such as entertainment or education. It is designed to build brand consideration or affinity, not sell a product or service. It is not a paid ad, sponsorship or product placement.
According to Stokes’ research, the need is more acute in Europe, where consumers are generally more skeptical about online ads and messages from brands.
Marketers appear to have gotten the message. The report states that 79% of brands say their organizations are shifting toward branded content. The problem, however, is that branded content is far from tried-and-true as a strategy. Writes Stokes: “For every Oreo or Old Spice there are hundreds of unseen messages and videos.”
Not all Predictive Analytic tools are born equal
Let’s return to the same example as in the previous section about “CRM tools” in general. Let’s assume that you want to run a direct-mail marketing campaign for one of your products. You must find all the customers who are likely to buy your product and send them a brochure or leaflet. This is a classical “propensity to buy” setting, but the same reasoning applies to any other setting: cross-selling, upselling, probability of default, etc.
The main objective of any analytic tool is to generate the best (most accurate) ranking, or “list of candidates”. There are two main approaches to generating the ranking: segmentation and prediction. Different software packages use different approaches:
- Segmentation tools: This covers 99% of the available tools. These tools are very easy to create and are, most of the time, only a small component of a larger “Operational CRM tool”.
Some examples: Probance, Miner3D, Webtrends Segments, …
All these “segmentation-based” tools are very inaccurate and always lead to much poorer ROI than Predictive Analytic tools.
- Predictive Analytic tools: These tools are quite intricate to create. They all deliver superior performance compared to simple “Segmentation tools”, but the quality of their results (the quality of the ranking) varies greatly between them.
We now describe these two approaches in a little more detail.
By its very nature, segmentation is a technique well adapted to exploratory work.
In contrast, predictive analytics is discriminative in nature.
Let’s give a small example:
“Segmentation techniques” can be represented like this:
Each candy in the cookie-jar represents a known prospect. The big red cross represents our segmentation: we decided to create 4 segments based on our analysis of our population.
The objective of the exercise here is to create the best ranking: we want to select only the prospects who will buy our product. To create this ranking, we will analyze our population and pay special attention to the people who already have the product (these customers are shown with a Chokotoff label).
There are many different ways to define segments of your population. Segments can be defined:
- Using simple business-rules: for example:
- Segment 1 is composed of the men with age < 30
- Segment 2 is composed of the men with age ≥ 30
- Segment 3 is composed of the women with age < 30
- Segment 4 is composed of the women with age ≥ 30
- Using “advanced analytics”: using tools like Stardust (KMeans clustering, Hierarchical clustering, etc.) or SpadSoft.
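A business-rule segmentation like the one above can be written in a few lines. This is a minimal sketch. The `Prospect` record and its fields are hypothetical. It treats an age of exactly 30 as belonging to the "older" segments, so that every prospect falls into exactly one segment.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    customer_id: int
    gender: str   # "M" or "F" in this toy example
    age: int
    bought: bool  # True if the prospect already has the product


def business_rule_segment(p: Prospect) -> int:
    """Return segment 1-4 from gender and age.

    Age 30 goes to the 'age >= 30' segments, so no prospect is left out.
    """
    under_30 = p.age < 30
    if p.gender == "M":
        return 1 if under_30 else 2
    return 3 if under_30 else 4


prospects = [Prospect(1, "M", 25, False), Prospect(2, "M", 30, True),
             Prospect(3, "F", 29, False), Prospect(4, "F", 45, True)]
print([business_rule_segment(p) for p in prospects])  # [1, 2, 3, 4]
```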
Most of the time, the segments inside your population were created “by a very smart guy a few years ago”. In the above example, the criterion used to create the 4 segments is the position of the candy in the cookie-jar.
Let’s now create a “ranking” based on our segmentation. A ranking is simply an ordered list that contains all your prospects, sorted from the one with the highest probability of purchase to the one with the lowest.
We can easily compute the “probability of purchase” of the different customers inside a specific segment: it’s the percentage of buyers inside the segment.
To create our ranking, we will order our segments from the “best” one (the one with the highest number of buyers in percent) to the “worst” one (the one with the lowest number of buyers in percent).
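The segment-based ranking described above works in three steps:

1. Compute each segment's buyer rate.
2. Give every customer the rate of their segment.
3. Sort by that rate.

The sketch below is one way to do this. Customers are assumed to be `(customer_id, segment, bought)` tuples. All customers in a segment share one score, so their order within the segment carries no information (Python's stable sort simply keeps input order).

```python
from collections import Counter


def rank_by_segment(customers):
    """Order customers by the purchase rate of their segment.

    customers: list of (customer_id, segment, bought) tuples.
    Returns (ranked_ids, rate_per_segment).
    """
    size = Counter(seg for _, seg, _ in customers)
    buyers = Counter(seg for _, seg, bought in customers if bought)
    # "Probability of purchase" of a segment = share of buyers inside it.
    rate = {seg: buyers[seg] / size[seg] for seg in size}
    ranked = sorted(customers, key=lambda c: rate[c[1]], reverse=True)
    return [cid for cid, _, _ in ranked], rate


customers = [(1, "A", True), (2, "A", False),
             (3, "B", True), (4, "B", True), (5, "B", False),
             (6, "C", False)]
ranked_ids, rate = rank_by_segment(customers)
print(ranked_ids)  # [3, 4, 5, 1, 2, 6]
```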
Here is an illustration of the ranking:
The same ranking can also be illustrated in this way (the “Y” axis is now the cumulative number of buyers found):
The blue curve in the above chart is called the “Lift curve”. The lift curve allows you to “see” the quality of your ranking. Different “Analytical CRM tools” produce different lift curves. The lift curve directly translates into ROI! The TIMi suite includes a unique tool that directly estimates, based on the lift curve, the ROI (in euros or dollars) of your marketing campaign. A good “Analytical CRM” tool places the buyers at the top of the ranking and therefore generates a “high” lift curve. The higher the lift curve, the better your ranking and the higher your ROI.
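In its simplest form, the curve described above is a running count of buyers as you walk down the ranking. It is often also called a cumulative gains curve. The minimal sketch below assumes the ranking is given as a list of "bought" flags, best-ranked prospect first.

```python
from itertools import accumulate


def lift_curve(ranked_bought: list[bool]) -> list[int]:
    """Cumulative number of buyers found after contacting the top k prospects.

    Element k-1 of the result is the number of buyers among the first k.
    """
    return list(accumulate(int(b) for b in ranked_bought))


good_ranking = [True, True, False, False, False]
poor_ranking = [False, True, False, False, True]
print(lift_curve(good_ranking))  # [1, 2, 2, 2, 2]
print(lift_curve(poor_ranking))  # [0, 1, 1, 1, 2]
```

Both rankings eventually find the same two buyers. The better ranking finds them earlier, so its curve lies at or above the other one at every point.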
“Predictive techniques” can be represented like this:
The large bold red circle represents our predictive model. The objective of this predictive model (in technical terms: the “target”) is to find all the people that have bought your product.
This predictive model makes 4 errors:
- Two blue candies and one yellow candy are classified as “customers currently having the product”. These errors are interesting: they represent very good “leads” (i.e. customers who do NOT have your product yet but are very likely to purchase it).
- One customer is classified as somebody who did not buy your product but, in reality, bought it.
Predictive models built with TIMi also give you the exact probability of purchase of each customer. In the above example, the prospects inside the thin red curve have the highest probability of purchase.
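To show how a predictive model yields a purchase probability per customer, here is a sketch using plain logistic regression fitted by batch gradient descent, in pure Python. Logistic regression is a generic textbook technique swapped in for illustration; the text does not say how TIMi builds its models. The single feature, the training data, and the prospect names are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic (log-loss) objective, no regularization."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: p(buy) - actual label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def purchase_probability(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical data: one feature (say, related products already held)
# and whether the customer bought the product.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)

prospects = {"p1": [0.5], "p2": [4.5], "p3": [2.5]}
ranking = sorted(prospects,
                 key=lambda k: purchase_probability(prospects[k], w, b),
                 reverse=True)
print(ranking)  # ['p2', 'p3', 'p1']
```

Unlike the segment-based ranking, every prospect gets an individual probability. The ranking can therefore separate customers who would have shared a segment.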
The quality of the ranking obtained through predictive technique is visible on the lift curve:
Figure: Comparison (in terms of ROI) of segmentation-based and predictive rankings
In a classical lift curve, the horizontal (X) axis is expressed in percent: it’s the percentage of the population selected. The vertical (Y) axis is also expressed in percent: it’s the percentage of the buyers found.
Let’s plot the 2 lift curves (the one obtained from the segmentation and the one obtained from the predictive model) on the same chart (the X and Y axis are now in percent):
In the above chart, there is a yellow line that represents “random selection”. For example, if you randomly select 50% of your population, you will “find” about 50% of your buyers, so the yellow line goes through the point (50%; 50%). The “random selection line” is the baseline (a pure random selection) that any useful ranking must beat.
In the above chart, the lift curve that characterizes the ranking obtained through predictive analytics (the red one) is higher than the one obtained with the segmentation technique (the blue one). This is always the case. What does this mean?
- Predictive models create better rankings: in this example, the ranking obtained through predictive analytics will typically generate 20% more cash than the ranking obtained through the segmentation technique.
- The predictive model is able to extract the “right people” from your database: it selected exactly those who are interested in buying your product (and very few others).
- Depending on the context, a difference of only 2 or 3 percent on the lift between two rankings could mean millions of euros (or dollars) of difference between the corresponding marketing campaigns. This is especially true in banking, telecommunications and insurance. In these fields, a few added percent on the lift curve translate directly into hundreds of thousands of euros of added return on your marketing campaigns. You NEED to have the best lift; otherwise, you are losing money with every marketing action.
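A short calculation shows why a few points on the lift curve can be worth so much. Every figure in this sketch is hypothetical:

- 1,000,000 prospects, of whom 20,000 would buy.
- 10% of the population is contacted, at 1 euro per contact.
- 500 euros of margin per buyer.
- Two rankings that find 30% and 33% of the buyers at that depth.

```python
def campaign_profit(population: int, selected_pct: float, total_buyers: int,
                    buyers_found_pct: float, cost_per_contact: float,
                    margin_per_buyer: float) -> float:
    """Margin earned on the buyers reached minus the cost of every contact made."""
    contacts = population * selected_pct / 100
    buyers_reached = total_buyers * buyers_found_pct / 100
    return buyers_reached * margin_per_buyer - contacts * cost_per_contact


# Hypothetical campaign parameters (see the lead-in above).
base = campaign_profit(1_000_000, 10, 20_000, 30, 1.0, 500.0)
better = campaign_profit(1_000_000, 10, 20_000, 33, 1.0, 500.0)
print(base, better, better - base)  # 2900000.0 3200000.0 300000.0
```

The contact cost is identical for both rankings. The whole difference of 300,000 euros comes from the 600 extra buyers reached by the better ranking.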
The lift curves obtained with TIMi are systematically better than those obtained with any other commercially available analytical CRM software. An improvement of 10% to 20% at X=10% is very common, which means an added ROI of 10% to 20% when using TIMi instead of another analytical CRM tool.
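Comparisons "at X=10%" read the percent-based lift curve at one point. They usually set the result against the random-selection line, which finds x% of the buyers when x% of the population is selected. The sketch below computes both quantities. It assumes the ranking is a list of "bought" flags, best first, with at least one buyer. The 10-prospect ranking is a made-up example.

```python
def buyers_found_pct(ranked_bought: list[bool], selected_pct: float) -> float:
    """Percent of all buyers found when the top selected_pct% of the ranking is contacted."""
    total = sum(ranked_bought)
    if total == 0:
        raise ValueError("the ranking contains no buyers")
    k = round(len(ranked_bought) * selected_pct / 100)
    return 100.0 * sum(ranked_bought[:k]) / total


def lift_over_random(ranked_bought: list[bool], selected_pct: float) -> float:
    """How many times more buyers the ranking finds than a random selection would."""
    return buyers_found_pct(ranked_bought, selected_pct) / selected_pct


ranking = [True, False, False, True, False, False, False, False, False, False]
print(buyers_found_pct(ranking, 10))  # 50.0
print(lift_over_random(ranking, 10))  # 5.0
```

Here, selecting the top 10% of prospects finds half of the buyers, five times what a random 10% selection would find.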
Why are there so many people still using segmentation techniques to create their ranking?
The answer is:
- Creating a predictive model used to be extremely difficult: very often, you had to hire expensive “specialized consultants” for 2 or 3 months to obtain a medium-accuracy predictive model (and you had to wait even longer if they were using SAS). With TIMi, anyone can now create extremely accurate rankings/lifts in a few mouse clicks, in a few minutes. This is a revolution.
- Standard “predictive software” (like SAS, SPSS, …) has enormous difficulty analyzing huge databases: it can run for hours without giving any results. Whatever the size of your database, TIMi always gives an extremely accurate ranking in a few minutes.
- Usually, software packages that can create correct predictive models are (very!) expensive (SAS, SPSS, …). To get the same functionality as the “basic TIMi package”, it’s very common to pay between 170,000 and 240,000 euros per computer, per year. TIMi costs 24,000 euros per 4 computers per year. This is a revolution.
- To create correct predictive models with other tools, you first need to “clean” your databases to remove all errors (like “negative ages”). This process is usually extremely time-consuming (and expensive), so people generally avoid predictive modelling. In contrast, TIMi completely removes the need for “cleaning” and allows you to analyze “RAW data” directly. TIMi can connect directly to your operational system and instantly give you highly accurate rankings & lifts. This is a revolution!
Here are some lift curves (automatically generated with TIMi) that illustrate the quality of different rankings:
Key marketing trends for 2013
As more companies take advantage of digital platforms and new ways of reaching consumers, marketers have predicted a number of trends that should be important for businesses this year:
Insight selling - Offering lifestyle information about a product that is educational and informative instead of relying on hard sell.
Content marketing - Interactive or digital material that influences a consumer’s choice during the research and decision-making process.
Thought leadership - Creating the perception that the company is innovative and an expert in its field.
Digital analytics - Using the trail of information consumers create online to shape appealing content and inform insight selling.
Invest in Your Customers More Than Your Brand
To appreciate how broken most contemporary models of advertising and promotion have become, listen to Jeff Bezos complain about how Amazon’s core values are misunderstood. “One of the early examples…was customer reviews,” he recalls. “One [critic] wrote to me and said, ‘You don’t understand your business. You make money when you sell things. Why do you allow these negative customer reviews?’ And when I read that letter, I thought, we don’t make money when we sell things. We make money when we help customers make purchase decisions.”
Exactly. The overwhelming majority of advertising/promotion/marketing/branding investments and expenditures most organizations make today are more about “selling things” than “helping customers.” What do you think customers find more appealing? Amazon invests accordingly. Customers aren’t idiots; they know when they’re being sold. They’re both smart and wired enough to seek out — and appreciate — quality assistance.
Consider Amazon’s recommendation engines. They’re “membrains” interfacing advice and influence. They are advisory in recommending reasonable and relevant options, and influential in basing those options on the choices of people with comparable interests. Bezos’ recommenders are predicated on, and dedicated to, the proposition that providing meaningful contexts for customers makes purchasing decisions easier, safer and better. Shoppers are but a click away from learning more about their potential buy. That’s compelling. Recommendation engines and reviews have both proven remarkably (cost-)effective sales, marketing and promotional media for Amazon.
The secret of their success, of course, is that they don’t sell. That insight’s neither counter-intuitive nor paradoxical. It reflects the marketplace reality that customers can easily discover everything about your products and services that you don’t want them to, whether it’s true or not. Consequently, the Bezos bet is that relevant recommendations and reviews (good advice) are better brand investments than digital sales pitches. Close the deal by being openly helpful and helpfully open, not by “selling better.” Amazon transformed customer behaviors and expectations by consistently favoring innovative “advice” over sales-oriented “advertising” and promotion. Credibility comes from a commitment to facilitating decisions, not from calculated persuasion.
That’s brand building’s digitally-mediated future. In mobile and tabletized environments, “advertising” increasingly gives way to “advice” and “aducation” — genres that effectively and affectively persuade because they authentically try to do and be more than sales gimmicks. Digital technologies push firms to recognize, rethink and reorganize how they should make their customers smarter and more confident. Turning customers into bargain hunters, after all, doesn’t necessarily make them smarter; it teaches them to pay more attention to the price they pay than to the value they get.
The advice/aducation marketing challenge comes from redefining advertising as an investment that makes your customers more valuable to you, not just an investment that makes your brand more valuable to your customers. Amazon innovatively reinvests with that philosophy and that’s how — and why — it’s successfully redefining retail. Sales don’t drive the UX; they’re its happy byproduct. That digital design sensibility has yet to seep into marketing’s mainstream.
Like retail, advertising and promotion are living through their own version of the showrooming phenomenon. But rather than furtively (or brazenly) price check on one’s mobile in the retail aisle, customers treat typical ads, offers and “calls to action” as yet another piece of data to input into instant search. Precisely because it’s an ad or a coupon, it can’t be trusted. Ironically, that’s digital advertising’s “brand.” Everything worth buying is checkable, Yelpable or Amazonable. Crassly put, advertising becomes less about building brand awareness than triggering digital due diligence.
Where showrooming hollows out traditional retailing’s pricing and promotional strategies, “adzooming” similarly undermines brand narratives and advertising claims. The marketing implications here are infectious, viral and potentially deadly. How receptive will customers, clients and prospects be to hard sell/soft sell advertising and promotions online (and elsewhere) when they’ve been digitally trained and empowered to look for and receive quality recommendations and advice?
The answer to this question is not, “Gee, we need better advertising and promotion!” It’s “organizations need something better than advertising and promotion.”
The distinctions that make a difference will be value-added aducation and advice. After decades of complaints about the poor quality of its instructions and documentation, for example, Ikea set up a YouTube channel showing people how to easily put together its most complex furniture. The “ad”vice and “ad”ucation here is simple and straightforward: the more comfortable and confident Ikea can make its customers about assembling its products, the simpler and easier it becomes for them to make the buy. If you’re marketing, branding and selling Ikea’s brand future, you’ve got to wonder whether “training” and “education” will play marginal or pivotal roles in (re)engaging customers.
Take a quick look at MyLowe’s and P&G’s Pampers sites. They’re nascent — dare I say “baby”? — steps not just to rethink customer engagement, loyalty and “lock-in,” but whether and how to better educate and train customers. Again, information is a necessary but not sufficient condition here. Textbooks — digital or otherwise — are not educators. Who’s going to be the Salman Khan for a Starbucks, a Ford, a Haier and/or a GlaxoSmithKline?
Financial services firms, health care providers, automobile companies and consumer packaged goods enterprises already understand that adding new features and functionality to products and services now matters far less to their branding efforts than figuring out how to get customers to sample and test them. How are you using digital media to help your best customers and prospects to better educate themselves? How are you making them smarter and more capable? Companies like Amazon, Google, Apple, Ikea and IBM have answers to that question. What’s yours?
As fond as brand advertisers may be of talking lizards, doofus dads and hip hamsters behind the wheel, the cutesy and clever is rapidly decaying into memebait. They’ll command attention but little else. The digital and digitizing future belongs to the best aducators and advisors who make clients, customers and prospects measurably smarter and authentically more confident. That’s a challenge a David Ogilvy, Jay Chiat and Rosser Reeves would appreciate. But my bet is their clients will do a better job of rising to that challenge than their successors.
A Better Way to Measure Consumer Influence
The power of consumers to influence each other when making purchase decisions is touted as one of the biggest benefits of social media in marketing strategies. Consumer reviews, recommendations, referrals and other advice are becoming de rigueur in social-marketing programs.
The conundrum is how to measure consumer influence and what to do with it. This is where many approaches run off the rails.
Many companies, such as Klout and Kred, score individuals on the size of their social networks (typically on Facebook and Twitter) and their participation in them. They then add esoteric ingredients like Wikipedia entries, comment frequency and other social criteria. The idea is to arrive at a magic number for each individual that captures her influence. But does it? We think not.
Metrics used to capture influence must be ones well understood and accepted by marketers — such as product awareness, brand engagement, trial and purchase. But most social-influence metrics use inventions such as Likes that convey little. And, unlike digital programs that measure actions such as open-rates and click-rates, social influence metrics are rarely based on measurements of causal behavior.
Influence depends on context. A young mother likely knows other young mothers and so may influence their decisions on baby products and schools, yet have no influence over their financial choices. A universal influence metric for such an individual doesn’t capture this. Similarly, a high score doesn’t translate into commercial impact. President Obama ranks among the top five Klout scores, but that doesn’t translate into anything meaningful for any product, service, or brand.
Another weakness of current approaches is that they fail to find creative ways to identify, engage and integrate influencers into brand initiatives like promotions, advocacy, market/concept testing and retention programs. Influence and social information get wasted when they are not integrated with the other consumer databases that businesses already track and use.
What’s the alternative?
One way is to integrate social referrals into marketing programs focused on customer acquisition and retention. The impact of social engagement can be precisely measured and attributed against relevant acquisition and conversion metrics. Some market examples:
- AT&T has institutionalized a referral program for all its product suites that awards promotion cards, so influencers can be identified and then targeted.
- Marriott ties referrals from customers into a rewards (loyalty) program, automatically awarding bonus points and letting customers track those points themselves.
- Large CPG companies like Sara Lee, Unilever and others use social referrals to amplify their promotions programs by getting consumers to share coupons — driving additional awareness and conversion.
- ING Brokerage (now Capital One) tracks referrals for its customer acquisition program to study how customers are being influenced.
A big difference is that these approaches measure actual, rather than presumed, consumer influence. They also present opportunities for innovation:
- Influence is measured against actual marketing metrics like impressions, clicks and conversions, rather than trumped-up social metrics.
- Impact is measured against specific program objectives, such as the number of new membership signups.
- Influence scores can be calculated to uncover the biggest advocates.
- Information on the social influence of consumers can be integrated with CRM data (already in brand databases), improving targeting and engagement for future programs.
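One way to turn referral records into influence scores is sketched below. It counts the conversions each referrer actually drove. Because influence depends on context, it scores each referrer separately per product category. This is a minimal sketch under assumptions: the record format `(referrer_id, category, converted)` is hypothetical and stands in for whatever a referral program's tracking actually provides. Ranking by conversions first and conversion rate second is one possible design choice among many.

```python
from collections import defaultdict


def influence_scores(referrals):
    """Score referrers by the conversions they drove, per product category.

    referrals: iterable of (referrer_id, category, converted) records.
    Returns {category: [(referrer_id, conversions, conversion_rate), ...]},
    best referrer first (by conversions, then by conversion rate).
    """
    sent = defaultdict(int)
    converted = defaultdict(int)
    for referrer, category, did_convert in referrals:
        sent[(referrer, category)] += 1
        converted[(referrer, category)] += int(did_convert)

    by_category = defaultdict(list)
    for (referrer, category), n in sent.items():
        c = converted[(referrer, category)]
        by_category[category].append((referrer, c, c / n))
    for rows in by_category.values():
        rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return dict(by_category)


referrals = [("ann", "baby", True), ("ann", "baby", True), ("ann", "finance", False),
             ("bob", "baby", False), ("bob", "finance", True),
             ("bob", "finance", True), ("bob", "finance", False)]
scores = influence_scores(referrals)
print(scores["baby"][0][0], scores["finance"][0][0])  # ann bob
```

In this toy data, "ann" is the top advocate for baby products but has no measurable influence on financial choices. A single universal score would hide exactly that distinction.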
Influence measured through direct observation of advocacy on behalf of a specific brand shows consumers’ true commercial clout.