In today’s Podcast, we have Gaurav joining us. Gaurav has been part of mission critical teams at two San Francisco startups – LinkedIn Slideshare and Klout. He previously co-founded ThisYaThat, an online book portal for the Indian market, which won the Wharton Entrepreneurship VIP Seed Award and was the first non-US venture to receive an entrepreneurial grant from Wharton’s Innovation Fund.
He has been featured in Hindustan Times, Yahoo Finance, YourStory, and TechCircle. He is also a co-inventor of an application-agnostic user search engine, and most recently he has been building Profillic, a product to fix candidate screening by applying machine learning to the problem of skill validation.
He is finishing up his masters from Columbia University with a focus on computational linguistics and natural language processing. And he is also collaborating with the data services and machine learning team at Google Nest this summer.
An interesting fun fact about Gaurav is that he likes to turn off the microwave at a perfect square number.
In this podcast, we catch up on:
- Linguistics and Growth Hacking
- Scientific Copywriting
- Computational Linguistics and Growth Hacking
- Real world examples of copy hacks
- Popular Language Datasets and Tools of Trade
- Gaurav’s next big project (a top secret project; I had to twist his arm to get him to speak about it)
- Top resources and books on Natural Language Processing
Note: Transcripts may contain grammatical errors.
Dhaval: Welcome to the show, Gaurav! How are you doing today?
Gaurav: I’m doing well. How are you?
Dhaval: I’m doing great as well. I’m here in Toronto at a SaaS entrepreneurship event. It’s going to be fun, but I’m glad we got a chance to catch up here with you. So, I heard a lot of great things about you, Gaurav. The first time I heard you speak was at a customer acquisition conference, at the Growth Hacks event. And just to bring our audience up to speed on the topic, you were talking about how language plays such an important role in growing a business. Can you bring our audience up to speed on what kind of role language plays in helping business and marketing growth, growth hacking, and all of that good stuff?
Gaurav: Sure. So linguistics, which is what I study, is the study of human language and everything about it: structure, syntax, semantics. It is the study of language form, meaning, and context. So, speaking about products and services, the two main things you can change about the user’s direct interaction with your product are design and copy.
Design, of course, is how things feel and look visually, virtually, tangibly to do whatever the product is intended to do. But copy is any sort of language or text related to a product. It can be written on a website, in an email, in product descriptions online or offline, in any design, in any ad. Any sort of language: that is what copy is about.
Now, Noam Chomsky, who is a professor at MIT and who is also very well-known in the field of linguistics and psychology, deals with the intersection of linguistics and culture: how linguistics has had an impact on psychology and how language influences the way people see the world. It shapes the way they think and, by extension, also what kind of things they use and what they associate themselves with.
Experiment Engine and a few other places have really good numbers on this, with copy and design being the two things that people can touch to grow their business besides the product itself.
Design tests are used more often than copy tests. But in fact, the impact of copy-based tests is twice as much as that of design tests. So, just changing the words in your messaging, how you reach out to people, and how you brand yourself can have a huge impact. But people tend to underestimate the value of this.
So this is what I’m going to talk about today: the value of paying attention to language in how you position yourself as a brand, and your products and services, for people.
Dhaval: Very cool. I did not know that about the tests: that for most of them, copy outweighs design by at least two times. That’s pretty awesome. A lot of people, rightfully so, pay a lot of attention to user experience and design, like you mentioned, but we also have to realize that language is a critical part of that user experience, and triggering the right emotions at the right time can make a lot of difference in different scenarios.
So, thanks for pointing it out. I really want to understand a little more since you brought up a couple of legends here – Noam Chomsky and linguistics and culture.
What difference does it make when a brand decides to add a bit of a scientific approach to creative language? Is there even such a thing as taking a scientific approach towards creative language and copywriting, and what difference does it make?
Gaurav: Sure, so the most common examples that come to mind are things like when people describing their products and services mention features as opposed to the benefits that something provides to people. Social proof is another thing, where if someone sees that other people are using the same product, they’re more likely to associate themselves with it. And going even deeper, there is how specific words and specific phrasing can have an impact on how people view a brand and how much they say they’re willing to pay for something.
There’s a really good piece of work by Daniel Jurafsky, who’s a professor at Stanford in the field of natural language processing. His work on the language of food talks about how, essentially, even for the same food in the same conditions, if you use certain words that sound fancier or more sophisticated and refined, people are willing to pay more because they see themselves and their personality as tied better to those things, versus food that seems like it’s fast food, or made quickly, or just something that is more a convenience than a luxury item.
Those are some of the examples that definitely come to mind. The other aspect of having a scientific approach to the copy you use is that, if you’re not exactly measuring or going about it with a testing approach, you’re just basically trying different things and hoping something works or something sticks, which is commonly known as a spray and pray approach. So essentially, you can only improve what you can measure. Adding in a basic level of a scientific approach, or A/B testing, can effectively make you see what words resonate well, and what kind of phrasing and what kind of messaging works well with the people or audience you’re going to target.
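The measuring step behind an A/B test of two copy variants can be sketched in Python. This is a minimal illustration and not something from the conversation: it assumes a standard two-proportion z-test, and the function name and the visitor and sign-up counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two_sided_p) for the difference between two conversion rates.

    Assumes both samples are large enough for the normal approximation and
    that the pooled rate is strictly between 0 and 1.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: copy A got 120 sign-ups from 2400 visitors,
# copy B got 168 sign-ups from 2400 visitors.
z, p = two_proportion_z_test(120, 2400, 168, 2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two wordings is unlikely to be noise; a large one means the test has not yet shown that either copy is better.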
Dhaval: Yeah, that’s a great reference there, Dan Jurafsky. I’m gonna check out the book that you’re referring to, or his course. One thing that comes to mind, Gaurav, with taking this more scientific approach towards copywriting: I’m wondering if you’re aware of any products or services that are actually taking this whole natural language processing field, applying it to their copywriting, and doing it very well. Are you aware of any companies or products that are doing that?
Gaurav: Sure. So, there are plenty of companies that are doing that, and we’ll definitely talk about them. However, the things that stick out the most, that are not, say, super technical or that you wouldn’t even imagine, are two such examples. One was in fact Obama’s 2012 campaign, with its use of a scientific approach to the wording in the campaign messages and emails. They tried two versions of the unsubscribe link in Obama’s emails: one simply said, “Unsubscribe,” and one said, “If you’d like to unsubscribe from these messages, click here.”
The added length of the sentence and the cognitive overhead influenced those reading the sentence, making sure that they knew what they were doing before they clicked through. That basically yielded them a 20% drop in unsubscribes, which is not insignificant; that’s a decent number.
The other big one was a study that people ran on the donation process of the American Cancer Society. One of the wordings was simply, “Would you be willing to help by giving a donation?” In the other, which they were testing against it, they added at the end of the sentence, “Every penny will help.” And they saw that when you add minimum parameters to the phrasing that people are reading, there was up to a 2x improvement in people actually donating. This all boils down to essentially the same thing, which is that they reduced the cognitive overhead on people’s minds. People didn’t have to think, “How much should I donate as a decent amount?” They saw that even a single penny would help. That reduces the thought process and makes it easier for them to act on such offers and take an actionable step forward.
Dhaval: I love it. The reduction of cognitive load results in a reduction of unsubscribes, that’s good stuff right there. A lot of times those little things towards the end of the email make a lot of difference. And thanks for sharing that.
Gaurav: Absolutely, it’s often not intuitive, for sure.
Dhaval: Yeah. Yeah. The thing that I wanna go back to is, your background which is… you have a pretty good technical background, but you also understand human psychology and the whole behavioral psychology aspect of the marketing, which brings me to my next question which is the background that you have with computational linguistics and natural language processing, how does that… how do you apply that beyond your traditional copywriting skills that you already possess? How do you apply that towards growth hacking?
Gaurav: Sure. So, the main thing to keep in mind here is that you’re trying to develop user empathy. As in, you’re trying to see how the target audience that you’re trying to get to, the people who use whatever you’re selling, talk, how they see themselves, and how they are likely to use your product given the way you position yourself.
The technical tools that are often used for this, given that I come from a technical background, are these: Python is really useful, both for generally coding stuff as well as for web scrapers, as in getting textual data from, say, blog posts or a variety of other sources, and it has really good libraries. The famous one is NLTK (the Natural Language Toolkit), which is in Python, but equivalents are available in different languages as well. There is Word2Vec, which produces word embeddings that capture what kinds of words are related to each other, such as synonyms and similar words. You can do all this word stuff, word algebra, like king minus man plus woman will give you queen.
This is the sort of interesting math stuff, or data science stuff if you will, that you can do with language.
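The word algebra mentioned above can be illustrated with a few lines of Python. This is only a sketch: the three-dimensional vectors below are made up by hand for illustration (real Word2Vec embeddings are learned from large corpora and have hundreds of dimensions), and the helper names are hypothetical.

```python
import math

# Hypothetical toy embeddings; dimensions are loosely (royalty, gender, food).
TOY_VECTORS = {
    "king":  [0.9,  0.8, 0.1],
    "queen": [0.9, -0.8, 0.1],
    "man":   [0.1,  0.9, 0.0],
    "woman": [0.1, -0.9, 0.0],
    "apple": [0.0,  0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in {a, b, c}]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))


print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

With trained embeddings the same arithmetic is applied to the learned vectors, and the nearest remaining word is reported as the answer.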
And then, famous data sets that give information specifically on English documents are the Brown Corpus and the Penn Treebank. They have done a lot of text mining on a variety of sources and actually done a lot of the heavy lifting for people: they have annotated words, they have part-of-speech tags, so you can understand textual data way better than if you were to do it manually.
Now, if you use these tool sets and you wanna understand your users, some things could be as interesting as looking at social data and seeing what people are saying about your competitors: the good points, the bad points, how people view the general space. Look at Twitter data, look at data from Amazon book reviews, look at data from Quora, and then mine those to see what kind of keywords and what kind of topics occur most commonly, which of those have a positive sentiment, which have a negative sentiment, and which of those are most unique and often not used in the brand’s messaging.
That would be a clever way to hone in on what people are trying to say, what people like, and also things that brands are not noticing, especially if they’re not as data-driven as they’re trying to be. So, do a word cloud analysis of the competitive landscape, and also try to see where you wanna put your brand personality.
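The keyword mining described above can be sketched roughly as follows. Everything here is an assumption for illustration: the reviews, the brand copy, the tiny sentiment word lists, and the function names are hypothetical, and a real analysis would use scraped data and a proper NLP library rather than hand-written word sets.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for scraped reviews and brand copy.
REVIEWS = [
    "Love the battery life, but the app is confusing",
    "The app is slow and confusing, terrible onboarding",
    "Great battery, great price, love it",
]
BRAND_COPY = "Great price and a beautiful design"

# Hypothetical, deliberately tiny word lists.
POSITIVE = {"love", "great"}
NEGATIVE = {"confusing", "slow", "terrible"}
STOPWORDS = {"the", "but", "is", "and", "a", "it"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def analyze(reviews, brand_copy):
    """Count topics, sum review-level sentiment per topic, and list frequent
    topics (mentioned at least twice) that the brand copy never uses."""
    topic_counts = Counter()
    topic_sentiment = Counter()
    for review in reviews:
        tokens = tokenize(review)
        # Review score: +1 per positive word, -1 per negative word.
        score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
        topics = [t for t in tokens if t not in POSITIVE and t not in NEGATIVE]
        topic_counts.update(topics)
        for topic in set(topics):
            topic_sentiment[topic] += score
    copy_words = set(tokenize(brand_copy))
    unused = sorted(
        t for t, n in topic_counts.items() if n >= 2 and t not in copy_words
    )
    return topic_counts, topic_sentiment, unused


counts, sentiment, unused = analyze(REVIEWS, BRAND_COPY)
print(counts.most_common(3))
print(unused)
```

Topics that come up often, carry a clear sentiment, and are absent from the brand’s own messaging are the candidates worth a closer look.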
So, there are five main dimensions of brand personality, which are sincerity, excitement, competence, sophistication, and ruggedness. So say, GoPro would lie in excitement and ruggedness. Uber would be within the dimensions of competence and sophistication. You wanna analyze, based on these, how your end users see themselves, how you can classify them in these categories, and what kind of wording you wanna use that overlaps the most with these as well.
So, it boils down to user empathy: what makes it most likely for the end users to see themselves in you and mentally associate themselves with your product. That informs kind of the whole business strategy.
Dhaval: Wow! So, I’m very curious about the five dimensions of brand personality: sincerity, excitement, competence, sophistication, and ruggedness. Is there a reference where we can find more on these, or where we can learn more about these?
Gaurav: Yes, this was actually a paper by Jennifer Aaker, I believe a professor at Stanford. It was in the late 90s or so, and it talks about how you can humanize a brand and show brands as having a human personality, and then do lots of intelligent marketing around that, as opposed to people not having any idea about what a brand is. But it all boils down to, like you said, user empathy.
If people can kind of humanize your brand, they’re more likely to use it. It’s usually emotions that get generated in people when they look at your product. And then they rationalize a part of the decision, as opposed to the other way around.
If something makes people feel happy and they love using your product, they will rationalize staying with your product or using it more often. Even if there’s something else that may do the job better but doesn’t make them feel good, you’re gonna outperform it.
Dhaval: Yeah, that’s very interesting. I’ve also heard that sometime in the past, that people tend to use emotions to make decisions and then rationalize them using logic.
Dhaval: And if you understand those emotions, you can easily help them make those decisions using this technology. So, very powerful stuff here. Do you know of any brands or products that are actually doing this kind of reverse engineering of emotions?
Gaurav: Sure. So, some examples: Airbnb is a very good one. They started out as kind of a competitor to Craigslist home listings. But gradually, they started to position themselves as a friendly home away from home rather than just something in hospitality and lodging. They talk about how Airbnb is a way to book homes from local people and experience the place like you live there.
It kinda puts a whole human touch to it: would you wanna get a place from Craigslist or a hotel when, in fact, you could instead travel and experience a nice place and live like the local people there?
Another good example is Textio, which is definitely really interesting. They do a lot of NLP analysis and are able to look at your job listings and show you what might be wrong, what key phrases you can change around so they work well, or what kind of words add bias.
So, for example, you have certain words that a tech startup typically uses: “We’re looking for a rock star, a ninja, or killer kind of coders.” That tends to attract only, say, a male coder of a certain variety. Whereas, if you want a more diverse pool and people who’re a little more mature, you might want to be careful about how you’re positioning yourself to prospective job candidates.
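The simplest version of this kind of check is a word list lookup, sketched below in Python. This is not Textio’s method, which the conversation does not describe; the term list and the function name are hypothetical and only reuse the words mentioned above.

```python
import re

# Hypothetical list of terms to flag, taken from the examples in the conversation.
FLAGGED_TERMS = ("rock star", "ninja", "killer")


def flag_terms(listing, terms=FLAGGED_TERMS):
    """Return the flagged terms that appear as whole words in a job listing."""
    text = listing.lower()
    return sorted(
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    )


print(flag_terms("We're looking for a rock star ninja coder"))
```

A production tool would go well beyond exact matches, but even a lookup like this makes the wording of a listing something you can review and test rather than guess at.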
Another one that I used to rely on a lot for keeping notes on everything is Evernote. And I noticed how they do A/B testing on the kind of messaging they have on their landing page. One is just, “Inspiration strikes everywhere, and Evernote lets you capture, nurture, and share ideas across any device.” And the other is, “Modern life can be complicated. Simplify it with Evernote.” It is interesting to see that even a company at this stage actively A/B tests and sees what kind of things to focus more on and how people see themselves using Evernote. Is it more about the fact that they wanna have mobility and store stuff everywhere, whether they’re on the go or just sitting at the desktop? Or is it more just about Evernote being an organizational kind of tool and managing things efficiently?
Another example that is very relevant to the process of understanding end users is a tool called Lithium. They actually have a service where they work in the background with stuff like consumer forums, say for Sephora, for Microsoft, for Verizon, all those big multinational companies.
What kind of stuff would people think about them on the internet?
What kind of stuff will people think about them on their forums or other forums?
And what kind of keywords and topics are trending?
What kind of analysis can you do to see what things might be bubbling up or becoming bigger issues?
Or how do people feel about their brands?
And these really powerful things to understand are made possible only by computational linguistics analysis.
Dhaval: Wow, those are some great examples. I’m definitely going to give it another listen. Thanks for sharing all those awesome examples Gaurav, so…
Dhaval: Yeah. Yeah. So I’m curious: you’ve got a lot of things going on in your life, but are you passionate about any personal projects at this time? Are you working on anything exciting that you wanna share with us?
Gaurav: Sure. So I have been working on the side on Profillic. The idea boils down to how the current hiring process, and more specifically candidate screening, still has a lot of human biases in play. To a large extent, it’s often people looking at resumes and looking at proxies for how a person would do on a job. So they look at your GPA, schools, brand names, people you’ve worked with, and, subconsciously, even your actual name. There is a study done by Harvard saying that with exactly the same resume, if you change the name to certain kinds of names, the performance varies, which is extremely odd. It means that since we’re humans, there’s a lot of subconscious bias that goes on, and that’s when looking at something that’s an objective piece of paper like a resume.
The other aspect is, say you and I have the same kind of skill set. If someone’s looking at our resumes, a piece of paper with text on it, to kind of gauge who is a better fit for the job, they’re actually just optimizing for who writes a better resume rather than who actually has better skills. If I write something like, “I have a lot of experience in this and do it really well,” and you’re not able to portray yourself as well, even if you were, say, a little bit better, I might have the upper hand in getting in and you might be filtered out. There are a lot of false positives and false negatives based on people’s biases on such textual data.
What I’m trying to do to solve this is kind of a blind analysis: not look at people’s names, put their resumes aside, and then roughly do data modeling of their performance on a short kind of quiz and see, through a bunch of data points, how well it matches up to the ideal candidate.
This is still in the very early stages, but a lot of companies have shown interest, since it kind of gets around the whole problem of people who are not from top-tier schools, or not from good backgrounds, tending to be biased against despite being really good at what they do.
This also ties into what we talked about earlier: job descriptions, what kind of words you use, and even how computer science is portrayed to people, in terms of, say, learning the technical skill set versus what kind of impact you can have. When you talk about technology and about what kind of things you can do, like databases, operating systems, and web development, that tends to attract more of a male crowd. Whereas if we say, “Hey, you can use computer science to, say, make cool stuff and have an impact on society, or understand, say, how traffic works,” and all that kind of stuff, it tends to attract a more diverse crowd.
So those are some interesting things I’m working on on the side, but they also apply to this whole thing of how language and psychology kind of tie together in a very interesting manner.
Dhaval: Wow! That sounds like a product that would definitely be used… that could definitely be used by every organization that wants to recruit anybody. That could be huge. I’m looking forward to the evolution of it.
Gaurav: Absolutely. It is a very powerful thing to do. And so I’m very excited to work on it and, hopefully, put it in the hands of a few people in the next couple of months. I’ll keep you posted for sure.
Dhaval: Yeah man, yeah. Totally, I wanna hear more about it. One thing I always do with my guests, Gaurav, is ask them this round of rapid-fire questions, which is my favorite part of the whole podcast: I ask you some questions off-the-cuff and you answer them. So here we go, I’m gonna ask my very first question: if you could put a data science related billboard anywhere with anything on it, where would it be? And what would it say?
Gaurav: Sure. So, this would be in downtown San Francisco. This is the mecca of tech startups, as you probably know. And I would have it say, “The plural of anecdote is not data.” Now, to elaborate, it would be around the concept of confirmation bias. People tend to make statistics fit whatever they already kind of thought all along.
And that’s a very big problem, because you can massage numbers to say whatever you wanna say. But data science is nothing without enough good data: garbage in, garbage out. So, to put it simply, the plural of anecdote is not data.
Dhaval: Wow. That’s amazing. I love it. I love it. The second question is, what data- or decision-related product have you purchased in the last 6 months that was under a hundred bucks but had a huge impact on your life? A product that would have something to do with either data or decision making?
Gaurav: Cool. So, since I’m a lot into NLP, I was recently checking out neural networks, which are a lot more powerful than, say, traditional machine learning, just for the sheer amount of analysis you can do. So, one is an actual physical copy of the book Speech and Language Processing by Daniel Jurafsky.
And also about 30 bucks of AWS credits to work with graphics processing units. They’re really powerful for running things like TensorFlow. Those were my significant purchases in the last 6 months.
Dhaval: Nice. Always, always like the answer to those questions. It’s always unique for everyone. So, thanks for sharing that Gaurav. When you think of a data person being successful, who is the first person that comes to mind, and, if I may ask, why?
Gaurav: Sure. So Andrew Chen, who’s currently working on Growth at Uber, essentially coined the term growth hacking, and said that the Growth Hacker is the new VP of Marketing. He talks about the very technical approach of how you market a product at the intersection of engineering and marketing. He currently advises startups like Dropbox, BarkBox, and AngelList as well. And he is one of the best people in growth marketing and growth hacking in Silicon Valley.
Dhaval: Cool. I’ve definitely heard of him, and I’m looking forward to exploring more now that you recommend him. One other question I have is, what’s a good data related book that you often go back to the most?
Gaurav: Sure. Maybe it’s a tie between two. These are very strong fundamental books that explain everything in great detail. On the machine learning and data science side, it is “Pattern Recognition and Machine Learning” by Christopher Bishop. And on the computational linguistics side, for understanding language and its intersection with data, it is “Speech and Language Processing” by Daniel Jurafsky and James Martin. Dan Jurafsky is the same person from Stanford we talked about with the language of food, in case you remember. Really cool stuff.
And in case you are working with Python and natural language processing there, a book that might be good for you is “Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit.” It explains everything in great detail, very simply, and helps anyone get started with NLP in Python.
Dhaval: Yeah, I’ll check that out. I’ve definitely heard of a few of them. So, looking forward to learning more about it. This is great. Thanks for sharing that, Gaurav.
Gaurav: Yeah, you’re welcome.
Dhaval: And my final question, which is my favorite question, and you kind of referred to it earlier: which story do you use most often when communicating with the C-suite about data science?
Gaurav: Sure. So I have had the good fortune of working with people, especially in Silicon Valley, who understand the importance of data science and are pretty data-driven. However, just in general, it is good to remind people that the bottom line is the impact on the business. If we’re able to understand the people using the product and service, that is what underscores and drives the company’s numbers.
So for competitive advantage, efficiency of processes, and everything in general, without data it’s like shooting in the dark and using a spray and pray strategy: you won’t be able to measure things and improve them. You cannot improve what you cannot measure. And the way to make smart decisions is only through data science.
Dhaval: Very cool. Very cool. So, I’m so glad we had a chance to chat today. And I’m sure there’s gonna be a lot of people wanting to reach out to you and connect with you. Is there anything you can share with the audience on how to connect with you? I’ll be sharing that in the blog post at www.dataleaders.io. Is social a good medium to reach you?
Gaurav: Yeah, sure. I’m pretty social savvy, so it’s totally fine to reach out to me on LinkedIn, Twitter, or Facebook. I’m pretty good about them as well.
Dhaval: Great. Awesome. So, I’ll share those links in the blog post, and I’m sure a lot of people would wanna reach out to you after that. So, thanks so much for being on the show Gaurav. I’m really glad we had a chance to connect and talk about some of your passions and where some of my passions intersect.
So, thank you so much once again and looking forward to catching up with you once you have an update on Profillic.
Gaurav: Yeah, thanks for having me on the show. It’s always a pleasure talking about this kind of stuff. Thanks again, man.
Dhaval: Yeah, we’ll catch up soon!