Part 3 of 3 of my Trends in Publishing post, which covers trends in metadata and collecting publishing data. From the original post: “While researching my thesis project for my M.S. in publishing at NYU, I came across several interesting trends that I think will affect publishing. In fact, I’ve already seen some of these at work in the three months since I wrote the report, and I plan on using this information for my own startup, Write or Read.” I’ve broken up this post into three parts. Read parts one and two at Trends in Publishing and Trends in Publishing (Part 2).
Liz Scheier, who works on the Content team at PubIt, said in a personal interview that the “number one question [for authors] is, how does anyone know my book exists?” The answer lies in metadata. As more books become digitized, discoverability becomes increasingly important. Metadata is data about data, and all book metadata “gets indexed by search engines, [which] decide which page should come on top on a search for the content included in that page.” Currently most publishers only use basic metadata for distribution.
Basic metadata typically includes ISBN, price, pub date, description, author name, and publisher. Enhanced metadata, which may include relevant tagging and other pieces of information, can help sell more e-books. Author bios, excerpts, media reviews, prizes and regional codes are all metadata that can be fed into algorithms that will recommend books to readers and help them find new content. In this way, metadata is a marketing tool. Therefore, it’s important that publishers provide accurate data, which unfortunately is not usually the case. People often cannot find certain books because incorrect data prevents search engines from indexing them. Many publishers would benefit from an automated system that could accurately record all the metadata.
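To make the basic/enhanced distinction concrete, here is a minimal Python sketch of what one book's metadata record and a simple automated check might look like. The class name, field names, date format, and validation rules are all my own assumptions for illustration; they are not an industry standard or any publisher's actual system, and the ISBN check only accepts the 13-digit form as a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BookMetadata:
    """Hypothetical record for one book (illustrative field names only)."""
    # Basic metadata, as listed in the paragraph above.
    isbn: str
    price: float
    pub_date: str  # assumed format: "YYYY-MM-DD"
    description: str
    author_name: str
    publisher: str
    # Enhanced metadata; every field here is optional in this sketch.
    tags: List[str] = field(default_factory=list)
    author_bio: Optional[str] = None
    excerpt: Optional[str] = None
    reviews: List[str] = field(default_factory=list)
    prizes: List[str] = field(default_factory=list)
    region_codes: List[str] = field(default_factory=list)


def missing_basic_fields(book: BookMetadata) -> List[str]:
    """Return the names of basic fields that are empty or obviously invalid."""
    problems = []
    # Simplification: accept only 13-digit ISBNs, ignoring hyphens.
    digits = book.isbn.replace("-", "")
    if not (len(digits) == 13 and digits.isdigit()):
        problems.append("isbn")
    if book.price <= 0:
        problems.append("price")
    for name in ("pub_date", "description", "author_name", "publisher"):
        if not getattr(book, name).strip():
            problems.append(name)
    return problems


# Made-up example record with a blank description.
book = BookMetadata(
    isbn="978-0-00-000000-2",
    price=9.99,
    pub_date="2011-10-01",
    description="",
    author_name="Jane Doe",
    publisher="Example Press",
)
print(missing_basic_fields(book))
```

A check like this would only catch missing or malformed fields, not data that is well formed but wrong, so it is a starting point rather than a full answer to the accuracy problem.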
But metadata is more than a marketing tool. Working with metadata on a more granular level, such as defining key words per paragraph or per chapter instead of only adding a basic description of the book, can help publishers analyze their catalogues and make smart business decisions in the future. If each book had large amounts of enhanced metadata, eventually the publishing industry could combine the information and analyze the data, looking for trends. After enough data is collected, publishers could detect patterns using data mining techniques such as association learning, cluster detection, classification, and regression. Association learning is used to drive Amazon’s recommendation system (if someone likes x, he or she will like y), cluster detection is used to recognize categories or sub-categories in data, classification can help filter information, and regression “can be used to construct predictive models based on many variables.” An example of regression would be Facebook using factors such as number of photos tagged, likes, and comments to predict future engagement on its site.
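For readers who want to see the "if someone likes x he or she will like y" idea in action, here is a small Python sketch that counts how often two titles are bought together and recommends the most frequent companions. This is a deliberately simplified co-occurrence count, not Amazon's actual system, and the function names, titles, and purchase data are all made up for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def co_purchase_counts(baskets: List[Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of titles appears in the same basket."""
    counts: Dict[str, Counter] = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(title: str, counts: Dict[str, Counter], top_n: int = 2) -> List[str]:
    """Return the titles most often bought with `title` (ties broken alphabetically)."""
    ranked = sorted(
        counts.get(title, Counter()).items(),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [other for other, _ in ranked[:top_n]]


# Made-up purchase histories: each set is one reader's basket.
baskets = [
    {"Mystery A", "Thriller B"},
    {"Mystery A", "Thriller B", "Romance C"},
    {"Mystery A", "Fantasy D"},
]
counts = co_purchase_counts(baskets)
print(recommend("Mystery A", counts))  # ['Thriller B', 'Fantasy D']
```

Real recommendation systems weigh far more signals than raw pair counts, but the sketch shows why richer, more accurate data about books and readers gives this kind of analysis more to work with.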
Many companies already use data mining, and if publishers were to increase their use of metadata and mine data, then they could predict bestsellers, determine what makes a book successful, and find gaps in their catalogues. There is already a trend toward using metadata more effectively.
Book Country, a social community for self-publishers, uses metadata for its ‘Genre Map.’ The ‘Genre Map’ allows users to explore the number of books on the site that fit into subcategories of the main genres: mystery, romance, thriller, fantasy, and science fiction. Clicking on a subcategory reveals where all the books in a section fall in terms of tone, including sexy, funny, dark, or realistic. This system helps readers find stories they will want to read.
Several publishers have described the process of creating books as a “crapshoot,” meaning they never know what will be a bestseller or what makes a book successful. Every day, agents and publishers alike hear through industry blogs about new ways to publish digitally, such as F + W Media’s new subscription sites, Graphicly’s distribution platform, and BookBaby’s e-book conversion services.
But having metrics on books is becoming increasingly important. Novelist Scott Turow said he’s long been frustrated by the industry’s failure to study its customer base. “I once had an argument with one of my publishers when I said, ‘I’ve been publishing with you for a long time and you still don’t know who buys my books,’ and he said, ‘Well, nobody in publishing knows that.’ If you can find out that a book is too long and you’ve got to be more rigorous in cutting, personally I’d love to get the information.”