The task for the data scientists at Bloomberg is to make the ocean of information discoverable and relevant for its subscribers.
Financial data specialist Bloomberg employs more than a hundred data scientists to keep users hooked on its ubiquitous Terminals - the keyboards and monitors that give financial staff access to reams of market information.
In the background the Bloomberg Terminal crunches through 80,000 news wires, 4,000 FX feeds and 370 exchanges, amounting to around 60 billion data points a day, to keep terminal users up to date on the global financial markets. The task for the data scientists at Bloomberg is to make this ocean of information discoverable and relevant for all of its 325,000 subscribers.
Gideon Mann is head of data science at Bloomberg. His role is to oversee all of the data science work that takes place across this large and varied organisation.
"The bulk of data science that happens is building products," Mann told Computerworld UK. "So my role ends up being mostly managing strategic, technical initiatives in basically three areas: natural language processing, search and machine learning that are embedded into products which serve the terminal."
The overall aim for Bloomberg is to have "any data that is relevant to the financial industry on the terminal in a normalised way," Mann said.
While many think of Bloomberg as a media organisation, the terminal and its associated data services, bundled together in a subscription costing around £20,000 a year, are used by thousands of bankers, traders, analysts and financial reporters, and are core to Bloomberg as a business.
As for where data science comes in, Bloomberg began experimenting with machine learning for sentiment analysis nearly a decade ago. Mann admits that it took some time for the organisation to fully commit to machine learning - the computer science technique of teaching a machine to learn and adapt on the fly as it is fed large volumes of data - but the success of that sentiment analysis project legitimised it for upper management.
"It took a number of years before the company realised that this particular competency takes a while," he said. "Engineers can do it but it is not simple. So then the company started to commit to hiring and investing in quantitative programmers." Now Bloomberg has between one and two hundred data science specialists within the organisation, according to Mann.
Once the organisation had proved the usefulness of machine learning and built the skills in-house, it started applying it to the terminal's internal search, improving data discovery through better ranking algorithms.
A more recent project used computer vision to pick out data from tables embedded deep within financial reports and filings, a task that was once performed manually by programmers.
"What is much more successful is to use object recognition techniques on those tables," Mann explained. "So it will recognise the boundary of the table, overlay it and pick out columns and rows from that table into our database." According to Mann, the result is higher accuracy and speed.
Next, Mann wants to use techniques like computer vision and natural language processing to broaden the financial information available through the terminal. The aim is to let users increasingly query the terminal in natural language rather than through specialised commands.
"A lot of financial data is numbers, but a lot of the things that happen in the world that are pertinent to finance are expressed in language, either news stories we generate or aggregate, or press releases or documents the companies put out themselves, or even statements by officials," Mann said. "All of that has a dramatic and fast change on the market.
"So the bulk of the data science and machine learning work we do is language processing, applying structure on top of it."
Mann believes that Bloomberg has got much better at hiring data scientists over the years, as it has grown to understand which people the organisation needs - mainly computer science PhDs, if you were wondering. "I don't want to come off as too braggadocious, but we have got better at [hiring]. We spend a lot of energy on it," he said.
"We understand what we want and look for and over the past year we have significantly increased the quality of applicants," Mann explained. "They were always very good but the change has been we are able to hire people with a skill mix which is closer to what we need, cutting down on the training they need. I think we understand more of the people we need and the places we need to go and the universities they are coming out of."
Essentially, Bloomberg has moved away from statisticians and towards quantitative programmers for the data science work that happens within its walls.
Now Mann is taking this strategy a step further. "We used to have the idea that each of these quantitative programmers had to be full stack," he explained. "So take the data, clean it, structure it, support infrastructure, build a machine learning model, deploy it, babysit it and do fixes."
Now he wants projects to be handled by smaller teams of specialists: for example, a data engineer, a data scientist and a production engineer working together on a specific product within the terminal.
He recognises that the industry is changing so fast that close ties to academia are integral to staying up to date with the latest technology trends.
Mann spends a lot of his time engaging with academia, whether through publications, monthly guest speakers or Bloomberg's own faculty grant programme, and Bloomberg itself encourages its technical staff to attend conferences.
"We send a tremendous amount of people to academic conferences, with the main aim being for them to learn and to be challenged by the experience of what is happening in academia," he said. For example, Bloomberg registered 44 staff members to attend the Machine Learning Symposium in New York this month.
In terms of tooling, Bloomberg has steadily moved away from proprietary systems and vendors for data collection, processing and search towards open source solutions such as Apache Spark and Solr.
Mann admits that moving from vendors and proprietary software was something of a culture shift.
"When people talk about free software they say 'it is not free like beer, it is free like puppies' because it needs a lot of love and care," he said, adding that the people at Bloomberg eventually saw the benefit of contributing to open source and the sense of control it brings.
"Open source has really changed the way that we do business," Mann said. "Traditionally we built most of our stuff from scratch, for example generations of database technology, which created speed and reliability constraints.
"With big data processing, over the past five to ten years the impact of Hadoop and now Spark has given us a whole new set of tools, and we are investing heavily in both of those. There was a time we were involved heavily with HBase but we are very aggressive with Spark right now. I don't know if we are an early adopter but we are certainly all in."