Paper Title
Machine Learning using Instruments for Text Selection: Predicting Innovation Performance

Abstract
We use an instrumental variable, patent records, to guide text selection in a machine learning model. Patent records are highly correlated with R&D expenditures but are not necessarily correlated with performance residuals unrelated to R&D. Using patent records as the instrument, we train word counts of selected texts to serve as a proxy for firm R&D expenditure, and we show that the texts and their associated word counts effectively predict firm innovation performance, such as firm market value and total sales growth.

Keywords - Machine Learning; R&D Reporting; Textual Analyses; Firm Innovation
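
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's actual procedure: it assumes a simple correlation screen for word selection and an ordinary least-squares fit of patent counts on the selected word counts. All data are synthetic, and the names `select_words` and `fit_text_proxy` are hypothetical helpers introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_words = 500, 20

# Synthetic data; every value here is made up for illustration.
rd = rng.gamma(shape=2.0, scale=1.0, size=n_firms)  # latent R&D intensity
word_counts = rng.poisson(5.0, size=(n_firms, n_words)).astype(float)
# Assumption: only the first three words reflect R&D activity.
lam = np.repeat(2.0 + 3.0 * rd[:, None], 3, axis=1)
word_counts[:, :3] = rng.poisson(lam)
patents = rng.poisson(1.0 + 2.0 * rd).astype(float)  # instrument
performance = 0.8 * rd + rng.normal(0.0, 0.5, size=n_firms)


def select_words(word_counts, instrument, top_k):
    """Indices of the top_k word columns most correlated with the instrument."""
    x = word_counts - word_counts.mean(axis=0)
    z = instrument - instrument.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (z ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = (x.T @ z) / denom
    return np.argsort(-np.abs(corr))[:top_k]


def fit_text_proxy(word_counts, instrument, selected):
    """Fitted values from regressing the instrument on the selected word counts."""
    design = np.column_stack(
        [np.ones(len(instrument)), word_counts[:, selected]]
    )
    coef, *_ = np.linalg.lstsq(design, instrument, rcond=None)
    return design @ coef


selected = select_words(word_counts, patents, top_k=3)
proxy = fit_text_proxy(word_counts, patents, selected)

print("selected word columns:", sorted(selected.tolist()))
print("corr(proxy, performance):",
      round(float(np.corrcoef(proxy, performance)[0, 1]), 3))
```

In this toy setup the words are chosen using only the patent counts, never the performance variable, so the resulting text proxy is fitted without looking at the outcome it is later used to predict.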