A pipeline has been developed to extract executive compensation data from SEC filings, specifically the Summary Compensation Tables in DEF-14A proxy statements. It uses MinerU to parse the PDFs and extract table images, and Qwen3-VL-32B to classify those images and structure their contents. The project addresses challenges such as tables that span multiple pages and format differences between pre- and post-2006 filings. The pipeline is still in development and has some bugs; its goal is to compile a comprehensive dataset of executive compensation from 2005 to the present for all US public companies. Such a dataset would make executive compensation data more transparent and accessible, and could support research in corporate governance and finance.
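One way to picture the multi-page table problem is the minimal sketch below. It is not the project's code and does not call MinerU or Qwen3-VL: it assumes the earlier stages have already produced one record per table image, and the `TableFragment` type, its fields, and the `merge_fragments` helper are hypothetical names introduced here for illustration. The merge rule (same or next page, and a matching or missing header) is likewise an assumption, and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TableFragment:
    """Structured output for one table image (hypothetical shape)."""
    page: int          # 1-based page number in the PDF
    is_sct: bool       # classifier verdict: is this a Summary Compensation Table?
    header: list[str]  # column names; assumed empty on continuation pages
    rows: list[list[str]] = field(default_factory=list)


def merge_fragments(fragments: list[TableFragment]) -> list[TableFragment]:
    """Join SCT fragments on the same or consecutive pages into single tables."""
    merged: list[TableFragment] = []
    last_page = None  # page of the last fragment folded into merged[-1]
    for frag in sorted(fragments, key=lambda f: f.page):
        if not frag.is_sct:
            continue  # ignore tables the classifier rejected
        is_continuation = (
            bool(merged)
            and last_page is not None
            and frag.page - last_page <= 1
            # A continuation either repeats the header or omits it.
            and (not frag.header or frag.header == merged[-1].header)
        )
        if is_continuation:
            merged[-1].rows.extend(frag.rows)
        else:
            # Copy the lists so the input fragments are not mutated.
            merged.append(
                TableFragment(frag.page, True, list(frag.header), list(frag.rows))
            )
        last_page = frag.page
    return merged


# Made-up sample data: one SCT split across pages 12 and 13, plus an unrelated table.
fragments = [
    TableFragment(12, True, ["Name", "Year", "Salary", "Total"],
                  [["Exec A", "2023", "500000", "1200000"]]),
    TableFragment(13, True, [], [["Exec B", "2023", "400000", "900000"]]),
    TableFragment(20, False, ["Director", "Fees"], [["Director C", "80000"]]),
]
tables = merge_fragments(fragments)
print(len(tables), len(tables[0].rows))
```

A rule this simple would also merge two distinct tables that share a header on adjacent pages, so a real pipeline would likely need extra checks such as comparing column counts or fiscal years.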