Public float data from Ewens, Xiao and Xu (2020)

Data for "Regulatory Costs of Being Public: Evidence from Bunching Estimation" (Ewens, Xiao and Xu)

This repository contains data and some code for the research paper “Regulatory Costs of Being Public: Evidence from Bunching Estimation” (Ewens, Xiao and Xu).

Public float data

We collect public float data from firms’ 10-K filings. These filings disclose the market value of all outstanding common equity (voting and non-voting) held by non-affiliates at the end of the second fiscal quarter. The data file (public_float.csv) contains three variables: cik (SEC identifier), publicfloat_year (the year) and publicfloat_mil (public float in millions of dollars).
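
As a minimal sketch of how the file might be read, the snippet below parses rows with the three column names listed above and picks out firms below a public float threshold in a given year. The helper names (`read_public_float`, `ciks_below`) and the sample rows are hypothetical and only illustrate the layout; the sketch also assumes every value is numeric and non-missing, which may not hold for the real file.

```python
import csv
import io


def read_public_float(handle):
    """Parse rows laid out like public_float.csv into typed dictionaries."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "cik": int(rec["cik"]),
            "publicfloat_year": int(rec["publicfloat_year"]),
            "publicfloat_mil": float(rec["publicfloat_mil"]),
        })
    return rows


def ciks_below(rows, year, threshold_mil):
    """Return sorted CIKs whose public float in `year` is below the threshold ($ millions)."""
    return sorted({
        r["cik"] for r in rows
        if r["publicfloat_year"] == year and r["publicfloat_mil"] < threshold_mil
    })


# Made-up rows for illustration only; these are NOT values from the real data.
SAMPLE = io.StringIO(
    "cik,publicfloat_year,publicfloat_mil\n"
    "1001,2005,12.5\n"
    "1002,2005,80.0\n"
    "1001,2006,30.2\n"
)

rows = read_public_float(SAMPLE)
small_2005 = ciks_below(rows, 2005, 25.0)
print(small_2005)
```

To run this against the real data, replace `SAMPLE` with `open("public_float.csv", newline="")`.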

Length of 10-K and 10-KSB parts: small firms, 1994-2007

For the set of public firms with $25m or less in public float, we collected all 10-K and 10-KSB filings from 1994 to 2007. We then parsed each file (after removing HTML and line breaks) to compute the string length of each of the four parts of the annual report. The goal of this exercise was to measure the real differences in total disclosure (in characters) between “small business issuers” and other firms during the sample period. The data is available here and has the following variables:

  • cik: SEC identifier
  • form: form type
  • coname: company name
  • fyear: fiscal year
  • fsize: the raw file size (in MB) of the 10-K or 10-KSB filing
  • url: the link to the filing
  • sbi: equal to 1 if the company was a “small business issuer” in that year, 0 otherwise
  • lengthpart*: the length in characters (excluding HTML and line breaks) of part n = 1, 2, 3 and 4
  • missingPart*: equal to 1 if the part reference (1, 2, 3 or 4) was not found in the filing
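
The measurement described above can be sketched roughly as follows. This is not the parser used to build the data; it is a simplified illustration that assumes part headings appear as "PART I" through "PART IV", uses the first occurrence of each heading, and counts the heading itself toward the part's length. The output keys `lengthpart1`, `missingPart1`, and so on are an assumed expansion of the `lengthpart*` and `missingPart*` names, and the sample filing is made up.

```python
import html
import re

PART_NUMERALS = {"I": 1, "II": 2, "III": 3, "IV": 4}
# Heading pattern such as "PART II"; longer numerals are tried first.
PART_RE = re.compile(r"\bPART\s+(IV|III|II|I)\b", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def visible_text(raw):
    """Replace each HTML tag with a line break (so words do not fuse) and decode entities."""
    return html.unescape(TAG_RE.sub("\n", raw))


def count_chars(segment):
    """Count characters, ignoring line breaks."""
    return sum(1 for ch in segment if ch not in "\r\n")


def part_lengths(raw):
    """Return assumed lengthpartN / missingPartN values for one filing."""
    text = visible_text(raw)
    starts = {}
    for match in PART_RE.finditer(text):
        part = PART_NUMERALS[match.group(1).upper()]
        starts.setdefault(part, match.start())  # keep the first occurrence only
    ordered = sorted(starts.items(), key=lambda item: item[1])
    result = {}
    for i, (part, begin) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(text)
        result[f"lengthpart{part}"] = count_chars(text[begin:end])
    for part in range(1, 5):
        result[f"missingPart{part}"] = 0 if part in starts else 1
        result.setdefault(f"lengthpart{part}", 0)
    return result


# Made-up filing text for illustration; Part III is deliberately absent.
SAMPLE = (
    "<html><body><h2>PART I</h2>\n<p>Business.</p>\n"
    "<h2>PART II</h2><p>Market data</p>"
    "<h2>PART IV</h2><p>Exhibits</p></body></html>"
)

lengths = part_lengths(SAMPLE)
print(lengths)
```

Real filings often repeat the part headings in a table of contents or refer to them in running text, so a production parser would need a stricter rule for choosing which occurrence marks the start of each part.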

URLs to 10-K filings

This data (zipped CSV) contains the URLs to all 10-K and 10-KSB filings from 1994 to 2020 (last updated 12/1/2020). We used a subset of these links to build the 10-K(SB) length data posted on the repository. The variables in the file are:

  • cik: SEC identifier
  • reportingdate: date of 10-K fiscal year
  • form: form type
  • filedate: date of filing
  • fname: the URL to the filing
  • coname: company name
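
One way to pull a subset of links from this file, for example for a chosen set of firms and form types, is sketched below. The function name `select_filing_urls`, the sample rows, the date format, and the example.com URLs are hypothetical; the sketch also assumes the `form` column holds values such as "10-K" and "10-KSB" exactly, which may not match every row of the real file.

```python
import csv
import io


def select_filing_urls(handle, ciks, forms=("10-K", "10-KSB")):
    """Yield (cik, form, fname) for rows whose cik and form both match."""
    wanted_forms = {f.upper() for f in forms}
    for rec in csv.DictReader(handle):
        cik = int(rec["cik"])
        form = rec["form"].strip()
        if cik in ciks and form.upper() in wanted_forms:
            yield cik, form, rec["fname"]


# Made-up rows for illustration only; the URLs are placeholders.
SAMPLE = io.StringIO(
    "cik,reportingdate,form,filedate,fname,coname\n"
    "1001,2004-12-31,10-KSB,2005-03-30,https://example.com/a.txt,Alpha Co\n"
    "1002,2004-12-31,10-K,2005-03-15,https://example.com/b.txt,Beta Inc\n"
    "1001,2005-12-31,10-K,2006-03-28,https://example.com/c.txt,Alpha Co\n"
)

selected = list(select_filing_urls(SAMPLE, ciks={1001}, forms=("10-KSB",)))
print(selected)
```

Passing the default `forms` argument keeps both 10-K and 10-KSB rows for the chosen firms.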


Please use the following citation if you use this data:

Ewens, Michael, Kairong Xiao, and Ting Xu. 2020. “Regulatory Costs of Being Public: Evidence from Bunching Estimation.” doi:10.31235/


@article{ewens_xiao_xu_bunching,
  title={Regulatory Costs of Being Public: Evidence from Bunching Estimation},
  author={Ewens, Michael and Xiao, Kairong and Xu, Ting},
  journal={Working paper},
  year={2020}
}